Open Source AI
What Does Open Source AI Even Mean?

Mohammad Shojaei, Applied AI Engineer
11 Sep 2025
Deconstructing an AI Model
The Complete AI Lifecycle: From Training to Model Weights
Prerequisites
Training Process
Learning algorithms optimize parameters
Model Weights
Crystallized knowledge as numbers
Training transforms raw ingredients into learned knowledge
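The claim that weights are "crystallized knowledge as numbers" can be made concrete with a toy sketch: a minimal gradient-descent loop (plain Python, hypothetical data) in which a learning algorithm optimizes two parameters until they encode the pattern in the data.

```python
# Toy illustration: "training" distills data into numbers (the weights).
# The data and dimensions here are hypothetical, for illustration only.

def train(data, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0  # the "model weights", initialized with nothing learned
    n = len(data)
    for _ in range(epochs):
        # gradients of mean-squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w  # the learning algorithm optimizes the parameters
        b -= lr * grad_b
    return w, b  # crystallized knowledge: just two floats

# Raw ingredients: points sampled from the line y = 2x + 1
data = [(x, 2 * x + 1) for x in range(-5, 6)]
w, b = train(data)
print(round(w, 2), round(b, 2))  # converges to roughly 2.0 and 1.0
```

Releasing `w` and `b` is the open-weight scenario: anyone can run the learned function, but without the data and training loop they cannot reproduce or audit how those numbers came to be.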
The Four Freedoms Applied to AI
The Free Software Foundation's Four Freedoms provide a robust framework for understanding AI openness
Freedom 0: The Freedom to Run
Run AI systems for any purpose without restriction
Freedom 1: The Freedom to Study
Study how AI systems work and adapt them to your needs
Freedom 2: The Freedom to Redistribute
Redistribute copies of AI systems to help others
Freedom 3: The Freedom to Distribute Modified Versions
Distribute modified versions to benefit the community
These freedoms ensure that AI systems remain accessible, transparent, shareable, and improvable for everyone
The Spectrum: From Locked Down to Actually Open
Understanding the four levels of AI transparency and what you actually get
Closed / API-Only
None of: Training Data, Architecture, Training Code, Training Process, Model Weights
Open-Weight
Model Weights only (missing Training Data, Architecture, Training Code, Training Process)
Open-Source AI
Architecture, Training Code, Model Weights (Training Data often limited)
Radical Openness
All components: Training Data, Architecture, Training Code, Training Process, Model Weights
The spectrum reveals a harsh reality: most "open" AI is actually openwashing
True openness requires complete transparency, permissive licensing, and reproducible methodology — not just model weights
The Gold Standard
Exemplars of True Openness
Pythia (EleutherAI)
OLMo (AI2)
SmolLM (Hugging Face)
TinyLlama
Big Tech's Response to OSS Pressure
How Open Source Communities Forced Strategic Shifts
Company Responses: The Open-Weight Convergence
OpenAI
gpt-oss-20b/120b
Model Weights, Apache-2.0
Google
Gemma 1-3
Model Weights, Gemma Terms of Use
xAI
Grok 1-2
Model Weights, Architecture, Apache-2.0
Meta
Llama 1-4
Model Weights, Llama Community License
Microsoft
Phi 3/3.5/4
Model Weights, MIT License
Apple
OpenELM
Model Weights, Training Code, Apple License
NVIDIA
Nemotron/Minitron
Architecture, Training Code, Training Process, Model Weights, NVIDIA Open Model License
Alibaba
Qwen 2/2.5/3
Model Weights, Apache-2.0
Open source communities successfully pressured Big Tech to converge on open-weight releases, fundamentally shifting the AI landscape from closed APIs to permissionless innovation ecosystems.
The Open Ecosystem
Core Open Tools & Frameworks by Stage
Distribution & Training
Local Inference
Production Inference
Application Dev
Who Released the Most Open Models?
China's Leading Open Models
DeepSeek-R1/V3
MIT License
Reasoning models with downloadable weights plus distilled variants
Qwen3
Apache-2.0
Alibaba's permissive foundation suite (text + coder + VL)
Kimi K2
Open-Weight
Moonshot's trillion-param MoE on Hugging Face
GLM-4.5
MIT License
Zhipu's agentic/coding focus with thinking modes
Export controls drove China's pivot to open source AI, enabling global reach
Multilingual AI
Open Source Democratizes Language Technology
Community-driven collaboration enables developers worldwide to freely access, modify, and contribute to models supporting underrepresented languages through shared datasets and fine-tuning pipelines.
Adaptation Techniques
Vocabulary Expansion
Adding language-specific tokens to base models
Continual Pre-training
Training on language-specific corpora
Instruction Fine-tuning
Task-specific adaptation with cultural context
LoRA Adaptation
Low-rank efficient fine-tuning on consumer hardware
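Why LoRA fits on consumer hardware comes down to simple arithmetic: instead of updating a full d×k weight matrix, LoRA trains two low-rank factors B (d×r) and A (r×k), shrinking the trainable-parameter count from d·k to r·(d+k). A back-of-the-envelope sketch (the dimensions below are hypothetical, picked to resemble one attention matrix in a mid-size model):

```python
# LoRA sketch: W_effective = W + B @ A, with only B and A trainable.
# d, k, r are hypothetical dimensions chosen for illustration.

d, k, r = 4096, 4096, 8      # weight matrix dims, LoRA rank

full_params = d * k          # fine-tuning the full matrix
lora_params = r * (d + k)    # fine-tuning only the low-rank factors

print(full_params)           # 16,777,216 trainable parameters
print(lora_params)           # 65,536 trainable parameters
print(round(100 * lora_params / full_params, 2))  # 0.39 (% of full count)
```

At rank 8, the trainable parameters for this one matrix drop to under half a percent of full fine-tuning, which is what lets community contributors adapt open-weight base models to underrepresented languages on a single consumer GPU.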
Open source prevents digital extinction by enabling community-led development that closes performance gaps by 40-50% for low-resource languages, ensuring linguistic diversity thrives in an AI-driven world.
Let's Connect
Mohammad Shojaei
Applied AI Engineer