Portfolio
Mohammad Shojaei
Machine Learning Engineer – Large Language Models & NLP
📧 shojaei.dev@gmail.com | LinkedIn | GitHub | Hugging Face
Summary
Results-driven Machine Learning Engineer with over two years of experience designing, fine-tuning, and deploying LLM-powered products across finance, real estate, and healthcare. Specialized in Retrieval-Augmented Generation (RAG), parameter-efficient fine-tuning (LoRA/QLoRA), and inference optimization. Track record of reducing infrastructure costs by 35%, achieving sub-250ms p95 latency, and increasing user engagement by 40%.
Core Competencies
- LLM Development: GPT-style Models (Gemma, Phi, Llama), Instruction & Alignment Tuning (SFT, RLHF), PEFT (LoRA, QLoRA)
- RAG & Vector Search: Qdrant, Milvus, Approximate Nearest Neighbor (ANN) Search, Re-ranking, Custom Chunking
- Inference Optimization: Quantization (AWQ, GGUF), KV-Cache, Speculative Decoding, Throughput Tuning
- MLOps & Deployment: Docker, FastAPI, Uvicorn, Hugging Face Hub, LangServe
- General: Python, SQL, AWS, Runpod, Git, Agile Methodologies
Professional Experience
No Limits Market Ltd., Remote – Dubai, UAE | LLM Engineer, January 2025 – Present
- Architected a RAG-based financial research platform serving over 50 analysts, reducing research time by 45%.
- Deployed AWQ-quantized models with optimized batching, achieving a 35% cost reduction and sub-250ms p95 latency.
- Automated report generation using JSON-schema-constrained LLM outputs, increasing analyst throughput by 60%.
- Implemented a CI/CD pipeline with automated evaluation metrics (perplexity, factual consistency), maintaining a 95%+ pass rate.
Owlio, Remote – Turin, Italy | AI Engineer, September 2024 – January 2025
- Designed an educational AI agent using RAG and fine-tuned LLMs to convert passive lecture content into interactive learning experiences.
- Engineered custom chunking strategies for video transcript processing, optimizing retrieval accuracy for educational content.
- Implemented a token-based caching system that reduced GPT API costs by 40% while maintaining response quality.
- Built automated evaluation pipelines to measure RAG performance, content accuracy, and student engagement metrics.
Real-Estate Startup, Remote – Moscow, Russia | Generative AI Engineer (Contract), May 2025 – Present
- Fine-tuned a multilingual LLM with LoRA on domain-specific datasets, achieving 90% accuracy across more than 50,000 monthly queries.
- Integrated Speech-to-Text/Text-to-Speech microservices via FastAPI to enable 24/7 automated client interaction.
- Developed a comprehensive evaluation framework measuring coherence, relevance, and safety, sustaining scores above 85% on all KPIs.
- Reduced the safety-incident rate to below 1% through systematic prompt engineering and content filtering.
Alpha Neuroscience Co., Remote – Tehran, Iran | Machine Learning Engineer, May 2023 – April 2025
- Developed an EEG artifact detection system using 1D-CNN and Transformer architectures, achieving a 70% improvement in accuracy.
- Built a PyQt5 desktop application and deployed it to over 200 medical technicians, reducing per-patient review time by 15 minutes.
- Implemented a RAG-powered clinical assistant that decreased clinician query response time by 50% and improved diagnostic precision by 18 percentage points.
- Collaborated with medical professionals to validate model outputs and ensure clinical compliance.
Kerman Motor, On-site – Bam, Kerman Province, Iran | Generative AI Consultant (Seasonal), December 2023 – January 2024
- Conducted technical workshops on implementing LLMs in industrial settings.
- Mentored development teams on RAG implementation and prompt engineering best practices.
- Developed proof-of-concept applications showcasing practical industrial AI use cases.
Previous Role: Freelance AI & Generative AI Consultant (October 2023 – January 2025)
Open-Source Contributions
- Hugging Face Impact: Published 12 Persian LLMs and 21 datasets, accumulating over 100,000 downloads.
- ReActMCP (140 ⭐): Developed a reactive agent framework enabling real-time web search capabilities.
- Ollama-Desktop & Ollama-RAG (60 ⭐): Created a local GUI and FAISS-powered RAG toolkit for offline LLM deployment.
- Community Leadership: Delivered technical workshops on LLM implementation and was recognized as a Community Leader.
Education
Bachelor of Science, Computer Engineering — University of Bam, May 2024
Professional Certifications
- CS224N: Natural Language Processing with Deep Learning – Stanford University, 2024
- LangChain for LLM Application Development – DeepLearning.AI, 2024
- Machine Learning Specialization – DeepLearning.AI, 2024
- CS50P: Introduction to Programming with Python – Harvard University, 2023
Technical Expertise
- Languages & Frameworks: Python, PyTorch, Transformers, JAX, PEFT, bitsandbytes, FlashAttention, FastAPI, LangChain, LangGraph
- Data & ML Infrastructure: Vector Search (FAISS, Milvus, Qdrant), RAG, KV-Cache, Quantization (GPTQ, AWQ)
- MLOps & Deployment: Docker, GitHub Actions, CI/CD, Monitoring, AWS, GCP, LLMOps
- AI Techniques: RLHF, Prompt Engineering, Agent Development, Model Fine-tuning, Inference Optimization