Resume
Mohammad Shojaei
Machine Learning Engineer | Large Language Models & NLP
📧 shojaei.dev@gmail.com | LinkedIn | GitHub | Hugging Face
Summary
Mid-level ML Engineer with 2+ years of hands-on experience designing, fine-tuning, and shipping LLM-powered products in finance, real estate, and healthcare. Deep expertise in Retrieval-Augmented Generation (RAG), parameter-efficient fine-tuning (LoRA / QLoRA), and inference optimisation (quantisation, KV caching, speculative decoding). Track record of cutting cloud spend by 35%+, delivering p95 latency under 250 ms, and lifting user engagement by 40%+.
Core Competencies
- LLM Development: GPT-style models (Gemma, Phi, Llama), instruction tuning & alignment (SFT, RLHF), PEFT (LoRA, QLoRA)
- RAG & Vector Search: Qdrant, Milvus, ANN search, re-ranking, custom chunking, retrieval evaluation (Recall@k)
- Inference Optimisation: quantisation (AWQ, GGUF), KV caching, speculative decoding, batching, throughput tuning
- MLOps & Deployment: Docker, FastAPI, Uvicorn, HF Hub, LangServe
- Programming & Cloud: Python, SQL, AWS, Runpod, Git
Professional Experience
LLM Engineer · No Limits Market Ltd · Remote (Dubai, UAE)
Jan 2025 – Present
- Built a RAG-based financial research platform, cutting analyst research time by 45%
- Deployed AWQ-quantised models with optimised batching, reducing infrastructure cost by 35% and p95 latency to under 250 ms
- Automated report generation using JSON-schema-constrained LLM outputs, boosting analyst throughput by 60%
- Set up CI/CD with automated evaluation (perplexity, factual consistency), maintaining a 95%+ pass rate
Gen AI Engineer (Contract) · Real-Estate Startup · Remote (Moscow, RU)
May 2025 – Present
- Fine-tuned a multilingual LLM with LoRA on domain data; the resulting chatbot handles 50k+ monthly queries at 90% accuracy
- Integrated STT/TTS microservices via FastAPI, enabling automated client calls and 24/7 lead generation
- Delivered a comprehensive evaluation suite covering coherence, relevance, and safety (under 1% safety violations); achieved over 85% across KPIs
ML Engineer · Alpha Neuroscience Co. · Remote (Tehran, IR)
May 2023 – Apr 2025
- Achieved 70% EEG artifact-detection accuracy with hybrid 1D-CNN/Transformer models
- Shipped a PyQt5 desktop app to 200+ technicians, cutting per-patient review time by 15 min
- Rolled out a RAG assistant that reduced clinician query time by 50% and improved diagnostic precision by 18 percentage points
AI Engineer · Owlio · Remote (Turin, IT)
Sep 2024 – Jan 2025
- Designed an educational AI agent using RAG and fine-tuned LLMs to convert lecture content into interactive learning materials
- Developed custom chunking strategies for video transcripts, optimising retrieval accuracy and content relevance
- Implemented a token-based caching system, reducing GPT API costs by 40% without degrading response quality
- Built evaluation pipelines measuring RAG performance, content accuracy, and student engagement metrics
Previous engagements: Freelance AI & Gen-AI Consultant (Oct 2023 – Jan 2025)
Open-Source & Community
- Hugging Face: 12 Persian LLMs & 21 datasets (100k+ downloads)
- ReActMCP (140 ⭐): Reactive agent framework for real-time web search
- Ollama-Desktop & Ollama-RAG (60 ⭐): local desktop GUI and FAISS-powered RAG toolkit
- Speaker & mentor at LLM workshops; Hugging Face Community Leader
Education
B.Sc. Computer Engineering, University of Bam, May 2024
Certifications
- CS224N: NLP with Deep Learning, Stanford University (2024)
- LangChain for LLM Application Development, DeepLearning.AI (2024)
- Machine Learning Specialization, DeepLearning.AI (2024)
- Harvard CS50P: Programming with Python (2023)
Technical Stack (ATS Keywords)
Python, PyTorch, Transformers, JAX, PEFT, bitsandbytes, FlashAttention, FastAPI, LangChain, LangGraph, Vector Databases, FAISS, Milvus, RAG, KV-Cache, Quantisation, LoRA, QLoRA, GPTQ, AWQ, RLHF, Prompt Engineering, Agents, CI/CD, Docker, GitHub Actions, Monitoring, SQL, AWS, GCP, MLOps, LLMOps