Resume

Mohammad Shojaei

Machine Learning Engineer – Large Language Models & NLP
📧 shojaei.dev@gmail.com | LinkedIn | GitHub | Hugging Face

Summary

Mid-level ML Engineer with 2+ years of hands-on experience designing, fine-tuning, and shipping LLM-powered products in finance, real estate, and healthcare. Deep knowledge of Retrieval-Augmented Generation (RAG), parameter-efficient tuning (LoRA / QLoRA), and inference optimisation (quantisation, KV-cache, speculative decoding). Proven record of cutting cloud spend 35 %+, delivering p95 latency < 250 ms, and lifting user engagement 40 %+.

Core Competencies

LLM Development: GPT-style models (Gemma, Phi, Llama), Instruction Tuning & Alignment (SFT, RLHF), PEFT (LoRA, QLoRA)
RAG & Vector Search: Qdrant, Milvus, ANN search, Re-ranking, Custom chunking, Retrieval eval (Recall@k)
Inference Optimisation: Quantisation (AWQ, GGUF), KV-Cache, Speculative decoding, Batching, Throughput tuning
MLOps & Deployment: Docker, FastAPI, Uvicorn, HF Hub, LangServe
Programming & Cloud: Python, SQL, AWS, Runpod, Git

Professional Experience

LLM Engineer · No Limits Market Ltd · Remote – Dubai, UAE

Jan 2025 – Present

Built RAG-based financial research platform; slashed analyst research time 45 %
Deployed AWQ-quantised models & optimised batching; reduced infra cost 35 % and p95 latency to < 250 ms
Automated JSON-schema report generation via structured LLM outputs, boosting analyst throughput 60 %
Instrumented CI/CD and automated evaluation (perplexity, factual consistency), ensuring 95 %+ pass rate

Gen AI Engineer (Contract) · Real-Estate Startup · Remote – Moscow, RU

May 2025 – Present

Fine-tuned multilingual LLM with LoRA & domain data; chatbot handles 50 k+ monthly queries at 90 % accuracy
Integrated STT/TTS micro-services through FastAPI, enabling automated client calls and 24/7 lead generation
Delivered comprehensive eval suite (coherence, relevance, safety-violation rate < 1 %); achieved > 85 % across KPIs

ML Engineer · Alpha Neuroscience Co. · Remote – Tehran, IR

May 2023 – Apr 2025

Achieved 70 % EEG artifact-detection accuracy using 1D-CNN & Transformer hybrids
Shipped PyQt5 desktop app to 200+ technicians; cut per-patient review time by 15 min
Rolled out RAG assistant that reduced clinician query time 50 % and improved diagnostic precision 18 pp

AI Engineer · Owlio · Remote – Turin, IT

Sep 2024 – Jan 2025

Designed educational AI agent using RAG & fine-tuned LLMs; converted lecture content into interactive learning materials
Developed custom chunking strategies for video transcripts, optimising retrieval accuracy and content relevance
Implemented token-based caching system, reducing GPT API costs 40 % while maintaining response quality
Built evaluation pipelines measuring RAG performance, content accuracy, and student engagement metrics

Previous engagements: Freelance AI & Gen-AI Consultant (Oct 2023 – Jan 2025)

Open-Source & Community

Hugging Face: 12 Persian LLMs & 21 datasets (100 k+ downloads)
ReActMCP (140 ⭐): Reactive agent framework for real-time web search
Ollama-Desktop & Ollama-RAG (60 ⭐): Local GUI & FAISS-powered RAG toolkit
Speaker & mentor at LLM workshops; Hugging Face Community Leader

Education

B.Sc. Computer Engineering — University of Bam, May 2024

Certifications

CS224N: NLP with Deep Learning — Stanford University (2024)
LangChain for LLM Application Development — DeepLearning.AI (2024)
Machine Learning Specialisation — DeepLearning.AI (2024)
Harvard CS50P — Programming with Python (2023)

Technical Stack (ATS Keywords)

Python, PyTorch, Transformers, JAX, PEFT, bitsandbytes, FlashAttention, FastAPI, LangChain, LangGraph, Vector Databases, FAISS, Milvus, RAG, KV-Cache, Quantisation, LoRA, QLoRA, GPTQ, AWQ, RLHF, Prompt Engineering, Agents, CI/CD, Docker, GitHub Actions, Monitoring, SQL, AWS, GCP, MLOps, LLMOps