Resume

Mohammad Shojaei

Machine Learning Engineer – Large Language Models & NLP
📧 shojaei.dev@gmail.com | LinkedIn | GitHub | Hugging Face


Summary

Mid-level ML Engineer with 2+ years of hands-on experience designing, fine-tuning, and shipping LLM-powered products in finance, real estate, and healthcare. Deep knowledge of Retrieval-Augmented Generation (RAG), parameter-efficient tuning (LoRA / QLoRA), and inference optimisation (quantisation, KV-cache, speculative decoding). Proven to:

  • Cut cloud spend 35 %+, deliver p95 latency < 250 ms, and lift user engagement 40 %+

Core Competencies

  • LLM Development: GPT-style models (Gemma, Phi, Llama), Instruction & Alignment (SFT, RLHF), PEFT (LoRA, QLoRA)
  • RAG & Vector Search: Qdrant, Milvus, ANN search, Re-ranking, Custom chunking, Retrieval eval (Recall@k)
  • Inference Optimisation: Quantisation (AWQ, GGUF), KV-Cache, Speculative decoding, Batching, Throughput tuning
  • MLOps & Deployment: Docker, FastAPI, Uvicorn, HF Hub, LangServe
  • Programming & Cloud: Python, SQL, AWS, Runpod, Git

Professional Experience

LLM Engineer · No Limits Market Ltd · Remote – Dubai, UAE

Jan 2025 – Present

  • Built RAG-based financial research platform; slashed analyst research time 45 %
  • Deployed AWQ-quantised models & optimised batching; reduced infra cost 35 % and p95 latency to < 250 ms
  • Automated JSON-schema report generation via structured LLM outputs, boosting analyst throughput 60 %
  • Instrumented CI/CD and automated evaluation (perplexity, factual consistency), ensuring 95 %+ pass rate

Gen AI Engineer (Contract) · Real-Estate Startup · Remote – Moscow, RU

May 2025 – Present

  • Fine-tuned multilingual LLM with LoRA & domain data; chatbot handles 50 k+ monthly queries at 90 % accuracy
  • Integrated STT/TTS micro-services through FastAPI, enabling automated client calls and 24/7 lead generation
  • Delivered comprehensive eval suite (coherence, relevance, safety < 1 %); achieved > 85 % across KPIs

ML Engineer · Alpha Neuroscience Co. · Remote – Tehran, IR

May 2023 – Apr 2025

  • Achieved 70 % EEG artifact-detection accuracy using 1D-CNN & Transformer hybrids
  • Shipped PyQt5 desktop app to 200+ technicians; cut per-patient review time by 15 min
  • Rolled out RAG assistant that reduced clinician query time 50 % and improved diagnostic precision 18 pp

AI Engineer · Owlio · Remote – Turin, IT

Sep 2024 – Jan 2025

  • Designed educational AI agent using RAG & fine-tuned LLMs; converted lecture content into interactive learning materials
  • Developed custom chunking strategies for video transcripts, optimising retrieval accuracy and content relevance
  • Implemented token-based caching system, reducing GPT API costs 40 % while maintaining response quality
  • Built evaluation pipelines measuring RAG performance, content accuracy, and student engagement metrics

Previous engagements: Freelance AI & Gen-AI Consultant (Oct 2023 – Jan 2025)


Open-Source & Community

  • Hugging Face: 12 Persian LLMs & 21 datasets (100 k+ downloads)
  • ReActMCP (140 ⭐): Reactive agent framework for real-time web search
  • Ollama-Desktop & Ollama-RAG (60 ⭐): Local GUI & FAISS-powered RAG toolkit
  • Speaker & mentor at LLM workshops; Hugging Face Community Leader

Education

B.Sc. Computer Engineering – University of Bam, May 2024


Certifications

  • CS224N: NLP with Deep Learning – Stanford University (2024)
  • LangChain for LLM Application Development – DeepLearning.AI (2024)
  • Machine Learning Specialisation – DeepLearning.AI (2024)
  • Harvard CS50P – Programming with Python (2023)

Technical Stack (ATS Keywords)

Python, PyTorch, Transformers, JAX, PEFT, bitsandbytes, FlashAttention, FastAPI, LangChain, LangGraph, Vector Databases, FAISS, Milvus, RAG, KV-Cache, Quantisation, LoRA, QLoRA, GPTQ, AWQ, RLHF, Prompt Engineering, Agents, CI/CD, Docker, GitHub Actions, Monitoring, SQL, AWS, GCP, MLOps, LLMOps