Portfolio

Mohammad Shojaei

Machine Learning Engineer – Large Language Models & NLP

📧 shojaei.dev@gmail.com | LinkedIn | GitHub | Hugging Face


Summary

Results-driven Machine Learning Engineer with over two years of experience designing, fine-tuning, and deploying LLM-powered products across finance, real estate, and healthcare. Specialized in Retrieval-Augmented Generation (RAG), parameter-efficient fine-tuning (LoRA/QLoRA), and inference optimization. Proven ability to reduce infrastructure costs by 35%, achieve sub-250ms p95 latency, and increase user engagement by 40%.


Core Competencies

  • LLM Development: GPT-style Models (Gemma, Phi, Llama), Instruction & Alignment Tuning (SFT, RLHF), PEFT (LoRA, QLoRA)
  • RAG & Vector Search: Qdrant, Milvus, Approximate Nearest Neighbor (ANN) Search, Re-ranking, Custom Chunking
  • Inference Optimization: Quantization (AWQ, GGUF), KV-Cache, Speculative Decoding, Throughput Tuning
  • MLOps & Deployment: Docker, FastAPI, Uvicorn, Hugging Face Hub, LangServe
  • General: Python, SQL, AWS, Runpod, Git, Agile Methodologies

Professional Experience

No Limits Market Ltd., Remote – Dubai, UAE | LLM Engineer, January 2025 – Present

  • Architected a RAG-based financial research platform serving over 50 analysts, reducing research time by 45%.
  • Deployed AWQ-quantized models with optimized batching, achieving a 35% cost reduction and sub-250ms p95 latency.
  • Automated JSON-schema report generation via structured LLM outputs, increasing analyst throughput by 60%.
  • Implemented a CI/CD pipeline with automated evaluation metrics (perplexity, factual consistency), maintaining a 95%+ pass rate.

Owlio, Remote – Turin, IT | AI Engineer, September 2024 – January 2025

  • Designed an educational AI agent using RAG and fine-tuned LLMs to convert passive lecture content into interactive learning experiences.
  • Engineered custom chunking strategies for video transcript processing, optimizing retrieval accuracy for educational content.
  • Implemented a token-based caching system that reduced GPT API costs by 40% while maintaining response quality.
  • Built automated evaluation pipelines to measure RAG performance, content accuracy, and student engagement metrics.

Real-Estate Startup, Remote – Moscow, RU | Generative AI Engineer (Contract), May 2025 – Present

  • Fine-tuned a multilingual LLM using LoRA with domain-specific datasets, achieving 90% accuracy in handling over 50,000 monthly queries.
  • Integrated Speech-to-Text/Text-to-Speech microservices via FastAPI to enable 24/7 automated client interaction.
  • Developed a comprehensive evaluation framework to measure coherence, relevance, and safety, maintaining over 85% performance across all KPIs.
  • Reduced safety incidents to less than 1% through systematic prompt engineering and content filtering.

Alpha Neuroscience Co., Remote – Tehran, IR | Machine Learning Engineer, May 2023 – April 2025

  • Developed an EEG artifact detection system using 1D-CNN and Transformer architectures, achieving a 70% improvement in accuracy.
  • Built and deployed a PyQt5 desktop application to over 200 medical technicians, reducing per-patient review time by 15 minutes.
  • Implemented a RAG-powered clinical assistant that decreased clinician query response time by 50% and improved diagnostic precision by 18 percentage points.
  • Collaborated with medical professionals to validate model outputs and ensure clinical compliance.

Kerman Motor, Bam, Kerman Province, Iran – On-site | Generative AI Consultant (Seasonal), December 2023 – January 2024

  • Conducted technical workshops on implementing LLMs in industrial settings.
  • Mentored development teams on RAG implementation and prompt engineering best practices.
  • Developed proof-of-concept applications demonstrating practical uses of AI.

Previous Role: Freelance AI & Generative AI Consultant (October 2023 – January 2025)


Open-Source Contributions

  • Hugging Face Impact: Published 12 Persian LLMs and 21 datasets, accumulating over 100,000 downloads.
  • ReActMCP (140 ⭐): Developed a reactive agent framework enabling real-time web search capabilities.
  • Ollama-Desktop & Ollama-RAG (60 ⭐): Created a local GUI and FAISS-powered RAG toolkit for offline LLM deployment.
  • Community Leadership: Delivered technical workshops on LLM implementation and was recognized as a Community Leader.

Education

Bachelor of Science, Computer Engineering — University of Bam, May 2024


Professional Certifications

  • CS224N: Natural Language Processing with Deep Learning – Stanford University, 2024
  • LangChain for LLM Application Development – DeepLearning.AI, 2024
  • Machine Learning Specialization – DeepLearning.AI, 2024
  • Harvard CS50P: Introduction to Programming with Python – Harvard University, 2023

Technical Expertise

  • Languages & Frameworks: Python, PyTorch, Transformers, JAX, PEFT, bitsandbytes, FlashAttention, FastAPI, LangChain, LangGraph
  • Data & ML Infrastructure: Vector Databases (FAISS, Milvus, Qdrant), RAG, KV-Cache, Quantization (GPTQ, AWQ)
  • MLOps & Deployment: Docker, GitHub Actions, CI/CD, Monitoring, AWS, GCP, LLMOps
  • AI Techniques: RLHF, Prompt Engineering, Agent Development, Model Fine-tuning, Inference Optimization


Copyright © 2025 Mohammad Shojaei. All rights reserved. You may copy and distribute this work, but please note that it may contain other authors' works which must be properly cited. Any redistribution must maintain appropriate attributions and citations.