Jonathan Muhire
Software R&D + ML Research

I build ML systems, data pipelines, and research tooling.

I translate research ideas into scalable software with deep learning, multimodal modeling, and distributed systems.

Graduate researcher. Former co-founder at Neotix.

  • Multimodal modeling and representation learning across vision and language.
  • Deep learning pipelines for data curation, training, and evaluation.
  • Distributed systems for large-scale datasets, versioning, and reproducibility.
Open to: Research Engineer / ML Systems
Focus: ML research + distributed systems
GSoC 2025 · Open-source ML projects · Research notes · Data infra demos
Highlights
  • Built multimodal data pipelines for large-scale ML experimentation.
  • Shipped research tooling for dataset curation, labeling, and analysis.
  • Versioned datasets and training artifacts with scalable storage.
Core Stack
Python · PyTorch · Deep Learning · Distributed Systems · Data Infra

How I think about ML research, data, and infrastructure.

Three system views that reflect how I scope research problems, build data pipelines, and ship models.

Multimodal ML Pipeline
From data intake to evaluation, with feedback loops for continuous learning.
Diagram: Vision · Language · Ingest · Curation · Training · Eval · Deploy (feedback loop)
Distributed Data Platform
Storage, versioning, and compute aligned for reproducible experiments.
Diagram: Sources · Object Store · Versioning · Compute · Registry · Metrics (lineage)
Experiment Lifecycle
A research loop optimized for iteration speed and insight capture.
Diagram: Hypothesis · Baseline · Ablations · Insights · Results · Publish

Full project timeline

Recent ML research highlights are above. This is the complete catalog across research, systems, and software.

Research · Document AI

RenAIssance Document Analysis

End-to-end deep learning pipeline for layout understanding, OCR, and structured digitization of historical text.

PyTorch · LayoutLMv3 · Doc AI
Applied ML · NLP

ISSR — Crisis Detection

Social signal analysis for mental health crisis detection with sentiment modeling and geospatial insights.

NLP · Social Data · Geo Analytics
Research · Computer Vision

ArtExtract — Art Analysis AI

CNN-RNN model for artwork classification, style detection, and similarity search across art history.

PyTorch · CV · Representation
Additional projects (embodied AI, full-stack, mobile, games)

2025 - ML Research & Systems

PyMyCobot Teleoperation

Developed remote teleoperation system for bimanual dexterous manipulation and data collection. Implemented real-time control protocols for imitation learning tasks.

Python · ROS
UMI Multi-Sensor Data Extraction

Engineered a data extraction pipeline using ORB-SLAM3 for trajectory extraction and real-time 7-DOF trajectory analysis.

ORB-SLAM3 · Rerun
UMI Data Annotation Pipeline

Developed annotation tools for robotic manipulation datasets using embodied Chain-of-Thought methods for labeling gripper poses and keyframes.

PyTorch · Computer Vision
MinIO + LakeFS Data Infrastructure

Architected scalable data storage and versioning system for robotics datasets with Git-like version control for large sensor data.

MinIO · LakeFS
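
The "Git-like version control" idea can be illustrated with a small content-addressed manifest. This is a minimal sketch using only the Python standard library; it is not the LakeFS or MinIO API, and the names `snapshot` and `diff` are hypothetical helpers introduced for illustration.

```python
import hashlib
import json


def snapshot(files):
    """Build a manifest (path -> content hash) and a commit id for it.

    `files` maps an object path to its raw bytes. The commit id is the
    hash of the sorted manifest, so identical contents always yield the
    same id regardless of insertion order.
    """
    manifest = {
        path: hashlib.sha256(data).hexdigest()
        for path, data in sorted(files.items())
    }
    encoded = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest(), manifest


def diff(old, new):
    """Compare two manifests and report added, removed, and changed paths."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(p for p in shared if old[p] != new[p]),
    }


# Hypothetical sensor files, used only to exercise the helpers.
v1 = {"run1/cam.bin": b"frame-a", "run1/imu.bin": b"imu-a"}
v2 = {"run1/cam.bin": b"frame-a", "run1/imu.bin": b"imu-b", "run2/cam.bin": b"frame-c"}

id1, m1 = snapshot(v1)
id2, m2 = snapshot(v2)
changes = diff(m1, m2)
```

Because only hashes are compared, an unchanged large file costs nothing to carry into a new version, which is the property that makes this style of versioning practical for sensor data.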
LeRobot ISO-101 Platform

Built end-to-end robotic learning platform with data collection pipelines and policy training infrastructure for rapid prototyping.

LeRobot · ROS2
RenAIssance Document Analysis

Google Summer of Code 2025 project. Built AI pipeline for digitizing Renaissance-era texts using deep learning for layout recognition and OCR.

PyTorch · LayoutLMv3 · GSoC
ISSR - Mental Health Crisis Detection

GSoC candidate project. AI-powered system for suicide prevention through social media analysis, sentiment detection, and geospatial crisis mapping.

NLP · Social Media APIs · Mental Health AI
ArtExtract - Art Analysis AI

Deep learning system combining CNN-RNN architectures for artwork classification, style detection, and similarity analysis across art history.

CNN-RNN · PyTorch · Art Analysis
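
Similarity analysis over learned embeddings usually comes down to ranking by a distance measure. The sketch below assumes cosine similarity and tiny hand-written vectors in place of real model outputs; the function names and the gallery entries are hypothetical, and the project itself may rank artworks differently.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, gallery, k=2):
    """Return the names of the k gallery items most similar to the query."""
    ranked = sorted(
        gallery.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Placeholder embeddings; a real system would take these from the model.
gallery = {
    "portrait_a": [1.0, 0.0, 0.0],
    "portrait_b": [0.9, 0.1, 0.0],
    "landscape": [0.0, 1.0, 0.0],
}
matches = top_k([1.0, 0.0, 0.0], gallery, k=2)
```

A linear scan like this is fine for small collections; larger galleries would typically move to an approximate nearest-neighbor index.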

2024 - Full-Stack Development

NutrAI Health Assistant

AI-powered health and nutrition recommendation system. Built full-stack application with intelligent meal planning and dietary analysis.

JavaScript · Python · React
CampusBuddy Mobile App

Flutter-based student application for discovering campus events, dining options, and university resources. Designed to enhance student life experience.

Flutter · Dart · Mobile
Point of Sales Terminal

Comprehensive Java-based POS system with inventory management, multiple payment methods, and sales reporting. Built with modular architecture for retail environments.

Java · OOP · Retail Systems

2023 - Game Development

Space Invader Game

Modified version of the classic arcade game with enhanced gameplay mechanics and modern graphics. Implemented in C++ with custom game engine features.

C++ · Game Dev
Pacman AI

AI-driven Pacman implementation featuring autonomous agents, pathfinding algorithms, and intelligent ghost behavior. Demonstrates search algorithms and game AI principles.

Python · Pygame · AI Algorithms
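
As an illustration of the kind of search algorithm such a project relies on, here is a minimal breadth-first search over a grid maze. It is a sketch under stated assumptions, not the project's code: the maze encoding (`#` for walls), the 4-connected movement, and the name `bfs_path` are all chosen for this example.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Return the shortest list of (row, col) cells from start to goal.

    Movement is 4-connected and cells marked '#' are walls.
    Returns None when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None


maze = [
    "....",
    ".##.",
    "....",
]
path = bfs_path(maze, (0, 0), (2, 3))
```

Breadth-first search guarantees the fewest steps on an unweighted grid; a ghost that should prefer some tiles over others would need a weighted search such as Dijkstra or A*.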

Background and research focus

A quick overview of my path and what I am working on now.

About

Learn more about my background, experience, and approach to building intelligent systems.

Research notes and systems breakdowns

Short technical writeups on robotics, ML systems, and multimodal research.

Open to ML research and systems roles.

Looking for teams building large-scale ML systems, deep learning pipelines, or research tooling. Happy to chat about collaboration or roles.

Fastest ways to reach me

Email is best for opportunities; LinkedIn for introductions.