devYRPauli/README.md

Website LinkedIn X Email Resume


whoami

class YashRajPandey:
    role        = "AI Agents Architect @ UF IFAS"
    based_in    = "Gainesville, Florida"
    builds      = ["AI infrastructure", "full-stack products",
                   "developer tools", "native + CLI apps"]
    languages   = ["Python", "TypeScript", "Swift", "Java", "C/C++"]
    open_source = "35+ merged PRs across 25+ projects, 4 ecosystems"
    philosophy  = "Find the real problem. Ship the simplest thing that works. Measure. Iterate."

I build AI infrastructure and production software. I'm the AI Agents Architect at the University of Florida's Institute of Food and Agricultural Sciences, where I lead a function I proposed myself: self-hosted, air-gapped AI systems built on open-weight models.

The work is end to end: open-weight models served on-prem, retrieval that pairs dense vector search with reranking, agents that call real tools, and evaluation gates that decide what ships. No data leaves the building, and the system holds up under real users, not just in a demo.

That is the day job, not the whole picture. I ship full-stack products end to end, contribute fixes upstream to the inference engines and ML frameworks I run, and build native macOS and CLI tools in Swift. The throughline is range: in a given week I might tune a Postgres query plan, debug a Metal quantization kernel, and wire up a TypeScript CRDT.

I joined UF as a Software Engineer in March 2025, was promoted to Lead Software Engineer in October 2025, and moved into the AI Agents Architect role in April 2026.

A year in

  • Promoted twice in ~13 months - Software Engineer to Lead Software Engineer to AI Agents Architect
  • 35+ merged open-source pull requests across 25+ projects, spanning Apple's MLX, ggml / llama.cpp (C/C++), Python ML frameworks, and the TypeScript / Node ecosystem
  • A full-stack platform I built solo grew to 5M+ live records and became the primary system for 30+ researchers
  • Shipped software people actually use - a Chrome Web Store extension, a published npm package, and several deployed web apps
  • Herbert Wertheim College of Engineering Achievement Award, alongside an M.S. in CISE (GPA 3.8)

What I build

AI infrastructure and local LLMs

TurboQuant on Apple Silicon - independent evaluation of TurboQuant (arXiv 2504.19874), a near-optimal LLM quantization method, ported to run locally on Apple Silicon. Fixed five blocking bugs, then benchmarked long-context retrieval across an MLX path and a llama.cpp Metal path.

Needle-in-a-haystack retrieval: 0% to 100% at 16K tokens, with a large KV cache memory reduction, all on a consumer M1 Pro with no dedicated GPU.
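
For readers unfamiliar with the test, a needle-in-a-haystack check can be sketched roughly as follows. This is an illustrative outline, not the harness from the repo: `build_haystack` and `retrieval_accuracy` are hypothetical names, and the model call that would sit between them is left out.

```python
from typing import List


def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Repeat a filler sentence and place the needle at a relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    sentences = [filler] * n_sentences
    # depth=0.0 puts the needle first, depth=1.0 puts it last.
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieval_accuracy(answers: List[str], expected: str) -> float:
    """Fraction of model answers that contain the expected fact (case-insensitive)."""
    if not answers:
        return 0.0
    hits = sum(expected.lower() in answer.lower() for answer in answers)
    return hits / len(answers)
```

In a real run, each haystack would be sent to the model with a question about the needle, and the collected answers scored with `retrieval_accuracy` at each context length and depth.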

Python MLX llama.cpp Quantization  Repo ->

Looma - a local-first command-line tool that turns Claude Code, Codex, and Cursor history into resumable project context: active work, decisions, blockers, the commits and files in flight, and the next likely step. No cloud, no API keys, zero third-party dependencies.

  • Reconstructs structured WorkItems (features, bugfixes, refactors, migrations) from raw agent transcripts, instead of keyword-searching logs
  • Emits token-budgeted context packs so one agent can hand off to another without replaying the whole history
  • Honest by design: every reconstruction carries a confidence score and shows alternatives instead of guessing
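
One way to picture a token-budgeted context pack is a greedy selection by priority. The sketch below is an assumption about the general technique, not Looma's actual code: `ContextItem`, `estimate_tokens`, and `pack_context` are hypothetical names, and the four-characters-per-token estimate is a rough stand-in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    text: str
    priority: int  # higher means more important to keep


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(items: List[ContextItem], budget: int) -> List[ContextItem]:
    """Greedily keep the highest-priority items that still fit in the token budget."""
    packed: List[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            packed.append(item)
            used += cost
    return packed
```

Greedy selection is not optimal in the knapsack sense, but it is predictable: the receiving agent always gets the most important items first.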

Python SQLite  Repo ->

mddocs - a git-native, self-hostable editor for Markdown: real-time multiplayer, inline comments, and accept/reject suggestions, plus a first-class HTTP API so AI agents read, edit, and review documents the same way people do. Published on npm.

  • Local-first and git-native: every change is a commit, no central database to run
  • Real-time collaboration backed by a CRDT (Yjs) document model that merges concurrent edits without conflicts
  • Agent-facing HTTP API with per-agent tokens and rate limits, so automated writers are first-class collaborators, not bolt-ons
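
Per-agent rate limiting of this kind is commonly built on a token bucket keyed by the agent's API token. The following is a minimal sketch under that assumption, written in Python for illustration even though mddocs itself is TypeScript. `TokenBucket` and `AgentLimiter` are hypothetical names and do not come from the mddocs codebase.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class AgentLimiter:
    """Keeps one bucket per agent token so agents cannot starve each other."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, agent_token: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(agent_token)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self.buckets[agent_token] = bucket
        return bucket.allow(now)
```

Passing `now` explicitly keeps the limiter easy to test; in a server, the default monotonic clock would be used and a rejected request would typically map to an HTTP 429 response.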

TypeScript Node.js npm  Repo ->

Full-stack and product

Blue Omics - a Django, React, and PostgreSQL research data platform I designed and built from scratch for a research lab. It grew from zero to 5M+ live records across 32 data models and 58 API endpoints, replaced years of manual spreadsheet workflows, and became the primary system for every lab submission.

  • Tuned PostgreSQL with explicit indexing and caching to hold low-millisecond median latency under concurrent access by 30+ researchers
  • Built 7 ingestion pipelines for heterogeneous formats: PDF, Excel, CSV, Word, PowerPoint
  • Automated R Markdown reporting, cutting a 2-3 hour manual process to 15-20 minutes
  • Deployed on GCP with Kubernetes and Terraform; cut frontend load time from 8s to 3s

Django React PostgreSQL GCP Kubernetes

Private repository - a walkthrough is available on request.

ApplyScore - a published Chrome extension that scores how well a resume matches any job posting on the web, with evidence-linked gaps and no hallucinated fluff.

  • Universal scraper that pierces Shadow DOM to read postings across LinkedIn, Greenhouse, Ashby, Lever, Workday and more
  • Strict, evidence-based analysis: a confidence-weighted fit score with requirement-by-requirement matches linked to the exact resume bullets that prove them
  • Privacy-first and bring-your-own-key (OpenAI, Anthropic, or Google), so data and model choice stay with the user
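
To make "confidence-weighted" and "evidence-linked" concrete, here is one plausible shape for such a score. The formula is an assumption for illustration, not ApplyScore's actual scoring: `RequirementMatch`, `fit_score`, and `gaps` are hypothetical names, and the rule that a match without evidence counts as zero is a design choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequirementMatch:
    requirement: str
    match: float       # 0.0 (no match) to 1.0 (full match)
    confidence: float  # 0.0 to 1.0, how sure the analysis is
    evidence: List[str] = field(default_factory=list)  # resume bullets backing the match


def fit_score(matches: List[RequirementMatch]) -> float:
    """Confidence-weighted average of match scores; unproven matches count as 0."""
    total_weight = sum(m.confidence for m in matches)
    if total_weight == 0:
        return 0.0
    weighted = sum((m.match if m.evidence else 0.0) * m.confidence for m in matches)
    return weighted / total_weight


def gaps(matches: List[RequirementMatch]) -> List[str]:
    """Requirements with no supporting resume evidence."""
    return [m.requirement for m in matches if not m.evidence]
```

Zeroing out matches that lack evidence is what keeps the score tied to the resume text rather than to the model's guess.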

JavaScript Chrome LLM  Chrome Web Store ->

Football Hub and World Cup 2026 Picks - two deployed web apps I built because I love the game: a football data hub, and a self-hostable World Cup 2026 prediction pool with scoring, leaderboards, and shareable rooms for small groups.

TypeScript React  Football Hub ->   World Cup 2026 Picks ->

Open source and systems

I fix real bugs in the software I build on and the tools I use every day, across four ecosystems. That adds up to 35+ merged pull requests across 25+ open-source projects, each one a root-caused fix backed by a regression test.

LLM inference and ML frameworks

  • llama.cpp (ggml, C/C++) - fixed rms_norm_back producing wrong output under in-place aliasing on the CPU backend
  • MLX (Apple) - fixed signed-integer overflow in roll and tile, and undefined behavior in arange(step=0)
  • MLX-LM (Apple) - fixed a server 404 on short prompts and the sampler's top_k bound
  • RAGFlow - fixed language detection, Excel cell handling, and CJK-safe text splitting
  • mem0 - fixed Redis, FAISS, and Weaviate crashes and filtered-search truncation

Native macOS and Swift

  • Fixes to status-model logic, terminal-rendering width invariants, and cost / timezone / Unicode correctness across several macOS menu-bar and TUI developer tools - each fix paired with a Swift Testing regression test.

Developer tooling

  • Cost-accuracy and edge-case fixes across a suite of open-source CLI tools: pricing-unit bugs (per-token vs per-request), timezone and Unicode handling, performance regressions, and more.
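
A pricing-unit bug of the per-token vs per-request kind usually comes from multiplying a price by the wrong quantity. The sketch below shows one way to make the unit explicit so the mix-up cannot happen silently. It is illustrative only: `Price` and `estimate_cost` are hypothetical names and the figures are made up, not taken from any of the tools mentioned above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    amount_usd: Decimal
    unit: str  # "per_million_tokens" or "per_request"


def estimate_cost(price: Price, tokens: int, requests: int) -> Decimal:
    """Multiply the price by the quantity its unit actually refers to."""
    if price.unit == "per_million_tokens":
        return Decimal(tokens) * price.amount_usd / Decimal(1_000_000)
    if price.unit == "per_request":
        return Decimal(requests) * price.amount_usd
    # Fail loudly on unknown units instead of guessing.
    raise ValueError(f"unknown pricing unit: {price.unit}")
```

Using `Decimal` rather than floats avoids rounding drift when many small charges are summed.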

See all merged contributions ->

Stack

Languages: Python, TypeScript, Swift, Java, C/C++, SQL
AI / ML: Open-weight LLMs, vLLM, Ollama, RAG, hybrid retrieval, Qdrant, rerankers, embeddings, agents and tool use, quantization, evaluation harnesses
Backend: Django, Django REST, FastAPI, Node.js, PostgreSQL, Redis
Frontend: React, TypeScript, modern HTML/CSS
Infra: Docker, Kubernetes, Terraform, GCP, AWS, CI/CD, GitHub Actions

Recognition and education

Herbert Wertheim College of Engineering Achievement Award, University of Florida. Merit scholarship awarded for top academic standing.

  • M.S. Computer and Information Science and Engineering, University of Florida - GPA 3.8 / 4.0
  • Semester Exchange, University of Florida - GPA 3.7 / 4.0
  • B.Tech Computer Science and Engineering, Jaypee University of Engineering and Technology - GPA 9.1 / 10.0

Writing

I keep playbooks on what I learn shipping local AI, over on yashrajpandey.com:

  • Self-hosting open-weight LLMs without sending data to a cloud API
  • RAG that holds up in production: retrieval, reranking, and the evals that keep it honest
  • Evaluation-gated releases for LLM systems

Off the clock: football (so much that I built apps for it, above), tactical FPS, story-rich RPGs, and lo-fi for flow state.

Open to good conversations on AI infrastructure, systems, and building things that ship.

Pinned

  1. looma (Public, Python) - Looma turns coding-agent history into resumable project context.
  2. mddocs (Public, TypeScript) - Local-first, git-native collaborative Markdown editor with real-time multiplayer, comments and suggestions, and an HTTP API for AI agents. Self-hostable, built on proof-sdk.
  3. turboquant_plus (Public, Python) - Forked from TheTom/turboquant_plus.
  4. world-cup-2026-picks (Public, TypeScript) - Self-hostable World Cup 2026 prediction pool for small groups: pick match outcomes and scores, choose qualifiers, and compete on a live leaderboard. Built with Next.js and Supabase.
  5. football-hub (Public, JavaScript)
  6. portfolio (Public, TypeScript)