cpu-inference

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

cuda gpu-acceleration model-management inference-optimization model-quantization cpu-inference llama-cpp local-llm llm-deployment llm-benchmarking ollama-optimization hybrid-inference wsl-ai-setup context-window-scaling

Updated Mar 27, 2025
Shell

JohnClaw / chatllm.rs

Star

rust api wrapper for llm-inference chatllm.cpp

rust chatbot inference bindings api-wrapper llama quantization gemma mistral cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 27, 2024
Rust

JohnClaw / chatllm.lua

Star

lua api wrapper for llm-inference chatllm.cpp

lua chatbot luajit inference bindings api-wrapper llama quantization gemma mistral cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 26, 2024
Lua

JohnClaw / gemma-2-2b-it.cs

Star

gemma-2-2b-it int8 cpu inference in one file of pure C#

csharp inference quantization gemma int8 inference-engine model-serving int8-inference int8-quantization cpu-inference llm llms llm-serving llm-inference gemma2 gemma2-2b-it

Updated Jun 14, 2025
C#

JohnClaw / chatllm.kt

Star

kotlin api wrapper for llm-inference chatllm.cpp

kotlin chatbot inference bindings api-wrapper llama quantization gemma mistral cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 26, 2024
C

JohnClaw / qwen3.java

Star

Java-port of qwen3.c

java inference quantization inference-engine java-ports q8 cpu-inference llm llms llm-serving llm-inference llm-serve qwen qwen3

Updated Jul 16, 2025
Java

SIYAKS-ARES / survival-with-llms

Star

The Ark Project: Selecting the perfect AI model to reboot civilization from a 64GB USB drive. Comprehensive analysis of open-source LLMs under extreme constraints, with final recommendation: Meta Llama 3.1 70B Instruct (Q6_K GGUF). Includes interactive tools, detailed comparisons, and complete implementation guide for offline deployment.

model-quantization cpu-inference llm llama-cpp qwen gguf open-source-ai mixtral offline-ai meta-llama survival-technology civilization-reboot

Updated Aug 7, 2025
HTML

Improve this page

Add a description, image, and links to the cpu-inference topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the cpu-inference topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cpu-inference

Here are 24 public repositories matching this topic...

kennethleungty / Llama-2-Open-Source-LLM-CPU-Inference

CoderLSF / fast-llama

rbitr / llm.f90

jozsefszalma / homelab

yybit / pllm

laelhalawani / gguf_llama

lucienhuangfu / eLLM

JohnClaw / chatllm.v

codito / arey

JohnClaw / chatllm.vb

JohnClaw / chatllm.cs

JohnClaw / chatllm.nim

chinese-soup / cbot-telegram-whisper

BjornMelin / local-llm-workbench

JohnClaw / chatllm.rs

JohnClaw / chatllm.lua

JohnClaw / gemma-2-2b-it.cs

JohnClaw / chatllm.kt

JohnClaw / qwen3.java

SIYAKS-ARES / survival-with-llms

Improve this page

Add this topic to your repo