This repository contains the complete analysis and output for "The Ark Project," a mission to select a single, self-contained Large Language Model (LLM) to serve as a digital seed of knowledge for rebooting civilization in a post-infrastructure world.
The project evaluates top open-source LLMs against a strict 64 GB storage limit for offline, CPU-only operation on scavenged hardware. It provides a final recommendation, a detailed analysis, and interactive web applications to explore the findings.
This repository is structured to provide the project's findings in multiple formats, catering to different use cases from quick overviews to in-depth analysis.
- File: infographic.html
- Purpose: A visually engaging, single-page infographic that tells the story of the project. It's the best place to start for a quick, high-level understanding of the mission, the constraints, the final recommendation, and the "how-to" guide.
- How to Use: Open this file in any modern web browser to view the interactive charts and narrative.
- File: web-page.html
- Purpose: A more detailed, interactive single-page web application. It functions as a dynamic report with a navigation bar, allowing users to jump directly to specific sections like the detailed model comparison, the storage constraint analysis, or the final justification.
- How to Use: Open this file in a web browser for a deeper, self-guided exploration of the project's data and reasoning.
- Files: AI Model for Civilization Reboot_.pdf (recommended for reading), AI Model for Civilization Reboot_.docx, AI Model for Civilization Reboot_.txt
- Purpose: This is the complete, in-depth academic report. It contains the full executive summary, a rigorous analysis of all constraints, a detailed comparative table of the candidate models, and the final, justified recommendation with full citations.
- How to Use: Use the PDF for the best reading and sharing experience. The DOCX and TXT files are provided for accessibility and ease of editing or data extraction.
The core challenge was to select the single best open-source LLM that could be stored and run from a 64 GB USB drive ("The Ark") without internet access or high-end GPUs. The winner, Meta Llama 3.1 70B (Q6_K GGUF) running on llama.cpp, was chosen as the ultimate "philosopher-engineer": a tool capable not just of solving technical problems but of guiding the ethical and social reconstruction of society.
Why This Model:
- Storage Efficiency: ~57.89 GB model file + ~15-30 MB runner software = ~58 GB total
- Quality Balance: Q6_K quantization provides optimal balance between file size and model fidelity
- Comprehensive Capabilities: Excels in reasoning, instruction-following, and human-like communication
- Survival-Ready: Can function as teacher, engineer, lawmaker, and philosopher for societal reconstruction
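The storage arithmetic above can be sanity-checked with a short script. The figures are the approximate sizes cited in this analysis (decimal GB/MB, as drives are marketed), not measurements of real files:

```python
# Sanity-check the Ark's storage budget using the approximate sizes
# cited in the analysis; these are the report's figures, not measured files.
DRIVE_BYTES = 64 * 10**9          # 64 GB USB drive (decimal, as marketed)
MODEL_BYTES = 57.89 * 10**9       # Llama 3.1 70B Q6_K GGUF, ~57.89 GB
RUNNER_BYTES = 30 * 10**6         # llama-cli binaries, ~30 MB upper bound

total = MODEL_BYTES + RUNNER_BYTES
headroom = DRIVE_BYTES - total
print(f"total: {total / 1e9:.2f} GB, headroom: {headroom / 1e9:.2f} GB")
```

The model plus runner comes to roughly 57.92 GB, leaving about 6 GB of operational buffer on the drive.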
The storage constraint forced critical trade-offs:
- Too Large: Llama 3.1 70B at Q8_0 quantization (~75 GB) exceeds capacity
- Too Small: ~30B models leave significant capacity unused while offering weaker capabilities
- Sweet Spot: 70B models at Q6_K quantization fit perfectly with operational buffer
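The trade-off can be estimated from bits-per-weight. The rates below are approximate figures from llama.cpp's quantization tables and the parameter count is rounded, so treat this as a back-of-the-envelope sketch rather than exact file sizes:

```python
# Rough GGUF size (decimal GB) for a 70B-class model at several
# quantization levels. Bits-per-weight values are approximate figures
# from llama.cpp's quantization tables; 70.6e9 is a rounded count.
PARAMS = 70.6e9
DRIVE_GB = 64.0

BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.5625, "Q4_K_M": 4.85}

for quant, bpw in BITS_PER_WEIGHT.items():
    size_gb = PARAMS * bpw / 8 / 1e9    # bits -> bytes -> decimal GB
    verdict = "fits" if size_gb < DRIVE_GB else "too large"
    print(f"{quant}: ~{size_gb:.1f} GB ({verdict})")
```

This reproduces the analysis: Q8_0 lands around 75 GB (over the cap), while Q6_K lands just under 58 GB, inside the 64 GB limit with headroom.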
Meta Llama 3.1 70B Instruct ⭐ WINNER
- Exceptional reasoning and instruction-following
- Human-like, nuanced communication style
- Perfect storage fit at Q6_K quantization
- Current knowledge base (up to December 2023)
Alibaba Qwen 2.5 72B Instruct
- Superior STEM and coding benchmarks
- Slightly larger size (~64.35 GB) requires lower quantization
- More "robotic" communication style
- Occasional non-English character output
Mistral Mixtral 8x7B Instruct
- Efficient MoE architecture
- Smaller file size but older generation
- Surpassed by newer 70B models in most benchmarks
- ark_model.gguf: Llama 3.1 70B Instruct model file (Q6_K, ~57.89 GB)
- runner_linux/: Linux llama-cli executable
- runner_windows/: Windows llama-cli.exe executable
- README.txt: Simple instructions for survivors
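A small integrity check can confirm the drive layout before launch. This is a sketch assuming the contents listed above; the mount point is illustrative and should be adjusted to wherever the Ark drive actually appears:

```python
from pathlib import Path

# Hypothetical mount point; adjust to wherever the Ark drive is mounted.
ARK_ROOT = Path("/media/user/ARK")

# Expected top-level entries, per the drive layout described above.
EXPECTED = ["ark_model.gguf", "runner_linux", "runner_windows", "README.txt"]

def check_ark(root: Path) -> list[str]:
    """Return the expected entries that are missing from the drive."""
    return [name for name in EXPECTED if not (root / name).exists()]

missing = check_ark(ARK_ROOT)
if missing:
    print("Missing from the Ark:", ", ".join(missing))
else:
    print("All Ark files present.")
```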
Windows:
E:
cd runner_windows
llama-cli.exe -m ../ark_model.gguf -i --color -ins -c 4096 -t 4
Linux:
cd /media/user/ARK/runner_linux
./llama-cli -m ../ark_model.gguf -i --color -ins -c 4096 -t 4
The models were evaluated across five critical dimensions for survival and societal reconstruction:
- Practical Survival & Engineering: Ability to synthesize novel solutions and explain complex processes
- Scientific Knowledge: Depth of understanding in foundational sciences (MMLU scores)
- Humanities & Governance: Quality of writing for laws, education, and philosophy
- Code Generation: Programming capabilities for rebuilding technology
- Versatility & Reasoning: Adaptability to novel problems and complex reasoning
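The five dimensions above could be combined into a single weighted score. The weights and per-model scores below are illustrative placeholders, not figures from the report; the sketch only shows the shape of such a comparison:

```python
# Illustrative weighted scoring over the five evaluation dimensions.
# Weights and per-model scores are placeholders, NOT figures from
# the report; they only demonstrate the aggregation.
WEIGHTS = {
    "survival_engineering": 0.25,
    "scientific_knowledge": 0.20,
    "humanities_governance": 0.20,
    "code_generation": 0.15,
    "versatility_reasoning": 0.20,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-10) into one weighted total."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

example = {dim: 8.0 for dim in WEIGHTS}   # placeholder values
print(f"weighted total: {weighted_score(example):.2f}")
```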
This repository serves as a blueprint and a guide for developers and researchers interested in:
- Offline AI: Running powerful models without internet connectivity
- Model Quantization: Balancing file size and model quality
- Knowledge Preservation: Creating self-contained knowledge systems
- Survival Technology: Practical applications of AI in extreme scenarios
GGUF Model Format:
- Specifically designed for efficient CPU inference
- Supports multiple quantization levels (2-bit to 8-bit)
- No external dependencies required
- Broad hardware compatibility
llama.cpp Runner:
- Lightweight, self-contained C++ application
- Optimized for consumer CPUs
- Pre-compiled binaries available (~10-30 MB)
- No complex installation procedures
Q6_K Quantization:
- Best balance between file size reduction and accuracy preservation
- Superior performance on CPUs compared to newer I-quant methods
- Mature and well-supported quantization technique
This project represents a comprehensive analysis of offline AI capabilities. Contributions are welcome for:
- Additional model evaluations
- Performance benchmarks on different hardware
- Alternative implementation strategies
- Documentation improvements
This project is open source and available under appropriate licenses for educational and research purposes.
"The Ark Project" - Preserving human knowledge for the future, one model at a time.