The Ark Project: A Guide to Survival with LLMs

This repository contains the complete analysis and output for "The Ark Project," a mission to select a single, self-contained Large Language Model (LLM) to serve as a digital seed of knowledge for rebooting civilization in a post-infrastructure world.

The project evaluates top open-source LLMs against a strict 64 GB storage limit for offline, CPU-only operation on scavenged hardware. It provides a final recommendation, a detailed analysis, and interactive web applications to explore the findings.


🚀 Repository Contents

This repository is structured to provide the project's findings in multiple formats, catering to different use cases from quick overviews to in-depth analysis.

1. Interactive Infographic

  • File: infographic.html
  • Purpose: A visually engaging, single-page infographic that tells the story of the project. It's the best place to start for a quick, high-level understanding of the mission, the constraints, the final recommendation, and the "how-to" guide.
  • How to Use: Open this file in any modern web browser to view the interactive charts and narrative.

2. Interactive Web Report

  • File: web-page.html
  • Purpose: A more detailed, interactive single-page web application. It functions as a dynamic report with a navigation bar, allowing users to jump directly to specific sections like the detailed model comparison, the storage constraint analysis, or the final justification.
  • How to Use: Open this file in a web browser for a deeper, self-guided exploration of the project's data and reasoning.

3. The Full Report

  • Files:
    • AI Model for Civilization Reboot_.pdf (Recommended for reading)
    • AI Model for Civilization Reboot_.docx
    • AI Model for Civilization Reboot_.txt
  • Purpose: This is the complete, in-depth academic report. It contains the full executive summary, a rigorous analysis of all constraints, a detailed comparative table of the candidate models, and the final, justified recommendation with full citations.
  • How to Use: Use the PDF for the best reading and sharing experience. The DOCX and TXT files are provided for accessibility and ease of editing or data extraction.

🎯 Mission Objective

The core challenge was to select the single best open-source LLM that could be stored and run from a 64 GB USB drive ("The Ark") without internet access or high-end GPUs. The chosen model, Meta Llama 3.1 70B Instruct (Q6_K GGUF) running on Llama.cpp, was selected as the ultimate "philosopher-engineer": a tool capable not only of solving technical problems but also of guiding the ethical and social reconstruction of society.

🔍 Key Findings

Final Recommendation: Meta Llama 3.1 70B Instruct (Q6_K GGUF)

Why This Model:

  • Storage Efficiency: ~57.89 GB model file + ~15-30 MB runner software = ~58 GB total
  • Quality Balance: Q6_K quantization provides optimal balance between file size and model fidelity
  • Comprehensive Capabilities: Excels in reasoning, instruction-following, and human-like communication
  • Survival-Ready: Can function as teacher, engineer, lawmaker, and philosopher for societal reconstruction

The Tyranny of 64 Gigabytes

The storage constraint forced critical trade-offs:

  • Too Large: Llama 3.1 70B at Q8_0 quantization (~75 GB) exceeds the drive's capacity
  • Too Small: ~30B-class models fit easily but leave tens of gigabytes unused while sacrificing capability
  • Sweet Spot: a 70B model at Q6_K quantization (~58 GB) fits with a small operational buffer
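The arithmetic behind these trade-offs is simple enough to sketch. The estimate below uses approximate average bits-per-weight figures for common llama.cpp quantization types and the published ~70.6B parameter count; both are ballpark assumptions, not measurements of specific files:

```python
# Back-of-envelope check of the quantization trade-off described above.
PARAMS = 70.6e9  # approximate parameter count of Llama 3.1 70B

# Approximate average bits per weight for llama.cpp quant types (assumed values).
BITS_PER_WEIGHT = {
    "Q8_0": 8.50,
    "Q6_K": 6.56,
    "Q4_K_M": 4.85,
}

for quant, bpw in BITS_PER_WEIGHT.items():
    size_gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    verdict = "fits" if size_gb <= 64 else "exceeds 64 GB"
    print(f"{quant}: ~{size_gb:.1f} GB ({verdict})")
```

Under these assumptions Q8_0 lands near the ~75 GB figure above, while Q6_K lands near ~58 GB, comfortably inside the 64 GB drive.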

Top Contenders Analysis

  1. Meta Llama 3.1 70B Instruct (WINNER)

    • Exceptional reasoning and instruction-following
    • Human-like, nuanced communication style
    • Perfect storage fit at Q6_K quantization
    • Current knowledge base (up to December 2023)
  2. Alibaba Qwen 2.5 72B Instruct

    • Superior STEM and coding benchmarks
    • Slightly too large at Q6_K (~64.35 GB), forcing a lossier, lower-bit quantization
    • More "robotic" communication style
    • Occasional non-English character output
  3. Mistral Mixtral 8x7B Instruct

    • Efficient MoE architecture
    • Smaller file size but older generation
    • Surpassed by newer 70B models in most benchmarks

🛠️ Implementation Guide

Contents of "The Ark" (64 GB USB Drive)

  • ark_model.gguf: Llama 3.1 70B Instruct model file (Q6_K, ~57.89 GB)
  • runner_linux/: Linux llama-cli executable
  • runner_windows/: Windows llama-cli.exe executable
  • README.txt: Simple instructions for survivors
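Before trusting a drive that may have sat in storage for years, whoever prepares or opens the Ark would want to verify the model file bit-for-bit. A minimal checksum sketch, assuming the layout above (`ark_model.gguf` at the drive root; the reference digest would be a hypothetical value recorded in README.txt when the drive is written):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks, so even a ~58 GB model needs little RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (hypothetical mount point):
# print(sha256_of(Path("/media/user/ARK/ark_model.gguf")))
```

Compare the printed digest against the one recorded at preparation time; any mismatch means the model file is corrupted and should not be relied on.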

Quick Start Commands

Windows:

E:
cd runner_windows
llama-cli.exe -m ../ark_model.gguf -i --color -ins -c 4096 -t 4

Linux:

cd /media/user/ARK/runner_linux
./llama-cli -m ../ark_model.gguf -i --color -ins -c 4096 -t 4

📊 Evaluation Criteria

The models were evaluated across five critical dimensions for survival and societal reconstruction:

  1. Practical Survival & Engineering: Ability to synthesize novel solutions and explain complex processes
  2. Scientific Knowledge: Depth of understanding in foundational sciences (MMLU scores)
  3. Humanities & Governance: Quality of writing for laws, education, and philosophy
  4. Code Generation: Programming capabilities for rebuilding technology
  5. Versatility & Reasoning: Adaptability to novel problems and complex reasoning

🎨 Design Philosophy

This repository serves as a blueprint and a guide for developers and researchers interested in:

  • Offline AI: Running powerful models without internet connectivity
  • Model Quantization: Balancing file size and model quality
  • Knowledge Preservation: Creating self-contained knowledge systems
  • Survival Technology: Practical applications of AI in extreme scenarios

📚 Technical Details

Why GGUF Format?

  • Specifically designed for efficient CPU inference
  • Supports multiple quantization levels (2-bit to 8-bit)
  • No external dependencies required
  • Broad hardware compatibility
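As a quick sanity check, a GGUF file can be identified from its fixed header: the 4-byte magic "GGUF", a little-endian uint32 version, and (in version 3 of the format) uint64 tensor and metadata-entry counts. A minimal parser sketch of that header layout:

```python
import struct

def read_gguf_header(path: str) -> tuple[int, int, int]:
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))  # two uint64s
    return version, tensor_count, kv_count
```

Only the first 24 bytes are read, so the check is instant even on a ~58 GB model file.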

Why Llama.cpp?

  • Lightweight, self-contained C++ application
  • Optimized for consumer CPUs
  • Pre-compiled binaries available (~10-30 MB)
  • No complex installation procedures

Why Q6_K Quantization?

  • Best balance between file size reduction and accuracy preservation
  • Superior performance on CPUs compared to newer I-quant methods
  • Mature and well-supported quantization technique

🤝 Contributing

This project represents a comprehensive analysis of offline AI capabilities. Contributions are welcome for:

  • Additional model evaluations
  • Performance benchmarks on different hardware
  • Alternative implementation strategies
  • Documentation improvements

📄 License

This project is open source for educational and research purposes; see the repository's license for terms.


"The Ark Project" - Preserving human knowledge for the future, one model at a time.
