GAIA consists of more than 450 non-trivial questions, each with an unambiguous answer, that require different levels of tooling and autonomy to solve. It is therefore divided into 3 levels, where level 1 should be breakable by very good LLMs and level 3 indicates a strong jump in model capabilities. Each level is divided into a fully public dev set for validation and a test set with private answers and metadata.
This project aims to use ADK and the A2A protocol to solve GAIA.
The system employs a hierarchical multi-agent architecture where a central OrchestratorAgent analyzes incoming GAIA tasks and delegates sub-tasks to a suite of specialized agents, each equipped with specific tools and instructions for functions like web searching, document processing, code execution, or calculations. This division of labor allows each specialist agent to focus on its core competency, with the orchestrator coordinating their efforts to achieve the final solution.
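The delegation pattern described above can be sketched in plain Python. This is an illustration only, not the project's code and not the ADK or A2A API: the `SpecialistAgent` class, the `plan` and `solve` methods, and the keyword-based routing are all hypothetical stand-ins for what would be LLM-driven decisions in the real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SpecialistAgent:
    """Hypothetical stand-in for a specialist agent (not an ADK class)."""
    name: str
    handle: Callable[[str], str]  # takes a sub-task, returns a result string


class OrchestratorAgent:
    """Illustrative coordinator that routes sub-tasks to named specialists."""

    def __init__(self, specialists: List[SpecialistAgent]) -> None:
        self._specialists: Dict[str, SpecialistAgent] = {
            s.name: s for s in specialists
        }

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # A real orchestrator would ask an LLM to decompose the task.
        # Keyword routing is used here purely to keep the sketch runnable.
        steps: List[Tuple[str, str]] = []
        if "http" in task or "search" in task.lower():
            steps.append(("web_search", task))
        if any(ch.isdigit() for ch in task):
            steps.append(("calculator", task))
        # Fall back to web search when no rule matches.
        return steps or [("web_search", task)]

    def solve(self, task: str) -> str:
        # Run each planned step on its specialist and join the results.
        results = [
            self._specialists[name].handle(sub_task)
            for name, sub_task in self.plan(task)
        ]
        return " | ".join(results)


# Example wiring with two toy specialists.
orchestrator = OrchestratorAgent([
    SpecialistAgent("web_search", lambda q: f"[web_search] {q}"),
    SpecialistAgent("calculator", lambda q: f"[calculator] {q}"),
])
print(orchestrator.solve("compute 2+2"))
```

Keeping the specialists behind a single `handle` callable means the orchestrator does not need to know which tools each one uses, which is the point of the division of labor.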
# .env
GOOGLE_API_KEY=AIzaSy...YOUR...KEY...HERE
GOOGLE_GENAI_USE_VERTEXAI=FALSE
# Optional: Vertex AI settings if GOOGLE_GENAI_USE_VERTEXAI=TRUE
# GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
# GOOGLE_CLOUD_LOCATION="your-gcp-region"
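As a rough sketch of how these variables relate, the helper below checks which settings are required for each mode. The function name `resolve_google_backend` and the returned dictionary shape are assumptions made for illustration; the sketch also assumes the `.env` values have already been loaded into the process environment (for example by the shell or a dotenv loader).

```python
import os
from typing import Dict, Mapping


def resolve_google_backend(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Check that the variables needed for the selected backend are set.

    Hypothetical helper: it only validates settings, it does not create a client.
    """
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "FALSE").strip().upper()
    if flag == "TRUE":
        # Vertex AI mode needs a project and a region instead of an API key.
        backend = "vertex_ai"
        required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")
    else:
        backend = "api_key"
        required = ("GOOGLE_API_KEY",)

    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings for {backend}: {', '.join(missing)}")

    return {"backend": backend, **{name: env[name] for name in required}}
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching real credentials.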
- `orchestrator_model`: LLM used by the main coordinating agent.
- `specialist_model_flash`: LLM for fast, general-purpose specialist agents.
- `specialist_model_pro`: LLM for more complex specialist agents (e.g., document/multimodal processing).
- `ollama_model`: (Optional) Name of a locally run Ollama model.
- `gaia_data_dir`: Path to the GAIA dataset directory.
- `api_port`: Network port for the agent's API server.
- `runner_strategy`: How to select tasks for `run_gaia.py` (`all`, `single`, `first_n`).
- `runner_task_id`: Specific task ID to run if `runner_strategy` is `single`.
- `runner_first_n`: Number of tasks to run if `runner_strategy` is `first_n`.
- `runner_max_retries`: Maximum retry attempts for a failed task.
- `runner_max_workers`: Number of concurrent tasks `run_gaia.py` can process.
- `gaia_split`: Which GAIA dataset split to use (`validation` or `test`).
- `save_debug_chat_log`: Boolean (`true`/`false`) to enable/disable saving detailed LLM interaction logs.
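To make the runner settings concrete, here is a minimal sketch of how the three `runner_strategy` values, `runner_max_retries`, and `runner_max_workers` could interact. It is not the code of `run_gaia.py`: the function names, the `task_id` dictionary key, and the choice to count retries in addition to the first attempt are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def select_tasks(tasks: List[Dict], strategy: str,
                 task_id: Optional[str] = None,
                 first_n: Optional[int] = None) -> List[Dict]:
    """Pick tasks according to runner_strategy (all, single, first_n)."""
    if strategy == "all":
        return list(tasks)
    if strategy == "single":
        # Assumes each task is a dict with a "task_id" key.
        return [t for t in tasks if t["task_id"] == task_id]
    if strategy == "first_n":
        return list(tasks[: first_n or 0])
    raise ValueError(f"Unknown runner_strategy: {strategy!r}")


def run_with_retries(solve: Callable[[Dict], str], task: Dict,
                     max_retries: int) -> Optional[str]:
    """Call solve(task); after a failure, retry up to max_retries more times."""
    for _attempt in range(max_retries + 1):
        try:
            return solve(task)
        except Exception:
            continue  # a real runner would log the error here
    return None  # every attempt failed


def run_all(solve: Callable[[Dict], str], tasks: List[Dict],
            max_retries: int, max_workers: int) -> List[Optional[str]]:
    """Process tasks concurrently, at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(
            lambda t: run_with_retries(solve, t, max_retries), tasks))
```

A call such as `run_all(solve, select_tasks(tasks, "first_n", first_n=5), max_retries=2, max_workers=4)` would then run the first five tasks with up to three attempts each, where `solve` is whatever function submits a task to the agent.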
Validation Set (2025-05-28):
Level | Tasks Evaluated | Correct Answers | Accuracy (%) |
---|---|---|---|
Level 1 | 53 | 40 | 75.47% |
Level 2 | 86 | 49 | 56.98% |
Level 3 | 26 | 11 | 42.31% |
Overall | 165 | 100 | 60.61% |
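The percentages in the table follow directly from the counts. The short check below recomputes them; the variable and function names are illustrative, and only the counts come from the table.

```python
# (tasks evaluated, correct answers) per level, copied from the table.
results = {
    "Level 1": (53, 40),
    "Level 2": (86, 49),
    "Level 3": (26, 11),
}


def accuracy(evaluated: int, correct: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / evaluated, 2)


per_level = {level: accuracy(n, c) for level, (n, c) in results.items()}

# The overall figure pools all tasks; it is not the mean of the level accuracies.
total_evaluated = sum(n for n, _ in results.values())
total_correct = sum(c for _, c in results.values())
overall = accuracy(total_evaluated, total_correct)

print(per_level)
print(total_evaluated, total_correct, overall)
```

Because Level 2 contributes the most tasks, the pooled overall accuracy sits closer to the Level 2 figure than a plain average of the three levels would.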
Test Set (2025-05-10):
- Accuracy: 37.54% (as reported on the official GAIA website)
- Improve the prompts, agents and tools task by task 🧠💡
- Use and build the client independently of GAIA 🧩
Contributions are welcome! If you find a bug or have suggestions for improvement, please open an Issue or Pull Request.
This software is licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.