🤗 Hugging Face | 📝 Blog | 📖 Documentation
We introduce EXAONE Deep, a series of reasoning-focused language models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. The models exhibit superior capabilities in various reasoning tasks, including math and coding benchmarks. Evaluation results show that 1) EXAONE Deep 2.4B outperforms other models of comparable size, 2) EXAONE Deep 7.8B outperforms not only open-weight models of comparable scale but also the proprietary reasoning model OpenAI o1-mini, and 3) EXAONE Deep 32B demonstrates competitive performance against leading open-weight models.
Our documentation consists of the following sections:
- Performance: Experimental results of EXAONE Deep models.
- Quickstart: A basic guide to using EXAONE Deep models with Transformers.
- Quantized Models: An explanation of quantized EXAONE Deep weights in AWQ and GGUF format.
- Run Locally: A guide to running EXAONE Deep models locally with llama.cpp and Ollama frameworks.
- Deployment: A guide to running EXAONE Deep models with TensorRT-LLM, vLLM, and SGLang deployment frameworks.
- Usage Guideline: A guide to utilizing EXAONE Deep models to achieve the expected performance.
- 2025.03.18: We release EXAONE Deep, reasoning-enhanced language models in 2.4B, 7.8B, and 32B sizes. Check out the 📖 Documentation!
Some experimental results are shown below. The full evaluation results can be found in the Documentation.
| Models | MATH-500 (pass@1) | AIME 2024 (pass@1 / cons@64) | AIME 2025 (pass@1 / cons@64) | CSAT Math 2025 (pass@1) | GPQA Diamond (pass@1) | Live Code Bench (pass@1) |
|---|---|---|---|---|---|---|
| EXAONE Deep 32B | 95.7 | 72.1 / 90.0 | 65.8 / 80.0 | 94.5 | 66.1 | 59.5 |
| DeepSeek-R1-Distill-Qwen-32B | 94.3 | 72.6 / 83.3 | 55.2 / 73.3 | 84.1 | 62.1 | 57.2 |
| QwQ-32B | 95.5 | 79.5 / 86.7 | 67.1 / 76.7 | 94.4 | 63.3 | 63.4 |
| DeepSeek-R1-Distill-Llama-70B | 94.5 | 70.0 / 86.7 | 53.9 / 66.7 | 88.8 | 65.2 | 57.5 |
| DeepSeek-R1 (671B) | 97.3 | 79.8 / 86.7 | 66.8 / 80.0 | 89.9 | 71.5 | 65.9 |
| EXAONE Deep 7.8B | 94.8 | 70.0 / 83.3 | 59.6 / 76.7 | 89.9 | 62.6 | 55.2 |
| DeepSeek-R1-Distill-Qwen-7B | 92.8 | 55.5 / 83.3 | 38.5 / 56.7 | 79.7 | 49.1 | 37.6 |
| DeepSeek-R1-Distill-Llama-8B | 89.1 | 50.4 / 80.0 | 33.6 / 53.3 | 74.1 | 49.0 | 39.6 |
| OpenAI o1-mini | 90.0 | 63.6 / 80.0 | 54.8 / 66.7 | 84.4 | 60.0 | 53.8 |
| EXAONE Deep 2.4B | 92.3 | 52.5 / 76.7 | 47.9 / 73.3 | 79.2 | 54.3 | 46.6 |
| DeepSeek-R1-Distill-Qwen-1.5B | 83.9 | 28.9 / 52.7 | 23.9 / 36.7 | 65.6 | 33.8 | 16.9 |
- You need to install `transformers>=4.43.1` to use the EXAONE Deep models. We recommend using the latest version.
Here is example code showing how to use the EXAONE Deep models.
Tip
In all the examples below, you can use a different model size by changing `7.8B` to `32B` or `2.4B`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
from threading import Thread
model_name = "LGAI-EXAONE/EXAONE-Deep-7.8B"
streaming = True # choose the streaming option
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Choose your prompt:
# Math example (AIME 2024)
prompt = r"""Let $x,y$ and $z$ be positive real numbers that satisfy the following system of equations:
\[\log_2\left({x \over yz}\right) = {1 \over 2}\]\[\log_2\left({y \over xz}\right) = {1 \over 3}\]\[\log_2\left({z \over xy}\right) = {1 \over 4}\]
Then the value of $\left|\log_2(x^4y^3z^2)\right|$ is $\tfrac{m}{n}$ where $m$ and $n$ are relatively prime positive integers. Find $m+n$.
Please reason step by step, and put your final answer within \boxed{}."""
# Korean MCQA example (CSAT Math 2025)
prompt = r"""Question : $a_1 = 2$์ธ ์์ด $\{a_n\}$๊ณผ $b_1 = 2$์ธ ๋ฑ์ฐจ์์ด $\{b_n\}$์ด ๋ชจ๋ ์์ฐ์ $n$์ ๋ํ์ฌ\[\sum_{k=1}^{n} \frac{a_k}{b_{k+1}} = \frac{1}{2} n^2\]์ ๋ง์กฑ์ํฌ ๋, $\sum_{k=1}^{5} a_k$์ ๊ฐ์ ๊ตฌํ์ฌ๋ผ.
Options :
A) 120
B) 125
C) 130
D) 135
E) 140
Please reason step by step, and you should write the correct option alphabet (A, B, C, D or E) within \\boxed{}."""
messages = [
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

if streaming:
    streamer = TextIteratorStreamer(tokenizer)
    thread = Thread(target=model.generate, kwargs=dict(
        input_ids=input_ids.to("cuda"),
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=32768,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        streamer=streamer
    ))
    thread.start()

    for text in streamer:
        print(text, end="", flush=True)
else:
    output = model.generate(
        input_ids.to("cuda"),
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=32768,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )
    print(tokenizer.decode(output[0]))
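The generated sequence wraps the reasoning steps in `<thought>\n...\n</thought>` before the final answer (see the Usage Guideline section). As a minimal sketch, assuming the non-streaming branch above, you could separate the two like this:

```python
# Minimal sketch: split the decoded output into reasoning and final answer.
# EXAONE Deep encloses its reasoning in <thought> ... </thought> before answering.
full_text = tokenizer.decode(output[0], skip_special_tokens=True)
if "</thought>" in full_text:
    reasoning, answer = full_text.split("</thought>", maxsplit=1)
    print("Final answer:\n", answer.strip())
else:
    print(full_text)
```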
Important
The EXAONE Deep models are trained with an optimized configuration, so we recommend following the Usage Guideline section to achieve optimal performance.
We introduce a series of quantized weights of EXAONE Deep models.
We provide AWQ-quantized weights of EXAONE Deep models, quantized using the AutoAWQ library. Please refer to the EXAONE Deep collection for pre-quantized weights, and to the AutoAWQ documentation for more details.
You need to install the latest version of the AutoAWQ library (`autoawq>=0.2.8`) to load the AWQ-quantized versions of EXAONE Deep models.
pip install autoawq
You can load the quantized models in the same way as the original models by changing only the model name; the AWQ configuration is loaded automatically. Please check the Quickstart section above for more details, and see the sketch below for an example.
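For illustration, a minimal loading sketch is shown below. The repository name `LGAI-EXAONE/EXAONE-Deep-7.8B-AWQ` is an assumption here; please confirm the exact name in the EXAONE Deep collection.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed AWQ repository name; check the EXAONE Deep collection for the exact one.
model_name = "LGAI-EXAONE/EXAONE-Deep-7.8B-AWQ"

# The AWQ quantization config stored in the checkpoint is picked up automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```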
We provide GGUF weights in `BF16` format and quantized weights in `Q8_0`, `Q6_K`, `Q5_K_M`, `Q4_K_M`, and `IQ4_XS` formats.
The example below is for the 7.8B model in BF16 format. Please refer to the EXAONE Deep collection to find quantized models. You may need to install `huggingface_hub` to download the GGUF weights.
# (optional) install huggingface_hub
pip install huggingface_hub
# Download the GGUF weights
huggingface-cli download LGAI-EXAONE/EXAONE-Deep-7.8B-GGUF \
--include "EXAONE-Deep-7.8B-BF16*.gguf" \
--local-dir .
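If you prefer the Python API over the CLI, a rough equivalent using `huggingface_hub.snapshot_download` (same repository and file pattern as the command above) looks like this:

```python
from huggingface_hub import snapshot_download

# Download only the BF16 GGUF shards of the 7.8B model into the current directory.
snapshot_download(
    repo_id="LGAI-EXAONE/EXAONE-Deep-7.8B-GGUF",
    allow_patterns=["EXAONE-Deep-7.8B-BF16*.gguf"],
    local_dir=".",
)
```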
For end users, we introduce several ways to run EXAONE Deep models locally.
Note
We highly recommend using a repetition penalty of 1.0 or lower for better generation quality.
You can run EXAONE Deep models with llama.cpp as follows:
- Install llama.cpp. Please refer to the llama.cpp repository for more details.
- Download the EXAONE Deep model in GGUF format.
huggingface-cli download LGAI-EXAONE/EXAONE-Deep-7.8B-GGUF \
--include "EXAONE-Deep-7.8B-BF16*.gguf" \
--local-dir .
- Run the model with llama.cpp in conversational mode. We set the chat template explicitly to handle reasoning steps properly.
llama-cli -m ./EXAONE-Deep-7.8B-BF16.gguf \
-sys "" \
-c 32768 \
--temp 0.6 \
--top-p 0.95 \
--jinja \
--chat-template "{% for message in messages %}{% if loop.first and message['role'] != 'system' %}{{ '[|system|][|endofturn|]\n' }}{% endif %}{% set content = message['content'] %}{% if '</thought>' in content %}{% set content = content.split('</thought>')[-1].lstrip('\\n') %}{% endif %}{{ '[|' + message['role'] + '|]' + content }}{% if not message['role'] == 'user' %}{{ '[|endofturn|]' }}{% endif %}{% if not loop.last %}{{ '\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '\n[|assistant|]<thought>\n' }}{% endif %}"
- When using the EXAONE Deep 32B model with BF16 precision, you may need to download all split files and merge them before running the model.
# Download all split files
huggingface-cli download LGAI-EXAONE/EXAONE-Deep-32B-GGUF \
--include "EXAONE-Deep-32B-BF16*.gguf" \
--local-dir .
# Merge all split files
llama-gguf-split --merge \
./EXAONE-Deep-32B-BF16-00001-of-00002.gguf \
./EXAONE-Deep-32B-BF16.gguf
EXAONE Deep models are uploaded to the Ollama model library. You can easily use the EXAONE Deep models as follows:
- Install Ollama. Please refer to the Ollama repository for more details.
- Run the EXAONE Deep model as follows:
ollama run exaone-deep:7.8b
Note
In the above example, the model `exaone-deep:7.8b` is quantized to `Q4_K_M`. For a list of available models, please refer to the EXAONE Deep Ollama page.
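Beyond the interactive CLI, Ollama also exposes a local REST API (on port 11434 by default). The sketch below, assuming the default local server and the `exaone-deep:7.8b` model pulled above, sends a single chat request from Python:

```python
import json
import urllib.request

# Assumes a local Ollama server on the default port with exaone-deep:7.8b available.
payload = {
    "model": "exaone-deep:7.8b",
    "messages": [{"role": "user", "content": "How many golf balls can fit in a school bus?"}],
    "stream": False,
    "options": {"temperature": 0.6, "top_p": 0.95},
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["message"]["content"])
```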
Alternatively, you can create and run customized EXAONE Deep models from GGUF weights.
- Install Ollama. Please refer to the Ollama repository for more details.
- Download the EXAONE Deep model in GGUF format. Please refer to the GGUF section for more details.
- Write the `Modelfile` for EXAONE Deep.
# Model path (choose appropriate GGUF weights on your own)
FROM ./EXAONE-Deep-7.8B-BF16.gguf
# Parameter values
PARAMETER stop "[|endofturn|]"
PARAMETER repeat_penalty 1.0
PARAMETER num_ctx 32768
PARAMETER temperature 0.6
PARAMETER top_p 0.95
# Chat template
# Note: removing `<thought></thought>` steps from the context is not yet supported,
# because Ollama does not provide this feature. We will update the template when it becomes available.
TEMPLATE """{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{ if eq .Role "system" }}[|system|]{{ .Content }}[|endofturn|]
{{ continue }}
{{ else if eq .Role "user" }}[|user|]{{ .Content }}
{{ else if eq .Role "assistant" }}[|assistant|]{{ .Content }}[|endofturn|]
{{ end }}
{{- if and (ne .Role "assistant") $last }}[|assistant|]<thought>
{{ end }}
{{- end -}}"""
# System prompt
SYSTEM """"""
# License
LICENSE """EXAONE AI Model License Agreement 1.1 - NC """
- Convert the model to Ollama.
ollama create exaone -f Modelfile
- Run the model with Ollama.
ollama run exaone
You can run EXAONE Deep models on your device with LM-Studio.
- Install LM-Studio. Please refer to the LM-Studio page for more details.
- Download the EXAONE Deep model in GGUF format. You can search for and find a proper model via Model Search.
- Configure the prompt settings.
  - Set "Reasoning Section Parsing" to `<thought>` and `</thought>`.
  - Set "Template (Jinja)" to that of EXAONE 3.5. Or, you can use the custom prompt below.
{% for message in messages %}{% if loop.first and message['role'] != 'system' %}{{ '[|system|][|endofturn|]\n' }}{% endif %}{{ '[|' + message['role'] + '|]' + message['content'] }}{% if message['role'] == 'user' %}{{ '\n' }}{% else %}{{ '[|endofturn|]\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '[|assistant|]' }}{% endif %}
EXAONE Deep models have been integrated into various deployment frameworks.
Important
Before your deployment of EXAONE Deep models, we recommend following the Usage Guideline section to achieve the expected performance.
TensorRT-LLM has supported EXAONE language models since EXAONE 3.0. We recommend using TensorRT-LLM for the best performance. You can run EXAONE Deep models with TensorRT-LLM by following the instructions on TensorRT-LLM EXAONE Example.
Note
When you convert EXAONE Deep models to TensorRT-LLM format, you may need to set the environment variable `TRTLLM_DISABLE_UNIFIED_CONVERTER=1`.
Note
TensorRT-LLM also supports AWQ with its own quantization method. If you want to use AWQ with TensorRT-LLM, please refer to the AWQ section in the TensorRT-LLM EXAONE Example.
You can easily run EXAONE Deep models with vLLM.
- Install vLLM (`vllm>=0.6.0`). Please refer to the vLLM quickstart guide for more details.
pip install vllm
- Run the models with vLLM.
vllm serve LGAI-EXAONE/EXAONE-Deep-7.8B
- Send a request with the following curl command after the server starts.
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "LGAI-EXAONE/EXAONE-Deep-7.8B",
"messages": [
{"role": "user", "content": "How many golf balls can fit in a school bus?"}
],
"max_tokens": 30720,
"temperature": 0.6,
"top_p": 0.95
}'
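Because vLLM serves an OpenAI-compatible API, you can also send the same request from Python. A minimal sketch using the `openai` client package (an extra dependency, not required by vLLM itself):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the api_key value is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="LGAI-EXAONE/EXAONE-Deep-7.8B",
    messages=[{"role": "user", "content": "How many golf balls can fit in a school bus?"}],
    max_tokens=30720,
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)
```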
Note
If you want to serve GGUF quantized models with vLLM, please refer to the vLLM GGUF documentation.
You can also run EXAONE Deep models with SGLang.
- Install SGLang. Please refer to the SGLang documentation for more details.
- Run the server with the following command.
python -m sglang.launch_server --model-path LGAI-EXAONE/EXAONE-Deep-7.8B \
--port 30000 --host 0.0.0.0
Note
When using the EXAONE Deep 2.4B model, you need to install `sglang>=0.3.6` and use the `--attention-backend triton` option.
Additionally, we are currently working on a PR to fix the incompatibility between flashinfer and EXAONE Deep 2.4B. Please refer to the PR for details.
- Send a request with the following curl command after the server starts.
curl -s http://0.0.0.0:30000/v1/chat/completions \
-d '{
"model": "LGAI-EXAONE/EXAONE-Deep-7.8B",
"messages": [
{"role": "user", "content": "How many golf balls can fit in a school bus?"}
],
"max_tokens": 30720,
"temperature": 0.6,
"top_p": 0.95,
}'
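SGLang also exposes an OpenAI-compatible endpoint, so the same request can be sent from Python. The sketch below streams the response, which is convenient for long reasoning outputs; it assumes the `openai` client package is installed:

```python
from openai import OpenAI

# SGLang serves an OpenAI-compatible API on the port chosen above (30000 here).
client = OpenAI(base_url="http://0.0.0.0:30000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="LGAI-EXAONE/EXAONE-Deep-7.8B",
    messages=[{"role": "user", "content": "How many golf balls can fit in a school bus?"}],
    max_tokens=30720,
    temperature=0.6,
    top_p=0.95,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```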
To achieve the expected performance, we recommend using the following configurations:
- Ensure the model starts with `<thought>\n` for its reasoning steps. The model's output quality may be degraded when you omit it. You can easily apply this by using `tokenizer.apply_chat_template()` with `add_generation_prompt=True`. Please check the example code in the Quickstart section.
- The reasoning steps of EXAONE Deep models, enclosed by `<thought>\n...\n</thought>`, usually contain many tokens, so previous reasoning steps may need to be removed in multi-turn conversations. The provided tokenizer handles this automatically; see the sketch after this list for an illustration.
- Avoid using a system prompt; build the instruction into the user prompt.
- Additional instructions help the models reason more deeply, so that they generate better output.
- For math problems, the instruction "Please reason step by step, and put your final answer within \boxed{}." is helpful.
- For more information on our evaluation settings, including prompts, please refer to our Documentation.
- In our evaluation, we use `temperature=0.6` and `top_p=0.95` for generation.
- When evaluating the models, it is recommended to test multiple times to assess the expected performance accurately.
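As a concrete illustration of the multi-turn point above, the sketch below shows how a previous assistant turn might be stripped of its `<thought>...</thought>` block before being appended to the message history. The chat template applied by `tokenizer.apply_chat_template()` performs an equivalent removal automatically, so this is only for clarity.

```python
# Illustrative sketch: strip reasoning from a previous assistant turn in a multi-turn chat.
# The bundled chat template already removes <thought>...</thought> from prior turns,
# so this only shows what that removal looks like.
def strip_thought(assistant_text: str) -> str:
    if "</thought>" in assistant_text:
        return assistant_text.split("</thought>", maxsplit=1)[-1].lstrip("\n")
    return assistant_text

first_reply = "<thought>\n...long reasoning...\n</thought>\n\nThe final answer is 33."
messages = [
    {"role": "user", "content": "Solve the problem above."},
    {"role": "assistant", "content": strip_thought(first_reply)},
    {"role": "user", "content": "Can you explain the key step?"},
]
```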
The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The model generates responses based on the output probabilities of tokens, which are determined during training from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research.
- Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information.
- Biased responses may be generated, which are associated with age, gender, race, and so on.
- The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences.
- Since the model does not reflect the latest information, the responses may be false or contradictory.
LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., entering illegal information) that may induce the creation of inappropriate outputs violating LG AI's ethical principles when using EXAONE language models.
The model is licensed under the EXAONE AI Model License Agreement 1.1 - NC.
@article{exaone-deep,
title={EXAONE Deep: Reasoning Enhanced Language Models},
author={{LG AI Research}},
journal={arXiv preprint arXiv:2503.12524},
year={2025}
}
LG AI Research Technical Support: [email protected]