
[Bug] Colab gpt-oss-(20B)-Fine-tuning.ipynb: generation fails on T4 #3144

@HarrisDePerceptron

Description

  1. Did you update? Yes (running on Colab).
  2. Colab
  3. Number of GPUs used: 1 (T4)
  4. Which notebook? Please link! gpt-oss-(20B)-Fine-tuning.ipynb
  5. Which Unsloth version, TRL version, transformers version, PyTorch version? (Versions can be re-checked with the snippet below.)
     Unsloth version: 2025.8.4
     Transformers version: 4.56.0.dev0
     Torch version: 2.8.0+cu128
  6. Which trainer? SFTTrainer, GRPOTrainer etc.: SFTTrainer
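
For reference, the version report above can be regenerated with a short snippet like the one below (my addition; it assumes the packages are installed under their usual PyPI names):

import importlib.metadata as md
import torch, transformers

# Reprint the environment versions reported above.
print("Unsloth:", md.version("unsloth"))
print("Transformers:", transformers.__version__)
print("Torch:", torch.__version__)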
from transformers import TextStreamer

messages = [
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low", # **NEW!** Set reasoning effort to low, medium or high
).to(model.device)

_ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))
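
As a sanity check (my addition, using the same tokenizer and inputs objects as above), the templated prompt can be decoded back to text before calling generate; its output should match the Harmony-format prompt shown at the start of the cell output below:

# Decode the templated prompt to confirm the system header and the
# "Reasoning: low" line were rendered by the chat template.
print(tokenizer.decode(inputs["input_ids"][0]))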

Cell/Error Output

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: low

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>Equation: x^5 + 3x^4 - 10 = 3. So x^5 + 3x^4 - 13 =0. Solve for real roots? maybe numeric. Let's try approximate.

We can test integer roots: try x=1 => 1+3-13=-9. x=2 =>32+48-13=67. So root between 1 and 2. Try x=1.5 => (1.5)^5=7.59375 +3*(1.5^4=5.0625)=15.1875 total 22.78125-13=9.78125>0. So root between 1 and 1.5. Try x=1.2: 1.2^5=2.48832 +3*1.2^4=3*2.0736=6.2208 total=8.70912-13=-4.29088 negative. x=1.3: 1.3^5=3.71293 +3*1.3^4=3*2.856=8.568 total=12.28093-13=-0.71907. x=1.35: 1.35^5=1.35^4*1.35. 1.35^2=1.8225, ^4=3.325, times 1.35=4.4918. plus 3*1.35^4=3*3.325=9.975 total=14.4668-13=1.4668. So root between 1.3 and 1.35. Interpolate. try 1.32: 1.32^5: (1.32^2=1.7424, ^4=3.0364, *1.32=4.0055) plus 

---------------------------------------------------------------------------

AcceleratorError                          Traceback (most recent call last)

[/usr/local/lib/python3.11/dist-packages/unsloth/models/vision.py](https://localhost:8080/#) in unsloth_base_fast_generate(self, *args, **kwargs)
    232         with torch.inference_mode(), autocaster:
--> 233             output = self._old_generate(*args, **kwargs)
    234     except:

9 frames

[/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
    119         with ctx_factory():
--> 120             return func(*args, **kwargs)
    121 

[/usr/local/lib/python3.11/dist-packages/transformers/generation/utils.py](https://localhost:8080/#) in generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, use_model_defaults, custom_generate, **kwargs)
   2475             # 12. run assisted generate
-> 2476             result = self._assisted_decoding(
   2477                 input_ids,

[/usr/local/lib/python3.11/dist-packages/transformers/generation/utils.py](https://localhost:8080/#) in _assisted_decoding(self, input_ids, candidate_generator, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   4875                 # Ensure we don't generate beyond max_len or an EOS token
-> 4876                 if is_done_candidate and n_matches == candidate_length:
   4877                     n_matches -= 1

AcceleratorError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

AcceleratorError                          Traceback (most recent call last)

[/tmp/ipython-input-1892116402.py](https://localhost:8080/#) in <cell line: 0>()
     12 ).to(model.device)
     13 
---> 14 _ = model.generate(**inputs, max_new_tokens = 512, streamer = TextStreamer(tokenizer))

[/usr/local/lib/python3.11/dist-packages/peft/peft_model.py](https://localhost:8080/#) in generate(self, *args, **kwargs)
    884         with self._enable_peft_forward_hooks(*args, **kwargs):
    885             kwargs = {k: v for k, v in kwargs.items() if k not in self.special_peft_forward_args}
--> 886             return self.get_base_model().generate(*args, **kwargs)
    887 
    888     def _get_base_model_class(self, is_prompt_tuning=False):

[/usr/local/lib/python3.11/dist-packages/unsloth/models/vision.py](https://localhost:8080/#) in unsloth_base_fast_generate(self, *args, **kwargs)
    236         kwargs.pop("prompt_lookup_num_tokens", None)
    237         with torch.inference_mode(), autocaster:
--> 238             output = self._old_generate(*args, **kwargs)
    239     finally:
    240         pass

[/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py](https://localhost:8080/#) in decorate_context(*args, **kwargs)
    118     def decorate_context(*args, **kwargs):
    119         with ctx_factory():
--> 120             return func(*args, **kwargs)
    121 
    122     return decorate_context

[/usr/local/lib/python3.11/dist-packages/transformers/generation/utils.py](https://localhost:8080/#) in generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, use_model_defaults, custom_generate, **kwargs)
   2311 
   2312         device = inputs_tensor.device
-> 2313         self._prepare_special_tokens(generation_config, kwargs_has_attention_mask, device=device)
   2314 
   2315         # decoder-only models must use left-padding for batched generation.

[/usr/local/lib/python3.11/dist-packages/transformers/generation/utils.py](https://localhost:8080/#) in _prepare_special_tokens(self, generation_config, kwargs_has_attention_mask, device)
   2053             return torch.tensor(token, device=device, dtype=torch.long)
   2054 
-> 2055         bos_token_tensor = _tensor_or_none(generation_config.bos_token_id, device=device)
   2056         eos_token_tensor = _tensor_or_none(generation_config.eos_token_id, device=device)
   2057         pad_token_tensor = _tensor_or_none(generation_config.pad_token_id, device=device)

[/usr/local/lib/python3.11/dist-packages/transformers/generation/utils.py](https://localhost:8080/#) in _tensor_or_none(token, device)
   2051             if isinstance(token, torch.Tensor):
   2052                 return token.to(device)
-> 2053             return torch.tensor(token, device=device, dtype=torch.long)
   2054 
   2055         bos_token_tensor = _tensor_or_none(generation_config.bos_token_id, device=device)

AcceleratorError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
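
As the error message notes, CUDA errors are reported asynchronously, so the frame where the assert surfaces (here `_prepare_special_tokens`) is probably not where it was actually raised. A minimal debugging sketch, assuming the same model and inputs objects as in the failing cell (these checks are my addition, not part of the notebook):

import os
# Must be set before CUDA is initialized, i.e. at the very top of the notebook,
# so the stack trace points at the kernel that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Device-side asserts during generate() are often out-of-range index lookups,
# so verify that no templated token id exceeds the model's embedding table.
max_id = int(inputs["input_ids"].max())
n_rows = model.get_input_embeddings().weight.shape[0]
print(f"max input token id: {max_id}, embedding rows: {n_rows}")
assert max_id < n_rows, "token id outside the embedding table"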

