[Feature Request] Llama 4

## Checklist
* [ ] Support [temperature tuning](https://github.com/meta-llama/llama-models/blob/2b2e5b2645c962f92dc004aa868696ec0e53b05c/models/llama4/model.py#L233-L239)
* [ ] Support the [chunked attention mask](https://github.com/meta-llama/llama-models/blob/2b2e5b2645c962f92dc004aa868696ec0e53b05c/models/llama4/model.py#L437-L446). We don't really need new kernels: just split the chunks into separate instances in the batch, since they are fully independent of each other.
  * [ ] We need an example of this.
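
As a starting point for that example, here is a minimal pure-Python sketch of why no new kernel should be needed. It assumes the linked mask lets query `i` attend to key `j` only when both fall in the same chunk and `j <= i`. All function names below are hypothetical, and positional encoding is left out for brevity. The sketch compares masked attention over one long sequence with plain causal attention run on each chunk as its own batch instance.

```python
import math
import random


def attention(q, k, v, allowed):
    """Single-head softmax attention; allowed(i, j) says whether query i may see key j."""
    dim = len(q[0])
    out = []
    for i, qi in enumerate(q):
        visible = [j for j in range(len(k)) if allowed(i, j)]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(dim) for j in visible]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][d] for w, j in zip(weights, visible)) / total
                    for d in range(dim)])
    return out


def chunked_masked_attention(q, k, v, chunk_size):
    """One long sequence with the assumed mask: same chunk AND causal."""
    return attention(q, k, v,
                     lambda i, j: i // chunk_size == j // chunk_size and j <= i)


def chunks_as_batch_attention(q, k, v, chunk_size):
    """Each chunk becomes its own instance with an ordinary causal mask."""
    out = []
    for start in range(0, len(q), chunk_size):
        end = start + chunk_size  # slicing clamps the last, possibly shorter, chunk
        out.extend(attention(q[start:end], k[start:end], v[start:end],
                             lambda i, j: j <= i))
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, dim, chunk_size = 10, 4, 4  # 10 tokens -> chunks of 4, 4 and 2
q, k, v = (random_matrix(seq_len, dim, rng) for _ in range(3))

masked = chunked_masked_attention(q, k, v, chunk_size)
batched = chunks_as_batch_attention(q, k, v, chunk_size)

# Largest element-wise difference between the two formulations.
max_diff = max(abs(a - b) for row_a, row_b in zip(masked, batched)
               for a, b in zip(row_a, row_b))
assert max_diff < 1e-9
```

In a real integration the per-chunk instances would presumably be passed to an existing batched causal attention kernel as variable-length requests, with position handling kept consistent with the original sequence; that part is not shown here.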

[Feature Request] Llama 4 #1004
