port `t5` and `clip` to `nnx.Module`

Currently, we load the `t5` and `clip` models as torch modules and wrap their outputs as JAX arrays. This prevents OOM during inference, since we can move the models between devices with the `.to()` method. However, a pure JAX inference pipeline would be ideal.

## References

1. https://github.com/huggingface/transformers/issues/24711
2. https://github.com/huggingface/transformers/pull/15295