
feat(model): add support for multiple chat models #4228


Open: lkk214 wants to merge 3 commits into main

Conversation


@lkk214 commented May 9, 2025

Description

  • Add an optional title field (defaults to model_name if empty) to HttpModelConfig to customize model name display and avoid conflicts between identical names from different providers

  • Maintain backward compatibility with the single chat model configuration, while adding support for multiple models, each with independent settings (e.g., rate_limit, prompt_template)

Configuration Examples

Single Model (Legacy Format)

[model.chat.http]
title = "single_chat"  # <-- New optional field
kind = "openai/chat"
api_endpoint = "http://192.168.1.234:11344/v1"
model_name = "starcoder:1b"
supported_models = ["starcoder:1b", "qwen2.5-coder:1.5b"]

Multiple Models (New Format)

[[model.chat.http]]  # <-- Array syntax
kind = "openai/chat"
api_endpoint = "http://192.168.1.234:11344/v1"
model_name = "deepseek-chat"

[[model.chat.http]]
title = "dp-chat"  # <-- Unique display name
kind = "openai/chat"
model_name = "deepseek-chat"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "sk-xxxxxxxx"
supported_models = ["deepseek-chat", "deepseek-reasoner"]
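
Per-Model Settings (Sketch)

The description above also mentions independent per-model settings such as rate_limit. The following is only a hedged sketch of how that could look with the array syntax, assuming the existing rate_limit table with request_per_minute is accepted under each array entry (not verified against this PR's diff; the title value is a hypothetical display name):

[[model.chat.http]]
title = "local-chat"  # hypothetical display name
kind = "openai/chat"
api_endpoint = "http://192.168.1.234:11344/v1"
model_name = "starcoder:1b"

[model.chat.http.rate_limit]  # attaches to the entry above only
request_per_minute = 600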

lkk214 added 3 commits May 10, 2025 02:03
- Introduce `ModelConfigVariant` to handle single or multiple model configurations
- Update validation logic to check for duplicate models in configurations
- Update chat model handling to support multiple HTTP models
- Add `MultiChatStream` to manage multiple chat streams and handle model selection
- Modify health check and model loading to work with new configuration structure
Temporarily use the first model config in the list for chat model health conversion.
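
The commits above introduce ModelConfigVariant to accept either a single model table or an array of tables. As a minimal sketch of that idea (not the PR's actual definition; names and shapes are assumed), a serde untagged enum can parse both forms into one field:

use serde::Deserialize;

// Sketch only: accepts either a single HTTP chat model table or an array of them.
// The PR's real ModelConfigVariant may differ in naming and shape.
#[derive(Clone, Deserialize)]
#[serde(untagged)]
pub enum ModelConfigVariant {
    Single(HttpModelConfig),
    Multiple(Vec<HttpModelConfig>),
}

impl ModelConfigVariant {
    // Normalizes both variants into a list for downstream consumers,
    // similar in spirit to the get_model_configs() accessor mentioned later in this review.
    pub fn get_model_configs(&self) -> Vec<HttpModelConfig> {
        match self {
            ModelConfigVariant::Single(config) => vec![config.clone()],
            ModelConfigVariant::Multiple(configs) => configs.clone(),
        }
    }
}
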
@zwpaper (Member) commented May 16, 2025

Hi @lkk214, thanks for the contribution. This is a really big change; may I ask what your scenario is for adding multiple HTTP models?

We depend on the models in many places, so it's not safe to do such a big refactor in a short time.

@lkk214 (Author) commented May 16, 2025

Hi @zwpaper, thanks for your review! Here’s the rationale for multi-model support:

  • Task-specific optimization: Different models excel at different tasks (e.g., code analysis, doc generation, vulnerability detection). Multi-model configs let users switch models per use case.

  • Cost/performance trade-offs: Users may combine large (high-quality) and small (cost-efficient) models, dynamically choosing based on task complexity.

  • Real-time comparison (A/B testing): Developers can compare model outputs for specific scenarios, which is especially useful when evaluating large models, and pick the result they want from the different outputs.

  • Extended flexibility: While we currently support multi-model switching under a single API endpoint, this PR extends support to cross-provider models (e.g., local + cloud models). Backward compatibility is maintained for single-model users.

Moreover, we already support multiple HTTP model providers, such as DeepSeek, OpenRouter, etc. Supporting multiple model configurations would be a natural improvement; at the very least, it would let us use them at the same time.

@wsxiaoys requested a review from zwpaper May 19, 2025 23:07
@zwpaper (Member) commented Jun 2, 2025

Thank you for the comprehensive reply. However, as models become increasingly powerful and versatile, it seems less essential to employ distinct models for various applications.

Tabby relies heavily on how the models are used; we may need to discuss further before implementing such a significant change.

May I inquire if this is utilized in your business scenario?

@lkk214 (Author) commented Jun 4, 2025

@zwpaper Thanks for your reply. I use multiple models in my work. Although a single model can solve most problems, I prefer to find satisfactory answers from multiple models.

This is more like an enhancement than a core change. It won’t affect previous configurations—it simply provides more possibilities for model configuration.

I understand your concerns. Maintaining code stability is far more important than this change, which might impact core functionality; stability is absolutely critical.

This PR may not be a high priority for Tabby. If the team decides not to merge it, I can close it or leave it as a reference for the future; it can be closed at any time.

@getpantoai

PR Summary:

This PR introduces support for managing multiple chat models by adding a MultiChatStream that aggregates multiple HTTP chat streams, and it updates the configuration model to accept multiple HTTP model configs for chat. Key changes include:

• New functions to create a composite chat stream (create_multiple, add_engine).
• Updated model configuration validation and parsing to support both single and multiple chat model definitions.
• Propagation of these changes across API bindings, service loaders, and related tests.
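
As a rough sketch of the aggregator shape the summary describes (field names and the title/model_name fallback are assumptions, not taken from the diff), the composite stream could key one engine per configured model:

use std::collections::HashMap;
use std::sync::Arc;

// Sketch of an aggregator over several chat engines; the PR's actual
// MultiChatStream may differ. Engines are keyed by the display title,
// falling back to model_name (both assumed to be Option<String>).
pub struct MultiChatStream {
    engines: HashMap<String, Arc<dyn ChatCompletionStream>>,
}

impl MultiChatStream {
    // Hypothetical constructor: builds one engine per HTTP model config.
    pub fn create_multiple(models: &[HttpModelConfig]) -> Self {
        let mut stream = Self { engines: HashMap::new() };
        for model in models {
            stream.add_engine(model);
        }
        stream
    }

    // Hypothetical: registers a single engine under its display name.
    pub fn add_engine(&mut self, model: &HttpModelConfig) {
        let key = model
            .title
            .clone()
            .or_else(|| model.model_name.clone())
            .unwrap_or_default();
        self.engines.insert(key, Arc::from(create_engine(model)));
    }

    // Resolves the engine for the model a client requested, if configured.
    pub fn engine_for(&self, requested: &str) -> Option<Arc<dyn ChatCompletionStream>> {
        self.engines.get(requested).cloned()
    }
}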

@@ -222,7 +222,12 @@ async fn load_model(config: &Config) {
download_model_if_needed(&model.model_id, ModelKind::Completion).await;
}

if let Some(ModelConfig::Local(ref model)) = config.model.chat {
if let Some(model) = config


Consider logging a warning or error when a chat model is not a local configuration in load_model, if that case is unexpected per business requirements.
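
A hedged sketch of that guard against the pre-change shape shown in the diff (ModelKind::Chat and the tracing-based warn! are assumptions; the actual branch in this PR operates on the new multi-config structure):

if let Some(ModelConfig::Local(ref model)) = config.model.chat {
    download_model_if_needed(&model.model_id, ModelKind::Chat).await;
} else if config.model.chat.is_some() {
    // A non-local (HTTP) chat config needs no local download; warn so the
    // skipped branch is visible instead of silently falling through.
    tracing::warn!("chat model is not a local configuration; skipping local model download");
}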

@getpantoai

Reviewed up to commit cdea1a5fb3afab9155bf03281e03e71dd6510036

Additional Suggestions

• crates/tabby/src/routes/models.rs, lines 30-34: Verify that the aggregation of chat model titles (combining the primary title and supported models) does not introduce duplicates or ambiguities in display names.
• crates/tabby/src/serve.rs, lines 398-404: Validate that using unwrap_or for chat_device is adequate; consider explicitly checking for misconfiguration to avoid silently falling back to a default device.
• crates/tabby/src/services/health.rs, lines 90-96: Ensure get_model_configs() returns a non-empty array before accessing index 0 to avoid a potential panic.
let chat = model_config
    .chat
    .as_ref()
    .and_then(|m| m.get_model_configs().get(0).cloned())
    .as_ref()
    .map(ModelHealth::from);
• crates/http-api-bindings/src/chat/mod.rs, line 80: Consider returning a Result or using graceful error handling for unsupported model kinds instead of panicking, so that the error can be managed upstream.
fn create_engine(model: &HttpModelConfig) -> Box<dyn ChatCompletionStream> {
    let api_endpoint = model
        .api_endpoint
        .as_deref()
        .expect("api_endpoint is required");

    match model.kind.as_str() {
        "azure/chat" => {
            let config = async_openai_alt::config::AzureConfig::new()
                .with_api_base(api_endpoint)
                .with_api_key(model.api_key.clone().unwrap_or_default())
                .with_api_version(AZURE_API_VERSION.to_string())
                .with_deployment_id(model.model_name.as_deref().expect("Model name is required"));
            Box::new(
                async_openai_alt::Client::with_config(config)
                    .with_http_client(create_reqwest_client(api_endpoint)),
            )
        }
        "openai/chat" | "mistral/chat" => {
            let config = OpenAIConfig::default()
                .with_api_base(api_endpoint)
                .with_api_key(model.api_key.clone().unwrap_or_default());

            let mut builder = ExtendedOpenAIConfig::builder();
            builder
                .base(config)
                .kind(model.kind.clone())
                .supported_models(model.supported_models.clone())
                .model_name(model.model_name.as_deref().expect("Model name is required"));

            Box::new(
                async_openai_alt::Client::with_config(
                    builder.build().expect("Failed to build config"),
                )
                .with_http_client(create_reqwest_client(api_endpoint)),
            )
        }
        _ => {
            // Instead of panicking, return a dummy implementation or handle gracefully.
            // For illustration, returning a Box::new(UnsupportedModelKind) or similar.
            // You may define an UnsupportedModelKind struct implementing ChatCompletionStream.
            Box::new(crate::unsupported::UnsupportedModelKind::new(model.kind.clone()))
        }
    }
}

Or refactor to return a Result and propagate error handling upstream.
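
A minimal sketch of that Result-based shape, assuming anyhow (already used across the workspace) and delegating the per-kind construction to a hypothetical helper that wraps the existing match arms:

use anyhow::{bail, Context, Result};

fn create_engine(model: &HttpModelConfig) -> Result<Box<dyn ChatCompletionStream>> {
    let api_endpoint = model
        .api_endpoint
        .as_deref()
        .context("api_endpoint is required")?;

    match model.kind.as_str() {
        "azure/chat" | "openai/chat" | "mistral/chat" => {
            // Hypothetical helper: builds the client exactly as in the current
            // implementation, with the remaining expect() calls converted to
            // `?` + context() so misconfiguration surfaces as an error.
            build_known_engine(model, api_endpoint)
        }
        other => bail!("Unsupported chat model kind: {other}"),
    }
}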

Reviewed by Panto AI

