
feat(model): add support for multiple chat models #4228


Open: lkk214 wants to merge 3 commits into main

Conversation


@lkk214 commented May 9, 2025

Description

  • Add an optional title field (defaults to model_name if empty) to HttpModelConfig to customize model name display and avoid conflicts between identical names from different providers

  • Maintain backward compatibility with the single chat model configuration, while adding support for multiple models, each with independent settings (e.g., rate_limit, prompt_template)

Configuration Examples

Single Model (Legacy Format)

[model.chat.http]
title = "single_chat"  # <-- New optional field
kind = "openai/chat"
api_endpoint = "http://192.168.1.234:11344/v1"
model_name = "starcoder:1b"
supported_models = ["starcoder:1b", "qwen2.5-coder:1.5b"]

Multiple Models (New Format)

[[model.chat.http]]  # <-- Array syntax
kind = "openai/chat"
api_endpoint = "http://192.168.1.234:11344/v1"
model_name = "deepseek-chat"

[[model.chat.http]]
title = "dp-chat"  # <-- Unique display name
kind = "openai/chat"
model_name = "deepseek-chat"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "sk-xxxxxxxx"
supported_models = ["deepseek-chat", "deepseek-reasoner"]
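
Per-Model Settings (Sketch)

The description above also mentions independent per-model settings such as rate_limit. The following is only a hedged sketch of how that could look with the array syntax, assuming the existing rate_limit table with request_per_minute is accepted under each array entry (not verified against this PR's diff; the title value is a hypothetical display name):

[[model.chat.http]]
title = "local-chat"  # hypothetical display name
kind = "openai/chat"
api_endpoint = "http://192.168.1.234:11344/v1"
model_name = "starcoder:1b"

[model.chat.http.rate_limit]  # attaches to the entry above only
request_per_minute = 600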

lkk214 added 3 commits May 10, 2025 02:03
- Introduce `ModelConfigVariant` to handle single or multiple model configurations
- Update validation logic to check for duplicate models in configurations
- Update chat model handling to support multiple HTTP models
- Add `MultiChatStream` to manage multiple chat streams and handle model selection
- Modify health check and model loading to work with new configuration structure
Temporarily use the first model config in the list for chat model health conversion.
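
The commits above introduce ModelConfigVariant to accept either a single model table or an array of tables. As a minimal sketch of that idea (not the PR's actual definition; names and shapes are assumed), a serde untagged enum can parse both forms into one field:

use serde::Deserialize;

// Sketch only: accepts either a single HTTP chat model table or an array of them.
// The PR's real ModelConfigVariant may differ in naming and shape.
#[derive(Clone, Deserialize)]
#[serde(untagged)]
pub enum ModelConfigVariant {
    Single(HttpModelConfig),
    Multiple(Vec<HttpModelConfig>),
}

impl ModelConfigVariant {
    // Normalizes both variants into a list for downstream consumers,
    // similar in spirit to the get_model_configs() accessor mentioned later in this review.
    pub fn get_model_configs(&self) -> Vec<HttpModelConfig> {
        match self {
            ModelConfigVariant::Single(config) => vec![config.clone()],
            ModelConfigVariant::Multiple(configs) => configs.clone(),
        }
    }
}
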
@zwpaper (Member) commented May 16, 2025

Hi @lkk214, thanks for the contribution. This is a really big change; may I ask what your scenario is for adding multiple HTTP models?

We depend on the models in many places, so it's not safe to do such a big refactor in a short time.

@lkk214 (Author) commented May 16, 2025

Hi @zwpaper, thanks for your review! Here’s the rationale for multi-model support:

  • Task-specific optimization: Different models excel at different tasks (e.g., code analysis, doc generation, vulnerability detection). Multi-model configs let users switch models per use case.

  • Cost/performance trade-offs: Users may combine large (high-quality) and small (cost-efficient) models, dynamically choosing based on task complexity.

  • Real-time comparison (A/B testing): Developers can compare model outputs for specific scenarios, which is especially useful when evaluating large models, and pick the result they want from the different outputs.

  • Extended flexibility: While we currently support multi-model switching under a single API endpoint, this PR extends support to cross-provider models (e.g., local + cloud models). Backward compatibility is maintained for single-model users.

Moreover, we already support multiple HTTP model providers, such as DeepSeek, OpenRouter, etc. Supporting multiple model configurations would be a natural improvement; at the very least, it would let us use them at the same time.

@wsxiaoys requested a review from zwpaper May 19, 2025 23:07
@zwpaper (Member) commented Jun 2, 2025

Thank you for the comprehensive reply. However, as models become increasingly powerful and versatile, it seems less essential to employ distinct models for various applications.

Tabby relies heavily on how the models are used; we may need to discuss further before implementing such a significant change.

May I inquire if this is utilized in your business scenario?

@lkk214 (Author) commented Jun 4, 2025

@zwpaper Thanks for your reply. I use multiple models in my work. Although a single model can solve most problems, I prefer to find satisfactory answers from multiple models.

This is more like an enhancement than a core change. It won’t affect previous configurations—it simply provides more possibilities for model configuration.

I understand your concerns. Maintaining code stability is far more important than this change, which might impact core functionality; stability is absolutely critical.

This PR may not be a high priority for Tabby. If the team decides not to merge it, I can close it or leave it as a reference for the future; it can be closed at any time.

@getpantoai

PR Summary:

This PR introduces support for managing multiple chat models by adding a MultiChatStream that aggregates multiple HTTP chat streams, and it updates the configuration model to accept multiple HTTP model configs for chat. Key changes include:

• New functions to create a composite chat stream (create_multiple, add_engine).
• Updated model configuration validation and parsing to support both single and multiple chat model definitions.
• Propagation of these changes across API bindings, service loaders, and related tests.
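
As a rough sketch of the aggregator shape the summary describes (field names and the title/model_name fallback are assumptions, not taken from the diff), the composite stream could key one engine per configured model:

use std::collections::HashMap;
use std::sync::Arc;

// Sketch of an aggregator over several chat engines; the PR's actual
// MultiChatStream may differ. Engines are keyed by the display title,
// falling back to model_name (both assumed to be Option<String>).
pub struct MultiChatStream {
    engines: HashMap<String, Arc<dyn ChatCompletionStream>>,
}

impl MultiChatStream {
    // Hypothetical constructor: builds one engine per HTTP model config.
    pub fn create_multiple(models: &[HttpModelConfig]) -> Self {
        let mut stream = Self { engines: HashMap::new() };
        for model in models {
            stream.add_engine(model);
        }
        stream
    }

    // Hypothetical: registers a single engine under its display name.
    pub fn add_engine(&mut self, model: &HttpModelConfig) {
        let key = model
            .title
            .clone()
            .or_else(|| model.model_name.clone())
            .unwrap_or_default();
        self.engines.insert(key, Arc::from(create_engine(model)));
    }

    // Resolves the engine for the model a client requested, if configured.
    pub fn engine_for(&self, requested: &str) -> Option<Arc<dyn ChatCompletionStream>> {
        self.engines.get(requested).cloned()
    }
}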

@@ -222,7 +222,12 @@ async fn load_model(config: &Config) {
download_model_if_needed(&model.model_id, ModelKind::Completion).await;
}

if let Some(ModelConfig::Local(ref model)) = config.model.chat {
if let Some(model) = config


Consider logging a warning or error when a chat model is not a local configuration in load_model, if that case is unexpected per business requirements.
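
A hedged sketch of that guard against the pre-change shape shown in the diff (ModelKind::Chat and the tracing-based warn! are assumptions; the actual branch in this PR operates on the new multi-config structure):

if let Some(ModelConfig::Local(ref model)) = config.model.chat {
    download_model_if_needed(&model.model_id, ModelKind::Chat).await;
} else if config.model.chat.is_some() {
    // A non-local (HTTP) chat config needs no local download; warn so the
    // skipped branch is visible instead of silently falling through.
    tracing::warn!("chat model is not a local configuration; skipping local model download");
}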

@getpantoai

Reviewed up to commit cdea1a5fb3afab9155bf03281e03e71dd6510036

Additional Suggestions

• crates/tabby/src/routes/models.rs, lines 30-34: Verify that the aggregation of chat model titles (combining the primary title and supported models) does not introduce duplicates or ambiguities in display names.
• crates/tabby/src/serve.rs, lines 398-404: Validate that using unwrap_or for chat_device is adequate; consider explicitly checking for misconfiguration to avoid silently falling back to a default device.
• crates/tabby/src/services/health.rs, lines 90-96: Ensure get_model_configs() returns a non-empty array before accessing index 0 to avoid a potential panic.
let chat = model_config
    .chat
    .as_ref()
    .and_then(|m| m.get_model_configs().get(0).cloned())
    .as_ref()
    .map(ModelHealth::from);
• crates/http-api-bindings/src/chat/mod.rs, line 80: Consider returning a Result or using graceful error handling for unsupported model kinds instead of panicking, so that the error can be managed upstream.
fn create_engine(model: &HttpModelConfig) -> Box<dyn ChatCompletionStream> {
    let api_endpoint = model
        .api_endpoint
        .as_deref()
        .expect("api_endpoint is required");

    match model.kind.as_str() {
        "azure/chat" => {
            let config = async_openai_alt::config::AzureConfig::new()
                .with_api_base(api_endpoint)
                .with_api_key(model.api_key.clone().unwrap_or_default())
                .with_api_version(AZURE_API_VERSION.to_string())
                .with_deployment_id(model.model_name.as_deref().expect("Model name is required"));
            Box::new(
                async_openai_alt::Client::with_config(config)
                    .with_http_client(create_reqwest_client(api_endpoint)),
            )
        }
        "openai/chat" | "mistral/chat" => {
            let config = OpenAIConfig::default()
                .with_api_base(api_endpoint)
                .with_api_key(model.api_key.clone().unwrap_or_default());

            let mut builder = ExtendedOpenAIConfig::builder();
            builder
                .base(config)
                .kind(model.kind.clone())
                .supported_models(model.supported_models.clone())
                .model_name(model.model_name.as_deref().expect("Model name is required"));

            Box::new(
                async_openai_alt::Client::with_config(
                    builder.build().expect("Failed to build config"),
                )
                .with_http_client(create_reqwest_client(api_endpoint)),
            )
        }
        _ => {
            // Instead of panicking, return a dummy implementation or handle gracefully.
            // For illustration, returning a Box::new(UnsupportedModelKind) or similar.
            // You may define an UnsupportedModelKind struct implementing ChatCompletionStream.
            Box::new(crate::unsupported::UnsupportedModelKind::new(model.kind.clone()))
        }
    }
}

Or refactor to return a Result and propagate error handling upstream.
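
A minimal sketch of that Result-based shape, assuming anyhow (already used across the workspace) and delegating the per-kind construction to a hypothetical helper that wraps the existing match arms:

use anyhow::{bail, Context, Result};

fn create_engine(model: &HttpModelConfig) -> Result<Box<dyn ChatCompletionStream>> {
    let api_endpoint = model
        .api_endpoint
        .as_deref()
        .context("api_endpoint is required")?;

    match model.kind.as_str() {
        "azure/chat" | "openai/chat" | "mistral/chat" => {
            // Hypothetical helper: builds the client exactly as in the current
            // implementation, with the remaining expect() calls converted to
            // `?` + context() so misconfiguration surfaces as an error.
            build_known_engine(model, api_endpoint)
        }
        other => bail!("Unsupported chat model kind: {other}"),
    }
}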

Reviewed by Panto AI

