.NET: [Breaking Change] Auto-wire ChatClient with OpenTelemetryChatClient in OpenTelemetryAgent #5750
Merged
Copilot (AI) changed the title from "[WIP] Fix MAF telemetry flow by default" to ".NET: Auto-wire ChatClient with OpenTelemetryChatClient in OpenTelemetryAgent" on May 11, 2026
This was referenced May 29, 2026 (referencing item: Closed)
wdhm added a commit to wdhm/hosted-triage-agent that referenced this pull request on Jun 4, 2026
…plain DAC, drop dup OTel source) (#86)

Two cleanups, no behaviour additions:

1. Credential: back to `new DefaultAzureCredential()`.

   Every official sample in microsoft-foundry/foundry-samples/samples/csharp/hosted-agents/agent-framework/* (hello-world, simple-agent, local-tools, mcp-tools, etc.) does exactly this and only this:

       AIAgent agent = new AIProjectClient(projectEndpoint, new DefaultAzureCredential())
           .AsAIAgent(model: deployment, instructions: "...", ...);

   The Foundry runtime injects AZURE_CLIENT_ID + AZURE_TENANT_ID pointing at the per-agent Entra identity that `azd` auto-creates AND auto-grants the `Foundry User` role at project scope. Verified live:

       $ az role assignment list --assignee 22839df0-... --all
       Foundry User  /...accounts/ai-account-ivok5ban2cqiq/projects/ai-project-triage-agent-dev

   DAC picks up that identity via the env vars; no manual wiring needed.

   Removed:
   - The `[ModuleInitializer]` / top-level AZURE_TOKEN_CREDENTIALS=WorkloadIdentityCredential lock from PR #81; we cargo-culted it from an AKS WIF pattern that Foundry does NOT use. App Insights confirms zero WIF activity in the process; the live identity is a classic user-assigned MI reachable via IMDS with our AZURE_CLIENT_ID.
   - The `new WorkloadIdentityCredential()` branch gated on AZURE_FEDERATED_TOKEN_FILE; that env var is never set on Foundry containers (confirmed by `azd env get-values` and by IMDS traces).

2. OpenTelemetry: drop `.AddSource("Experimental.Microsoft.Extensions.AI")`.

   Foundry's internal `Agent365Exporter` already subscribes to that source (Microsoft.Agents.AI.Foundry.Hosting ≥ 1.6.1, PR microsoft/agent-framework#5750). Registering it ourselves caused the formatter to walk the same span attribute set twice, hitting:

       System.ArgumentException: An item with the same key has already been added. Key: openai.api.type
          at Microsoft.Agents.A365.Observability.Runtime.Common.ExportFormatter.MapAttributes
          at Microsoft.Agents.A365.Observability.Runtime.Tracing.Exporters.Agent365Exporter.Export

   Once per export batch, in App Insights, on every invocation since v39. Async path so it didn't break requests, but it polluted the exception stream and risked future tighter coupling between exporter health and request handling.

   Kept the three sources we own end-to-end:
   - Experimental.Microsoft.Agents.AI (invoke_agent spans from our .UseOpenTelemetry() wrap on each agent)
   - Azure.AI.AgentServer.Invocations (invocations endpoint baggage)
   - TriageAgent.Tools (our own GitHub REST tool spans)

What this fix does NOT change:
- The 3-minute-first-attempt slowness pattern in CI. That is NOT auth-related (DAC matches the samples, RBAC is correct) and will be investigated next. Candidates: container cold-start budget, Invocations protocol path vs the simpler Responses path the samples use, or workflow-side retry cadence.

Co-authored-by: wdhm <wdhm@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
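The duplicate-key failure described in point 2 can be reproduced in isolation. The sketch below is hypothetical: `DuplicateAttributeSketch` and `MapAttributes` are illustrative names, not the Agent365 `ExportFormatter` code, and the attribute value used with it is made up. It only shows that `Dictionary.Add` throws `ArgumentException` when the same attribute set is walked twice, while `TryAdd` tolerates the repeat.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in, NOT the real Agent365 ExportFormatter: it copies span
// attributes into a dictionary the way a naive mapper might, to show why walking
// the same attribute set twice fails.
public static class DuplicateAttributeSketch
{
    public static Dictionary<string, object> MapAttributes(
        IEnumerable<KeyValuePair<string, object>> attributes,
        bool tolerateDuplicates)
    {
        var mapped = new Dictionary<string, object>(StringComparer.Ordinal);
        foreach (KeyValuePair<string, object> pair in attributes)
        {
            if (tolerateDuplicates)
            {
                // TryAdd keeps the first value seen and ignores later repeats.
                mapped.TryAdd(pair.Key, pair.Value);
            }
            else
            {
                // Add throws ArgumentException ("An item with the same key has
                // already been added") if the key is already present.
                mapped.Add(pair.Key, pair.Value);
            }
        }

        return mapped;
    }
}
```

Per the commit message, the actual fix was to remove the duplicate `.AddSource(...)` registration, not to make the mapping tolerant; the `tolerateDuplicates` branch is only there to contrast the two `Dictionary` methods.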
This was referenced Jun 5, 2026 (referencing item: Closed)
This was referenced Jun 12, 2026
This was referenced Jun 24, 2026
Motivation and Context

`OpenTelemetryAgent` instrumented only the agent-level `invoke_agent` span; the underlying `IChatClient` was untouched, so model-level chat spans, usage metrics, and Azure Monitor traces never flowed unless callers manually wrapped every chat client. The end result: telemetry was silently absent in Foundry/hosted agent samples even with an exporter configured.

Description
- `OpenTelemetryAgent` now auto-wraps the inner `ChatClientAgent`'s `IChatClient` with `OpenTelemetryChatClient` on each invocation, so chat-level telemetry flows alongside `invoke_agent`.
- `OpenTelemetryAgent` ctors: the original `OpenTelemetryAgent(AIAgent innerAgent, string? sourceName = null)` constructor signature is preserved (binary-compatible) and delegates with auto-wiring enabled. A new overload `OpenTelemetryAgent(AIAgent innerAgent, string? sourceName, bool autoWireChatClient)` adds the opt-out. The `sourceName` is normalized once in the constructor (a `null` or empty string is treated as `OpenTelemetryConsts.DefaultSourceName`) and reused by both the outer `OpenTelemetryChatClient` and the auto-wired inner `OpenTelemetryChatClient`, so agent-level and chat-level spans always land on the same `ActivitySource`. The opt-out flag lives only on the constructor (not on the `AIAgentBuilder` extension), keeping chat-client terminology out of the abstract builder surface.
- `ForwardingChatClient.GetResponseAsync` / `GetStreamingResponseAsync` route the `AgentRunOptions` through a new `GetRunOptionsWithChatClientWiring` helper that:
  - Resolves the `ChatClientAgent` via `InnerAgent.GetService<ChatClientAgent>()` (no type assumption on `InnerAgent`; wrapping agents that surface a nested `ChatClientAgent` through `GetService` are supported). No-ops when the result is `null`.
  - No-ops when `chatClientAgent.GetService<ChatClientAgentOptions>()?.UseProvidedChatClientAsIs is true` (respects the user's explicit opt-out on the chat client pipeline; `ChatClientAgent.GetService` already exposes `ChatClientAgentOptions`).
  - No-ops when `chatClientAgent.GetService<IChatClient>()?.GetService<OpenTelemetryChatClient>()` is already non-null (avoids double-wrap).
  - Otherwise clones the caller's `ChatClientAgentRunOptions` (or, when the caller passes a plain `AgentRunOptions`, creates one and copies the base properties: `ContinuationToken`, `AllowBackgroundResponses`, `AdditionalProperties`, `ResponseFormat`) and sets `ChatClientFactory` to chain onto any user-supplied factory rather than replacing it.
  - The factory step also inspects the post-user-factory result via `GetService(typeof(OpenTelemetryChatClient))` and skips wrapping when the chat client is already instrumented, so a user factory that itself calls `UseOpenTelemetry(...)` does not produce duplicate chat spans.
- `UseOpenTelemetry` extension: unchanged public surface; the existing extension simply constructs an `OpenTelemetryAgent` (with auto-wiring on by default), so telemetry now flows automatically for existing call sites.
- Tests covering `GetRunOptionsWithChatClientWiring` and `WrapIfNeeded`: default-on (sync + streaming), explicit opt-out via the constructor (sync + streaming), `UseProvidedChatClientAsIs` opt-out, non-`ChatClientAgent` no-op, no double-wrap when the underlying chat client is pre-instrumented, chaining of a user-supplied `ChatClientFactory`, no double-wrap when the user factory itself returns an OpenTelemetry-instrumented client, plain `AgentRunOptions` propagation of `AllowBackgroundResponses` / `AdditionalProperties` / `ResponseFormat` and (separately) `ContinuationToken`, the `ChatClientAgentRunOptions` clone path with no user factory (asserts `ChatOptions` are preserved and the caller's instance is not mutated), and `null` / empty `sourceName` normalization (Theory: both produce spans on `OpenTelemetryConsts.DefaultSourceName`).

Behavioral breaking change
This PR introduces a behavioral (not API) change. Existing call sites of `new OpenTelemetryAgent(innerAgent)` and `AIAgentBuilder.UseOpenTelemetry(...)` will now begin emitting an additional chat span per invocation when the inner agent is (or surfaces) a `ChatClientAgent` whose `IChatClient` is not already wrapped with `OpenTelemetryChatClient`.

Impact is limited to that subset of users.
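Which call sites fall into that subset is decided by the checks `GetRunOptionsWithChatClientWiring` performs, as listed in the Description. The following is a minimal sketch of that decision chain, not the actual implementation: every type in it (`ISketchChatClient`, `TelemetrySketchClient`, `SketchChatClientAgent`, `SketchRunOptions`, `ChatClientWiringSketch`) is a hypothetical stand-in, the `agent` parameter plays the role of `InnerAgent.GetService<ChatClientAgent>()`, and walking the `Inner` chain only approximates a `GetService(typeof(OpenTelemetryChatClient))` lookup.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins, NOT the Microsoft.Extensions.AI / Microsoft.Agents.AI types.
// A chat client is modeled as a chain of layers linked through Inner.
public interface ISketchChatClient
{
    ISketchChatClient? Inner { get; }
}

// The provider's bottom-most client (no inner layer).
public sealed class BaseChatClient : ISketchChatClient
{
    public ISketchChatClient? Inner => null;
}

// A user-added layer that is not telemetry (for example, logging).
public sealed class DelegatingSketchClient : ISketchChatClient
{
    public DelegatingSketchClient(ISketchChatClient inner) => Inner = inner;
    public ISketchChatClient? Inner { get; }
}

// Plays the role of OpenTelemetryChatClient in this sketch.
public sealed class TelemetrySketchClient : ISketchChatClient
{
    public TelemetrySketchClient(ISketchChatClient inner, string sourceName)
    {
        Inner = inner;
        SourceName = sourceName;
    }

    public ISketchChatClient? Inner { get; }
    public string SourceName { get; }
}

public sealed record SketchAgentOptions(bool UseProvidedChatClientAsIs);

public sealed record SketchChatClientAgent(ISketchChatClient ChatClient, SketchAgentOptions Options);

public sealed record SketchRunOptions(
    Func<ISketchChatClient, ISketchChatClient>? ChatClientFactory,
    bool AllowBackgroundResponses = false);

public static class ChatClientWiringSketch
{
    // Walks the layer chain looking for a telemetry layer.
    public static bool IsInstrumented(ISketchChatClient? client)
    {
        for (ISketchChatClient? current = client; current is not null; current = current.Inner)
        {
            if (current is TelemetrySketchClient)
            {
                return true;
            }
        }

        return false;
    }

    // `agent` stands for the result of InnerAgent.GetService<ChatClientAgent>().
    public static SketchRunOptions? GetRunOptions(
        SketchChatClientAgent? agent, SketchRunOptions? callerOptions, string sourceName)
    {
        if (agent is null ||                           // no ChatClientAgent surfaced
            agent.Options.UseProvidedChatClientAsIs || // user opted out on the agent
            IsInstrumented(agent.ChatClient))          // already wrapped: avoid double-wrap
        {
            return callerOptions;
        }

        // Clone (never mutate) the caller's options; create fresh ones if none were passed.
        SketchRunOptions baseOptions = callerOptions ?? new SketchRunOptions(ChatClientFactory: null);
        Func<ISketchChatClient, ISketchChatClient>? userFactory = baseOptions.ChatClientFactory;

        return baseOptions with
        {
            // Chain onto the user's factory, then wrap only if its result is still uninstrumented.
            ChatClientFactory = client =>
            {
                ISketchChatClient afterUser = userFactory is null ? client : userFactory(client);
                return IsInstrumented(afterUser) ? afterUser : new TelemetrySketchClient(afterUser, sourceName);
            },
        };
    }
}
```

Every early return hands back the caller's options object unchanged, and the wiring path goes through a record `with` expression, so the caller's instance is never mutated; this mirrors the clone-rather-than-mutate behavior the PR describes.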
To keep the previous behavior, opt out in either of two ways:
- On the agent wrapper: `new OpenTelemetryAgent(innerAgent, sourceName: null, autoWireChatClient: false)`
- On the `ChatClientAgent` itself: `new ChatClientAgent(chatClient, new ChatClientAgentOptions { UseProvidedChatClientAsIs = true })`

Source compatibility, binary compatibility, and the `UseOpenTelemetry` extension surface are all preserved. Only the runtime telemetry shape changes, and only in the direction of emitting strictly more (and previously missing) signal.

Opt-out (constructor-only)

`new OpenTelemetryAgent(innerAgent, sourceName: null, autoWireChatClient: false)`
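The constructor-only opt-out rests on the overload pattern described in the Description: the original signature is kept and forwards with auto-wiring on, a new overload carries the flag, and `sourceName` is normalized once. A minimal sketch of that pattern, assuming a hypothetical `SketchTelemetryAgent` class that takes `object` in place of `AIAgent` and uses a placeholder default source name (not the real value of `OpenTelemetryConsts.DefaultSourceName`):

```csharp
#nullable enable
using System;

// Hypothetical sketch of the constructor pattern described in this PR; NOT the real
// OpenTelemetryAgent. `object` stands in for AIAgent, and DefaultSourceName is a placeholder.
public sealed class SketchTelemetryAgent
{
    public const string DefaultSourceName = "placeholder.default.source";

    // Original shape, kept unchanged so already-compiled callers still bind to it;
    // it forwards with auto-wiring enabled.
    public SketchTelemetryAgent(object innerAgent, string? sourceName = null)
        : this(innerAgent, sourceName, autoWireChatClient: true)
    {
    }

    // New overload that carries the opt-out flag.
    public SketchTelemetryAgent(object innerAgent, string? sourceName, bool autoWireChatClient)
    {
        InnerAgent = innerAgent ?? throw new ArgumentNullException(nameof(innerAgent));

        // Normalize once: null and "" both map to the default, and this single value
        // would be reused for both agent-level and chat-level spans.
        SourceName = string.IsNullOrEmpty(sourceName) ? DefaultSourceName : sourceName;
        AutoWireChatClient = autoWireChatClient;
    }

    public object InnerAgent { get; }

    public string SourceName { get; }

    public bool AutoWireChatClient { get; }
}
```

Keeping the original two-parameter signature, rather than appending a third optional parameter to it, is what preserves binary compatibility: already-compiled callers bind to that exact signature, which still exists.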
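Alternatively, opt out via `ChatClientAgentOptions`:

`new ChatClientAgent(chatClient, new ChatClientAgentOptions { UseProvidedChatClientAsIs = true })`

Contribution Checklist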