Agent Monitoring for .NET Apps

Alex Sohn

We launched agent monitoring earlier this year, allowing our users to instrument LLM usage and tool calls in their applications. However, we only had Agent Monitoring support for Python and JavaScript. We’ve been working on creating an Agent Monitoring SDK for .NET — specifically for Microsoft.Extensions.AI.Abstractions.

Introducing Sentry.Extensions.AI

Sentry.Extensions.AI is our drop-in instrumentation layer for .NET LLM packages that are based on Microsoft.Extensions.AI.Abstractions. You can instrument your LLM usage including:

LLM calls
Inputs and outputs
Token count
Model name
Tool calls input/output
Issues related to the LLM call
Total cost

All of this is available to see in Sentry as spans and events, so you can correlate AI behaviour with the rest of your application: HTTP requests, background jobs, database queries, and more.

What is `Microsoft.Extensions.AI.Abstractions`?

The AI.Abstractions package is a low-level contract layer for many other libraries. It contains pure interfaces and data models for generative AI in .NET. It is intended for other libraries to implement. It has minimal dependencies so it can be the base for this ecosystem of libraries.

This is not to be confused with Microsoft.Extensions.AI, which includes utilities such as ChatClientBuilder and built-in capabilities such as logging and tool invocation. The relationship between Microsoft.Extensions.AI and its abstractions package is very similar to the relationship between Microsoft.Extensions.Logging and its abstractions package.

Building our agent monitoring around Microsoft.Extensions.AI.Abstractions allows our users to use any LLM library they want, as long as it implements IChatClient from the abstractions package. For example, our ASP.NET Core sample project uses Microsoft.Extensions.AI.OpenAI, which provides an IChatClient implementation backed by the OpenAI APIs. You can just as easily swap out the LLM by using a different library that implements IChatClient.

How it works

Sentry.Extensions.AI works by wrapping your existing IChatClient and tools, so that every LLM call and tool invocation is automatically instrumented without changing your application logic.

In code, it looks roughly like this:


using Microsoft.Extensions.AI;

var openAiClient = new OpenAI.Chat.ChatClient("gpt-4o-mini", openAiApiKey)
    .AsIChatClient()
    .AddSentry(options =>
    {
        // AI-specific settings
        options.Experimental.RecordInputs = true;
        options.Experimental.RecordOutputs = true;
    });

var client = new ChatClientBuilder(openAiClient)
    .UseFunctionInvocation()
    .Build();

var options = new ChatOptions
{
    // Tools the LLM can call
    Tools = [ /* ... tool definitions ... */ ]
}.AddSentryToolInstrumentation();

var response = await client.GetResponseAsync(
    "Please help me with the following tasks...",
    options);

AddSentry wraps the OpenAI IChatClient, and AddSentryToolInstrumentation instruments tool calls. We intercept requests and responses, measure how long operations take, capture token usage and errors, and then pass everything through to the underlying client so the behaviour of your app doesn’t change.

Most of the work in this library was about doing that as transparently and cheaply as possible, while still handling tricky cases like streaming responses and multi-step tool-call loops.
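To make the wrapping idea concrete, here is a minimal sketch of a pass-through client. It is not the Sentry.Extensions.AI source: TimingChatClient and its log callback are hypothetical names used only for illustration, and the sketch assumes a project that references the Microsoft.Extensions.AI.Abstractions NuGet package, which provides DelegatingChatClient. It times the call, reads the model name and token usage from the response, notes any failure, and returns or rethrows exactly what the inner client produced.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Hypothetical wrapper for illustration only; not part of Sentry.Extensions.AI.
public sealed class TimingChatClient : DelegatingChatClient
{
    private readonly Action<string> _log;

    public TimingChatClient(IChatClient innerClient, Action<string> log)
        : base(innerClient)
    {
        _log = log;
    }

    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var stopwatch = Stopwatch.StartNew();
        ChatResponse response;

        try
        {
            // Forward the request unchanged to the wrapped client.
            response = await base.GetResponseAsync(messages, options, cancellationToken);
        }
        catch (Exception ex)
        {
            // Note the failure, then rethrow so the caller sees the original exception.
            _log($"error={ex.GetType().Name} elapsed_ms={stopwatch.ElapsedMilliseconds}");
            throw;
        }

        // Usage may be null when the provider does not report token counts.
        _log($"model={response.ModelId} " +
             $"input_tokens={response.Usage?.InputTokenCount} " +
             $"output_tokens={response.Usage?.OutputTokenCount} " +
             $"elapsed_ms={stopwatch.ElapsedMilliseconds}");

        return response;
    }
}
```

Under these assumptions, a wrapper like this could be created with new TimingChatClient(existingClient, Console.WriteLine) and used anywhere an IChatClient is expected. A real instrumentation layer would create spans instead of writing log lines.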

Handling streaming responses without breaking error handling

One of the trickiest parts was instrumenting IChatClient.GetStreamingResponseAsync, which returns an IAsyncEnumerable<ChatResponseUpdate>. I wanted to:

Wrap the streaming loop with Sentry spans
Keep overhead minimal
Catch any exception thrown while fetching the next token, record it, and still re-throw it to the caller

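But C# doesn’t allow yield return inside a try block that has a catch clause, and the catch needs to cover MoveNextAsync. Using await foreach doesn’t help either, because it puts the implicit MoveNextAsync call and the yield return in the same loop body, so a single try/catch can’t wrap one without the other.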

The solution was to work with the async enumerator directly and separate the logic between advancing the stream and yielding the value:


public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
    IEnumerable<ChatMessage> messages,
    ChatOptions? options = null,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Start spans and do any request enrichment here...

    // Get the inner async enumerator
    await using var inner = _innerClient
        .GetStreamingResponseAsync(messages, options, cancellationToken)
        .GetAsyncEnumerator(cancellationToken);

    var allUpdates = new List<ChatResponseUpdate>();

    while (true)
    {
        ChatResponseUpdate? current;

        try
        {
            var hasNext = await inner.MoveNextAsync();
            if (!hasNext)
            {
                // Finalize spans using the collected updates, then exit
                // e.g. EnrichWithStreamingResponses(spans, allUpdates);
                yield break;
            }

            current = inner.Current;
            allUpdates.Add(current);
        }
        catch (Exception ex)
        {
            // Record the exception on the spans, then rethrow to user
            throw;
        }

        // Yield outside the try/catch so the compiler is happy
        yield return current;
    }
}

By calling MoveNextAsync() inside the try/catch and doing yield return afterwards, we can:

Preserve the original streaming behavior
Capture any exceptions thrown by the provider’s enumerator
Enrich and finish our spans once the stream ends or fails

The result is full visibility into streaming responses with essentially no extra overhead for the caller.
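The same pattern can be tried in isolation, without any LLM provider. The sketch below is a generic version that depends only on the .NET base class library. StreamObserver, StreamObserverDemo, and the two callbacks are hypothetical names for illustration, and the callbacks stand in for span enrichment and error recording.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class StreamObserver
{
    // Re-yields every item from source, reporting completion or failure via callbacks.
    public static async IAsyncEnumerable<T> Observe<T>(
        IAsyncEnumerable<T> source,
        Action<Exception> onError,
        Action<int> onCompleted,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await using var inner = source.GetAsyncEnumerator(cancellationToken);
        var count = 0;

        while (true)
        {
            T current;
            try
            {
                // Only advancing the stream is covered by the try/catch.
                if (!await inner.MoveNextAsync())
                {
                    onCompleted(count);
                    yield break;
                }
                current = inner.Current;
                count++;
            }
            catch (Exception ex)
            {
                onError(ex);
                throw; // the caller still observes the original exception
            }

            // yield return sits outside the try/catch, which the compiler requires.
            yield return current;
        }
    }
}

public static class StreamObserverDemo
{
    // A stream that fails after producing two items.
    public static async IAsyncEnumerable<string> FailingStream()
    {
        yield return "Hello";
        await Task.Yield();
        yield return "world";
        throw new InvalidOperationException("provider failed mid-stream");
    }

    public static async Task<(List<string> Items, Exception? Recorded, bool CallerSawException)> RunAsync()
    {
        var items = new List<string>();
        Exception? recorded = null;
        var callerSawException = false;

        try
        {
            await foreach (var item in StreamObserver.Observe(
                FailingStream(), ex => recorded = ex, _ => { }))
            {
                items.Add(item);
            }
        }
        catch (InvalidOperationException)
        {
            callerSawException = true;
        }

        return (items, recorded, callerSawException);
    }
}
```

In this demo the caller receives both items before the failure, the error callback sees the InvalidOperationException, and the same exception still reaches the caller's own catch block. One caveat of this simplified version is that an exception thrown by the completion callback would also be routed through the error callback.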

Capturing the whole tool-call loop in a single span

Another challenge was capturing one span that represents the entire “LLM + tools” loop, not just individual model calls or tool invocations. In the screenshot below, you can see that one span is a parent of all these other spans. This is what we call an agent span.

Screenshot of an AI spans trace UI: a GET /test request shows an “Agent – invoke_agent (gpt-4o-mini)” span taking 11.57s with nested chat/model and tool calls (GetPersonAge, GetWeather, ComplexCalculation, CalculateAverageAge), plus a right panel listing span details, input tasks, and a summarized output.

The agent span shows the duration of the whole LLM interaction, including any tool calls and text generation. It also contains the original input and the final output.

When you use the FunctionInvokingChatClient — or UseFunctionInvocation — from Microsoft.Extensions.AI, the LLM call flow looks roughly like this:

Flowchart of an LLM loop: user input goes to LLM text generation; if tools are needed it calls a tool and uses tool outputs to continue, otherwise it returns a response to the user.

We wanted one span that covered this entire loop, from the first LLM call, through all tool calls, to the final response. The problem was that FunctionInvokingChatClient lives in Microsoft.Extensions.AI, not in Microsoft.Extensions.AI.Abstractions, and my instrumentation is built around the abstractions layer. There was no obvious hook at the “whole loop” level.

The workaround was to piggyback on FunctionInvokingChatClient's existing telemetry:

FunctionInvokingChatClient starts an Activity when its tool-call loop begins and stops it when the loop finishes.
We created an ActivityListener that taps into the Activity, with its ActivityStarted and ActivityStopped callback functions set to create Sentry spans.
Inside that span, we still record the individual LLM calls and tool calls as child spans.

This gives us exactly what we wanted: a single top-level span that represents the full agent/tool orchestration, without needing direct access to FunctionInvokingChatClient from the abstractions layer.
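The listener mechanism itself can be sketched with nothing but System.Diagnostics. In the example below, the source name "Demo.ToolLoop" and the activity names are hypothetical stand-ins for whatever FunctionInvokingChatClient actually emits, and the callbacks append to a list where a real integration would start and finish Sentry spans.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ToolLoopListenerDemo
{
    // Hypothetical source name, used only for this demo.
    public const string SourceName = "Demo.ToolLoop";

    public static List<string> Run()
    {
        var events = new List<string>();

        using var listener = new ActivityListener
        {
            // Subscribe only to the source we care about.
            ShouldListenTo = source => source.Name == SourceName,
            // Ask for fully populated activities; without a sampling callback none are created.
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllData,
            // A real integration would start a span here...
            ActivityStarted = activity => events.Add($"start:{activity.OperationName}"),
            // ...and finish it here.
            ActivityStopped = activity => events.Add($"stop:{activity.OperationName}"),
        };
        ActivitySource.AddActivityListener(listener);

        // Simulate an orchestrating client: one outer loop activity with a nested child.
        using var source = new ActivitySource(SourceName);
        using (var loop = source.StartActivity("tool_loop"))
        {
            using var toolCall = source.StartActivity("tool_call");
        }

        return events;
    }
}
```

Calling ToolLoopListenerDemo.Run() yields the events start:tool_loop, start:tool_call, stop:tool_call, stop:tool_loop, in that order. The outer activity brackets everything that happens inside it, which is the property that lets one parent span cover the whole loop.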

Future of Agent Monitoring in .NET

Because Microsoft.Extensions.AI.Abstractions sits at the base of many AI libraries, this integration is just the beginning.

Microsoft’s new agent framework, Microsoft.Agents.AI, builds on these abstractions, and so do other higher-level frameworks like Semantic Kernel. That means the same concepts we use today for instrumenting raw IChatClient calls can be extended to:

Track multi-step agent workflows
Visualize tool and plugin orchestration
Add observability to Semantic Kernel pipelines, planners, and skills

Our goal is for Sentry.Extensions.AI to become the standard way to monitor .NET AI workloads — whether you’re calling a single model directly or orchestrating complex agentic systems on top of Microsoft.Extensions.AI.Abstractions.
