
Inspired by a blog post on .NET performance by Stephen Toub, we're writing a similar article to highlight the performance improvements ASP.NET Core has made in 6.0.

Benchmark settings

Most of the examples in this article were measured with BenchmarkDotNet. A repo containing most of the benchmarks used in this article is available at https://github.com/BrennanConroy/BlogPost60Bench.

Most of the benchmark results in this article were generated from the following command line:

dotnet run -c Release -f net48 --runtimes net48 netcoreapp3.1 net5.0 net6.0

Then select the specific benchmark to run from the list.

This command line tells BenchmarkDotNet to:

  • Build everything in the release configuration.
  • Build against the .NET Framework 4.8 surface area.
  • Run each benchmark on .NET Framework 4.8, .NET Core 3.1, .NET 5, and .NET 6.
  • Some benchmarks were run only on .NET 6 (for example, when comparing two ways of writing the same code on the same version):
    dotnet run -c Release -f net6.0 --runtimes net6.0
    For others, only a subset of versions was run, e.g.
    dotnet run -c Release -f net5.0 --runtimes net5.0 net6.0
    I will include the command used to run each benchmark.

Most of the results in this article were generated by running the above benchmarks on Windows, primarily to include .NET Framework 4.8 in the result set. However, unless otherwise stated, in general, all of these benchmarks show fairly significant improvements when run on Linux or macOS. Just make sure you have installed each runtime you want to measure. These benchmarks use a build of .NET 6 RC1 , with the latest released downloads of .NET 5 and .NET Core 3.1.

Span<T>

Since the addition of Span<T> in .NET Core 2.1, with each release we have converted more code to use Span both internally and as part of the public API to improve performance. This release is no exception.

PR dotnet/aspnetcore#28855 removed a temporary string allocation in PathString coming from string.SubString when adding two PathString instances, and instead uses a Span<char> for the temporary string. In the benchmark below, we use a short string and a longer string to show the performance difference of avoiding the temporary allocation.

 dotnet run -c Release -f net48 --runtimes net48 net5.0 net6.0 --filter *PathStringBenchmark*

private PathString _first = new PathString("/first/");
private PathString _second = new PathString("/second/");
private PathString _long = new PathString("/longerpathstringtoshowsubstring/");

[Benchmark]
public PathString AddShortString()
{
    return _first.Add(_second);
}

[Benchmark]
public PathString AddLongString()
{
    return _first.Add(_long);
}
Method          Runtime             Toolchain  Mean      Ratio  Allocated
AddShortString  .NET Framework 4.8  net48      23.51 ns  1.00   96 B
AddShortString  .NET 5.0            net5.0     22.73 ns  0.97   96 B
AddShortString  .NET 6.0            net6.0     14.92 ns  0.64   56 B
AddLongString   .NET Framework 4.8  net48      30.89 ns  1.00   201 B
AddLongString   .NET 5.0            net5.0     25.18 ns  0.82   192 B
AddLongString   .NET 6.0            net6.0     15.69 ns  0.51   104 B
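To illustrate the technique (a sketch, not the actual PathString implementation): string.Concat has span-based overloads that combine slices of two strings without first materializing a trimmed substring.

```csharp
using System;

class PathCombineSketch
{
    // Old approach: Substring allocates a temporary string for the trimmed
    // second segment before the concatenation even starts.
    static string AddWithSubstring(string first, string second)
        => first + second.Substring(1); // two allocations: substring + result

    // Span-based approach: slice without allocating, then let string.Concat
    // build the final string in a single allocation.
    static string AddWithSpan(string first, string second)
        => string.Concat(first.AsSpan(), second.AsSpan(1)); // one allocation

    static void Main()
    {
        Console.WriteLine(AddWithSubstring("/first/", "/second/"));
        Console.WriteLine(AddWithSpan("/first/", "/second/"));
    }
}
```

Both methods produce the same result; the span version simply skips the intermediate string.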

dotnet/aspnetcore#34001 introduced a new Span-based API for enumerating a query string that is allocation-free in the common case of no encoded characters, and lower-allocation when the query string does contain encoded characters.

 dotnet run -c Release -f net6.0 --runtimes net6.0 --filter *QueryEnumerableBenchmark*

#if NET6_0_OR_GREATER
    public enum QueryEnum
    {
        Simple = 1,
        Encoded,
    }
    [ParamsAllValues]
    public QueryEnum QueryParam { get; set; }

    private string SimpleQueryString = "?key1=value1&key2=value2";
    private string QueryStringWithEncoding = "?key1=valu%20&key2=value%20";
    [Benchmark(Baseline  = true)]
    public void QueryHelper()
    {
        var queryString = QueryParam == QueryEnum.Simple ? SimpleQueryString : QueryStringWithEncoding;
        foreach (var queryParam in QueryHelpers.ParseQuery(queryString))
        {
            _ = queryParam.Key;
            _ = queryParam.Value;
        }
    }
    [Benchmark]
    public void QueryEnumerable()
    {
        var queryString = QueryParam == QueryEnum.Simple ? SimpleQueryString : QueryStringWithEncoding;
        foreach (var queryParam in new QueryStringEnumerable(queryString))
        {
            _ = queryParam.DecodeName();
            _ = queryParam.DecodeValue();
        }
    }
#endif
Method           QueryParam  Mean       Ratio  Allocated
QueryHelper      Simple      243.13 ns  1.00   360 B
QueryEnumerable  Simple       91.43 ns  0.38   –
QueryHelper      Encoded     351.25 ns  1.00   432 B
QueryEnumerable  Encoded     197.59 ns  0.56   152 B

It should be noted that there is no such thing as a free lunch. In the case of the new QueryStringEnumerable API, if you plan to enumerate the query string values multiple times, it can actually be more expensive than using QueryHelpers.ParseQuery and storing the parsed query string values in a dictionary.

dotnet/aspnetcore#29448 from @paulomorgado uses the string.Create method, which lets you initialize a string after it is created if you know its final size. This is used to remove some temporary string allocations in UriHelper.BuildAbsolute.
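string.Create takes the final length, a state object, and a callback that writes characters directly into the new string's buffer, so the whole result is built in a single allocation. A minimal sketch (a hypothetical helper, not the actual UriHelper code):

```csharp
using System;

class StringCreateSketch
{
    // Build "scheme://host/" in one allocation by writing directly into the
    // string's buffer; no intermediate concatenation strings are created.
    static string BuildAbsolute(string scheme, string host)
    {
        int length = scheme.Length + 3 + host.Length + 1; // "://" plus trailing "/"
        return string.Create(length, (scheme, host), static (span, state) =>
        {
            state.scheme.AsSpan().CopyTo(span);
            span = span.Slice(state.scheme.Length);
            "://".AsSpan().CopyTo(span);
            span = span.Slice(3);
            state.host.AsSpan().CopyTo(span);
            span[state.host.Length] = '/';
        });
    }

    static void Main() => Console.WriteLine(BuildAbsolute("https", "localhost"));
}
```

The state tuple avoids a closure allocation in the callback, which is why the lambda can be marked static.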

 dotnet run -c Release -f netcoreapp3.1 --runtimes netcoreapp3.1 net6.0 --filter *UriHelperBenchmark*

#if NETCOREAPP
    [Benchmark]
    public void BuildAbsolute()
    {
        _ = UriHelper.BuildAbsolute("https", new HostString("localhost"));
    }
#endif
Method         Runtime        Toolchain      Mean      Ratio  Allocated
BuildAbsolute  .NET Core 3.1  netcoreapp3.1  92.87 ns  1.00   176 B
BuildAbsolute  .NET 6.0       net6.0         52.88 ns  0.57   64 B

PR dotnet/aspnetcore#31267 converted some parsing logic in ContentDispositionHeaderValue to use Span<T>-based APIs to avoid temporary strings and temporary byte[] arrays in common cases.

 dotnet run -c Release -f net48 --runtimes net48 netcoreapp3.1 net5.0 net6.0 --filter *ContentDispositionBenchmark*
[Benchmark]
public void ParseContentDispositionHeader()
{
    var contentDisposition = new ContentDispositionHeaderValue("inline");
    contentDisposition.FileName = "FileÃName.bat";
 }
Method                    Runtime             Toolchain      Mean      Ratio  Allocated
ContentDispositionHeader  .NET Framework 4.8  net48          654.9 ns  1.00   570 B
ContentDispositionHeader  .NET Core 3.1       netcoreapp3.1  581.5 ns  0.89   536 B
ContentDispositionHeader  .NET 5.0            net5.0         519.2 ns  0.79   536 B
ContentDispositionHeader  .NET 6.0            net6.0         295.4 ns  0.45   312 B

Idle connections

One of the main components of ASP.NET Core is the hosting server, which presents many different problems to optimize. We're going to focus on improvements to idle connections in 6.0, where we've made a number of changes to reduce the amount of memory a connection uses while waiting for data.

We made three different types of changes. The first reduces the size of the objects used by a connection; this includes System.IO.Pipelines, SocketConnections, and SocketSenders. The second type pools frequently accessed objects so we can reuse old instances and save allocations. The third type makes use of so-called "zero-byte reads": we try to read from the connection with a zero-byte buffer, and if data is available the read completes without transferring any data, but we now know data is available and can provide a buffer to read it immediately. This avoids pre-allocating a buffer for a read that might happen in the future, so we can avoid a large allocation until we know data is available.
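The zero-byte-read pattern can be sketched in a few lines of plain Stream code (a simplified illustration, not Kestrel's actual implementation; a MemoryStream completes the empty read immediately, while a network stream would complete it only when data arrives):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class ZeroByteReadSketch
{
    // Issue a read with an empty buffer to wait for data without holding a
    // real buffer, then do the actual read once data is known to be available.
    static async Task<int> ReadWhenReadyAsync(Stream stream, byte[] buffer)
    {
        // Completes when data is available; transfers no bytes and requires
        // no allocated buffer while waiting.
        await stream.ReadAsync(Memory<byte>.Empty);

        // Data is available now, so provide the real buffer.
        return await stream.ReadAsync(buffer);
    }

    static async Task Main()
    {
        var stream = new MemoryStream(new byte[] { 1, 2, 3 });
        var buffer = new byte[16];
        int read = await ReadWhenReadyAsync(stream, buffer);
        Console.WriteLine(read);
    }
}
```

In a server, the buffer in the second read would be rented from a pool only after the zero-byte read completes, so idle connections hold no buffers at all.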

dotnet/runtime#49270 reduced the size of System.IO.Pipelines from ~560 bytes to ~368 bytes, a 34% reduction, with at least 2 pipes per connection, so this is a huge win.

dotnet/aspnetcore#31308 refactored Kestrel's Socket layer to avoid some asynchronous state machines and reduce the size of the remaining state machines, resulting in a 33% allocation savings per connection.

dotnet/aspnetcore#30769 removed the per-connection PipeOptions allocation and moved it to the connection factory, so we allocate only once for the entire lifetime of the server and reuse the same options for every connection. dotnet/aspnetcore#31311 from @benaadams replaces well-known header values in WebSocket requests with interned strings, which allows the strings allocated during header parsing to be garbage collected, reducing the memory usage of long-lived WebSocket connections. dotnet/aspnetcore#30771 refactored the Sockets layer in Kestrel to avoid allocating both a SocketReceiver and a SocketAwaitableEventArgs by combining them into a single object, which saves a few bytes and results in fewer objects allocated per connection. That PR also pooled the SocketSender class, so instead of creating one per connection you now have, on average, only as many as there are cores. In the benchmark below with 10,000 connections, only 16 are allocated on my machine instead of 10,000, which saves ~46 MB!
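The per-core pooling idea can be sketched like this (hypothetical Sender/SenderPool types, not Kestrel's actual code):

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for a per-operation helper object like Kestrel's SocketSender.
class Sender { }

class SenderPool
{
    private readonly ConcurrentQueue<Sender> _pool = new();
    private readonly int _maxSize = Environment.ProcessorCount;

    // Rent an existing sender if one is available, otherwise allocate one.
    public Sender Rent() => _pool.TryDequeue(out var s) ? s : new Sender();

    // Return the sender for reuse, capping the pool near the core count so
    // 10,000 idle connections share a handful of senders instead of 10,000.
    public void Return(Sender s)
    {
        if (_pool.Count < _maxSize) _pool.Enqueue(s);
        // otherwise drop it and let the GC collect it
    }
}

class Program
{
    static void Main()
    {
        var pool = new SenderPool();
        var first = pool.Rent();
        pool.Return(first);
        var second = pool.Rent(); // reuses the returned instance
        Console.WriteLine(ReferenceEquals(first, second));
    }
}
```

The key property is that the pool size tracks the number of concurrent send operations (bounded by cores), not the number of open connections.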

Another change of similar size is dotnet/runtime#49123, which adds support for zero-byte reads to SslStream, so our 10,000 idle connections go from ~46 MB to ~2.3 MB allocated by SslStream. dotnet/runtime#49117 added support for zero-byte reads to StreamPipeReader, and Kestrel then used it in dotnet/aspnetcore#30863 to start doing zero-byte reads on SslStream.

The net result of all these changes is a massive reduction in memory usage for idle connections.

The numbers below were not collected with the BenchmarkDotNet app, because they measure idle connections and this was easier to set up with a client app and a server app.

Console and WebApplication code is pasted in the gist below:

https://gist.github.com/BrennanConroy/02e8459d63305b4acaa0a021686f54c7

Below is the memory used by the server for 10,000 idle secure WebSocket connections (WSS) on different frameworks.

Framework  Memory
net48      665.4 MB
net5.0     603.1 MB
net6.0     160.8 MB

That's almost 4x less memory than .NET 5.

Entity Framework Core

EF Core made many improvements in 6.0: query execution is 31% faster, and the runtime of the TechEmpower Fortunes benchmark improved by 70% thanks to runtime updates, optimized benchmarks, and EF improvements.

These improvements come from object pooling, smarter checks for whether telemetry is enabled, and an option to opt out of thread-safety checks when you know your application uses the DbContext safely.

See the blog post on the release of Entity Framework Core 6.0 Preview 4: Performance Edition, which highlights many of the improvements in detail.

Blazor

Native byte[] interop

Blazor now has efficient support for byte arrays when performing JavaScript interop. Previously, byte arrays sent to and from JavaScript were Base64 encoded so they could be serialized to JSON, which increased the transfer size and the CPU load. This has now been optimized in .NET 6, allowing users to transparently use byte[] in .NET and Uint8Array in JavaScript. The documentation explains how to use this feature for both JavaScript-to-.NET and .NET-to-JavaScript interop.
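The transfer-size cost of Base64 is easy to see: encoding maps every 3 bytes to 4 characters, inflating the payload by roughly 33%. A quick illustration with the 22 kB payload used in the benchmark below:

```csharp
using System;

class Base64Overhead
{
    static void Main()
    {
        var bytes = new byte[1024 * 22];                 // 22 kB payload
        string encoded = Convert.ToBase64String(bytes);  // what .NET 5 sent over the wire
        Console.WriteLine(bytes.Length);                 // raw size in bytes
        Console.WriteLine(encoded.Length);               // encoded size in characters
    }
}
```

And that is before counting the CPU spent encoding on one side and decoding on the other, for every round trip.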

Let's look at a quick benchmark to see how byte[] interop differs in .NET 5 and .NET 6. The following Razor code creates a 22 kB byte[] and sends it to JavaScript's receiveAndReturnBytes function, which returns the byte[] immediately. This data round trip is repeated 10,000 times and the time data is printed to the screen. This code is the same for .NET 5 and .NET 6.

 <button @onclick="@RoundtripData">Roundtrip Data</button>
<hr />
@Message
@code {
    public string Message { get; set; } = "Press button to benchmark";
    private async Task RoundtripData()
    {
        var bytes = new byte[1024*22];
        List<double> timeForInterop = new List<double>();
        var testTime = DateTime.Now;
        for (var i = 0; i < 10_000; i++)
        {
            var interopTime = DateTime.Now;
            var result = await JSRuntime.InvokeAsync<byte[]>("receiveAndReturnBytes", bytes);

            timeForInterop.Add(DateTime.Now.Subtract(interopTime).TotalMilliseconds);
        }
        Message = $"Round-tripped: {bytes.Length / 1024d} kB 10,000 times and it took on average {timeForInterop.Average():F3}ms, and in total {DateTime.Now.Subtract(testTime).TotalMilliseconds:F1}ms";
    }
}

Next, let's look at the receiveAndReturnBytes JavaScript function. In .NET 5, we must first decode the Base64-encoded byte array into a Uint8Array so it can be used in application code. Then we have to re-encode it to Base64 before returning the data to the server.

 function receiveAndReturnBytes(bytesReceivedBase64Encoded) {
    const bytesReceived = base64ToArrayBuffer(bytesReceivedBase64Encoded);
    // Use Uint8Array data in application
    const bytesToSendBase64Encoded = base64EncodeByteArray(bytesReceived);
    if (bytesReceivedBase64Encoded != bytesToSendBase64Encoded) {
        throw new Error("Expected input/output to match.")
    }
    return bytesToSendBase64Encoded;
}
// https://stackoverflow.com/a/21797381
function base64ToArrayBuffer(base64) {
    const binaryString = atob(base64);
    const length = binaryString.length;
    const result = new Uint8Array(length);
    for (let i = 0; i < length; i++) {
        result[i] = binaryString.charCodeAt(i);
    }
    return result;
}
function base64EncodeByteArray(data) {
    const charBytes = new Array(data.length);
    for (var i = 0; i < data.length; i++) {
        charBytes[i] = String.fromCharCode(data[i]);
    }
    const dataBase64Encoded = btoa(charBytes.join(''));
    return dataBase64Encoded;
}

Encoding/decoding adds huge overhead on both client and server, while also requiring a lot of boilerplate code. So how does it work in .NET 6? Well, it's fairly simple:

 function receiveAndReturnBytes(bytesReceived) {
    // bytesReceived comes as a Uint8Array ready for use
    // and can be used by the application or immediately returned.
    return bytesReceived;
}

So it's definitely easier to write, but how does it perform? Running these snippets in the blazorserver template for .NET 5 and .NET 6 respectively, under the Release configuration, we see a 78% performance improvement in byte[] interop!

            .NET 6 (ms)  .NET 5 (ms)  Improvement
Total time  5273         24463        78%

Additionally, this byte array interop support is used in the framework to support bidirectional streaming interop between JavaScript and .NET. Users are now able to transfer arbitrary binary data. Documentation on streaming from .NET to JavaScript is available here and JavaScript to .NET documentation is available here.

InputFile

Using the Blazor streaming interop mentioned above, we now support uploading large files via the InputFile component (previously, uploads were limited to around 2 GB). This component is also significantly faster due to the use of native byte[] streaming instead of Base64 encoding. For example, a 100 MB file uploads 77% faster compared to .NET 5.

.NET 6 (ms)  .NET 5 (ms)  Improvement
2591         10504        75%
2607         11764        78%
2632         11821        78%
Average: 77%

Note that streaming interop support can also efficiently download (large) files, see the documentation for more details.

The InputFile component has been upgraded to use streaming via dotnet/aspnetcore#33900 .

Grab bag

dotnet/aspnetcore#30320 from @benaadams modernized and optimized our TypeScript libraries so sites load faster. The signalr.min.js file went from 36.8 kB compressed and 132 kB uncompressed to 16.1 kB compressed and 42.2 kB uncompressed. The blazor.server.js file went from 86.7 kB compressed and 276 kB uncompressed to 43.9 kB compressed and 130 kB uncompressed.

dotnet/aspnetcore#31322 from @benaadams removes some unnecessary casts when getting common features from a connection's feature collection. This gives roughly a 50% improvement when accessing common features in the collection. Unfortunately, the improvement can't be shown in a benchmark here because it requires a bunch of internal types, so for the numbers I'll refer you to the PR, which includes benchmarks you can run against the internal code if you're interested.

dotnet/aspnetcore#31519, also from @benaadams, adds default interface methods to the IHeaderDictionary type for accessing common headers via properties named after the header. No more mistyping common header names when accessing the header dictionary! More interesting for this blog post is that this change allows server implementations to return a custom header dictionary that implements these new interface methods more optimally. For example, a server can store the header value directly in a field and return that field directly, rather than looking the value up in an internal dictionary, which requires hashing the key and finding the entry. In some cases this change yields up to a 480% improvement when getting or setting header values. Again, benchmarking this change properly requires internal types, so I'll include the numbers from the PR; for those interested in trying it out, the PR contains benchmarks that run against the internal code.

Method      Branch  Type       Mean          Ops/sec       Delta
GetHeaders  before  Plaintext  25.793 ns     38,770,569.6
GetHeaders  after   Plaintext  12.775 ns     78,279,480.0  +101.9%
GetHeaders  before  Common     121.355 ns    8,240,299.3
GetHeaders  after   Common     37.598 ns     26,597,474.6  +222.8%
GetHeaders  before  Unknown    366.456 ns    2,728,840.7
GetHeaders  after   Unknown    223.472 ns    4,474,824.0   +64.0%
SetHeaders  before  Plaintext  49.324 ns     20,273,931.8
SetHeaders  after   Plaintext  34.996 ns     28,574,778.8  +40.9%
SetHeaders  before  Common     635.060 ns    1,574,654.3
SetHeaders  after   Common     108.041 ns    9,255,723.7   +487.7%
SetHeaders  before  Unknown    1,439.945 ns  694,470.8
SetHeaders  after   Unknown    517.067 ns    1,933,985.7   +178.4%
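The pattern that enables this, a default interface property that implementers may override with a faster field-backed version, can be sketched with standalone types (hypothetical names, not the actual IHeaderDictionary code):

```csharp
using System;
using System.Collections.Generic;

interface IHeaderDictionaryLike : IDictionary<string, string>
{
    // Default implementation: fall back to a dictionary lookup, so existing
    // implementers get the property for free.
    string ContentType
    {
        get => TryGetValue("Content-Type", out var value) ? value : "";
        set => this["Content-Type"] = value;
    }
}

// A server's dictionary can override the property to use a plain field,
// skipping the key hashing and entry lookup entirely.
class FastHeaders : Dictionary<string, string>, IHeaderDictionaryLike
{
    private string _contentType = "";
    public string ContentType
    {
        get => _contentType;
        set => _contentType = value;
    }
}

class Program
{
    static void Main()
    {
        IHeaderDictionaryLike headers = new FastHeaders();
        headers.ContentType = "application/json"; // field write, no hashing
        Console.WriteLine(headers.ContentType);
    }
}
```

Because the default body lives on the interface, adding these properties was not a breaking change for existing IHeaderDictionary implementations.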

dotnet/aspnetcore#31466 uses the new CancellationTokenSource.TryReset() method introduced in .NET 6 to reuse a CancellationTokenSource when a connection closes without being cancelled. The numbers below were collected by running bombardier against Kestrel with 125 connections, sending about 100,000 requests.

Branch                                Allocations  Bytes allocated
Before CancellationTokenSource reuse  98,314       4,719,072
After CancellationTokenSource reuse   125          6,000
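The reuse pattern enabled by TryReset() looks roughly like this (a simplified sketch, not Kestrel's actual code):

```csharp
using System;
using System.Threading;

class CtsReuseSketch
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        // Request 1 completed without cancellation: TryReset returns true
        // and the same instance can serve the next request.
        bool reused = cts.TryReset();
        Console.WriteLine(reused);

        // Request 2 was actually cancelled: TryReset returns false, so a
        // fresh CancellationTokenSource must be allocated.
        cts.Cancel();
        reused = cts.TryReset();
        Console.WriteLine(reused);
        if (!reused)
        {
            cts.Dispose();
            cts = new CancellationTokenSource();
        }
    }
}
```

Since most requests complete without being cancelled, the common path allocates nothing, which is exactly what the table above shows.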

dotnet/aspnetcore#31528 and dotnet/aspnetcore#34075 made similar CancellationTokenSource-reuse changes for HTTPS handshakes and HTTP/3 streams, respectively.

dotnet/aspnetcore#31660 improves the performance of server-to-client streaming in SignalR by reusing a single allocated StreamItem object for the entire stream, rather than allocating one per stream item. And dotnet/aspnetcore#31661 stores the HubCallerClients object on the SignalR connection instead of allocating it for every Hub method call.

dotnet/aspnetcore#31506 from @ShreyasJejurkar refactored the internals of the WebSocket handshake to avoid a temporary List allocation. dotnet/aspnetcore#32829 from @gfoidl refactored QueryCollection to reduce allocations and vectorize some of the code. dotnet/aspnetcore#32234 from @benaadams removed an unused field from the HttpRequestHeaders enumeration, which improves performance by no longer assigning the field for every header enumerated.

dotnet/aspnetcore#31333 from @martincostello converted Http.Sys to use LoggerMessage.Define, a high-performance logging API. This avoids unnecessary boxing of value types, repeated parsing of the log format string, and, when the log level is not enabled, allocating strings or objects.

dotnet/aspnetcore#31784 added a new IApplicationBuilder.Use overload for registering middleware that avoids some unnecessary per-request allocations when running the middleware. The old code looks like this:

 app.Use(async (context, next) =>
{
    await next();
});

The new code is as follows:

 app.Use(async (context, next) =>
{
    await next(context);
});

The benchmark below simulates the middleware pipeline without setting up a server in order to demonstrate the improvement. An int is used instead of HttpContext for the request, and the middleware returns a completed task.

 dotnet run -c Release -f net6.0 --runtimes net6.0 --filter *UseMiddlewareBenchmark*
private static Func<Func<int, Task>, Func<int, Task>> UseOld(Func<int, Func<Task>, Task> middleware)
{
    return next =>
    {
        return context =>
        {
            Func<Task> simpleNext = () => next(context);
            return middleware(context, simpleNext);
        };
    };
}
private static Func<Func<int, Task>, Func<int, Task>> UseNew(Func<int, Func<int, Task>, Task> middleware)
{
    return next => context => middleware(context, next);
}
Func<int, Task> Middleware = UseOld((c, n) => n())(i => Task.CompletedTask);
Func<int, Task> NewMiddleware = UseNew((c, n) => n(c))(i => Task.CompletedTask);
[Benchmark(Baseline = true)]
public Task Use()
{
    return Middleware(10);
}
[Benchmark]
public Task UseNew()
{
    return NewMiddleware(10);
}
Method  Mean       Ratio  Allocated
Use     15.832 ns  1.00   96 B
UseNew   2.592 ns  0.16   –

Summary

I hope you enjoyed reading about some of the improvements in ASP.NET Core 6.0! I encourage you to check out the .NET 6 blog post on runtime performance improvements as well.

