Cross-Platform Concurrency and Performance in .NET

Unified Concurrency is now cross-platform: it targets .NET Standard 2.0, so it can be used on .NET 4.7+, .NET Core 2.0+, and Mono 5.4+. The growth of cross-platform development in the .NET ecosystem drove this change, and adoption of .NET Standard 2.0 has risen sharply in both open-source and commercial projects. The legacy .NET 4.6 version will continue to be supported, but development now focuses on the .NET Standard projects.

The push to bring the GreenSuperGreen library, which houses Unified Concurrency, under the .NET Standard 2.0 umbrella necessitated a comprehensive benchmarking process. This update allows for the execution of benchmarks, cross-benchmarks, and platform-cross-benchmarks on .NET, .NET Core, and potentially Mono. The prospect of Linux benchmarking is intriguing, but the current reliance on platform-specific PerformanceCounters presents a challenge that future versions of .NET Standard will likely address.

For those who want to explore Unified Concurrency, a variety of synchronization primitives are available. They are showcased in the unit test project for .NET Core 2.1, which can be downloaded from the provided link. The GreenSuperGreen library, which implements the Unified Concurrency framework, is an open-source project hosted on GitHub and available via NuGet for .NET 4.6, .NET Standard 2.0, and .NET Core 2.1.

The library now supports three additional synchronization primitives, with one more available exclusively for internal benchmarking purposes. These include AsyncSemaphoreSlimLockUC, which provides a fair access lock based on SemaphoreSlim's WaitAsync/Release methods, and offers performance comparable to AsyncLockUC. There's also SemaphoreSlimLockUC, which combines atomic instructions with a hybrid approach for a lock that may not adhere to FIFO order, potentially leading to thread stalls. Another addition is SemaphoreLockUC, which relies on the operating system's Semaphore for a roughly FIFO fair access lock, though fairness is not guaranteed. Lastly, MutexLockUC is reserved for benchmarking due to its requirement for thread affinity, a feature not supported by the design of Unified Concurrency.
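To make the WaitAsync/Release idea behind AsyncSemaphoreSlimLockUC more concrete, here is a minimal sketch of an awaitable lock built directly on SemaphoreSlim. It is not the GreenSuperGreen implementation or its API: the names SketchSemaphoreSlimAsyncLock, EnterAsync, and AsyncLockDemo are hypothetical and exist only for this illustration, and the sketch makes no claim about access ordering.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration type; NOT part of GreenSuperGreen.
public sealed class SketchSemaphoreSlimAsyncLock
{
    // A semaphore with one permit behaves like a mutual-exclusion lock.
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    // Asynchronously waits for the permit and returns a handle that releases it.
    public async Task<IDisposable> EnterAsync()
    {
        await _semaphore.WaitAsync().ConfigureAwait(false);
        return new Releaser(_semaphore);
    }

    private sealed class Releaser : IDisposable
    {
        private SemaphoreSlim _semaphore;

        public Releaser(SemaphoreSlim semaphore) => _semaphore = semaphore;

        public void Dispose()
        {
            // Release the permit exactly once, even if Dispose is called again.
            Interlocked.Exchange(ref _semaphore, null)?.Release();
        }
    }
}

public static class AsyncLockDemo
{
    // Runs `workerCount` tasks that each increment a shared counter
    // `iterations` times while holding the lock; returns the final count.
    public static async Task<int> RunAsync(int workerCount, int iterations)
    {
        var gate = new SketchSemaphoreSlimAsyncLock();
        int counter = 0;
        var workers = new Task[workerCount];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Run(async () =>
            {
                for (int n = 0; n < iterations; n++)
                {
                    using (await gate.EnterAsync())
                    {
                        counter++; // protected by the lock
                    }
                }
            });
        }
        await Task.WhenAll(workers);
        return counter;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync(8, 1000)); // prints 8000
    }
}
```

Because waiting happens through WaitAsync, no thread is blocked while the lock is contended, and the lock has no thread affinity: the continuation that releases it may run on a different thread than the one that acquired it. That is the property an OS Mutex lacks, which is why the text above reserves MutexLockUC for benchmarking.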

The performance enhancements of .NET Core have been well-documented, with reports from Microsoft and the technical community highlighting significant speedups for existing codebases. Cross-platform benchmarks reveal two key areas of improvement: JIT Compilation and the C# lock (Monitor class). In benchmarking scenarios, .NET Core 2.1 demonstrated a throughput that was 1.997 times faster than .NET 4.7.2 on the same hardware. This indicates that certain code can be JITted more efficiently in .NET Core, though the extent of speedup is inherently code-dependent.

Furthermore, the C# lock (Monitor class) has shown considerable improvement under heavy load and bad neighbor scenarios in .NET Core 2.1. The .NET implementation was previously susceptible to CPU gridlock, where the C# lock consumed most CPU resources with minimal work done. The .NET Core 2.1 runtime, with its AwareLock class in C++, appears to have a more efficient implementation. This has led to a significant reduction in CPU resource waste, as evidenced by the performance charts comparing .NET and .NET Core.
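A rough way to observe this kind of behavior on one's own machine is to run the same heavily contended lock loop on both runtimes and compare wall-clock time against the CPU time the process consumed. The sketch below is an assumption-laden stand-in, not the benchmark used for the charts: the class LockContentionSketch and its parameters are hypothetical, and it reads CPU time from Process.TotalProcessorTime rather than from PerformanceCounters.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical micro-benchmark sketch; not the GreenSuperGreen benchmark suite.
public static class LockContentionSketch
{
    private static readonly object Gate = new object();
    private static long _sharedCounter;

    // Starts `threadCount` threads that each take the C# lock `iterations` times.
    // Reports elapsed wall-clock time and process CPU time, and returns the
    // final counter so the caller can check that no increment was lost.
    public static long Run(int threadCount, int iterations,
                           out TimeSpan wall, out TimeSpan cpu)
    {
        _sharedCounter = 0;
        var process = Process.GetCurrentProcess();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        var stopwatch = Stopwatch.StartNew();

        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < iterations; n++)
                {
                    lock (Gate) { _sharedCounter++; } // very short critical section
                }
            });
            threads[i].Start();
        }
        foreach (var thread in threads) thread.Join();

        stopwatch.Stop();
        process.Refresh();
        wall = stopwatch.Elapsed;
        cpu = process.TotalProcessorTime - cpuBefore;
        return _sharedCounter;
    }

    public static void Main()
    {
        long total = Run(Environment.ProcessorCount, 1_000_000,
                         out TimeSpan wall, out TimeSpan cpu);
        Console.WriteLine(
            $"increments={total} wall={wall.TotalMilliseconds:F0} ms cpu={cpu.TotalMilliseconds:F0} ms");
    }
}
```

If CPU time is many times larger than wall-clock time while the number of increments per second stays low, most of the processor work is going into lock contention rather than the protected code. The numbers depend heavily on hardware, core count, and runtime version, so they are only meaningful when compared across runs on the same machine.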

The performance gains are not limited to JIT improvements. Multithreaded code, often riddled with C# locks, is prevalent across various projects. Even simple projects can benefit from these enhancements, making the upgrade to .NET Core 3.0, with its support for WinForms and WPF, an enticing proposition. While there is still room for improvement, as suggested by the performance charts comparing LockUC with C# locks, the potential for further optimization is promising. Modern architectural designs, accounting for the many-core processor era, can help achieve better throughput while managing CPU waste effectively.
