This is the first blog post of a series about understanding how to improve performance when developing with .NET Core and C# 7.2.
Some parts are pure theory, not specific to .NET or C# 7.2, but since this is the opening post of the series, it is the right place for the reader to understand the basics of CPU and memory.
This series is intended for any kind of reader, especially those who are not familiar with the topic and are willing to understand the basics.
Experts on the matter may find these posts lacking depth, but that is on purpose: the goal is not to explain everything thoroughly, which would be too long and would end up confusing most readers, but to explain what matters, why it matters and how to deal with it.
`struct`, `Memory<T>` and `Span<T>` are among the topics this series covers.
If you have remarks, typo corrections, or simply want to read posts still in progress, you can check my dedicated GitHub repo.
It’s no secret that Microsoft decided to focus on improving performance for the 2.1 release of .NET Core.
The main driver is to improve ASP.NET Core, but that doesn’t mean the new features only target the web server. Most of the time, when it’s about performance, you have to dig down to the lowest layer to bring game changers, and this time was no exception.
What is interesting, from my point of view, is that we’re starting to see features that bring us closer to low-level/high-performance languages such as C++.
The goal of this post series is to:
C# is about writing clean code to achieve high maintainability and follow good programming practices/standards. Writing optimized code often drives you away from these principles; finding the right balance is definitely a key skill for the programmer.
Well, there are many reasons and I won’t detail all of them, mostly because I couldn’t, but there are some we can focus on:
C# is a pretty high-level programming language; it’s quite easy and safe to use, which is why you get things like bullets #1 to #4 above. On the other hand, it’s also easy not to be aware of what matters when optimizing things.
Let’s not focus on #5, because there’s little we can do about it. If we take a close look at #1 to #4, we will see there’s a common theme: memory!
Is memory important? Yes, you bet!
CPUs keep getting more powerful as the years go by, but we don’t see the same trend for memory, see below:
Computer Architecture: A Quantitative Approach by John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau
It means that in order to keep the CPU busy, we have to organize our code and data in a memory-friendly way, because accessing data directly in main memory costs more than you may think!
There’s a very good analogy that you can read here that basically gives you crucial information.
Let’s summarize it.
Today, most CPU instructions that don’t involve memory access or very complex computation take one cycle to execute; with a 4GHz CPU that’s 4 billion instructions per second per logical core (so 32 billion for a hyper-threaded quad-core).
Let’s scale things to understand their impact better:
| Access type | Real duration | Scaled duration |
|---|---|---|
| One CPU cycle | 0.4 ns | 1 second |
| L1 cache access | 0.9 ns | 2 seconds |
| L2 cache access | 2.8 ns | 7 seconds |
| L3 cache access | 28 ns | 1 minute |
| Main memory access | ~100 ns | 4 minutes |
Compared to one CPU cycle:
So yes, you can see that the more memory-friendly you are (we’ll explain roughly what that implies), the better your chances of hitting the CPU cache, giving you a significant performance boost!
Put differently, compared to main memory access:
So worrying about the JIT not being fast enough may not be the main concern: you can leverage things yourself by being aware of what the CPU needs to execute as fast as possible.
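To make the cache argument concrete, here is a minimal sketch (my own example, names are mine) comparing two traversals of the same 2D array. The arithmetic work is identical; only the memory access pattern differs, and on a typical machine the column-major version is noticeably slower because it wastes most of every cache line it loads.

```csharp
using System;
using System.Diagnostics;

class CacheDemo
{
    public const int N = 2048;

    // Row-major traversal: consecutive accesses are adjacent in memory,
    // so one 64-byte cache line fetched from RAM serves 16 ints.
    public static long SumRowMajor(int[,] m)
    {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += m[i, j];
        return sum;
    }

    // Column-major traversal: same work, but each access jumps N * 4 bytes
    // ahead, so most of every cache line loaded on a miss is wasted.
    public static long SumColumnMajor(int[,] m)
    {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += m[i, j];
        return sum;
    }

    static void Main()
    {
        var m = new int[N, N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[i, j] = 1;

        var sw = Stopwatch.StartNew();
        long rowSum = SumRowMajor(m);
        long rowMs = sw.ElapsedMilliseconds;

        sw.Restart();
        long colSum = SumColumnMajor(m);
        long colMs = sw.ElapsedMilliseconds;

        // Both sums are identical; only the access pattern differs.
        Console.WriteLine($"row-major: {rowMs} ms, column-major: {colMs} ms (sums equal: {rowSum == colSum})");
    }
}
```

The exact ratio depends on your CPU and cache sizes, but the row-major loop typically wins by a wide margin.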
There are a lot of good, in-depth articles/posts out there explaining why the CPU cache is important and how to work with it. This topic can get really complex very quickly, here, again, we will try to keep things simple.
A few explanations/remarks:
- `4 x 32KBytes`: here the ‘4 x’ means there is a dedicated cache for each core of the CPU. That’s right, L1/L2 are dedicated caches, one per CPU core; ‘32KBytes’ is the size of each one.
- `8-way` is about ‘associativity’, which is a rather complex topic. Follow the link if you’re curious and brave!

More about how a CPU cache works.
Yes, the GC is a very nice and handy feature, but like every feature it’s not a silver bullet; it’s not something you have to rely on 100% of the time, and definitely not in .NET! The GC only comes into play for `class`-based types; `struct`-based ones are not heap-allocated (unless boxed or stored inside a heap object). So yes, there are ways to minimize the pressure on the GC, and you should know about them!
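A small sketch of that difference (type names and counts are mine): `GC.GetAllocatedBytesForCurrentThread`, available on recent .NET, lets us count heap allocations directly, assuming the JIT does not stack-allocate the objects (mainstream runtimes don’t for this pattern).

```csharp
using System;

// A small value type: instances live inline (on the stack, or inside their
// containing object/array), so 'new PointStruct' allocates nothing on the GC heap.
struct PointStruct { public int X, Y; }

// The reference-type twin: every 'new PointClass' is a GC heap allocation
// that the collector will eventually have to track and reclaim.
class PointClass { public int X, Y; }

class GcPressureDemo
{
    public static long AllocStructs(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        long sum = 0;
        for (int i = 0; i < count; i++)
        {
            var p = new PointStruct { X = i, Y = i };
            sum += p.X; // use the value so the loop isn't optimized away
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static long AllocClasses(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        long sum = 0;
        for (int i = 0; i < count; i++)
        {
            var p = new PointClass { X = i, Y = i };
            sum += p.X; // use the object so the allocation isn't optimized away
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    static void Main()
    {
        Console.WriteLine($"structs: {AllocStructs(100_000)} bytes allocated");
        Console.WriteLine($"classes: {AllocClasses(100_000)} bytes allocated");
    }
}
```

The struct loop reports zero bytes; the class loop reports roughly `count` times the object size, all of which the GC will have to clean up later.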
It’s easy to copy data, to isolate it for the sake of a good design (or easy, readable code). It may not hurt when the size is small and the frequency of the operation is low, but when either of these two factors increases, things amplify and performance drops.
One of the best examples is the `String` class: it’s allocated on the heap and it’s immutable, which means every method that “changes” the string actually returns a new object! That’s a lot of memory traffic, and most of the time the developer isn’t aware of it.
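A tiny illustration (example mine): nothing can mutate a `String` instance, every “modifying” call hands back a fresh object, and `StringBuilder` is the usual way to avoid one allocation per step when building a string incrementally.

```csharp
using System;
using System.Text;

class StringTrafficDemo
{
    // Builds "012...n-1" by mutating a single buffer, instead of creating
    // one brand-new string per iteration as '+' concatenation would.
    public static string BuildDigits(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.Append(i);
        return sb.ToString();
    }

    static void Main()
    {
        string s = "hello";
        // ToUpper never touches s: strings are immutable, it returns a new object.
        string upper = s.ToUpper();
        Console.WriteLine(ReferenceEquals(s, upper)); // False: two distinct objects
        Console.WriteLine(s);                         // still "hello"
        Console.WriteLine(BuildDigits(5));            // "01234"
    }
}
```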
Luckily for us, we have new weapons to improve things in this area.
C# is a high-level language; we usually don’t pay attention to how we lay out the data in the types we design, and that’s a big mistake when we want things to be driven by performance. Again, this is more about convenience, because the language doesn’t prevent you from improving things: you just don’t know, or don’t care, to do it.
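As a sketch of what “paying attention to data layout” can mean (types are mine; sizes in the comments are the typical x64 marshaled sizes): simply reordering fields from largest to smallest removes the alignment padding the runtime must otherwise insert.

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential layout keeps our declaration order, padding each field to its
// natural alignment: byte (1) + padding (7) + long (8) + byte (1) + padding (7)
// is typically 24 bytes on x64.
[StructLayout(LayoutKind.Sequential)]
struct Padded { public byte A; public long B; public byte C; }

// Same three fields, largest first: long (8) + byte (1) + byte (1), rounded
// up to the struct's alignment, is typically 16 bytes.
[StructLayout(LayoutKind.Sequential)]
struct Packed { public long B; public byte A; public byte C; }

class LayoutDemo
{
    static void Main()
    {
        Console.WriteLine($"Padded: {Marshal.SizeOf<Padded>()} bytes");
        Console.WriteLine($"Packed: {Marshal.SizeOf<Packed>()} bytes");
    }
}
```

One third of every `Padded` array is wasted padding, which also means fewer useful elements per cache line.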
This was just the first post of the series, and we talked mostly about theory; it was important to lay these foundations for the posts to come.
Starting with the next post, we’ll get into concrete material with examples.