Profiling your code using Performance Sessions and PerfConsole
Categories
When I wrote SingleUserBlog (this blogging software) I had a single thing in mind and that was to create a blog that could be totally XCOPY deployable. In other words, you could just grab the application, copy it into a virtual directory and you'd be away - no database to install. To do this I decided to use good 'ol XML to store all of the data in the filesystem. In the space of a few weeks I bashed out the code for my blog and put it online... YEEHAR COWBOY :-)
Everything went fine for a long time after I deployed my blog and I was enjoying the freedom of knowing the source code intimately and therefore being able to make little tweaks at the drop of a hat. That was until one day when my blog started getting slow, real SLOW. With every post that I put up there it just got slower and slower until it pretty much just stopped. My code was failing big time.
You need to measure
When it came time to having to fix my blog I had no real idea where to start - although I had a fair idea that my elite file handling routines were in some way responsible for this mess. Instead of fumbling around I decided to immediately do some performance tuning and to measure what was going on under the hood. To do that I used the new Performance Sessions in Visual Studio Team System Developer. Performance Sessions allowed me to record my running code and view which methods were taking the longest and how often each of them was getting called. Using this information I was able to pinpoint the culprit within a very short space of time and fix it immediately without the risk of making random changes all over the place.
In this article I'll show the basics of getting a performance session up and running and also show how to use PerfConsole to view the raw data from a recorded Performance Session in a more visual manner.
Finding and Fixing a Bug
For this example, we have an application which makes calls to a business layer to get data which can sometimes be an expensive operation. Here we see some test code which simulates a bunch of calls to the service agent to test how our Service Agent method performs:
static void Run() {
Dictionary<string, int> items = new Dictionary<string, int>();
Random rand = new Random();
for (int i = 0; i < runs; i++) {
int id = rand.Next(10);
string retVal = ServiceAgent.MakeExternalCall(id);
}
}
Under the covers, the Service Agent calls an internal helper to process the request like so:
public static string MakeExternalCall(int id) {
return DoesWork(id) ;
}
And it's the DoesWork method which is expensive with each call taking around 500ms to complete. 400ms isn't too bad for a single request but in modern portal applications where data may be fetched many times in a single request each of these 500ms blocks start to add up when they are performed in a linear fashion. When we run our test we can see that for 40 runs our test takes 20 seconds to complete - 40 runs times 500ms. In this simple example our bottleneck is pretty clear but when we have hundreds of pages and classes and different assemblies all getting called together, finding the source of a performance issue is not so obvious. To pinpoint our bottleneck we will now create a performance session to run our test code and record the actions so that we can see some metrics behind what's going on. To get started select Tools|Performance Tools from the Visual Studio main menu like so:
When you've selected the Performance Tools option from the menu you will be presented with a dialog window for starting a new session as shown in the following image:
Profiling code produces VERY large files - maybe even gigabytes for what might be a short profiling session. Luckily for us we don't have a lot of code to profile so our session will be quite small this time and so we should select "Instrumentation" from the previous dialog window as opposed to "Sampling". Instrumentation will create much larger files and so when you are profiling a large amount of code you should start with Sampling turned on until you can zoom in on the problem area as this will create smaller vsp files and so you will be able to get through more performance sessions more quickly.
When you've created your Performance Session you will have a new session in the Performance Explorer window with a selected target for the session - this will typically be a project output that is observed during the session. In the following image we can see that the HasCacheableMethods project is loaded as a target for PerformanceSession1:
At this point you can press launch on the Performance Explorer window to start the session and fire up the target.
When the target has finished execution the session will end and you will be presented with a report containing some metrics from the session like so:
These metrics are pretty high level and can be quite ambiguous to begin with but by double-clicking on a row in that report you can drill-down into the next level of detail where things start to make some sense:
Here we start to get a really good view over the time that each method has spent running. In particular the "Elapsed Exclusive" metric details how long was spent only in that particular method whereas the "Elapsed Inclusive" metric is a rollup of that method and the time spent on methods called by that method and so on. In this report we can now see clearly that the Thread.Sleep call is where nearly all of the time was spent during our session, double click on that to see where it is called from so that we get some context about the work being done:
PerfConsole
There's another tool that has been released by Microsoft that was written by an internal developer which takes the output of a profiling session and allows you to write queries over it and to view a more graphical representation of the call tree, this tool is called PerfConsole. Before you can use PerfConsole you must save the results of your performance session out to a CSV file; this is done by right clicking on performance session report and selecting Export from the context menu:
Now you can choose what data to export and the location to create the exported CSV file
Now start up the PerfConsole tool and load the CSV file that was just created by using the load command and passing in the filepath (you can get a complete list of command line syntax for PerfConsole by typing ?? at the command line.
After loading my CSV file I can type in queries to view my performance metrics in a variety of ways, here I've used the following command to create a visual tree of the calltree
> calltree @HasCacheableMethods060811(1) | trim in > 5 | compacttree
Which displays the following result:
I'd highly encourage reading the PerfConsole command line syntax help or reading this blog article to learn more about the query syntax for PerfConsole:
http://blogs.msdn.com/joshwil/archive/2006/08/04/688599.aspx
After viewing these reports it's now very clear that ServiceAgent.MakeExternalCall is where the Thread.Sleep code is executed from and that, aside from the Thread.Sleep, the ServiceAgent.MakeExternalCall method actually runs fairly quickly - so Thread.Sleep is our enemy here. Now obviously we cannot optimize the Thread.Sleep code at all but we can start to ask some questions about how to get our code to spend less time in there. When we look at what is happening in our code (and remember that the Thread.Sleep in this example is a silly placeholder for a database call or some such other expensive operation) we can see that it's getting called on every call to ServiceAgent.MakeExternalCall whereas we could actually cache results for a distinct ID after the first call to it, so we change our MakeExternalCall code to add a layer of caching in like so:
static Dictionary<int, string> cache = new Dictionary<int, string>();
public static string MakeCachedExternalCall(int id) {
if (cache.ContainsKey(id))
return cache[id];
string work = DoesWork(id);
cache.Add(id, work) ;
return work;
}
Now that we've fixed our code we can re-run our tests and see that our test code run time has been reduced from 20 seconds to just over 4 seconds. Not a bad return for a little 30 minute performance fixing session! :-)