Pooling and performance: what benchmarks and profiling may reveal

Arseni Mourzenko
Founder and lead developer
November 10, 2014
Tags: profiling, performance, benchmark, optimization

When benchmarking Orseis, I discovered that with repetitive queries, its performance drastically declines. In some conditions, it was more than ten times slower than direct use of ADO.NET. This is totally unacceptable for a library which claims to be super fast, or at least fast enough to be used in a context where performance does matter. So I started to search for what was so terribly wrong that it was turning Orseis into a turtle.

While searching for bottlenecks, it appeared that the poor performance affects the whole library, including the part where I simply get a single value from the database (with the same ratio of ten when comparing the speed of Orseis to direct ADO.NET calls). The trick is that getting a single value doesn't involve anything potentially slow: there is no Reflection, nor any back and forth between the client and the database.

To my surprise, the issue was that Orseis creates, opens and then closes the database connection on every call, while my implementation of the reference code, the one which used ADO.NET directly, created the connection only once.

So I made a benchmark comparing code which reuses the SQL connection across multiple queries with code which recreates the connection for every query. The result: the same ratio of ten in some cases. Very surprising, since MSDN discusses pools but says nothing about the performance impact of not reusing the connection, and since answers on Stack Overflow are clear: close connections as soon as possible and let the pool do its job (example, another one and another one, etc.)

The following code was used to obtain the result: source code on SVN.
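The full benchmark is in the linked repository; as a rough sketch of the two variants being compared (the connection string and the query shown here are illustrative, not the actual benchmarked code), the comparison looks like this:

```csharp
using System.Data.SqlClient;
using System.Diagnostics;

static class PoolingBenchmark
{
    // Illustrative placeholder; the real benchmark targets a local .mdf file
    // through SQL Server Express.
    const string ConnectionString =
        @"Data Source=(local)\SQLEXPRESS;Integrated Security=true";

    // Variant 1: a single connection, reused for every query.
    static long ReuseConnection(int iterations)
    {
        var watch = Stopwatch.StartNew();
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open();
            for (int i = 0; i < iterations; i++)
            {
                using (var command = new SqlCommand(
                    "select count(*) from sys.tables", connection))
                {
                    command.ExecuteScalar();
                }
            }
        }
        return watch.ElapsedMilliseconds;
    }

    // Variant 2: the connection is created, opened and disposed
    // (returned to the pool) on every query.
    static long RecreateConnection(int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            using (var connection = new SqlConnection(ConnectionString))
            {
                connection.Open();
                using (var command = new SqlCommand(
                    "select count(*) from sys.tables", connection))
                {
                    command.ExecuteScalar();
                }
            }
        }
        return watch.ElapsedMilliseconds;
    }
}
```

Both variants run the same query the same number of times; the only difference is where the `using` block around the connection sits relative to the loop.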

I did sampling for 17 different values, from 2¹ to 2¹⁷. For each sampling, I measured two things:

The charts below illustrate the results for four configurations:

The database is an .mdf file used through a locally installed Microsoft SQL Server Express. The application is compiled in Release mode, compiled for x64 with code optimization enabled. Code contracts are disabled.

The results below don't include the first sampling, since its metrics are meaningless because of database warm-up.

[Chart: query duration for the four configurations, linear scale. Click for a larger version.]

Using a logarithmic scale, it looks pretty linear:

[Chart: query duration for the four configurations, logarithmic scale. Click for a larger version.]

Here are the ratios. Recreating the connection decreases performance already at just four consecutive queries, and the performance impact becomes much more important as the number of iterations grows. Notice that disk speed matters: the performance impact is greatest when testing on a hard disk drive instead of an SSD.

[Table: duration ratios between recreating and reusing the connection. Click for a larger version.]

What does the profiler tell?

Studying the results given by the Visual Studio profiler, it appears that the performance issue is not in System.Data.Common.DbConnection.Open() combined with the connection constructor and Dispose(), but in System.Data.Common.DbCommand.ExecuteScalar(). Curiouser and curiouser!

[Screenshot: Visual Studio profiler results. Click for a larger version.]

The result might have been related to the fact that the command I execute is always the same: the SQL text is identical, and there are no parameters. In fact, the result is nearly the same if we introduce parameters which change at every iteration, or if the command itself varies inside the loop.
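For the variant with changing parameters, each iteration keeps the command text constant but attaches a different parameter value. Again an illustrative sketch, not the benchmarked code:

```csharp
using System.Data.SqlClient;

static class ParameterizedVariant
{
    // Same command text on every iteration; only the parameter value changes.
    static void Run(SqlConnection connection, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            using (var command = new SqlCommand(
                "select count(*) from sys.columns where object_id > @id",
                connection))
            {
                // The value varies at every iteration, so the server cannot
                // serve a literally identical request each time.
                command.Parameters.AddWithValue("@id", i);
                command.ExecuteScalar();
            }
        }
    }
}
```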

Two possible explanations of those profiler results:

A simple modification of the original code shows that the second hypothesis is right. When the code related to SQL commands is removed, the benchmark for a sample of 5000 is 0 ms. when reusing the connection vs. 90 ms. when recreating it. Compare this to 505 ms. (reusing) and 6 079 ms. (recreating) when actually querying the database.


Connection pooling is great, since it simplifies your life by providing the ability to open connections when you need them and to close them as soon as you don't need them any longer, so they can return to the pool. Using pooling is also a great performance improvement: creating 500 connections takes approx. 600 ms. in the test environment described above when pooling is used, and 4 700 ms. when pooling is explicitly disabled.
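Pooling is controlled from the connection string. A sketch of how the two settings compared above can be expressed (the data source shown is hypothetical):

```csharp
using System.Data.SqlClient;

static class PoolingSettings
{
    // Default behaviour: pooling is on; Dispose() returns the
    // connection to the pool instead of closing the physical link.
    static readonly string Pooled = new SqlConnectionStringBuilder
    {
        DataSource = @"(local)\SQLEXPRESS", // hypothetical server name
        IntegratedSecurity = true,
        Pooling = true
    }.ConnectionString;

    // Pooling explicitly disabled: every Open() establishes a brand
    // new physical connection, which is what makes 500 connections
    // cost 4 700 ms. instead of 600 ms. in the measurements above.
    static readonly string Unpooled = new SqlConnectionStringBuilder
    {
        DataSource = @"(local)\SQLEXPRESS",
        IntegratedSecurity = true,
        Pooling = false
    }.ConnectionString;
}
```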

This greatness still comes at a performance cost compared to code which reuses the same connection, and in some situations, recreating connections can slow the code down by a factor of up to ten compared to similar code which uses only one instance of the connection.

Does it matter? Sort of. It's not like some personal website will magically become faster just because you started to reuse connections, but it may matter when you have thousands of queries. The results above show that even for an amount as small as 32 queries, you may reduce the duration from 34 ms. down to 4 ms., gaining 30 precious milliseconds. In some specific cases, those milliseconds matter.

Just don't do premature optimization, and don't assume you can easily reuse the same connection in every case. Think about what happens if the same connection is used at the same time from different threads, or about the consequences of keeping the connection while the connection pool is starving.
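Unless profiling shows that a specific hot loop justifies holding a connection, the safe default remains the idiomatic open-late, close-early pattern. A minimal sketch (method and query are illustrative):

```csharp
using System.Data.SqlClient;

static class SafeDefault
{
    // Each call gets its own connection; the pool makes this cheap
    // enough for most code, and there is no shared state to protect
    // from concurrent use by different threads.
    public static int CountTables(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "select count(*) from sys.tables", connection))
        {
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}
```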