Languages usage over time

Arseni Mourzenko
Founder and lead developer
May 20, 2021
Tags: productivity, quality

I started programming a long, long time ago. The oldest source code I can find in the archives is from 2004. Back then, I was seventeen years old, using Visual Basic and something that I believed to be C++. There were even older projects, but I can't figure out if I kept them anywhere. It was in 2009 that I started using version control. Originally, I was creating local repositories for every project. Later on, I merged all of them into a single repository.

For the past few years, I have been tempted to check what useful data I could get from ten thousand commits over eleven years. After all, there is a lot of stuff there, and there should be a lot of opportunities to do some data analysis (sorry, I meant business intelligence, no, wait, big data, or whatever fashionable term is being used right now).

Among other things, I was wondering which languages I was using over time. I had several hypotheses:

  1. As I moved to a Linux stack around 2014, I stopped using C# for personal projects, and started writing a lot of Python, Bash, and JavaScript.

  2. More re­cent­ly, I start­ed us­ing C++ for Ar­duino pro­jects, but it re­mained mar­gin­al com­pared to the oth­er lan­guages.

  3. The use of PHP should not be as significant as that of C# from 2010 to 2014, and should completely disappear in the later years.

  4. There should be a pretty heavy use of SQL before 2014, when I was relying a lot on Microsoft SQL Server, but not so much after 2014, when I moved to MongoDB.

It was therefore important to get a picture of how the different languages were used over time. Usually, such comparisons are based on the number of files in a repository, or the number of commits affecting those files, which gives only a rough and often erroneous idea. I tried to get a clearer view by computing, for each language, the number of lines of code added or removed in each commit.

Meth­ods

The results are based on more than ten thousand commits in my personal repositories. For every diff, I detected the language being used, based on the extension of the file. When the file had no extension, the first line of the file was used to detect a possible shebang. I ignored some file types, such as configuration files, images, and anything else that doesn't represent source code. For instance, XML files and DNS configuration files are not part of the results.
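The detection step described above could be sketched in Python as follows. This is an illustration under stated assumptions, not the actual script: the function name `detect_language`, the extension table, the shebang table, and the list of ignored extensions are all hypothetical, since the article does not list them.

```python
import os

# Hypothetical mapping; the article does not list the extensions it used.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".sh": "Bash",
    ".cs": "C#",
    ".cpp": "C++",
    ".js": "JavaScript",
    ".php": "PHP",
    ".sql": "SQL",
}

# Interpreter names that may appear in a shebang line (also an assumption).
SHEBANG_TO_LANGUAGE = {
    "python": "Python",
    "python3": "Python",
    "bash": "Bash",
    "sh": "Bash",
    "node": "JavaScript",
}

# File types treated as non-source (assumed examples); they yield None.
IGNORED_EXTENSIONS = {".xml", ".png", ".jpg", ".zone"}


def detect_language(path, first_line=""):
    """Return a language name, or None when the file should be skipped."""
    extension = os.path.splitext(path)[1].lower()
    if extension in IGNORED_EXTENSIONS:
        return None
    if extension:
        # Unknown extensions are skipped as well.
        return EXTENSION_TO_LANGUAGE.get(extension)
    # No extension: fall back to the shebang, e.g. "#!/usr/bin/env python3".
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    interpreter = os.path.basename(parts[0])
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]
    return SHEBANG_TO_LANGUAGE.get(interpreter)
```

With these tables, `detect_language("deploy", "#!/usr/bin/env python3")` resolves to Python through the shebang, while `detect_language("config.xml")` returns `None` and the file is left out of the results.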

Once the language was detected for a given file, I computed the number of lines of code which were added and removed in the commit. Duplicates, that is, lines moved within a file or to a different file within the same commit, were ignored. Note, however, that there is no copy-paste detection (i.e. a block of code copied from elsewhere in the source will be counted as new code), nor do I keep track of the changes between commits: a rollback of another commit would therefore be counted as ordinary added or removed code.

Lines which contain only whitespace characters are ignored. On the other hand, I do not exclude lines which contain just basic symbols such as a closing curly bracket, even if such lines can hardly be considered code. This has an important repercussion when comparing languages which have a more verbose syntax, such as C#, with the ones which don't, such as Python. To avoid making the algorithm too complex, I didn't make any exceptions for comments either: a line containing only a comment is counted as an ordinary line of code.
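The counting rules above (added and removed lines, moved lines ignored, whitespace-only lines skipped, brackets and comments counted) can be illustrated with a short Python sketch. It assumes each file's change is available as a unified diff, and that a line is "moved" when the same text, compared after stripping surrounding whitespace, is both removed and added within one commit. The names `changed_lines` and `count_commit` are hypothetical, and the real implementation may differ.

```python
from collections import Counter, defaultdict


def changed_lines(diff_text):
    """Yield (sign, content) for each non-blank changed line of one file's unified diff."""
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # the "---" and "+++" file headers come before the first hunk
            continue
        if not in_hunk or line[:1] not in ("+", "-"):
            continue
        content = line[1:].strip()
        if content:  # whitespace-only lines are ignored
            yield line[0], content


def count_commit(file_diffs):
    """file_diffs: iterable of (language, diff_text). Returns {language: [added, removed]}."""
    added, removed = [], []
    for language, diff_text in file_diffs:
        for sign, content in changed_lines(diff_text):
            (added if sign == "+" else removed).append((language, content))

    # Assumption: text that is both removed and added in the same commit was moved.
    moved = Counter(c for _, c in added) & Counter(c for _, c in removed)

    totals = defaultdict(lambda: [0, 0])
    for index, entries in enumerate((added, removed)):
        budget = Counter(moved)  # fresh copy for each direction
        for language, content in entries:
            if budget[content] > 0:
                budget[content] -= 1  # skip one occurrence of a moved line
            else:
                totals[language][index] += 1
    return dict(totals)
```

Note that a lone `}` or a comment-only line passes the `if content` check, so it is counted like any other line, which matches the limitation described above.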

The lines of code were then aggregated to form a chart, where one line in the chart represents n days of commits. The aggregated results are shown on a logarithmic scale, which is necessary given the huge gap between the weeks which had practically no activity and the ones where a lot of code was added or removed.
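As a rough illustration of this aggregation, the sketch below groups per-commit line counts into buckets of n days and maps each total to a logarithmic height. The function names, the default of seven days, the way bucket boundaries are chosen, and the `log10(1 + x)` transform are all assumptions; the article only states that the results are grouped by n days and shown on a logarithmic scale.

```python
import math
from collections import defaultdict
from datetime import date


def aggregate(commits, days_per_bucket=7):
    """commits: iterable of (date, lines_changed). Returns {bucket_start: total}, sorted by date."""
    totals = defaultdict(int)
    for day, lines_changed in commits:
        ordinal = day.toordinal()
        # Align every date to the start of its n-day bucket.
        bucket_start = date.fromordinal(ordinal - ordinal % days_per_bucket)
        totals[bucket_start] += lines_changed
    return dict(sorted(totals.items()))


def log_height(total):
    """Map a line count to a chart height; log10(1 + x) keeps empty buckets at zero."""
    return math.log10(1 + total)


if __name__ == "__main__":
    sample = [
        (date(2021, 5, 17), 10),
        (date(2021, 5, 18), 5),
        (date(2021, 5, 30), 1000),
    ]
    for start, total in aggregate(sample).items():
        print(start, total, round(log_height(total), 2))
```

The logarithm is what keeps a quiet bucket of a dozen lines visible next to a bucket with thousands of changed lines.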

Re­sults

The in­ter­ac­tive chart be­low shows the LOCs for some of the lan­guages. In gray are all the lan­guages com­bined.

[Interactive chart. Legend: C++, Bash, Python, C#, EJS, Java, JavaScript, LESS, PHP, SQL, XSLT]

Figure 1. Number of LOCs added and removed over time.

Dis­cus­sion

The data makes it possible to check the original hypotheses:

  1. As I moved to a Linux stack around 2014, I stopped using C# for personal projects, and started writing a lot of Python, Bash, and JavaScript.

    True, except for JavaScript. I was surprised to see that the usage of JavaScript only decreased after 2014. The peak in 2014 corresponds to the development of the original blog engine in Node.js. The activity from 2009 to 2014 should match the large number of websites and web applications I was building. Later on, I was focusing on REST services, which don't have any JavaScript, as well as on web applications such as the Bookshelf app which contain nearly no code running in the browser.

  2. More re­cent­ly, I start­ed us­ing C++ for Ar­duino pro­jects, but it re­mained mar­gin­al com­pared to the oth­er lan­guages.

    Not that marginal. As I use C++ in only one project, the commits affecting C++ code are not regular. However, there are at least three periods where a lot of C++ was changed over several weeks.

  3. The use of PHP should not be as significant as that of C# from 2010 to 2014, and should completely disappear in the later years.

    True. While I was writing a lot of PHP in 2008 and 2009, I have used it only occasionally since 2010, and stopped completely after 2014. I hope I'll get back to it, now that it has PSRs, Laravel, and some new cool features.

  4. There should be a pretty heavy use of SQL before 2014, when I was relying a lot on Microsoft SQL Server, but not so much after 2014, when I moved to MongoDB.

    Mostly true. I had overlooked 2018, when I was creating an offsite backup tool which relies on PostgreSQL.

Visualizations such as this one make it easy to identify some patterns: the languages used the most, gaps, shifts to different technologies. They have their limitations, like anything based on lines of code. For instance, it looks like I was writing more code before 2014, but I would imagine that it has to do with C# using more lines of code than Python. Overall, it could be an interesting tool for someone who wants to understand a bit better which languages they used in the past and use now.