What's wrong with SQL Server's FILESTREAM?

Arseni Mourzenko
Founder and lead developer
176
articles
November 14, 2014
Tags: database 2 short 50

Mi­crosoft SQL Serv­er con­tains a fea­ture called FILESTREAM. For those who don't use SQL Serv­er, in short, this fea­ture al­lows to store very large chunks of data in a data­base with all the ben­e­fits and none of the draw­backs. Un­der the hood, Mi­crosoft SQL Serv­er stores just the list of files in the MDF (MDF be­ing the file where the data of the data­base is stored), while the ac­tu­al files are stored sep­a­rate­ly. The fea­ture tar­gets large files: files be­low 1 MB are ex­pect­ed to be stored in a varbinary(max) field.

This fea­ture avoids the usu­al draw­back of stor­ing large chunks of data in a re­la­tion­al data­base. Usu­al­ly, if you're stor­ing a few thou­sand of videos of 4 GB each, you'll quick­ly come to con­clu­sion that re­la­tion­al data­bas­es don't han­dle it... very well, and also that you need to buy a new serv­er with ten times the RAM you have right now. With FILESTREAM, there is prac­ti­cal­ly no im­pact of large files on the data­base. You can store chunks of 100 GB of data each, and it wouldn't mat­ter.

I did it too. Maybe not 100 GB files, but videos from my col­lec­tion, the larg­er be­ing 5 GB, with ap­prox­i­mate­ly 100 files. I ad­mit, this was a mis­take.

The ma­jor ben­e­fit of us­ing FILESTREAM—of stor­ing files in a data­base with­out per­for­mance draw­back—is the uni­fied man­age­ment of data. With this fea­ture, I don't need to deal with those files sep­a­rate­ly: Mi­crosoft SQL Serv­er deals with them as if it were or­di­nary data. Es­pe­cial­ly, I don't need to care about a sep­a­rate back­up for those files: a main­te­nance plan of the data­base will back up those files as well.

An­oth­er ben­e­fit could have been the trans­ac­tion­al ca­pa­bil­i­ty, but with .NET's TransactionScope and NTFS trans­ac­tions, you have the same ben­e­fit with or­di­nary files as well. This means that the sin­gle ben­e­fit re­mains the data man­age­ment.

But here comes the prob­lem: in most cas­es, we don't need uni­fied man­age­ment of data. More­over, we need to sep­a­rate as much as pos­si­ble the man­age­ment of large chunks of data from the man­age­ment of pieces of in­for­ma­tion which are usu­al­ly stored in a re­la­tion­al data­base.

Let's look at Adobe Light­room mod­el. Light­room has two lo­ca­tions where in­ter­est­ing stuff is stored:

This dis­tinc­tion be­tween con­stant­ly chang­ing cat­a­log con­tain­ing small pieces of in­for­ma­tion and large data chunks and the way those two are han­dled by Light­room leads to an ob­ser­va­tion that it doesn't make sense to man­age those two types of in­for­ma­tion in the same way.

Now, what hap­pens if pho­tos are stored with­in the cat­a­log? Right, you'll end up with an in­ad­e­quate back­up plan.

If pho­tos and cat­a­log are stored sep­a­rate­ly, you can use Ama­zon Glac­i­er for the pho­tos and Ama­zon S3 for the cat­a­log. If pho­tos and cat­a­log are mixed and should be man­aged the same way, you'll end up pay­ing much more to Ama­zon, be­cause ei­ther your cat­a­log will change too of­ten for Glac­i­er, or your pho­tos will take too much space for S3.

Does it mean that FILESTREAM is use­less?

Not at all. It can be used in some very spe­cif­ic ap­pli­ca­tions with great suc­cess. It's just that it is ex­treme­ly over­rat­ed, and peo­ple are us­ing it in sce­nar­ios where they would bet­ter be us­ing or­di­nary files.