The Challenge of Downloading HUGE Files

One of the things we do at Kaon Interactive is create Kaon v-Briefs: rich, interactive applications that tell some or all of our customers’ product and solution stories. These Kaon v-Briefs often contain a lot of collateral (videos, PowerPoint presentations that we converted to video, PDFs, etc.), so they can get really, really big. When we deliver them for use on our Kaon v-OSK touch-screen appliances, that’s not a big deal: we can put the Kaon v-Briefs on 32GB USB flash drives if we need to. But part of our value proposition is that the field sales force can download and run this content on their laptops, using our Desktop media client (called Meson).

It turns out that downloading files larger than 100MB can be a problem. We use Amazon S3 and CloudFront to distribute the content, which are best-of-breed solutions for handling downloads (Dropbox and Netflix use the same services). Nonetheless, we’ve seen a steady stream of issues with people having trouble downloading massive files.

Part of the problem is speed. When someone comes to the portal, the first thing we do is download a 1MB image from CloudFront and see how long it takes; the result is reported back with a bit of AJAX.
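
The probe itself is nothing fancy. As a rough command-line sketch of the same measurement (the URL here is a placeholder, not our real CloudFront path):

# curl prints the average transfer rate for the 1MB fetch, in bytes/sec
curl -s -o /dev/null -w '%{speed_download} bytes/sec\n' \
  https://dxxxxxxxxxxxx.cloudfront.net/speedtest-1mb.jpg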

Here is the real distribution of download speeds from about 4000 field salespeople (speeds are in KBytes/second): the median looks to be about 300 KByte/sec, and reported speeds range from 21 Kbit/sec to 15 Mbit/sec (basically from telephone modems through cable modems). I should point out that the people doing these downloads are, by and large, in the network and telecommunications business, and this is a global sales force.

So let’s look at how long it would take to download a 4GB v-Brief if you are that 300 KByte/sec median employee: 4 * 1000 * 1000 KByte / 300 KByte/sec = 13,333 sec = 3.7 hours. But that assumes you get maximum throughput from TCP, which you won’t, so it’s going to be more like 12 hours. I’ll let you in on a little secret: your internet connection isn’t that stable. And browsers are notoriously bad at handling interrupted service during a file download.

Part of the solution is to break up the Kaon v-Brief so it only has the stuff a particular person is selling. And part of it is to make the Desktop version of the Kaon v-Brief smaller by dropping videos, which don’t have enough value in a sales situation to justify the pain of downloading them.

But suppose you really want to download that file anyway. That’s where BitTorrent comes in. Since we use Amazon S3 for storage, we get BitTorrent support without doing anything special: just slap ?torrent onto the end of an S3 URL and you get back a .torrent file, with Amazon running the tracker (there’s an example after the list below). BitTorrent is a fantastic way of sending huge files:

  • The download client is free, easy to install, and easy to use.
  • It’s resilient, so you can start/stop as often as you like, and you never lose any of the file you’ve gotten so far.
  • It’s fast, particularly if there are a lot of people downloading the same file.
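
To make that concrete, here is a minimal sketch of the ?torrent trick (the bucket and file names are made up for illustration):

# Requesting the object itself: https://s3.amazonaws.com/our-bucket/vbrief.zip
# Appending ?torrent returns a small .torrent file pointing at Amazon's tracker:
curl -o vbrief.torrent "https://s3.amazonaws.com/our-bucket/vbrief.zip?torrent"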

But there’s a catch: S3 seeds torrents at just 70 KByte/sec or so, which isn’t fast at all (unless you are that poor employee stuck at 21 Kbit/sec, in which case 70 KByte/sec would seem amazing!). The way BitTorrent works, you can download the file from multiple places at the same time: the client fetches chunks from every peer it can find and reassembles them. So we can speed this up considerably by having one of the computers in our data center also seed all the files we make available via BitTorrent. So that’s what we did.

After some poking around with different options, I settled on transmission-daemon. The server is running Slackware 13 x64, so I built the program from source, after chasing down a couple of dependencies and using this trick to get autoconf’s configure step to work right:

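# The point of the trick: pkg-config on this 64-bit system searches lib64,
# so link it to where the freshly built dependencies installed their .pc files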
ln -s /usr/local/lib/pkgconfig /usr/local/lib64/pkgconfig

I was able to get transmission-daemon up and running. This is basically a headless BitTorrent client which can be easily configured to seed. After reading the help pages, I settled on this command line:

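# --watch-dir: any .torrent file dropped here is picked up automatically
# --download-dir: where the (already complete) payload files live
# --no-portmap: skip UPnP/NAT-PMP port mapping
# --no-global-seedratio: never stop seeding because a ratio was reached
# --no-dht: disable the distributed hash table (see the note below)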
LD_LIBRARY_PATH=/usr/local/lib transmission-daemon \
--watch-dir /raid/torrent/torrents \
--incomplete-dir /raid/torrent/incomplete \
--download-dir /raid/torrent/download \
--no-portmap --no-global-seedratio --no-dht

When a user clicks the link to download a .torrent file from our portal (which is a cloud-based web application), the cloud app sends a message to a server in our data center to start serving that file. Because of the way we push the files into S3, that server is pretty much guaranteed to already have a copy. So before putting the torrent into the watch-dir, I copy the completed file into the download-dir. Voilà: the server is instantly ready to seed.
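
In other words, the data-center side of the handoff boils down to something like this (the paths match the daemon flags above; the file names are made up):

# The payload already lives on this box, so stage it where the daemon expects it...
cp /raid/content/vbrief.zip /raid/torrent/download/
# ...then drop the .torrent into the watch-dir; the daemon sees the data is
# already complete and starts seeding right away
cp /raid/content/vbrief.torrent /raid/torrent/torrents/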

The only glitch I hit was the need to add that --no-dht option. Without it, the BitTorrent clients I tested with would ignore this seed. (I won’t pretend to begin to understand what the rules are here.)

With this trick, we get the rock-solid reliability of Amazon S3, plus the speed of having a dedicated peer ready to go all-out to prime the swarm.

Next up: I’m looking at ways of making it easy for that 21 Kbit/sec user to get this content burned to a DVD and shipped by mail.
