Facebook's photo storage rewrite
http://www.niallkennedy.com/blog/2009/04/facebook-haystack.html
Cachr
"Facebook will complete its roll-out of a new photo storage system designed to reduce the social network's reliance on expensive proprietary solutions from NetApp and Akamai."
Haystack - Search for Django
Will eventually need this to replace __istartswith lookups.
Search doesn't have to be hard. Haystack lets you write your search code once and choose the search engine you want it to run on. With a familiar API that should make any Djangonaut feel right at home and an architecture that allows you to swap things in and out as you need to, it's how search ought to be.
Modular Search for Django
Haystack is a modular search framework for Django. It works directly with Django Models and provides a familiar API to people who are comfortable with Django.
Engineering @ Facebook's Notes | Facebook
Article about the data architecture of Facebook's photo system. Seems interesting.
The Photos application is one of Facebook’s most popular features. To date, users have uploaded over 15 billion photos, which makes Facebook the biggest photo sharing website. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 60 billion images and 1.5PB of storage. The current growth rate is 220 million new photos per week, which translates to 25TB of additional storage consumed weekly. At the peak there are 550,000 images served per second. These numbers pose a significant challenge for the Facebook photo storage infrastructure.
NFS photo infrastructure
The old photo infrastructure consisted of several tiers:
* Upload tier receives users’ photo uploads, scales the original images and saves them on the NFS storage tier.
* Photo serving tier receives HTTP requests for photo images and serves them from the NFS storage tier.
* NFS storage tier built on top of commercial storage appliances.
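The storage figures quoted in the excerpt above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes), and the variable names are made up for illustration. The per-image averages are derived here and are not stated in the article.

```python
# Figures quoted in the excerpt (decimal units are an assumption).
photos_uploaded = 15_000_000_000                 # "over 15 billion photos"
images_per_photo = 4                             # four sizes stored per upload
total_storage_bytes = 1_500_000_000_000_000      # 1.5 PB
new_photos_per_week = 220_000_000                # 220 million per week
new_storage_per_week_bytes = 25_000_000_000_000  # 25 TB per week

# 15 billion photos x 4 sizes = 60 billion stored images, as the excerpt says.
total_images = photos_uploaded * images_per_photo

# Derived value: average size of one stored image, about 25 KB.
avg_image_bytes = total_storage_bytes / total_images

# Derived values: images added per week and their average size, about 28 KB.
new_images_per_week = new_photos_per_week * images_per_photo
avg_new_image_bytes = new_storage_per_week_bytes / new_images_per_week

print(f"total images:        {total_images:,}")
print(f"avg image size:      {avg_image_bytes / 1000:.1f} KB")
print(f"new images per week: {new_images_per_week:,}")
print(f"avg new image size:  {avg_new_image_bytes / 1000:.1f} KB")
```

Both averages land in the tens of kilobytes, so the workload is dominated by a very large number of small files rather than by a few large ones.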
Since each image is stored in its own file, there is an enormous amount of metadata generated on the storage tier due to the namespace directories and file inodes. The amount of metadata far exceeds the caching abilities of the NFS storage tier, resulting in multiple I/O operations per photo upload or read request. The whole photo serving infrastructure is bottlenecked on the high metadata overhead of the NFS storage tier, which is one of the reasons why Facebook relies heavily on CDNs to serve photos. Two additional optimizations were deployed in order to mitigate this problem to some degree:
Facebook | Engineering @ Facebook's Notes
Needle in a haystack: efficient storage of billions of photos