Pinterest co-founder Paul Sciarra shared a bit about their stack on Quora:
- Python + heavily-modified Django at the application layer
- Tornado and (very selectively) node.js as web-servers.
- Memcached and membase / redis for object- and logical-caching, respectively.
- RabbitMQ as a message queue.
- Nginx, HAProxy and Varnish for static-delivery and load-balancing.
- Persistent data storage using MySQL.
- MrJob on EMR for map-reduce.
Alex Popescu has created a cool diagram of the setup and provided some thoughtful analysis as well.
I’ve created the diagram above based on this very brief answer on Quora:
We use python + heavily-modified Django at the application layer. Tornado and (very selectively) node.js as web-servers. Memcached and membase / redis for object- and logical-caching, respectively. RabbitMQ as a message queue. Nginx, HAproxy and Varnish for static-delivery and load-balancing. Persistent data storage using MySQL. MrJob on EMR for map-reduce.
Data from October 2011 showed Pinterest having over 3 million users generating 400+ million pageviews. There are plenty of questions to be answered, though:
- what is node.js used for? what is RabbitMQ used for? Note: the whole section in the diagram about node.js and RabbitMQ is speculative.
- is Amazon Elastic MapReduce used for clickstream analysis only (log based analysis) or more than that?
- how is data loaded in the Amazon cloud? Note: if Amazon Elastic MapReduce is used only for analyzing logs, these are probably uploaded regularly to Amazon S3.
- why the need for both Redis and Membase?
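If Elastic MapReduce is indeed used for log-based clickstream analysis, the job would have roughly the shape below. This is a minimal pure-Python sketch of the map and reduce steps, not MrJob's API and not Pinterest's actual job. The log format, the sample lines, and the names `map_line`, `reduce_counts` and `pageviews_per_path` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical access-log format: "<timestamp> <user_id> <path>"
SAMPLE_LOG = [
    "2011-10-01T12:00:00 u1 /pin/42",
    "2011-10-01T12:00:05 u2 /pin/42",
    "2011-10-01T12:00:09 u1 /board/7",
    "malformed line",
]


def map_line(line):
    """Map step: emit (path, 1) for each well-formed log line."""
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each path."""
    totals = defaultdict(int)
    for path, count in pairs:
        totals[path] += count
    return dict(totals)


def pageviews_per_path(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(pageviews_per_path(SAMPLE_LOG))
```

On a real cluster the map and reduce steps would run on different machines over log files pulled from S3; here they run in one process so the data flow is easy to follow.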
With Pinterest we see a story very similar to that of Instagram. Huge growth, lots of users, lots of data, with remarkably few employees, all on the cloud.
While it’s true that both Pinterest and Instagram are not making great advances in science and technology, that is more an indicator of the easy power of today’s commodity environments than a sign of Silicon Valley’s lack of innovation. The numbers are so huge and the valuations are so high that we naturally want some sort of fundamental technological revolution to underlie their growth. The revolution is more subtle. It really is just that easy to attain such growth these days, if you can execute on the right idea. Get used to it. This is the new normal.
Here’s what Pinterest looks like today:
- 80 million objects stored in S3 with 410 terabytes of user data, 10x what they had in August. EC2 instances have grown by 3x. Around $39K for S3 and $30K for EC2.
- 12 employees as of last December. Using the cloud, a site can grow dramatically while maintaining a very small team. Looks like 31 employees as of now.
- Paying only for what you use saves money. Most traffic happens in the afternoons and evenings, so they reduce the number of instances at night by 40%. At peak traffic $52 an hour is spent on EC2, and at night, during off-peak hours, the spend is as little as $15 an hour.
- 150 EC2 instances in the web tier
- 90 instances for in-memory caching, which removes database load
- 35 instances used for internal purposes
- 70 master databases with a parallel set of backup databases in different regions around the world for redundancy
- Written in Python and Django
- Sharding is used. A database is split when it reaches 50% of capacity, which allows easy growth and gives sufficient IO capacity.
- ELB is used to load balance across instances. The ELB API makes it easy to move instances in and out of production.
- One of the fastest growing sites in history. Pinterest credits AWS with making it possible to handle 18 million visitors in March, a 50% increase from the previous month, with very little IT infrastructure.
- The cloud supports easy and low-cost experimentation. New services can be tested without buying new servers, so there are no big upfront costs.
- Hadoop-based Elastic MapReduce is used for data analysis and costs only a few hundred dollars a month.
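To get a feel for the pay-for-what-you-use numbers in the list above, here is a rough back-of-the-envelope calculation. The two hourly rates come from the article; the two-rate model itself and the 8-hour nightly window are assumptions, since real spend would ramp up and down with traffic rather than switch between two fixed rates.

```python
PEAK_RATE = 52.0      # dollars per hour at peak traffic (from the article)
OFF_PEAK_RATE = 15.0  # dollars per hour at night (from the article)


def daily_ec2_spend(off_peak_hours, peak_rate=PEAK_RATE, off_peak_rate=OFF_PEAK_RATE):
    """Daily spend if every hour is billed at one of the two quoted rates."""
    if not 0 <= off_peak_hours <= 24:
        raise ValueError("off_peak_hours must be between 0 and 24")
    peak_hours = 24 - off_peak_hours
    return peak_hours * peak_rate + off_peak_hours * off_peak_rate


def daily_savings(off_peak_hours):
    """Savings compared with running at the peak rate around the clock."""
    return daily_ec2_spend(0) - daily_ec2_spend(off_peak_hours)


if __name__ == "__main__":
    hours = 8  # assumed length of the nightly off-peak window
    print(f"spend:   ${daily_ec2_spend(hours):.2f} per day")
    print(f"savings: ${daily_savings(hours):.2f} per day")
```

Under these assumptions the fleet costs $952 a day instead of the $1,248 it would cost at the peak rate around the clock.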
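The sharding rule in the list above (split a database once it reaches 50% of capacity) can be sketched as follows. The article does not say how Pinterest performs the split, so the `Shard` class, the capacity figures, and the policy of dividing the data evenly across two new shards are all hypothetical.

```python
from dataclasses import dataclass

SPLIT_THRESHOLD = 0.5  # split at 50% of capacity, per the article


@dataclass
class Shard:
    name: str
    used_gb: float
    capacity_gb: float

    @property
    def utilization(self):
        """Fraction of the shard's capacity currently in use."""
        return self.used_gb / self.capacity_gb


def needs_split(shard, threshold=SPLIT_THRESHOLD):
    """True once the shard has reached the split threshold."""
    return shard.utilization >= threshold


def split(shard):
    """Assumed policy: divide the data evenly across two equal-capacity shards."""
    half = shard.used_gb / 2
    return (
        Shard(f"{shard.name}a", half, shard.capacity_gb),
        Shard(f"{shard.name}b", half, shard.capacity_gb),
    )


def rebalance(shards):
    """Return a new shard list with every over-threshold shard split once."""
    result = []
    for shard in shards:
        if needs_split(shard):
            result.extend(split(shard))
        else:
            result.append(shard)
    return result


if __name__ == "__main__":
    shards = [Shard("db1", 120, 200), Shard("db2", 40, 200)]
    for s in rebalance(shards):
        print(s.name, f"{s.utilization:.0%}")
```

Splitting at half capacity rather than near full leaves headroom, so each database still has spare IO and storage while the split is in progress.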
This article was originally posted on http://highscalability.com/blog/2012/5/21/pinterest-architecture-update-18-million-visitors-10x-growth.html