The Design of 99designs – A Clean Tens of Millions Pageviews Architecture
99designs is a crowdsourced design contest marketplace based in Melbourne, Australia. The idea is that if you need a design created, you set up a contest and designers compete to give you the best design within your budget.
If you run a medium-sized commerce site, this is a clean example of an architecture that reliably supports a lot of users and a complex workflow in the cloud. Lars Yencken wrote a nice overview of the architecture behind 99designs in Infrastructure at 99designs. Here’s a gloss on their architecture:
• Team has 8 devs, 2 dev ops, 2 ux/designers
• Hundreds of thousands of unique visitors a month
• Tens of millions of pageviews a month
• Largely an Amazon-based stack
• Elastic Load Balancer (ELB)
• PHP with Apache/mod_php
• Beanstalk for in-memory queuing using Pheanstalk bindings
• Amazon’s RDS (MySQL)
• NewRelic, CloudWatch, Statsd
• Layered architecture: load balancing, acceleration, application, asynchronous tasks, storage and transient data.
• ELB is reliable and handles load balancing and terminating SSL connections so that traffic is unencrypted below the ELB. A separate ELB is used for each domain.
• Varnish is used to serve file-based long-tail media.
• Varnish is fast, configurable, has a DSL, and has useful command line tools for debugging live traffic.
• Dynamic and uncached requests are served from a PHP application.
• Designs are stored on S3.
• S3 latencies are poor so designs are cached locally after each request.
• Requests that may take a long time are asynchronously queued to an in-memory queue implemented using Beanstalk, which is lightweight and performs well.
• PHP workers read work off the queue and execute the required functionality.
• Scheduled jobs are queued using cron at the appropriate time.
• Amazon’s RDS is used as the database and uses master-master replication across multiple availability zones for redundancy.
• Rolling RDS backups are used for disaster recovery.
• As load increases, requests are load balanced across the read slaves.
• S3 stores media files and data files.
• Backups are made to Rackspace Cloud Files for disaster recovery.
• Memcached is run on every server and is used to cache queries.
• Capped collections in MongoDB are used to log errors and statistics.
• Redis stores per-user information about which features are enabled for a user.
• Per-user configuration is used for dark launches, soft launches and incremental feature rollouts.
• Amazon allows them not to own any hardware and remain flexible.
• Emphasis on automation using “software as infrastructure” ethos.
• RightScale manages server configurations using Chef. Servers are disposable.
• Monitoring is implemented using NewRelic, CloudWatch, Statsd. Two large monitoring screens display a dashboard of site behavior.
• Test to scale down. Highly variable load means they make heavy use of the cloud’s scaling down capability, which requires a lot of testing to make work.
• International customers need a CDN. They have a lot of international customers and since they serve out of US-east these customers don’t always get a quality experience. They are looking at various CDNs to give better service to international customers.
• Maintaining stability while growing requires testing and automation. To support frequent releases they are implementing acceptance testing and more automation. The ability to turn features on and off on a per-user basis allows testing new features against a subset of users.
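The per-user feature switches mentioned in the list above can be sketched roughly as follows. This is a minimal Python illustration, not 99designs' code: their application is PHP and their store is Redis, whereas here a plain dict stands in for the store. The class `FeatureFlags`, its method names, the feature name `new_checkout`, and the hash-based percentage rollout are all assumptions made for the example.

```python
import hashlib


class FeatureFlags:
    """Hypothetical per-user feature switches.

    A plain dict stands in for the key-value store (Redis in the
    architecture described above). All names here are illustrative.
    """

    def __init__(self, store=None):
        # Maps (feature, user_id) -> True/False explicit overrides.
        self.store = store if store is not None else {}
        # Maps feature -> percentage of users (0-100) who get it by default.
        self.rollout_percent = {}

    def enable_for_user(self, feature, user_id):
        # Dark or soft launch: switch the feature on for one chosen user.
        self.store[(feature, user_id)] = True

    def disable_for_user(self, feature, user_id):
        # Explicitly switch the feature off for one user.
        self.store[(feature, user_id)] = False

    def set_rollout(self, feature, percent):
        # Incremental rollout: expose the feature to a share of all users.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.rollout_percent[feature] = percent

    def _bucket(self, feature, user_id):
        # Stable bucket in [0, 100) so a user keeps the same answer
        # across requests and across servers.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, feature, user_id):
        # An explicit per-user override always wins over the rollout.
        override = self.store.get((feature, user_id))
        if override is not None:
            return override
        return self._bucket(feature, user_id) < self.rollout_percent.get(feature, 0)


flags = FeatureFlags()
flags.enable_for_user("new_checkout", 42)   # soft launch to one tester
flags.set_rollout("new_checkout", 10)       # then open up to about 10% of users
```

Because the bucket is derived from a hash of the feature name and user id, raising the rollout percentage only ever adds users; nobody who already has the feature loses it as the rollout widens.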