Craigslist MongoDB Migration Case Study: Solving Archive Analytics Challenges
2011 NoSQL Case Study
Situation: The Growing Archive Problem
Craigslist, the American classified advertisements website serving 570 cities in 50 countries, faced a critical data management challenge in 2011. As one of the most popular classified platforms globally, Craigslist was processing 1.5 million new classified ads posted every day, generating an enormous volume of data that required careful management for both operational and legal compliance reasons.
The company's data architecture had been built around MySQL from its inception in 1995. For much of the history of Craigslist, MySQL was the only option for data storage, including the archive. Their infrastructure consisted of over one hundred MySQL servers handling both live postings and historical archives.
Legal requirements mandated that after a 60 day retention period in the live portion of the site, records must be migrated over to an archival space for legislative compliance. This meant Craigslist had to maintain billions of records in many different formats while ensuring they could query and report on these archives at runtime for various analytical needs.
The scale of the challenge was immense: by 2011, Craigslist had accumulated over a billion records in their archive system, with the data growing continuously as new postings expired and moved from the live database to historical storage.
Task: Critical Performance and Operational Challenges
Craigslist faced several interconnected challenges that threatened their ability to scale and serve customers effectively:
Schema Evolution Bottlenecks
The most critical issue was schema changes taking forever. The structure of their data had changed several times over the years. This alone made any change to the database schema a costly, prolonged nightmare, as changes often meant downtime, and of course, any alteration comes with the potential of unintended consequences.
When making changes to billions of rows in their MySQL cluster, Craigslist could not move data to the archive. The schema evolution process created a cascading problem: each change to the live database schema required a corresponding change to the entire archive – a process that took months every time.
Performance Degradation
During these updates, the archival process had to be put on hold, which meant stale data piled up in the live databases, slowing down the site's performance. Archive-ready data would pile up in the production database; performance on the live database deteriorated.
This created a vicious cycle where: 1. Schema changes blocked the archiving process 2. Expired posts accumulated in production databases 3. Production database performance suffered 4. User experience degraded 5. Business operations were impacted
Operational Complexity
Using a relational database system limited flexibility and caused lengthy delays because changes to the live database schema needed to be propagated to the archive system. The operational burden of maintaining synchronized schemas across over one hundred MySQL servers while ensuring zero data loss became increasingly unsustainable.
Analytics and Reporting Challenges
The rigid relational structure made it difficult to accommodate the diverse and evolving metadata associated with different types of classified posts. The nature of MySQL created complexity, forcing Craigslist to start exploring NoSQL options that could handle a huge amount of incoming data, simultaneously stream the archive process, and all while scale up easily over time.
Approach: Strategic Migration to MongoDB
Under the leadership of Jeremy Zawodny, lead developer and co-author of "High Performance MySQL," Craigslist developed a hybrid architecture strategy that preserved their proven MySQL infrastructure for live operations while solving the archive challenges with MongoDB.
Technology Evaluation and Selection
After evaluating several NoSQL options, Craigslist settled upon MongoDB. The selection criteria focused on several key capabilities:
Built-in Scalability: One compelling reason is that MongoDB boasts built-in scalability. MongoDB's auto-sharding capabilities meant Craigslist could scale horizontally across commodity hardware without having to write and maintain complex, custom sharding code.
Schema Flexibility: Each post and its metadata can be stored as a single document. As the schema changes on the live database, MongoDB can accommodate these changes without costly schema migrations.
Operational Simplicity: MongoDB's support for auto-sharding and high availability eased operational pain points for Craigslist.
Hybrid Architecture Design
Rather than replacing their entire MySQL infrastructure, Craigslist implemented a strategic hybrid approach:
MySQL for Live Operations: MySQL is still the active database for all of the online properties and postings for Craigslist. The proven MySQL infrastructure continued handling all live postings, user interactions, and real-time operations.
MongoDB for Archives: Craigslist is just using MongoDB for its archive of deleted and expired posts, not for the posts that are live on the site. However, the archive is in use by the live site – it's not just "cold storage" for old posts.
Seamless Data Flow: Once a posting goes dead, MongoDB reads into the MySQL and writes that posting into JSON-like documents, which is how MongoDB stores its information. By doing that, they were able to provide a schema-less design that allowed them the flexibility to archive multiple years of files without worrying about failure or flexibility in design.
Migration Strategy and Implementation
The migration process was carefully planned and executed:
Data Preparation: Including the time needed to sanitize and prep the data, migrating 1.5 billion postings to the new archive database only took about three months.
Scale Planning: Using auto-sharding, Craigslist's initial MongoDB deployment was designed to hold over 5 billion documents and 10TB of data.
Developer Transition: MongoDB concepts and features are similar, in many respects, to relational databases so Craigslist's developers found the transition seamless. Jeremy Zawodny noted: "Coming from a relational background, specifically a MySQL background, a lot of the concepts carry over…. It makes it very easy to get started".
Infrastructure Architecture
By 2012, Craigslist's infrastructure included:
- MySQL to serve live postings, handle abuse, data for monitoring system, and other immediate needs
- MongoDB to store almost 3 billion items related to archived (no longer live) postings
- HAproxy to direct requests to the proper MySQL server in a cluster
- Memcache, lying between the web servers and the MySQL database in classic fashion
- Redis for temporary items such as counters and blobs
Redundancy is built in at every stage (three HAproxy instances used in round robin, two Memcache instances holding the same data, two data centers for the MongoDB archive, etc.).
Results: Measurable and Intangible Benefits
The migration to MongoDB delivered significant improvements across multiple dimensions:
Measurable Performance Improvements
Rapid Migration Completion: In 2011, Craigslist migrated over two billion documents to MongoDB in approximately three months, demonstrating the efficiency of the new architecture.
Massive Scale Achievement: The MongoDB deployment successfully handled almost 3 billion items related to archived (no longer live) postings by 2012, with the system designed to scale to over 5 billion documents and 10TB of data.
Eliminated Schema Migration Downtime: Archiving now occurs seamlessly, even when the MySQL schema undergoes changes. The previous months-long schema migration processes were eliminated entirely.
Infrastructure Efficiency: For efficiency, they have switched to SSDs, allowing them to scale down from 20 servers to only 3 in certain components of their infrastructure.
Operational Benefits
Zero-Downtime Schema Evolution: The most significant operational improvement was the elimination of costly schema migrations. MongoDB's schema-flexible approach meant that changes to live database schemas no longer required corresponding changes to the archive system.
Continuous Archive Processing: Archive-ready data would pile up in the production database; performance on the live database deteriorated was no longer an issue. The archive system could continue processing regardless of schema changes to production systems.
Simplified Operations: MongoDB enabled Craigslist to scale horizontally across commodity hardware without having to write and maintain complex, custom sharding code, reducing operational complexity and maintenance overhead.
High Availability: The implementation included two data centers for the MongoDB archive, ensuring business continuity and data protection.
Intangible Business Benefits
Enhanced Analytics Capabilities: With posts and metadata stored as single documents, complex analytical queries across historical data became more efficient and flexible. The JSON-like document structure enabled richer metadata analysis without the constraints of normalized relational schemas.
Improved Development Velocity: MongoDB concepts and features are similar, in many respects, to relational databases so Craigslist's developers found the transition seamless. The familiar concepts combined with increased flexibility accelerated development cycles.
Future-Proof Architecture: The schema-flexible approach positioned Craigslist to adapt to changing business requirements without the architectural constraints that previously limited their evolution.
Competitive Advantage: To prevent further impediments to the company's growth and ability to serve its customers, the MongoDB solution eliminated a major technical bottleneck that could have constrained business growth.
Risk Mitigation: The hybrid approach preserved the stability of proven MySQL infrastructure for mission-critical live operations while solving the archive challenges with modern NoSQL technology.
Long-term Strategic Impact
The successful MongoDB implementation demonstrated that NoSQL technologies could solve specific architectural challenges without requiring wholesale replacement of existing systems. MySQL is still revered, it's a stellar relational database, and the people in charge didn't want to stop using it for data in active online postings. It was the dead postings that needed a better "graveyard".
This hybrid approach became a model for other organizations facing similar scale and flexibility challenges, showing that strategic technology adoption could deliver significant benefits while minimizing risk and preserving existing investments.
The case study demonstrates how document databases can solve specific data management challenges—particularly around schema evolution, analytics flexibility, and operational scalability—while working alongside existing relational database infrastructure in a complementary rather than replacement capacity.
Based on the research conducted for the Craigslist MongoDB migration case study, here are the detailed references:
References
-
MongoDB Case Study: Craigslist - 2012 - MongoDB Inc. - Official MongoDB case study documenting Craigslist's migration of over 2 billion documents, highlighting auto-sharding capabilities and Jeremy Zawodny's developer perspective on the seamless transition from MySQL.
-
The Craigslist Dilemma: A case study for big data and NoSQL solutions - 2019 - TheServerSide - Comprehensive analysis of Craigslist's big data challenges, including detailed examination of schema migration bottlenecks, performance degradation issues, and the strategic decision to implement a hybrid MySQL-MongoDB architecture.
-
How NoSQL, MySQL and MongoDB worked together to solve a big-data problem - 2019 - TheServerSide - Technical deep-dive into Craigslist's hybrid architecture approach, featuring insights from Samantha Kosko of 10gen on the archive processing workflow and schema-less design benefits.
-
MongoDB — Uses Case Study on Craigslist - January 6, 2022 - Medium - Detailed case study analysis covering Craigslist's migration challenges, MongoDB selection criteria, and the three-month migration process for 1.5 billion postings.
-
Craigslist Adopting MongoDB - May 16, 2011 - ReadWrite - Contemporary coverage of Craigslist's MongoDB adoption announcement, featuring Jeremy Zawodny's explanation of the two billion document migration and archive usage patterns.
-
MySQL in 2012: Report from Percona Live - April 2012 - O'Reilly Radar - Technical conference report documenting Craigslist's complete infrastructure architecture as presented by Jeremy Zawodny, including the integration of MongoDB with MySQL, HAProxy, Memcache, and Redis components.
-
Jeremy Zawodny - Wikipedia - May 9, 2025 - Wikipedia - Biographical information on Craigslist's lead developer and co-author of "High Performance MySQL," providing context on his MySQL expertise and role in the MongoDB migration decision.
-
MongoDB Case Study - January 6, 2022 - Medium - Supplementary case study analysis focusing on MongoDB's scalability features and document-based architecture benefits for Craigslist's classified advertisement storage requirements.
-
Hanselminutes Podcast 199 - How Craigslist Works - with Jeremy Zawodny - 2010 - Scott Hanselman's Blog - Technical podcast interview with Jeremy Zawodny discussing Craigslist's overall system architecture and technology stack decisions, providing background context for the MongoDB migration.
-
NoSQL Databases and Big Data - November 1, 2018 - Medium - Comparative analysis of NoSQL database implementations including Craigslist's MongoDB adoption, contextualizing the migration within broader NoSQL adoption trends for big data challenges.