MongoDB to Postgres

🚨 This post merges my MongoDB to Postgres (2024-03-06) and Sequelize vs. Prisma (2023-05-25) blog posts. The two covered largely the same material, so it no longer made sense to keep them separate; I combined them here and removed the originals.

Introduction

During my time at eBay, I faced what became the most technically challenging problem of my career: migrating the Storage Management System (STMS) from MongoDB to Postgres. This wasn’t just a simple database swap; it was a complete architectural transformation of a critical system that ingests over 1.5 million metrics per minute across eBay’s data centers, carried out with zero downtime and while preserving nearly all existing functionality.

What is STMS?

The Storage Management System (STMS) serves as a critical internal tool for eBay’s Service & Storage Infrastructure (SSI) team. It monitors and manages devices across eBay’s data centers, allowing engineers to:

  • Monitor metrics from dozens of arrays, switches, hosts, disk groups, and clusters
  • Handle alerting for switches and arrays
  • Complete advanced tasks like host allocations
  • Access real-time data for other internal eBay services

STMS tracks over 70 arrays, 60 switches, 1,100 hosts, 900 disk groups, and 200 clusters across three of eBay’s data centers. Given its vital role in eBay’s infrastructure, any downtime or loss of functionality would directly impact the company’s core services and business operations.

The Challenge

Why the Migration Was Necessary

The decision to migrate from MongoDB to Postgres wasn’t taken lightly. While MongoDB had served STMS well initially, the growing complexity of our data relationships and the need for more sophisticated querying capabilities made Postgres a better long-term solution for our use case.

What Made This Problem Difficult

The complexity of this migration stemmed from several fundamental challenges:

1. Fundamental Database Differences MongoDB and Postgres are fundamentally different databases. MongoDB is a document database (NoSQL): data is stored as JSON-like documents in collections, like files in a filing cabinet. Postgres is a relational database (SQL): data is stored as rows in tables, like a spreadsheet. A concrete example of the difference follows this list.

2. Codebase Architecture STMS’s entire backend was built to process and manage data as JSON documents, using packages compatible only with MongoDB for database operations. This meant not just changing the database, but restructuring how the entire application handled data.

3. Zero Downtime Requirement Due to how vital STMS is as an internal tool, there could be no downtime during the migration. The system had to continue serving 1.5+ million metrics per minute throughout the entire process.

4. Tight Timeline and Limited Experience The migration had to be completed within a few months, with no clear execution plan initially. Neither I nor my coworkers had experience migrating a large legacy codebase from NoSQL to SQL databases, and I had limited prior experience with Postgres.

5. Scale and Complexity The migration involved converting 36 MongoDB collections into 74 Postgres tables, requiring careful consideration of relationships, indexing, and query optimization.
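To make point 1 concrete, here is a simplified, hypothetical example (not STMS’s actual schema) of the same host record in each world:

```js
// MongoDB: a single document embeds everything about the host.
const hostDocument = {
  _id: "abc123",
  name: "host-01",
  datacenter: "DC1",
  disks: [
    { serial: "A123", sizeGb: 960 },
    { serial: "B456", sizeGb: 960 },
  ],
};

// Postgres: the same data splits into two tables linked by a foreign key.
const equivalentDdl = `
  CREATE TABLE hosts (
    id         SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    datacenter TEXT NOT NULL
  );

  CREATE TABLE disks (
    id      SERIAL PRIMARY KEY,
    host_id INTEGER NOT NULL REFERENCES hosts(id),
    serial  TEXT NOT NULL,
    size_gb INTEGER NOT NULL
  );
`;
```

Every embedded array in a document like this becomes a candidate for a child table, which is a big part of why 36 collections grew into 74 tables.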

Choosing the Right ORM: Sequelize vs Prisma

One of the first major decisions was selecting an ORM (Object-Relational Mapping) tool. Our codebase was already built around Mongoose, MongoDB’s rough equivalent of an ORM, so moving to a comparable abstraction for Postgres would provide the smoothest transition path.

Requirements Analysis

After careful analysis of the project’s needs, I established essential criteria for any ORM solution:

  • Must be a JavaScript package (most of our code was written in JavaScript)
  • Must support Postgres and most of its features
  • Performance must be on par with or better than Mongoose
  • Must be open source and maintained

The Candidates

After extensive research, I narrowed the field to two main contenders: Sequelize and Prisma. I built comprehensive testing environments using Docker for Postgres and converted our largest, most complex dataset from document structure to table structure.

Testing Methodology

For each ORM, I measured performance across critical operations (a sketch of the timing harness follows the list):

  • Time to create an entry
  • Time to update an entry
  • Time to update nested entries (relationships and JSON key-values)
  • Time to delete an entry
  • Time to query/get an entry
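The harness itself was simple; here is a minimal sketch of the idea (the model and function names are hypothetical, not the actual test code):

```js
const { performance } = require("node:perf_hooks");

// Times a single async database operation and logs the elapsed milliseconds.
// `operation` is any ORM call under test: create, update, delete, or query.
async function timeOperation(label, operation) {
  const start = performance.now();
  await operation();
  const elapsed = performance.now() - start;
  console.log(`${label}: ${elapsed.toFixed(1)} ms`);
  return elapsed;
}

// Example usage against a hypothetical Sequelize model:
// await timeOperation("create entry", () => StorageArray.create(sampleEntry));
// await timeOperation("query entry", () => StorageArray.findByPk(someId));
```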

The Decision: Sequelize

Around May 15, 2023, I decided that Sequelize was the better ORM for our use case. Here’s why:

Sequelize Advantages:

  • Truly open source and community-maintained, not backed by a funded startup
  • Supported most of Postgres’s features
  • Significantly better performance than Prisma in my benchmarks
  • Mature ecosystem with over 10 years of development
  • Flexible model/schema representation using JavaScript classes
  • Support for complex joins and filtering options including Regex

Performance Results: In my testing, Sequelize significantly outperformed Prisma. For our large dataset entries:

  • Sequelize: ~2.26 seconds per entry
  • Prisma: ~11.21 seconds per entry

Prisma was approximately 5x slower than Sequelize for our use case. Additionally, deleting one entry from our largest dataset took Prisma nearly 4 minutes, which was unacceptable for our requirements.

Sequelize Challenges:

  • More complex and bloated model representations (564 lines vs 262 lines for Mongoose)
  • Confusing syntax in certain cases
  • Database migration complexity
  • Less comprehensive documentation compared to Prisma

Detailed Comparison: Sequelize and Prisma Pros and Cons

To give a fuller picture of why I went with Sequelize, I want to share the detailed pros and cons I compiled for both ORMs during my evaluation. I also looked at how they stacked up in terms of schema representation and community support as of May 15, 2023. This deeper dive helped solidify my choice, and I hope it might be useful for anyone else facing a similar decision.

Sequelize Pros:

  • Has a sync() function that automatically creates and handles tables for you, saving a lot of manual effort.
  • Can handle complex joins for nested data, which was critical for STMS’s structure.
  • Supports a wide range of filtering options, including Regex, giving flexibility in queries.
  • Model/schema representation is done in raw JavaScript using classes, which are highly customizable to fit specific needs (see the sketch after this list).
  • Handles database connections seamlessly, including support for multiple read connections.
  • Supports raw SQL queries for when you need to get under the hood.
  • Community stats as of May 15, 2023: On NPM, last updated 14 days ago with 1,505,835 weekly downloads; on GitHub, last updated yesterday with 4.2k Forks and 27.9k Stars. It’s been open source with an MIT license for over 10 years, so I’m confident it’ll stay that way.
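To show a few of these in one place, here is a minimal sketch of a class-based model, sync(), and a Regex query (the StorageArray model and its fields are hypothetical, not STMS’s real schema):

```js
const { Sequelize, DataTypes, Model, Op } = require("sequelize");

const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");

// Models are plain JavaScript classes, so they are easy to extend and customize.
class StorageArray extends Model {}
StorageArray.init(
  {
    name: { type: DataTypes.STRING, allowNull: false },
    datacenter: { type: DataTypes.STRING, allowNull: false },
  },
  { sequelize, modelName: "StorageArray" }
);

async function main() {
  // sync() creates or updates the underlying tables automatically.
  await sequelize.sync();

  // Regex filtering maps to Postgres's ~ operator via Op.regexp.
  const dc1Arrays = await StorageArray.findAll({
    where: { name: { [Op.regexp]: "^array-dc1-" } },
  });
  console.log(dc1Arrays.length);
}

main().catch(console.error);
```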

Sequelize Cons:

  • Model/schema representation can get very complex and bloated. For instance, while the Mongoose representation of our large dataset was about 262 lines (including spaces), the same dataset in Sequelize ballooned to 564 lines.
  • The syntax can be confusing and complicated in certain scenarios, which slowed me down at times.
  • Migrating or editing the database is a hassle. Even with sequelize-cli generating migration scripts, it’s still cumbersome, though I’ve noticed this is a common pain point with most ORMs.
  • Documentation isn’t great, though it’s improving. Luckily, tools like ChatGPT have a solid grasp of Sequelize due to its long history, which helped fill the gaps.
  • Not as type-safe as Prisma, which could lead to issues in some projects.
  • Limited TypeScript support; this wasn’t a concern for STMS, but it could be a dealbreaker for others.

Prisma Pros:

  • Uses its own schema language, making model creation cleaner and more concise. For comparison, while Mongoose took 262 lines for our large dataset, Prisma managed it in just 221 lines.
  • Comes with a CLI tool that simplifies database creation and migration, which is the best I’ve seen from an ORM so far, even if it’s not perfect.
  • Supports raw SQL queries, offering flexibility when needed.
  • The code syntax is clean and simpler to understand compared to Sequelize, making it easier to learn.
  • Automatically generates query builders for Node.js and TypeScript via its client, which is a nice touch (illustrated after this list).
  • Has excellent, clean documentation. ChatGPT isn’t as up-to-date on Prisma, but the official docs often made up for it.
  • Community stats as of May 15, 2023: On NPM, last updated 6 days ago with 1,344,705 weekly downloads; on GitHub, last updated 3 hours ago with 1.1k Forks and 31.3k Stars.
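For contrast, here is roughly the equivalent query through Prisma’s generated client (again with a hypothetical storageArray model; note the startsWith filter standing in for Regex, which comes up in the cons below):

```js
const { PrismaClient } = require("@prisma/client");

const prisma = new PrismaClient();

async function main() {
  // The generated client exposes a typed query builder per model.
  // Prisma has no Regex filter on Postgres, so prefix matching uses startsWith.
  const dc1Arrays = await prisma.storageArray.findMany({
    where: { name: { startsWith: "array-dc1-" } },
  });
  console.log(dc1Arrays.length);
}

main()
  .catch(console.error)
  .finally(() => prisma.$disconnect());
```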

Prisma Cons:

  • Doesn’t support Regex filtering for Postgres, though it offers alternatives like “contains,” “startsWith,” and “endsWith” (the sketch above uses one of them).
  • Performance was a major issue in my tests. Creating entries for our large dataset took Prisma about 11.21 seconds per entry compared to Sequelize’s 2.26 seconds, roughly 5x slower.
  • Deleting a single entry from the large dataset took nearly 4 minutes, which was a dealbreaker for our needs.
  • Even with a fair comparison on a complex dataset with relationships three layers deep, Sequelize was significantly faster at deletions.
  • Prisma is backed by a startup with $56.5 million in funding. While its main ORM code is open source under Apache-2.0, I’m wary of potential licensing changes down the line, similar to what happened with MongoDB.

These detailed comparisons made it clear that Sequelize aligned better with STMS’s needs, especially on performance and long-term reliability. I also figured that breaking it down like this might help others wrestling with the same choice for their projects.

The Migration Process

Data Structure Transformation

Converting from MongoDB’s document structure to Postgres’s relational structure required careful planning. I had to:

  1. Analyze Relationships: Identify how MongoDB documents related to each other and design appropriate foreign key relationships
  2. Normalize Data: Break down nested documents into separate tables where appropriate
  3. Preserve JSON Features: Use JSONB columns for truly unstructured data that needed to remain flexible
  4. Design Indexes: Create appropriate indexes for query performance (points 3 and 4 are sketched together below)
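Here is a hedged sketch of how points 3 and 4 can come together in a single table definition, assuming a hypothetical Metric model:

```js
const { Sequelize, DataTypes } = require("sequelize");

const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");

// Structured fields become typed columns; the truly unstructured remainder
// stays flexible in a JSONB column.
const Metric = sequelize.define(
  "Metric",
  {
    deviceId: { type: DataTypes.INTEGER, allowNull: false },
    recordedAt: { type: DataTypes.DATE, allowNull: false },
    payload: { type: DataTypes.JSONB }, // unstructured, schema-free data
  },
  {
    indexes: [
      // Composite index matching the dominant query pattern.
      { fields: ["deviceId", "recordedAt"] },
      // GIN index so JSONB containment queries stay fast.
      { fields: ["payload"], using: "gin" },
    ],
  }
);

// sequelize.sync() would create the table with these indexes.
```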

Custom Solutions

The migration required developing several custom solutions:

1. Data Migration Scripts I created comprehensive scripts (sketched after this list) to:

  • Extract data from MongoDB collections
  • Transform document structures to relational format
  • Import data into Postgres tables with proper relationships
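Stripped of batching, retries, and relationship ordering, each script followed an extract-transform-load shape like this (collection and model names are hypothetical):

```js
const { MongoClient } = require("mongodb");
const { Sequelize } = require("sequelize");

async function migrateHosts() {
  const mongo = await MongoClient.connect("mongodb://localhost:27017");
  const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");
  const { Host, Disk } = sequelize.models; // assumes models are defined elsewhere

  // Extract: stream documents instead of loading the whole collection into memory.
  const cursor = mongo.db("stms").collection("hosts").find();
  for await (const doc of cursor) {
    // Transform + load: top-level fields become a row; embedded arrays
    // become rows in a child table carrying a foreign key.
    const host = await Host.create({ name: doc.name, datacenter: doc.datacenter });
    for (const disk of doc.disks ?? []) {
      await Disk.create({ hostId: host.id, serial: disk.serial, sizeGb: disk.sizeGb });
    }
  }

  await mongo.close();
  await sequelize.close();
}

migrateHosts().catch(console.error);
```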

2. API Compatibility Layer To maintain zero downtime, I built a compatibility layer (sketched below) that could:

  • Route requests to either MongoDB or Postgres depending on migration status
  • Ensure data consistency during the transition period
  • Provide fallback mechanisms
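Conceptually, the layer was a thin repository that consulted a per-dataset migration flag, with MongoDB as the fallback (a simplified sketch; the real flag storage and consistency handling were more involved):

```js
// Tracks which datasets have been cut over to Postgres.
const migrated = new Set(["hosts"]); // e.g. hosts are done, arrays are not yet

// Reads from Postgres once a dataset is migrated, falling back to MongoDB
// if the Postgres read fails mid-transition.
async function findOne(dataset, filter, { mongoModel, pgModel }) {
  if (migrated.has(dataset)) {
    try {
      return await pgModel.findOne({ where: filter }); // Sequelize-style read
    } catch (err) {
      console.error(`Postgres read failed for ${dataset}, falling back`, err);
    }
  }
  return mongoModel.findOne(filter); // Mongoose-style read
}
```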

3. Custom Middleware I developed middleware to bridge the differences in how MongoDB and Postgres handle certain operations, ensuring existing API endpoints continued to work without modification.

Overcoming Technical Challenges

Handling Complex Relationships

One of the biggest challenges was converting MongoDB’s embedded documents to Postgres relationships. For example, a single MongoDB document might contain:

  • Basic properties
  • Nested objects representing related entities
  • Arrays of embedded documents

This had to be carefully decomposed into:

  • Primary tables for main entities
  • Junction tables for many-to-many relationships
  • Foreign key relationships for one-to-many associations (see the sketch below)
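As an illustration, here is how a hypothetical pair of these relationships looks in Sequelize (the models and names are illustrative, not STMS’s real schema):

```js
const { Sequelize, DataTypes } = require("sequelize");

const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");

const Host = sequelize.define("Host", { name: DataTypes.STRING });
const Disk = sequelize.define("Disk", { serial: DataTypes.STRING });
const StorageArray = sequelize.define("StorageArray", { name: DataTypes.STRING });
const Cluster = sequelize.define("Cluster", { name: DataTypes.STRING });

// One-to-many: each disk row carries a HostId foreign key.
Host.hasMany(Disk);
Disk.belongsTo(Host);

// Many-to-many: Sequelize manages the array_clusters junction table.
StorageArray.belongsToMany(Cluster, { through: "array_clusters" });
Cluster.belongsToMany(StorageArray, { through: "array_clusters" });
```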

Query Optimization

MongoDB’s query patterns don’t directly translate to SQL. I had to:

  • Rewrite complex aggregation pipelines as SQL joins (example below)
  • Optimize indexes for new query patterns
  • Ensure query performance met or exceeded MongoDB performance
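As a simplified, hypothetical example of that rewriting, here is the same question (disk count per datacenter) asked as a MongoDB aggregation pipeline and as a SQL join:

```js
// MongoDB: an aggregation pipeline over embedded disk arrays.
const pipeline = [
  { $unwind: "$disks" },
  { $group: { _id: "$datacenter", diskCount: { $sum: 1 } } },
];
// await db.collection("hosts").aggregate(pipeline).toArray();

// Postgres: the same question becomes a join with GROUP BY,
// runnable through sequelize.query() if no model fits.
const sql = `
  SELECT h.datacenter, COUNT(d.id) AS disk_count
  FROM hosts h
  JOIN disks d ON d.host_id = h.id
  GROUP BY h.datacenter;
`;
// await sequelize.query(sql, { type: Sequelize.QueryTypes.SELECT });
```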

Data Integrity

Ensuring data integrity during migration required:

  • Comprehensive validation scripts (sketched below)
  • Rollback procedures
  • Real-time data synchronization during transition periods
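The simplest of those validation checks was also the most valuable: comparing counts between each MongoDB collection and its Postgres counterpart (a minimal sketch with hypothetical arguments):

```js
// Compares row counts between a MongoDB collection and its Postgres table.
// Any mismatch halts the cutover for that dataset until it is explained.
async function validateCounts(dataset, mongoCollection, pgModel) {
  const mongoCount = await mongoCollection.countDocuments();
  const pgCount = await pgModel.count();
  if (mongoCount !== pgCount) {
    throw new Error(
      `${dataset}: MongoDB has ${mongoCount} docs but Postgres has ${pgCount} rows`
    );
  }
  console.log(`${dataset}: counts match (${mongoCount})`);
}
```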

Results and Impact

The STMS migration from MongoDB to Postgres was successfully completed with zero downtime while maintaining nearly all features and functionality. The results exceeded expectations:

Performance Improvements:

  • Query performance improved for complex relational queries
  • Better data consistency and integrity
  • More efficient storage utilization

Operational Benefits:

  • Enhanced monitoring and debugging capabilities
  • Better integration with eBay’s existing SQL-based tools
  • Improved backup and recovery procedures

Team Impact:

  • Enhanced team knowledge of relational databases
  • Established patterns for future database migrations
  • Created reusable tools and processes

Technical Skills Gained

This project significantly expanded my technical expertise:

Database Technologies:

  • Deep understanding of Postgres features and optimization
  • SQL query optimization and performance tuning
  • Database design patterns and normalization
  • Primary-standby database configurations

Development Tools:

  • Sequelize ORM and query building
  • Database migration strategies
  • Performance testing methodologies
  • Data validation and integrity checking

Architecture Patterns:

  • Zero-downtime migration strategies
  • API compatibility layers
  • Database abstraction patterns
  • Monitoring and alerting systems

Personal and Professional Growth

This migration project was transformative for my career development. It pushed me into uncharted territory, requiring:

Leadership Skills:

  • Leading a complex technical project without prior experience
  • Making critical architectural decisions under pressure
  • Coordinating with multiple teams and stakeholders

Problem-Solving Abilities:

  • Breaking down complex problems into manageable components
  • Developing creative solutions to unprecedented challenges
  • Balancing multiple competing requirements and constraints

Communication and Teamwork:

  • Explaining technical concepts to non-technical stakeholders
  • Documenting processes and decisions for future reference
  • Mentoring team members on new technologies and patterns

Lessons Learned

Technical Lessons

  1. Database Selection Matters: The choice between NoSQL and SQL should be based on specific use cases and long-term requirements
  2. Performance Testing is Critical: Theoretical advantages don’t always translate to real-world performance gains
  3. Migration Planning: Comprehensive planning and testing are essential for complex migrations
  4. Tooling Investment: Building proper tooling upfront saves significant time and reduces errors

Project Management Lessons

  1. Stakeholder Communication: Regular updates and clear communication prevent misunderstandings
  2. Risk Management: Having fallback plans and rollback procedures is essential
  3. Timeline Management: Buffer time for unexpected challenges and learning curves
  4. Documentation: Thorough documentation enables knowledge transfer and future maintenance

Conclusion

The STMS MongoDB to Postgres migration stands as the most challenging and rewarding technical problem I’ve solved in my career. It required not just technical expertise, but also leadership, planning, and adaptability. The project’s success demonstrated that with proper planning, thorough testing, and commitment to excellence, even the most complex technical challenges can be overcome.

This experience fundamentally changed my approach to software engineering, emphasizing the importance of:

  • Understanding the full context and requirements before making technical decisions
  • Investing time in proper tooling and testing
  • Maintaining clear communication throughout complex projects
  • Being willing to learn new technologies and approaches when necessary

The migration’s success not only improved STMS’s capabilities but also established patterns and processes that continue to benefit eBay’s infrastructure projects. It reinforced my belief that embracing unknown challenges and succeeding through them is key to both personal and professional development.

Looking back, this project represents a turning point in my career, transforming me from a developer who implements solutions to an engineer who can architect and lead complex technical initiatives. The confidence and skills gained from this experience continue to guide my approach to new challenges and opportunities in software engineering.