MongoDB to Postgres

🚨 This post merges my MongoDB to Postgres (2024-03-06) and Sequelize vs. Prisma (2023-05-25) blog posts. The two covered largely the same material, so it no longer made sense to keep them separate; I combined them here and removed the originals.

Introduction

During my time at eBay, I faced what became the most technically challenging problem of my career: migrating the Storage Management System (STMS) from MongoDB to Postgres. This wasn’t just a simple database swap; it was a complete architectural transformation of a critical system that ingests over 1.5 million metrics per minute across eBay’s data centers, carried out with zero downtime and while preserving nearly all existing functionality.

What is STMS?

The Storage Management System (STMS) serves as a critical internal tool for eBay’s Service & Storage Infrastructure (SSI) team. It monitors and manages devices across eBay’s data centers, allowing engineers to:

  • Monitor metrics from dozens of arrays, switches, hosts, disk groups, and clusters
  • Handle alerting for switches and arrays
  • Complete advanced tasks like host allocations
  • Access real-time data for other internal eBay services

STMS tracks over 70 arrays, 60 switches, 1,100 hosts, 900 disk groups, and 200 clusters across three of eBay’s data centers. Given its vital role in eBay’s infrastructure, any downtime or loss of functionality would directly impact the company’s core services and business operations.

The Challenge

Why the Migration Was Necessary

The decision to migrate from MongoDB to Postgres wasn’t taken lightly. While MongoDB had served STMS well initially, the growing complexity of our data relationships and the need for more sophisticated querying capabilities made Postgres a better long-term solution for our use case.

What Made This Problem Difficult

The complexity of this migration stemmed from several fundamental challenges:

1. Fundamental Database Differences MongoDB and Postgres are fundamentally different databases. MongoDB is a document database (NoSQL): data is stored as JSON-like documents in collections, like files in a filing cabinet. Postgres is a relational database (SQL): data is stored as rows in tables, like a spreadsheet. A concrete example of the difference follows this list.

2. Codebase Architecture STMS’s entire backend was built to process and manage data as JSON documents, using packages compatible only with MongoDB for database operations. This meant not just changing the database, but restructuring how the entire application handled data.

3. Zero Downtime Requirement Due to how vital STMS is as an internal tool, there could be no downtime during the migration. The system had to continue serving 1.5+ million metrics per minute throughout the entire process.

4. Tight Timeline and Limited Experience The migration had to be completed within a few months, with no clear execution plan initially. Neither I nor my coworkers had experience migrating a large legacy codebase from NoSQL to SQL databases, and I had limited prior experience with Postgres.

5. Scale and Complexity The migration involved converting 36 MongoDB collections into 74 Postgres tables, requiring careful consideration of relationships, indexing, and query optimization.
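To make point 1 concrete, here is a simplified, hypothetical example (not STMS’s actual schema) of the same host record in each world:

```js
// MongoDB: a single document embeds everything about the host.
const hostDocument = {
  _id: "abc123",
  name: "host-01",
  datacenter: "DC1",
  disks: [
    { serial: "A123", sizeGb: 960 },
    { serial: "B456", sizeGb: 960 },
  ],
};

// Postgres: the same data splits into two tables linked by a foreign key.
const equivalentDdl = `
  CREATE TABLE hosts (
    id         SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    datacenter TEXT NOT NULL
  );

  CREATE TABLE disks (
    id      SERIAL PRIMARY KEY,
    host_id INTEGER NOT NULL REFERENCES hosts(id),
    serial  TEXT NOT NULL,
    size_gb INTEGER NOT NULL
  );
`;
```

Every embedded array in a document like this becomes a candidate for a child table, which is a big part of why 36 collections grew into 74 tables.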

Choosing the Right ORM: Sequelize vs Prisma

One of the first major decisions was selecting an ORM (Object-Relational Mapping) tool. Our codebase was already built around Mongoose, MongoDB’s rough equivalent of an ORM, so moving to a comparable abstraction for Postgres would provide the smoothest transition path.

Requirements Analysis

After careful analysis of the project’s needs, I established essential criteria for any ORM solution:

  • Must be a JavaScript package (most of our code was written in JavaScript)
  • Must support Postgres and most of its features
  • Performance must be on par with or better than Mongoose
  • Must be open source and maintained

The Candidates

After extensive research, I narrowed the field to two main contenders: Sequelize and Prisma. I built comprehensive testing environments using Docker for Postgres and converted our largest, most complex dataset from document structure to table structure.

Testing Methodology

For each ORM, I measured performance across critical operations (a sketch of the timing harness follows the list):

  • Time to create an entry
  • Time to update an entry
  • Time to update nested entries (relationships and JSON key-values)
  • Time to delete an entry
  • Time to query/get an entry
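The harness itself was simple; here is a minimal sketch of the idea (the model and function names are hypothetical, not the actual test code):

```js
const { performance } = require("node:perf_hooks");

// Times a single async database operation and logs the elapsed milliseconds.
// `operation` is any ORM call under test: create, update, delete, or query.
async function timeOperation(label, operation) {
  const start = performance.now();
  await operation();
  const elapsed = performance.now() - start;
  console.log(`${label}: ${elapsed.toFixed(1)} ms`);
  return elapsed;
}

// Example usage against a hypothetical Sequelize model:
// await timeOperation("create entry", () => StorageArray.create(sampleEntry));
// await timeOperation("query entry", () => StorageArray.findByPk(someId));
```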

The Decision: Sequelize

Around May 15, 2023, I decided that Sequelize was the better ORM for our use case. Here’s why:

Sequelize Advantages:

  • Truly open source and community-maintained, not backed by a funded startup
  • Supported most of Postgres’s features
  • Significantly better performance than Prisma in my benchmarks
  • Mature ecosystem with over 10 years of development
  • Flexible model/schema representation using JavaScript classes
  • Support for complex joins and filtering options including Regex

Performance Results: In my testing, Sequelize significantly outperformed Prisma. For our large dataset entries:

  • Sequelize: ~2.26 seconds per entry
  • Prisma: ~11.21 seconds per entry

Prisma was approximately 5x slower than Sequelize for our use case. Additionally, deleting one entry from our largest dataset took Prisma nearly 4 minutes, which was unacceptable for our requirements.

Sequelize Challenges:

  • More complex and bloated model representations (564 lines vs 262 lines for Mongoose)
  • Confusing syntax in certain cases
  • Database migration complexity
  • Less comprehensive documentation compared to Prisma

Detailed Comparison: Sequelize and Prisma Pros and Cons

To give a fuller picture of why I went with Sequelize, I want to share the detailed pros and cons I compiled for both ORMs during my evaluation. I also looked at how they stacked up in terms of schema representation and community support as of May 15, 2023. This deeper dive helped solidify my choice, and I hope it might be useful for anyone else facing a similar decision.

Sequelize Pros:

  • Has a sync() function that automatically creates and handles tables for you, saving a lot of manual effort.
  • Can handle complex joins for nested data, which was critical for STMS’s structure.
  • Supports a wide range of filtering options, including Regex, giving flexibility in queries.
  • Model/schema representation is done in raw JavaScript using classes, which are highly customizable to fit specific needs (see the sketch after this list).
  • Handles database connections seamlessly, including support for multiple read connections.
  • Supports raw SQL queries for when you need to get under the hood.
  • Community stats as of May 15, 2023: On NPM, last updated 14 days ago with 1,505,835 weekly downloads; on GitHub, last updated yesterday with 4.2k Forks and 27.9k Stars. It’s been open source with an MIT license for over 10 years, so I’m confident it’ll stay that way.
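To show a few of these in one place, here is a minimal sketch of a class-based model, sync(), and a Regex query (the StorageArray model and its fields are hypothetical, not STMS’s real schema):

```js
const { Sequelize, DataTypes, Model, Op } = require("sequelize");

const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");

// Models are plain JavaScript classes, so they are easy to extend and customize.
class StorageArray extends Model {}
StorageArray.init(
  {
    name: { type: DataTypes.STRING, allowNull: false },
    datacenter: { type: DataTypes.STRING, allowNull: false },
  },
  { sequelize, modelName: "StorageArray" }
);

async function main() {
  // sync() creates or updates the underlying tables automatically.
  await sequelize.sync();

  // Regex filtering maps to Postgres's ~ operator via Op.regexp.
  const dc1Arrays = await StorageArray.findAll({
    where: { name: { [Op.regexp]: "^array-dc1-" } },
  });
  console.log(dc1Arrays.length);
}

main().catch(console.error);
```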

Sequelize Cons:

  • Model/schema representation can get very complex and bloated. For instance, while the Mongoose representation of our large dataset was about 262 lines (including spaces), the same dataset in Sequelize ballooned to 564 lines.
  • The syntax can be confusing and complicated in certain scenarios, which slowed me down at times.
  • Migrating or editing the database is a hassle. Even with sequelize-cli generating migration scripts, it’s still cumbersome, though I’ve noticed this is a common pain point with most ORMs.
  • Documentation isn’t great, though it’s improving. Luckily, tools like ChatGPT have a solid grasp of Sequelize due to its long history, which helped fill the gaps.
  • Not as type-safe as Prisma, which could lead to issues in some projects.
  • Limited TypeScript support; this wasn’t a concern for STMS, but it could be a dealbreaker for others.

Prisma Pros:

  • Uses its own schema language, making model creation cleaner and more concise. For comparison, while Mongoose took 262 lines for our large dataset, Prisma managed it in just 221 lines.
  • Comes with a CLI tool that simplifies database creation and migration, which is the best I’ve seen from an ORM so far, even if it’s not perfect.
  • Supports raw SQL queries, offering flexibility when needed.
  • The code syntax is clean and simpler to understand compared to Sequelize, making it easier to learn.
  • Automatically generates query builders for Node.js and TypeScript via its client, which is a nice touch (illustrated after this list).
  • Has excellent, clean documentation. ChatGPT isn’t as up-to-date on Prisma, but the official docs often made up for it.
  • Community stats as of May 15, 2023: On NPM, last updated 6 days ago with 1,344,705 weekly downloads; on GitHub, last updated 3 hours ago with 1.1k Forks and 31.3k Stars.
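For contrast, here is roughly the equivalent query through Prisma’s generated client (again with a hypothetical storageArray model; note the startsWith filter standing in for Regex, which comes up in the cons below):

```js
const { PrismaClient } = require("@prisma/client");

const prisma = new PrismaClient();

async function main() {
  // The generated client exposes a typed query builder per model.
  // Prisma has no Regex filter on Postgres, so prefix matching uses startsWith.
  const dc1Arrays = await prisma.storageArray.findMany({
    where: { name: { startsWith: "array-dc1-" } },
  });
  console.log(dc1Arrays.length);
}

main()
  .catch(console.error)
  .finally(() => prisma.$disconnect());
```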

Prisma Cons:

  • Doesn’t support Regex filtering for Postgres, though it offers alternatives like “contains,” “startsWith,” and “endsWith” (the sketch above uses one of them).
  • Performance was a major issue in my tests. Creating entries for our large dataset took Prisma about 11.21 seconds per entry compared to Sequelize’s 2.26 seconds, roughly 5x slower.
  • Deleting a single entry from the large dataset took nearly 4 minutes, which was a dealbreaker for our needs.
  • Even with a fair comparison on a complex dataset with relationships three layers deep, Sequelize was significantly faster at deletions.
  • Prisma is backed by a startup with $56.5 million in funding. While its main ORM code is open source under Apache-2.0, I’m wary of potential licensing changes down the line, similar to what happened with MongoDB.

These detailed comparisons made it clear that Sequelize aligned better with STMS’s needs, especially on performance and long-term reliability. I also figured that breaking it down like this might help others wrestling with the same choice for their projects.

The Migration Process

Data Structure Transformation

Converting from MongoDB’s document structure to Postgres’s relational structure required careful planning. I had to:

  1. Analyze Relationships: Identify how MongoDB documents related to each other and design appropriate foreign key relationships
  2. Normalize Data: Break down nested documents into separate tables where appropriate
  3. Preserve JSON Features: Use JSONB columns for truly unstructured data that needed to remain flexible
  4. Design Indexes: Create appropriate indexes for query performance (points 3 and 4 are sketched together below)
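Here is a hedged sketch of how points 3 and 4 can come together in a single table definition, assuming a hypothetical Metric model:

```js
const { Sequelize, DataTypes } = require("sequelize");

const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");

// Structured fields become typed columns; the truly unstructured remainder
// stays flexible in a JSONB column.
const Metric = sequelize.define(
  "Metric",
  {
    deviceId: { type: DataTypes.INTEGER, allowNull: false },
    recordedAt: { type: DataTypes.DATE, allowNull: false },
    payload: { type: DataTypes.JSONB }, // unstructured, schema-free data
  },
  {
    indexes: [
      // Composite index matching the dominant query pattern.
      { fields: ["deviceId", "recordedAt"] },
      // GIN index so JSONB containment queries stay fast.
      { fields: ["payload"], using: "gin" },
    ],
  }
);

// sequelize.sync() would create the table with these indexes.
```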

Custom Solutions

The migration required developing several custom solutions:

1. Data Migration Scripts I created comprehensive scripts (sketched after this list) to:

  • Extract data from MongoDB collections
  • Transform document structures to relational format
  • Import data into Postgres tables with proper relationships
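Stripped of batching, retries, and relationship ordering, each script followed an extract-transform-load shape like this (collection and model names are hypothetical):

```js
const { MongoClient } = require("mongodb");
const { Sequelize } = require("sequelize");

async function migrateHosts() {
  const mongo = await MongoClient.connect("mongodb://localhost:27017");
  const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");
  const { Host, Disk } = sequelize.models; // assumes models are defined elsewhere

  // Extract: stream documents instead of loading the whole collection into memory.
  const cursor = mongo.db("stms").collection("hosts").find();
  for await (const doc of cursor) {
    // Transform + load: top-level fields become a row; embedded arrays
    // become rows in a child table carrying a foreign key.
    const host = await Host.create({ name: doc.name, datacenter: doc.datacenter });
    for (const disk of doc.disks ?? []) {
      await Disk.create({ hostId: host.id, serial: disk.serial, sizeGb: disk.sizeGb });
    }
  }

  await mongo.close();
  await sequelize.close();
}

migrateHosts().catch(console.error);
```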

2. API Compatibility Layer To maintain zero downtime, I built a compatibility layer (sketched below) that could:

  • Route requests to either MongoDB or Postgres depending on migration status
  • Ensure data consistency during the transition period
  • Provide fallback mechanisms
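Conceptually, the layer was a thin repository that consulted a per-dataset migration flag, with MongoDB as the fallback (a simplified sketch; the real flag storage and consistency handling were more involved):

```js
// Tracks which datasets have been cut over to Postgres.
const migrated = new Set(["hosts"]); // e.g. hosts are done, arrays are not yet

// Reads from Postgres once a dataset is migrated, falling back to MongoDB
// if the Postgres read fails mid-transition.
async function findOne(dataset, filter, { mongoModel, pgModel }) {
  if (migrated.has(dataset)) {
    try {
      return await pgModel.findOne({ where: filter }); // Sequelize-style read
    } catch (err) {
      console.error(`Postgres read failed for ${dataset}, falling back`, err);
    }
  }
  return mongoModel.findOne(filter); // Mongoose-style read
}
```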

3. Custom Middleware I developed middleware to bridge the differences in how MongoDB and Postgres handle certain operations, ensuring existing API endpoints continued to work without modification.

Overcoming Technical Challenges

Handling Complex Relationships

One of the biggest challenges was converting MongoDB’s embedded documents to Postgres relationships. For example, a single MongoDB document might contain:

  • Basic properties
  • Nested objects representing related entities
  • Arrays of embedded documents

This had to be carefully decomposed into:

  • Primary tables for main entities
  • Junction tables for many-to-many relationships
  • Foreign key relationships for one-to-many associations (see the sketch below)
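As an illustration, here is how a hypothetical pair of these relationships looks in Sequelize (the models and names are illustrative, not STMS’s real schema):

```js
const { Sequelize, DataTypes } = require("sequelize");

const sequelize = new Sequelize("postgres://user:pass@localhost:5432/stms_dev");

const Host = sequelize.define("Host", { name: DataTypes.STRING });
const Disk = sequelize.define("Disk", { serial: DataTypes.STRING });
const StorageArray = sequelize.define("StorageArray", { name: DataTypes.STRING });
const Cluster = sequelize.define("Cluster", { name: DataTypes.STRING });

// One-to-many: each disk row carries a HostId foreign key.
Host.hasMany(Disk);
Disk.belongsTo(Host);

// Many-to-many: Sequelize manages the array_clusters junction table.
StorageArray.belongsToMany(Cluster, { through: "array_clusters" });
Cluster.belongsToMany(StorageArray, { through: "array_clusters" });
```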

Query Optimization

MongoDB’s query patterns don’t directly translate to SQL. I had to:

  • Rewrite complex aggregation pipelines as SQL joins (example below)
  • Optimize indexes for new query patterns
  • Ensure query performance met or exceeded MongoDB performance
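As a simplified, hypothetical example of that rewriting, here is the same question (disk count per datacenter) asked as a MongoDB aggregation pipeline and as a SQL join:

```js
// MongoDB: an aggregation pipeline over embedded disk arrays.
const pipeline = [
  { $unwind: "$disks" },
  { $group: { _id: "$datacenter", diskCount: { $sum: 1 } } },
];
// await db.collection("hosts").aggregate(pipeline).toArray();

// Postgres: the same question becomes a join with GROUP BY,
// runnable through sequelize.query() if no model fits.
const sql = `
  SELECT h.datacenter, COUNT(d.id) AS disk_count
  FROM hosts h
  JOIN disks d ON d.host_id = h.id
  GROUP BY h.datacenter;
`;
// await sequelize.query(sql, { type: Sequelize.QueryTypes.SELECT });
```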

Data Integrity

Ensuring data integrity during migration required:

  • Comprehensive validation scripts (sketched below)
  • Rollback procedures
  • Real-time data synchronization during transition periods
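The simplest of those validation checks was also the most valuable: comparing counts between each MongoDB collection and its Postgres counterpart (a minimal sketch with hypothetical arguments):

```js
// Compares row counts between a MongoDB collection and its Postgres table.
// Any mismatch halts the cutover for that dataset until it is explained.
async function validateCounts(dataset, mongoCollection, pgModel) {
  const mongoCount = await mongoCollection.countDocuments();
  const pgCount = await pgModel.count();
  if (mongoCount !== pgCount) {
    throw new Error(
      `${dataset}: MongoDB has ${mongoCount} docs but Postgres has ${pgCount} rows`
    );
  }
  console.log(`${dataset}: counts match (${mongoCount})`);
}
```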

Results and Impact

The STMS migration from MongoDB to Postgres was successfully completed with zero downtime while maintaining nearly all features and functionality. The results exceeded expectations:

Performance Improvements:

  • Query performance improved for complex relational queries
  • Better data consistency and integrity
  • More efficient storage utilization

Operational Benefits:

  • Enhanced monitoring and debugging capabilities
  • Better integration with eBay’s existing SQL-based tools
  • Improved backup and recovery procedures

Team Impact:

  • Enhanced team knowledge of relational databases
  • Established patterns for future database migrations
  • Created reusable tools and processes

Technical Skills Gained

This project significantly expanded my technical expertise:

Database Technologies:

  • Deep understanding of Postgres features and optimization
  • SQL query optimization and performance tuning
  • Database design patterns and normalization
  • Primary-standby database configurations

Development Tools:

  • Sequelize ORM and query building
  • Database migration strategies
  • Performance testing methodologies
  • Data validation and integrity checking

Architecture Patterns:

  • Zero-downtime migration strategies
  • API compatibility layers
  • Database abstraction patterns
  • Monitoring and alerting systems

Personal and Professional Growth

This migration project was transformative for my career development. It pushed me into uncharted territory, requiring:

Leadership Skills:

  • Leading a complex technical project without prior experience
  • Making critical architectural decisions under pressure
  • Coordinating with multiple teams and stakeholders

Problem-Solving Abilities:

  • Breaking down complex problems into manageable components
  • Developing creative solutions to unprecedented challenges
  • Balancing multiple competing requirements and constraints

Communication and Teamwork:

  • Explaining technical concepts to non-technical stakeholders
  • Documenting processes and decisions for future reference
  • Mentoring team members on new technologies and patterns

Lessons Learned

Technical Lessons

  1. Database Selection Matters: The choice between NoSQL and SQL should be based on specific use cases and long-term requirements
  2. Performance Testing is Critical: Theoretical advantages don’t always translate to real-world performance gains
  3. Migration Planning: Comprehensive planning and testing are essential for complex migrations
  4. Tooling Investment: Building proper tooling upfront saves significant time and reduces errors

Project Management Lessons

  1. Stakeholder Communication: Regular updates and clear communication prevent misunderstandings
  2. Risk Management: Having fallback plans and rollback procedures is essential
  3. Timeline Management: Buffer time for unexpected challenges and learning curves
  4. Documentation: Thorough documentation enables knowledge transfer and future maintenance

Conclusion

The STMS MongoDB to Postgres migration stands as the most challenging and rewarding technical problem I’ve solved in my career. It required not just technical expertise, but also leadership, planning, and adaptability. The project’s success demonstrated that with proper planning, thorough testing, and commitment to excellence, even the most complex technical challenges can be overcome.

This experience fundamentally changed my approach to software engineering, emphasizing the importance of:

  • Understanding the full context and requirements before making technical decisions
  • Investing time in proper tooling and testing
  • Maintaining clear communication throughout complex projects
  • Being willing to learn new technologies and approaches when necessary

The migration’s success not only improved STMS’s capabilities but also established patterns and processes that continue to benefit eBay’s infrastructure projects. It reinforced my belief that embracing unknown challenges and succeeding through them is key to both personal and professional development.

Looking back, this project represents a turning point in my career, transforming me from a developer who implements solutions to an engineer who can architect and lead complex technical initiatives. The confidence and skills gained from this experience continue to guide my approach to new challenges and opportunities in software engineering.