A Magento migration at scale is a fundamentally different project from a standard platform move. When the catalogue runs to millions of SKUs, when inventory must sync in real time across multiple warehouses, and when the business cannot afford downtime during peak trading periods, every architectural decision carries commercial weight. This article draws on the lessons from a real-world Magento migration involving 4.7 million products to give executives and technical teams a clear picture of what actually matters.
A Magento migration for a large catalogue succeeds or fails on four things: whether the indexing architecture can handle millions of SKUs without degrading storefront performance; whether real-time inventory synchronisation prevents overselling and data inconsistency; whether the checkout is protected from database deadlocks and silent order failures; and whether proactive alerting catches transaction errors before customers do. The infrastructure decisions made before migration begins determine whether all four hold under peak load.
Suggested: Server infrastructure diagram or large-scale eCommerce catalogue management visual
Alt: “Large-scale Magento migration architecture showing catalogue indexing, inventory sync and infrastructure components”
A Magento migration for a store with 5,000 SKUs and one for a store with 4.7 million SKUs share the same platform and many of the same configuration steps. The difference is that at 4.7 million SKUs, many architectural decisions that are harmless at small scale become hard constraints.
The Strand Books case illustrates this clearly. Strand is one of New York City’s most iconic independent bookstores, known as the home of “18 miles of books.” Migrating its catalogue to Magento meant handling rare collectibles, out-of-print titles, new releases, and pre-order inventory simultaneously, with real-time synchronisation between the physical store, a warehouse, and the digital storefront. The scale challenges that emerged were not surprising in retrospect. They were the predictable consequences of applying standard Magento configuration to a non-standard catalogue size.
| Challenge | Impact at small scale | Impact at 4.7M SKUs |
|---|---|---|
| Catalogue reindexing | Completes in minutes, minimal impact on storefront | Runs for hours, blocks search and degrades product discovery during the run |
| Inventory sync | Acceptable lag of a few minutes between ERP and storefront | Minutes of lag at high order volume creates overselling risk and customer service load |
| Database deadlocks | Rare and recoverable without customer impact | Common during peak traffic. Silent order failures erode customer trust without visible errors |
| Product data consistency | Manual fixes are manageable | At millions of SKUs, a product reverting to a basic entry is undetectable without automated monitoring |
| Traffic spike handling | Standard hosting absorbs typical spikes | High-demand pre-order events and media attention create traffic spikes that require infrastructure designed for them |
The planning implication: a Magento migration plan that works for a 50,000 SKU store will not work for a 4.7 million SKU store without architectural changes to indexing, synchronisation, checkout, and infrastructure. These changes need to be designed before migration begins, not discovered during it. Every week of reactive problem-solving post-launch costs more than the equivalent week of pre-migration architecture planning.
Magento’s default indexer configuration assumes a catalogue of manageable size. When a full reindex takes hours rather than minutes, the consequences cascade: search results become stale, product availability shows incorrectly, and customers browsing during the index window encounter a degraded experience.
For a large catalogue Magento migration, three indexing changes are non-negotiable:
The default Magento batch size for catalogue indexing is not optimised for millions of records. Tuning the batch size to match available memory and processing capacity reduces the total indexing time and prevents the index process from consuming resources needed by the storefront. The right batch size is store-specific and requires load testing, not a rule of thumb.
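Because the right batch size comes from load testing, the decision itself can be written down as a simple rule over the test results. The sketch below is a minimal illustration, assuming you record peak indexer memory, throughput, and storefront p95 latency for each tested batch size; the `LoadTestRun` fields and the sample numbers are hypothetical, not Magento settings or measured results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoadTestRun:
    """One reindex run from a load test at a given batch size (hypothetical fields)."""
    batch_size: int
    peak_memory_mb: float       # peak memory of the indexer process during the run
    records_per_second: float   # sustained indexing throughput
    storefront_p95_ms: float    # storefront p95 response time while the reindex ran


def choose_batch_size(runs, memory_budget_mb: float, max_storefront_p95_ms: float) -> Optional[int]:
    """Fastest batch size that stayed inside both the memory and storefront budgets."""
    eligible = [
        r for r in runs
        if r.peak_memory_mb <= memory_budget_mb and r.storefront_p95_ms <= max_storefront_p95_ms
    ]
    if not eligible:
        return None  # no tested size is safe: test smaller batches or add capacity
    return max(eligible, key=lambda r: r.records_per_second).batch_size


def estimated_reindex_hours(catalogue_size: int, records_per_second: float) -> float:
    """Rough full-reindex duration at a measured throughput."""
    return catalogue_size / records_per_second / 3600
```

With illustrative runs at batch sizes of 500, 1,000 and 5,000, a 2 GB memory budget and a 300 ms storefront ceiling, the rule picks 1,000, and at 650 records per second a 4.7 million record reindex takes roughly two hours. The point of the storefront constraint is that the fastest batch size is not automatically the right one if it starves the live store.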
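Switching the Magento indexers from Update on Save to Update by Schedule means that catalogue changes are recorded in changelog tables and processed by the next cron-driven index run, rather than triggering an immediate reindex of the affected records on every save. The mode can be set per indexer in the admin or with `bin/magento indexer:set-mode schedule`, and it only works if Magento's cron jobs run reliably. For a 4.7 million SKU catalogue, a triggered reindex on every product save is operationally untenable.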
At very large catalogue scale, a known failure mode is a complex product listing reverting to a basic entry, losing variant configuration, attribute data, and associated pricing. Without automated detection and replay functionality, this failure can persist undetected for hours and result in incorrect orders. Automated monitoring that detects when a product’s attribute count drops below the expected threshold and triggers a replay restores the correct entry without manual intervention.
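The detection half of that monitoring can be a comparison against a known-good baseline. This is a minimal sketch, assuming you keep a snapshot of each SKU's expected attribute and variant counts; the `ProductShape` type and the 80% threshold are hypothetical choices, and the replay itself (re-importing the listing from the source of truth) is left to your integration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductShape:
    """The parts of a listing that disappear when it reverts to a basic entry."""
    attribute_count: int
    variant_count: int


def find_reverted(baseline: dict, current: dict, min_ratio: float = 0.8) -> list:
    """SKUs whose attribute or variant count dropped below min_ratio of the baseline."""
    flagged = []
    for sku, expected in baseline.items():
        actual = current.get(sku)
        if actual is None:
            flagged.append(sku)  # listing missing entirely: also needs a replay
        elif (actual.attribute_count < expected.attribute_count * min_ratio
              or actual.variant_count < expected.variant_count * min_ratio):
            flagged.append(sku)
    return sorted(flagged)
```

A configurable product that drops from 40 attributes and 6 variants to 12 attributes and none is flagged, while a simple product with no variants is unaffected by the variant check. The flagged list is what a replay job would consume.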
For Magento catalogues above approximately 100,000 SKUs, a basic database-backed search is not fit for purpose, and on current versions it is not an option anyway: Magento 2.4 removed the MySQL catalogue search engine, so both Adobe Commerce and Magento Open Source require Elasticsearch (or OpenSearch in more recent releases). Elasticsearch handles the full-text search, faceted filtering, and synonym matching that a large catalogue requires. Installing it is the easy part. Configuring it correctly for millions of SKUs, including index mapping, synonym libraries, and commercial ranking rules, is a standalone project within the migration scope.
In a large-scale Magento migration, the synchronisation architecture between the storefront, inventory system, and ERP is the most commercially critical technical decision. A Magento migration that replaces manual syncs with automated batch syncs has improved operationally. A migration that implements a real-time messaging architecture has changed the business model.
The difference matters because the 4.7 million SKU Strand Books catalogue included rare and single-copy items where stock accuracy was existential to customer trust. A customer who orders a rare book that the system shows as in stock but the warehouse cannot fulfil does not order again. The business case for real-time inventory sync was not a technical preference. It was a customer retention requirement.
A centralised messaging system (RabbitMQ or a cloud-equivalent like AWS SQS) decouples the events that change inventory from the processes that update the storefront. When an order is placed, a picking event occurs in the warehouse, or a new shipment is received, the inventory system publishes an event to the queue. The Magento integration layer consumes that event and updates the catalogue in near real time, without a scheduled batch process and without blocking checkout operations.
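The publish/consume split can be sketched in a few lines. This is an illustrative model only: Python's in-process `queue.Queue` stands in for RabbitMQ or SQS, and `InventoryEvent` and `StorefrontStock` are hypothetical names. Because both RabbitMQ and SQS standard queues can deliver a message more than once, the consumer records event IDs and ignores repeats.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    event_id: str
    sku: str
    delta: int   # negative for orders and picks, positive for receipts
    kind: str    # e.g. "order_placed", "picked", "shipment_received"


class StorefrontStock:
    """Stand-in for the Magento-side integration layer that owns storefront stock levels."""

    def __init__(self, initial: dict):
        self.qty = dict(initial)
        self._seen = set()  # event IDs already applied, so redeliveries are ignored

    def apply(self, event: InventoryEvent) -> bool:
        if event.event_id in self._seen:
            return False
        self.qty[event.sku] = self.qty.get(event.sku, 0) + event.delta
        self._seen.add(event.event_id)
        return True


def publish(bus: queue.Queue, event: InventoryEvent) -> None:
    bus.put(event)  # the inventory system only publishes; it never waits on the storefront


def drain(bus: queue.Queue, stock: StorefrontStock) -> int:
    """Consume everything currently queued; return how many events changed stock."""
    applied = 0
    while True:
        try:
            event = bus.get_nowait()
        except queue.Empty:
            return applied
        if stock.apply(event):
            applied += 1
```

In production the consumer would run continuously as a worker rather than draining on demand, but the shape is the same: producers never call the storefront directly, and the storefront never blocks the producer.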
When stock updates reach the storefront within seconds rather than minutes or hours, the window during which a sold-out product can be ordered closes. For single-copy rare items, this is the difference between a customer experience problem and a revenue problem.
With a message queue, a slow or temporarily unavailable inventory update does not block the checkout process. Orders complete successfully and inventory corrections arrive as the queue processes. Without this decoupling, a slow inventory sync during peak traffic becomes a checkout failure during peak trading.
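Every inventory event is also persisted to a durable event log. This matters because a standard RabbitMQ or SQS queue removes a message once it has been consumed and acknowledged, so replay depends on keeping the events elsewhere or using a log-based stream. With that history in place, a sync failure can be recovered by replaying events to restore the correct state. This is significantly faster and safer than manually reconciling inventory discrepancies between two systems' databases.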
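Replay amounts to starting from a known-good opening balance and re-applying logged events in order. The sketch below assumes each logged event carries a monotonically increasing sequence number; `LoggedEvent`, `rebuild_stock` and `diff` are hypothetical names, not part of Magento or any queue product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoggedEvent:
    sequence: int   # position in the durable event log
    sku: str
    delta: int


def rebuild_stock(opening: dict, event_log, up_to_sequence: Optional[int] = None) -> dict:
    """Replay events in sequence order on top of a known-good opening balance."""
    stock = dict(opening)
    for event in sorted(event_log, key=lambda e: e.sequence):
        if up_to_sequence is not None and event.sequence > up_to_sequence:
            break  # allows restoring the state as it was at a given point in the log
        stock[event.sku] = stock.get(event.sku, 0) + event.delta
    return stock


def diff(expected: dict, actual: dict) -> dict:
    """SKUs where the storefront disagrees with the replayed state, as (expected, actual)."""
    skus = sorted(set(expected) | set(actual))
    return {
        s: (expected.get(s, 0), actual.get(s, 0))
        for s in skus
        if expected.get(s, 0) != actual.get(s, 0)
    }
```

Running `diff` between the replayed state and what the storefront currently shows narrows a recovery down to the SKUs that actually need correcting, instead of a full manual comparison.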
Mixed carts combining in-store pickup from a physical location and warehouse shipment for other items require inventory deductions from different stock pools at checkout. A message queue architecture handles this with separate event streams per fulfilment location rather than a single batch sync that must coordinate multiple pools.
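Splitting a mixed cart into per-location deductions is straightforward once each cart line knows its fulfilment source. In this sketch the fulfilment types and the stream names `inventory.store` and `inventory.warehouse` are hypothetical; an unknown fulfilment type raises rather than silently landing in the wrong stock pool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CartLine:
    sku: str
    qty: int
    fulfilment: str  # "store_pickup" or "warehouse_ship" in this sketch


# Hypothetical mapping from fulfilment type to the event stream for that stock pool.
STREAMS = {"store_pickup": "inventory.store", "warehouse_ship": "inventory.warehouse"}


def deductions_by_stream(order_id: str, lines) -> dict:
    """Group a mixed cart's stock deductions into one event list per fulfilment stream."""
    streams = defaultdict(list)
    for line in lines:
        stream = STREAMS.get(line.fulfilment)
        if stream is None:
            raise ValueError(f"unknown fulfilment type: {line.fulfilment}")
        streams[stream].append({"order_id": order_id, "sku": line.sku, "delta": -line.qty})
    return dict(streams)
```

Each stream is then published independently, so a delay in the warehouse pool does not hold up the store pool for the same order.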
Silent order failures are the most commercially damaging failure mode in a large-scale Magento migration. The customer completes checkout, the payment is authorised, the confirmation page renders. But the order does not write to the database because a database deadlock rolled back the transaction. The customer believes they have ordered. The business has no record of it. The fulfilment never happens.
At small catalogue scale, database deadlocks during checkout are rare. At 4.7 million SKU scale, with concurrent reads and writes across catalogue, inventory, and order tables during peak traffic, they become a predictable operational risk that must be designed against.
| Risk | Cause | Prevention |
|---|---|---|
| Database deadlocks | Concurrent writes to the same table rows during peak order volume | Transaction isolation tuning, query optimisation, index review on order and inventory tables, retry logic on deadlock detection |
| Payment gateway timeouts | Slow gateway response during high concurrency causes checkout to hang | Asynchronous payment processing with webhook confirmation rather than synchronous gateway calls at checkout submission |
| Tax calculation failures | Edge cases in product tax class configuration produce incorrect totals or checkout errors | Tax configuration audit against full product catalogue before migration. Integration testing with edge-case product types including gift cards and downloadable items |
| Mixed cart fulfilment errors | Orders combining pickup and delivery items fail when shipping logic does not handle split fulfilment | Custom shipping logic with explicit handling for multi-origin carts. Integration testing against every cart combination type in the catalogue |
| Gift card and voucher edge cases | Discount and balance logic fails at checkout when product types interact with promotions | Explicit edge case testing for every payment type combination. Proactive alerting when checkout totals deviate from expected ranges |
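The "retry logic on deadlock detection" in the first row can be illustrated generically. MySQL reports a deadlock as error 1213 (`ER_LOCK_DEADLOCK`) and InnoDB rolls back the victim transaction, so the safe response is to re-run the whole transaction, not resume it. This is a minimal sketch, not Magento's own implementation: `DatabaseError` is a stand-in for whatever exception your database driver raises, and the attempt count and delays are assumptions to tune.

```python
import random
import time

MYSQL_ER_LOCK_DEADLOCK = 1213  # "Deadlock found when trying to get lock; try restarting transaction"


class DatabaseError(Exception):
    """Stand-in for a driver exception that carries the MySQL error number."""

    def __init__(self, errno: int, msg: str = ""):
        super().__init__(msg)
        self.errno = errno


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Re-run the whole transaction when MySQL rolls it back as a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            if exc.errno != MYSQL_ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise  # not a deadlock, or out of retries: surface it so alerting sees it
            # Jittered exponential backoff so colliding transactions do not retry in lockstep.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
```

The important design choice is the final `raise`: a transaction that still fails after its retries must reach logging and alerting, because swallowing it is exactly how a silent order failure happens.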
For Magento migrations specifically, the checkout protection measures connect directly to the conversion rate optimisation work described in our Magento CRO guide. A checkout that fails silently is not just an operations problem. It is a conversion rate problem that does not appear in your funnel analytics.
A Magento migration at 4.7 million SKU scale requires infrastructure designed for it before migration begins. The infrastructure architecture in the Strand Books case combined multiple layers, each addressing a specific failure mode:
Cloudflare or Fastly serves static assets from edge nodes globally, reducing origin server load and protecting against DDoS and bot traffic. For a high-profile store like Strand Books, media attention and bot activity during sale events are not hypothetical risks. The CDN is also the first layer of defence against the traffic spikes that accompany pre-order announcements and seasonal peaks.
HAProxy distributes incoming traffic across multiple application server nodes. When one node is under high load or experiences an issue, traffic routes to healthy nodes without customer-visible downtime. Load balancing is the prerequisite for the high-availability architecture that achieves zero downtime during hardware issues.
Varnish sits in front of the Magento application servers and caches the rendered HTML of product pages, category pages, and other cacheable content. Cached pages are served without hitting the Magento application or the database, reducing response times from hundreds of milliseconds to single-digit milliseconds. For a 4.7 million SKU catalogue, the hit rate on Varnish for popular products is the primary factor in maintaining performance during traffic spikes.
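The arithmetic behind "hit rate is the primary factor" is short. Varnish exposes cumulative counters through `varnishstat`, including `MAIN.cache_hit` and `MAIN.cache_miss`; because they count from process start, the ratio should be taken over the difference between two snapshots. This sketch assumes you have already parsed the counters into a dictionary, and it deliberately ignores hit-for-pass and pipe traffic for simplicity.

```python
def hit_ratio(before: dict, after: dict) -> float:
    """Hit ratio over an interval from two snapshots of cumulative varnishstat counters."""
    hits = after["MAIN.cache_hit"] - before["MAIN.cache_hit"]
    misses = after["MAIN.cache_miss"] - before["MAIN.cache_miss"]
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


def origin_requests_per_second(total_rps: float, ratio: float) -> float:
    """Requests that fall through Varnish to Magento and the database."""
    return total_rps * (1 - ratio)
```

With illustrative numbers, a spike of 2,000 requests per second at a 95% hit ratio leaves about 100 requests per second for Magento, while at 80% the origin receives about 400. A 15-point drop in hit ratio quadruples the load on the application and database.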
Redis stores Magento’s session data and object cache in memory, reducing database reads for frequently accessed data. On a large catalogue migration, the Redis configuration, particularly the cache key namespace and expiry policy for product attribute data, requires tuning to avoid cache stampede during reindexing runs.
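Two common anti-stampede tactics are spreading expiry times with random jitter and letting only one caller rebuild a missing key while the others wait. The sketch below shows both with an in-memory dictionary standing in for Redis; against Redis itself, the per-key lock would typically be a short-lived key set with `SET ... NX PX`. The class name, TTL and jitter values are hypothetical, and this is not a description of Magento's own Redis cache backend.

```python
import random
import threading
import time


class StampedeSafeCache:
    """In-memory stand-in for a Redis-backed cache, showing two anti-stampede tactics."""

    def __init__(self, base_ttl=3600.0, jitter=0.1, clock=time.monotonic):
        self._data = {}                 # key -> (value, expires_at)
        self._locks = {}                # key -> lock so only one caller rebuilds that key
        self._guard = threading.Lock()  # protects creation of per-key locks
        self.base_ttl, self.jitter, self.clock = base_ttl, jitter, clock

    def _ttl(self) -> float:
        # Spread expiries by +/- jitter so keys written together do not all expire together.
        return self.base_ttl * random.uniform(1 - self.jitter, 1 + self.jitter)

    def get_or_compute(self, key, compute):
        entry = self._data.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have rebuilt the value while this one waited.
            entry = self._data.get(key)
            if entry and entry[1] > self.clock():
                return entry[0]
            value = compute()
            self._data[key] = (value, self.clock() + self._ttl())
            return value
```

If ten requests miss the same product attribute key at once, one of them recomputes it and the other nine read the result, instead of ten identical database queries arriving together at the moment a reindex invalidates the cache.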
Automated failover promotes a replica to primary and redirects database connections to it if the primary fails. Combined with proactive alerting on database replication lag and application error rates, this means the operations team knows about infrastructure issues before they produce customer-facing failures.
Adobe Commerce Cloud includes Fastly, whose CDN and full-page caching are built on Varnish, along with Elasticsearch or OpenSearch, and provides a managed infrastructure layer that removes the operational overhead of configuring and maintaining each component. For large-scale Magento migrations where the team does not have dedicated infrastructure engineering capacity, Adobe Commerce Cloud removes a significant category of configuration risk. For teams with that capacity and specific infrastructure requirements, self-managed hosting on AWS or Azure provides more flexibility at higher operational overhead.
A Magento migration that succeeds technically but leaves customers unable to find products has not delivered its commercial objective. At 4.7 million SKUs, the challenge of product discovery is as significant as the technical challenges of indexing and synchronisation.
The Strand Books migration addressed this with two specific capabilities that go beyond standard Magento product display:
Books have publication dates. Customers want to secure upcoming titles before they are physically available. Standard Magento out-of-stock handling does not support this natively. Custom pre-order logic allows a product with a future availability date to be ordered, triggers the appropriate payment handling (charge on ship or charge immediately), and surfaces correctly in search results without appearing as available stock.
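The core of that pre-order rule is a classification that search and checkout can share. This is a minimal sketch under stated assumptions: the `Product` fields, the status strings and the two payment modes are hypothetical names modelled on the description above, not Magento attributes.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class PaymentMode(Enum):
    CHARGE_NOW = "charge_now"
    CHARGE_ON_SHIP = "charge_on_ship"


@dataclass(frozen=True)
class Product:
    sku: str
    stock_qty: int
    available_from: Optional[date] = None   # publication or release date
    preorder_payment: PaymentMode = PaymentMode.CHARGE_ON_SHIP


def availability(product: Product, today: date) -> str:
    """Classify for search and checkout: 'preorder', 'in_stock' or 'out_of_stock'."""
    if product.available_from is not None and product.available_from > today:
        return "preorder"  # orderable, but never counted as available stock
    return "in_stock" if product.stock_qty > 0 else "out_of_stock"


def is_orderable(product: Product, today: date) -> bool:
    return availability(product, today) in ("preorder", "in_stock")


def preorder_payment_mode(product: Product, today: date) -> Optional[PaymentMode]:
    """Payment handling for pre-orders; None means the standard checkout flow applies."""
    if availability(product, today) == "preorder":
        return product.preorder_payment
    return None
```

Keeping "pre-order" as its own status, rather than treating future titles as in stock or as backorders, is what lets search surface them correctly without inflating available inventory.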
For catalogue types beyond books, the same logic applies to any product with a known future availability date. This is one of the highest-impact discovery and revenue features for businesses with pre-release or made-to-order inventory.
When a customer is browsing search results or a category page with thousands of visible products, visual signals that communicate a product’s status reduce cognitive load and accelerate purchase decisions. Product badges for signed editions, rare items, in-store exclusives, pre-order, and low stock act as a navigation layer on top of the catalogue structure.
For any large catalogue Magento migration, product badge configuration should be part of the product attribute mapping exercise early in the project, not a cosmetic addition after launch.
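Treating badges as rules over mapped attributes, rather than as manual labels, is what makes them feasible across millions of SKUs. In this sketch the attribute names, the priority order, the low-stock threshold and the two-badge limit are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListingAttributes:
    signed: bool = False
    rare: bool = False
    in_store_exclusive: bool = False
    preorder: bool = False
    stock_qty: int = 0


LOW_STOCK_THRESHOLD = 3  # hypothetical cut-off

# Badge rules in display priority order.
BADGE_RULES = [
    ("Pre-order", lambda a: a.preorder),
    ("Signed edition", lambda a: a.signed),
    ("Rare", lambda a: a.rare),
    ("In-store exclusive", lambda a: a.in_store_exclusive),
    ("Low stock", lambda a: not a.preorder and 0 < a.stock_qty <= LOW_STOCK_THRESHOLD),
]


def badges_for(attrs: ListingAttributes, max_badges: int = 2) -> list:
    """Return at most max_badges labels so listings stay scannable."""
    return [label for label, rule in BADGE_RULES if rule(attrs)][:max_badges]
```

Because every rule reads a mapped attribute, a badge cannot be configured until the attribute exists in the Magento data model, which is why badges belong in the attribute mapping exercise.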
For the broader product discovery principles including site search, faceted navigation, and mobile UX that apply to any large Magento catalogue, see our guide to eCommerce conversion rate optimisation.
The difference between a reactive and a proactive operations posture in a large-scale Magento migration is the difference between finding out about an order failure from a customer complaint and finding out about it from an internal alert before any customer notices.
The alerting architecture implemented in the Strand Books migration covered four categories of event that, without monitoring, would remain invisible until they produced customer-facing problems:
Automated alerts when an order fails to write to the database, when a payment is authorised but the order record is not created, or when a checkout session terminates without completing. At scale, these events occur at measurable frequency and must be caught and resolved before they compound.
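The "payment authorised but no order record" case can be caught by a periodic reconciliation between gateway authorisations and order records. This is a minimal sketch, assuming the quote ID is available on both sides as a join key; `PaymentAuthorisation` and the five-minute grace period are hypothetical, and the grace period exists so that checkouts still in flight are not reported as failures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PaymentAuthorisation:
    payment_ref: str
    quote_id: str
    authorised_at: datetime


def orphaned_authorisations(authorisations, order_quote_ids: set, now: datetime,
                            grace: timedelta = timedelta(minutes=5)) -> list:
    """Authorisations older than the grace period with no matching order record."""
    return [
        a for a in authorisations
        if a.quote_id not in order_quote_ids and now - a.authorised_at > grace
    ]
```

Any non-empty result is an alert: each entry is a customer who has been charged, or is about to be, for an order the business cannot see.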
Alerts when the time between an inventory event and the corresponding storefront update exceeds a defined threshold. A 10-minute lag that is acceptable on a quiet Tuesday becomes an overselling risk on a peak trading day. Threshold-based alerting allows the operations team to investigate before the lag produces incorrect orders.
Database replication lag between the primary and replica nodes is an early indicator of infrastructure stress. Alerting on replication lag gives the team time to investigate load distribution before the lag causes stale reads in the application.
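Both lag alerts above follow the same pattern: compare a measured delay against thresholds, and page only when the breach is sustained. The sketch below is illustrative; the metric names and threshold values are assumptions, and the input would come from your own measurements (for example, time between an inventory event and the storefront update, or `Seconds_Behind_Source` from `SHOW REPLICA STATUS` on MySQL 8.0.22 and later).

```python
from dataclasses import dataclass

LEVELS = {"ok": 0, "warn": 1, "critical": 2}


@dataclass(frozen=True)
class LagThreshold:
    metric: str
    warn_seconds: float
    critical_seconds: float


def evaluate(threshold: LagThreshold, observed_seconds: float) -> str:
    """Return 'ok', 'warn' or 'critical' for one observed lag sample."""
    if observed_seconds >= threshold.critical_seconds:
        return "critical"
    if observed_seconds >= threshold.warn_seconds:
        return "warn"
    return "ok"


def sustained(threshold: LagThreshold, samples, level: str = "warn", min_consecutive: int = 3) -> bool:
    """True only when the last min_consecutive samples are all at or above the level,
    so a single slow message does not page anyone."""
    recent = samples[-min_consecutive:]
    return len(recent) == min_consecutive and all(
        LEVELS[evaluate(threshold, s)] >= LEVELS[level] for s in recent
    )
```

Holding separate thresholds for normal and peak trading (say, a warning at ten minutes of inventory lag on a quiet day but at one minute during a pre-order launch) is one way to encode the point that the same lag carries different risk on different days.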
Automated monitoring for products whose attribute count or variant configuration drops below the expected threshold, indicating a likely revert-to-basic-entry failure. For a 4.7 million SKU catalogue, manual monitoring of product listing integrity is not feasible. Automated alerting is the only practical approach.
The operational shift: the Strand Books migration moved from risky manual synchronisation to 24/7 automated data integrity across all four alert categories. The commercial value of that shift is not primarily in the individual incidents caught. It is in the change to the operations team’s posture, from firefighting after customer reports to resolution before customer impact.
Suggested: Infrastructure diagram or monitoring dashboard for a large-scale Magento store
Alt: “Magento migration infrastructure showing CDN, load balancer, Varnish cache, Redis and Elasticsearch stack”
The decisions that determine whether a large catalogue Magento migration succeeds are made before a line of code is written. This checklist covers the architectural and planning decisions that are most frequently underspecified in migration scopes:
- Product attribute mapping between source system and Magento data model is fully documented before migration begins
- Magento indexer configuration (batch size, Update by Schedule mode) is tested against the full catalogue size, not a sample
- Elasticsearch is configured with catalogue-specific synonym libraries, commercial ranking rules, and facet mapping
- Product listing integrity monitoring is in place to detect revert-to-basic-entry failures
- URL redirect mapping from the source platform is complete and tested for SEO continuity
- Real-time or near-real-time inventory sync is implemented via message queue rather than scheduled batch for high-velocity stock items
- Multi-origin inventory pools (warehouse, in-store) are handled by separate sync streams with explicit fulfilment logic at checkout
- Inventory sync lag alerting is configured with thresholds matched to the business’s acceptable overselling window
- Event replay capability is in place for sync failure recovery without manual reconciliation
- Database deadlock prevention is configured and tested under load at peak traffic volume
- Transaction failure alerting is live before go-live, not after the first silent failure is discovered
- Edge cases for all payment types including gift cards, vouchers, and store credit are tested against real product data
- Tax configuration is audited against the full product catalogue including digital, physical, and pre-order product types
- Mixed-cart fulfilment logic (pickup and delivery in one order) is tested for every cart composition type
- CDN configured for static asset caching and DDoS protection with origin shield
- Load balancer distributing traffic across multiple application nodes with health checks
- Varnish full-page cache configured and cache hit rate tested under realistic traffic patterns
- Redis tuned for object cache and session storage at catalogue scale
- Database replication configured with automated failover and replication lag alerting
- Load testing completed at 150% of expected peak traffic before go-live
On the question of timeline: for each checklist item above, the planning and configuration work belongs in the migration specification, not the post-launch backlog. A Magento migration that goes live without all of these in place is not a completed migration. It is a partially migrated store with known risks deferred to a future sprint that may never arrive.
- A Magento migration at large catalogue scale is a different project from a standard migration. Many architectural decisions that are harmless at small scale become constraints at millions of SKUs. The planning must reflect this.
- Catalogue indexing requires specific configuration, not default settings. Batch size tuning, Update by Schedule indexing mode, and product listing integrity monitoring are non-negotiable for catalogues above 500,000 SKUs.
- Real-time inventory synchronisation via message queue is the architecture that prevents overselling. Scheduled batch syncs are insufficient for high-velocity or single-copy inventory. The queue decouples checkout from sync and enables event replay for failure recovery.
- Silent order failures are the most commercially damaging risk in a Magento migration. Transaction failure alerting must be live before go-live. Finding out about a silent failure from a customer complaint is not an acceptable operations posture.
- Infrastructure resilience requires CDN, load balancing, Varnish, Redis, and automated failover working together. The 4.7 million SKU Strand Books migration achieved 100% uptime post-launch. This was a designed outcome, not a fortunate one.
- Product discovery UX is a migration deliverable, not a post-launch optimisation. Pre-order logic, product badges, and search configuration for millions of SKUs require planning at the architecture stage, not the post-launch backlog.
5MS manages Magento and Adobe Commerce migrations including large catalogue projects with ERP integration, real-time inventory synchronisation, and infrastructure architecture. If you want a realistic scope and timeline for your Magento migration, that is exactly the conversation we like having.
A Magento migration at large catalogue scale succeeds when four architectural decisions are made correctly before go-live: catalogue indexing is configured with tuned batch processing and asynchronous mode to handle millions of records without degrading storefront performance; real-time inventory synchronisation via message queue eliminates overselling and decouples checkout from sync failures; checkout is protected from database deadlocks and silent order failures with proactive transaction alerting; and infrastructure combines CDN, load balancing, Varnish, and Redis in a resilience architecture tested at peak traffic. The 4.7 million SKU Strand Books Magento migration achieved 100% post-launch uptime and 24/7 automated data integrity by addressing all four before the store went live.
We manage Magento migrations every month. Tell us your catalogue size, your source platform, and your ERP. We will give you a realistic scope and timeline.
