Crash-Consistent MongoDB Snapshot and Restore with Everpure Fusion

I’ve been working on a project that combines two things I spend a lot of time with: MongoDB sharded clusters and Everpure FlashArrays. The goal is a production-grade backup and restore solution that avoids mongodump, doesn’t require fsyncLock, and scales across as many arrays as your cluster needs without any code changes. In this post I’m walking through the complete workflow (snapshot, disaster simulation, restore, and verification) running against a real 3-node sharded cluster with data spread across 3 independent FlashArrays, all managed through a single Everpure Fusion gateway connection.

The code is on GitHub at nocentino/mongodb-flasharray-backup.

Watch the complete walkthrough: Crash-Consistent MongoDB Snapshot and Restore Demo on YouTube

Let’s go.

The Problem with Traditional MongoDB Backups at Scale

When your MongoDB cluster grows past the point where mongodump finishes before the next backup needs to start, you have a problem. Here’s what you’re dealing with:

  1. Time: mongodump serializes reads. On a 1 TB+ dataset it takes hours, during which the cluster is under read load.
  2. Consistency: Coordinating a consistent point-in-time across config servers, multiple shards, and mongos is genuinely hard. There’s no single atomic FREEZE EVERYTHING primitive.
  3. Recovery time: Restoring from a mongodump means streaming all that data back across the network and replaying writes. RTO measured in hours.
  4. Scaling: Add more shards, add more arrays, and your backup script needs to grow with them.

Storage-layer snapshots flip this entirely. A FlashArray snapshot is a metadata-only pointer capture; it completes in under a second regardless of how much data is on the volume. Restore is a pointer swap, equally instant. What we need is a way to coordinate those snapshots across multiple independent arrays while the database stays online and writes keep flowing. That’s exactly what the Ops Manager third-party backup API combined with Everpure Fusion delivers.

The key insight: restore time is constant. Whether you have 10 GB or 10 TB on a volume, the FlashArray overwrite operation is a metadata swap. The 81 seconds you’ll see in the output below is dominated by MongoDB startup and replica set election, not by data movement.

The Architecture

Here’s the environment I’m running this in:

MongoDB cluster:

  • MongoDB 8.0 sharded cluster named aen-cluster
  • 3 physical nodes: aen-mongo-01, aen-mongo-02, aen-mongo-03
  • 3 shards (aen-shard_0, aen-shard_1, aen-shard_2), each a 3-member replica set spread across all 3 VMs. aen-shard_0 is a config shard — a MongoDB 8.0 feature where the config server replica set (CSRS) also stores user data and serves as a shard, which is why it’s labeled CSRS in the topology below. The practical upside for this workflow: a single coordinated snapshot captures the config metadata and all shard data together.
  • Mongos query router on aen-mongo-01:27017
  • Ops Manager 8.0 at aen-mongo-00:8080 with third-party backup ACTIVE

Storage:

  • Each VM has one 1 TB pRDM volume mounted at /data/mongo
  • Each volume lives on a different FlashArray
  • Protection group aen-mongodb-pg exists on each array independently

Fusion fleet:

  • Gateway: FlashArray2 (the array we connect to for all fleet operations)
  • Fleet members with the PG: FlashArray1, FlashArray2, FlashArray3

The topology looks like this:

  mongos (aen-mongo-01:27017)
    ├── aen-shard_0  (CSRS, port 27020)
    ├── aen-shard_1  (Shard 1, port 27021)
    └── aen-shard_2  (Shard 2, port 27022)

                  ┌──────────────┬──────────────┬──────────────┐
                  │ aen-mongo-01 │ aen-mongo-02 │ aen-mongo-03 │
                  │  1 TB pRDM   │  1 TB pRDM   │  1 TB pRDM   │
                  │ /data/mongo  │ /data/mongo  │ /data/mongo  │
                  └──────┬───────┴──────┬───────┴──────┬───────┘
                         │              │              │
               ┌─────────▼──────────────▼──────────────▼───────────┐
               │          Everpure Fusion Fleet                    │
               │   (gateway: FlashArray2)                          │
               │                                                   │
               │  FlashArray1     FlashArray2(GW)   FlashArray3    │
               │                                                   │
               │  aen-mongodb-pg  aen-mongodb-pg   aen-mongodb-pg  │
               │                                                   │
               └───────────────────────────────────────────────────┘

Each FlashArray holds its own local instance of aen-mongodb-pg. Fusion’s fleet API lets us issue coordinated snapshot commands to all three arrays through a single gateway connection using -ContextName to route each call to the right array.

Prerequisites

You’ll need these in place before running the scripts:

  • PureStoragePowerShellSDK2 module installed
  • PowerShell 7+
  • .env file at the project root with FlashArray credentials and Ops Manager API keys
  • SSH key auth to packer@aen-mongo-{01,02,03} (passwordless)
  • Ops Manager third-party backup configured and ACTIVE on the cluster
  • All MongoDB data volumes enrolled in the protection group (Initialize-ProtectionGroups.ps1)

The Script Walkthrough

Let’s walk through each step with actual execution output from my lab environment.

Step 0: Connect to the Fusion Gateway

The first thing the demo script does is connect to the Everpure Fusion gateway. This is the only authentication call we’ll make. Every subsequent FlashArray API operation, spanning all three arrays, flows through this single connection using -ContextName to route to the right fleet member.

$FA = Connect-Pfa2Array -EndPoint $FaEndpoint -Credential $FaCred -IgnoreCertificateError -ErrorAction Stop
Write-Host "  Connected: $FaEndpoint" -ForegroundColor Green
  Connected: FlashArray2.purestorage.com

One credential, one connection, the entire fleet. No per-array authentication loops. That’s the Fusion model.

Next, we discover which fleet members actually have our protection group. Resolve-FaContextNames enumerates the fleet, filters out any FlashBlades (which throw a “Cross-product requests not supported” error on Get-Pfa2Array, so they are easy to detect), and checks each FlashArray for the PG.

$FaContextNames = Resolve-FaContextNames -FA $FA -PgName $ProtectionGroupName
Write-Host "  Fleet arrays with PG '$ProtectionGroupName': $($FaContextNames -join ', ')" -ForegroundColor Green
  Fleet members discovered (6): sn1-c60-e12-16, FlashArray3, FlashArray1, 
                                 slc6-fbs200-n3-b35-12, sn1-s200-c09-33, FlashArray2
    slc6-fbs200-n3-b35-12: not a FlashArray (skipped)
    sn1-s200-c09-33: not a FlashArray (skipped)
  FlashArrays in fleet (4): sn1-c60-e12-16, FlashArray3, FlashArray1, FlashArray2
    sn1-c60-e12-16: PG 'aen-mongodb-pg' NOT found
    FlashArray3: PG 'aen-mongodb-pg' present
    FlashArray1: PG 'aen-mongodb-pg' present
    FlashArray2: PG 'aen-mongodb-pg' present
  Resolved 3 FlashArray(s) with PG.
  Fleet arrays with PG 'aen-mongodb-pg': FlashArray3, FlashArray1, FlashArray2
  Found Protection Groups on: 3 arrays

This is the fleet I have today. The two FlashBlades are automatically excluded, and sn1-c60-e12-16 is a FlashArray in the fleet that just doesn’t have this particular MongoDB PG. Only the three arrays carrying MongoDB volumes are returned in $FaContextNames. If I add a fourth MongoDB node on a new array tomorrow, I add it to the fleet, create the PG on it, and this function returns four context names without any code changes.
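
The two-stage filter in that output (drop non-FlashArrays, then keep only arrays that carry the PG) can be sketched on its own. This is a minimal Python illustration, not the repository's Resolve-FaContextNames: the FleetMember record and its fields are hypothetical stand-ins for whatever the fleet API returns.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FleetMember:
    """Hypothetical record for one fleet member (not an SDK type)."""
    name: str
    is_flasharray: bool
    protection_groups: set[str] = field(default_factory=set)


def resolve_context_names(members: list[FleetMember], pg_name: str) -> list[str]:
    """Return the names of FlashArrays that carry the given protection group."""
    return [
        m.name
        for m in members
        if m.is_flasharray and pg_name in m.protection_groups
    ]


# Fleet modeled on the discovery output above.
fleet = [
    FleetMember("sn1-c60-e12-16", True),
    FleetMember("FlashArray3", True, {"aen-mongodb-pg"}),
    FleetMember("FlashArray1", True, {"aen-mongodb-pg"}),
    FleetMember("slc6-fbs200-n3-b35-12", False),
    FleetMember("sn1-s200-c09-33", False),
    FleetMember("FlashArray2", True, {"aen-mongodb-pg"}),
]
print(resolve_context_names(fleet, "aen-mongodb-pg"))
```

Adding a fourth array with the PG to the list changes the result, not the function, which is the property the scripts rely on.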

Step 1: Runtime Topology Discovery — Node to Volume to Array

This is my favorite part of the demo. We have no hardcoded mapping of “which node’s volume lives on which array.” Instead, we discover it dynamically at runtime by combining two APIs: Ops Manager to find the cluster nodes, and Fusion to find the volume by SCSI serial.

First, we pull the physical nodes from Ops Manager and deduplicate by hostname (since each physical VM runs multiple shard members):

$ClusterNodes = (Invoke-OmApi -Path "group/$GroupId/clusters/$ClusterId").replicaSets.nodes | 
    Group-Object hostname | ForEach-Object { $_.Group[0] }
  Found 3 nodes: aen-mongo-01, aen-mongo-02, aen-mongo-03
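
For readers less familiar with Group-Object, the deduplication amounts to "keep the first member seen per hostname." Here is a small Python sketch of that idea. The member dictionaries are a made-up shape for illustration, not the actual Ops Manager response.

```python
def unique_nodes_by_hostname(members):
    """Keep the first member seen for each hostname, preserving input order."""
    first_seen = {}
    for member in members:
        first_seen.setdefault(member["hostname"], member)
    return list(first_seen.values())


# Nine replica set members: three shards, each with a member on every VM.
members = [
    {"hostname": f"aen-mongo-0{n}", "port": port, "replicaSet": rs}
    for rs, port in [("aen-shard_0", 27020), ("aen-shard_1", 27021), ("aen-shard_2", 27022)]
    for n in (1, 2, 3)
]

nodes = unique_nodes_by_hostname(members)
print(len(members), "members ->", [n["hostname"] for n in nodes])
```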

Then for each node, we SSH in to read the SCSI serial of the block device backing /data/mongo. FlashArray encodes the volume serial in the SCSI WWN; strip the vendor prefix and you have a value that matches the FlashArray API’s serial field exactly. We then search the Fusion fleet for that serial:

foreach ($Node in $ClusterNodes) {
    # SSH: df -> dmsetup -> scsi_id to get the volume serial
    $SerialCmd = 'p=$(df /data/mongo | tail -1 | awk ''{print $1}''); [[ $p =~ /dev/mapper ]] && p=/dev/$(sudo dmsetup deps -o devname $p | sed -n "s/.*(\([^)]*\)).*/\1/p" | head -1); sudo /usr/lib/udev/scsi_id --whitelisted --device=$p'
    $ScsiSerial = (ssh @SshOpts "${SshUser}@$($Node.hostname)" $SerialCmd 2>/dev/null).Trim()
    $VolumeSerial = $ScsiSerial -replace "^$FaScsiVendorPrefix", ''  # Strip Pure vendor prefix

    # Search Fusion fleet for the volume — same API call across all arrays
    foreach ($CtxName in $FaContextNames) {
        $Vol = Get-Pfa2Volume -Array $FA -ContextName @($CtxName) -Filter "serial='$($VolumeSerial.ToUpper())'" -ErrorAction SilentlyContinue
        if ($Vol) { $FoundVolume = $Vol.Name; $FoundArray = ($CtxName -split '\.')[0]; break }
    }
}
  aen-mongo-01 → aen-mongo-01-data @ FlashArray1
  aen-mongo-02 → aen-mongo-02-data @ FlashArray2
  aen-mongo-03 → aen-mongo-03-data @ FlashArray3

Node         Volume            Array
----         ------            -----
aen-mongo-01 aen-mongo-01-data FlashArray1
aen-mongo-02 aen-mongo-02-data FlashArray2
aen-mongo-03 aen-mongo-03-data FlashArray3

Notice what’s happening here. We’re issuing Get-Pfa2Volume with -ContextName @($CtxName) (the same cmdlet, the same parameter, the same filter) and Fusion routes it to the right array each time. If we had 30 arrays in the fleet, the code would be identical. The loop would just run 30 iterations.
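
The serial normalization is the piece most likely to trip people up, so here it is in isolation. This is a Python sketch under two assumptions: the default prefix 3624a9370 is what scsi_id commonly reports for Pure devices (the script reads its value from $FaScsiVendorPrefix, which isn't shown here), and the sample serial is made up.

```python
def volume_serial_from_scsi_id(scsi_id: str, vendor_prefix: str = "3624a9370") -> str:
    """Strip the vendor prefix from a scsi_id string and upper-case the remainder.

    The default prefix is an assumption; pass your own if your hosts report
    something different.
    """
    value = scsi_id.strip().lower()
    prefix = vendor_prefix.lower()
    if not value.startswith(prefix):
        raise ValueError(f"{scsi_id!r} does not start with {vendor_prefix!r}")
    return value[len(prefix):].upper()


# Made-up identifier: prefix followed by a 24-character hex serial.
sample = "3624a9370a1b2c3d4e5f6478800011234\n"
print(volume_serial_from_scsi_id(sample))
```

Raising on an unexpected prefix is deliberate: a device that is not array-backed should stop the discovery rather than produce a serial that silently matches nothing.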

Step 2: Data Baseline

Before taking the snapshot, we capture the document counts in our test database. After the restore, we’ll assert these match.

$LoadtestBefore = [long](Invoke-Mongos 'db.getSiblingDB("testdb").loadtest.countDocuments()')
$PayloadBefore  = [long](Invoke-Mongos 'db.getSiblingDB("testdb").payload.countDocuments()')
  testdb.loadtest  = 2159780
  testdb.payload   = 200000

loadtest is a write-heavy collection that’s been accumulating inserts over time. payload is a static 200,000-document collection sharded across all three shards. Both collections are sharded with a hashed shard key, so the data is distributed across all three arrays.

Step 3: Taking a Crash-Consistent Snapshot

This is where the Ops Manager third-party backup API comes in. Here’s the coordination flow:

  1. Ops Manager opens a $backupCursor on one secondary per shard, which pins the WiredTiger checkpoint on those nodes. The cluster continues accepting writes; the checkpoint is just pinned so it won’t be overwritten. Using a secondary avoids adding any load to the primary.
  2. Once all cursors are open and the job reaches READY state, we fire the FlashArray protection group snapshots via Fusion.
  3. Ops Manager closes the cursors (the job reaches FINISHED).

The result: each FlashArray has captured its volume at the exact same logical point in time: the pinned checkpoint. The cluster never stopped accepting writes. No fsyncLock, no write stall.

A note on timing: the cluster takes a moment to prepare for the snapshot. Ops Manager has to open the backup cursor and pin the checkpoint before the job reaches READY, and that preparation is where the wall-clock time goes. The snapshot itself, once we fire it, is nearly instant: a sub-second pointer capture on each array. And throughout all of it, both the preparation and the capture, the workload keeps running. Reads and writes continue against the cluster the entire time; nothing is paused, blocked, or stalled.
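
The ordering matters more than any single call, so here is the flow reduced to a sketch. It's Python with injected callables standing in for the real Ops Manager and FlashArray calls. The function names, the 3-second poll interval, and the timeout are assumptions for illustration, not what New-MongoSnapshot.ps1 does internally.

```python
import time
from typing import Callable, Iterable, List


def wait_for_state(get_state: Callable[[], str], target: str,
                   poll_seconds: float = 3.0, timeout_seconds: float = 900.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll get_state() until it returns target; raise TimeoutError otherwise."""
    waited = 0.0
    while True:
        state = get_state()
        if state == target:
            return
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {state!r} after {waited:.0f}s, wanted {target!r}")
        sleep(poll_seconds)
        waited += poll_seconds


def coordinated_snapshot(get_state: Callable[[], str],
                         take_array_snapshot: Callable[[str], str],
                         finish_job: Callable[[], None],
                         arrays: Iterable[str], **wait_opts) -> List[str]:
    """Wait for READY, snapshot every array, signal finish, wait for FINISHED."""
    wait_for_state(get_state, "READY", **wait_opts)        # cursors open, checkpoint pinned
    created = [take_array_snapshot(a) for a in arrays]     # pointer capture per array
    finish_job()                                           # lets Ops Manager close the cursors
    wait_for_state(get_state, "FINISHED", **wait_opts)
    return created


# Dry run with a scripted state sequence and no real sleeping.
states = iter(["PENDING", "PENDING", "READY", "FINISHING", "FINISHED"])
snaps = coordinated_snapshot(
    get_state=lambda: next(states),
    take_array_snapshot=lambda a: f"aen-mongodb-pg.om-demo on {a}",
    finish_job=lambda: None,
    arrays=["FlashArray3", "FlashArray1", "FlashArray2"],
    sleep=lambda _seconds: None,
)
print(snaps)
```

The array snapshots only ever run inside the READY window, and a job that never reaches READY fails with a timeout instead of hanging the backup.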

pwsh "$ProjectRoot/New-MongoSnapshot.ps1"
=== STEP 0: Pre-flight ===
  Third-party backup: ACTIVE (3 replica sets)
  Checking node health...
  Node health: all 9 nodes UP

=== STEP 1: Selecting snapshotable nodes (one secondary per shard) ===
  aen-shard_2 -> aen-mongo-03:27022 [SECONDARY]
  aen-shard_0 -> aen-mongo-02:27020 [SECONDARY]
  aen-shard_1 -> aen-mongo-03:27021 [SECONDARY]

  ... (STEP 2 output trimmed for brevity) ...

=== STEP 3: Creating snapshot job ===
  Snapshot job created: 6a22042c5781a47d74527dad
  Starting snapshot (opens $backupCursor on each node)...

=== STEP 4: Waiting for state = READY ===
  18:03:08  state = PENDING
  18:03:12  state = PENDING
  ... (polling every ~3s) ...
  18:07:44  state = READY
  Backup cursor is open. Proceeding to take FlashArray snapshots.

=== STEP 5: Taking FlashArray protection group snapshots ===
  Capturing preSnap counts for testdb via mongos aen-mongo-01:27017 ...
    preSnap  testdb.loadtest = 2159780
    preSnap  testdb.payload = 200000
  Created: aen-mongodb-pg.om-20260604-180744 on FlashArray3
  Created: aen-mongodb-pg.om-20260604-180744 on FlashArray1
  Created: aen-mongodb-pg.om-20260604-180744 on FlashArray2
    postSnap  testdb.loadtest = 2159780
    postSnap  testdb.payload = 200000

=== STEP 6: Finishing snapshot (closing $backupCursor) ===
  Finish signal sent.

=== STEP 7: Waiting for state = FINISHED ===
  18:07:50  state = FINISHING
  ... 
  18:08:42  state = FINISHED

=== STEP 7.5: Updating snapshot tags with post-snapshot metadata ===
  Post-snapshot tags written to FlashArray3 (FlashArray3.purestorage.com).
  Post-snapshot tags written to FlashArray1 (FlashArray1.purestorage.com).
  Post-snapshot tags written to FlashArray2 (FlashArray2.purestorage.com).

=== Snapshot Complete ===
  Snapshot ID      : 6a22042c5781a47d74527dad
  Total duration   : 343.8 seconds

The snapshot itself (the FlashArray pointer capture) is sub-second per array. The 344 seconds of elapsed time is almost entirely Ops Manager: opening $backupCursor on three secondaries, waiting for the checkpoint to be pinned (PENDING → READY), then closing the cursors (FINISHING → FINISHED). The actual volume capture on each array happens between READY and “Finish signal sent”, a few seconds at most.
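
To put numbers on that, the two waiting phases can be measured straight from the timestamps in the log. This is a quick Python calculation; note that the first PENDING line is the first logged poll rather than the exact job start, so the first figure is a lower bound.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> int:
    """Whole seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


pending_to_ready = seconds_between("18:03:08", "18:07:44")       # first PENDING poll -> READY
finishing_to_finished = seconds_between("18:07:50", "18:08:42")  # FINISHING -> FINISHED
waiting = pending_to_ready + finishing_to_finished

print(pending_to_ready, finishing_to_finished, waiting)
```

That is 276 seconds plus 52 seconds, or 328 of the 343.8 seconds reported, which leaves roughly 16 seconds for pre-flight, the array snapshots, and tagging combined.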

Step 4: The Tag-Based Snapshot Catalog

One of the things I like most about this solution is that there’s no external catalog database. Backup metadata lives in the FlashArray snapshot tags. After taking a snapshot, we embed the cluster name, timestamps, volume membership, and document counts directly in the tags. Any host with FlashArray API access can discover the full backup catalog by querying the fleet.

Here’s what querying the catalog looks like:

# Query all protection-group snapshots from every array in the fleet
$AllSnaps = @()
foreach ($CtxName in $FaContextNames) {
    $Snaps = Get-Pfa2ProtectionGroupSnapshot -Array $FA -ContextName @($CtxName) `
        -Filter "name='$ProtectionGroupName.*'" -ErrorAction SilentlyContinue
    if ($Snaps) { $AllSnaps += $Snaps }
}
Write-Host "  Found $($AllSnaps.Count) total snapshots across $($FaContextNames.Count) arrays for the Protection Group '$ProtectionGroupName'" -ForegroundColor Cyan

# Group snapshots by their suffix (the tag) — snapshots sharing a suffix form a
# backup set. Keep only complete sets: one snapshot per array.
$SnapshotSets = $AllSnaps |
    Group-Object { $_.Name -replace "^$([regex]::Escape($ProtectionGroupName))\.", '' } |
    Where-Object { $_.Count -eq $FaContextNames.Count }

if (-not $SnapshotSets) {
    throw "No complete backup sets found. A complete set requires one snapshot per array ($($FaContextNames.Count) arrays)."
}
Write-Host "  Found $($SnapshotSets.Count) complete backup sets" -ForegroundColor Green
  Found 186 total snapshots across 3 arrays for the Protection Group 'aen-mongodb-pg'
  Found 62 complete backup sets

186 snapshots across 3 arrays, 62 complete sets (one snapshot per array = a set). To choose what to restore, we sort the complete sets by their snapshot timestamp and take the newest. The group’s name is the suffix we tagged the snapshot with — that’s the $LatestSnapshotTag we’ll hand to the restore script:

# Most recent complete set = the one we'll restore
$LatestBackupSet   = $SnapshotSets |
    Sort-Object { ($_.Group | Select-Object -First 1).Created } -Descending |
    Select-Object -First 1
$LatestSnapshotTag = $LatestBackupSet.Name   # the timestamp suffix, e.g. om-20260604-180744
$BackupSnapshots   = $LatestBackupSet.Group  # the per-array snapshots in this set

That newest set is three snapshots, one per array:

Name                              Array       Created
----                              -----       -------
aen-mongodb-pg.om-20260604-180744 FlashArray2 6/4/2026 11:07:48 PM
aen-mongodb-pg.om-20260604-180744 FlashArray3 6/4/2026 11:07:47 PM
aen-mongodb-pg.om-20260604-180744 FlashArray1 6/4/2026 11:07:47 PM

And the tags on one of those snapshots:

Key             Value
---             -----
ClusterName     aen-cluster
BackupTimestamp om-20260604-180744
BackupType      SNAPSHOT
mongo:volumes   aen-mongo-01-data,aen-mongo-02-data,aen-mongo-03-data
mongo:preSnap   {"testdb":{"loadtest":2159780,"payload":200000}}
mongo:postSnap  {"testdb":{"loadtest":2159780,"payload":200000}}
mongo:t1ts      1780614272

The mongo:volumes tag tells the restore script exactly which volume member to use from each array. The pre/post document counts let us verify the restore hit the right window. All of this is stored on the FlashArray itself: no sidecar service, no JSON files on disk, no catalog database to maintain.
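
Consuming those tags is mostly string splitting and JSON parsing. The following Python sketch turns the tag set shown above into the volume list and a per-collection expected range. The function name and the returned shape are my own for illustration, not the restore script's.

```python
from __future__ import annotations

import json


def parse_snapshot_tags(tags: dict[str, str]) -> dict:
    """Extract volume membership and per-namespace [pre, post] count ranges."""
    volumes = [v for v in tags["mongo:volumes"].split(",") if v]
    pre = json.loads(tags["mongo:preSnap"])
    post = json.loads(tags["mongo:postSnap"])
    expected = {
        f"{db}.{coll}": (count, post[db][coll])
        for db, colls in pre.items()
        for coll, count in colls.items()
    }
    return {
        "cluster": tags["ClusterName"],
        "backup_timestamp": tags["BackupTimestamp"],
        "volumes": volumes,
        "expected": expected,
    }


# Tag values copied from the table above.
tags = {
    "ClusterName": "aen-cluster",
    "BackupTimestamp": "om-20260604-180744",
    "BackupType": "SNAPSHOT",
    "mongo:volumes": "aen-mongo-01-data,aen-mongo-02-data,aen-mongo-03-data",
    "mongo:preSnap": '{"testdb":{"loadtest":2159780,"payload":200000}}',
    "mongo:postSnap": '{"testdb":{"loadtest":2159780,"payload":200000}}',
    "mongo:t1ts": "1780614272",
}
catalog_entry = parse_snapshot_tags(tags)
print(catalog_entry["volumes"])
print(catalog_entry["expected"])
```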

Step 5: Simulating a Disaster

Time to break things. We drop testdb through the mongos router:

Invoke-Mongos 'db.getSiblingDB("testdb").dropDatabase()'
{ ok: 1, dropped: 'testdb' }

 testdb.payload   = 0   (was 200000)
 testdb.loadtest  = 0   (was 2159780)
 ⚠️⚠️⚠️ Data is gone from all 3 nodes ⚠️⚠️⚠️

MongoDB replication propagates the drop to all 9 replica set members in milliseconds. The data is gone from all three nodes. Without storage snapshots, the recovery options here are grim: restoring from a mongodump takes hours, replica set rollback only works for recent writes, and an Ops Manager backup restore requires standing up the backup infrastructure and streaming data over the network.

Step 6: Restoring from the FlashArray Snapshot

The restore is a pre-flight check (Step 0) followed by an 8-step pipeline. We hand it the $LatestSnapshotTag we resolved from the catalog back in Step 4 (om-20260604-180744), and that single value is enough for it to locate every snapshot member across the fleet:

pwsh "$ProjectRoot/Restore-MongoSnapshot.ps1" -SnapshotTag $LatestSnapshotTag -Force

Step 0: Pre-flight: The script connects to Fusion, rediscovers the node-to-volume mapping via SCSI serial (same runtime discovery as before), and validates that all three snapshot members exist on their respective arrays.

  Snapshot found: aen-mongodb-pg.om-20260604-180744 on FlashArray3
  Snapshot found: aen-mongodb-pg.om-20260604-180744 on FlashArray1
  Snapshot found: aen-mongodb-pg.om-20260604-180744 on FlashArray2
  All 3 snapshots confirmed.
  Member verified: aen-mongodb-pg.om-20260604-180744.aen-mongo-01-data (size 1099511627776 bytes)
  Member verified: aen-mongodb-pg.om-20260604-180744.aen-mongo-02-data (size 1099511627776 bytes)
  Member verified: aen-mongodb-pg.om-20260604-180744.aen-mongo-03-data (size 1099511627776 bytes)

Steps 1-3: Stop MongoDB cleanly: We stop the Ops Manager automation agents first (otherwise they would try to bring the processes back up), then stop mongod/mongos on all nodes with SIGTERM. We wait for a clean exit, then unmount /data/mongo on each node.

=== STEP 1: Stopping automation agents ===
  Stopped on aen-mongo-01, aen-mongo-02, aen-mongo-03

=== STEP 2: Stopping mongod/mongos ===
  aen-mongo-01: clean-stop
  aen-mongo-02: clean-stop
  aen-mongo-03: clean-stop

=== STEP 3: Unmounting /data/mongo ===
  aen-mongo-01: unmounted
  aen-mongo-02: unmounted
  aen-mongo-03: unmounted

Step 4: The FlashArray volume overwrite: This is the critical step, and it’s where Fusion’s value is clearest. We call New-Pfa2Volume -Overwrite for each volume, routing to the correct array via -ContextName. Each call is a metadata-only pointer swap: the live volume pointer is redirected to point at the snapshot’s data blocks. No bytes are copied. The volume is ready the instant the API call returns.

=== STEP 4: Restoring FlashArray volumes from 'om-20260604-180744' ===
  Overwriting aen-mongo-01-data on FlashArray1 <- aen-mongodb-pg.om-20260604-180744.aen-mongo-01-data ...
  Restored: aen-mongo-01-data
  Overwriting aen-mongo-02-data on FlashArray2 <- aen-mongodb-pg.om-20260604-180744.aen-mongo-02-data ...
  Restored: aen-mongo-02-data
  Overwriting aen-mongo-03-data on FlashArray3 <- aen-mongodb-pg.om-20260604-180744.aen-mongo-03-data ...
  Restored: aen-mongo-03-data

Three arrays, three volumes, same -ContextName routing pattern on every call. If this cluster had 30 nodes across 30 arrays, the code here would be identical, just 30 iterations of the loop instead of 3.
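
The source name for each overwrite follows the pattern visible in the output: protection group, snapshot suffix, volume name, joined with dots. Below is a small Python sketch that builds the per-volume restore plan from the node mapping discovered earlier. The function and the dictionary shape are illustrative only, and the real overwrite is still the New-Pfa2Volume call.

```python
def build_restore_plan(topology, pg_name, snapshot_tag):
    """Return one overwrite instruction per node, ordered by node name.

    topology maps node name -> (volume name, array name).
    """
    return [
        {
            "array": array,
            "target": volume,
            "source": f"{pg_name}.{snapshot_tag}.{volume}",
        }
        for _node, (volume, array) in sorted(topology.items())
    ]


topology = {
    "aen-mongo-01": ("aen-mongo-01-data", "FlashArray1"),
    "aen-mongo-02": ("aen-mongo-02-data", "FlashArray2"),
    "aen-mongo-03": ("aen-mongo-03-data", "FlashArray3"),
}
for step in build_restore_plan(topology, "aen-mongodb-pg", "om-20260604-180744"):
    print(f"{step['target']} on {step['array']} <- {step['source']}")
```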

Steps 5-7: Remount, restart, and stabilize: We rescan the block devices (so each host sees the restored volume after the pointer swap), remount XFS, restart the Ops Manager automation agents, and wait for mongos to accept connections and all shards to elect primaries.

=== STEP 5: Rescanning LUN and remounting /data/mongo ===
  aen-mongo-01: rescan + integrity check + mount /dev/sdb ...
    aen-mongo-01: device=/dev/sdb1 fstype=xfs
    aen-mongo-01: WARN: RO integrity check exit 1 (advisory; mount will replay journal)
  aen-mongo-01: mounted
  aen-mongo-02: rescan + integrity check + mount /dev/sdb ...
    aen-mongo-02: device=/dev/sdb1 fstype=xfs
    aen-mongo-02: WARN: RO integrity check exit 1 (advisory; mount will replay journal)
  aen-mongo-02: mounted
  aen-mongo-03: rescan + integrity check + mount /dev/sdb ...
    aen-mongo-03: device=/dev/sdb1 fstype=xfs
    aen-mongo-03: WARN: RO integrity check exit 1 (advisory; mount will replay journal)
  aen-mongo-03: mounted

The WARN: RO integrity check exit 1 messages are expected and harmless. The snapshot was never cleanly unmounted from the OS’s point of view: the volume was captured mid-run. The read-only check flags this, and XFS replays its journal automatically on mount to bring the filesystem back to a consistent state. WiredTiger then performs its own recovery when mongod starts, which is what brings the data files back to a consistent point. If you see this warning, everything is working correctly.

=== STEP 6: Starting automation agents ===
  Started on aen-mongo-01, aen-mongo-02, aen-mongo-03

=== STEP 7: Waiting for cluster to stabilize ===
  18:09:52  Checking cluster state...
    mongos not yet accepting connections...
  18:10:02  Checking cluster state...
    Authoritative shard count from listShards: 3
  18:10:05  mongos up, 3 shards registered, 3 primaries reachable.

Step 8: Verify: The restore script reads the mongo:volumes tag from the snapshot to confirm each node’s volume was part of the backup set, then counts documents and asserts they fall within the pre/post-snapshot range captured in the tags.

=== STEP 8: Verifying data ===
  testdb.loadtest document count: 2159780
  testdb.payload  document count: 200000
  Baseline OK : testdb.loadtest = 2159780 in [2159780, 2159780] (drift=0)
  Baseline OK : testdb.payload = 200000 in [200000, 200000] (drift=0)

=== Restore Complete ===
  Snapshot restored : om-20260604-180744
  Total duration    : 81 seconds
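
The baseline assertion itself is a range check. In this Python sketch, drift is defined as the restored count minus the pre-snapshot count. That definition is my assumption, since the script's own drift calculation isn't shown here.

```python
from __future__ import annotations


def check_baseline(actual: int, pre: int, post: int) -> tuple[bool, int]:
    """Return (ok, drift) where ok means actual lies within [pre, post].

    drift = actual - pre is an illustrative definition, not necessarily
    the one the restore script uses.
    """
    low, high = min(pre, post), max(pre, post)
    return (low <= actual <= high, actual - pre)


restored = {"testdb.loadtest": 2159780, "testdb.payload": 200000}
expected = {"testdb.loadtest": (2159780, 2159780), "testdb.payload": (200000, 200000)}

for namespace, count in restored.items():
    pre, post = expected[namespace]
    ok, drift = check_baseline(count, pre, post)
    status = "Baseline OK " if ok else "Baseline BAD"
    print(f"{status}: {namespace} = {count} in [{pre}, {post}] (drift={drift})")
```

A range is used instead of an exact match because writes keep flowing while the snapshot is taken. In this run the pre and post counts happened to be identical, so the range collapses to a single value.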

Step 7: Final Verification

Back in the demo script, we do one more independent check after the restore script exits:

  testdb.loadtest  = 2159780 (was 2159780)  [✓ PASS]
  testdb.payload   = 200000  (was 200000)   [✓ PASS]

2,359,780 documents total (2,159,780 loadtest + 200,000 payload). All accounted for. And the total restore time (81 seconds) would be the same if each volume held 10 TB instead of 1 TB. The FlashArray volume overwrite is a pointer swap. It doesn’t know or care about dataset size.

Why Fusion Changes the Calculus for Multi-Array MongoDB

If you’ve built multi-array backup scripts before, you know what the traditional approach looks like: one Connect-Pfa2Array per array, one copy of the snapshot logic per array, one more connection to maintain when you add a node. It works, but it doesn’t scale in the operational sense.

Fusion flips the model:

  • One gateway connection manages the entire fleet. One Connect-Pfa2Array call to FlashArray2, and we have API access to all three arrays.
  • -ContextName routes each operation to the correct fleet member. The same cmdlet, the same call pattern, the same loop, regardless of whether you have 3 arrays or 300.
  • Fleet-wide queries let you search for a volume by serial number across the entire fleet in a single loop. That’s how the runtime topology discovery works.
  • No external catalog: backup metadata is in the snapshot tags, queryable fleet-wide via Fusion.
  • Linear scalability: when we add a fourth MongoDB node on a new array, we add it to the fleet, create the PG on it, and every script in this repository continues to work without modification.

The scale advantage becomes especially important when you think about the runtime discovery step. One SCSI serial lookup, one Fusion fleet search, and we know exactly which volume on which array belongs to which MongoDB node. Dynamic. Runtime. No inventory file to maintain.

Wrapping Up

What I’ve shown here is a complete, production-grade crash-consistent snapshot and restore workflow for a sharded MongoDB cluster across three independent FlashArrays, managed through a single Everpure Fusion gateway. The cluster stays online during the snapshot, restore is constant-time regardless of dataset size, and the entire fleet is managed through one consistent API surface.

The full source, including New-MongoSnapshot.ps1, Restore-MongoSnapshot.ps1, Config.ps1, and the demo script, is on GitHub at nocentino/mongodb-flasharray-backup.

Let me know how it works in your environment.