
Migrating Devnet/Mainnet Archive to Berkeley Archive

Before you start the process to migrate your archive database from the current Mainnet or Devnet format to Berkeley, be sure that you have met the prerequisites.

Migration process

The Devnet/Mainnet migration can take up to a couple of days, so the process is split into three stages to help you achieve a successful migration:

  • Stage 1: Initial migration

  • Stage 2: Incremental migration

  • Stage 3: Remainder migration

Each stage has three migration phases:

  • Phase 1: Copying data and precomputed blocks from the Devnet/Mainnet database using the berkeley_migration app

  • Phase 2: Populating new Berkeley tables using the replayer app in migration mode

  • Phase 3: Additional validation of the migrated database

Review these phases and stages before you start the migration.

Simplified approach

For convenience, use the mina-berkeley-migration-script app if you do not need to delve into the details of the migration or if your environment does not require a special approach.

Stage 1: Initial migration

mina-berkeley-migration-script  \
initial \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 500 \
--checkpoint-interval 10000 \
--checkpoint-output-path . \
--precomputed-blocks-local-path . \
--network NETWORK

where:

-g | --genesis-ledger: path to the genesis ledger file

-s | --source-db: connection string to the database to be migrated

-t | --target-db: connection string to the database that will hold the migrated data

-b | --blocks-bucket: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

-bs | --blocks-batch-size: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.

-n | --network: network name (devnet or mainnet) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json.

-c | --checkpoint-output-path: path to folder for replayer checkpoint files

-i | --checkpoint-interval: frequency of dumping checkpoint files, expressed in block count

-l | --precomputed-blocks-local-path: path to folder for on-disk precomputed blocks location

The command outputs a replayer checkpoint file (migration-checkpoint-XXX.json) that is required for the next run.
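
For example, when you move on to the incremental stage, you can select the newest checkpoint produced so far with a small shell helper. This is only a convenience sketch; it assumes the checkpoint files live in the current directory and follow the migration-checkpoint-*.json naming referenced in the next stage:

# Pick the newest checkpoint file written by the previous run.
LATEST_CHECKPOINT=$(ls -t migration-checkpoint-*.json | head -n 1)
echo "Latest checkpoint: ${LATEST_CHECKPOINT}"
# Pass it to the incremental stage via --replayer-checkpoint "${LATEST_CHECKPOINT}".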

Stage 2: Incremental migration

mina-berkeley-migration-script \
incremental \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 500 \
--network NETWORK \
--checkpoint-output-path . \
--checkpoint-interval 10000 \
--precomputed-blocks-local-path . \
--replayer-checkpoint migration-checkpoint-XXX.json

where:

-g | --genesis-ledger: path to the genesis ledger file

-s | --source-db: connection string to the database to be migrated

-t | --target-db: connection string to the database that will hold the migrated data

-b | --blocks-bucket: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

-bs | --blocks-batch-size: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.

-n | --network: network name (devnet or mainnet) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json.

-r | --replayer-checkpoint: path to the latest checkpoint file migration-checkpoint-XXX.json

-c | --checkpoint-output-path: path to folder for replayer checkpoint files

-i | --checkpoint-interval: frequency of dumping checkpoint files, expressed in block count

-l | --precomputed-blocks-local-path: path to folder for on-disk precomputed blocks location

Stage 3: Remainder migration

mina-berkeley-migration-script \
final \
--genesis-ledger ledger.json \
--source-db postgres://postgres:postgres@localhost:5432/source \
--target-db postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--blocks-batch-size 500 \
--network NETWORK \
--checkpoint-output-path . \
--checkpoint-interval 10000 \
--precomputed-blocks-local-path . \
--replayer-checkpoint migration-checkpoint-XXX.json \
-fc fork-genesis-config.json

where:

-g | --genesis-ledger: path to the genesis ledger file

-s | --source-db: connection string to the database to be migrated

-t | --target-db: connection string to the database that will hold the migrated data

-b | --blocks-bucket: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

-bs | --blocks-batch-size: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.

-n | --network: network name (devnet or mainnet) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json.

-r | --replayer-checkpoint: path to the latest checkpoint file migration-checkpoint-XXX.json

-c | --checkpoint-output-path: path to folder for replayer checkpoint files

-i | --checkpoint-interval: frequency of dumping checkpoint files, expressed in block count

-l | --precomputed-blocks-local-path: path to folder for on-disk precomputed blocks location

-fc | --fork-config: path to the fork genesis config file, which is the new genesis config that is distributed with the new daemon and is published after the fork block is announced

Advanced approach

If the simplified berkeley migration script is, for some reason, not suitable for you, you can run the migration using the berkeley_migration and replayer apps directly, without the interface that the script provides.

Stage 1: Initial migration

This first stage requires only the initial Berkeley schema and is the foundation for the next migration stages. This stage populates the migrated database and creates an initial checkpoint for further incremental migration.

  • Inputs

    • Unmigrated Devnet/Mainnet database
    • Devnet/Mainnet genesis ledger
    • Empty target Berkeley database with the schema created, but without any content (see the sketch after this list)
  • Outputs

    • Migrated Devnet/Mainnet database to the Berkeley format from genesis up to the last canonical block in the original database
    • Replayer checkpoint that can be used for incremental migration
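
One of the inputs listed above is an empty target Berkeley database with the schema already created. A minimal sketch of preparing it, assuming a local PostgreSQL instance and that create_schema.sql is the Berkeley archive schema file you obtained with the archive node package (the database name and schema path are placeholders):

# Create the empty target database and load the Berkeley archive schema into it.
createdb -h localhost -p 5432 -U postgres migrated
psql -h localhost -p 5432 -U postgres -d migrated -f create_schema.sql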

Phase 1: Berkeley migration app run

mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK

where:

--batch-size: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.

--config-file: path to the genesis ledger file

--mainnet-archive-uri: connection string to the database to be migrated

--migrated-archive-uri: connection string to the database that will hold the migrated data

--blocks-bucket: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

--precomputed-blocks-local-path: path to folder for on-disk precomputed blocks location

--keep-precomputed-blocks: keep the precomputed blocks on-disk after the migration is complete

--network: the network name (devnet or mainnet) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

Phase 2: Replayer in migration mode run

The replayer config must contain the Devnet/Mainnet ledger as the starting point, so first prepare the replayer config file:

 jq '.ledger.accounts' genesis_ledger.json | jq  '{genesis_ledger: {accounts: .}}' > replayer_input_config.json

where:

genesis_ledger.json is the genesis ledger file from a daemon bootstrap on the particular network.
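
As a quick sanity check (not required by the migration itself), you can confirm that the accounts were actually copied into the generated file:

# Should print a non-zero count of accounts taken from the genesis ledger.
jq '.genesis_ledger.accounts | length' replayer_input_config.json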

Then:

 mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer_input_config.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .

where:

--migration-mode: flag for migration

--archive-uri: connection string to the database that will hold the migrated data

--input-file: path to the replayer input file; see below for how it is created

replayer_input_config.json: a file constructed from the network genesis ledger:

   jq '.ledger.accounts' genesis_ledger.json | jq  '{genesis_ledger: {accounts: .}}' > replayer_input_config.json

--checkpoint-interval: frequency of dumping checkpoint files, expressed in block count

--checkpoint-output-folder: path to folder for replayer checkpoint files

Phase 3: Validations

Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated databases.

 mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated

where:

--mainnet-archive-uri: connection string to the database to be migrated

--migrated-archive-uri: connection string to the database that will hold the migrated data

Stage 2: Incremental migration

After the initial migration, the data is migrated up to the last canonical block. However, the Devnet/Mainnet chain keeps producing new blocks that must also be migrated, again and again, until the fork block is announced.

info

Incremental migration can, and probably must, be repeated several times until the fork block is announced by Mina Foundation. Run the incremental migration repeatedly with the latest Devnet/Mainnet database and the latest replayer checkpoint file.

  • Inputs

    • Latest Devnet/Mainnet database
    • Devnet/Mainnet genesis ledger
    • Replayer checkpoint from last run
    • Migrated Berkeley database from the initial migration
  • Outputs

    • Migrated Devnet/Mainnet database to the Berkeley format up to the last canonical block
    • Replayer checkpoint which can be used for the next incremental migration

Phase 1: Berkeley migration app run

mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK

where:

--batch-size: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.

--config-file: path to the genesis ledger file

--mainnet-archive-uri: connection string to the database to be migrated

--migrated-archive-uri: connection string to the database that will hold the migrated data

--blocks-bucket: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

--precomputed-blocks-local-path: path to folder for on-disk precomputed blocks location

--keep-precomputed-blocks: keep the precomputed blocks on-disk after the migration is complete

--network: the network name (devnet or mainnet) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

Phase 2: Replayer in migration mode run

 mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer-checkpoint-XXX.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .

where:

--migration-mode: flag for migration

--archive-uri: connection string to the database that will hold the migrated data

--input-file: path to the latest checkpoint file replayer-checkpoint-XXX.json

replayer-checkpoint-XXX.json: the latest checkpoint generated from the previous migration

--checkpoint-interval: frequency of dumping checkpoint files, expressed in block count

--checkpoint-output-folder: path to folder for replayer checkpoint files

Incremental migration can be run continuously on top of the initial migration or the last incremental migration until the fork block is announced.

Phase 3: Validations

Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated databases.

 mina-berkeley-migration-verifier \
pre-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated

where:

--mainnet-archive-uri: connection string to the database to be migrated

--migrated-archive-uri: connection string to the database that will hold the migrated data

Note: you can run the incremental migration continuously on top of the initial migration or the last incremental migration until the fork block is announced.

Stage 3: Remainder migration

When the fork block is announced, you must perform the remainder migration. This is the last migration run. In this stage, you close the migration cycle by migrating the remaining blocks between the current last canonical block and the fork block (which can be pending, so you do not need to wait the 290 blocks for it to become canonical). You must pass --fork-state-hash as an additional parameter to the berkeley-migration app.
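
If you already have the fork genesis config file published by Mina Foundation, one way to obtain the fork state hash is to read it from that file. This is a hedged sketch that assumes the published config exposes the fork block under proof.fork.state_hash; verify the field path against the config you actually receive:

# Extract the fork block state hash from the published fork genesis config
# (assumes a proof.fork.state_hash field).
FORK_STATE_HASH=$(jq -r '.proof.fork.state_hash' fork-genesis-config.json)
echo "Fork state hash: ${FORK_STATE_HASH}"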

  • Inputs

    • Latest Devnet/Mainnet database
    • Devnet/Mainnet genesis ledger
    • Replayer checkpoint from last run
    • Migrated Berkeley database from last run
    • Fork block state hash
  • Outputs

    • Migrated Devnet/Mainnet database to the Berkeley format up to the fork point
    • Replayer checkpoint that can be used for the next incremental migration

info

The migrated database output from this stage of the final migration is required to initialize your archive nodes on the upgraded network.

Phase 1: Berkeley migration app run

mina-berkeley-migration \
--batch-size 1000 \
--config-file ledger.json \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--blocks-bucket mina_network_block_data \
--precomputed-blocks-local-path . \
--keep-precomputed-blocks \
--network NETWORK \
--fork-state-hash {fork-state-hash}

where:

--batch-size: number of precomputed blocks to be fetched at one time from Google Cloud. A larger number, like 1000, can help speed up the migration process.

--config-file: path to the genesis ledger file

--mainnet-archive-uri: connection string to the database to be migrated

--migrated-archive-uri: connection string to the database that will hold the migrated data

--blocks-bucket: name of the precomputed blocks bucket. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

--precomputed-blocks-local-path: path to folder for on-disk precomputed blocks location

--keep-precomputed-blocks: keep the precomputed blocks on-disk after the migration is complete

--network: the network name (devnet or mainnet) when determining precomputed blocks. Precomputed blocks are assumed to be named with format: {network}-{height}-{state_hash}.json

--fork-state-hash: fork state hash

info

When you run the berkeley-migration app with fork-state-hash, there is no requirement for the fork state block to be canonical. The tool automatically converts all pending blocks in the subchain, including the fork block, to canonical blocks.
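
If you want to observe this, you can inspect the distribution of block statuses in the migrated database before and after the run. The query below assumes the archive schema's blocks table with its chain_status column (pending, canonical, or orphaned):

# Count blocks per chain status in the migrated database.
psql postgres://postgres:postgres@localhost:5432/migrated \
  -c "SELECT chain_status, COUNT(*) FROM blocks GROUP BY chain_status;"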

Phase 2: Replayer in migration mode run

mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer-checkpoint-XXX.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder .

where:

--migration-mode: flag for migration

--archive-uri: connection string to the database that will hold the migrated data

--input-file: path to the latest checkpoint file replayer-checkpoint-XXX.json from the previous migration run

replayer-checkpoint-XXX.json: the latest checkpoint generated from the previous migration

--checkpoint-interval: frequency of dumping checkpoint files, expressed in block count

--checkpoint-output-folder: path to folder for replayer checkpoint files

Phase 3: Validations

Use the berkeley_migration_verifier app to perform checks for both the fully migrated and partially migrated databases.

 mina-berkeley-migration-verifier \
post-fork \
--mainnet-archive-uri postgres://postgres:postgres@localhost:5432/source \
--migrated-archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--fork-config-file fork_genesis_config.json \
--migrated-replayer-output replayer-checkpoint-XXXX.json

where:

--mainnet-archive-uri: connection string to the database to be migrated

--migrated-archive-uri: connection string to the database that will hold the migrated data

--migrated-replayer-output: path to the latest checkpoint file replayer-checkpoint-XXX.json

--fork-config-file: path to the fork genesis config file, which is the new genesis config that is distributed with the new daemon and is published after the fork block is announced

Example migration steps using Mina Foundation data for Devnet using Debian

See: Worked example using March 22 data

Example migration steps using Mina Foundation data for Mainnet using Docker

See: Worked example using March 22 data

How to verify a successful migration

o1Labs and Mina Foundation make every effort to provide reliable tools of high quality. However, it is not possible to eliminate all errors and to test all possible Mainnet archive variations. All important checks are implemented in the mina-berkeley-migration-verifier application, but you can use the following checklist if you want to perform the checks manually (a psql sketch follows the checklist):

  1. All transaction (user command and internal command) hashes are left intact.

    Verify that the user_command and internal_command tables contain hashes in the Devnet/Mainnet format, for example, CkpZirFuoLVV....

  2. The parent-child block relationship is preserved.

    Verify that a given block in the migrated archive has the same parent (state_hash and parent_hash columns) as in the Devnet/Mainnet archive that was used as input.

  3. Account balances remain the same.

    Verify that the same balance exists for a given block in the Devnet/Mainnet and migrated databases.
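
A rough sketch of how such spot checks might be scripted with psql is shown below. The table and column names (user_commands.hash, blocks.state_hash, blocks.parent_hash) are assumptions based on the archive schema and should be adjusted to your schema version; the state hash is a placeholder you must replace:

SOURCE=postgres://postgres:postgres@localhost:5432/source
MIGRATED=postgres://postgres:postgres@localhost:5432/migrated

# 1. Spot-check that user command hashes still use the Devnet/Mainnet format (CkpZ...).
psql "$MIGRATED" -c "SELECT hash FROM user_commands LIMIT 5;"

# 2. Spot-check the parent-child relationship for one block present in both databases.
STATE_HASH="3NK..."   # replace with a real state hash from your database
psql "$SOURCE"   -c "SELECT parent_hash FROM blocks WHERE state_hash = '${STATE_HASH}';"
psql "$MIGRATED" -c "SELECT parent_hash FROM blocks WHERE state_hash = '${STATE_HASH}';"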

Tips and tricks

We are aware that the migration process can be very long (a couple of days). Therefore, we encourage you to use cron jobs that migrate data incrementally. The cron job requires access to Google Cloud buckets (or other storage):

  • A bucket to store migrated-so-far database dumps
  • A bucket to store checkpoint files

The migration is tightly coupled with Google Cloud infrastructure because of the precomputed block upload mechanism, which is why buckets are also used for storing dumps and checkpoints. However, you do not have to use Google Cloud for anything other than precomputed blocks: with the right configuration, you can use any gsutil-compatible storage backend (for example, S3).

Before running the cron job, upload an initial database dump and an initial checkpoint file.

To create the files, run these steps locally (a shell sketch follows the list):

  1. Download a Devnet/Mainnet archive dump and load it into PostgreSQL.
  2. Create an empty database using the new archive schema.
  3. Run the berkeley-migration app against the Devnet/Mainnet and new databases.
  4. Run the replayer app in migration mode with the --checkpoint-interval set to a suitable value (perhaps 100) and start with the original Devnet/Mainnet ledger in the input file.
  5. Use pg_dump to dump the migrated database and upload it.
  6. Upload the most recent checkpoint file.
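
A hedged shell sketch of these local steps follows; the dump file, schema file, and bucket names are placeholders you must replace, and the migration app invocations are the ones shown earlier in this document:

# 1. Load a Devnet/Mainnet archive dump into PostgreSQL.
createdb source
psql -d source -f devnet-archive-dump.sql

# 2. Create an empty database with the new (Berkeley) archive schema.
createdb migrated
psql -d migrated -f create_schema.sql

# 3. and 4. Run the berkeley-migration app and then the replayer in migration mode
#           against these two databases, with --checkpoint-interval 100.

# 5. Dump the migrated database and upload it.
pg_dump migrated > migrated-dump.sql
gsutil cp migrated-dump.sql gs://my-migration-dumps/

# 6. Upload the most recent checkpoint file.
gsutil cp "$(ls -t replayer-checkpoint-*.json | head -n 1)" gs://my-migration-checkpoints/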

The cron job performs the same steps in an automated fashion (a skeleton script follows the list):

  1. Pulls the latest Devnet/Mainnet archive dump and loads it into PostgreSQL.
  2. Pulls the latest migrated database and loads it into PostgreSQL.
  3. Pulls the latest checkpoint file.
  4. Runs the berkeley-migration app against the two databases.
  5. Runs the replayer app in migration mode using the downloaded checkpoint file; set the checkpoint interval to be smaller (perhaps 50) because there are typically only 200 or so blocks in a day.
  6. Uploads the migrated database.
  7. Uploads the most recent checkpoint file.
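
A skeleton of such a cron job is sketched below. Bucket names, object names, and database names are placeholders; the actual berkeley-migration and replayer invocations are the incremental ones shown earlier. It also assumes the checkpoint object names sort chronologically:

#!/usr/bin/env bash
set -euo pipefail

# 1. Pull the latest Devnet/Mainnet archive dump and load it.
gsutil cp gs://mina-archive-dumps/devnet-archive-latest.sql .
dropdb --if-exists source && createdb source
psql -d source -f devnet-archive-latest.sql

# 2. Pull the latest migrated database dump and load it.
gsutil cp gs://my-migration-dumps/migrated-dump.sql .
dropdb --if-exists migrated && createdb migrated
psql -d migrated -f migrated-dump.sql

# 3. Pull the latest checkpoint file.
gsutil cp "$(gsutil ls gs://my-migration-checkpoints/ | sort | tail -n 1)" latest-checkpoint.json

# 4. and 5. Run the berkeley-migration app and the replayer in migration mode,
#           using latest-checkpoint.json as --input-file and a smaller
#           --checkpoint-interval (for example, 50).

# 6. and 7. Upload the refreshed dump and the newest checkpoint file.
pg_dump migrated > migrated-dump.sql
gsutil cp migrated-dump.sql gs://my-migration-dumps/
gsutil cp "$(ls -t replayer-checkpoint-*.json | head -n 1)" gs://my-migration-checkpoints/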

Be sure to monitor the cron job for errors.

Just before the Berkeley upgrade, migrate the last few blocks by running locally:

  1. Download the Devnet/Mainnet archive data directly from the k8s PostgreSQL node (not from the archive dump), and load it into PostgreSQL.
  2. Download the most recent migrated database and load it into PostgreSQL.
  3. Download the most recent checkpoint file.
  4. Run the berkeley-migration app against the two databases.
  5. Run the replayer app in migration mode using the most recent checkpoint file.

It is worthwhile to perform these last steps as a dry run to make sure all goes well. You can run these steps as many times as needed.

Known migration problems

Remember that rerunning the migration after a crash is always possible. After resolving any of the issues below, you can rerun the process and the migration will continue from its last position.

Async was unable to add a file descriptor to its table of open file descriptors

For example:

 ("Async was unable to add a file descriptor to its table of open file descriptors"
(file_descr 18)
(error
"Attempt to register a file descriptor with Async that Async believes it is already managing.")
(backtrace
......

A remedy is to lower the batch size parameter (--blocks-batch-size for the migration script, --batch-size for the berkeley-migration app) to a value of 500 or less.

Map.find_exn: not found

For example:

(monitor.ml.Error
(Not_found_s
("Map.find_exn: not found"
....

Usually, this error means that there is a gap in the canonical chain. To fix it, ensure that the missing-block-auditor run is successful.

Yojson.Json_error .. Unexpected end of input

For example:

(monitor.ml.Error
("Yojson.Json_error(\"Line 1, bytes 1003519-1003520:\\nUnexpected end of input\")")
("Raised at Yojson.json_error in file \"common.ml\", line 5, characters 19-39"
"Called from Yojson.Safe.__ocaml_lex_read_json_rec in file \"lib/read.mll\", line 215, characters 28-52"
...

This issue is caused by an invalid precomputed block. Deleting the downloaded precomputed blocks should resolve it.

Error querying db, error: Request to ... failed: ERROR: column \"type\" does not exist

This means you provided the migrated schema as the source when invoking the script or the berkeley-migration app.

Poor performance of migration when accessing remote database

Migration tests were conducted with both a local database and a remote database (RDS); the migration with the local database runs significantly faster. We strongly suggest using an offline copy of the database installed locally.

ERROR: out of shared memory

(monitor.ml.Error (Failure "Error querying for user commands with id 1686617, error Request to postgresql://user:pwd@host:port/db failed: ERROR: out of shared memory
\nHINT: You might need to increase max_pred_locks_per_transaction

The solution is either to increase the max_pred_locks_per_transaction setting in the PostgreSQL database, or to isolate the database from Mainnet traffic (for example, by exporting a dump from the live database and importing it in an isolated environment).
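
For example, the setting can be raised with ALTER SYSTEM; the value 256 is only an illustrative guess, and the change requires a PostgreSQL restart because the parameter can only be set at server start:

# Raise the predicate lock limit, then restart PostgreSQL for the change to take effect.
psql -U postgres -c "ALTER SYSTEM SET max_pred_locks_per_transaction = 256;"
sudo systemctl restart postgresql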

Berkeley migration app is consuming all of my resources

When running a full migration, you can run into memory leaks that prevent you from cleanly performing the migration in one pass. A machine with 64 GB of RAM can freeze after ~40k migrated blocks; every 200 blocks inserted into the database increases the leaked memory by 4-10 MB.

A potential workaround is to split the migration into smaller parts using cron jobs or automation scripts.

FAQ

Migrated database is missing orphaned blocks

By design, Berkeley migration omits orphaned blocks and, by default, migrates only canonical (and pending, if set up correctly) blocks.

Replayer in migration mode overrides my old checkpoints

By default, the replayer dumps the checkpoint to the current folder. All checkpoint files have a similar format:

replayer-checkpoint-{number}.json.

To prevent overwriting old checkpoints, use the --checkpoint-output-folder and --checkpoint-file-prefix parameters to change the output folder and file prefix.
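
For example, a replayer invocation that writes checkpoints into a dedicated folder with a custom prefix might look like this (the folder and prefix values are arbitrary examples):

mina-migration-replayer \
--migration-mode \
--archive-uri postgres://postgres:postgres@localhost:5432/migrated \
--input-file replayer-checkpoint-XXX.json \
--checkpoint-interval 10000 \
--checkpoint-output-folder /var/lib/mina/checkpoints \
--checkpoint-file-prefix stage2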