Test and validate
To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before migrating any data in production. Use a test or development environment that closely resembles production.
Configure the source database and connection
To prevent connections from terminating prematurely during the data export phase, set the following to high values on the source database:
- Maximum allowed number of connections. MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards (`--export-concurrency`) multiplied by the number of tables (`--table-concurrency`) being exported concurrently.

  Note: With the default numerical range sharding, only tables with primary key types of `INT`, `FLOAT`, or `UUID` can be sharded. PostgreSQL users can enable `--use-stats-based-sharding` to use statistics-based sharding for tables with primary keys of any data type. For details, refer to Table sharding.

- Maximum lifetime of a connection.
If a PostgreSQL database is set as a source, ensure that `idle_in_transaction_session_timeout` on PostgreSQL is either disabled or set to a value longer than the duration of the data export phase. Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of `molt_fetch_table_export_duration_ms` for all exported tables.
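If you have SQL access to the source, you can check and adjust these settings before the export. A minimal sketch using `psql`, assuming a `$SOURCE` connection string (as in the Connection security section below) and a role with sufficient privileges:

```
# Inspect the current limits on the PostgreSQL source.
psql "$SOURCE" -c "SHOW max_connections;"
psql "$SOURCE" -c "SHOW idle_in_transaction_session_timeout;"

# Disable the idle-in-transaction timeout for the export (0 disables it),
# then reload the configuration so the change takes effect.
psql "$SOURCE" -c "ALTER SYSTEM SET idle_in_transaction_session_timeout = 0;"
psql "$SOURCE" -c "SELECT pg_reload_conf();"
```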
Optimize performance
To optimize data load performance, drop all non-`PRIMARY KEY` constraints and indexes on the target CockroachDB database before migrating:

- `FOREIGN KEY`
- `UNIQUE`
- Secondary indexes
- `CHECK`
- `DEFAULT`
- `NOT NULL` (you do not need to drop this constraint when using `drop-on-target-and-recreate` for table handling)
Warning: Do not drop `PRIMARY KEY` constraints.

You can recreate constraints and indexes after loading the data.
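As an illustration only (the table, index, and constraint names are hypothetical), dropping a `FOREIGN KEY` constraint and a secondary index before the load and recreating them afterward might look like this:

```
# Before the data load: drop non-PRIMARY KEY constraints and secondary indexes.
cockroach sql --url "$TARGET" -e "
  ALTER TABLE orders DROP CONSTRAINT IF EXISTS orders_customer_id_fkey;
  DROP INDEX IF EXISTS orders@orders_customer_id_idx;"

# ... run molt fetch ...

# After the data load: recreate the constraint and index.
cockroach sql --url "$TARGET" -e "
  CREATE INDEX orders_customer_id_idx ON orders (customer_id);
  ALTER TABLE orders ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (id);"
```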
For PostgreSQL sources using `--use-stats-based-sharding`, run `ANALYZE` on source tables before migration to ensure optimal shard distribution (see the first sketch below). This is especially important for large tables, where even distribution can significantly improve export performance.

To prevent memory outages during `READ COMMITTED` data export of tables with large rows, estimate the amount of memory used to export a table:

`--row-batch-size` * `--export-concurrency` * average size of the table rows

For example, with a row batch size of 10,000, an export concurrency of 4, and an average row size of 1 KiB, exporting a single table can hold roughly 40 MiB of row data in memory at once. If you are exporting more than one table at a time (i.e., `--table-concurrency` is set higher than `1`), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run `molt fetch`, and adjust `--row-batch-size` accordingly. For details on how concurrency and sharding interact, refer to Table sharding.

If a table in the source database is much larger than the other tables, filter and export the largest table in its own `molt fetch` task (see the second sketch below). Repeat this for each of the largest tables. Then export the remaining tables in another task.

Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` metric for decreases in the number of rows being processed. You can use the sample Grafana dashboard to view metrics. For details on optimizing export performance through sharding, refer to Table sharding.
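A minimal sketch of the `ANALYZE` step, assuming `psql` access to the source via a `$SOURCE` connection string (the table name `orders` is hypothetical):

```
# Refresh planner statistics so stats-based sharding can divide the table evenly.
psql "$SOURCE" -c "ANALYZE orders;"

# Or refresh statistics for every table in the database:
psql "$SOURCE" -c "ANALYZE;"
```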
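To split the work across tasks, you can restrict each `molt fetch` invocation to a subset of tables. A sketch assuming the `--table-filter` regex flag and hypothetical table names:

```
# Task 1: export only the largest table.
molt fetch \
  --source "$SOURCE" --target "$TARGET" \
  --table-filter '^events$' \
  ...

# Task 2: export the remaining tables, listed explicitly.
molt fetch \
  --source "$SOURCE" --target "$TARGET" \
  --table-filter '^(customers|orders|payments)$' \
  ...
```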
Import and continuation handling
- When using `IMPORT INTO` during the data import phase to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline. To make this table accessible again, manually resume or cancel the job (a sketch follows). Then resume `molt fetch` using continuation, or restart the task from the beginning.
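A sketch of the manual recovery step on the target cluster, assuming a `$TARGET` connection string (the job ID is a placeholder):

```
# Find the hanging IMPORT job and note its ID.
cockroach sql --url "$TARGET" \
  -e "WITH j AS (SHOW JOBS) SELECT job_id, status FROM j WHERE job_type = 'IMPORT';"

# Resume the job so the import can finish:
cockroach sql --url "$TARGET" -e "RESUME JOB 123456789012345678;"

# Or cancel it to roll back the partial import:
cockroach sql --url "$TARGET" -e "CANCEL JOB 123456789012345678;"
```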
Security
Cockroach Labs strongly recommends the following security practices.
Connection security
To keep your database credentials out of shell history and logs, follow these best practices when specifying your source and target connection strings:
- Avoid plaintext connection strings. Provide your connection strings as environment variables. For example:

  ```
  export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/migration_db?sslmode=verify-full"
  export TARGET="postgres://root@localhost:26257/defaultdb?sslmode=verify-full"
  ```

  Afterward, reference the environment variables in MOLT commands:

  ```
  --source $SOURCE --target $TARGET
  ```

  If possible, use an external secrets manager to load the environment variables from stored secrets (see the first sketch after this list).
- Use TLS-enabled connection strings to encrypt data in transit from MOLT to the database. When using TLS certificates, ensure certificate files are accessible to the MOLT binary on the same machine. For example, a PostgreSQL connection string with TLS certificates:

  ```
  postgresql://migration_user@db.example.com:5432/appdb?sslmode=verify-full&sslrootcert=/etc/migration_db/certs/ca.pem&sslcert=/etc/migration_db/certs/client.crt&sslkey=/etc/migration_db/certs/client.key
  ```

- URL-encode connection strings for the source database and CockroachDB so special characters in passwords are handled correctly. Given a password `a$52&`, pass it to the `molt escape-password` command with single quotes:

  ```
  molt escape-password --password 'a$52&'
  ```

  Use the encoded password in your connection string. For example:

  ```
  postgres://migration_user:a%2452%26@localhost:5432/migration_db
  ```
- Remove `sslmode=disable` from production connection strings. By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmode` not set on MySQL) are disallowed. When using an insecure connection, `molt fetch` returns an error. To override this check, you can enable the `--allow-tls-mode-disable` flag (see the second sketch after this list). Do this only when testing, or if a secure SSL/TLS connection to the source or target database is not possible.
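One way to follow the secrets-manager suggestion above is to load the variables from a stored secret. A sketch using HashiCorp Vault's KV store (the secret path and field names are hypothetical):

```
# Pull connection strings from Vault instead of typing them into the shell.
export SOURCE="$(vault kv get -field=source secret/migration/molt)"
export TARGET="$(vault kv get -field=target secret/migration/molt)"
```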
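For testing only, a sketch of explicitly allowing an insecure connection with `--allow-tls-mode-disable`:

```
# Testing only: permit a non-TLS source connection that would otherwise be rejected.
molt fetch \
  --source 'postgres://migration_user@localhost:5432/migration_db?sslmode=disable' \
  --target "$TARGET" \
  --allow-tls-mode-disable \
  ...
```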
Cloud storage security
Ensure that access control is properly configured for Amazon S3, Google Cloud Storage, or Azure Blob Storage.
Amazon S3
Set the following environment variables in the terminal running `molt fetch`:

```
export AWS_REGION='us-east-1'
export AWS_SECRET_ACCESS_KEY='key'
export AWS_ACCESS_KEY_ID='id'
```

To run `molt fetch` in a containerized environment (e.g., Docker), pass the required environment variables using `-e`. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using `-v`. For example:

```
docker run \
  -e AWS_ACCESS_KEY_ID='your-access-key' \
  -e AWS_SECRET_ACCESS_KEY='your-secret-key' \
  -v ~/.aws:/root/.aws \
  -it \
  cockroachdb/molt fetch \
  --bucket-path 's3://migration/data/cockroach' ...
```
Alternatively, set `--use-implicit-auth` to use implicit authentication. When using assume role authentication, specify the service account with `--assume-role`. For example:

```
--bucket-path 's3://migration/data/cockroach' --assume-role 'arn:aws:iam::123456789012:role/MyMigrationRole' --use-implicit-auth
```

Set `--import-region` to specify an `AWS_REGION` (e.g., `--import-region 'ap-south-1'`).

Ensure the S3 bucket is created and accessible by authorized roles and users only.
Google Cloud Storage
Authenticate your local environment with Application Default Credentials.

Using `gcloud`:

```
gcloud init
gcloud auth application-default login
```

Using the environment variable:

```
export GOOGLE_APPLICATION_CREDENTIALS={path_to_cred_json}
```

To run `molt fetch` in a containerized environment (e.g., Docker), pass the required environment variables using `-e`. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using `-v`. For example:

```
docker run \
  -e GOOGLE_APPLICATION_CREDENTIALS='/root/.config/gcloud/application_default_credentials.json' \
  -v ~/.config/gcloud:/root/.config/gcloud \
  -it \
  cockroachdb/molt fetch \
  --bucket-path 'gs://migration/data/cockroach' ...
```
Alternatively, set `--use-implicit-auth` to use implicit authentication. When using assume role authentication, specify the service account with `--assume-role`. For example:

```
--bucket-path 'gs://migration/data/cockroach' --use-implicit-auth --assume-role 'user-test@cluster-ephemeral.iam.gserviceaccount.com'
```

Ensure the Google Cloud Storage bucket is created and accessible by authorized roles and users only.
Azure Blob Storage
Set the following environment variables in the terminal running `molt fetch`:

```
export AZURE_ACCOUNT_NAME='account'
export AZURE_ACCOUNT_KEY='key'
```

You can also specify client and tenant credentials as environment variables:

```
export AZURE_CLIENT_SECRET='secret'
export AZURE_TENANT_ID='id'
```

To run `molt fetch` in a containerized environment (e.g., Docker), pass the required environment variables using `-e`. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using `-v`. For example:

```
docker run \
  -e AZURE_ACCOUNT_NAME='account' \
  -e AZURE_ACCOUNT_KEY='key' \
  -e AZURE_CLIENT_SECRET='secret' \
  -e AZURE_TENANT_ID='id' \
  -v ~/.azure:/root/.azure \
  -it \
  cockroachdb/molt fetch \
  --bucket-path 'azure-blob://migration/data/cockroach' ...
```
Alternatively, set `--use-implicit-auth` to use implicit authentication. For example:

```
--bucket-path 'azure-blob://migration/data/cockroach' --use-implicit-auth
```

This mode supports Azure managed identities and workload identities.
Ensure the Azure Blob Storage container is created and accessible by authorized roles and users only.