MOLT Fetch Best Practices


Test and validate

To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before migrating any data in production. Use a test or development environment that closely resembles production.

Configure the source database and connection

  • To prevent connections from terminating prematurely during the data export phase, set the following to high values on the source database:

    • Maximum allowed number of connections. MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards (--export-concurrency) multiplied by the number of tables (--table-concurrency) being exported concurrently. For a worked example, refer to the sketch at the end of this section.

      Note:

      With the default numerical range sharding, only tables with primary key types of INT, FLOAT, or UUID can be sharded. PostgreSQL users can enable --use-stats-based-sharding to use statistics-based sharding for tables with primary keys of any data type. For details, refer to Table sharding.

    • Maximum lifetime of a connection.

  • If a PostgreSQL database is set as a source, ensure that idle_in_transaction_session_timeout on PostgreSQL is either disabled or set to a value longer than the duration of the data export phase. Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of molt_fetch_table_export_duration_ms for all exported tables.
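
    For example, with --export-concurrency 4 and --table-concurrency 4, MOLT Fetch can open up to 16 concurrent export connections, so the source's connection limit must comfortably exceed that number in addition to normal application load. The following sketch shows how you might check and adjust these settings on a PostgreSQL source; the values are illustrative, and $SOURCE refers to the source connection string described under Connection security:

    psql "$SOURCE" -c "SHOW max_connections;"
    psql "$SOURCE" -c "SHOW idle_in_transaction_session_timeout;"
    # Setting the timeout to 0 disables it for the duration of the export
    psql "$SOURCE" -c "ALTER SYSTEM SET idle_in_transaction_session_timeout = 0;"
    psql "$SOURCE" -c "SELECT pg_reload_conf();"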

Optimize performance

  • To optimize data load performance, drop all non-PRIMARY KEY constraints and indexes on the target CockroachDB database before migrating:

    Warning:

    Do not drop PRIMARY KEY constraints.

    You can recreate constraints and indexes after loading the data.
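
    For example, you might list and drop secondary indexes and foreign key constraints on the target before the load, then recreate them afterward. This is a sketch only: the schema, table, index, and constraint names are placeholders, and $TARGET refers to the CockroachDB connection string described under Connection security.

    cockroach sql --url "$TARGET" -e "SHOW INDEXES FROM migration_schema.employees;"
    cockroach sql --url "$TARGET" -e "SHOW CONSTRAINTS FROM migration_schema.employees;"
    cockroach sql --url "$TARGET" -e "DROP INDEX migration_schema.employees@employees_email_idx;"
    cockroach sql --url "$TARGET" -e "ALTER TABLE migration_schema.employees DROP CONSTRAINT fk_employees_dept;"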

  • For PostgreSQL sources using --use-stats-based-sharding, run ANALYZE on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance.
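
    For example, assuming a PostgreSQL source, running ANALYZE with no arguments refreshes statistics for every table in the current database (name a table to limit the scope):

    psql "$SOURCE" -c "ANALYZE;"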

  • To avoid running out of memory during READ COMMITTED data export of tables with large rows, estimate the amount of memory used to export a table:

    --row-batch-size * --export-concurrency * average size of the table rows
    

    If you are exporting more than one table at a time (i.e., --table-concurrency is set higher than 1), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run molt fetch, and adjust --row-batch-size accordingly. For details on how concurrency and sharding interact, refer to Table sharding.
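
    For example, with --row-batch-size 100000, --export-concurrency 4, and an average row size of roughly 1 KiB, exporting a single table requires approximately 100,000 × 4 × 1 KiB ≈ 400 MiB of memory. These numbers are illustrative only; measure your own row sizes before sizing the machine running molt fetch.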

  • If a table in the source database is much larger than the other tables, filter and export the largest table in its own molt fetch task. Repeat this for each of the largest tables. Then export the remaining tables in another task.
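
    For example, you might move the largest table in its own task and exclude it from the task that moves the remaining tables. This sketch assumes the --table-filter and --table-exclusion-filter flags and uses a placeholder table name:

    # Task 1: export only the largest table
    molt fetch --source "$SOURCE" --target "$TARGET" --table-filter 'orders' ...

    # Task 2: export the remaining tables
    molt fetch --source "$SOURCE" --target "$TARGET" --table-exclusion-filter 'orders' ...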

  • Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can be limited by available resources, but should always make progress. To identify possible resource constraints, observe the molt_fetch_rows_exported metric for slowdowns in the rate of rows being exported. You can use the sample Grafana dashboard to view metrics. For details on optimizing export performance through sharding, refer to Table sharding.
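
    For example, assuming MOLT Fetch is exposing Prometheus metrics at the default address (127.0.0.1:3030) on the standard /metrics path, you can spot-check the export counter from the machine running the task:

    curl -s http://127.0.0.1:3030/metrics | grep molt_fetch_rows_exported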

Import and continuation handling

  • When using IMPORT INTO during the data import phase to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline. To make this table accessible again, manually resume or cancel the job. Then resume molt fetch using continuation, or restart the task from the beginning.
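
    For example, you might locate the stalled IMPORT job on the target, resume or cancel it, and then continue the fetch task. The job ID and fetch ID below are placeholders, and this sketch assumes the --fetch-id continuation flag:

    # Find import jobs that are not currently running
    cockroach sql --url "$TARGET" -e "WITH x AS (SHOW JOBS) SELECT job_id, status FROM x WHERE job_type = 'IMPORT' AND status NOT IN ('running', 'succeeded');"

    # Resume (or CANCEL JOB to cancel) the hanging import job
    cockroach sql --url "$TARGET" -e "RESUME JOB 1234567890123456789;"

    # Continue the fetch task from where it left off
    molt fetch --source "$SOURCE" --target "$TARGET" --fetch-id 'd44762e5-6f70-43f8-8e15-58b4de10a007' ...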

Security

Cockroach Labs strongly recommends the following security practices.

Connection security

  • To keep your database credentials out of shell history and logs, follow these best practices when specifying your source and target connection strings:

    • Avoid plaintext connection strings.
    • Provide your connection strings as environment variables. For example:

      export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/migration_db?sslmode=verify-full"
      export TARGET="postgres://root@localhost:26257/defaultdb?sslmode=verify-full"
      

      Afterward, reference the environment variables in MOLT commands:

      --source $SOURCE
      --target $TARGET
      
    • If possible, use an external secrets manager to load the environment variables from stored secrets.
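
      For example, assuming the connection string is stored in AWS Secrets Manager under a placeholder secret name, you could load it without echoing the value into your shell history:

      export SOURCE="$(aws secretsmanager get-secret-value --secret-id migration/source-conn --query SecretString --output text)"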

  • Use TLS-enabled connection strings to encrypt data in transit from MOLT to the database. When using TLS certificates, ensure certificate files are accessible to the MOLT binary on the same machine.

    For example, a PostgreSQL connection string with TLS certificates:

    postgresql://migration_user@db.example.com:5432/appdb?sslmode=verify-full&sslrootcert=/etc/migration_db/certs/ca.pem&sslcert=/etc/migration_db/certs/client.crt&sslkey=/etc/migration_db/certs/client.key
    
  • URL-encode connection strings for the source database and CockroachDB so special characters in passwords are handled correctly.

    • Given a password a$52&, pass it to the molt escape-password command with single quotes:

      molt escape-password --password 'a$52&'
      

      Use the encoded password in your connection string. For example:

      postgres://migration_user:a%2452%26@localhost:5432/migration_db
      
  • Remove sslmode=disable from production connection strings.

Note:

By default, insecure connections (i.e., sslmode=disable on PostgreSQL; sslmode not set on MySQL) are disallowed. When using an insecure connection, molt fetch returns an error. To override this check, you can enable the --allow-tls-mode-disable flag. Do this only when testing, or if a secure SSL/TLS connection to the source or target database is not possible.

Cloud storage security

Ensure that access control is properly configured for Amazon S3, Google Cloud Storage, or Azure Blob Storage.

Amazon S3
  • Set the following environment variables in the terminal running molt fetch:

    export AWS_REGION='us-east-1'
    export AWS_SECRET_ACCESS_KEY='key'
    export AWS_ACCESS_KEY_ID='id'
    
    • To run molt fetch in a containerized environment (e.g., Docker), pass the required environment variables using -e. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using -v. For example:

      docker run \
        -e AWS_ACCESS_KEY_ID='your-access-key' \
        -e AWS_SECRET_ACCESS_KEY='your-secret-key' \
        -v ~/.aws:/root/.aws \
        -it \
        cockroachdb/molt fetch \
        --bucket-path 's3://migration/data/cockroach' ...
      
  • Alternatively, set --use-implicit-auth to use implicit authentication. When using assume-role authentication, specify the ARN of the IAM role to assume with --assume-role. For example:

    --bucket-path 's3://migration/data/cockroach'
    --assume-role 'arn:aws:iam::123456789012:role/MyMigrationRole'
    --use-implicit-auth
    
  • Set --import-region to specify an AWS_REGION (e.g., --import-region 'ap-south-1').

  • Ensure the S3 bucket is created and accessible by authorized roles and users only.

Google Cloud Storage
  • Authenticate your local environment with Application Default Credentials:

    Using gcloud:

    gcloud init
    gcloud auth application-default login
    

    Using the environment variable:

    export GOOGLE_APPLICATION_CREDENTIALS={path_to_cred_json}
    
    • To run molt fetch in a containerized environment (e.g., Docker), pass the required environment variables using -e. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using -v. For example:

      docker run \
        -e GOOGLE_APPLICATION_CREDENTIALS='/root/.config/gcloud/application_default_credentials.json' \
        -v ~/.config/gcloud:/root/.config/gcloud \
        -it \
        cockroachdb/molt fetch \
        --bucket-path 'gs://migration/data/cockroach' ...
      
  • Alternatively, set --use-implicit-auth to use implicit authentication. When using assume role authentication, specify the service account with --assume-role. For example:

    --bucket-path 'gs://migration/data/cockroach'
    --use-implicit-auth
    --assume-role 'user-test@cluster-ephemeral.iam.gserviceaccount.com'
    
  • Ensure the Google Cloud Storage bucket is created and accessible by authorized roles and users only.

Azure Blob Storage
  • Set the following environment variables in the terminal running molt fetch:

    export AZURE_ACCOUNT_NAME='account'
    export AZURE_ACCOUNT_KEY='key'
    

    You can also specify client and tenant credentials as environment variables:

    export AZURE_CLIENT_SECRET='secret'
    export AZURE_TENANT_ID='id'
    
    • To run molt fetch in a containerized environment (e.g., Docker), pass the required environment variables using -e. If your authentication method relies on local credential files, you may also need to volume map the host path to the appropriate location inside the container using -v. For example:

      docker run \
        -e AZURE_ACCOUNT_NAME='account' \
        -e AZURE_ACCOUNT_KEY='key' \
        -e AZURE_CLIENT_SECRET='secret' \
        -e AZURE_TENANT_ID='id' \
        -v ~/.azure:/root/.azure \
        -it \
        cockroachdb/molt fetch \
        --bucket-path 'azure-blob://migration/data/cockroach' ...
      
  • Alternatively, set --use-implicit-auth to use implicit authentication. For example:

    --bucket-path 'azure-blob://migration/data/cockroach'
    --use-implicit-auth
    

    This mode supports Azure managed identities and workload identities.

  • Ensure the Azure Blob Storage container is created and accessible by authorized roles and users only.

See also
