Drupal Migration Sucks

Why is it so hard to debug?


At the time of writing there are still hundreds of thousands of sites running on Drupal 7, despite the fact that security support for it has officially ended. There are several likely reasons for this, though there is one in particular that deserves its own write-up.

Migrating is Hard

When faced with migrating a site from Drupal 7 to a newer version, two options present themselves:

  • Use the Migrate API like a real man 💪
  • Use the Feeds module like a sissy boy 😿

Using Feeds Instead

There are actually some strong cases for using the Feeds module:

  1. Simpler data sources - For straightforward CSV, RSS, or similar structured data imports without any need to manipulate data, Feeds can be quicker to set up.
  2. One-time imports - If you’re not migrating an active site, it can be easier to download everything as a set of CSV files (using Views Data Export for example) and manipulate the data manually before importing.
  3. Recurring imports - Feeds is good for scheduled, repeated imports from consistent sources like RSS feeds.
  4. Content creation by end-users - Feeds can be configured to allow site users to upload and import their own content within defined parameters, through a fairly user-friendly UI. No need to write PHP or YAML.
  5. No command-line - In environments where you can’t use Drush or other CLI tools, the UI is the only option. Migrate Tools provides a UI for the Migrate API, but if you’re going down that route you’re probably not going to use it (and you probably have shell access).
  6. Learning curve - Spending the time figuring out how to use the Migrate API might not be justified, even for developers or experienced site builders.

Conversely, there are some good reasons to use the Migrate API instead:

  1. Data manipulation - Migrate API offers process plugins for data manipulation that Feeds can’t match, even with Feeds Tamper.

  2. Configuration migration - Migrate API provides migrations for everything: fields, content types, users, module and site configuration. The command drush migrate:status (from Migrate Tools) lists every migration the site knows about, with its status and item counts; that is to say, those provided by core, by contrib modules, and also your own artisanal hand-spun YAML files.

  3. Version control - Migration configurations are stored in YAML files that can be version-controlled and deployed across environments.

  4. Entity references - Better handling of entity references and maintaining relationships between content. If necessary this can include the creation of ‘stub’ entities that are populated in a subsequent migration.

  5. Dependency management - The ability to define migration dependencies ensures everything is created in the correct order (taxonomy vocabularies and their fields, then the terms, then the content that references them).

  6. Drush integration - It’s a lot quicker to use the CLI once you get down to the nitty gritty.

  7. Data validation - More sophisticated options for validating data during the import process, including the option to skip rows based on specific (or empty) values, and to log a custom message when that happens.

  8. Plugin-based - The Migrate API is built in such a way that makes it relatively easy for you to extend with your own source and process plugins.
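
Points 4, 5 and 7 above can be sketched in a single migration fragment. Treat it as an illustration under assumptions rather than a drop-in file: the migration ID my_tags and the source columns title and tag_ids are hypothetical, while skip_on_empty and migration_lookup are core process plugins.

```yaml
process:
  # Skip the whole row when the source title is empty, and log why.
  title:
    plugin: skip_on_empty
    method: row
    source: title
    message: 'Row skipped: title is empty'
  # Translate source tag IDs into the term IDs created by the (hypothetical)
  # my_tags migration. If a term has not been migrated yet, migration_lookup
  # creates a stub by default; the stub is filled in when my_tags imports it.
  field_tags:
    plugin: migration_lookup
    migration: my_tags
    source: tag_ids
# Run my_tags before this migration so the lookups normally succeed.
migration_dependencies:
  required:
    - my_tags
```

With the required dependency in place stubs should be rare; they mostly matter for circular or self-referencing content.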

So having decided you’re a real manly man who’s capable of using the Migrate API and has too much time on their hands, you then have to get to grips with it.

Anatomy of a Migration

Drupal migrations follow the Extract, Transform, Load (ETL) paradigm, and these stages are defined in the source, process and destination keys of a migration YAML file. You will generate a YAML file for each type of entity to be imported; a migration can only have one destination plugin.

Let’s use a migration for the ‘Article’ content type as an example:

id: articles
label: 'Content - Articles'
migration_group: site_content
source:
  # One source plugin is configured here
process:
  # Several process plugins go here; at least one for every value that must be set on the destination entity
  # Each process plugin may be as simple as setting a default value or copying a value from the source
destination:
  plugin: entity:node
  default_bundle: article
migration_dependencies:
  required:
    - my_tags

Extract (source)

Drupal’s migration system makes use of source plugins that connect to and extract data from various sources. These plugins can query SQL databases, parse CSV, XML, or JSON files, connect to REST APIs etc. Core provides some plugins, and more are available. You can also write your own if you think you’re hard enough.
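
As a rough sketch of what a source section can look like, the fragment below uses core’s embedded_data plugin, which keeps the rows in the YAML file itself. The rows and column names are made up for illustration; it is handy for trying out process plugins before pointing a migration at real data.

```yaml
source:
  plugin: embedded_data
  # Hypothetical sample rows; each key becomes a source property.
  data_rows:
    - id: 1
      title: 'First article'
      body: '<p>Hello world</p>'
    - id: 2
      title: 'Second article'
      body: '<p>Hello again</p>'
  # The column that uniquely identifies a row, and its type.
  ids:
    id:
      type: integer
```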

Transform (process)

Transformation is the process of getting the data into the right format for Drupal to store, and may consist of:

  • Mapping sources to destination fields
  • Mapping specific values to other values
  • Handling nested values (such as a text field’s format and value)
  • Handling multiple values
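
The four cases above might look something like this in a process section. It is a sketch, not a definitive mapping: the source columns (title, published, body, keywords) and the destination field field_keywords are assumptions, and the plugins used (static_map, default_value, explode) all ship with core.

```yaml
process:
  # Map a source column straight to a destination field.
  title: title
  # Map specific values to other values.
  status:
    plugin: static_map
    source: published
    map:
      'yes': 1
      'no': 0
    default_value: 0
  # Nested values: set the sub-properties of a text field separately.
  body/value: body
  body/format:
    plugin: default_value
    default_value: basic_html
  # Multiple values: split "a,b,c" into one field item per value.
  field_keywords:
    plugin: explode
    source: keywords
    delimiter: ','
```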

Load (destination)

The transformed data is then loaded into the database. The destination plugin handles creating new entities (nodes, taxonomy terms, fields, content types etc.) and updating existing entities.

The system maintains a “map” of migrated items, tracking which source items correspond to which destination entities, enabling updates and rollbacks.

Golden Contrib

The Migrate Tools and Migrate Plus modules are pretty much a hard requirement. You may as well install them before you start.

Migrate Tools

The Migrate Tools module extends Drupal’s core migration framework with powerful utilities that make migrations more manageable:

  1. Drush Commands: Execute and manage migrations via the command line with commands like:

    drush migrate:import my_migration
    drush migrate:status
    drush migrate:rollback my_migration
  2. Migration UI: A simple interface for viewing migration status and executing operations through the Drupal admin interface. If you’ve come this far it’s pretty much redundant; just use Drush.

  3. Migration Groups: Organize related migrations together and execute them as a unit.

  4. Detailed Messaging: Improved error handling and messaging to troubleshoot migration issues.

Migrate Plus

Migrate Plus further enhances the migration system with:

  1. Additional Source Plugins: Support for XML, JSON and SOAP sources via HTTP.

  2. Migration Groups: Configure collections of migrations via YAML files.

  3. Process Plugins: Additional transformation tools including:

    • Entity lookup
    • Entity generate
    • Conditionally skip items
    • URL handling
    • Transliteration
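
A migration group is itself a small piece of configuration. The sketch below assumes a group called site_content that lives in a custom module; the label, description and shared constant are invented for illustration.

```yaml
# config/install/migrate_plus.migration_group.site_content.yml
id: site_content
label: 'Site content'
description: 'Content imported from the old site'
source_type: 'CSV files'
# Merged into every migration that sets migration_group: site_content,
# so common settings only need to be written once.
shared_configuration:
  source:
    constants:
      default_format: basic_html
```

Once migrations are assigned to the group, Migrate Tools can run them together with drush migrate:import --group=site_content.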

Migrations: Structure

Migrations all have a source key, a process key, and a destination key that correspond to the terms Extract, Transform and Load in “ETL”. They may also specify their dependencies in the migration_dependencies key.

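A complete migration definition:

id: my_articles
label: 'Article Content'
# migration_group is a Migrate Plus key
migration_group: my_migration_group
source:
  # The csv plugin comes from the contrib Migrate Source CSV module
  plugin: csv
  path: 'public://migrations/articles.csv'
  # header_row_count and keys are the 8.x-2.x option names;
  # the 3.x branch uses header_offset and ids instead
  header_row_count: 1
  keys:
    - id
process:
  # Copy CSV columns straight across
  title: title
  body/value: body
  # Give every body the same text format
  body/format:
    plugin: default_value
    default_value: 'full_html'
  # Find existing terms in the 'tags' vocabulary whose name matches the source value
  field_tags:
    plugin: entity_lookup
    source: tag_ids
    entity_type: taxonomy_term
    bundle: tags
    value_key: name
destination:
  plugin: entity:node
  default_bundle: article
# Import the tags first so the lookup has something to find
migration_dependencies:
  required:
    - my_tags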
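
The my_tags migration named under migration_dependencies is not shown in this post. A minimal version might look like the following, assuming a hypothetical tags.csv with id and name columns and the same Migrate Source CSV option names as above.

```yaml
id: my_tags
label: 'Taxonomy - Tags'
migration_group: my_migration_group
source:
  plugin: csv
  # Hypothetical file with the columns: id,name
  path: 'public://migrations/tags.csv'
  header_row_count: 1
  keys:
    - id
process:
  # The term name is the only value a basic tag needs.
  name: name
destination:
  plugin: entity:taxonomy_term
  default_bundle: tags
```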

Debugging Migrations is Really Hard

That is, until you know what you’re doing. There is an informative page on the topic that I didn’t find until I’d already wasted many hours. You literally just have to google ‘debugging drupal migrations’ and it pops right up. F.

https://www.drupal.org/docs/drupal-apis/migrate-api/debugging-migrations

This is key:

To investigate data within a migration, you can print data by combining the callback plugin and var_dump():

process:
  dump_sourcevar:
    plugin: callback
    callable: var_dump
    source: sourcevar 
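
If dumping to the terminal gets too noisy, core’s log process plugin is a possible alternative: it records the value as a migrate message and passes it on unchanged. The sketch below assumes a source column called title and uses trim purely as a stand-in for whatever comes next in the pipeline.

```yaml
process:
  title:
    # Record the incoming value as a migrate message, then pass it on unchanged.
    - plugin: log
      source: title
    # The next plugin in the pipeline receives the same value.
    - plugin: callback
      callable: trim
```

The logged values can then be read back with drush migrate:messages my_migration.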

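When a migration fails partway through, its status can be left stuck at ‘Importing’, and you’ll need to reset it to idle before it can be run again or rolled back: drush migrate:reset-status my_migration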

Best Practices

  1. Version Control: Store migration configurations in code for consistent deployment.

  2. Testing: Test with a subset of data before running full migrations. You can limit the number of rows to process with drush migrate:import my_migration --limit=500

  3. Back up ALL your shit: If you’re using a custom plugin (or migrating to a custom entity type) you may not be able to roll back without some further work. Run ddev snapshot -n <snapshot_name> before you try a migration for the first time.
