Git Submodules

Trip hazards and how to avoid them



On the face of it, Git submodules seem like a great idea. They allow you to include a Git repository inside another Git repository. This is good for maintaining shared code across multiple applications, or for including third-party libraries in your project. However, there are some trip hazards to be aware of when using them, and the source of most of the problems is this:

Submodules point to specific commits rather than branches.

That pinning is also the primary advantage of Git submodules: precise version control. Unlike package managers that might automatically update dependencies, submodules point to specific commits in the external repository. This ensures that your web application, AI slop generator or robot litter tray firmware maintains consistent behaviour regardless of upstream changes until you explicitly choose to update.

Using Git Submodules

To add a Git repository to your project as a submodule, recite the following incantations:

  • From the root of your project, run git submodule add https://github.com/littertron/disco-mode.git src/libraries/led/disco-mode. This clones the repository into the src/libraries/led/disco-mode directory, records the submodule in the .gitmodules file in the project root and registers it in .git/config. Both .gitmodules and the new submodule entry are staged for commit.
  • Run git commit -m "Added Disco Mode submodule" to record the submodule in the outer project, then get on with coding some slick pixel LED effects for the Littertron 9000-F.
  • There is no need to run git submodule init or git submodule update after git submodule add. Those commands are for anyone who clones the outer project without --recurse-submodules: git submodule init copies the settings from .gitmodules into .git/config, and git submodule update downloads each submodule and checks out the commit recorded by the outer project.
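As a rough sketch of what the add step leaves behind, the script below runs it against a throwaway local repository instead of the real remote. The sandbox paths, the effects.txt file and the commit messages are hypothetical, and the protocol.file.allow setting is only there because recent Git releases refuse to clone submodules from local paths by default.

```shell
set -eu

# Throwaway sandbox: every path and file name here is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git releases refuse local-path submodule clones unless this is allowed.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# Local stand-in for https://github.com/littertron/disco-mode.git
git init -q -b main "$sandbox/disco-mode"
echo "chase" > "$sandbox/disco-mode/effects.txt"
git -C "$sandbox/disco-mode" add effects.txt
git -C "$sandbox/disco-mode" commit -q -m "Add chase effect"

# The outer project
git init -q -b main "$sandbox/littertron"
cd "$sandbox/littertron"
git commit -q --allow-empty -m "Initial commit"

# Add the library as a submodule
git submodule --quiet add "$sandbox/disco-mode" src/libraries/led/disco-mode

cat .gitmodules                    # path and url of the new submodule
git status --short                 # what has been staged for commit
ls src/libraries/led/disco-mode    # the library's files in the working tree

git commit -q -m "Added Disco Mode submodule"
```

In this run the library's files are present straight after git submodule add, and the commit records the submodule as a single entry in the outer project's tree.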

You’ve been working on Disco Mode for the Littertron 9000-F for a while, and it’s time to push your changes.

  • From within src/libraries/led/disco-mode, run git add -A to stage your changes, and git commit -m "Added new chase effects" to commit them. Then run git push to push your changes to the remote repository.
  • Back in the outer project, run git add src/libraries/led/disco-mode to stage the submodule update, and git commit -m "Updated Disco Mode submodule" to commit it. Finally, run git push.

The outer project’s repository does not store the contents of the submodule. Only the commit hash of the submodule is tracked and committed to the outer project.
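The two-step push above, and the point about only the hash being stored, can be sketched in a sandbox. This is an illustration under assumptions: a local bare repository stands in for the Disco Mode remote, the outer project has no remote of its own (so its final git push is left out), and all paths and file names are hypothetical.

```shell
set -eu

# Throwaway sandbox: every path and file name here is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git releases refuse local-path submodule clones unless this is allowed.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# Bare repository standing in for the Disco Mode remote
git init -q -b main "$sandbox/seed"
echo "chase" > "$sandbox/seed/effects.txt"
git -C "$sandbox/seed" add effects.txt
git -C "$sandbox/seed" commit -q -m "Add chase effect"
git clone -q --bare "$sandbox/seed" "$sandbox/disco-mode.git"

# Outer project with the submodule already added and committed
git init -q -b main "$sandbox/littertron"
cd "$sandbox/littertron"
git commit -q --allow-empty -m "Initial commit"
git submodule --quiet add "$sandbox/disco-mode.git" src/libraries/led/disco-mode
git commit -q -m "Added Disco Mode submodule"

# 1. Commit and push inside the submodule
cd src/libraries/led/disco-mode
echo "strobe" >> effects.txt
git add -A
git commit -q -m "Added new chase effects"
git push -q origin main

# 2. Record the new submodule commit in the outer project
cd "$sandbox/littertron"
git add src/libraries/led/disco-mode
git commit -q -m "Updated Disco Mode submodule"

# The outer tree holds a single "commit" entry for the path, not the files
git ls-tree HEAD src/libraries/led/disco-mode
git -C src/libraries/led/disco-mode rev-parse HEAD
```

The hash printed by git ls-tree should match the submodule's own HEAD, which is all the outer project knows about the library.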

Potential Problems

Submodules are not automatically updated when you pull changes to the outer project

Your colleague, who is working on the Littertron 9000-F’s sensor logic, sees your changes. They run git pull and get the impression they’re up to date but they are not. Their copy of your lighting control submodule has not been updated. The result? The light sensor is improperly calibrated, and Disco Mode causes an integer overflow in the sensor reading. The 9000-F forcefully ejects its contents as a safety measure.

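Team members must use git pull --recurse-submodules in the outer repository to update everything (or git pull followed by git submodule update --init --recursive). Running git config submodule.recurse true in a clone makes pull, checkout and the other commands that support --recurse-submodules use it by default.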
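To see the stale submodule for yourself, here is a sandbox sketch of the scenario. It assumes two local bare repositories in place of the real remotes, a default Git configuration (no submodule.recurse set), and hypothetical directory names (you, colleague) and file contents.

```shell
set -eu

# Throwaway sandbox: every path and file name here is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git releases refuse local-path submodule clones unless this is allowed.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# Bare repositories standing in for the two remotes
git init -q -b main "$sandbox/seed"
echo "v1" > "$sandbox/seed/effects.txt"
git -C "$sandbox/seed" add effects.txt
git -C "$sandbox/seed" commit -q -m "Effects v1"
git clone -q --bare "$sandbox/seed" "$sandbox/disco-mode.git"

git init -q -b main "$sandbox/you"
cd "$sandbox/you"
git commit -q --allow-empty -m "Initial commit"
git submodule --quiet add "$sandbox/disco-mode.git" src/libraries/led/disco-mode
git commit -q -m "Added Disco Mode submodule"
git clone -q --bare "$sandbox/you" "$sandbox/littertron.git"
git remote add origin "$sandbox/littertron.git"

# Your colleague clones the project, submodules included
git clone -q --recurse-submodules "$sandbox/littertron.git" "$sandbox/colleague"

# You push new effects and the updated submodule pointer
cd "$sandbox/you/src/libraries/led/disco-mode"
echo "v2" > effects.txt
git commit -q -am "Effects v2"
git push -q origin main
cd "$sandbox/you"
git add src/libraries/led/disco-mode
git commit -q -m "Updated Disco Mode submodule"
git push -q origin main

# A plain pull moves the outer project only
cd "$sandbox/colleague"
git pull -q --ff-only
after_pull="$(cat src/libraries/led/disco-mode/effects.txt)"

# This checks out the commit the outer project now records
git submodule --quiet update --init --recursive
after_update="$(cat src/libraries/led/disco-mode/effects.txt)"

echo "after pull: $after_pull, after update: $after_update"
```

Between the two commands, git status in the colleague's clone reports the submodule as modified, which is the only hint that something is out of date.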

Submodules track commits, not branches

You’re tasked with designing some scaled-back lighting effects for an OTA update to the Littertron 9000 Mini. You switch to the appropriate branch in the outer project, run git pull --recurse-submodules and start working. But when you flash the firmware to your prototype, you find that Disco Mode starts a literal disco inferno. It turns out the battery management submodule should have been updated to the latest commit on its main branch to support your sick RGB chases.

To add a submodule that tracks a branch, use git submodule add -b <branch> <repository> <path>

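Running git submodule update --remote in the outer project will then move each submodule to the latest commit on its configured branch (a submodule with no branch configured follows the remote’s default branch). Adding --merge merges those commits into your local submodule instead of checking them out on a detached HEAD. Either way, the outer project still records one specific commit, so stage and commit the updated submodule to share the new version with your team.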
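A minimal sketch of branch tracking, assuming a local bare repository in place of the real remote and hypothetical paths and file contents. It shows where the branch name is stored and what git submodule update --remote changes.

```shell
set -eu

# Throwaway sandbox: every path and file name here is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git releases refuse local-path submodule clones unless this is allowed.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# Bare repository standing in for the library remote
git init -q -b main "$sandbox/seed"
echo "v1" > "$sandbox/seed/effects.txt"
git -C "$sandbox/seed" add effects.txt
git -C "$sandbox/seed" commit -q -m "Effects v1"
git clone -q --bare "$sandbox/seed" "$sandbox/disco-mode.git"

git init -q -b main "$sandbox/littertron"
cd "$sandbox/littertron"
git commit -q --allow-empty -m "Initial commit"

# -b records the branch to follow in .gitmodules
git submodule --quiet add -b main "$sandbox/disco-mode.git" src/libraries/led/disco-mode
git commit -q -m "Added Disco Mode submodule tracking main"
git config -f .gitmodules --get submodule.src/libraries/led/disco-mode.branch

# Upstream gains a commit on main
echo "v2" > "$sandbox/seed/effects.txt"
git -C "$sandbox/seed" commit -q -am "Effects v2"
git -C "$sandbox/seed" push -q "$sandbox/disco-mode.git" main

# Move the submodule to the tip of its tracked branch
git submodule --quiet update --remote
git status --short    # the submodule now differs from the recorded commit

# The outer project still pins a commit, so record the new one
git add src/libraries/led/disco-mode
git commit -q -m "Updated Disco Mode submodule"
```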

Submodules can be a pain to debug

LitterCorp use a CI/CD pipeline to deploy the firmware for the Littertron 9000-F. The pipeline runs the tests, builds the firmware, and packages it for deployment. But the pipeline fails with a message about ‘invalid submodules’. You spend hours trying to figure out what’s wrong, only to discover that one submodule was cloned into place before being added as a submodule. Running git submodule add didn’t produce any errors, and the code is all up to date… But it turns out the code is actually being tracked by the outer project, not the submodule; the submodule is indeed invalid.

Always use git submodule add to add submodules to your project. If you’ve cloned a repo into place by mistake, remove it from the project and then re-add it as a submodule.
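The pipeline's exact failure isn't shown here, so the following is only one plausible way to end up with an invalid submodule: the repository is cloned into place and then staged with a plain git add, which records a submodule-style entry with no matching .gitmodules mapping. The sketch reproduces that state in a sandbox (hypothetical paths and file names) and then repairs it as described above.

```shell
set -eu

# Throwaway sandbox: every path and file name here is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent Git releases refuse local-path submodule clones unless this is allowed.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# Local stand-in for the library remote
git init -q -b main "$sandbox/disco-mode"
echo "chase" > "$sandbox/disco-mode/effects.txt"
git -C "$sandbox/disco-mode" add effects.txt
git -C "$sandbox/disco-mode" commit -q -m "Add chase effect"

git init -q -b main "$sandbox/littertron"
cd "$sandbox/littertron"
git commit -q --allow-empty -m "Initial commit"

# The mistake: clone into place, then stage it like any other directory
git clone -q "$sandbox/disco-mode" src/libraries/led/disco-mode
git add src/libraries/led/disco-mode 2>/dev/null   # Git warns about an embedded repository
git commit -q -m "Added Disco Mode (the wrong way)"

# Symptom: a 160000 entry in the index, but no mapping in .gitmodules
git ls-files --stage src/libraries/led/disco-mode
broken=no
git submodule status >/dev/null 2>&1 || broken=yes
echo "submodule mapping missing: $broken"

# Repair: drop the entry, remove the stray clone, re-add it properly
git rm -q --cached src/libraries/led/disco-mode
rm -rf src/libraries/led/disco-mode
git submodule --quiet add "$sandbox/disco-mode" src/libraries/led/disco-mode
git commit -q -m "Re-added Disco Mode as a submodule"
git submodule status
```

Before deleting a stray clone in a real project, check that it holds no unpushed work.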

What about Git Subtrees?

Git subtrees merge the contents of an external repository directly into a subdirectory of your main repository. Unlike submodules, the external code becomes part of your repository’s history. To quote the highest-voted Stack Overflow answer on the subject:

submodule is link;

subtree is copy

Changes to subtree code are tracked within your main repository, not as references to another repository. This can make it easier to work with them, and it is still possible to push changes back to the source repository.

  • git subtree add --prefix=<path> <repository> <ref> Adds a subtree
  • git subtree pull --prefix=<path> <repository> <ref> Updates a subtree
  • git subtree push --prefix=<path> <repository> <branch> Pushes changes back to the original repository

<ref> is normally a branch or tag in the source repository. A bare commit hash may also work, but only if the remote server allows fetching by hash.

So when would you use a subtree over a submodule? If you want to include a third-party library in your project and you intend to heavily customise it to that specific project, a subtree is a good choice. If you want to include a re-usable module that you might want to update independently of the current project, a submodule is the way to go.
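For comparison with the submodule workflow, here is a small sandbox sketch of the subtree add and pull commands listed above (push is left out). It assumes the git subtree command is installed, which some Git packages ship separately, and uses hypothetical paths and file contents. GIT_MERGE_AUTOEDIT=no is set only so the merge made by the pull does not open an editor.

```shell
set -eu

# Throwaway sandbox: every path and file name here is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # accept default merge messages without an editor

# Local stand-in for the library remote
git init -q -b main "$sandbox/disco-mode"
echo "v1" > "$sandbox/disco-mode/effects.txt"
git -C "$sandbox/disco-mode" add effects.txt
git -C "$sandbox/disco-mode" commit -q -m "Effects v1"

git init -q -b main "$sandbox/littertron"
cd "$sandbox/littertron"
git commit -q --allow-empty -m "Initial commit"

# Copy the library, history included, into a subdirectory
git subtree add --prefix=src/libraries/led/disco-mode "$sandbox/disco-mode" main

# Upstream gains a commit
echo "v2" > "$sandbox/disco-mode/effects.txt"
git -C "$sandbox/disco-mode" commit -q -am "Effects v2"

# Merge the upstream change into the subdirectory
git subtree pull --prefix=src/libraries/led/disco-mode "$sandbox/disco-mode" main

# The files are ordinary tracked files in the outer repository
git ls-files --stage src/libraries/led/disco-mode
```

Unlike the submodule case, there is no .gitmodules file and no separate clone: the library's files sit in the outer repository's index like any others.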

TLDR

Adding submodules:

  • To add submodules to a project always use git submodule add rather than simply cloning the repo.
  • Submodules generally point to specific commits in the external repository, not branches.
  • To add a submodule that tracks a specific branch, use git submodule add -b <branch> <repository> <path>

Updating submodules:

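  • To pull your codebase and any submodules at the same time, use git pull --recurse-submodules. Each submodule is checked out at the commit recorded by the outer project, on a detached HEAD. Uncommitted edits are not silently thrown away, but local submodule commits that are not on a branch get left behind, so commit and push submodule work first.
  • To move submodules to the latest commit on their tracked branch, use git submodule update --remote. This also checks out a detached HEAD.
  • To do the same while keeping local commits, use git submodule update --remote --merge. This command:
    • Fetches the latest commits from the submodule’s remote.
    • Merges those changes into your local submodule.
    • Leaves the new submodule commit as an unstaged change in your parent repo. Run git add and git commit there to update the pointer.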
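A sandbox sketch of the --remote --merge flow, assuming a local bare repository in place of the real remote and hypothetical paths and file names. The submodule carries one local commit, upstream gains another, and the last step shows the pointer change being staged and committed by hand in the parent repo.

```shell
set -eu

# Throwaway sandbox: every path and file name here is hypothetical.
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # accept the default merge message without an editor
# Recent Git releases refuse local-path submodule clones unless this is allowed.
export GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0=protocol.file.allow GIT_CONFIG_VALUE_0=always

# Bare repository standing in for the library remote
git init -q -b main "$sandbox/seed"
echo "chase" > "$sandbox/seed/effects.txt"
git -C "$sandbox/seed" add effects.txt
git -C "$sandbox/seed" commit -q -m "Add chase effect"
git clone -q --bare "$sandbox/seed" "$sandbox/disco-mode.git"

git init -q -b main "$sandbox/littertron"
cd "$sandbox/littertron"
git commit -q --allow-empty -m "Initial commit"
git submodule --quiet add -b main "$sandbox/disco-mode.git" src/libraries/led/disco-mode
git commit -q -m "Added Disco Mode submodule"

# A local commit inside the submodule
cd src/libraries/led/disco-mode
echo "local tweak" > local.txt
git add local.txt
git commit -q -m "Local tweak"
cd "$sandbox/littertron"

# Meanwhile upstream gains a commit
echo "upstream fix" > "$sandbox/seed/upstream.txt"
git -C "$sandbox/seed" add upstream.txt
git -C "$sandbox/seed" commit -q -m "Upstream fix"
git -C "$sandbox/seed" push -q "$sandbox/disco-mode.git" main

# Fetch and merge upstream into the submodule, keeping the local commit
git submodule --quiet update --remote --merge
ls src/libraries/led/disco-mode

# The parent sees a changed pointer that is not yet staged
pending="$(git status --short)"
echo "$pending"

git add src/libraries/led/disco-mode
git commit -q -m "Updated Disco Mode submodule"
```

Remember that the merged submodule commit exists only locally until you also push from inside the submodule.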