git submodules

2020-09-04

I was having a discussion with some friends recently when they mentioned that they were using Git’s submodules to manage the content for their blogs. My site has historically had a relatively simple architecture, so I was intrigued. Digging in, I found that submodules solve some problems, create others, and can be a great option depending on the goals of the project.

In this post I’ll cover:

What Are Submodules
When To Use Submodules
Using Submodules
- Setting Up Submodules
  - Change The Submodule’s Tracked Branch
- Consuming Submodules
Wrapping Up
Further Reading
Footnotes

What Are Submodules

Before getting too much further, it’s worth discussing what submodules are and how they’re useful. Git promotes submodules as a solution to the situation where a project has a dependency on another (sub) project for which changes should be tracked separately, but which need to be usable jointly:

Here’s an example. Suppose you’re developing a website and creating Atom feeds. Instead of writing your own Atom-generating code, you decide to use a library. You’re likely to have to either include this code from a shared library like a CPAN install or Ruby gem, or copy the source code into your own project tree. The issue with including the library is that it’s difficult to customize the library in any way and often more difficult to deploy it, because you need to make sure every client has that library available. The issue with copying the code into your own project is that any custom changes you make are difficult to merge when upstream changes become available.

Git addresses this issue using submodules. Submodules allow you to keep a Git repository as a subdirectory of another Git repository. This lets you clone another repository into your project and keep your commits separate.

So, there is nothing special about a submodule until the relationship to the other project is established. Before it is declared as a submodule, it is simply a project that uses Git for version control. It’s the act of defining the module (i.e., project) as a dependency that makes it a submodule.

When To Use Submodules

I’ll be getting into the how in a bit, but before we do, let’s discuss the why. Given that there’s nothing special about a submodule beyond the relationship it creates between two projects, and given the tooling available for exactly that purpose these days, why would we use submodules?

If the projects are in the same language, you could use a package manager: for example, npm for Node projects, NuGet for .NET, and pip for Python. Maybe the project needs to be private. Well, most package registries offer private repositories these days, though this is often a paid offering. (The fact that GitHub makes private repositories free means that if they can be leveraged, as they can via submodules, they can offer a more economical alternative.)

So, what’s a real life use case for submodules in 2020?

As I mentioned in the introduction, my friends are using submodules for their blogs, with Git tracking both the blog itself (the markup, configuration, etc.) and the content. They’ve found submodules a useful solution to a specific problem: they want to share the source code of their websites publicly (just as I do) without exposing their content to wholesale copying. Imagine someone forking a website because they like the design and suddenly having every blog post ever written on that blog.

Submodules make the right thing easy by allowing the separation of the content from the markup - promoting openness and inspiration without facilitating copying of ideas.

As Joshua Wehner wrote for the GitHub Blog:

Before you add a repository as a submodule, first check to see if you have a better alternative available. Git submodules work well enough for simple cases, but these days there are often better tools available for managing dependencies than what Git submodules can offer. Modern languages like Go have friendly, Git-aware dependency management systems built-in from the start. Others, like Ruby’s rubygems, Node.js’ npm, or Cocoa’s CocoaPods and Carthage, have been added by the programming community. Even front-end developers have tools like Bower to manage libraries and frameworks for client-side JavaScript and CSS.

That was in 2016. A lot has changed in the four years since, and there are fewer and fewer reasons to take on the complexity of submodules. The use cases for submodules seem to be continually narrowing. Still, I find the separation of content from a website compelling (it’s one reason why CMSs are so popular and valuable as tools). I’m just struggling to come up with many others.

Using Submodules

While there are fewer reasons to use submodules today than ever, they can still be a valuable tool and understanding how to use them is the first step to taking advantage.

When thinking about submodule use - I find it useful to think about two different steps.

Initial setup step. This is a set of one-time tasks related to tracking a submodule within a project.
Continued consumption of the submodule within a project - pulling in updates to the submodule, collaborating with peers, etc.

Setting Up Submodules

The first step to using a submodule within a project is adding it. To add a submodule we will need two things:

Access to the project that will be tracking the submodule, e.g., superproject
Access to the remote url for the submodule itself, e.g., git@github.com:stephencweiss/my-submodule.git

Once access is acquired we are ready to add the submodule. From the root of your project run git submodule add. For example:

git submodule add git@github.com:stephencweiss/my-submodule.git

Cloning into '/Users/stephen/superproject/my-submodule'...
remote: Enumerating objects: 1240, done.
remote: Counting objects: 100% (1240/1240), done.
remote: Compressing objects: 100% (950/950), done.
remote: Total 3972 (delta 416), reused 598 (delta 284), pack-reused 2732
Receiving objects: 100% (3972/3972), 77.71 MiB | 7.76 MiB/s, done.
Resolving deltas: 100% (1423/1423), done.

The log shows the cloning of the master branch (the default) of my-submodule into the root of superproject.¹

The presence of the submodule, however, is only the first half of the process. We need to commit these changes to git in order to track the submodule. This part confused me initially because what git shows as a diff doesn’t look like most diffs. There aren’t lots of files to track. Instead, there are just two entries: a .gitmodules file and a single my-submodule entry:

git status

On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   .gitmodules
        new file:   my-submodule

The .gitmodules file is used by git to track its modules:²

[submodule "my-submodule"]
    path = my-submodule
    url = git@github.com:stephencweiss/my-submodule.git

The new my-submodule file is much more interesting (to me):

my-submodule.diff

diff --git a/my-submodule b/my-submodule
new file mode 160000
index 0000000..b2a289a
--- /dev/null
+++ b/my-submodule
@@ -0,0 +1 @@
+Subproject commit b2a289ae352630a8ab01d2dfe9c42f981fd33908

All git sees is a commit hash for the subproject, recorded as a single entry with the special mode 160000. The files appear in the project, but the main project does not track them individually. The submodule’s own Git data is stored in a directory under .git/modules within the main project, one per submodule.
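To see this for yourself, here’s a minimal sketch that builds a throwaway superproject and inspects the entry. The remote and superproject directories are stand-ins I’ve made up so the example runs without network access. The -c protocol.file.allow=always option is only there because recent versions of Git refuse to clone submodules from local paths by default.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Hypothetical stand-in for the remote my-submodule repository.
git init -q "$work/remote"
git -C "$work/remote" commit -q --allow-empty -m "first commit"

# The superproject that will track it.
git init -q "$work/superproject"
cd "$work/superproject"
git -c protocol.file.allow=always submodule --quiet add "$work/remote" my-submodule
git commit -q -m "Track my-submodule"

# The index holds a single entry for the submodule: mode 160000 plus a commit hash.
git ls-files --stage my-submodule

# That hash is the commit the submodule currently has checked out.
recorded="$(git rev-parse HEAD:my-submodule)"
checked_out="$(git -C my-submodule rev-parse HEAD)"
echo "recorded:    $recorded"
echo "checked out: $checked_out"

# The submodule's own Git directory lives inside the superproject's .git.
ls -d .git/modules/my-submodule
```

The two hashes printed should match, which is really all the superproject knows about the submodule.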

Once these files are committed, the project can be considered to be tracking the submodule. Any changes made to the (submodule) project can be pulled into this main project for consumption.³

Before moving on to talk about how to consume these submodules, a few quick points:

Change The Submodule’s Tracked Branch

Let’s say that while tracking the stable branch was good - we now want to track super-stable (or perhaps more likely, we move from the default to something else, like stable).

You have options at this point. You can change it for everyone or just for yourself with a git config command:

git config -f .gitmodules submodule.a/new/directory.branch super-stable

In most cases it makes sense to track it for everyone, so that’s what I’ve shown, though you could update just yours by dropping the -f .gitmodules.⁴

If we look at the .gitmodules now, we’ll see the change has been made:

.gitmodules

 [submodule "a/new/directory"]
     path = a/new/directory
     url = git@github.com:stephencweiss/my-submodule.git
+    branch = super-stable
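If you’d rather confirm the change from the command line than by opening the file, git config can read the value back. This is a small sketch against a scratch .gitmodules that I create in a temporary directory (no real repository is needed, since git config -f only touches the named file). The entry name a/new/directory matches the example above.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# A scratch .gitmodules with an entry named a/new/directory.
cat > .gitmodules <<'EOF'
[submodule "a/new/directory"]
    path = a/new/directory
    url = git@github.com:stephencweiss/my-submodule.git
EOF

# Record the branch in the shared file that gets committed.
git config -f .gitmodules submodule.a/new/directory.branch super-stable

# Read the single key back.
tracked="$(git config -f .gitmodules --get submodule.a/new/directory.branch)"
echo "$tracked"
```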

Consuming Submodules

Now that we know how to set up submodules, let’s discuss how to consume them in order to use them effectively. But first, a word of caution…

Treat Submodules As Dependencies

As noted above, there’s nothing special about submodules. Their defining feature is the relationship of treating one project tracked by Git as a dependency of another. This feature should be front of mind whenever interacting with a submodule in a project.

Just because the files for the submodule are present in the project does not mean that they should be edited locally. Consider a submodule just like any other dependency (e.g., a package installed via npm, NuGet, or pip). One of the benefits of Git’s submodules is that they’re platform agnostic; however, using one still creates a dependency.

If changes are made locally, it’s akin to a monkey-patch: a change in the source code that will be erased with any update to the package.

The simplest rule of thumb is to avoid this and instead make your changes directly within the submodule project itself.

Cloning Projects With Submodules

At this point, you or a teammate have added a submodule to the project and you’re ready to clone the project to a new machine.

To ensure that the submodule and its contents are cloned alongside the main project there are multiple approaches: an easy way and a hard way.⁵

The easy way is to focus on the git clone command. The standard git clone covers most use cases by default:

git clone <repository>

In this case, however, we have to ensure that the submodules come with the clone and while the default will bring down the directories, it will not fetch the contents for the submodules.

The solution, as of v2.14, is to add the --recurse-submodules option to the git clone command:

git clone --recurse-submodules <repository>
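The difference between the two clones is easy to see in a throwaway setup. In this sketch, the remote and superproject directories are local stand-ins I’ve invented so that nothing touches the network, and -c protocol.file.allow=always is only needed because the "remotes" are local paths.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Hypothetical submodule remote containing one file.
git init -q "$work/remote"
echo "hello" > "$work/remote/post.md"
git -C "$work/remote" add post.md
git -C "$work/remote" commit -q -m "Add a post"

# Hypothetical superproject that tracks it.
git init -q "$work/superproject"
git -C "$work/superproject" -c protocol.file.allow=always \
    submodule --quiet add "$work/remote" my-submodule
git -C "$work/superproject" commit -q -m "Track my-submodule"

# A plain clone creates the my-submodule directory but leaves it empty.
git clone -q "$work/superproject" "$work/plain"
ls -A "$work/plain/my-submodule"

# With the flag, the submodule's files are checked out as well.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$work/superproject" "$work/full"
ls -A "$work/full/my-submodule"
```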

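If remembering this is too much, with v2.15, Git added a configuration setting to always recurse into submodules (i.e., fetch the contents as well as the directory). One caveat: per the git-config documentation, submodule.recurse applies to commands such as checkout, fetch, and pull, but not to clone itself, so the --recurse-submodules flag is still needed when cloning.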

To turn this on, modify the .gitconfig:

git config --global submodule.recurse true
git config --list

$HOME/.gitconfig

[user]
  name = Stephen
#...
[submodule]
        recurse = true
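If you’d rather not change your global settings, the same key can be set for a single repository instead. Here’s a small sketch using a scratch repository (the superproject directory is made up for the example). It also queries the one key directly rather than scanning the full git config --list output.

```shell
set -e
work="$(mktemp -d)"
git init -q "$work/superproject"
cd "$work/superproject"

# Repository-local setting: written to .git/config, not to $HOME/.gitconfig.
git config --local submodule.recurse true

# Ask for the single key instead of reading the whole list.
value="$(git config --local --get submodule.recurse)"
echo "submodule.recurse = $value"
```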

Updating Submodules

After the initial clone of the submodules, keeping them in sync means that every time a new commit has been made to the submodule, it needs to be pulled (just like any other dependency that’s managed elsewhere).

To pull in the HEAD of the tracked branch, use the following command:

git submodule update --remote
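One thing that is easy to miss: after the update, the superproject sees the new commit as a change to my-submodule, and that change still has to be committed. The sketch below walks through the whole cycle with local stand-in repositories (remote and superproject are names I’ve made up, and -c protocol.file.allow=always is only needed because they are local paths).

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Hypothetical submodule remote and a superproject tracking it.
git init -q "$work/remote"
git -C "$work/remote" commit -q --allow-empty -m "first commit"
git init -q "$work/superproject"
cd "$work/superproject"
git -c protocol.file.allow=always submodule --quiet add "$work/remote" my-submodule
git commit -q -m "Track my-submodule"

# Upstream moves on.
git -C "$work/remote" commit -q --allow-empty -m "second commit"

# Fetch and check out the new tip of the tracked branch.
git -c protocol.file.allow=always submodule --quiet update --remote

# The superproject now reports my-submodule as modified ...
git status --short

# ... and the new pointer is committed like any other change.
git add my-submodule
git commit -q -m "Bump my-submodule"

upstream="$(git -C "$work/remote" rev-parse HEAD)"
recorded="$(git rev-parse HEAD:my-submodule)"
```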

Updating Cloned Submodules That Were Not Initialized

If you cloned a project without the --recurse-submodules flag, then the project will have the link, but the data won’t be present.

In this case, you have two choices:

Delete the project and reclone it with --recurse-submodules, or
Initialize the submodule with an update call:
```
git submodule update --recursive --init
```
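A quick way to tell whether a submodule still needs this is git submodule status, which prefixes uninitialized submodules with a "-". The following sketch shows the before and after on a plain clone. As elsewhere, the repositories are local stand-ins I’ve invented for the example, which is why -c protocol.file.allow=always appears.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Hypothetical submodule remote and superproject.
git init -q "$work/remote"
echo "hello" > "$work/remote/post.md"
git -C "$work/remote" add post.md
git -C "$work/remote" commit -q -m "Add a post"
git init -q "$work/superproject"
git -C "$work/superproject" -c protocol.file.allow=always \
    submodule --quiet add "$work/remote" my-submodule
git -C "$work/superproject" commit -q -m "Track my-submodule"

# Clone without --recurse-submodules.
git clone -q "$work/superproject" "$work/clone"
cd "$work/clone"

# A leading "-" marks a submodule that has not been initialized.
before="$(git submodule status)"
echo "$before"

git -c protocol.file.allow=always submodule --quiet update --init --recursive

# The "-" is gone and the files are present.
after="$(git submodule status)"
echo "$after"
ls my-submodule
```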

Removing Submodules

Sometimes you may want to completely remove the submodule, just like you might remove any other dependency. However, because the submodule is tracked by git, there are a few extra steps you need to take.

This Stack Overflow thread discusses the topic, and the community-approved approach is:

Delete the relevant section from the .gitmodules file.
Stage the .gitmodules changes: git add .gitmodules
Delete the relevant section from .git/config.
Remove the submodule entry from the index: git rm --cached path_to_submodule (no trailing slash).
Remove the submodule’s .git directory: rm -rf .git/modules/path_to_submodule
Commit the changes: git commit -m "Removed submodule"
Delete the now untracked submodule files: rm -rf path_to_submodule
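Put together, the steps above can be sketched as a script. This is my own rendering rather than anything copied from the thread: I use git config --remove-section in place of hand-editing the two config files, and the repositories are throwaway local stand-ins (hence -c protocol.file.allow=always).

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Hypothetical setup: a superproject that tracks my-submodule.
git init -q "$work/remote"
git -C "$work/remote" commit -q --allow-empty -m "first commit"
git init -q "$work/superproject"
cd "$work/superproject"
git -c protocol.file.allow=always submodule --quiet add "$work/remote" my-submodule
git commit -q -m "Track my-submodule"

# Delete the section from .gitmodules and stage the change.
git config -f .gitmodules --remove-section submodule.my-submodule
git add .gitmodules

# Delete the matching section from .git/config.
git config --remove-section submodule.my-submodule

# Remove the submodule entry from the index (no trailing slash).
git rm -q --cached my-submodule

# Remove the submodule's Git directory.
rm -rf .git/modules/my-submodule

# Commit the changes.
git commit -q -m "Removed submodule"

# Delete the now untracked submodule files.
rm -rf my-submodule
```

Note that .gitmodules is left behind as an empty file here. If my-submodule was the only submodule, you may prefer to delete the file outright.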

Wrapping Up

Git’s submodules have been one of the more challenging Git concepts for me to wrap my head around. After many stumbles, I’m now at a place where I feel comfortable that if I found a use case for submodules, for example separating the content of my site from the site itself, I would be able to use them! More than that, however, the time spent experimenting with them was enjoyable and educational - which means to me that it was well spent!

Footnotes

¹ The path is slightly more nuanced than this. Per the manual:

submodules-manual

The optional argument <path> is the relative location for the cloned submodule to exist in the superproject. If <path> is not given, the canonical part of the source repository is used ("repo" for "/path/to/repo.git" and "foo" for "host.xz:foo/.git"). If <path> exists and is already a valid Git repository, then it is staged for commit without cloning. The <path> is also used as the submodule's logical name in its configuration entries unless --name is used to specify a logical name.

To expand on this, we can track alternative branches through the use of the -b (branch) flag. We can also specify a non-root destination through an optional second argument (the <path> described above):

git submodule add -b stable git@github.com:stephencweiss/my-submodule.git a/new/directory

In this example, git will track the stable branch of my-submodule in superproject/a/new/directory.
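Here’s a runnable sketch of the same idea, using a/new/directory as the destination. The remote is a local stand-in repository with a stable branch that I create just for the example, so -c protocol.file.allow=always is needed to let Git clone from a local path.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Hypothetical submodule remote with a stable branch.
git init -q "$work/remote"
git -C "$work/remote" commit -q --allow-empty -m "first commit"
git -C "$work/remote" checkout -q -b stable
git -C "$work/remote" commit -q --allow-empty -m "stable work"

# Add it to the superproject, tracking stable, under a nested path.
git init -q "$work/superproject"
cd "$work/superproject"
git -c protocol.file.allow=always submodule --quiet add -b stable \
    "$work/remote" a/new/directory

# The branch is recorded in .gitmodules ...
branch="$(git config -f .gitmodules --get submodule.a/new/directory.branch)"
echo "tracked branch: $branch"

# ... and is the one checked out inside the submodule.
current="$(git -C a/new/directory rev-parse --abbrev-ref HEAD)"
echo "checked out:    $current"
```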

² If we were tracking the stable branch in a/new/directory as described in Footnote 1, .gitmodules would have an additional attribute: