2020-09-04
I was having a discussion with some friends recently when they mentioned that they were using Git’s submodules to manage the content for their blogs. My site has historically had a relatively simple architecture, so I was intrigued. Digging in, I found that submodules solve some problems, create others, and can be a great alternative depending on the goals of the project.
In this post I’ll cover:
Before getting too much further, it’s worth discussing what submodules are and how they’re useful. Git promotes submodules as a solution to the situation where a project has a dependency on another (sub) project for which changes should be tracked separately, but which need to be usable jointly:
Here’s an example. Suppose you’re developing a website and creating Atom feeds. Instead of writing your own Atom-generating code, you decide to use a library. You’re likely to have to either include this code from a shared library like a CPAN install or Ruby gem, or copy the source code into your own project tree. The issue with including the library is that it’s difficult to customize the library in any way and often more difficult to deploy it, because you need to make sure every client has that library available. The issue with copying the code into your own project is that any custom changes you make are difficult to merge when upstream changes become available.
Git addresses this issue using submodules. Submodules allow you to keep a Git repository as a subdirectory of another Git repository. This lets you clone another repository into your project and keep your commits separate.
So, there is nothing special about submodules until the relationship to the other project is established. That is, before declaring submodules as such, it is a project for which Git is used for version control. It’s the act of defining the module (i.e., project) as a dependency that makes it a submodule.
I’ll be getting into the how in a bit, but before we do, let’s discuss the why for submodules. Given that there’s nothing special about a submodule outside of the relationship created between two projects and the tooling available for exactly that purpose these days - why would we use submodules?
If the projects are the same language, you could use a package manager: for example, npm for Node projects, nuget for .NET, and pip for Python. Maybe the project needs to be private. Well, most package registries offer private repositories these days, though this is often a paid offering. (The fact that GitHub makes private repositories free means that if they can be leveraged, as they can via submodules, they offer a more economical alternative.)
So, what’s a real life use case for submodules in 2020?
As I mentioned in the introduction, my friends are using submodules for their blogs. They use Git to track both the blog itself (the markdown, configuration, etc.) and the content, and they’ve found submodules a useful solution to a specific problem: they want to share the source code of their websites publicly (just as I do) without exposing their content for wholesale copying. Imagine someone forking a website because they like the design and suddenly having every blog post ever written on that blog.
Submodules make the right thing easy by allowing the separation of the content from the markup - promoting openness and inspiration without facilitating copying of ideas.
As Joshua Wehner wrote for the GitHub Blog:
Before you add a repository as a submodule, first check to see if you have a better alternative available. Git submodules work well enough for simple cases, but these days there are often better tools available for managing dependencies than what Git submodules can offer. Modern languages like Go have friendly, Git-aware dependency management systems built-in from the start. Others, like Ruby’s rubygems, Node.js’ npm, or Cocoa’s CocoaPods and Carthage, have been added by the programming community. Even front-end developers have tools like Bower to manage libraries and frameworks for client-side JavaScript and CSS.
That was in 2016. A lot has changed in just four years, and there are fewer and fewer reasons to take on the complexity of submodules. The use cases for submodules seem to be continually narrowing. Still, I find the separation of content from a website compelling (it’s one reason why CMSs are so popular and valuable as tools). I’m just struggling to come up with many others.
While there are fewer reasons to use submodules today than ever, they can still be a valuable tool and understanding how to use them is the first step to taking advantage.
When thinking about submodule use - I find it useful to think about two different steps.
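The first step to using a submodule within a project is adding it. To add a submodule we will need two things:
1. A parent project to add it to (here, superproject)
2. The URL of the repository to add, plus access to it (here, git@github.com:stephencweiss/my-submodule.git)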
Once access is acquired we are ready to add the submodule. From the root of your project run git submodule add. For example:
git submodule add git@github.com:stephencweiss/my-submodule.git
Cloning into '/Users/stephen/superproject/my-submodule'...
remote: Enumerating objects: 1240, done.
remote: Counting objects: 100% (1240/1240), done.
remote: Compressing objects: 100% (950/950), done.
remote: Total 3972 (delta 416), reused 598 (delta 284), pack-reused 2732
Receiving objects: 100% (3972/3972), 77.71 MiB | 7.76 MiB/s, done.
Resolving deltas: 100% (1423/1423), done.
The log shows the cloning of the master branch (the default) of my-submodule into the root of superproject.[1]
The presence of the submodule, however, is only the first half of the process. We need to commit these changes to Git in order to track the submodule. This part confused me initially because what Git shows as a diff doesn’t look like most diffs. There aren’t lots of files to track. Instead, there’s a single entry for the submodule directory and a .gitmodules file:
git status
On branch master
Your branch is up to date with 'origin/master'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitmodules
new file: my-submodule
The .gitmodules file is used by Git to track its modules:[2]
[submodule "my-submodule"]
path = my-submodule
url = git@github.com:stephencweiss/my-submodule.git
The new my-submodule file is much more interesting (to me):
diff --git a/my-submodule b/my-submodule
new file mode 160000
index 0000000..b2a289a
--- /dev/null
+++ b/my-submodule
@@ -0,0 +1 @@
+Subproject commit b2a289ae352630a8ab01d2dfe9c42f981fd33908
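All Git records in the superproject is a commit hash for the subproject (the 160000 mode marks the entry as a link to a commit rather than a regular file). The files appear in the project, but the submodule’s own history and refs are stored separately, in a directory under .git/modules within the main project for each submodule.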
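To see that single-entry bookkeeping for yourself, here is a minimal sandbox sketch. It assumes a throwaway local repository (the hypothetical upstream directory) standing in for my-submodule on GitHub, and it passes protocol.file.allow=always because recent Git releases refuse file-based submodule URLs by default.

```shell
set -eu
# Throwaway sandbox; "upstream" and the demo identity are hypothetical stand-ins.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny content repository to play the role of my-submodule's remote.
git init -q upstream
echo "hello" > upstream/post.md
git -C upstream add post.md
git -C upstream commit -q -m "first post"

git init -q superproject && cd superproject
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" my-submodule

# The index holds one entry for the whole submodule: mode 160000 plus a commit hash.
entry=$(git ls-files --stage my-submodule)
pinned=$(git -C my-submodule rev-parse HEAD)
echo "$entry"
echo "submodule HEAD: $pinned"

# The submodule's own Git directory lives inside the superproject's .git.
ls -d .git/modules/my-submodule
```

The first line printed should start with 160000 and carry the same hash as the submodule’s HEAD, which is all the superproject knows about the submodule’s contents.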
Once these files are committed, the project can be considered to be tracking the submodule. Any changes made to the (submodule) project can be pulled into this main project for consumption.[3]
Before moving on to talk about how to consume these submodules, a few quick points:
git submodule add API
Let’s say that while tracking the stable branch was good, we now want to track super-stable (or, perhaps more likely, we want to move from the default to something else, like stable).
You have options at this point. You can change it for everyone or just for yourself with a git config command:
git config -f .gitmodules submodule.a/new/directory.branch super-stable
In most cases it makes sense to track it for everyone, so that’s what I’ve shown, though you could update just your own configuration by dropping the -f .gitmodules.[4]
If we look at the .gitmodules now, we’ll see the change has been made:
[submodule "a/new/directory"]
path = a/new/directory
url = git@github.com:stephencweiss/my-submodule.git
+ branch = super-stable
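A quick way to check the result without opening the file is to read the key back with git config. The sketch below is an assumption-laden sandbox: it writes a .gitmodules entry by hand (so no network access is needed) and then switches the branch exactly as described above.

```shell
set -eu
# Scratch repository; nothing here touches a real project.
work=$(mktemp -d) && cd "$work"
git init -q superproject && cd superproject

# Hand-written .gitmodules entry mirroring the example (hypothetical setup step).
git config -f .gitmodules submodule.a/new/directory.path a/new/directory
git config -f .gitmodules submodule.a/new/directory.url git@github.com:stephencweiss/my-submodule.git
git config -f .gitmodules submodule.a/new/directory.branch stable

# Switch the tracked branch for everyone who shares this .gitmodules.
git config -f .gitmodules submodule.a/new/directory.branch super-stable

# Read the value back instead of eyeballing the file.
branch=$(git config -f .gitmodules --get submodule.a/new/directory.branch)
echo "tracked branch: $branch"
cat .gitmodules
```

Because the change lives in .gitmodules, it still has to be committed before teammates will see it.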
Now that we know how to set up submodules, let’s discuss how to consume them in order to use them effectively. But first, a word of caution…
As noted above, there’s nothing special about submodules. Their defining feature is the relationship of treating one project tracked by Git as a dependency of another. This feature should be front of mind whenever interacting with a submodule in a project.
Just because the files for the submodule are present in the project does not mean that they should be edited locally. Consider a submodule just like any other dependency (e.g., a package installed via npm, nuget, or pip). One of the benefits of Git’s submodules is that they’re platform agnostic; however, using one still creates a dependency.
If changes are made locally, it’s akin to a monkey-patch: a change in the source code that is liable to be erased with any update to the package.
The simplest rule of thumb is to avoid this and instead make your changes directly within the submodule project itself.
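If you suspect that stray edits have crept into a submodule’s working tree, one way to check is to ask the submodule for its own status. This is a minimal sketch in a sandbox: the local upstream repository and the post.md file are hypothetical stand-ins, and protocol.file.allow=always is only there because the sandbox uses a file path as the submodule URL.

```shell
set -eu
# Throwaway sandbox; "upstream" and post.md are hypothetical.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
echo "hello" > upstream/post.md
git -C upstream add post.md
git -C upstream commit -q -m "first post"

git init -q superproject && cd superproject
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" my-submodule

# Simulate the anti-pattern: editing a submodule file from inside the superproject.
echo "local tweak" >> my-submodule/post.md

# Ask the submodule itself whether it has uncommitted edits.
dirty=$(git -C my-submodule status --porcelain)
if [ -n "$dirty" ]; then
  echo "my-submodule has local edits:"
  echo "$dirty"
fi
```

An empty result means the submodule matches the commit it has checked out; anything else is a local patch that belongs in the submodule project instead.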
At this point, you or a teammate has added a submodule to the project and you’re ready to clone the project to a new machine.
To ensure that the submodule and its contents are cloned alongside the main project there are multiple approaches: an easy way and a hard way.[5]
The easy way is to focus on the git clone command. The standard git clone covers most use cases by default:
git clone <repository>
In this case, however, we have to ensure that the submodules come with the clone. While the default will bring down the directories, it will not fetch the contents of the submodules.
The solution, as of v2.14, is to add the --recurse-submodules option to the git clone command:
git clone --recurse-submodules <repository>
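The difference between the two clone commands is easy to see side by side. The sketch below builds a sandbox superproject with one submodule (the local upstream repository, the post.md file, and the plain and full clone names are all hypothetical) and then clones it both ways. The protocol.file.allow=always override is only needed because the sandbox uses file paths as URLs.

```shell
set -eu
# Throwaway sandbox; every repository name here is a hypothetical stand-in.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
echo "hello" > upstream/post.md
git -C upstream add post.md
git -C upstream commit -q -m "first post"

git init -q superproject
git -C superproject -c protocol.file.allow=always submodule --quiet add "$work/upstream" my-submodule
git -C superproject commit -q -m "Add my-submodule"

# Clone the same superproject twice: once plainly, once recursing into submodules.
git clone -q superproject plain
git -c protocol.file.allow=always clone -q --recurse-submodules superproject full

echo "plain clone:     $(ls -A plain/my-submodule | wc -l) entries in my-submodule"
echo "recursive clone: $(ls full/my-submodule)"
```

The plain clone ends up with an empty my-submodule directory, while the recursive clone has the submodule’s files checked out.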
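If remembering this is too much, Git added a configuration setting with v2.15 that makes commands recurse into submodules by default (aka fetch the contents as well as the directory). One caveat: according to the Git documentation, this setting applies to commands such as checkout, fetch, and pull but not to clone, so the flag above is still needed for the initial clone.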
To turn this on, modify the .gitconfig:
git config --global submodule.recurse true
cat ~/.gitconfig
[user]
name = Stephen
#...
[submodule]
recurse = true
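If you would like to try the setting without touching your real configuration, here is a small sketch. It assumes a scratch HOME directory (an illustration-only trick, not something the setting requires) so that the --global write lands in a temporary .gitconfig.

```shell
set -eu
# Point Git at a scratch home directory so the real ~/.gitconfig is untouched.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

git config --global submodule.recurse true

# Read the single key back, then show the file Git wrote.
value=$(git config --global --get submodule.recurse)
echo "submodule.recurse = $value"
cat "$HOME/.gitconfig"
```

Reading one key with --get is often easier than scanning the whole file when you just want to confirm the setting took effect.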
After the initial clone of the submodules, keeping them in sync means that every time a new commit has been made to the submodule, it needs to be pulled (just like any other dependency that’s managed elsewhere).
To pull in the HEAD of the tracked branch, use the following command:
git submodule update --remote
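Here is a rough end-to-end sketch of that update in a sandbox. The local upstream repository, its stable branch, and the file names are hypothetical; the submodule is added with -b stable so that there is an explicit tracked branch for --remote to follow.

```shell
set -eu
# Throwaway sandbox; "upstream", "stable", and the file names are hypothetical.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
echo "hello" > upstream/post.md
git -C upstream add post.md
git -C upstream commit -q -m "first post"
git -C upstream branch -m stable

git init -q superproject && cd superproject
git -c protocol.file.allow=always submodule --quiet add -b stable "$work/upstream" my-submodule
git commit -q -m "Add my-submodule"

# A new post lands upstream after the submodule was added.
echo "second" > "$work/upstream/post2.md"
git -C "$work/upstream" add post2.md
git -C "$work/upstream" commit -q -m "second post"

# Fetch the tracked branch and check out its newest commit in the submodule.
git -c protocol.file.allow=always submodule --quiet update --remote
git -C my-submodule log --oneline -1

# The superproject now sees a changed pointer, which still has to be committed.
git status --short
git commit -q -am "Bump my-submodule"
```

Note the last step: pulling the new commit into the submodule only changes the working tree, so the superproject needs its own commit to record the new hash.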
If you cloned a project without the --recurse-submodules flag, then the project will have the link, but the data won’t be present.
In this case, you have two choices:
1. Delete the project and reclone it with the flag, or
2. Initialize the submodule with an update call:
git submodule update --recursive --init
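The sketch below walks through that recovery in a sandbox, using git submodule status to show the before and after. The repository names are hypothetical, and protocol.file.allow=always is only needed because the sandbox uses a file path as the submodule URL.

```shell
set -eu
# Throwaway sandbox; every repository name here is a hypothetical stand-in.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
echo "hello" > upstream/post.md
git -C upstream add post.md
git -C upstream commit -q -m "first post"

git init -q superproject
git -C superproject -c protocol.file.allow=always submodule --quiet add "$work/upstream" my-submodule
git -C superproject commit -q -m "Add my-submodule"

# Clone without --recurse-submodules: the link exists, the data does not.
git clone -q superproject plain && cd plain
before=$(git submodule status)   # a leading "-" marks an uninitialized submodule

# Initialize and populate the submodule after the fact.
git -c protocol.file.allow=always submodule --quiet update --init --recursive
after=$(git submodule status)

echo "before: $before"
echo "after:  $after"
```

After the update call the leading dash disappears from the status line and the submodule’s files are present on disk.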
Sometimes you may want to completely remove the submodule, just like you might remove any other dependency. However, because the submodule is tracked by git, there are a few extra steps you need to take.
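This Stack Overflow thread discusses the topic, and the community approved the following approach:
1. Delete the relevant section from the .gitmodules file.
2. Stage the .gitmodules changes:
git add .gitmodules
3. Delete the relevant section from .git/config.
4. Remove the submodule from the index (no trailing slash):
git rm --cached path_to_submodule
5. Remove the submodule’s .git directory:
rm -rf .git/modules/path_to_submodule
6. Commit the change:
git commit -m "Removed submodule"
7. Delete the now untracked submodule files:
rm -rf path_to_submodule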
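The removal steps above can be scripted. The following is only a sketch under assumptions: it runs in a sandbox with a hypothetical local upstream repository, the submodule’s name and path are both my-submodule, and it uses git config --remove-section in place of hand-editing the two config files.

```shell
set -eu
# Throwaway sandbox; "upstream" and the demo identity are hypothetical.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
echo "hello" > upstream/post.md
git -C upstream add post.md
git -C upstream commit -q -m "first post"

git init -q superproject && cd superproject
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" my-submodule
git commit -q -m "Add my-submodule"

name=my-submodule   # in this sandbox the submodule's name and path are the same

# Steps 1-2: drop the entry from .gitmodules and stage that change.
git config -f .gitmodules --remove-section "submodule.$name"
git add .gitmodules
# Step 3: drop the matching section from .git/config (absent if never initialized).
git config --remove-section "submodule.$name" || true
# Step 4: remove the submodule entry from the index (no trailing slash).
git rm -q --cached "$name"
# Step 5: remove the submodule's Git directory.
rm -rf ".git/modules/$name"
# Step 6: commit the removal.
git commit -q -m "Removed submodule $name"
# Step 7: delete the now untracked working files.
rm -rf "$name"
```

Running git status afterwards is a reasonable sanity check that nothing from the submodule is left staged or untracked.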
Git’s submodules have been one of the more challenging Git concepts for me to wrap my head around. After many stumbles, I’m now at a place where I feel comfortable that if I found a use case for submodules, for example separating the content of my site from the site itself, I would be able to use them! More than that, however, the time spent experimenting with them was enjoyable and educational - which means to me that it was well spent!
1 The path is slightly more nuanced than this. Per the manual:
The optional argument <path> is the relative location for the cloned submodule to exist in the superproject. If <path> is not given, the canonical part of the source repository is
used ("repo" for "/path/to/repo.git" and "foo" for "host.xz:foo/.git"). If <path> exists and is already a valid Git repository, then it is staged for commit without cloning. The
<path> is also used as the submodule's logical name in its configuration entries unless --name is used to specify a logical name.
To expand on this, we can track alternative branches through the use of the -b (branch) flag. We can also direct a non-root destination through an optional second argument:
git submodule add -b stable git@github.com:stephencweiss/my-submodule.git a/new/directory
In this example, Git will track the stable branch of my-submodule in superproject/a/new/directory.
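To check what the -b flag and the path argument actually record, here is a small sandbox sketch. The local upstream repository and its stable branch are hypothetical stand-ins for the GitHub remote, and protocol.file.allow=always is only needed because the sandbox uses a file path as the URL.

```shell
set -eu
# Throwaway sandbox; "upstream" and its "stable" branch are hypothetical.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
echo "hello" > upstream/post.md
git -C upstream add post.md
git -C upstream commit -q -m "first post"
git -C upstream branch -m stable

git init -q superproject && cd superproject

# Track the stable branch and place the submodule in a nested directory.
git -c protocol.file.allow=always submodule --quiet add -b stable "$work/upstream" a/new/directory

# The path doubles as the submodule's logical name, and the branch is recorded.
cat .gitmodules
checked_out=$(git -C a/new/directory rev-parse --abbrev-ref HEAD)
echo "checked out: $checked_out"
```

Since --name was not passed, the section name in .gitmodules matches the path, which is the behavior the manual excerpt describes.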
2 If we were tracking the stable branch in a/new/directory as described in Footnote 1, .gitmodules would have an additional attribute:
[submodule "a/new/directory"]
path = a/new/directory
url = git@github.com:stephencweiss/my-submodule.git
branch = stable
3 More on this in a moment, but the short answer is treat submodules as dependencies.
4 This example comes from the Git documentation. It works because the -f flag is short for the --file option, which tells git config to use a specific file rather than the one specified in GIT_CONFIG:
[...]
-f config-file, --file config-file
Use the given config file instead of the one specified by GIT_CONFIG
5 The easy way was added later, which is why the hard way exists. Since the easy way does exist, I’m going to focus on it.
Hi there and thanks for reading! My name's Stephen. I live in Chicago with my wife, Kate, and dog, Finn.