If you ask a moderately experienced git user about submodules, they are likely to turn to you, look you earnestly in the eye and say: "don't use them, life is too short". Or, if they are a more experienced and helpful git user, they might explain exactly why git submodules are painful before telling you: "don't use them, life is too short".

And it is true: git submodules are painful, particularly if you have to interact with them manually before or after every push to a shared remote repo. There are too many moving parts that can make life hell when they get out of sync.

Git submodules are ok for tracking the precise version of a few rarely updated external dependencies, because the manual bookkeeping overhead their use normally implies is small if done only occasionally. They are terrible for tracking the precise versions of frequently updated internal dependencies, particularly in the absence of a robust automated solution.

why bother?

Given the hassles, why bother using them at all?

It turns out that git submodules do provide quite a neat solution to the problem of tracking the configuration of multi-component releases.

To explain, consider a system composed of multiple components, where the source for each component is tracked in its own git repository. Tracking the configuration of a particular release of the system as a whole requires some way to record which commit of each component went into that release. Recording particular commits of a collection of components is something that git submodules can do quite well; it's just the user experience that is horrible.
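
To make the idea concrete, here is a minimal Python sketch of reading a release configuration back out of a superproject. It assumes the text comes from running `git ls-tree <commit>` in the superproject, where each submodule appears as an entry with mode 160000 and type commit. The function name, paths and hashes are made up for illustration.

```python
def parse_gitlinks(ls_tree_output):
    """Map submodule path -> pinned commit, given `git ls-tree` output.

    Each line has the form "<mode> <type> <object><TAB><path>".
    Submodules are recorded as gitlinks: mode 160000, type "commit".
    """
    pins = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, _, path = line.partition("\t")
        mode, obj_type, sha = meta.split()
        if mode == "160000" and obj_type == "commit":
            pins[path] = sha
    return pins


# Hypothetical listing for a release made of two components.
sample = (
    "100644 blob " + "a" * 40 + "\t.gitmodules\n"
    "160000 commit " + "b" * 40 + "\tcomponent-a\n"
    "160000 commit " + "c" * 40 + "\tcomponent-b\n"
)
release_config = parse_gitlinks(sample)
```

Each superproject commit therefore amounts to a small table of component paths and the exact commits that make up one configuration of the system.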

So, what if you could provide a user experience that didn't suck and still take advantage of git submodules for configuration management? That is what this article is about.

the approach

This section sets forth the elements of the approach.

tracking agent

The approach assumes the existence of one or more tracking agents. Each tracking agent would be responsible for monitoring the tracking branches of one or more submodules and updating the tracking branch of a superproject as the submodules change.
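
The core of such an agent might look something like the following Python sketch. It is only an illustration under stated assumptions: the two dictionaries (path to commit hash) are assumed to have been gathered elsewhere, the function names are hypothetical, and the git commands are returned as argument lists rather than executed.

```python
def plan_updates(pinned, heads):
    """Return {path: new_sha} for known submodules whose branch head has moved.

    pinned: {path: sha} currently recorded in the superproject.
    heads:  {path: sha} latest commit on each submodule's tracking branch.
    Paths that are not already submodules of the superproject are ignored.
    """
    return {
        path: sha
        for path, sha in heads.items()
        if path in pinned and pinned[path] != sha
    }


def update_commands(updates, branch):
    """List the git commands (as argv lists) that would record the updates."""
    if not updates:
        return []  # nothing moved, so no superproject commit is needed
    commands = []
    for path, sha in sorted(updates.items()):
        commands.append(["git", "-C", path, "fetch", "origin"])
        commands.append(["git", "-C", path, "checkout", "--detach", sha])
        commands.append(["git", "add", path])
    message = "Update submodules: " + ", ".join(sorted(updates))
    commands.append(["git", "commit", "-m", message])
    commands.append(["git", "push", "origin", "HEAD:refs/heads/" + branch])
    return commands
```

Keeping the decision (what moved) separate from the side effects (the git commands) makes the agent easier to test without touching a real repository.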

dedicated workspaces

Each tracking agent would operate in a dedicated workspace. In particular, a tracking agent should never be run in a developer's workspace, which might have untracked, unstaged or uncommitted work in it.
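
As a safety net, an agent could refuse to run if its workspace is not pristine. The Python sketch below assumes that an empty result from `git status --porcelain` is a good enough test for that; the function names are hypothetical.

```python
import subprocess


def is_clean(porcelain_output):
    """True if `git status --porcelain` reported nothing at all."""
    return porcelain_output.strip() == ""


def workspace_is_clean(workspace):
    """Run git in `workspace` and report whether it is pristine."""
    result = subprocess.run(
        ["git", "-C", workspace, "status", "--porcelain"],
        check=True,
        capture_output=True,
        text=True,
    )
    return is_clean(result.stdout)
```

The porcelain format lists untracked files (prefixed `??`) as well as staged and unstaged changes, so any of the three kinds of stray work mentioned above would make the check fail.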

centralisation

Most tools in the git suite operate in a distributed fashion without reference to a central repository. The submodule tracking agent, by contrast, should be centralised. The main reason is that we want to remove the responsibility for maintaining the superproject from individual developers. We also want to simplify the solution by eliminating the possibility of multiple agents operating on the same superproject branch at the same time.
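
One simple way to guarantee a single agent per superproject branch is a lock file. The Python sketch below assumes that all agents run on one central host and share a lock directory; the function name and the lock file naming scheme are invented for illustration.

```python
import os
from contextlib import contextmanager


@contextmanager
def branch_lock(lock_dir, superproject, branch):
    """Hold an exclusive lock for one superproject branch.

    O_CREAT | O_EXCL makes creation atomic: if the file already exists,
    os.open raises FileExistsError and the second agent can back off.
    """
    name = (superproject + "@" + branch).replace("/", "_") + ".lock"
    path = os.path.join(lock_dir, name)
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield path
    finally:
        os.close(fd)
        os.remove(path)
```

An agent would wrap its whole update cycle in `with branch_lock(...)`. A real deployment would also need a policy for stale locks left behind by a crashed agent, which this sketch does not attempt.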

read-only superprojects

The intent is that developers never have to manually update a superproject, making superprojects effectively read-only from the developer's point of view. Developers might choose to use superprojects to check out all the components of a release, but would not be required to.

submodule-only superprojects

One way to avoid the need for developers to manually update a superproject is to not store any files in the superproject. In other words, a superproject should be a pure collection of submodules and contain no other files (apart from the .gitmodules file that git itself maintains to describe the submodules). If there are no other files, there is nothing for a developer to edit. If there is nothing to edit, there is no need for a developer to update the superproject directly, and so there can be no possibility of merge conflicts between a developer's changes and the tracking agent's changes.

This is really more a guideline for superproject design than a strict rule that would be checked by the submodule tracking agents.
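
Even as a guideline, it is easy to check. The Python sketch below assumes its input is the output of `git ls-tree -r <commit>` run in the superproject, and reports anything that is neither a submodule entry (mode 160000) nor the .gitmodules file. The function name and sample data are hypothetical.

```python
def stray_paths(ls_tree_output):
    """Return superproject paths that are neither submodules nor .gitmodules."""
    stray = []
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, _, path = line.partition("\t")
        mode = meta.split()[0]
        if mode != "160000" and path != ".gitmodules":
            stray.append(path)
    return stray


# Hypothetical listing: one submodule, .gitmodules and a stray README.
listing = (
    "100644 blob " + "a" * 40 + "\t.gitmodules\n"
    "160000 commit " + "b" * 40 + "\tcomponent-a\n"
    "100644 blob " + "d" * 40 + "\tREADME\n"
)
violations = stray_paths(listing)
```

An empty result means the superproject is a pure collection of submodules in the sense described above.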

no requirement for developer superproject checkout

If a developer doesn't need to maintain the superproject, then there is no need for the developer to check out the superproject.

That said, when used in a read-only fashion, superprojects can still be a handy way to initially check out a group of related submodules or periodically resync them all with their origins at the latest revision levels.
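
For reference, the read-only workflow comes down to a couple of standard git commands. The Python sketch below just assembles them as argument lists; the helper names, URL and directory are placeholders, and it assumes the tracking agent has already pushed the latest submodule commits to the superproject branch being pulled.

```python
def initial_checkout(superproject_url, directory):
    """Commands to clone a superproject together with all of its submodules."""
    return [
        ["git", "clone", "--recurse-submodules", superproject_url, directory],
    ]


def resync(directory):
    """Commands to bring an existing checkout up to the latest recorded commits."""
    return [
        ["git", "-C", directory, "pull", "--ff-only"],
        ["git", "-C", directory, "submodule", "update", "--init", "--recursive"],
    ]


# Placeholder URL and directory, for illustration only.
clone_cmds = initial_checkout("https://example.com/release.git", "release")
resync_cmds = resync("release")
```

Note that `git submodule update` leaves each submodule on a detached HEAD at the recorded commit, which is fine for read-only use but means a developer should switch to a branch before starting work in a submodule.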


This post has described the requirements for a git submodule tracking agent. A future article will discuss the detailed design of such an agent.