When I talked about submodules vs. subtrees before, one of the benefits I listed for subtrees was the speed of the initial clone. I'd written a few scripts to help me benchmark the two, and with a little extra time this weekend, I thought I'd share the data.

I generated 2, 4, 6, 8, and 10 plugin repositories for both submodules and subtrees and cloned each one ten times over both a local and a remote connection.  Here is the result:

As you can see, clone time for submodules grows with each one you add, while subtrees stay pretty much the same. Here's the R code to generate the above graph:

#!/usr/bin/env Rscript

library(ggplot2) # load up the ggplot2 library

# load up the data from the Google CSV export
smst <- read.csv('data.csv')

# add names to the data
names(smst) <- c('type', 'count', 'time')

# force count to be a factor instead of a continuous variable
smst$count <- factor(smst$count)

# calculate the mean for each type/count group
smst_mean <- aggregate(list(time=smst$time), list(type=smst$type, count=smst$count), mean)

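# open a PNG device; the file is only written once the device is closed
png(filename = "submodule_vs_subtree.png", width = 700, height = 700)

# one line per type; ggtitle() sets the title (opts() is gone from current ggplot2)
p <- ggplot(smst_mean, aes(x = count, y = time, group = type, color = type)) +
  geom_line(linewidth = 2) +  # use size = 2 on ggplot2 older than 3.4
  xlab("plugin count") +
  ylab("time") +
  ggtitle("Submodule vs. Subtree checkout times")
print(p)

# close the device so the PNG is flushed to disk
dev.off()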

The generation and benchmarking scripts, as well as the reported data and code, are in my submodule_vs_subtree repo on GitHub.
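
If you just want a feel for the timing loop without digging through the repo, here is a minimal sketch in R of cloning one repository several times and recording the elapsed time of each clone. This is not the script from the repo: time_clones and its arguments are names I made up for illustration, and it assumes git is on your PATH.

```r
# Hypothetical helper (not from the repo): clone `url` `runs` times
# and return the elapsed wall-clock seconds for each clone.
time_clones <- function(url, runs = 10, recursive = TRUE) {
  vapply(seq_len(runs), function(i) {
    dest <- tempfile("clone_")  # fresh target directory for every run
    on.exit(unlink(dest, recursive = TRUE), add = TRUE)

    # --recursive also fetches submodules; it is dropped when recursive = FALSE
    args <- c("clone", "--quiet", if (recursive) "--recursive",
              shQuote(url), shQuote(dest))

    # system.time() measures the clone; system2() returns git's exit status
    elapsed <- system.time(status <- system2("git", args))[["elapsed"]]
    if (status != 0) stop("git clone failed for ", url)
    elapsed
  }, numeric(1))
}
```

Calling mean(time_clones("path/or/url/to/repo")) would then give one average per repository, which is the kind of number the graph plots for each type and plugin count.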