endot

eschew obfuscation (and espouse elucidation)

Introducing Talk Notes

In the course of my work and my online reading and research, I often come across videos of talks that I want to watch. I rarely take the time to watch those videos, mostly because of the time commitment; I usually only have a few minutes to spare.

Lately, I’ve done something to change that. I’m taking a little bit of time out of my Friday schedule each week to watch a talk that looks interesting. I also try to focus on the talk. Rather than checking my email or chatting while the talk is playing, I take notes, sometimes including screenshots of important slides.

Over the course of the past month, I’ve had some success with this strategy, and I was able to watch three talks. Here are links to my notes:

  • Using Datomic With Riak - I picked this talk because we’ve used a bit of Riak at work and a buddy of mine keeps raving about Datomic. This talk is actually a great overview of the philosophy and design behind Datomic.
  • Raft - the Understandable Distributed Protocol - CoreOS’s etcd has been getting some mention lately, and Raft is the consensus algorithm used to keep all of its data consistent. At the end of watching this talk, I found another one (by one of the Raft authors), and it balanced the practicality of the first with some more of the theory.
  • React - Rethinking Best Practices - The functional programming paradigm is gathering steam, and Facebook’s React JavaScript library is a fascinating take on building modern web UIs in a functional manner.

I really enjoyed the process of taking notes in this way, and I hope to continue this as the year progresses.

Oh, and if you know of a good talk, please let me know on twitter.

Using Docker to Generate My Octopress Blog

When I originally set up Octopress, I set it up on my Mac laptop using rvm, as recommended at the time. It worked very well for me until just a few minutes after my last post, when I decided to sync with the upstream changes.

After merging in the changes, I tried to generate my blog again, just to make sure everything worked. Well, it didn’t, and things went downhill from there. The rake generate command failed because a new gem was required. So, I ran bundle install to get the latest gems. That failed when a gem required ruby 1.9.3. Then installing ruby 1.9.3 failed in rvm because I needed a newer version of rvm. After banging on that problem for a few minutes, I decided to take a break and come back to the problem later.

Docker to the rescue

Fast forward a few weeks, and I came up with a better idea. I decided to dockerize Octopress. This keeps all the dependencies sanely bottled up in an image that I can run like a command.

Here is the code:

FROM ubuntu:12.10
MAINTAINER  Nate Jones <nate@endot.org>

# install system dependencies and ruby
RUN apt-get update
RUN apt-get install git ruby1.9.3 build-essential language-pack-en python python-dev -y

# make sure we're working in UTF8
ENV LC_ALL en_US.utf8

# add the current blog source
ADD . /o
WORKDIR /o

# install octopress dependencies
RUN gem install bundler
RUN bundle install

# set up user so that host files have correct ownership
RUN addgroup --gid 1000 blog
RUN adduser --uid 1000 --gid 1000 blog
RUN chown -R blog.blog /o
USER blog

# base command
ENTRYPOINT ["rake"]

How to use it

To use this Dockerfile, I put it at the root of my blog source and ran this command:

$ docker build -t ndj/octodock .

Then, since rake is set as the entry point, I can run the image as if it were a command. I use the -v switch to overlay the current blog source over the one cached in the image and the -rm switch to throw away the container when it’s done.

$ docker run -rm -v `pwd`:/o ndj/octodock generate
## Generating Site with Jekyll
   remove .sass-cache/
   remove source/stylesheets/screen.css
   create source/stylesheets/screen.css
Configuration from /o/_config.yml
Building site: source -> public
Successfully generated site: source -> public

A few notes

  • I had to force the UTF-8 locale in order to get ruby to stop complaining about non-ASCII characters in the blog entries.
  • I add a user called blog with the same UID/GID as my system user, so that any files generated inside the container aren’t owned by root. I look forward to proper user namespaces so that I won’t have to do this.
  • Deploying the blog doesn’t use my SSH key, as the ‘blog’ user in the image is doing the rsync, not my host system user. I’m ok with typing my password in or just rsync’ing the data directly.
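When I do deploy from inside the container, it’s just the deploy task with a terminal attached so I can type the password. A sketch, assuming the standard Octopress rake deploy setup:

$ docker run -rm -t -i -v `pwd`:/o ndj/octodock deploy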

Docker is a great piece of technology, and I keep finding new uses for it.

Enjoy.

Git-annex Tips

Last time I posted about git-annex, I introduced it and described the basics of my setup. Over the past year, I’ve added quite a bit of data to my main git-annex. It manages just over 100G of data for me across 9 repositories. Here are a few bits of information that may be useful to others considering git-annex (or who are already knee-deep in).

Archive, not backup

The website for git-annex explicitly states that it is not a backup system. A more appropriate description is that it’s part of an archival system. An archival system is somewhat concerned with backups of data, but it also deals with cataloging and retrieval.

I imagine it as a library system (books, not code) with the ability to do instantaneous inter-library loans. I have one repository (by the name of ‘silo’) that contains copies of all my data. I then have linked repositories on each computer that I use regularly; they hold little or no data, just git-annex-style symlinks. If I find that I need something from the main repository on one of those computers, I can query where that file is with git annex whereis:

$ git annex whereis media/pictures/2002-02-08-olympics.tgz
whereis media/pictures/2002-02-08-olympics.tgz (4 copies)
        8314baa2-4193-8d77-bb7f-489bd73e7db4 -- calvin_dr
        8b22886e-14f2-98f0-31ec-6770b0a08f22 -- silo
        f8ec3d60-47bf-a392-4739-b39dd609d554 -- hobbes_dr
ok

(I actually have three full copies of my data, in the *_dr repositories, but that’s a story for another day. Suffice it to say that calvin_dr and hobbes_dr are two identical external drives.)

I can retrieve the contents with git annex get. git-annex is smart enough to know that the silo remote is over a network connection and calvin_dr is local, so it copies the data from the local drive:

$ git annex get  media/pictures/2002-02-08-olympics.tgz
get media/pictures/2002-02-08-olympics.tgz (from calvin_dr...)
SHA256E-s48439263--67c0de0e883c5d5d62a615bb97dce624370127e5873ae22770b200889367ae1c.tgz
    48439263 100%   25.10MB/s    0:00:01 (xfer#1, to-check=0/1)

sent 48445343 bytes  received 42 bytes  19378154.00 bytes/sec
total size is 48439263  speedup is 1.00
ok
(Recording state in git...)

Then, running git annex whereis shows the file contents are local as well:

$ git annex whereis media/pictures/2002-02-08-olympics.tgz
whereis media/pictures/2002-02-08-olympics.tgz (5 copies)
    8314baa2-4193-8d77-bb7f-489bd73e7db4 -- calvin_dr
    8b22886e-14f2-98f0-31ec-6770b0a08f22 -- silo
    f8ec3d60-47bf-a392-4739-b39dd609d554 -- hobbes_dr
    ae7e4cde-0023-1f1f-b1e2-7efd2954ec01 -- here (home_laptop)
ok

And I can view the contents of the file like normal:

$ tar -tzf media/pictures/2002-02-08-olympics.tgz | head
2002-02-08-olympics/
2002-02-08-olympics/p2030001.jpg
2002-02-08-olympics/p2030002.jpg
...

Then, when I’m done, I can just git annex drop the file to remove the local copy of the data. git-annex, in good form, checks to make sure that there’s another copy before deleting it.

$ git annex drop media/pictures/2002-02-08-olympics.tgz
drop media/pictures/2002-02-08-olympics.tgz ok
(Recording state in git...)

All along the way, git-annex is tracking which repositories have each file, making it easy to find what I want. This sort of quick access and query-ability means that I know where my data is and I can access it when I need it.
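The location log can be queried in bulk, too. Here are a couple of queries I find handy (the paths are just illustrative):

$ git annex find --in calvin_dr media/pictures/     # what's on the external drive
$ git annex find --not --in silo                    # anything not yet on the server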

Transporting large files

My work laptop used to be my only laptop, and so it had a number of my personal files, mostly pictures. I’ve transferred most of those off of that system, but every once in a while, I come across some personal data that I need to transfer to my home repository.

I usually add it to the local git-annex repository on my work laptop and then use git annex move to move it to my home server. However, if it’s a significant amount of data and I don’t feel like waiting for the long transfer over my slow DSL line, I can copy the data to my external drive at work and then copy it off when I get home. Doing this manually can get tedious if there are more than a few files, but git-annex makes it a cinch. First, I can query what files are not on my home server and then copy those to the calvin_dr drive.

work-laptop$ git annex add huge-file1.tgz huge-file2.tgz huge-file3.tgz
work-laptop$ git annex sync
work-laptop$ git annex copy --not --in silo --to calvin_dr

Then, when I get home, I attach the drive to my personal laptop and run git annex copy to copy the files to the server:

personal-laptop$ git annex copy --to silo --not --in silo

Detecting duplicates

Many of my backups are of the “snapshot” style, where I rsync’d a tree of files to another drive or server in an attempt to make sure that data was safe. The net effect of this strategy is that I have several mostly-identical backups of the same data. So, when I find a new copy of data that I’ve previously added to my git-annex system, I don’t know if I can safely delete it based on the top-level directory name alone.

For example, if I discover a tree of pictures that are organized by date and event:

$ find pictures -type d
pictures
pictures/2002-02-08-olympics
pictures/2002-04-20-tahoe
pictures/2004-11-18-la-zoo

And, checking in my git-annex repo, I can see that there are three files that correspond to those directories:

$ find backup/pictures -type l
backup/pictures/2002-02-08-olympics.tgz
backup/pictures/2002-04-20-tahoe.tgz
backup/pictures/2004-11-18-la-zoo.tgz

I can probably remove the found files, but I might have modified the pictures in this set and I’d like to know before I toss them. After running into this scenario a few times, I wrote a little utility called archdiff that I can use to get an overview of the differences between two archives (or directories). It’s just a fancy wrapper around diff -r --brief that automatically handles unpacking any archives found. For example:

$ archdiff 2002-04-20-tahoe/ ~/backup/pictures/2002-04-20-tahoe.tgz
$ 

Since there was no output, the directory has the same contents as the archive and can be safely deleted. Here’s another example:

$ archdiff 2002-02-08-olympics/ ~/backup/pictures/2002-02-08-olympics.tgz
Files 2002-02-08-olympics/p2030001.jpg and 2002-02-08-olympics.tgz-_RhD/2002-02-08-olympics/p2030001.jpg differ
$ 

One of the files in this directory has modifications, so I can now take the time to look at the two versions and decide which one I want to keep.

Archdiff behaves like a good UNIX program and its exit code reflects whether or not differences were found, so it’s possible to script the checking of multiple directories. Here’s an example script that would check the above three directories:

#!/bin/bash

cd ~/backup/pictures || exit 1

for dir in ~/pictures/*; do
    basedir=$(basename "$dir")
    echo "checking $dir"

    # retrieve the file from another git-annex repo
    git annex get "$basedir.tgz"

    if archdiff "$dir" "$basedir.tgz"; then
        echo "$dir is the same, removing"
        rm -rf "$dir"

        # drop the git-annex managed file, we no longer need it
        git annex drop "$basedir.tgz"
    fi
done

Once this is done, the only directories left will be those with differences and the tarball will still be present in the git-annex repository for investigation. I end up writing little scripts like this as I go through old backups to help me process large amounts of data quickly.

All done

That’s it for now. If you have any questions about this or git-annex in general, tweet at me @ndj.

Enjoy.

Remotecopy, Two Years Later

It’s been over two years since I wrote remotecopy and I still use it every day.

The most recently added feature is the -c option, which will remove the trailing newline from the copied data if it only contains one line. I found myself writing little scripts that would output a single line with the intent of using that output to build a command line on a different system, and the extra newline at the end often messed up the new command. The -c option solves this problem.

For instance, I have git-url, which outputs the origin URL of the current git repository. This makes it easy to clone the repo on a new system (rc is my alias for remotecopy -c):

firsthost:gitrepo$ git url | rc
Input secret:
rc-alaelifj3lij2ijli3ajfwl3iajselfiae

Now the clone URL is in my clipboard, so I just type git clone and then paste to clone on a different system:

secondhost:~$ git clone git@github.com:justone/gitrepo.git
Cloning into 'gitrepo'...
...
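(For the curious, the git url command above isn’t anything fancy; a minimal version is essentially just the following one-liner, though my actual script may differ slightly.)

$ git config --get remote.origin.url
git@github.com:justone/gitrepo.git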

No tmux pbcopy problems

Most OS X tmux users are familiar with the issues with pbcopy and the current workarounds.

Since remotecopy works by accessing the server over a TCP socket, it’s immune to these problems. I just use remotecopy on my local system as if I were on a remote system.

LA Perl Mongers

At the latest LA Perl Mongers meeting, the talks were lightning in nature, so I threw together a presentation about remotecopy. The interesting source bits are up on GitHub, including a PDF copy of the slides.

For the presentation, I used the excellent js-sequence-diagrams to make the diagram below, which hopefully helps show the data flow in a remotecopy interaction.

[diagram: remotecopy sequence diagram]

Enjoy.

A Script to Ease SCP Use

Since I work on remote systems all the time, I use SCP repeatedly to transfer files around. One of the more cumbersome tasks is specifying the remote file or directory location.

So I wrote a helper script to make it easier. It’s called scptarget, and it generates targets for SCP, either the source or the destination.

For instance, if I want to copy a file down from a remote server, I run scptarget like this and copy the output:

$ scptarget file.pl
endot.org:/home/nate/file.pl

Then it’s easy to paste it into my SCP command on my local system:

$ scp endot.org:/home/nate/file.pl .
...

I usually use remotecopy (specifically remotecopy -c) to copy it so that I don’t even have to touch my mouse.

Examples

Here are a few example uses.

First, without any arguments, it targets the current working directory. This is useful when I want to upload something from my local system to where I’m remotely editing files.

$ scptarget
endot.org:/home/nate

Specifying a file targets the file directly.

$ scptarget path/to/file.pl
endot.org:/home/nate/path/to/file.pl

Absolute paths are handled correctly:

$ scptarget /usr/local/bin/file
endot.org:/usr/local/bin/file

Vim SCP targets

Vim supports editing files over SCP, so passing -v in generates a target that it can use:

$ scptarget -v path/to/file.pl
scp://endot.org//home/nate/file.pl

And to edit, just pass that in to Vim:

$ vim scp://endot.org//home/nate/file.pl

IP based targets

Sometimes I need the target to use the IP of the server instead of its hostname. This usually happens with development VMs (a la Vagrant), which are only addressable via IP. Passing -i to scptarget causes it to behave this way. Under the hood, it uses getip, which is a script I wrote that prints out the first IP of the current host. If there is no non-private IP, then it will return the first private IP. (I am fully aware that there may be better ways of doing the above. Let me know if you have a better script.)

$ scptarget -i path/to/file.pl
64.13.192.60:/home/nate/path/to/file.pl
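For reference, the default (no options) case doesn’t take much shell. Here’s a rough sketch of the idea, leaving out the -v and -i handling:

#!/usr/bin/env bash
# rough sketch of scptarget's default behavior: print host:path for the
# given file (or the current directory if no argument is given)

target=${1:-$PWD}

# turn a relative path into an absolute one
case $target in
    /*) abspath=$target ;;
    *)  abspath=$PWD/$target ;;
esac

echo "$(hostname -f):$abspath"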

That’s it. I find it incredibly useful and I hope you do too.

Enjoy.

Seeing the Shuttle

Launch

A little over thirteen years ago, I embarked on a cross-country trip with one of my college buddies. I’ll elaborate more on the trip in another post, but the pertinent part of that story is that we happened to be in Florida in late May, 2000.

We’d originally planned to see certain sights along the way, but by the time we reached the east coast we had grown quite good at adding extra stops to the itinerary. When we stopped in Orlando, we quickly added a trip to the Kennedy Space Center, as we are both great fans of NASA. While we were there, we learned that in a few days a shuttle (Atlantis) was going to launch, so we quickly rearranged the next leg of our trip so that we could be back in the area and then purchased tickets.

Since it was an early AM launch window, they let us into the main building of the space center just before three in the morning. Most of the exhibits were open and since the only people there were the ones going to see the launch, there were no crowds. We’d spent most of our previous visit in the other buildings on site, so it was quite a treat to wander around uninhibited. One of the theaters that usually shows documentary style films was showing live video of the close out crew getting the astronauts into the shuttle while a staff person up in front answered questions from the dozen or so people in the audience. I remember sitting in that room for some time, intently watching the video and enjoying every minute.

When the time came for us to head out to the launch site, we loaded into shuttles that took us out to where NASA Parkway East crosses the Banana River. The causeway over the river is the closest the public can get to a shuttle launch at just over six miles away. We waited out there for about two hours before the final nine minute countdown began, and when the clock struck zero it lifted off, almost effortlessly. From our vantage point it was silent until a few seconds later when the shock wave rolled across the water and hit us. It was an experience like none other.

Retirement

Shortly before the shuttle program ended a couple years ago, NASA announced which museums around the country would receive a retired orbiter and we were lucky enough to get the Endeavour for the California Science Center.

Over the holiday break, I was able to visit it with my family. It’s on display in a purpose-built hangar while they work on a permanent home. It was great to see it up close, but the hangar and the pre-exhibit room were packed with holiday crowds.

Then, this past week, I was able to return for a second visit with another college friend and his family. This time, there were only a few schoolchildren to maneuver around while looking up at the orbiter. While my friend and his family wandered around, I was able to just sit and study the vehicle itself.

When I saw it thirteen years ago, it was a speck on the horizon. This time it was so big that I couldn’t take it all in at once. I noticed where the black heat tiles begin and the other locations (besides the underbelly) where they’ve been placed. I could appreciate the enormity of the engine nozzles at the back and the texture of the thermal blankets that cover most of the top half. I counted the maneuvering thrusters on the nose and tail and could see the backwards flag on the right side. Again, it was an experience like none other.

There’s a lot to learn about the shuttle program and about Endeavour in particular. For instance, I learned that the reason for Endeavour’s British spelling is that it was named for the HMS Endeavour, the ship that Captain Cook explored Australia and New Zealand with. Also, I learned that Endeavour was built as the replacement for Challenger, and 22 years after the Challenger disaster it was Endeavour that took the first teacher into space.

If you’re in the LA area and are a fan of space flight, then don’t miss seeing the Endeavour. I’ll definitely be going back.

[photos: Endeavour on display at the California Science Center]

Managing Backups With Git-annex

My Situation

I have backups. Many backups. Too many backups.

I use Time Machine to back up my Macs, but that only covers the systems that I currently run. I have archives of older systems, some for nostalgic reasons, some for reference. I also have a decent set of digital artifacts (pictures, videos and documents) that I’d rather not lose.

So I keep backups.

Unfortunately, I’m not very organized. When I encounter data that I want to keep, I usually rsync it onto one or another external drive or server. However, since the data is not organized, I can’t tell how much of it can simply be deleted instead of backed up again. The actual amount of data that should be backed up is probably less than half of the amount of data that exists on the various internal and external drives both at home and at work. This also means that most of my hard drives are at 90% capacity and I don’t know what I can safely delete.

I really needed a way of organizing the data and getting it somewhere that I can trust.

git-annex

I initially heard of git-annex a while ago, when I was perusing the git wiki. It seemed like an interesting extension, but I didn’t take another look at it until the creator started a Kickstarter project to extend it into a Dropbox replacement.

git-annex is great. It’s an extension to git that lets you manage files with git without actually checking their contents in. git-annex does this by replacing each file with a symlink that points to the real content in the .git/annex directory (named after a checksum of the file’s contents). Only the symlink gets checked into git.

To illustrate, here’s how to get from nothing to tracking a file with git-annex:

$ mkdir repo && cd repo
$ git init && git commit -m initial --allow-empty
Initialized empty Git repository in /Users/nate/repo/.git/
[master (root-commit) c8562e6] initial
$ git annex init main
init main ok
(Recording state in git...)
$ mv ~/big.tar.gz .
$ ls -lh
-rw-r--r--  1 nate  staff    10M Dec 23 15:31 big.tar.gz
$ git annex add big.tar.gz
add big.tar.gz (checksum...) ok
(Recording state in git...)
$ ls -lh
lrwxr-xr-x  1 nate  staff   206B Dec 23 15:32 big.tar.gz -> .git/annex/objects/PP/wZ/SHA256E-s10485760--7c8fdf649d2b488cc6c545561ba7b9f00c52741a5db3b0130a8c9de8f66ff44f.tar.gz/SHA256E-s10485760--7c8fdf649d2b488cc6c545561ba7b9f00c52741a5db3b0130a8c9de8f66ff44f.tar.gz
$ git commit -m 'adding big tarball'
...

When the repository is cloned, only the symlink exists. To get the file contents, run git annex get:

$ cd .. && git clone repo other && cd other
Cloning into 'other'...
done.
$ git annex init other
init other ok
(Recording state in git...)
$ file -L big.tar.gz
big.tar.gz: broken symbolic link to .git/annex/objects/PP/wZ/SHA256E-s10485760--7c8fdf649d2b488cc6c545561ba7b9f00c52741a5db3b0130a8c9de8f66ff44f.tar.gz/SHA256E-s10485760--7c8fdf649d2b488cc6c545561ba7b9f00c52741a5db3b0130a8c9de8f66ff44f.tar.gz
$ git annex get big.tar.gz
get big.tar.gz (merging origin/git-annex into git-annex...)
(Recording state in git...)
(from origin...) ok
(Recording state in git...)
$ file -L big.tar.gz
big.tar.gz: data

With git-annex, not every clone has to have the data for every file. git-annex keeps track of which repositories contain each file (in a separate git branch that it maintains) and provides commands to move file data around. Every time file content is moved, git-annex updates the location information. This information can be queried to figure out where a file’s content is and to limit the data manipulation commands.
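Continuing in the other clone from above (origin is the original repo), the day-to-day location commands look roughly like this:

$ git annex whereis big.tar.gz            # list which repos have the content
$ git annex move big.tar.gz --to origin   # send the content there and drop it here
$ git annex get big.tar.gz                # pull it back whenever it's needed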

There is (much) more info in the walkthrough on the git-annex site.

My Setup

What I have is a set of git repositories that are linked like this:

[diagram: map of my git-annex repositories]

[git-annex has a subcommand to generate a map, but it requires that all hosts are reachable from where it’s run, and that’s not possible for me. I quickly gave up when trying to make my own Graphviz chart and ended up using Lekh Diagram on my iPad (thanks Josh).]

My main repository is on a machine at home (which started life as a mini thumper and is now an Ubuntu box), and there are clones of that repository on various remote machines. To add a new one, all I need to do is clone an existing repository and run git annex init <name> in that repository to register it in the system.
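For example, setting up a clone on a new machine looks something like this (the host and path are placeholders):

$ git clone ssh://homeserver/~/annex annex
$ cd annex
$ git annex init work_laptop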

This has allowed me to start organizing my backup files in a simple directory structure. Here is a sampling of the directories in my repository:

  • VMs - VM images that I don’t want to (or can’t) recreate.
  • funny - Humorous files that I want to keep a copy of (as opposed to trusting the Internet).
  • media - Personal media archives, currently mostly tarballs of pictures going back ten years.
  • projects - Archives of inactive projects.
  • software - Downloaded software for which I’ve purchased licenses.
  • systems - Archives of files from systems I no longer access.

There are other directories, and these directories may change over time as I add more data. I can move the symlinks around, even without having the actual data on my system, and when I commit, git-annex will update its tracking information accordingly. Every time I add data or move things around, all I need to do is run git annex sync to synchronize the tracking data.

Here is the simple workflow that I go through when changing data in any git-annex managed repository:

$ git annex sync
$ # git annex add ...
$ # git annex get ...
$ # git annex drop ...
$ git annex sync

With this in place, it’s easy to know where to put new data since everything is just directories in a git repo. I can access files from anywhere because my home backup server is available as an ssh remote. More importantly, I can just grab what I want from there, because git-annex knows how to just grab the contents of a single file.

One caveat to this system is that using git and git-annex means that certain file attributes, like permissions and create/modify/access times, are not preserved. To work around this, for files that I want to preserve completely, I just tar them up and add the tarball to the git-annex.
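For example (the file names here are made up), archiving an old home directory looks like:

$ tar -czf old-laptop-home.tgz old-laptop-home/   # the tarball keeps owners, modes and mtimes
$ git annex add old-laptop-home.tgz
$ git annex sync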

Installing git-annex

git-annex is written in Haskell. Installing the latest version on OS X is not the most repeatable process, and the version that comes with most Linux distributions is woefully out of date. So I’ve opted for using the prebuilt OS X app (called beta) or Linux tarball.

After copying the OS X app into Applications or unpacking the Linux tarball, I run the included runshell script to get access to git-annex:

$ /home/nate/git-annex.linux/runshell bash                      # on linux
$ /Applications/git-annex.app/Contents/MacOS/runshell bash      # on OS X
$ git annex version
git-annex version: 3.20121211

I’ll share more scripts and tips in future blog posts.

Enjoy.

Dfm Graduates to Its Own Repository and Learns How to Import Files

I recently split dfm out into its own git repository. This should make it easier to add new features and grow the test suite without cluttering up the original dotfiles repository. I’ll sync dfm back into the dotfiles repository at regular intervals, so anyone who wants to keep up to date by merging with master will be ok.

I also just finished up a major new feature: dfm can now import files. So instead of:

$ cp .vimrc .dotfiles
$ dfm install
$ dfm add .vimrc
$ dfm ci -m 'adding .vimrc'

There is an import subcommand that accomplishes all of this:

$ dfm import .vimrc
INFO: Importing .vimrc from /home/user into /home/user/.dotfiles
INFO:   Symlinking .vimrc (.dotfiles/.vimrc).
INFO: Committing with message 'importing .vimrc'
[personal 8dbf30d] importing .vimrc
 1 file changed, 46 insertions(+)
 create mode 100644 .vimrc

There is a smattering of other new features as well, like having dfm execute a script or fix up permissions on install. These are listed in the changelog for v0.6 and documented in the wiki.

To update to the latest, just run these commands:

$ dfm remote add upstream git://github.com/justone/dotfiles.git
$ dfm pull upstream master

Or, grab dfm from its repository.

Enjoy.

Extending Svn, à La Git

Subversion is a useful tool. It does most of what I need it to do, but sometimes there are missing features. Sometimes, it’s something that git does natively. Other times, it’s a repeated command sequence. It’s easy to write small scripts to do these new things, but they never feel like they fit in with the rest of the commands.

I’ve always been fond of the way that git can be extended by simply creating a script with the right name; git-foo [args] becomes git foo [args]. I wanted that same level of extensibility with subversion, so I decided to write a little wrapper called svn. It’s in my PATH ahead of /usr/bin, and it detects if the subcommand given exists as svn-$subcommand in my path somewhere. If that’s found, it is executed. Otherwise the real svn binary is executed.

I originally wrote svn in perl, but the other day a friend of mine 1 rewrote it in shell, cutting it down by more than half and making it easier to understand. Here it is:

#!/usr/bin/env bash

## If there is a svn-${COMMAND}, try that.
## Otherwise, assume it is a svn builtin and use the real svn.

COMMAND=$1
shift

SUB_COMMAND=$(type -path svn-${COMMAND})
if [ -n "$SUB_COMMAND" -a -x "$SUB_COMMAND" -a "${COMMAND}" != "upgrade" ]; then
    exec $SUB_COMMAND "$@"
else
    command -p svn $COMMAND "$@"
fi

Once I had the wrapper, I started creating little extensions to subversion. Here are the ones I’ve created.

svn url

This prints out the URL of the current checkout.

I frequently need to have the same checkout on multiple machines. So, grabbing the URL quickly is essential. All this script does is pull the URL line out of the svn info output, but it makes the following possible:

$ svn url | remotecopy

Which means no mouse is needed.
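The script itself is basically a one-liner; something along these lines does the job:

#!/usr/bin/env bash
# svn-url: print the URL of the current checkout
svn info "$@" | awk '/^URL: / { print $2 }'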

svn vd

This shows the uncommitted differences with vimdiff.

Since subversion doesn’t have native support for using external diff tools, this script uses vimdiff.pl to add that in.

I used to have my subversion configuration set so that vimdiff was always used, but decided to add this script so that I could choose at the prompt which one I wanted (svn di for native, svn vd for vimdiff).

svn clean

This is the analog to git-clean. It removes any untracked or ignored files.

This is indispensable for projects that generate a lot of build artifacts or times when there are several untracked items to delete. Running it without additional options will show what files would be removed, and adding the -f flag will do the deleting.
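Under the hood, it amounts to listing the untracked (?) and ignored (I) entries from svn status. Roughly (a sketch that doesn’t cope with spaces in filenames):

$ svn status --no-ignore | awk '/^[?I]/ { print $2 }'                  # what would be removed
$ svn status --no-ignore | awk '/^[?I]/ { print $2 }' | xargs rm -rf   # what -f does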

svn fm (fm = ‘fix merge’)

This makes it easy to fix merge conflicts by loading up the right files in vimdiff.

When a conflict exists during a merge, subversion dumps several files in the local directory to help you figure out how the conflict occurred.

nate@laptop:~/test1
$ svn st
 M      .
?       file.merge-left.r23262
?       file.merge-right.r23265
?       file.working
C       file

I can never remember which file is which, so running svn fm conflictedfile runs vimdiff like this:

On the left is the file before the merge and on the right is the new file being merged. The middle has the merged file with conflict markers.

Once all of the conflicts are resolved, the file is marked as resolved.
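For the example above, that boils down to roughly:

$ vimdiff file.merge-left.r23262 file file.merge-right.r23265
$ grep -q '<<<<<<<' file || svn resolved file   # mark it resolved once the markers are gone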

All done

That’s it for now. Enjoy.

Update 2012-09-17: Updated wording about svn clean behavior. Default changed from deleting to showing what would be deleted and the option -n changed to -f.

  1. The above links to his github, but I like the picture on his homepage better.

Host-specific Bash Configuration

Well, I was going to write about a nifty bit of bash to help with ssh-agent in tmux, but someone beat me to it, so I’ll just write up his idea instead.

Every once in a while, it’s nice to have a bit of bash initialization that only runs on one system. You could just throw that at the end of the .bashrc for that system, but that’s not very persistent. It would be better to have, in the spirit of dotjs, a directory where you drop files with the same name as the host and they get run.

So, here’s a bit of bash initialization that does that and a bit more.

HN=$( hostname -f )
HOST_DIR=$HOME/.bashrc.d/host.d

# split hostname into its dot-separated parts
HN_PARTS=($(echo "$HN" | tr "." "\n"))

# walk from the last part (the TLD) to the first, building successively
# longer suffixes of the hostname
TEST_DOMAIN_NAME=
for (( c = ${#HN_PARTS[@]} - 1; c >= 0; c-- )); do
    if [[ -z $TEST_DOMAIN_NAME ]]; then
        TEST_DOMAIN_NAME="${HN_PARTS[$c]}"
    else
        TEST_DOMAIN_NAME="${HN_PARTS[$c]}.$TEST_DOMAIN_NAME"
    fi

    if [[ -f $HOST_DIR/$TEST_DOMAIN_NAME ]]; then
        source "$HOST_DIR/$TEST_DOMAIN_NAME"
    elif [[ -d $HOST_DIR/$TEST_DOMAIN_NAME ]]; then
        for file in "$HOST_DIR/$TEST_DOMAIN_NAME"/*; do
            source "$file"
        done
    fi
done

One additional bit is that it uses successively longer segments of the hostname, so for the hostname foo.bar.domain.com, the following names are checked, in order: com, domain.com, bar.domain.com, foo.bar.domain.com. Doing this means that domain-specific initialization is easy and that more specific filenames can override their general counterparts.

The other extra is that if the name exists as a directory, all the files in that directory are sourced. So the full list of checked locations for the above hostname would be:

  • com
  • com/*
  • domain.com
  • domain.com/*
  • bar.domain.com
  • bar.domain.com/*
  • foo.bar.domain.com
  • foo.bar.domain.com/*
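For example, to add settings for every machine at work, I can just drop in a file named after the domain (the proxy here is made up):

$ mkdir -p ~/.bashrc.d/host.d
$ echo 'export http_proxy=http://proxy.domain.com:3128' > ~/.bashrc.d/host.d/domain.com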

It works pretty well, but I’m sure it could be better written. I’m not very proficient with bash, so if you have any suggestions for improving it, let me know.

Enjoy.