Envbox: Keeping Secret Environment Variables Secure

In my day-to-day work and my evening and weekend side projects, I do almost all of my development on remote systems. This has a number of advantages that are a topic for another post, but this post is about one of the limitations.

Most developers have a tool belt that they’re continually improving, and as I work on mine I come across commands - like hub - that require1 putting a secret value into an environment variable, usually for authentication.

For instance, to use hub, I need to do something like this:

$ export GITHUB_TOKEN=ba92810bab08av0ab0157028bb
$ alias git=hub
$ git create username/repo
$ git pull-request -o

If I were only running git/hub commands on my local desktop, I could put the environment variable export into my shell configuration and be done with it. But on any remote system, I only have these options:

  1. Run export GITHUB_TOKEN=.. in my shell before any command that requires it. This isn’t good because the token is now in my history, and any command that I run has access to the value.
  2. Run each command that needs the token like this: GITHUB_TOKEN=... git create .... This solves the access issue, but it still pollutes my history. It’s also cumbersome to deal with when running many commands.
  3. Add the export to my dotfiles. This solves the history problem (and the “remembering to enter the variable” problem), but then my token is available to anyone that I share my dotfiles with.

I wanted something that I could use to securely manage these kinds of environment variables while making it convenient to expose them to specific commands. So I wrote envbox.

Envbox is written in Go, primarily because the language is well suited to this kind of tool, but also because there’s a NaCl secretbox implementation in the Go “Sub Repositories” that I thought was a good fit for the encryption.

Usage

After installation (instructions in the README), the first step is to set up envbox by generating a key:

$ envbox key generate -s
c348603fe1a708277666222a1e549b7c02b22419b3cfe44d0dd5800b3da27b56

This key is used to encrypt each of the environment variables. Next up is to add a new environment variable:

$ envbox add -n GITHUB_TOKEN
value: aeijfalsjiegliasjefliajsefljaef
$ envbox list
GITHUB_TOKEN=aeijfalsjiegliasjefliajsefljaef

Then, when running a command that needs the variable:

$ envbox run -e GITHUB_TOKEN -- bash -c 'echo $GITHUB_TOKEN'
aeijfalsjiegliasjefliajsefljaef

Or, more apropos for the above example:

alias git='envbox run -e GITHUB_TOKEN -- hub'

Storage

Envbox stores each variable in its own file on disk:

$ hexdump -C ~/.local/share/envbox/7ebac232c337c78af91cc4341d650a90a9044d0b259059e8.envenc
00000000  79 80 8b 0d e2 9c c1 85  0c 36 1c bb 6c 94 f6 3c  |y........6..l..<|
00000010  25 55 fb c1 00 3a 6c 3e  e4 b7 ad c3 bc cf a5 75  |%U...:l>.......u|
00000020  76 57 cb 23 c2 91 13 20  79 df 9d d8 72 89 05 26  |vW.#... y...r..&|
00000030  90 d5 f1 9e 05 26 51 fb  f5 fd 3d d9 65 fa 3d b9  |.....&Q...=.e.=.|
00000040  79 ee 35 7e 6a 83 8e fd  32 56 9e f1 f7 1d ef 23  |y.5~j...2V.....#|
00000050  05 03 a2 3c cc f0 6b 8d  cc 08 31 8c f2 d2 c1 a1  |...<..k...1.....|
00000060  72 33 6e 48 59 87 b5 8b  82 b3 1a b3 e3 d7 98 8c  |r3nHY...........|
00000070  d8 a3 c0 04 f0 f5 c1 53  06 84 14 b7 ee 45 c0 de  |.......S.....E..|
00000080  82 a2                                             |..|
00000082

Currently, the key is stored in a permission-restricted file in your home directory so that envbox can decrypt the variable files. The plan, though, is to move to a credential cache system like the one git uses, so that the key is only held in memory for a configurable time. That would be a better tradeoff between security and convenience.

Summary

There are a few other things that envbox can do, such as accepting multi-line values and letting the envbox name differ from the variable name, so that several copies of the same variable (e.g. two different GITHUB_TOKENs) can be tracked.

I’ve found it to be incredibly useful, allowing me to version and distribute my secret variables while keeping them secure.

Enjoy.

  1. hub doesn’t actually require the environment variable, but logging in for every push and pull seems a bit inefficient.

My Note-taking Workflow

A few people have asked about my note-taking workflow, and since it’s been quite useful to me, I thought I would describe what works for me.

I’ve tried several of the popular note-taking tools out there and found them overbearing or over-engineered. I just wanted something simple, without lock-in or a crazy data format.

So my notes are just a tree of files. Yup, just directories and files. It isn’t novel or revolutionary. It doesn’t involve a fancy application or Web 2.0 software. It also works surprisingly well.

Formatting

I’d taken notes in plain-text files for a while, but what really made my notes more useful was switching to Markdown a few years ago. Markdown is one of the best text formatting languages out there1, and many sites use it as their markup language.

So, any time I take notes, I write in Markdown. It took a little while to get used to the syntax, but thankfully the basics are straightforward and sensible. It also looks great without any processing. I can share it with others without reformatting. Or, if I need a fancier presentation, I can use pandoc to transform it into almost any other format imaginable.
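For example, rendering a note to a standalone HTML page or a Word document is a one-liner (the file names here are just illustrative):

$ pandoc --standalone notes/docker-tips.md -o docker-tips.html
$ pandoc notes/docker-tips.md -o docker-tips.docx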

Tooling

There are a plethora2 of tools that understand a tree of files. I can use find, ack, vim, and any other command line tools to manage my personal knowledge base. Not only does this make my notes more accessible, but it also means that I develop greater competency in the tools I use for everyday development.
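A couple of illustrative one-liners (the paths and patterns are just examples):

# notes touched in the last week
$ find ~/notes -name '*.md' -mtime -7

# full-text search across the whole tree
$ ack 'git-annex' ~/notes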

I originally used Notational Velocity (and then nvALT) for note taking. I really liked the quick searchability that it provides. After a buddy suggested that Vim would be able to do the same, I switched over immediately. For filename searching, I use ctrlp.vim (custom config) and for content searching I use ack.vim.

As far as rendering to other formats, I use the most excellent pandoc. In my vimrc, I have a mapping for converting the current file to html with pandoc:

nmap <leader>vv :!pandoc -t html -T 'Pandoc Generated - "%"' --smart --standalone --self-contained --data-dir %:p:h -c ~/.dotfiles/css/pandoc.css "%" \|bcat<cr><cr>

It generates a self-contained html page (with images embedded as data urls) and then opens the output in a web browser (thanks to this bcat script).

Sync

Universal access is incredibly important for note taking. Without it, your distilled knowledge is locked inside your computer.

To make my notes available wherever I go, I keep them in Dropbox3. Dropbox does a decent job of synchronizing, but its best feature is its integration into so many iOS apps. Almost every app that supports remote file access integrates with Dropbox.

I’d love to use BitTorrent Sync, but its developer API was only recently released and it’s going to take time for apps to support it.

Mobile Application

For mobile access I used to use Notesy. I appreciated its simple interface and quick rendering preview. It recently gained a few keyboard helpers for frequently used markdown characters.

However, once Editorial was released as a universal application, I switched over immediately. Not only is its main editing interface more pleasant to use, with better helpers and inline markdown rendering previews, but it also sports the ability to add snippets via abbreviations and a phenomenally powerful workflow system that can orchestrate inter-app automation.

Use cases

There are many situations where this system is useful. Here are a few.

Notes instead of bookmarks

I used to save bookmarks on Delicious as I found interesting URLs online. Over time, however, I found that I never went back and looked at those bookmarks because they weren’t coherently organized. There’s something about tags that just doesn’t help when it comes to searching for information.

Now, instead of saving bookmarks, I create notes on particular topics and add links I find to those files. The fact that it’s a regular text file means that I can not only use Markdown headings to organize links into sections, but I can also include sample code blocks or images from the local directory.

Talk notes

Because I take notes in Markdown and my blog is written in Markdown, it’s extremely easy to publish notes I take on talks. I just copy them over and add the right Octopress YAML header.

Conference notes

When I’m at a conference, I can choose to take notes on my phone or my laptop depending on the type of content. One time I was taking notes on my laptop during a late-in-the-day session and noticed that my battery was getting low. I didn’t need to have the laptop out for any other reason, so I closed it up, opened my phone, and continued taking notes where I’d left off.

Sermon notes

I take notes every Sunday and store them in a sub-folder. It’s easy to keep different types of notes separate just by using regular folders.

Blog post editing

This one is a little meta, for sure. I’ve edited this blog post over the course of a few weeks, sometimes on a computer and sometimes on either my iPad or iPhone. I keep a clone of this blog’s source in Dropbox as well, so I can do most of my editing wherever I happen to be. After that, a few quick commands over ssh and this post will be live.

Conclusion

That pretty much covers my note-taking system. If you’d like to adopt a similar system, let me know how it goes and any cool tools that you discover.

Enjoy.

  1. I tried RST too, but I found it to be too prickly for my note taking needs. However, it’s awesome for software documentation.

  2. What is a plethora?

  3. Oh oh, guess I do use a Web 2.0 tool.

My Tmux Configuration, Refined

When I wrote about tmux for the first time, I was just getting into the idea of nesting sessions. I ran a local tmux session that wrapped remote tmux sessions for more than a year before I switched it up again.

I added another level.

Background

I originally started nesting tmux sessions so that I wouldn’t have to use tabs in Terminal to keep track of different remote tmux sessions. This allowed me to connect to my work machine from home and get my entire working session instantly. While that worked well, I began to see a few issues with that approach:

  1. At work, I ran my top level tmux session on my work laptop. The downside of this is that I had to leave my laptop open and running all the time to be able to access it remotely. This also necessitated some tricky SSH tunnels that I wasn’t entirely comfortable leaving open.
  2. The top level tmux session at home was on my home server, and so it was convenient to connect to from work, but if I connected to that session from my top level work session, the key bindings would end up conflicting.

Solution

I solved the first issue by running my top level work session on a server at work. This allowed me to close my laptop when I wasn’t in the office, and it gave me a place to run things that weren’t tied to a particular system but that I didn’t want living and dying with my laptop.

I solved the second issue by adding a new level of tmux. I called this new level uber and assigned it the prefix C-q to differentiate it from the other levels1.

With that in place, I would start the uber session on my laptop and then connect to both my home and work mid-level sessions, and via those, the leaf tmux sessions. Then, I could choose what level I wanted to operate on just by changing the prefix that I used.

Multiple sockets

Another thing that I wanted to do from time to time was run two independent tmux sessions on my local laptop. I could have used the built-in multi-session support in tmux, but I also wanted the ability to nest sessions locally, and tmux doesn’t support that natively. In looking for a solution, I stumbled on the idea of running each level on its own server socket. With that in place, I can run all three levels on the same system, and running two independent tmux sessions is as easy as running two different levels in separate windows. Plus, I can still use the native multi-session support within each level.
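A minimal sketch of the idea, using tmux’s -L flag to give each level its own server socket (the session names here are just examples):

$ tmux -L uber -f ~/.tmux.uber new-session -s uber
$ tmux -L master -f ~/.tmux.master new-session -s work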

Sharing sessions

The most recent modification I made was to add easy support for sharing a tmux session between two Terminal windows. This allows me to treat my local Terminal windows as viewports into my tmux session tree, attaching wherever I need without necessarily detaching another Terminal window.

To enable this, I added an optional command line flag to the session start scripts that makes tmux start a new view of the session instead of detaching other clients. I also enabled ‘aggressive-resize’ so that a session isn’t limited to the size of the smallest Terminal window unless more than one client is looking at the exact same tmux window.
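In plain tmux terms, the two pieces look roughly like this (a sketch; the actual logic lives in the wrapper scripts described below):

# attach a second viewport without detaching the existing client (no -d)
$ tmux -L master attach-session -t work

# only constrain a window's size when multiple clients view that exact window
$ tmux -L master set-window-option -g aggressive-resize on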

How it all looks

[Diagram: the nested tmux session layout]

It can look a little overwhelming, but in reality it’s quite simple to use. Most of my time is spent in the leaf node sessions, and that interaction is basically vanilla tmux.

Installing this for yourself

Configuration

The configuration for my set up is available in my dotfiles repository on GitHub:

  1. .tmux.shared - contains shared configuration and bindings that are common to all levels
  2. .tmux.uber - configuration unique to the top-level session
  3. .tmux.master - configuration unique to mid-level tmux sessions
  4. .tmux.conf - configuration unique to the lowest-level (leaf) sessions

Wrapper scripts

The heart of the wrapper scripts is tmux-sess. It holds all the logic for setting the socket and sharing sessions.
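A stripped-down sketch of that kind of wrapper might look something like this (the flag names here are assumptions; the real script is in the dotfiles repository):

#!/bin/sh
# Sketch of a tmux-sess style wrapper (assumed flags):
#   -s NAME   session (and socket) name
#   -f FILE   tmux config for this level
#   -n        attach as an additional view instead of detaching other clients

ATTACH_OPTS="-d"
while getopts "s:f:n" opt; do
    case $opt in
        s) NAME=$OPTARG ;;
        f) CONF=$OPTARG ;;
        n) ATTACH_OPTS="" ;;
    esac
done

# one server socket per level, so levels can nest on the same machine
if ! tmux -L "$NAME" has-session -t "$NAME" 2>/dev/null; then
    tmux -L "$NAME" -f "$CONF" new-session -d -s "$NAME"
fi
exec tmux -L "$NAME" attach-session $ATTACH_OPTS -t "$NAME"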

The rest of the scripts are thin wrappers around tmux-sess. For instance, here is tmux-uber:

#!/bin/sh

tmux-sess -s uber -f ~/.tmux.uber $*

The other level scripts are tmux-home for the mid-level session and tmux-main for the lowest-level.

Wrapping up

I hope that this information is helpful. If you have any questions, please ask me on twitter.

Enjoy.

  1. I also quickly decided that this uber level didn’t need to have its own status line. That would be crazy.

Talk Notes: February 2014

I was out of town two of the Fridays this month, so I was only able to get two talks in:

  • Clojure core.async - Continuing in my fascination with Clojure, I picked this talk to explore the non-Java techniques for handling concurrency. I’m familiar with CSP from my Go experience, and it was interesting to hear Clojure’s take on the same foundation. Clojure also implements a macro that turns the spaghetti code that is callbacks into a sequential function that still operates asynchronously.
  • Inventing on Principle - Several people have recommended this talk to me, and I finally got around to watching it. It’s worth the watch just for the amazing demos that he built, but the deeper notion that there could be an underlying principle that guides your life is thought provoking. It also makes me want to play with Light Table.

Enjoy.

Setting Up Vim for Clojure

I’ve been experimenting with Clojure lately. A few of my coworkers had begun the discovery process as well, so I suggested that we have a weekly show-and-tell, because a little accountability and audience can turn wishes into action.

Naturally, I looked around for plug-ins that would be of use in my editor of choice. Here’s what I have installed:

These are all straightforward to install, as long as you already have a Pathogen or Vundle setup going. If you don’t, you really should, because nobody likes a messy Vim install.
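For example, with Pathogen in place, adding a plug-in like the rainbow parentheses one mentioned below is a single clone (the repository path is shown as an example):

# Pathogen picks up anything cloned into ~/.vim/bundle
$ git clone https://github.com/kien/rainbow_parentheses.vim ~/.vim/bundle/rainbow_parentheses.vim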

All of these plug-ins automatically work when a Clojure file is opened, with the exception of rainbow parentheses. To enable those, a little .vimrc config is necessary:

au BufEnter *.clj RainbowParenthesesActivate
au Syntax clojure RainbowParenthesesLoadRound
au Syntax clojure RainbowParenthesesLoadSquare
au Syntax clojure RainbowParenthesesLoadBraces

Now, once that’s all set up, it’s time to show a little bit of what this setup can do. I have a little Clojure test app over here on GitHub. After cloning it (and assuming you’ve already installed leiningen):

  1. Open up dev.clj and follow the instructions to set up the application in a running repl.
  2. Then open testclj/core.clj and make any modification, such as changing “Hello” to “Hi”.
  3. Then after a quick cpr to reload the namespace in the repl, you can reload your web browser to see the updated code.

This setup makes for a quick dev/test cycle, which is quite useful for experimentation. Of course, there are many more features of each of the above plugins. I’ve barely scratched the surface and I’m already very impressed.

Enjoy.

Introducing Talk Notes

In the course of my work and my online reading and research, I often come across videos of talks that I want to watch. I rarely take the time to watch those videos, mostly because of the time commitment; I usually only have a few minutes to spare.

Lately, I’ve done something to change that. I’m taking a little bit of time out of my Friday schedule each week to watch a talk that looks interesting. I also try and focus on the talk. Rather than checking my email or chatting while the talk is playing, I take notes, sometimes including screenshots of important slides.

Over the course of the past month, I’ve had some success with this strategy, and I was able to watch three talks. Here are links to my notes:

  • Using Datomic With Riak - I picked this talk because we’ve used a bit of Riak at work and a buddy of mine keeps raving about Datomic. This talk is actually a great overview of the philosophy and design behind Datomic.
  • Raft - the Understandable Distributed Protocol - CoreOS’s etcd has been getting some mention lately, and Raft is the consensus algorithm used to keep all of its data consistent. At the end of watching this talk, I found another one (by one of the Raft authors), and it balanced the practicality of the first with some more of the theory.
  • React - Rethinking Best Practices - The functional programming paradigm is gathering steam, and Facebook’s React JavaScript library is a fascinating take on building modern web UIs in a functional manner.

I really enjoyed the process of taking notes in this way, and I hope to continue this as the year progresses.

Oh, and if you know of a good talk, please let me know on twitter.

Using Docker to Generate My Octopress Blog

When I originally set up Octopress, I set it up on my Mac laptop using rvm, as recommended at the time. It worked very well for me until just a few minutes after my last post, when I decided to sync with the upstream changes.

After merging in the changes, I tried to generate my blog again, just to make sure everything worked. Well, it didn’t, and things went downhill from there. The rake generate command failed because a new gem was required. So, I ran bundle install to get the latest gems. That failed when a gem required ruby 1.9.3. Then installing ruby 1.9.3 failed in rvm because I needed a newer version of rvm. After banging on that problem for a few minutes, I decided to take a break and come back to the problem later.

Docker to the rescue

Fast forward a few weeks, and I came up with a better idea. I decided to dockerize Octopress. This keeps all the dependencies sanely bottled up in an image that I can run like a command.

Here is the code:

FROM ubuntu:12.10
MAINTAINER  Nate Jones <nate@endot.org>

# install system dependencies and ruby
RUN apt-get update
RUN apt-get install git ruby1.9.3 build-essential language-pack-en python python-dev -y

# make sure we're working in UTF8
ENV LC_ALL en_US.utf8

# add the current blog source
ADD . /o
WORKDIR /o

# install octopress dependencies
RUN gem install bundler
RUN bundle install

# set up user so that host files have correct ownership
RUN addgroup --gid 1000 blog
RUN adduser --uid 1000 --gid 1000 blog
RUN chown -R blog.blog /o
USER blog

# base command
ENTRYPOINT ["rake"]

How to use it

To use this Dockerfile, I put it at the root of my blog source and ran this command:

$ docker build -t ndj/octodock .

Then, since rake is set as the entry point, I can run the image as if it were a command. I use the -v switch to overlay the current blog source over the one cached in the image and the -rm switch to throw away the container when it’s done.

$ docker run -rm -v `pwd`:/o ndj/octodock generate
## Generating Site with Jekyll
   remove .sass-cache/
   remove source/stylesheets/screen.css
   create source/stylesheets/screen.css
Configuration from /o/_config.yml
Building site: source -> public
Successfully generated site: source -> public

A few notes

  • I had to force the UTF8 locale in order to get ruby to stop complaining about non-ascii characters in the blog entries.
  • I add a user called blog with the same UID/GID as my system user, so that files generated by commands in the container aren’t owned by root on the host. I look forward to proper user namespaces so that I won’t have to do this.
  • Deploying the blog doesn’t use my SSH key, as the ‘blog’ user in the image is doing the rsync, not my host system user. I’m ok with typing my password in or just rsync’ing the data directly.

Docker is a great piece of technology, and I keep finding new uses for it.

Enjoy.

Git-annex Tips

Last time I posted about git-annex, I introduced it and described the basics of my set up. Over the past year, I’ve added quite a bit of data to my main git-annex. It manages just over 100G of data for me across 9 repositories. Here are a few bits of information that may be useful to others considering git-annex (or who are already knee deep in).

Archive, not backup

The website for git-annex explicitly states that it is not a backup system. A more appropriate description is that it’s part of an archival system. An archival system is somewhat concerned with backups of data, but it also deals with cataloging and retrieval.

I think of it as a library system (books, not code) with the ability to do instantaneous inter-library loans. I have one repository (by the name of ‘silo’) that contains copies of all my data. I then have linked repositories on each computer that I use regularly that have little or no data in them, just git-annex style symlinks. If I find that I need something from the main repository on one of those computers, I can query where that file is with git annex whereis:

$ git annex whereis media/pictures/2002-02-08-olympics.tgz
whereis media/pictures/2002-02-08-olympics.tgz (4 copies)
        8314baa2-4193-8d77-bb7f-489bd73e7db4 -- calvin_dr
        8b22886e-14f2-98f0-31ec-6770b0a08f22 -- silo
        f8ec3d60-47bf-a392-4739-b39dd609d554 -- hobbes_dr
ok

(I actually have three full copies of my data, in the *_dr repositories, but that’s a story for another day. Suffice it to say that calvin_dr and hobbes_dr are two identical external drives.)

I can retrieve the contents with git annex get. git-annex is smart enough to know that the silo remote is over a network connection and the ‘calvin_dr’ is local, so it copies the data from there:

$ git annex get  media/pictures/2002-02-08-olympics.tgz
get media/pictures/2002-02-08-olympics.tgz (from calvin_dr...)
SHA256E-s48439263--67c0de0e883c5d5d62a615bb97dce624370127e5873ae22770b200889367ae1c.tgz
    48439263 100%   25.10MB/s    0:00:01 (xfer#1, to-check=0/1)

sent 48445343 bytes  received 42 bytes  19378154.00 bytes/sec
total size is 48439263  speedup is 1.00
ok
(Recording state in git...)

Then, running git annex whereis shows the file contents are local as well:

$ git annex whereis media/pictures/2002-02-08-olympics.tgz
whereis media/pictures/2002-02-08-olympics.tgz (5 copies)
    8314baa2-4193-8d77-bb7f-489bd73e7db4 -- calvin_dr
    8b22886e-14f2-98f0-31ec-6770b0a08f22 -- silo
    f8ec3d60-47bf-a392-4739-b39dd609d554 -- hobbes_dr
    ae7e4cde-0023-1f1f-b1e2-7efd2954ec01 -- here (home_laptop)
ok

And I can view the contents of the file like normal:

$ tar -tzf media/pictures/2002-02-08-olympics.tgz | head
2002-02-08-olympics/
2002-02-08-olympics/p2030001.jpg
2002-02-08-olympics/p2030002.jpg
...

Then, when I’m done, I can just git annex drop the file to remove the local copy of the data. git-annex, in good form, checks to make sure that there’s another copy before deleting it.

$ git annex drop media/pictures/2002-02-08-olympics.tgz
drop media/pictures/2002-02-08-olympics.tgz ok
(Recording state in git...)

All along the way, git-annex is tracking which repositories have each file, making it easy to find what I want. This sort of quick access and query-ability means that I know where my data is and I can access it when I need it.

Transporting large files

My work laptop used to be my only laptop, and so it had a number of my personal files, mostly pictures. I’ve transferred most of those off of that system, but every once in a while, I come across some personal data that I need to transfer to my home repository.

I usually add it to the local git-annex repository on my work laptop and then use git annex move to move it to my home server. However, if it’s a significant amount of data and I don’t feel like waiting for the long transfer over my slow DSL line, I can copy the data to my external drive at work and then copy it off when I get home. Doing this manually can get tedious if there are more than a few files, but git-annex makes it a cinch. First, I can query what files are not on my home server and then copy those to the calvin_dr drive.

work-laptop$ git annex add huge-file1.tgz huge-file2.tgz huge-file3.tgz
work-laptop$ git annex sync
work-laptop$ git annex copy --not --in silo --to calvin_dr

Then, when I get home, I attach the drive to my personal laptop and run git annex copy to copy the files to the server:

personal-laptop$ git annex copy --to silo --not --in silo

Detecting duplicates

Many of my backups are the “snapshot” style, where I rsync’d a tree of files to another drive or server in an attempt to make sure that data was safe. The net effect of this strategy is that I have several mostly-identical backups of the same data. So, when I find a new copy of data that I’ve previously added to my git-annex system, I don’t know if I can safely delete it just based on the top level directory name.

For example, if I discover a tree of pictures that are organized by date and event:

$ find pictures -type d
pictures
pictures/2002-02-08-olympics
pictures/2002-04-20-tahoe
pictures/2004-11-18-la-zoo

And, checking in my git-annex repo, I can see that there are three files that correspond to those directories:

$ find backup/pictures -type l
backup/pictures/2002-02-08-olympics.tgz
backup/pictures/2002-04-20-tahoe.tgz
backup/pictures/2004-11-18-la-zoo.tgz

I can probably remove the found files, but I might have modified the pictures in this set and I’d like to know before I toss them. After running into this scenario a few times, I wrote a little utility called archdiff that I can use to get an overview of the differences between two archives (or directories). It’s just a fancy wrapper around diff -r --brief that automatically handles unpacking any archives found. For example:

$ archdiff 2002-04-20-tahoe/ ~/backup/pictures/2002-04-20-tahoe.tgz
$ 

Since there was no output, the directory has the same contents as the archive and can be safely deleted. Here’s another example:

$ archdiff 2002-02-08-olympics/ ~/backup/pictures/2002-02-08-olympics.tgz
Files 2002-02-08-olympics/p2030001.jpg and 2002-02-08-olympics.tgz-_RhD/2002-02-08-olympics/p2030001.jpg differ
$ 

One of the files in this directory has modifications, so I can now take the time to look at the two versions and decide which one I want to keep.
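In spirit, archdiff works something like this sketch (the real archdiff supports more archive types and cleans up its temporary directories):

#!/bin/bash
# Sketch of an archdiff-style comparison: unpack any archive arguments,
# then run a recursive, brief diff; the exit status comes from diff itself.

unpack() {
    if [ -d "$1" ]; then
        echo "$1"
        return
    fi
    local tmp
    tmp=$(mktemp -d)
    tar -xf "$1" -C "$tmp"
    # if the archive holds a single top-level directory, compare that
    local entries=("$tmp"/*)
    if [ ${#entries[@]} -eq 1 ] && [ -d "${entries[0]}" ]; then
        echo "${entries[0]}"
    else
        echo "$tmp"
    fi
}

diff -r --brief "$(unpack "$1")" "$(unpack "$2")"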

Archdiff behaves like a good UNIX program and its exit code reflects whether or not differences were found, so it’s possible to script the checking of multiple directories. Here’s an example script that would check the above three directories:

#!/bin/bash

cd ~/backup/pictures

for dir in ~/pictures/*; do
    basedir=$(basename "$dir")
    echo "checking $dir"

    # retrieve the file from another git-annex repo
    git annex get "$basedir.tgz"

    if archdiff "$dir" "$basedir.tgz"; then
        echo "$dir is the same, removing"
        rm -rf "$dir"

        # drop the git-annex managed file, we no longer need it
        git annex drop "$basedir.tgz"
    fi
done

Once this is done, the only directories left will be those with differences and the tarball will still be present in the git-annex repository for investigation. I end up writing little scripts like this as I go through old backups to help me process large amounts of data quickly.

All done

That’s it for now. If you have any questions about this or git-annex in general, tweet at me @ndj.

Enjoy.

Remotecopy, Two Years Later

It’s been over two years since I wrote remotecopy and I still use it every day.

The most recently added feature is the -c option, which will remove the trailing newline from the copied data if it only contains one line. I found myself writing little scripts that would only output one line with the intent of using that output to build a command line on a different system, and the extra newline at the end often messed up the new command. The -c solves this problem.

For instance, I have git-url, which outputs the origin url of the current git repository. This makes it easy to clone the repo on a new system (rc is my alias for remotecopy -c):

firsthost:gitrepo$ git url | rc
Input secret:
rc-alaelifj3lij2ijli3ajfwl3iajselfiae

Now the clone url is in my clipboard, so I just type git clone and then paste to clone on a different system:

secondhost:~$ git clone git@github.com:justone/gitrepo.git
Cloning into 'gitrepo'...
...
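(For reference, a git-url helper like the one above can be as simple as this sketch; the actual script may differ.)

#!/bin/sh
# print the origin URL of the current repository
git config --get remote.origin.url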

No tmux pbcopy problems

Most OSX tmux users are familiar with the issues with pbcopy and the current workarounds.

Since remotecopy works by accessing the server over a tcp socket, it’s immune to these problems. I just use remotecopy on my local system as if I were on a remote system.

LA Perl Mongers

At the latest LA Perl Mongers meeting, the talks were lightning in nature, so I threw together a presentation about remotecopy. The interesting source bits are up on github, including a pdf copy of the slides.

For the presentation, I used the excellent js-sequence-diagrams to make this diagram, which hopefully helps show the data flow in a remotecopy interaction.

[Diagram: remotecopy data flow]

Enjoy.