Sequence Analysis software and materials ─ Brendan Halpin

See below for papers and talk slides

Sequence analysis utilities for Stata

Autumn 2015: SADI on SSC

SADI is now available directly from the main Stata add-on archive, SSC:

ssc install sadi

February 2015: Updated SADI

SADI has been updated to bypass Stata's limit on matrix size, so more than 11,000 sequences can now be compared.

April 2014: Updated SADI

An update to the SADI package was released on 3 April. It includes many small improvements, among them more stable plugins, and adds Studer et al.'s discrepancy measure.

Nov 9 2011: New Version

A new version of the sequence analysis add-ons for Stata is now available (use Stata to download it). There are two main differences from the previous implementation. First, the plugin it contains is now compiled for 32- and 64-bit Windows and for 32-bit Linux. Second, duplicate sequences are not used in the pairwise distance calculations, though complete N × N matrices are still created. The latter change substantially reduces the time taken if there are many duplicates.
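The idea of skipping duplicates while still returning a complete N × N matrix can be sketched as follows. This is an illustration in Python, not the plugin's code: the function names `hamming` and `pairwise_distances` are hypothetical, and a simple mismatch count stands in for whichever distance measure is actually used.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(sequences, dist=hamming):
    """Return (full N x N matrix, number of distinct sequences).

    Distances are computed only between distinct sequences and then
    copied out to every pair of rows, duplicates included.
    """
    # Give each distinct sequence an index, in order of first appearance.
    index_of = {}
    for s in sequences:
        index_of.setdefault(tuple(s), len(index_of))
    uniques = list(index_of)
    k = len(uniques)

    # Small k x k matrix: one calculation per pair of distinct sequences.
    small = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            small[i][j] = small[j][i] = dist(uniques[i], uniques[j])

    # Expand to N x N by looking up each row's distinct-sequence index.
    idx = [index_of[tuple(s)] for s in sequences]
    full = [[small[a][b] for b in idx] for a in idx]
    return full, k


seqs = ["AAB", "ABB", "AAB", "BBB"]
matrix, n_unique = pairwise_distances(seqs)
print(n_unique)      # 3 distinct sequences, so 3 comparisons instead of 6
print(matrix[0][2])  # 0: rows 0 and 2 are duplicates
```

With many duplicates the saving grows quickly, since the number of comparisons depends on the square of the number of distinct sequences rather than of N.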

Three distance measures are provided in this package: Hamming distance, standard OM and my OMv (see Halpin, 'Optimal Matching Analysis and Life Course Data: the importance of duration', Sociological Methods and Research, 38 (3), 2010). Note that OMv is not guaranteed to generate metric distances.
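For readers new to these measures, here is a minimal sketch of Hamming distance and standard OM, written in Python from the textbook definitions rather than taken from the SADI plugins. The function names, the flat substitution cost of 2 and the indel cost of 1 are all assumptions chosen for illustration.

```python
def flat_cost(x, y):
    """Hypothetical substitution cost: 0 for a match, 2 for any mismatch."""
    return 0.0 if x == y else 2.0


def hamming_distance(a, b, subs_cost=flat_cost):
    """Sum of substitution costs, position by position (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(subs_cost(x, y) for x, y in zip(a, b))


def om_distance(a, b, subs_cost=flat_cost, indel=1.0):
    """Standard optimal matching distance by dynamic programming."""
    n, m = len(a), len(b)
    # d[i][j] is the cheapest way to turn a[:i] into b[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                             # delete a[i-1]
                d[i][j - 1] + indel,                             # insert b[j-1]
                d[i - 1][j - 1] + subs_cost(a[i - 1], b[j - 1])  # substitute
            )
    return d[n][m]


print(hamming_distance("ABAB", "BABA"))  # 8.0: every position mismatches
print(om_distance("ABAB", "BABA"))       # 2.0: one deletion and one insertion
```

The example pair shows the practical difference: Hamming compares position by position, while OM can realign a shifted sequence with a pair of indels.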

Several utility functions are also included:


net from
net install sadi

This code uses functions from the moremata package, so you may need to run ssc install moremata and restart Stata before using the sadi commands.

If you have any problems installing or running these utilities, please let me know at

Papers, talks and lectures

RSS/SLSS June 9 2015: MICT

Talk about Multiple Imputation for Categorical Time-series.

Non-self-identical missing values, Oxford Feb 2015

Slides from my talk to the Workshop on Algorithmic Social Research, Nuffield College, Oxford, Feb 27 2015.

Stata German User Group, Hamburg June 2014

Slides from my talk to the Hamburg SUG.

More on imputing sequence data

In this paper I describe my imputation of missing data in sequences in greater detail.

Lausanne Conference on Sequence Analysis

I presented this paper on multiple imputation for gaps in life-course sequences at the Lausanne Conference on Sequence Analysis in June 2012.

Journées Trajectoires, Paris: "Simulating Sequences"

Slides from a one-day conference at the Université Paris-I in October 2011. There is also a recording of the presentation.

Helsinki talk, May 2010

Slides and other details from a presentation to the Helsinki Collegium for Advanced Studies, May 2010, are here.

OFPR Course, Paris May 2009

In May 2009 I spent a month in Paris as a guest of CREST, presenting an occasional course to PhD students from institutions across Paris, under the "Option Formation par la Recherche" scheme. Slides for my lectures are here.

QMSS2, Oslo

Slides from my talk to the QMSS2 conference in Oslo, October 2008.

RC33, Naples

I gave two papers to the RC33 Conference in Naples, September 2008:

Frontiers in Social and Economic Mobility, Cornell, 2003

My paper to the "Frontiers in Social and Economic Mobility" conference in Cornell in March 2003 is available as Departmental Working Paper WP2003-01. I had the honour of sharing the session with Andrew Abbott and Larry Wu.

Older material: for reference only

Here I make available a number of utilities for Stata related to sequence analysis (including optimal matching). Some of this material relates to the short course I gave in the Essex Summer School in July 2007.

Copy the relevant files to a directory in Stata's "adopath". (All the relevant files in a single zip file are here.)

Older material: Duration-adjusted Optimal Matching

The adapted Needleman-Wunsch algorithm used by the omav command is designed to treat tokens differently according to the length of the spell in which they occur. This is intended to give better results than conventional OM when used with life-course data. A preliminary discussion of the algorithm is available in these talk slides.
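Any duration-sensitive scheme of this kind first needs to know, for each token, the length of the spell it belongs to. The Python sketch below shows only that preparatory step. The function name `spell_lengths` is hypothetical, and how omav turns spell lengths into costs is not reproduced here (see the 2010 paper for that).

```python
from itertools import groupby


def spell_lengths(seq):
    """For each token, return the length of the run of identical tokens
    (the spell) in which it occurs."""
    lengths = []
    for _, run in groupby(seq):
        n = len(list(run))
        lengths.extend([n] * n)  # every token in the spell gets the spell length
    return lengths


print(spell_lengths("AAABBA"))  # [3, 3, 3, 2, 2, 1]
```

A token in a long spell and the same token in a short spell thus carry different lengths, which is the information a duration-adjusted algorithm can use to treat them differently.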

Older material: Stata ado and help files


All the relevant files are available in a single zip file here.

Older material: Plugin files

The oma, omav and combin commands are based on C plugins for speed. Implementing them in C rather than in Stata's Mata matrix language yields a 40-fold speed increase. Two versions of each plugin are provided, X.linux.plugin and X.w32.plugin, where X is the command name. If you are using 32-bit Windows, copy each X.w32.plugin to X.plugin. For Linux, do the same with X.linux.plugin. If you have another operating system, you may be able to compile the code yourself (see below).

Files related to creating the C plugins

Older material: Essex course

The Essex course is described here, and files related to the course are available here. See in particular labs.pdf.

Brendan Halpin
Department of Sociology
University of Limerick