SADI has been updated to by-pass Stata's limitation on matrix size, meaning that now more than 11,000 sequences can be compared.

An update to the SADI package was released on 3 April. It brings many small improvements, including more stable plugins and the addition of Studer et al.'s discrepancy measure.

A new version of the sequence analysis add-ons for Stata is now available from http://teaching.sociology.ul.ie/sadi/sadi.pkg. (Use Stata to download.) There are two main differences from the previous implementation: first, the plugin it contains is now compiled for 32- and 64-bit Windows, and for 32-bit Linux; second, duplicate sequences are not used in the pairwise distance calculations, though complete N × N matrices are still created. The latter change substantially reduces the time taken if there are many duplicates.
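The saving from skipping duplicates can be illustrated with a minimal Python sketch. This is not the plugin's C code: the names `hamming` and `pairwise_with_dedup` are hypothetical, and sequences are assumed to be stored as strings. Distances are computed once per pair of distinct sequences and then expanded to the complete N × N matrix.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_with_dedup(seqs, dist):
    """Return the full N x N matrix, evaluating dist once per distinct pair."""
    unique = sorted(set(seqs))                      # distinct sequences only
    index = {s: k for k, s in enumerate(unique)}    # sequence -> row in small matrix
    u = len(unique)
    small = [[0.0] * u for _ in range(u)]
    for i, j in combinations(range(u), 2):
        small[i][j] = small[j][i] = dist(unique[i], unique[j])
    # Expand: every original case looks up the row of its distinct sequence.
    rows = [index[s] for s in seqs]
    return [[small[r][c] for c in rows] for r in rows]


# Hypothetical data: five cases, but only three distinct sequences.
seqs = ["AAB", "ABB", "AAB", "BBB", "ABB"]
full = pairwise_with_dedup(seqs, hamming)
```

With U distinct sequences among N cases, the distance function is called U(U-1)/2 times rather than N(N-1)/2 times, which is where the saving comes from when duplicates are common.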

Three distance measures are provided in this package: Hamming
distance, standard OM and my OMv (see Halpin, 'Optimal Matching Analysis
and Life Course Data: the importance of duration', *Sociological
Methods and Research*, 38 (3), 2010). Note that OMv is not
guaranteed to generate metric distances.
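For readers unfamiliar with the first two measures, the following Python sketch shows the textbook definitions of Hamming distance and standard OM. It is an illustration only, not the package's implementation: the function names, the substitution costs and the indel cost of 1.5 are all assumptions chosen for the example, and OMv is not attempted here.

```python
def make_subs(costs):
    """Expand {(a, b): cost} into a lookup that works in both directions."""
    subs = {}
    for (a, b), c in costs.items():
        subs[(a, b)] = subs[(b, a)] = c
    return subs


def hamming_distance(a, b, subs):
    """Sum substitution costs position by position (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(subs[(x, y)] for x, y in zip(a, b) if x != y)


def om_distance(a, b, subs, indel):
    """Standard optimal matching distance by dynamic programming."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel            # delete the first i tokens of a
    for j in range(1, m + 1):
        d[0][j] = j * indel            # insert the first j tokens of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            sub = 0.0 if same else subs[(a[i - 1], b[j - 1])]
            d[i][j] = min(d[i - 1][j] + indel,      # deletion
                          d[i][j - 1] + indel,      # insertion
                          d[i - 1][j - 1] + sub)    # match or substitution
    return d[n][m]


# Illustrative costs for three hypothetical states A, B and C.
subs = make_subs({("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 2.0})
h = hamming_distance("ABAB", "BABA", subs)
o = om_distance("ABAB", "BABA", subs, indel=1.5)
```

With these costs the Hamming distance between `ABAB` and `BABA` is 4.0, while OM finds a cheaper alignment at 3.0 by deleting the leading `A` and inserting an `A` at the end.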

Several utility functions are also included:

- `trans2subs` creates substitution-cost matrices based on the observed pattern of transitions.
- `stripe` creates string representations of the sequences, which allows you to use Stata's regular-expression functions to summarise them.
- `metricp` tests the pairwise distance matrix for the triangle inequality (note that OMv will often fail this test!).
- Finally, `permtab`, `permtabga` and `ari` allow us to compare cluster solutions. `ari` calculates the Adjusted Rand Index, which indexes the level of agreement between two unlabelled classifications of the same size, while `permtab` and `permtabga` permute the values of one of the classifications to maximise the agreement, and return the permutation. `permtabga` uses a genetic algorithm to provide an approximate solution, as permutations of more than 8-10 elements take infeasibly long.
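The ideas behind the triangle-inequality test and the cluster-comparison utilities can be sketched in Python. This is a rough illustration under stated assumptions, not the code of `metricp`, `ari` or `permtab`: the function names are hypothetical, the Adjusted Rand Index follows the standard Hubert-Arabie formula, and the relabelling search is a plain brute force over all permutations, which is what becomes infeasible beyond roughly 8-10 clusters.

```python
from collections import Counter
from itertools import permutations
from math import comb


def satisfies_triangle(d, tol=1e-9):
    """True if d[i][j] <= d[i][k] + d[k][j] for every triple of cases."""
    n = len(d)
    return all(d[i][j] <= d[i][k] + d[k][j] + tol
               for i in range(n) for j in range(n) for k in range(n))


def adjusted_rand_index(x, y):
    """Adjusted Rand Index of two classifications of the same cases.

    Undefined (division by zero) in degenerate cases, such as both
    classifications putting every case in a single cluster.
    """
    if len(x) != len(y):
        raise ValueError("classifications must cover the same cases")
    cells = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    rows = sum(comb(c, 2) for c in Counter(x).values())
    cols = sum(comb(c, 2) for c in Counter(y).values())
    expected = rows * cols / comb(len(x), 2)
    maximum = (rows + cols) / 2
    return (cells - expected) / (maximum - expected)


def best_relabelling(x, y):
    """Brute-force the relabelling of y that agrees with x most often.

    Assumes both classifications use the same number of labels.
    """
    labels_x, labels_y = sorted(set(x)), sorted(set(y))
    best_agree, best_map = -1, None
    for perm in permutations(labels_x):
        mapping = dict(zip(labels_y, perm))
        agree = sum(a == mapping[b] for a, b in zip(x, y))
        if agree > best_agree:
            best_agree, best_map = agree, mapping
    return best_agree, best_map


# Hypothetical cluster solutions: identical groupings, different labels.
x = [1, 1, 2, 2, 3, 3]
y = [2, 2, 3, 3, 1, 1]
ari_xy = adjusted_rand_index(x, y)
agree, mapping = best_relabelling(x, y)
```

Because the two solutions group the cases identically, the index is 1 even though no labels match, and the search recovers the relabelling that makes all six cases agree. The loop visits k! permutations for k clusters, hence the need for an approximate method on larger solutions.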

    net from http://teaching.sociology.ul.ie/sadi
    net install sadi

This code uses functions from the `moremata` package, so you may need to do `ssc install moremata`, and restart Stata, before using the `sadi` commands.

If you have any problems installing or running these utilities, please let me know at brendan.halpin@ul.ie.

Slides from my talk to the Workshop on Algorithmic Social Research, Nuffield College, Oxford, Feb 27 2015.

Slides from my talk to the Hamburg SUG.

In this paper I describe my imputation of missing data in sequences in greater detail.

I presented this paper on multiple imputation for gaps in lifecourse sequences at the Lausanne Conference on Sequence Analysis, in June 2012.

Slides from a one-day conference at the Université Paris-I in October 2011. There is also a recording of the presentation.

Slides and other details from a presentation to the Helsinki Collegium for Advanced Studies, May 2010 are here.

In May 2009 I spent a month in Paris as a guest of CREST, presenting an occasional course to PhD students from institutions across Paris, under the "Option Formation par la Recherche" scheme. Slides for my lectures are here.

Slides from my talk to the QMSS2 conference in Oslo, October 2008.

I gave two papers to the RC33 Conference in Naples, September 2008:

- One arguing about substitution costs and
- One presenting time-warping.

My paper to the "Frontiers in Social and Economic Mobility" conference in Cornell in March 2003 is available as Departmental Working Paper WP2003-01. I had the honour of sharing the session with Andrew Abbott and Larry Wu.

Here I make available a number of utilities for Stata related to sequence analysis (including optimal matching). Some of this material relates to the short course I gave in the Essex Summer School in July 2007.

Copy the relevant files to a directory in Stata's "adopath". (All the relevant files in a single zip file are here.)

The adapted Needleman-Wunsch algorithm used by the `omav` command is
designed to treat tokens differently according to the length of the
spell in which they occur. This is intended to give better results than
conventional OM when used with life-course data. A preliminary
discussion of the algorithm is available in these talk slides.

NB THESE FILES ARE OUTDATED

- oma.ado (help file): Code to run optimal matching
- omav.ado (help file): Implements optimal matching with a correction for continuous spells
- combin.ado (help file): Implements Elzinga's X/T method (experimental implementation using different algorithm from Elzinga's CHESA software)
- degenne.ado (help file): Implements so-called "Degenne" methods
- hamming.ado (help file): Implements Hamming distance
- permtab.ado (help file): Utility to compare pairs of cluster solutions
- trans2subs.ado (help file): Utility to generate substitution matrices from observed transition rates
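The idea behind generating substitution costs from observed transition rates can be sketched in Python. The formula used here, cost(a, b) = 2 - p(a→b) - p(b→a), is a common convention in the sequence-analysis literature and is an assumption for illustration: it is not necessarily what `trans2subs` computes, and the function names are hypothetical.

```python
from collections import Counter


def transition_rates(seqs):
    """Estimate p(a -> b) from adjacent pairs of tokens in the sequences."""
    counts, origins = Counter(), Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1     # transitions from a to b
            origins[a] += 1         # all transitions leaving a
    return {pair: c / origins[pair[0]] for pair, c in counts.items()}


def substitution_costs(seqs):
    """Assumed rule: states that often follow one another are cheap to swap."""
    rates = transition_rates(seqs)
    states = sorted({x for s in seqs for x in s})
    costs = {}
    for a in states:
        for b in states:
            if a == b:
                costs[(a, b)] = 0.0
            else:
                costs[(a, b)] = 2.0 - rates.get((a, b), 0.0) - rates.get((b, a), 0.0)
    return costs


# Hypothetical data: two short sequences over the states A and B.
costs = substitution_costs(["AAB", "ABB"])
```

Under this rule costs are symmetric by construction and lie between 0 and 2, with pairs of states that are never observed in succession receiving the maximum cost.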

All the relevant files are available in a single zip file here.

The oma, omav and combin commands are based on C plugins for speed. Implementing them in C rather than in Stata's Mata matrix language yields a 40-fold speed increase. Two versions of each plugin are provided, X.linux.plugin and X.w32.plugin. If you are using 32-bit Windows, copy each X.w32.plugin to X.plugin. For Linux, do the same with X.linux.plugin. If you have another operating system, you may be able to compile the code yourself (see below).

- Makefile: Commands to compile the plugins
- elzspell.c: C code for combin command
- omamatv3.c: C code for oma and omav commands
- stplugin.c and stplugin.h: Stata C code for compiling plugins; check Stata's site for more up-to-date versions, and for helpful info on compiling plugins
- uthash.h: Code used by elzspell.c, from Troy Hanson

The Essex course is described here, and files related to the course are available here. See in particular labs.pdf.

Brendan Halpin

Department of Sociology

University of Limerick