SADI has been updated to bypass Stata's limitation on matrix size, so more than 11,000 sequences can now be compared.
An update to the SADI package was released on 3 April. It contains many small improvements, including more stable plugins, and adds Studer et al.'s discrepancy measure.
A new version of the sequence analysis add-ons for Stata is now available from http://teaching.sociology.ul.ie/sadi/sadi.pkg (use Stata to download it). There are two main differences from the previous implementation. First, the plugin it contains is now compiled for 32- and 64-bit Windows and 32-bit Linux. Second, duplicate sequences are not used in the pairwise distance calculations, though complete N-by-N matrices are still created. The latter change substantially reduces the time taken when there are many duplicates.
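The saving from skipping duplicates can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the plugin's C code. The names full_distance_matrix and hamming are hypothetical, and any pairwise distance function can be passed in. The sketch computes each distance once per pair of distinct sequences, then fills the complete N-by-N matrix by lookup.

```python
def hamming(a, b):
    """Toy distance: number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def full_distance_matrix(sequences, dist):
    """Return the complete N-by-N distance matrix, calling dist() only on
    pairs of distinct sequences (sequences must be hashable, e.g. tuples)."""
    unique = []      # distinct sequences, in order of first appearance
    index_of = {}    # sequence -> position in `unique`
    member = []      # member[i] = position in `unique` of sequences[i]
    for seq in sequences:
        if seq not in index_of:
            index_of[seq] = len(unique)
            unique.append(seq)
        member.append(index_of[seq])

    # U*(U-1)/2 distance calls for U distinct sequences, instead of N*(N-1)/2.
    u = len(unique)
    ud = [[0] * u for _ in range(u)]
    for a in range(u):
        for b in range(a + 1, u):
            ud[a][b] = ud[b][a] = dist(unique[a], unique[b])

    # Expand back to one row and one column per original case by lookup.
    n = len(sequences)
    return [[ud[member[i]][member[j]] for j in range(n)] for i in range(n)]


seqs = [(1, 1, 2), (1, 1, 2), (2, 2, 2), (1, 1, 2)]
for row in full_distance_matrix(seqs, hamming):
    print(row)
```

Here the matrix has 16 cells, but only two distinct sequences occur, so the distance function is called once.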
Three distance measures are provided in this package: Hamming distance, standard OM and my OMv (see Halpin, 'Optimal Matching Analysis and Life Course Data: the importance of duration', Sociological Methods and Research, 38 (3), 2010). Note that OMv is not guaranteed to generate metric distances.
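For readers new to these measures, here is a Python sketch of the textbook definitions of the first two. Hamming distance compares sequences position by position. OM also allows insertions and deletions, and finds the cheapest combination of operations by dynamic programming (the Needleman-Wunsch recursion). The default costs (1 per mismatch, 1 per insertion or deletion) are assumptions for illustration, not SADI's defaults. OMv is not shown.

```python
def hamming(a, b, sub=None):
    """Hamming distance: sum of substitution costs position by position.
    Sequences must have equal length; no insertions or deletions are allowed."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    if sub is None:
        sub = lambda x, y: 0 if x == y else 1  # assumed default: 1 per mismatch
    return sum(sub(x, y) for x, y in zip(a, b))


def om(a, b, indel=1.0, sub=None):
    """Standard optimal matching distance: the minimum total cost of the
    insertions, deletions and substitutions that turn a into b."""
    if sub is None:
        sub = lambda x, y: 0 if x == y else 1  # assumed default: 1 per mismatch
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn the first i tokens of a into the first j of b.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + indel,                        # delete a[i-1]
                          d[i][j - 1] + indel,                        # insert b[j-1]
                          d[i - 1][j - 1] + sub(a[i - 1], b[j - 1]))  # match/substitute
    return d[n][m]


print(hamming("ABC", "BCA"), om("ABC", "BCA"))
```

For a shifted pattern such as ABC versus BCA, Hamming distance counts three mismatches. OM instead deletes the leading A and inserts one at the end, for a total cost of 2.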
Several utility functions are also included:
trans2subs creates substitution-cost matrices based on the observed pattern of transitions
stripe creates string representations of the sequences, which allows you to use Stata's regular-expression functions to summarise them
metricp tests the pairwise distance matrix for the triangle inequality (note that OMv will often fail this test!)
ari and permtab allow us to compare cluster solutions:
ari calculates the Adjusted Rand Index, which indexes the level of agreement between two unlabelled classifications of the same size, while permtab and permtabga permute the values of one of the classifications to maximise the agreement, and return the permutation.
permtabga uses a genetic algorithm to provide an approximate solution, as permutations of more than 8-10 elements take infeasibly long.
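One common way to derive substitution costs from observed transitions is assumed here for illustration; the exact formula trans2subs uses may differ. For distinct states i and j, it sets the cost to 2 - p(i->j) - p(j->i). Here p(i->j) is the proportion of transitions out of state i that go to j, pooled over all sequences and adjacent time points. States that often follow each other are therefore cheap to substitute. The name transition_subcosts is hypothetical.

```python
from collections import Counter


def transition_subcosts(sequences, states):
    """Substitution costs 2 - p(i->j) - p(j->i) from pooled transition rates."""
    pair_counts = Counter()    # (state at t, state at t+1) -> count
    origin_counts = Counter()  # state at t -> number of observed transitions out of it
    for seq in sequences:
        for here, nxt in zip(seq, seq[1:]):
            pair_counts[(here, nxt)] += 1
            origin_counts[here] += 1

    def rate(i, j):
        # Estimated P(state j at t+1 | state i at t); 0 if i is never followed by anything.
        return pair_counts[(i, j)] / origin_counts[i] if origin_counts[i] else 0.0

    return {(i, j): 0.0 if i == j else 2.0 - rate(i, j) - rate(j, i)
            for i in states for j in states}


costs = transition_subcosts(["AAB", "ABB", "ABC"], states="ABC")
print(costs[("A", "B")], costs[("A", "C")])
```

In this toy data, three of the four transitions out of A go to B, so A and B are cheap to substitute. A is never followed by C and C is never followed by A, so that pair gets the maximum cost of 2.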
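The idea behind stripe can be illustrated in Python. Once each sequence is a string with one character per time point, regular expressions can answer questions about its shape. The one-letter-per-state encoding and the helper to_string are assumptions for this sketch. It does not reproduce stripe's own output format or Stata's regular-expression functions.

```python
import re

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_string(seq):
    """Hypothetical encoding: integer state k (1-based) becomes the k-th letter."""
    return "".join(ALPHABET[state - 1] for state in seq)


sequences = [[1, 1, 2, 2, 3], [1, 1, 1, 1, 1], [2, 2, 3, 3, 1]]
strings = [to_string(s) for s in sequences]  # "AABBC", "AAAAA", "BBCCA"

# Does the sequence ever move directly from state 2 (B) to state 3 (C)?
has_b_to_c = [re.search("BC", s) is not None for s in strings]

# Length of the first spell: a character followed by repeats of itself.
first_spell = [len(re.match(r"(.)\1*", s).group(0)) for s in strings]

# Number of spells: count the maximal runs of identical characters.
n_spells = [sum(1 for _ in re.finditer(r"(.)\1*", s)) for s in strings]

print(has_b_to_c, first_spell, n_spells)
```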
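The check metricp performs can be sketched as a brute-force loop. A distance matrix d satisfies the triangle inequality if d[i][j] <= d[i][k] + d[k][j] for every triple of cases. The name triangle_violations is hypothetical, and the tolerance for rounding error is an assumed default. The O(N^3) loop is only practical for small matrices.

```python
def triangle_violations(d, tol=1e-9):
    """Return every (i, j, k) with d[i][j] > d[i][k] + d[k][j] beyond a rounding tolerance."""
    n = len(d)
    violations = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][j] > d[i][k] + d[k][j] + tol:
                    violations.append((i, j, k))
    return violations


metric = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
non_metric = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
print(triangle_violations(metric))      # no violations
print(triangle_violations(non_metric))  # going via case 1 is shorter than 0 to 2 directly
```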
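The Adjusted Rand Index can be computed from the cross-tabulation of the two classifications. Let n_ij be the cell counts, a_i and b_j the margins, and n the number of cases. Then ARI = (sum C(n_ij,2) - E) / ((sum C(a_i,2) + sum C(b_j,2))/2 - E), where E = sum C(a_i,2) * sum C(b_j,2) / C(n,2). The sketch below is a Python illustration of that standard formula, not ari's own code; adjusted_rand_index is a hypothetical name. The index is 1 for identical groupings, whatever the labels, and is near 0 when agreement is no better than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(x, y):
    """Adjusted Rand Index between two classifications of the same cases.
    Category labels are arbitrary: only the grouping matters."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two classifications of the same n >= 2 cases")
    sum_cells = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(x).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y).values())
    expected = sum_rows * sum_cols / comb(len(x), 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        # Degenerate case, e.g. both classifications put everyone in one group.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


print(adjusted_rand_index([1, 1, 2, 2], ["a", "a", "b", "b"]))  # same grouping, new labels
print(adjusted_rand_index([1, 1, 2, 2], [1, 2, 1, 2]))          # crossed grouping
```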
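The exact version of the permutation problem can be sketched as an exhaustive search. It tries every one-to-one relabelling of one classification's categories and keeps the relabelling that puts the most cases on the diagonal of the cross-tabulation. With k categories there are k! relabellings (40,320 for k = 8 and 3,628,800 for k = 10), which is why a heuristic such as permtabga's genetic algorithm becomes attractive. The name best_relabelling is hypothetical, and this is neither permtab's own code nor a genetic algorithm.

```python
from collections import Counter
from itertools import permutations


def best_relabelling(x, y):
    """Exhaustive search for the mapping of y's categories onto x's categories
    that maximises agreement (cases on the diagonal of the cross-tabulation)."""
    xs = sorted(set(x))
    ys = sorted(set(y))
    if len(xs) != len(ys):
        raise ValueError("both classifications need the same number of categories")
    table = Counter(zip(x, y))  # (x-category, y-category) -> count
    best_agree, best_map = -1, None
    for perm in permutations(xs):  # k! candidate relabellings
        mapping = dict(zip(ys, perm))  # y-category -> x-category
        agree = sum(table[(mapping[b], b)] for b in ys)
        if agree > best_agree:
            best_agree, best_map = agree, mapping
    return best_map, best_agree


print(best_relabelling([1, 1, 2, 2, 3, 3], ["c", "c", "a", "a", "b", "b"]))
```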
To install, type the following in Stata:
net from http://teaching.sociology.ul.ie/sadi
net install sadi
This code uses functions from the moremata package, so you may need to run ssc install moremata, and restart Stata, before using it.
If you have any problems installing or running these utilities, please let me know at firstname.lastname@example.org.
Talk about Multiple Imputation for Categorical Time-series.
Slides from my talk to the Workshop on Algorithmic Social Research, Nuffield College, Oxford, Feb 27 2015.
Slides from my talk to the Hamburg SUG.
In this paper I describe my imputation of missing data in sequences in greater detail.
I presented this paper on multiple imputation for gaps in life-course sequences at the Lausanne Conference on Sequence Analysis in June 2012.
Slides from a one-day conference at the Université Paris-I in October 2011. There is also a recording of the presentation.
Slides and other details from a presentation to the Helsinki Collegium for Advanced Studies, May 2010, are here.
In May 2009 I spent a month in Paris as a guest of CREST, presenting an occasional course to PhD students from institutions across Paris, under the "Option Formation par la Recherche" scheme. Slides for my lectures are here.
Slides from my talk to the QMSS2 conference in Oslo, October 2008.
I gave two papers to the RC33 Conference in Naples, September 2008:
My paper to the "Frontiers in Social and Economic Mobility" conference at Cornell in March 2003 is available as Departmental Working Paper WP2003-01. I had the honour of sharing the session with Andrew Abbott and Larry Wu.
Here I make available a number of utilities for Stata related to sequence analysis (including optimal matching). Some of this material relates to the short course I gave at the Essex Summer School in July 2007.
Copy the relevant files to a directory in Stata's "adopath". (All the relevant files are also available in a single zip file here.)
The adapted Needleman-Wunsch algorithm used by the omav command is designed to treat tokens differently according to the length of the spell in which they occur. This is intended to give better results than conventional OM when used with life-course data. A preliminary discussion of the algorithm is available in these talk slides.
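The Python sketch below makes the idea of spell-dependent treatment concrete. It first computes, for each position, the length of the spell that position belongs to. It then uses that length in an OM recursion where each token's insertion, deletion and substitution costs are scaled by a weight that depends on its spell length. The weighting 1/sqrt(length) is a purely hypothetical choice for illustration. This is not the OMv algorithm described in the slides or the paper, and spell_lengths and duration_weighted_om are hypothetical names.

```python
import math


def spell_lengths(seq):
    """For each position, the length of the spell (maximal run of one state) containing it."""
    lengths = []
    start = 0
    while start < len(seq):
        end = start
        while end < len(seq) and seq[end] == seq[start]:
            end += 1
        lengths.extend([end - start] * (end - start))
        start = end
    return lengths


def duration_weighted_om(a, b, indel=1.0, sub=1.0,
                         weight=lambda length: 1 / math.sqrt(length)):
    """OM recursion in which each token's costs are scaled by weight(spell length).
    The default weight is hypothetical: tokens in long spells are cheaper to change."""
    wa = [weight(length) for length in spell_lengths(a)]
    wb = [weight(length) for length in spell_lengths(b)]
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel * wa[i - 1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel * wb[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Substituting two tokens costs the base cost times their mean weight.
            s = 0.0 if a[i - 1] == b[j - 1] else sub * (wa[i - 1] + wb[j - 1]) / 2
            d[i][j] = min(d[i - 1][j] + indel * wa[i - 1],  # delete a[i-1]
                          d[i][j - 1] + indel * wb[j - 1],  # insert b[j-1]
                          d[i - 1][j - 1] + s)               # match or substitute
    return d[n][m]


print(spell_lengths("AAABCC"))
print(duration_weighted_om("AAAA", "AAA"))  # drop one token from a spell of length 4
print(duration_weighted_om("AB", "A"))      # drop a spell of length 1
```

Under this hypothetical weighting, shortening a long spell by one token costs 0.5, while deleting a one-token spell costs the full 1.0. With weight=lambda length: 1.0, the function reduces to standard OM.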
NB THESE FILES ARE OUTDATED
All the relevant files are available in a single zip file here.
The oma, omav and combin commands are based on C plugins for speed: implementing them in C rather than in Stata's Mata matrix language yields a 40-fold speed increase. Two versions are provided, X.linux.plugin and X.w32.plugin. If you are using 32-bit Windows, copy each X.w32.plugin to X.plugin; for Linux, do the same with X.linux.plugin. If you have another operating system, you may be able to compile the code yourself (see below).
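The copying step can be scripted. The Python sketch below copies every X.w32.plugin (or X.linux.plugin) in a directory to X.plugin. The function install_plugins and its platform argument are hypothetical. The directory passed in is assumed to be the one in Stata's adopath where the files were placed.

```python
import shutil
from pathlib import Path


def install_plugins(directory, platform="w32"):
    """Copy each X.<platform>.plugin in `directory` to X.plugin.
    Returns the sorted names of the files created."""
    suffix = f".{platform}.plugin"
    created = []
    for src in Path(directory).glob(f"*{suffix}"):
        # e.g. "omav.w32.plugin" -> "omav.plugin"
        dest = src.with_name(src.name[: -len(suffix)] + ".plugin")
        shutil.copyfile(src, dest)
        created.append(dest.name)
    return sorted(created)
```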
The Essex course is described here, and files related to the course are available here. See in particular labs.pdf.
Brendan Halpin