The most up-to-date version of SADI is now described in the Stata Journal. A number of commands have changed names in order to be compatible with StataCorp requirements, but otherwise there are no large changes. (Preprint at http://teaching.sociology.ul.ie/bhalpin/sadisjmain-local.pdf)
SADI is now available directly from the main Stata add-on archive, SSC:
ssc install sadi
SADI has been updated to bypass Stata's limitation on matrix size, meaning that more than 11,000 sequences can now be compared.
An update to the SADI package was released on 3 April. It includes many small improvements, among them more stable plugins and the addition of Studer et al's discrepancy measure.
A new version of the sequence analysis add-ons for Stata is now available from http://teaching.sociology.ul.ie/sadi/sadi.pkg. (Use Stata to download.) There are two main differences over the previous implementation: first, the plugin it contains is now compiled for 32- and 64-bit Windows, and 32-bit Linux; second, duplicates are not used in the pairwise distance calculations, though complete N times N matrices are created. The latter change reduces time taken substantially if there are many duplicates.
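The effect of skipping duplicates can be illustrated with a short Python sketch. This is not the plugin's code: the function names are hypothetical, Hamming distance stands in for whichever measure is used, and the distance is assumed to be symmetric with zero self-distance. Distances are computed once per pair of distinct sequences and then expanded to the complete N times N matrix.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def pairwise_with_dedup(seqs, dist):
    """Return (full N x N matrix, number of dist() calls).

    dist() is evaluated once per pair of distinct sequences; it is
    assumed to be symmetric with dist(s, s) == 0.
    """
    uniq = list(dict.fromkeys(seqs))            # distinct sequences, first-seen order
    index = {s: i for i, s in enumerate(uniq)}  # sequence -> row in the small matrix
    k = len(uniq)
    small = [[0] * k for _ in range(k)]
    calls = 0
    for i, j in combinations(range(k), 2):
        small[i][j] = small[j][i] = dist(uniq[i], uniq[j])
        calls += 1
    rows = [index[s] for s in seqs]
    # Expand back to one row and column per original sequence.
    full = [[small[r][c] for c in rows] for r in rows]
    return full, calls

seqs = ["AABB", "AABB", "ABBB", "AABB", "BBBB"]
full, calls = pairwise_with_dedup(seqs, hamming)
print(calls)    # 3 comparisons instead of the 10 needed for 5 sequences
```

With 3 distinct sequences among 5, only 3 of the 10 possible pairs are compared, which is where the time saving comes from when duplicates are common.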
Three distance measures are provided in this package: Hamming distance, standard OM and my OMv (see Halpin 'Optimal Matching Analysis and Life Course Data: the importance of duration', Sociological Methods and Research, 38 (3), 2010). Note that OMv is not guaranteed to generate metric distances.
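As a rough illustration of what standard OM computes and what "metric" means here, the following Python sketch implements the usual dynamic-programming recurrence and a brute-force triangle-inequality check. It is not the package's plugin code: the function names, the flat substitution cost of 2 and the indel cost of 1 are all assumptions chosen for the example.

```python
from itertools import permutations

def om_distance(a, b, subs_cost, indel=1.0):
    """Optimal matching distance by dynamic programming.

    subs_cost(x, y) is the cost of substituting state x for state y;
    indel is the cost of one insertion or deletion.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                             # delete a[i-1]
                d[i][j - 1] + indel,                             # insert b[j-1]
                d[i - 1][j - 1] + subs_cost(a[i - 1], b[j - 1]), # substitute
            )
    return d[n][m]

def triangle_violations(dm, tol=1e-9):
    """Triples (i, j, k) for which dm[i][j] > dm[i][k] + dm[k][j]."""
    n = len(dm)
    return [(i, j, k) for i, j, k in permutations(range(n), 3)
            if dm[i][j] > dm[i][k] + dm[k][j] + tol]

def flat_cost(x, y):
    """Illustrative substitution cost: 0 for identical states, 2 otherwise."""
    return 0.0 if x == y else 2.0

seqs = ["AABB", "ABBB", "BBBA"]
dm = [[om_distance(s, t, flat_cost) for t in seqs] for s in seqs]

# A hand-made matrix that is not metric: 5 > 1 + 1.
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
print(triangle_violations(dm), triangle_violations(bad))
```

The check is cubic in the number of sequences, so it is only practical for modest matrices or for a sample of triples.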
Several utility functions are also included:
trans2subs creates substitution-cost matrices based on the observed pattern of transitions.
stripe creates string representations of the sequences, which allows you to use Stata's regular-expression functions to summarise them.
metricp tests the pairwise distance matrix for the triangle inequality (note OMv will often fail this test!).
permtab, permtabga and ari allow us to compare cluster solutions. ari calculates the Adjusted Rand Index, which indexes the level of agreement between two unlabelled classifications of the same size, while permtab and permtabga permute the values of one of the classifications to maximise the agreement, and return the permutation. permtabga uses a genetic algorithm to provide an approximate solution, as permutations of more than 8-10 elements take infeasibly long.
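The two ways of comparing cluster solutions can be sketched in Python as follows. This illustrates the ideas only and is not the code behind ari or permtab: the function names are hypothetical, and "agreement" is taken here to be the count of cases with matching labels, which is an assumption about the criterion. The exhaustive search over permutations also shows why an approximate method is needed once there are more than a handful of clusters.

```python
from collections import Counter
from itertools import permutations
from math import comb

def adjusted_rand_index(x, y):
    """Adjusted Rand Index of two classifications of the same cases.

    Undefined (division by zero) if both put every case in one group
    or every case in its own group.
    """
    n = len(x)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    sum_a = sum(comb(c, 2) for c in Counter(x).values())
    sum_b = sum(comb(c, 2) for c in Counter(y).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def best_relabelling(x, y):
    """Try every permutation of the labels of y and keep the one that
    gives the most cases where the relabelled y equals x."""
    labels = sorted(set(x) | set(y))
    best_map, best_agree = None, -1
    for perm in permutations(labels):      # k! candidates for k labels
        mapping = dict(zip(labels, perm))
        agree = sum(a == mapping[b] for a, b in zip(x, y))
        if agree > best_agree:
            best_map, best_agree = mapping, agree
    return best_map, best_agree

x = [1, 1, 1, 2, 2, 2, 3, 3]
y = [2, 2, 2, 3, 3, 1, 1, 1]   # same grouping bar one case, labels shuffled
ari = adjusted_rand_index(x, y)
mapping, agree = best_relabelling(x, y)
print(round(ari, 3), mapping, agree)
```

The index does not depend on how the clusters are labelled, whereas the relabelling step makes a cross-tabulation of the two solutions readable by lining up corresponding clusters.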
net from http://teaching.sociology.ul.ie/sadi
net install sadi
This code uses functions from the moremata package, so you may need to do ssc install moremata, and restart Stata, before using the sadi commands.
If you have any problems installing or running these utilities, please let me know at brendan.halpin@ul.ie.
Notes from a short course on sequence analysis given at the University of Umeå, Sweden.
Notes from a short course on sequence analysis given to students of the Institute of Sociology, Academia Sinica, Taipei, Taiwan.
Notes from a short course on sequence analysis given to students of the Universities of Bergen and Oslo, in Oslo.
Talk about Multiple Imputation for Categorical Time-series.
Slides from my talk to the Workshop on Algorithmic Social Research, Nuffield College, Oxford, Feb 27 2015.
Slides from my talk to the Hamburg SUG.
In this paper I describe my imputation of missing data in sequences in greater detail.
I presented this paper on multiple imputation for gaps in lifecourse sequences at the Lausanne Conference on Sequence Analysis, in June 2012.
Slides from a one-day conference at the Université Paris-I in October 2011. There is also a recording of the presentation.
Slides and other details from a presentation to the Helsinki Collegium for Advanced Studies, May 2010 are here.
In May 2009 I spent a month in Paris as a guest of CREST, presenting an occasional course to PhD students from institutions across Paris, under the "Option Formation par la Recherche" scheme. Slides for my lectures are here.
Slides from my talk to the QMSS2 conference in Oslo, October 2008.
I gave two papers to the RC33 Conference in Naples, September 2008:
My paper to the "Frontiers in Social and Economic Mobility" conference in Cornell in March 2003 is available as Departmental Working Paper WP2003-01. I had the honour of sharing the session with Andrew Abbott and Larry Wu.
Here I make available a number of utilities for Stata related to sequence analysis (including optimal matching). Some of this material relates to the short course I gave in the Essex Summer School in July 2007.
Copy the relevant files to a directory in Stata's "adopath". (All the relevant files in a single zip file are here.)
The adapted Needleman-Wunsch algorithm used by the omav command is designed to treat tokens differently according to the length of the spell in which they occur. This is intended to give better results than conventional OM when used with life-course data. A preliminary discussion of the algorithm is available in these talk slides.
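The quantity the algorithm conditions on, the length of the spell each token belongs to, can be computed as in the Python sketch below. It shows only that preliminary step; how omav then uses spell length in its costs is not reproduced here, and the function names are hypothetical.

```python
from itertools import groupby

def spells(seq):
    """Collapse a sequence into (state, spell length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(seq)]

def spell_lengths(seq):
    """For each token, the length of the spell in which it occurs."""
    out = []
    for _, length in spells(seq):
        out.extend([length] * length)
    return out

print(spells("AAABBA"))         # [('A', 3), ('B', 2), ('A', 1)]
print(spell_lengths("AAABBA"))  # [3, 3, 3, 2, 2, 1]
```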
NB THESE FILES ARE OUTDATED
All the relevant files are available in a single zip file here.
The oma, omav and combin commands are based on C plugins for speed. Implementing them in C rather than in Stata's Mata matrix language yields a 40-fold speed increase. Two versions of each plugin are provided, X.linux.plugin and X.w32.plugin. If you are using 32-bit Windows, copy each X.w32.plugin to X.plugin. For Linux, do the same with X.linux.plugin. If you have another operating system, you may be able to compile the code yourself (see below).
The Essex course is described here, and files related to the course are available here. See in particular labs.pdf.
Brendan Halpin