A new command
In this blog I introduce a new utility command, sdmksubs, which creates substitution cost matrices for sequence analysis, as part of the SADI Stata add-ons.
Most sequence-analysis distance commands require the specification of substitution costs, which describe the pattern of differences in the state space through which the sequences move. These can be derived from theory, from external data, or can be imposed by researcher fiat. It is also common to use the pattern of transitions in the sequence data to derive them, though this is not an unproblematically good idea. The existing trans2subs command in SADI calculates simple transition-based substitution costs. The new sdmksubs calculates this substitution cost structure and a range of others: some simple “theoretical” ones, and some based on the transition pattern but taking more of the data into account than the traditional trans2subs matrix.
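To make the idea of transition-based costs concrete, here is a minimal Python sketch of one convention that is common in the sequence analysis literature: the cost of substituting state a for state b is 2 minus the two observed transition rates between them, so states that frequently follow one another are treated as closer. This is an illustration only, not SADI code. The function name transition_costs, the toy sequences and the exact formula are my assumptions, and trans2subs or sdmksubs may define their costs differently.

```python
from collections import Counter


def transition_costs(sequences, states):
    """Illustrative transition-based substitution costs (hypothetical helper).

    Assumed convention: cost(a, b) = 2 - p(a -> b) - p(b -> a), where
    p(a -> b) is the share of transitions out of a that go to b.
    """
    counts = Counter()         # (origin, destination) -> number of transitions
    origin_totals = Counter()  # origin -> number of transitions leaving it
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
            origin_totals[a] += 1

    def rate(a, b):
        # Transition rate from a to b; 0 if a is never an origin state.
        return counts[(a, b)] / origin_totals[a] if origin_totals[a] else 0.0

    return {
        (a, b): 0.0 if a == b else 2.0 - rate(a, b) - rate(b, a)
        for a in states
        for b in states
    }


# Toy data: two short sequences over the states A and B.
sequences = ["AAB", "ABB"]
costs = transition_costs(sequences, "AB")
```

Under this assumed formula the matrix is symmetric with a zero diagonal, and every off-diagonal cost lies between 0 and 2. Note that it uses only the transitions between adjacent time points.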
SADI is a Stata package for sequence analysis of data such as lifecourse histories, and has been around for quite a while. Recent improvements include fixes for internal changes in Stata 18, the lifting of limits on sequence length, etc., but here I focus on sdmksubs only.