Substitution costs from transition rates

Given that determining substitution costs in sequence analysis is such a bone of contention, many researchers look for a way to let the data generate the costs. The typical way to do this is to pool transitions across sequences and time points and define the substitution cost between states i and j as:

2 - p(ij) - p(ji)

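where p(ij) is the transition rate from state i to state j, that is, the share of moves starting in state i that end in state j. Intuitively, states that are closer to each other will have higher transition rates between them and hence lower substitution costs, while states that never exchange get the maximum cost of 2.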
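The following minimal Python sketch applies the formula to a matrix of transition rates. It is not SADI code, and the function name and rates are hypothetical. `p[i][j]` holds the rate from state i to state j. The diagonal of the result is set to zero because substituting a state for itself costs nothing.

```python
def subs_costs(p):
    """Substitution costs 2 - p[i][j] - p[j][i] from a square rate matrix p.

    The diagonal is forced to zero: substituting a state for itself is free.
    """
    n = len(p)
    return [[0.0 if i == j else 2 - p[i][j] - p[j][i] for j in range(n)]
            for i in range(n)]


# Hypothetical rates for three states (diagonal excluded, rows sum to 1)
p = [[0.0, 0.8, 0.2],
     [0.5, 0.0, 0.5],
     [0.1, 0.9, 0.0]]

costs = subs_costs(p)
for row in costs:
    print([round(c, 3) for c in row])
# [0.0, 0.7, 1.7]
# [0.7, 0.0, 0.6]
# [1.7, 0.6, 0.0]
```

States 1 and 2 exchange often (rates 0.5 and 0.9), so they get the lowest cost, 0.6. States 0 and 2 rarely exchange and cost 1.7. The matrix is symmetric by construction, and every cost lies between 0 and 2.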

I don't recommend this approach in general, for reasons I will not go into here. However, my Stata package for sequence analysis, SADI, does include a utility that calculates these quantities: trans2subs.

trans2subs requires the data in long format, so we reshape first and reshape back afterwards. By default the sequences are held in wide format, as variables state1 to stateN.

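reshape long state, i(id) j(t)                    // one row per id and period
trans2subs state, id(id) distmat(trpr1)           // diagonal excluded (default)
trans2subs state, id(id) distmat(trpr2) diagincl  // diagonal included
reshape wide                                      // back to state1 to stateN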
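If you are less familiar with reshape, the following minimal Python sketch shows what the long step does to the layout. The data and the function name `wide_to_long` are hypothetical. The sketch only mimics `reshape long state, i(id) j(t)` for this simple case.

```python
# Hypothetical wide-format data: one record per id, states in state1..state4
wide = [
    {"id": 1, "state1": 1, "state2": 1, "state3": 2, "state4": 3},
    {"id": 2, "state1": 2, "state2": 3, "state3": 3, "state4": 3},
]


def wide_to_long(records, stub="state"):
    """Mimic reshape long for one stub: one row per (id, t)."""
    long_rows = []
    for rec in records:
        for key, value in rec.items():
            suffix = key[len(stub):]
            if key.startswith(stub) and suffix.isdigit():
                long_rows.append({"id": rec["id"], "t": int(suffix), "state": value})
    # reshape long leaves the data sorted by i() and then j()
    long_rows.sort(key=lambda r: (r["id"], r["t"]))
    return long_rows


long_data = wide_to_long(wide)
print(len(long_data))  # 8
print(long_data[0])    # {'id': 1, 't': 1, 'state': 1}
```

`reshape wide` reverses this. It puts state1 to stateN back on one row per id.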

By default, the transition rates are calculated without the diagonal (i.e., ignoring cases where the sequence remains in the same state from t to t+1). The diagincl option overrides this, as in the second trans2subs call above.
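The following small Python sketch shows the effect of the diagonal option. The function names and sequences are hypothetical, and this is not the SADI implementation. Consecutive pairs within each sequence play the role of a state and its lag. Same-state pairs are either skipped or counted.

```python
def transition_counts(sequences, n_states, include_diagonal=False):
    """Pool moves from the state at t to the state at t+1 over all sequences.

    States are coded 0..n_states-1. With include_diagonal=False (analogous
    to running trans2subs without diagincl), pairs where the sequence stays
    in the same state are ignored.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # zip(seq, seq[1:]) pairs each state with the next one: (lag, state)
        for prev, curr in zip(seq, seq[1:]):
            if prev == curr and not include_diagonal:
                continue
            counts[prev][curr] += 1
    return counts


def row_rates(counts):
    """Transition rates p(ij): each row divided by its total."""
    return [[c / sum(row) for c in row] for row in counts]


# Hypothetical sequences over three states
seqs = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 2], [2, 2, 1, 0, 0]]

rates_off = row_rates(transition_counts(seqs, 3))
rates_all = row_rates(transition_counts(seqs, 3, include_diagonal=True))
print(rates_off[2])  # [0.0, 1.0, 0.0]
print(rates_all[2])  # [0.0, 0.5, 0.5]
```

Including the diagonal shrinks the off-diagonal rates of any state with same-state periods, because those periods now enter the row totals. The resulting costs can therefore only rise or stay the same. `row_rates` divides by the row total, so it raises `ZeroDivisionError` for a state with no counted transitions out of it.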

The command works by cross-tabulating state against its lag, storing the counts in a matrix, and letting Mata do some simple calculations on the result. However, trans2subs as distributed is fragile and can break down in certain circumstances. One example is where a row or column has values only on the diagonal. This corresponds to a state that is only ever exited (never entered), such as never-married, or one that is never exited, such as retired. Thanks to Anna Manzoni for alerting me to this problem.
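One way a never-exited state causes trouble is easy to reproduce in a minimal Python sketch. The counts and state labels are hypothetical, and this is not the trans2subs code itself. Once the diagonal is excluded, the row for retired is all zeros, so it cannot be turned into rates. The guarded version returns NaN for such a row instead of stopping. This is similar to Mata, which yields a missing value when dividing by zero.

```python
import math


def safe_row_rates(counts):
    """Row-normalise counts; a row with no transitions gets NaN rates.

    NaN stands in for Stata's missing value here: Mata returns missing
    for x/0 rather than stopping with an error.
    """
    rates = []
    for row in counts:
        total = sum(row)
        rates.append([c / total if total else math.nan for c in row])
    return rates


# Hypothetical counts, diagonal already removed.
# States: 0 = employed, 1 = unemployed, 2 = retired (never exited).
counts = [[0, 4, 1],
          [3, 0, 1],
          [0, 0, 0]]

try:
    [[c / sum(row) for c in row] for row in counts]
except ZeroDivisionError:
    print("naive row normalisation fails on the retired row")

rates = safe_row_rates(counts)
n = len(rates)
# Cost 2 - p(ij) - p(ji) off the diagonal; NaN propagates into every
# off-diagonal cost that involves the retired state
costs = [[0.0 if i == j else 2 - rates[i][j] - rates[j][i] for j in range(n)]
         for i in range(n)]
print(rates[2])     # [nan, nan, nan]
print(costs[0][2])  # nan
```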

As a short-term solution, I present an alternative, more robust command here, t2s. I will replace trans2subs with this code when I next update the SADI package. For now you can access it from this link, or by copying and pasting from here:


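capture mata mata drop transition_driven_subsmat2()
mata:
// Turn a Stata matrix of transition counts (rows = state at t-1,
// columns = state at t) into substitution costs, overwriting it in place
void transition_driven_subsmat2(string scalar tabmat, real scalar diagincl)
{
    real matrix G, Gr, subsmat

    // Read the Stata matrix into Mata
    G = st_matrix(tabmat)

    if (rows(G) != cols(G)) {
        _error("Table isn't square")
    }

    // By default, ignore periods where the sequence stays in the same state
    if (diagincl == 0) {
        G = G - diag(G)
    }

    // Transition rates p(ij): divide each row by its total
    // (a row totalling zero gives missing rates, since x/0 is missing in Mata)
    Gr = G :/ rowsum(G)

    // Cost = 2 - p(ij) - p(ji), rounded to 6 decimal places
    subsmat = trunc(0.5 :+ (J(rows(G), rows(G), 2) - Gr - Gr') :* 1000000) :/ 1000000

    // No cost for substituting a state with itself
    subsmat = subsmat - diag(subsmat)

    // Write the result back to the same Stata matrix
    st_matrix(tabmat, subsmat)
}
end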

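capture program drop t2s
program define t2s
    // Syntax: t2s statevar [if] [in], idvar(varname) subsmat(matname) [diagincl]
    // Data must be in long format, sorted by id and time (as after reshape long)
    syntax varlist(min=1 max=1) [if] [in], IDvar(varname) SUBSmat(string) [DIAgincl]

    // Pass the diagincl switch to Mata as 0/1
    if ("`diagincl'" == "") {
        local diagincl 0
    }
    else {
        local diagincl 1
    }

    marksample touse

    local colvar `varlist'
    tempvar rowvar

    // State in the previous period of the same sequence (missing in the first)
    quietly by `idvar': gen `rowvar' = `colvar'[_n-1] if _n > 1

    di "Generating transition-driven substitution matrix"

    // Cross-tabulate state at t-1 (rows) against state at t (columns);
    // the counts go into the Stata matrix named in subsmat()
    qui tab `rowvar' `colvar' if `touse', matcell(`subsmat')

    // Convert the counts into substitution costs, overwriting that matrix
    mata: transition_driven_subsmat2("`subsmat'", `diagincl')
end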
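Two details of the t2s code may be easier to follow in isolation. The Python sketch below is a mirror written for illustration, not part of SADI, and the helper names and data are hypothetical. It shows two things. First, it shows which levels `tab` keeps as rows and columns when state is cross-tabulated against its lag, and hence when the Mata function stops with "Table isn't square". Second, it shows that `trunc(0.5 + x*1000000)/1000000` rounds a non-negative cost to six decimal places.

```python
import math


def lag_table_levels(long_rows):
    """Levels a cross-tab of lagged state by state would keep as rows and columns.

    long_rows: (id, t, state) tuples. The lag is taken within id, in t order;
    the first period of each id has no lag, so it drops out of the table.
    """
    rows = sorted(long_rows)
    row_levels, col_levels = set(), set()
    for (id0, _, s0), (id1, _, s1) in zip(rows, rows[1:]):
        if id0 == id1:          # consecutive periods of the same sequence
            row_levels.add(s0)  # state at t-1
            col_levels.add(s1)  # state at t
    return sorted(row_levels), sorted(col_levels)


def round6(x):
    """Python mirror of trunc(0.5 :+ x :* 1000000) :/ 1000000.

    For non-negative x (costs lie between 0 and 2) this rounds to six
    decimal places, with halves rounded up.
    """
    return math.trunc(0.5 + x * 1_000_000) / 1_000_000


# Hypothetical data: state 9 occurs only in a first period
data = [(1, 1, 9), (1, 2, 1), (1, 3, 2),
        (2, 1, 1), (2, 2, 2), (2, 3, 1)]
rows, cols = lag_table_levels(data)
print(rows, cols)               # [1, 2, 9] [1, 2]
print(len(rows) == len(cols))   # False
print(round6(2 - 2 / 3 - 0.5))  # 0.833333
```

Here state 9 turns up as a lagged state but never as a current one, so the table has three rows but only two columns. The rounding trick relies on the costs being non-negative; for negative values trunc() would not give conventional rounding.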
