Substitution costs from transition rates

Given that determining substitution costs in sequence analysis is such a bone of contention, many researchers look for a way to let the data generate the costs. The typical way to do this is by pooling transition rates and defining the substitution cost to be:

2 – p(ij) – p(ji)

where p(ij) is the transition rate from state i to state j. Intuitively, states that are closer to each other will have higher transition rates between them, and therefore lower substitution costs, and vice versa.
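To make the arithmetic concrete, here is a minimal sketch in Python (it is not part of SADI). The three states A, B and C and their transition rates are made up for illustration, and the function name substitution_costs is my own. It applies 2 - p(ij) - p(ji) to every pair of states and sets the diagonal to zero.

```python
# Hypothetical transition rates p[i][j] between three states.
# The diagonal is excluded, so each row sums to 1.
states = ["A", "B", "C"]
p = [
    [0.0, 0.75, 0.25],
    [0.5, 0.0, 0.5],
    [0.25, 0.75, 0.0],
]


def substitution_costs(p):
    """Return the matrix of costs 2 - p(ij) - p(ji), with zeros on the diagonal."""
    n = len(p)
    return [
        [0.0 if i == j else 2.0 - p[i][j] - p[j][i] for j in range(n)]
        for i in range(n)
    ]


costs = substitution_costs(p)
for name, row in zip(states, costs):
    print(name, row)
```

In this made-up example A and B exchange cases frequently in both directions, so their cost is low (0.75), while A and C rarely do, so their cost is higher (1.5). The result is symmetric by construction, since p(ij) and p(ji) both enter each cell.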

I don’t recommend this approach in general, for reasons which I will not go into here, but I do have a utility in my Stata package for sequence analysis, SADI, which calculates these quantities, trans2subs.

This requires the data in long format, so we reshape first (by default the sequences are in wide format, as variables state1 to stateN).

reshape long state, i(id) j(t)
trans2subs state, id(id) distmat(trpr1)
trans2subs state, id(id) distmat(trpr2) diagincl
reshape wide

The transition rates are calculated by default without the diagonal (i.e., ignoring cases where the sequence remains in the same state from t to t+1), but this can be overridden with the diagincl option.
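The effect of leaving the diagonal in or out can be seen in a small sketch. This is Python rather than Stata, the three sequences are invented, and transition_rates and its diag_incl argument are hypothetical names that only mimic the idea of the option. It is an illustration of the calculation, not the SADI code.

```python
def transition_rates(sequences, n_states, diag_incl=False):
    """Pool transitions from t to t+1 over all sequences; return row-normalised rates."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    if not diag_incl:
        # Ignore cases where the sequence stays in the same state
        for s in range(n_states):
            counts[s][s] = 0
    # Assumes every state has at least one counted transition out of it
    return [[c / sum(row) for c in row] for row in counts]


# Invented sequences over states coded 0, 1, 2
sequences = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 2, 2],
    [2, 1, 0, 0, 1],
]

without_diag = transition_rates(sequences, 3)
with_diag = transition_rates(sequences, 3, diag_incl=True)
print(without_diag[0])
print(with_diag[0])
```

With the diagonal included, the stays absorb part of each row, so the off-diagonal rates shrink (from 2/3 to 0.4 for the move from state 0 to state 1 here) and the resulting substitution costs rise accordingly.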

The command works by cross-tabulating state with its lag, putting the results in a matrix, and letting Mata do some simple calculations on the result. However, the trans2subs command as distributed is fragile, and can break down in certain circumstances, for instance where a row or column of the table has values only on the diagonal (i.e., a state that is only ever exited, such as never-married, or one that is never exited, such as retired). Thanks to Anna Manzoni for alerting me to this problem.

As a short-term solution, I present an alternative command here, t2s, which is more robust. I will replace trans2subs with this code when I next update the SADI package, but for now you can access it from this link, or by cutting and pasting from here:

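mata:
void transition_driven_subsmat2(string matrix tabmat, scalar diagincl) {
    // Read stata matrix into mata
    G=st_matrix(tabmat)

    if (rows(G)!=cols(G)) {
        _error("Table isn't square")
    }

    // Unless diagincl is set, drop the diagonal (cases that stay in the same state)
    if (diagincl==0) {
        G = G - diag(G)
    }

    // Row-normalise the counts into transition rates
    Gr=G:/rowsum(G)
    // Cost is 2 - p(ij) - p(ji), rounded to six decimal places
    subsmat= trunc(0.5:+(J(rows(G),rows(G),2) - Gr - Gr'):*1000000):/1000000
    // Zero the diagonal and write the result back to the Stata matrix
    subsmat = subsmat - diag(subsmat)
    st_matrix(tabmat,subsmat)
}
end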

capture program drop t2s
program define t2s
    syntax varlist(min=1 max=1) [if] [in], IDvar(varname) SUBSmat(string) [DIAgincl]

    if ("`diagincl'"=="") {
        local diagincl 0
    }
    else {
        local diagincl 1
    }

    marksample touse

    local colvar `varlist'
    tempvar rowvar

    by `idvar': gen `rowvar'=`colvar'[_n-1] if _n>1

    di "Generating transition-driven substitution matrix"

    qui tab `rowvar' `colvar' if `touse', matcell(`subsmat')

    mata: transition_driven_subsmat2("`subsmat'",`diagincl')
end

Sociology, Statistics and Software

Thoughts on computers, data analysis and the social sciences
