One of the key anxieties about sequence analysis is how to set substitution costs. Many criticisms of optimal matching focus on the fact that we have no theory or method for assigning substitution costs (Larry Wu’s 2000 SMR paper is a case in point). Sometimes analysts opt for using transition-probability-derived costs to avoid the issue.
My stock line is that substitution costs should describe the relationships between states in the basic state space (e.g., employment status), and that the distance-measure algorithm simply maps an understanding of the basic state-space onto the state-space of the trajectories (e.g., work-life careers). Not everyone finds this very convincing, however.
I’ve been playing recently with a set of tools for evaluating different distance measures, and it struck me that I could use them to address this issue. Rather than compare hand-composed substitution matrices, however, I felt an automated approach was needed: randomise the matrices, and see what the results looked like. Can we improve on the analysts’ judgement? What are the characteristics of “good” substitution cost matrices?
Continue reading Substitution costs in sequence analysis — randomise! →