Category Archives: sequence_analysis

Discrepancy analysis in Stata

Studer et al. (2011) introduce an important new tool to the field of sequence analysis: “discrepancy”, a way of analysing pairwise distances. This quantity is shown to be analogous to variance and is therefore amenable to ANOVA-type analysis, which makes it a very attractive complement to cluster analysis of distance matrices.
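To make the variance analogy concrete, here is a minimal sketch in plain Python. It is not the Stata or TraMineR code, and the function names `sum_of_squares` and `discrepancy_anova` are hypothetical. It assumes the usual formulation in which the sum of squares of a set of n cases is the sum of their pairwise dissimilarities (each pair counted once) divided by n, the discrepancy is that sum of squares divided by n, and the between-group part is what is left after subtracting the within-group sums of squares. The toy data are four made-up points on a line, with squared differences as dissimilarities, so the result can be checked against the ordinary variance.

```python
from itertools import combinations


def sum_of_squares(dist, members):
    """Sum of pairwise dissimilarities among `members` (each pair once), divided by n."""
    n = len(members)
    if n == 0:
        return 0.0
    return sum(dist[i][j] for i, j in combinations(members, 2)) / n


def discrepancy_anova(dist, groups):
    """ANOVA-style decomposition of a dissimilarity matrix by a grouping variable.

    dist   : symmetric n x n list of lists of dissimilarities
    groups : list of n group labels, one per case
    """
    n = len(groups)
    cases = list(range(n))
    labels = sorted(set(groups))
    m = len(labels)

    ss_total = sum_of_squares(dist, cases)
    ss_within = sum(
        sum_of_squares(dist, [i for i in cases if groups[i] == g]) for g in labels
    )
    ss_between = ss_total - ss_within

    return {
        "discrepancy": ss_total / n,           # analogue of the variance
        "pseudo_r2": ss_between / ss_total,    # share of discrepancy "explained"
        "pseudo_f": (ss_between / (m - 1)) / (ss_within / (n - m)),
    }


# Toy check (made-up data): squared differences between points on a line.
y = [0, 2, 10, 12]
dist = [[(a - b) ** 2 for b in y] for a in y]
result = discrepancy_anova(dist, groups=[0, 0, 1, 1])
print(result)
```

For these toy points the discrepancy comes out as 26.0, which is exactly the (population) variance of `y`, and the pseudo-F is 50.0. With sequence data the matrix would instead hold, say, optimal matching distances, and significance would normally be assessed by permutation rather than from an F distribution.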

This has been implemented in TraMineR (under R), along with a raft of other innovations coming out of Geneva and Lausanne. Up to now it hasn’t been available elsewhere. I spoke to Matthias Studer at the LaCOSA conference, and he convinced me that it was easy to code, and that all the information required was in the paper. This turned out to be the case, and I have written an initial Stata implementation.

Cluster analysis is unstable, we knew that!

Experience tells me that small changes in the data can lead to substantial changes in the solution of a cluster analysis. This is especially true when the space is sparsely populated, as is the case with sequence analysis of lifecourses. Small changes in parameterisation (e.g., substitution costs) can lead to substantial differences in the cluster solution.

However, recently I came across an extreme case of sensitivity.