Slides from my workshop on sequence analysis as part of the "Pathways to Adulthood" Workshop, Helsinki Collegium for Advanced Studies, 19 May 2010:
Sequence analysis is possible in Stata using the SQ-ados, in Stata using my C-plugin, and in R using the TraMineR package. You can install the SQ add-on by issuing the command "ssc install sq" from within Stata (if you access the internet via a proxy server, set the appropriate variable first; see "help netio" within Stata for information). You can install my C-plugin using the instructions here, and you can install TraMineR by issuing the command install.packages("TraMineR") within R.
Here are some examples, using this artificial data set.
This example uses the SQ add-on to compute the optimal-matching distances between all pairs of sequences:
    input id len s1 s2 s3 s4 s5
    1 5 1 2 3 2 3
    2 5 2 3 4 1 4
    3 5 4 4 4 4 4
    4 5 1 2 3 2 3
    5 5 1 1 1 1 1
    6 4 4 3 2 1 -1
    end

    #delimit ;
    matrix subsmat = (0,1,2,3 \
                      1,0,1,2 \
                      2,1,0,1 \
                      3,2,1,0 );
    #delimit cr

    reshape long s, i(id) j(m)
    sqset s id m
    sqdes
    sqtab
    sqom if s!=-1, subcost(rawdistance) indelcost(2) full standard(longer)
    matrix list SQdist
Use the command "help sqdemo" within Stata for more information.
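To make concrete what the optimal-matching distance in the SQ example measures, here is a minimal sketch in base R of the standard dynamic-programming calculation, applied to the six artificial sequences above. It is not the code that sqom, the plugin or TraMineR actually run. The function name om_distance is made up for illustration. Substitution costs are taken as the absolute difference between state codes, which reproduces the 4x4 matrix above, the indel cost is 2, and the sketch assumes that standard(longer) means dividing by the length of the longer sequence.

```r
# Illustrative sketch only: om_distance() is a hypothetical helper,
# not part of SQ, the Stata plugin or TraMineR.

# The six artificial sequences; sequence 6 has only four elements.
seqs <- list(
  c(1, 2, 3, 2, 3),
  c(2, 3, 4, 1, 4),
  c(4, 4, 4, 4, 4),
  c(1, 2, 3, 2, 3),
  c(1, 1, 1, 1, 1),
  c(4, 3, 2, 1)
)

# Substitution costs |a - b| for states 1..4: same values as the 4x4 matrix.
subsmat <- abs(outer(1:4, 1:4, "-"))

om_distance <- function(x, y, sm = subsmat, indel = 2) {
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the cheapest cost of turning x[1..i] into y[1..j].
  d <- matrix(0, n + 1, m + 1)
  d[, 1] <- (0:n) * indel   # delete every element of the x prefix
  d[1, ] <- (0:m) * indel   # insert every element of the y prefix
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d[i + 1, j + 1] <- min(
        d[i, j] + sm[x[i], y[j]],  # substitute x[i] by y[j] (cost 0 if equal)
        d[i, j + 1] + indel,       # delete x[i]
        d[i + 1, j] + indel        # insert y[j]
      )
    }
  }
  d[n + 1, m + 1]
}

# Full matrices of raw and standardized pairwise distances.
n_seq <- length(seqs)
raw <- matrix(0, n_seq, n_seq)
std <- matrix(0, n_seq, n_seq)
for (a in seq_len(n_seq)) {
  for (b in seq_len(n_seq)) {
    raw[a, b] <- om_distance(seqs[[a]], seqs[[b]])
    # Assumed meaning of standard(longer): divide by the longer length.
    std[a, b] <- raw[a, b] / max(length(seqs[[a]]), length(seqs[[b]]))
  }
}
print(raw)
print(round(std, 3))
```

Sequences 1 and 4 are identical, so their distance is 0. Sequences 3 and 5 differ by three states at every position, so five substitutions at cost 3 give a raw distance of 15, or 3 after dividing by the sequence length of 5.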
This example runs my plugin and SQOM on the same data:
    #delimit ;
    adopath + [[ path to where you have installed plugin and helper programs ]];
    set matsize 800;
    matrix subs = (0,1,2,3 \
                   1,0,1,2 \
                   2,1,0,1 \
                   3,2,1,0 );
    use anondata;
    oma state1-state72, pwdist(bs) subsmat(subs) indel(2) length(72);
    reshape long state, i(id) j(m);
    sqset state id m;
    sqdes;
    sqtab;
    sqom, subcost(subs) indelcost(2) full standard(longer);
    reshape wide;
    clustermat wards bs, add;
    cluster generate o6=groups(6);
    tab o6;
This example uses TraMineR on the same data. See the TraMineR manual at http://mephisto.unige.ch/pub/TraMineR/Doc/1.4/TraMineR-1.4-Users-Guide.pdf for an explanation:
    library(TraMineR)
    library(foreign)
    bs <- read.dta("anondata.dta")
    attach(bs)
    bs.lab <- seqstatl(bs[, 2:74])
    bs.scode <- c("F", "p", "U", "n")
    bs.seq <- seqdef(bs, 2:74, states=bs.scode, labels=bs.lab)
    seqiplot(bs.seq, withlegend = T, title = "Index plot: 10 first sequences")
    seqfplot(bs.seq, pbarw = T, withlegend = T, space = 0, title = "Sequence frequency plot")
    seqdplot(bs.seq, withlegend = T, title = "State distribution plot")
    bs.turb <- seqST(bs.seq)
    summary(bs.turb)
    hist(bs.turb, col = "cyan")
    submat <- rbind(c(0, 1, 2, 3),
                    c(1, 0, 1, 2),
                    c(2, 1, 0, 1),
                    c(3, 2, 1, 0))
    dist.om <- seqdist(bs.seq, method = "OM", indel = 2, sm = submat)
    library(cluster)
    clusterward <- agnes(dist.om, diss = TRUE, method = "ward")
    cl.6 <- cutree(clusterward, k = 6)
    table(cl.6)

Brendan Halpin