UCAS, ethnicity and admission rates

September 21, 2015 | Stata | brendan

UCAS, the UK university admissions clearing house, have released data relating to ethnicity and admissions to English universities, in part in response to Vikki Boliver’s research in Sociology suggesting that members of ethnic minorities are less likely to be admitted to Russell Group universities.

The analysis note accompanying the release is sober and correct, showing a mostly consistent pattern: offer rates for ethnic minority students are lower (though not far lower) than expected. However, UCAS’s press release seems to suggest that the effect is all but explained away, attributing it to ethnic minority students disproportionately applying to courses with low acceptance rates. This does not seem to be the case.
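The course-mix argument can be checked with a simple indirect standardisation: apply each course’s overall offer rate to a group’s applications to get the offer rate the group would “expect” given where it applies, then compare that with the observed rate. The sketch below is a minimal illustration under that assumption; it is not necessarily how UCAS computes its expected rates, and the function names and numbers are hypothetical, not UCAS data.

```python
def offer_rate(offers, applications):
    """Observed offer rate: offers divided by applications."""
    return offers / applications


def expected_offer_rate(apps_by_course, course_offer_rates):
    """Expected offer rate for a group if each of its applications received
    the overall offer rate of the course applied to (indirect standardisation).

    apps_by_course: {course: applications from this group}
    course_offer_rates: {course: offer rate among all applicants to the course}
    """
    total_apps = sum(apps_by_course.values())
    expected_offers = sum(n * course_offer_rates[course]
                          for course, n in apps_by_course.items())
    return expected_offers / total_apps


if __name__ == "__main__":
    # Illustrative numbers only, not UCAS data.
    rates = {"course_A": 0.8, "course_B": 0.4}
    apps = {"course_A": 10, "course_B": 30}
    expected = expected_offer_rate(apps, rates)  # (10*0.8 + 30*0.4) / 40
    observed = offer_rate(18, sum(apps.values()))
    print(f"expected {expected:.3f}, observed {observed:.3f}")
```

With these toy numbers the group applies disproportionately to the low-offer course, so its expected rate is 0.5; an observed rate of 0.45 would still fall short of that, which is the pattern where course mix does not account for the gap.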

Multi-processor Stata without Stata-MP

October 13, 2014 | Uncategorized | brendan

Exploit your cores!

If you don’t have Stata-MP, it can be difficult to benefit from all the cores on your computer. However, if your problem can be split into parts that can run in parallel, it is easy to run multiple instances of Stata. In this note I demonstrate a simple case, using the example of a simulation I wish to run many times.
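The same split-and-combine pattern can be sketched outside Stata. As a hedged illustration, the Python script below launches several independent interpreter instances at once, each running its share of the replications with its own seed, and then pools their output; the operating system schedules the instances on separate cores. The toy simulation (the mean of 100 uniform draws) and the names `WORKER` and `run_parallel` are hypothetical stand-ins for a real simulation do-file run in several Stata instances.

```python
import subprocess
import sys

# Hypothetical worker: runs `reps` replications of a toy simulation
# (the mean of 100 uniform draws) with its own seed, one result per line.
WORKER = r"""
import random, sys
seed, reps = int(sys.argv[1]), int(sys.argv[2])
rng = random.Random(seed)
for _ in range(reps):
    print(sum(rng.random() for _ in range(100)) / 100)
"""


def run_parallel(n_instances=4, reps_per_instance=250):
    """Start all instances at once, then collect and pool their results."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(seed), str(reps_per_instance)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for seed in range(1, n_instances + 1)  # distinct seed per instance
    ]
    results = []
    for p in procs:
        out, _ = p.communicate()  # wait for this instance to finish
        if p.returncode != 0:
            raise RuntimeError(f"instance exited with code {p.returncode}")
        results.extend(float(line) for line in out.split())
    return results


if __name__ == "__main__":
    res = run_parallel()
    print(len(res), sum(res) / len(res))
```

Giving each instance a different seed matters: instances started with the same seed would simply duplicate each other’s replications.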


Substitution costs from transition rates

September 24, 2014 | Uncategorized | brendan

Given that determining substitution costs in sequence analysis is such a bone of contention, many researchers look for a way to let the data generate the costs. The typical way to do this is to pool transition rates and define the substitution cost as:

2 - p(ij) - p(ji)

where p(ij) is the transition rate from state i to state j. Intuitively, states that are closer to each other will have higher transition rates between them, and hence lower substitution costs, and vice versa.
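A minimal sketch of the calculation, assuming sequences are strings of single-character states and that p(ij) is the share of all observed moves out of state i (staying in i included) that go to j, pooled over every sequence and time point. The diagonal is set to zero by convention, since the formula would otherwise give 2 - 2p(ii). The function names are hypothetical.

```python
from collections import Counter


def transition_rates(sequences, states):
    """Pooled transition rates p[i][j] = P(next state is j | current state is i),
    counted over all adjacent pairs in all sequences."""
    pair_counts = Counter()
    from_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {
        i: {j: (pair_counts[(i, j)] / from_counts[i] if from_counts[i] else 0.0)
            for j in states}
        for i in states
    }


def substitution_costs(sequences, states):
    """Cost(i, j) = 2 - p(ij) - p(ji) for i != j; zero on the diagonal."""
    p = transition_rates(sequences, states)
    return {
        i: {j: 0.0 if i == j else 2 - p[i][j] - p[j][i] for j in states}
        for i in states
    }
```

For example, with the sequences "EEEU", "EUUE" and "SSEE", two of the five moves out of E go to U and one of the two moves out of U goes to E, so the E/U cost is 2 - 0.4 - 0.5 = 1.1, while U and S, which never follow each other, get the maximum cost of 2.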

New Sequence Analysis Tools

April 3, 2014 | Uncategorized | brendan

I last released SADI, my sequence analysis tools for Stata, in November 2011. Since then I’ve made various improvements and additions, relating to ongoing work such as that reported in Department Working Papers WP2012-02 and WP2013-05 (the latter is an early version of a paper appearing in the LaCOSA conference book, due out shortly).

Hardline materialism in the Irish Times letter page

January 27, 2014 | Uncategorized | Brendan Halpin

The text of my letter published in the Irish Times today (at http://www.irishtimes.com/debate/letters/philosophy-and-science-1.1667425):

Sir, – William Reville (Science, January 16th) criticises materialism as excluding, without evidence, the possibility of the supernatural.

Using Emacs to send mail later

January 27, 2014 | Uncategorized | brendan

There are lots of ways to schedule mail to be sent some time in the future, but for those of us who write and send mail from Emacs, it is easy to do it with Emacs itself and the Unix atd batch system. If you use message-mode to write messages, this approach means that creating mail for delayed sending works the same way as for normal sending.


Mapping with Python and Stata

Elevation data for large swathes of the planet have been collected by NASA and are available to download from http://dds.cr.usgs.gov/srtm/.

The data are stored in binary files, each covering a 1-degree by 1-degree “square” and holding a 1201 by 1201 grid of heights (in metres) as big-endian signed 16-bit integers. Here are five lines of Python and four lines of Stata that will turn the data into a simple graph:

import struct
# Each value is a big-endian signed 16-bit height; rows run north to south.
with open("data/N52W011.hgt", "rb") as f:
    for y in range(1201):
        for x in range(1201):
            print(y, x, struct.unpack(">h", f.read(2))[0])

Save this as file.py and run python file.py > map.dat, then run this Stata code:

infile i j height using map.dat
gen h2 = int(sqrt(height))
replace h2 = 30 if h2<=0
hmap j i h2, nosc

[Figure: low-res version of the map (a hi-res version is also available).]

Python’s struct module is part of the standard library, but you may need to install Stata’s hmap add-on, which is available from the usual locations.

There are better ways of doing this, of course: it’s slow, the aspect ratio is wrong, the colours are not ideal and the axis labelling is bad. Even worse, it is a complete abuse of the hmap add-on. It’s a quick and dirty way to turn binary data into pictures, all the same.
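One way to address the speed problem on the Python side is to read the whole tile in a single call with NumPy rather than unpacking 1201 × 1201 values one at a time. The sketch below assumes the same file layout the five-line script relies on (big-endian signed 16-bit integers, row by row) and writes the same “row col height” text file for Stata’s infile. The function names are hypothetical, and the aspect-ratio, colour and labelling problems are left untouched.

```python
import numpy as np


def read_hgt(path, size=1201):
    """Read an SRTM .hgt tile into a (size, size) array of heights in metres.

    Assumes size * size big-endian signed 16-bit integers stored row by row
    from the north edge. SRTM marks missing (void) cells with -32768.
    """
    heights = np.fromfile(path, dtype=">i2")
    if heights.size != size * size:
        raise ValueError(f"expected {size * size} values, got {heights.size}")
    return heights.reshape(size, size)


def write_long(heights, out_path):
    """Write one 'row col height' line per cell, the layout that the
    five-line script prints and the Stata infile command reads."""
    rows, cols = np.indices(heights.shape)
    table = np.column_stack([rows.ravel(), cols.ravel(), heights.ravel()])
    np.savetxt(out_path, table, fmt="%d")
```

For example, h = read_hgt("data/N52W011.hgt") followed by write_long(h, "map.dat") should produce the same file as the original script, and the array h can also be passed directly to a plotting library instead of going through Stata.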

Hedström’s Desires-Believes-Acts model in Emacs Lisp

April 28, 2013 | Uncategorized | brendan

Emacs Lisp is a pretty functional language for managing Emacs and automating complex tasks within it, particularly text-processing tasks. It’s probably not wise to use it for more general programming or analytical tasks, but every now and then (when I need to procrastinate, mostly) I get carried away.

A few years ago I was reading Peter Hedström’s book, Dissecting the Social, and realised his Desires-Believes-Acts model (a kind of cellular automaton) would be easy enough to implement. More recently, I noticed that Emacs’ tools for displaying simple games like Tetris (do “M-x tetris”) would permit a clean display.

In Hedström’s model, every cell in a grid may desire an outcome, and may believe it is able to achieve it. If it does both, it acts. Belief and desire depend on the beliefs and desires of the cell’s neighbours. Generally, even starting from low, random distributions of belief and desire, stable configurations emerge within a number of iterations, with systematic segregation; often everyone acts in the end, but sometimes stable oscillating systems emerge.
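The excerpt does not spell out the update rules, so the Python sketch below (not the Emacs Lisp implementation) uses a hypothetical threshold rule: a cell desires (believes) in the next round if at least `threshold` of its eight neighbours desire (believe) now, edges wrap around as on a torus, all cells update simultaneously, and a cell acts only if it both desires and believes. Grid size, starting density and threshold are illustrative assumptions, and the gamegrid display is not reproduced.

```python
import random


def neighbour_count(grid, r, c):
    """Number of True cells among the 8 neighbours of (r, c), wrapping at
    the edges (torus), so every cell has exactly 8 neighbours."""
    n_rows, n_cols = len(grid), len(grid[0])
    return sum(
        grid[(r + dr) % n_rows][(c + dc) % n_cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )


def step(desire, belief, threshold=3):
    """One synchronous update under the hypothetical threshold rule."""
    n_rows, n_cols = len(desire), len(desire[0])
    new_d = [[neighbour_count(desire, r, c) >= threshold for c in range(n_cols)]
             for r in range(n_rows)]
    new_b = [[neighbour_count(belief, r, c) >= threshold for c in range(n_cols)]
             for r in range(n_rows)]
    return new_d, new_b


def acts(desire, belief):
    """A cell acts only if it both desires and believes."""
    return [[d and b for d, b in zip(d_row, b_row)]
            for d_row, b_row in zip(desire, belief)]


def random_grid(n, p, rng):
    """n x n grid where each cell is True with probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    d, b = random_grid(20, 0.3, rng), random_grid(20, 0.3, rng)
    for t in range(10):
        print(t, sum(map(sum, acts(d, b))))  # number of acting cells
        d, b = step(d, b)
```

Printing the number of acting cells per round is a crude way to see whether the grid settles, oscillates or dies out under a given threshold.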


Discrepancy analysis in Stata

June 19, 2012 | sequence_analysis, Stata | brendan

Studer et al. (2011) introduce an important new tool to the field of sequence analysis: the idea of “discrepancy” as a way of analysing pairwise distances. Discrepancy is shown to be analogous to variance, and is thus amenable to ANOVA-type analysis, which makes it a very attractive complement to cluster analysis of distance matrices.

This has been implemented in TraMineR (under R), along with a raft of other innovations coming out of Geneva and Lausanne. Up to now it hasn’t been available elsewhere. I spoke to Matthias Studer at the LaCOSA conference, and he convinced me that it was easy to code, and that all the information required was in the paper. This turned out to be the case, and I have written an initial Stata implementation.
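To show why it is easy to code, here is a minimal Python sketch (not the Stata implementation, nor TraMineR’s) of my reading of the definitions in Studer et al. (2011): for n cases with distances d(i,j), the sum of squares is SS = (1/n) Σ_{i<j} d(i,j) and the discrepancy is s² = SS/n. Splitting cases into m groups gives a within-group SS_W, a pseudo-R² = 1 - SS_W/SS_T and a pseudo-F = [(SS_T - SS_W)/(m - 1)] / [SS_W/(n - m)], whose significance is assessed by permuting group labels. Function names are hypothetical.

```python
import random
from collections import defaultdict


def sum_of_squares(d, idx):
    """SS = (1/n) * sum over pairs i < j of d[i][j], for the cases in idx."""
    n = len(idx)
    total = sum(d[idx[a]][idx[b]] for a in range(n) for b in range(a + 1, n))
    return total / n


def discrepancy(d):
    """s^2 = SS / n; equals the variance when d holds squared Euclidean distances."""
    n = len(d)
    return sum_of_squares(d, list(range(n))) / n


def pseudo_anova(d, groups):
    """Return (pseudo-R^2, pseudo-F) for a grouping of the cases.
    Raises ZeroDivisionError if all within-group distances are zero."""
    n = len(d)
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    m = len(members)
    ss_t = sum_of_squares(d, list(range(n)))
    ss_w = sum(sum_of_squares(d, idx) for idx in members.values())
    ss_b = ss_t - ss_w
    return ss_b / ss_t, (ss_b / (m - 1)) / (ss_w / (n - m))


def permutation_p(d, groups, n_perm=999, seed=0):
    """Share of random relabellings with pseudo-F at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_anova(d, groups)[1]
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_anova(d, labels)[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

As a sanity check of the variance analogy: with the values 1, 2, 4 and 7 and squared differences as distances, the discrepancy is 5.25, the ordinary (population) variance, and grouping {1, 2} against {4, 7} gives a pseudo-R² of 16/21 and a pseudo-F of 6.4, matching a one-way ANOVA.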

Cluster analysis is unstable, we knew that!

April 23, 2012 | sequence_analysis, Stata | brendan

Experience tells me that small changes in the data can lead to substantial changes in the solution of a cluster analysis. This is especially true when the space is sparsely populated, as is the case with sequence analysis of life courses. Likewise, small changes in parameterisation (e.g., substitution costs) can lead to substantial differences in the cluster solution.

However, recently I came across an extreme case of sensitivity.
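One way to put a number on how much a cluster solution changes after a small perturbation of the data or the substitution costs is to compare the two partitions with the adjusted Rand index. This is a swapped-in, standard technique, not necessarily the one used in the post; the sketch below is a minimal pure-Python version with a hypothetical function name. A value of 1 means the two solutions are identical up to relabelling of clusters, and values near 0 mean agreement no better than chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same cases."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same cases")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of cases placed together in both partitions, and in each one.
    pairs_both = sum(comb(k, 2) for k in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(k, 2) for k in Counter(labels_a).values())
    pairs_b = sum(comb(k, 2) for k in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Only happens when both partitions are the same trivial partition
        # (everything in one cluster, or every case in its own cluster).
        return 1.0
    return (pairs_both - expected) / (maximum - expected)
```

Relabelling alone does not matter: [0, 0, 1, 1] against [1, 1, 0, 0] scores 1, whereas [0, 0, 1, 1] against [0, 1, 0, 1], where the clusters are completely reshuffled, scores -0.5.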


Sociology, Statistics and Software

Thoughts on computers, data analysis and the social sciences