{"id":496,"date":"2018-04-14T13:59:00","date_gmt":"2018-04-14T13:59:00","guid":{"rendered":"http:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=496"},"modified":"2018-04-20T16:51:19","modified_gmt":"2018-04-20T16:51:19","slug":"handling-dyadic-data-in-stata","status":"publish","type":"post","link":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=496","title":{"rendered":"Handling dyadic data in Stata"},"content":{"rendered":"<div id=\"outline-container-org71af7ba\" class=\"outline-2\">\n<h2 id=\"org71af7ba\">Processing dyads in Stata<\/h2>\n<div id=\"text-1\" class=\"outline-text-2\">\n<p>Sometimes when you are working with nested data (such as household surveys, with data on all individuals in the household), analysis focuses on dyads (such as spouse pairs) rather than individual cases. This means you need to link data in one observation with that in another. As long as the data includes information in ego&#8217;s record about where alter&#8217;s record is (e.g., by holding alter&#8217;s ID as a variable), the simplest way to do this is to create a separate data file, where the alter ID variable is renamed to ID, and the substantive variables are also renamed, and to match it back in to the original data. This is not terribly difficult, but it is messy, so I present here a more convenient method.<br \/>\n<!--more--><\/p>\n<p>First, an example using the standard approach, and the wave 18 BHPS. The BHPS is a household survey where each record represents an individual, and in theory each adult member of the household is surveyed. Each individual has a unique ID, <code>pid<\/code>. 
For individuals whose spouse is in the survey (and therefore probably in the data set), their spouse&#8217;s ID is stored in <code>osppid<\/code>.<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-Stata\"><span style=\"color: #1c86ee;\">use<\/span> osppid osex ojbstat <span style=\"color: #6e8b3d;\">using<\/span> \/home\/data\/bhps\/oindresp\r\n<span style=\"color: #6e8b3d;\">tempfile<\/span><span style=\"color: #00688b;\"> spousedata<\/span>\r\n<span style=\"color: #1c86ee;\">keep<\/span> <span style=\"color: #6e8b3d;\">if<\/span> osppid!=0 <span style=\"color: #7f7f7f;\">\/\/ Drop cases where no spouse reported<\/span>\r\n<span style=\"color: #1c86ee;\">rename<\/span> (osppid osex ojbstat) (pid spsex spjbstat)\r\n<span style=\"color: #1c86ee;\">save<\/span> `<span style=\"color: #2e8b57;\">spousedata<\/span>', <span style=\"color: #1c86ee;\">replace<\/span>\r\n<span style=\"color: #1c86ee;\">use<\/span> pid osppid osex ojbstat <span style=\"color: #6e8b3d;\">using<\/span> \/home\/data\/bhps\/oindresp\r\n<span style=\"color: #1c86ee;\">merge<\/span> 1:1 pid <span style=\"color: #6e8b3d;\">using<\/span> `<span style=\"color: #2e8b57;\">spousedata<\/span>'\r\n<span style=\"color: #1c86ee;\">keep<\/span> <span style=\"color: #6e8b3d;\">if<\/span> _<span style=\"color: #1c86ee;\">merge<\/span>!=2 <span style=\"color: #7f7f7f;\">\/\/ Drop people reported as alters who are not present as egos<\/span>\r\n<\/pre>\n<\/div>\n<p>This code first loads alter-ID and two substantive variables, renames them (renaming alter-ID to the same name as ego-ID), and saves to a temporary file. 
The file thus contains information about ego keyed to alter&#8217;s ID. Seen from alter&#8217;s point of view, it holds information about alter&#8217;s alter, keyed on alter&#8217;s own ID. For spouse pairs the relationship is symmetric, but in general the direction is reversed: if ego is the parent and alter the child, this file contains information about the individual&#8217;s parent. The code then loads ego-ID, alter-ID and the substantive variables again, and does a merge. Finally, it drops cases which are present only in the alter file: these are people who are reported as someone&#8217;s spouse but have no record of their own in the data, typically because of non-response.<\/p>\n<p>Here we see the result, crosstabulating ego and alter sex: nearly (but not quite) everyone reports a heterosexual relationship:<\/p>\n<pre class=\"example\">. tab osex spsex\r\n\r\n                   |         sex \r\n              sex  |      male     female |     Total\r\n-------------------+----------------------+----------\r\n              male |        40      4,513 |     4,553 \r\n            female |     4,513         26 |     4,539 \r\n-------------------+----------------------+----------\r\n             Total |     4,553      4,539 |     9,092 \r\n\r\n\r\n\r\n<\/pre>\n<p>My alternative uses a custom program to find the row number of alter&#8217;s record, and is more concise:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-Stata\"><span style=\"color: #1c86ee;\">use<\/span> pid osppid osex ojbstat <span style=\"color: #6e8b3d;\">using<\/span> \/home\/data\/bhps\/oindresp, <span style=\"color: #1c86ee;\">clear<\/span>\r\n\r\ndyadid pid osppid, <span style=\"color: #1c86ee;\">gen<\/span>(idx)\r\n<span style=\"color: #1c86ee;\">gen<\/span> spsex2 = osex[idx]\r\n<span style=\"color: #1c86ee;\">gen<\/span> spjbstat = ojbstat[idx]\r\n<\/pre>\n<\/div>\n<p>The results are identical, apart from the value labels, which the newly generated variable does not inherit.<\/p>\n<pre class=\"example\">. 
tab osex spsex2\r\n\r\n                   |        spsex2\r\n              sex  |         1          2 |     Total\r\n-------------------+----------------------+----------\r\n              male |        40      4,513 |     4,553 \r\n            female |     4,513         26 |     4,539 \r\n-------------------+----------------------+----------\r\n             Total |     4,553      4,539 |     9,092 \r\n\r\n\r\n\r\n<\/pre>\n<\/div>\n<\/div>\n<div id=\"outline-container-org2a52b01\" class=\"outline-2\">\n<h2 id=\"org2a52b01\">The program<\/h2>\n<div id=\"text-2\" class=\"outline-text-2\">\n<p>In the example the main work is obscured, as it takes place in the <code>dyadid<\/code> command. This command uses Mata&#8217;s associative arrays to create a new variable, which is the case number of the spouse record. Effectively, the Mata code passes through the data twice, first creating in an <code>asarray<\/code> a record of the case number for each observed ego-ID, and then plugging in each alter-ID into the same array to pull out the corresponding case number.<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-Stata\">mata:\r\n\r\nreal matrix function dyadid (string idvar, string dyadidvar, string genvar) {\r\n  <span style=\"color: #6e8b3d;\">st<\/span>_view(id = ., ., (idvar))\r\n  <span style=\"color: #6e8b3d;\">st<\/span>_view(dyadid = ., ., (dyadidvar))\r\n  <span style=\"color: #6e8b3d;\">st<\/span>_view(<span style=\"color: #1c86ee;\">gen<\/span> = ., ., (genvar))\r\n\r\n  nobs = <span style=\"color: #6e8b3d;\">length<\/span>(dyadid)\r\n\r\n  altindex = asarray_create(<span style=\"color: #8b7355;\">\"real\"<\/span>)\r\n  <span style=\"color: #8b7355;\">\"Build AS-array\"<\/span>\r\n  <span style=\"color: #6e8b3d;\">for<\/span> (i=1; i&lt;=nobs; i++) {\r\n    asarray(altindex,id[i],i)\r\n  }\r\n  <span style=\"color: #8b7355;\">\"Read AS-array\"<\/span>\r\n  <span style=\"color: #6e8b3d;\">for<\/span> (i=1; i&lt;=nobs; i++) {\r\n    <span style=\"color: 
#6e8b3d;\">if<\/span> (asarray_contains(altindex,dyadid[i])) {\r\n      <span style=\"color: #1c86ee;\">gen<\/span>[i] = asarray(altindex,dyadid[i])\r\n    }\r\n    <span style=\"color: #6e8b3d;\">else<\/span> {\r\n      <span style=\"color: #1c86ee;\">gen<\/span>[i] = .\r\n    }\r\n  }\r\n  <span style=\"color: #8b7355;\">\"Done\"<\/span>\r\n}\r\n\r\n<span style=\"color: #1c86ee;\">end<\/span>\r\n\r\nprogram dyadid\r\nsyntax varlist(min=2 max=2), <span style=\"color: #1c86ee;\">gen<\/span>(string)\r\ntokenize `<span style=\"color: #2e8b57;\">varlist<\/span>'\r\n\r\n<span style=\"color: #7f7f7f;\">\/* <\/span><span style=\"color: #7f7f7f;\">\/\/ Check that alter-ID is unique if not missing <\/span><span style=\"color: #7f7f7f;\">*\/<\/span>\r\n<span style=\"color: #7f7f7f;\">\/* <\/span><span style=\"color: #7f7f7f;\">preserve <\/span><span style=\"color: #7f7f7f;\">*\/<\/span>\r\n<span style=\"color: #7f7f7f;\">\/* <\/span><span style=\"color: #7f7f7f;\">keep if !<\/span><span style=\"color: #6e8b3d;\">missing<\/span><span style=\"color: #7f7f7f;\">(`2') <\/span><span style=\"color: #7f7f7f;\">*\/<\/span>\r\n<span style=\"color: #7f7f7f;\">\/* <\/span><span style=\"color: #7f7f7f;\">isid `2' <\/span><span style=\"color: #7f7f7f;\">*\/<\/span>\r\n<span style=\"color: #7f7f7f;\">\/* <\/span><span style=\"color: #7f7f7f;\">restore <\/span><span style=\"color: #7f7f7f;\">*\/<\/span>\r\n\r\nqui <span style=\"color: #1c86ee;\">gen<\/span> `<span style=\"color: #2e8b57;\">gen<\/span>' = .\r\nmata dyadid(<span style=\"color: #8b7355;\">\"`1'\"<\/span>, <span style=\"color: #8b7355;\">\"`2'\"<\/span>, <span style=\"color: #8b7355;\">\"`<\/span><span style=\"color: #2e8b57;\">gen<\/span><span style=\"color: #8b7355;\">'\"<\/span>)\r\n<span style=\"color: #1c86ee;\">end<\/span>\r\n\r\n<span style=\"color: #7f7f7f;\">\/*<\/span>\r\n\r\n<span style=\"color: #7f7f7f;\">With dyadic data, given ID (not necessarily unique) and alter-ID<\/span>\r\n<span style=\"color: 
#7f7f7f;\">(unique, but potentially missing), where alter-ID is the ID of the<\/span>\r\n<span style=\"color: #7f7f7f;\">partner, generate an index variable which is the row number of the<\/span>\r\n<span style=\"color: #7f7f7f;\">partner's record<\/span>\r\n\r\n<span style=\"color: #7f7f7f;\">. dyadid id spid, gen(idx)<\/span>\r\n<span style=\"color: #7f7f7f;\">. gen spempstat = empstat[idx]<\/span>\r\n\r\n<span style=\"color: #7f7f7f;\">*\/<\/span>\r\n<\/pre>\n<\/div>\n<p>The syntax is<\/p>\n<pre class=\"example\">dyadid egoID alterID, gen(indexvar)\r\n<\/pre>\n<p>The ego-ID does not need to be unique, but the alter-ID should be (though it can be missing). Duplicates will not provoke an error, but because the lookup in the code above is keyed on ego-ID, only the last record stored for a repeated ID will be found. Where there is no alter, or where alter&#8217;s ID is not present in the data as an ego-record, the index variable will be missing.<\/p>\n<p>To recap, the sort of data this is intended for includes records for both ego and alter, keyed on an ID variable, and linked by a variable that contains alter&#8217;s ID. We link from ego to alter by finding the case number of the record whose ID matches the value of ego&#8217;s alter-ID variable.<\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-org625dd93\" class=\"outline-2\">\n<h2 id=\"org625dd93\">Implications for SADI<\/h2>\n<div id=\"text-3\" class=\"outline-text-2\">\n<p>I plan to extend some of my SADI sequence distance measures to use this mechanism to create dyadic distance variables, rather than square pairwise matrices. This should be much more efficient with large data sets when only dyadic distances are needed. Let me know if this interests you.<\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-org7ffae9c\" class=\"outline-2\">\n<h2 id=\"org7ffae9c\">Installation<\/h2>\n<div id=\"text-4\" class=\"outline-text-2\">\n<p>The code is available on SSC:<\/p>\n<pre class=\"example\">. ssc describe dyadid\r\n. ssc install dyadid\r\n. 
ssc get dyadid\r\n<\/pre>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Processing dyads in Stata Sometimes when you are working with nested data (such as household surveys, with data on all individuals in the household), analysis focuses on dyads (such as spouse pairs) rather than individual cases. This means you need to link data in one observation with that in another. As long as the data &hellip; <a href=\"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=496\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Handling dyadic data in Stata<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/496"}],"collection":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=496"}],"version-history":[{"count":7,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/496\/revisions"}],"predecessor-version":[{"id":504,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/496\/revisions\/504"}],"wp:attachment":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=496"}],"wp:term":[{"taxonomy":"category","embeddable":t
rue,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=496"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=496"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}