{"id":122,"date":"2011-04-03T10:51:10","date_gmt":"2011-04-03T10:51:10","guid":{"rendered":"http:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=122"},"modified":"2014-04-20T17:04:38","modified_gmt":"2014-04-20T17:04:38","slug":"stata-and-make","status":"publish","type":"post","link":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=122","title":{"rendered":"Stata and Make"},"content":{"rendered":"<p>J. Scott Long has written an interesting book on <a href=\"http:\/\/www.stata.com\/bookstore\/wdaus.html\">The Workflow of Data Analysis Using Stata<\/a>. It&#8217;s good stuff but I was disappointed to see he makes no mention of\u00a0<code>make<\/code> and Makefiles.<\/p>\n<p>What&#8217;s <code>make<\/code>? It is a simple and powerful way of describing projects, designed initially for building complex C programs on Unix, but capable of being adapted to many other uses. One is the data analysis workflow, where there are many steps between the raw data and the final paper.<\/p>\n<p>So I was pleased to see (from a Gary King tweet) that the <a href=\"http:\/\/ow.ly\/4rCFN\">current newsletter<\/a> of the Political Methodology section of the APSA is devoted to workflow matters, and it contains a mention of using <code>make<\/code> for managing data analysis projects in Fredrickson, Testa and Weidmann&#8217;s article (though in the context of R and LaTeX, rather than Stata).<\/p>\n<p><!--more-->I&#8217;ve been routinely using Makefiles to manage projects for perhaps 15 years, and they offer the same advantage in project-level replication as do-files do in analysis-level replication. 
I&#8217;m surprised to see almost no evidence of interest in <code>make<\/code> in Stata contexts (I can find only one <a href=\"http:\/\/www.kai-arzheimer.com\/blog\/2008\/03\/20\/how-stata-and-a-makefile-can-make-your-day\/\">blog post<\/a>, for instance).<\/p>\n<p>Makefiles describe &#8220;dependencies&#8221; between files, in the following structure:<\/p>\n<pre>target: dependency1 dependency2 ...\r\n&lt;tab&gt;rule<\/pre>\n<p>For instance, if <code>clean.dta<\/code> is created by running <code>cleandata.do<\/code>, which reads <code>raw.dat<\/code>, we can express the relationship thus (<code>&lt;tab&gt;<\/code> indicates a literal tab character):<\/p>\n<pre>clean.dta: cleandata.do raw.dat\r\n&lt;tab&gt;stata -b do cleandata.do<\/pre>\n<p>The command <code>make clean.dta<\/code> will then run the batch Stata command if <code>clean.dta<\/code> doesn&#8217;t exist or is older than either <code>cleandata.do<\/code> or <code>raw.dat<\/code>.<\/p>\n<p>A fuller example:<\/p>\n<pre>clean.dta: cleandata.do raw.dat\r\n&lt;tab&gt;stata -b do cleandata.do\r\n\r\nlookup.dta: preptab.do lookupdata.dat\r\n&lt;tab&gt;stata -b do preptab.do\r\n\r\nworkingdata.dta: lookup.dta clean.dta mergelookup.do\r\n&lt;tab&gt;stata -b do mergelookup.do\r\n\r\nfig1.eps: workingdata.dta drawfig1.do\r\n&lt;tab&gt;stata -b do drawfig1.do\r\n\r\nfig1.pdf: fig1.eps\r\n&lt;tab&gt;epstopdf fig1.eps\r\n\r\npaper.pdf: paper.tex fig1.pdf\r\n&lt;tab&gt;pdflatex paper<\/pre>\n<p>The command <code>make paper.pdf<\/code> will run all the commands necessary to create the final PDF, rebuilding only those targets in the dependency tree that do not exist or are older than the files they depend on. 
<code>make<\/code> is a boon when you have to do complex data manipulation, but it can also facilitate the generation of deliverables such as papers and reports.<\/p>\n<p>Stata, however, has one serious shortcoming from <code>make<\/code>&#8217;s point of view: if the do-file fails to create the target <code>.dta<\/code> file, it will still complete with a zero exit status. <code>make<\/code> looks for a non-zero exit status to indicate failure, in which case it won&#8217;t run the now-futile subsequent commands. To accommodate this I have a wrapper program that greps the log file for error messages and manufactures the appropriate exit status (it does a number of other useful things as well, such as timing the job and running it at a lower priority). If the script is called <code>stb<\/code>, replace the <code>stata -b do<\/code> command in each rule of the Makefile with <code>stb <em>dofilename<\/em><\/code>.<\/p>\n<pre>#!\/bin\/bash\r\n\r\n# Nov  7 2001 21:05:17\r\n# A wrapper for running Stata in batch 
mode.\r\n\r\n# Main purpose is to catch errors and pass them to the calling\r\n# process, typically \"make\". To do this it catches a couple of\r\n# typical problems with the do-file not existing etc, and otherwise\r\n# runs Stata (under nice), directing the output to $1.log. It then\r\n# greps the log file for error messages, and returns an error if it\r\n# finds them. grep should provide enough context that you can see\r\n# the error message on stdout as well.\r\n\r\n# It additionally appends information about wall and cpu time to\r\n# the logfile, along with a time stamp.\r\n\r\nprogname=`basename 
$0`\r\n\r\n# Strip the .do if it is there, Stata ignores it\r\nstatacode=${1%.do}\r\n\r\n# Test for do-file in another directory -- Stata logs to current\r\n# directory in either case, so direct extra log-lines to correct\r\n# location\r\nstatacodestripdir=${statacode##*\/}\r\nstatalog=$statacodestripdir.log\r\nif [ \"$statacode\" != \"$statacodestripdir\" ]; then\r\n echo \"$progname: Note: do-file may not be in current directory, but log-file 
is\";\r\nfi\r\n\r\necho \"$progname: Running Stata on $statacode...\"\r\n\r\nif [ -z \"$2\" ]; then\r\n arg2=\"-m200\";\r\nelse arg2=$2;\r\nfi\r\n# Test for the existence of the do-file\r\nif [ -r \"${statacode}.do\" ]; then\r\n echo \"$progname: Starting: `date`\" &gt;&gt; \"$statalog\"\r\n\r\nnice time -f \"$progname: Elapsed: %E; System: %S; User: %U; Major PFs: %F\" \\\r\n stata $arg2 -b do \"$statacode\" 2&gt; \/tmp\/stb$$timelog\r\n exitcode=$?\r\n cat \/tmp\/stb$$timelog\r\n cat \/tmp\/stb$$timelog &gt;&gt; \"$statalog\"\r\n echo \"$progname: 
Finished: `date`\" &gt;&gt; \"$statalog\"\r\n rm -f \/tmp\/stb$$timelog\r\n if [ \"$exitcode\" != \"0\" ]; then\r\n echo \"Stata exiting with exit code $exitcode\" &amp;&amp; exit $exitcode;\r\n fi\r\n # Stata writes its log to the current directory, so search $statalog\r\n if grep -E --before-context=1 \"^r\\([0-9]+\\)\" \"$statalog\"; then\r\n echo \"$progname: Stata errors found in $statacode.do\";\r\n exit 1;\r\n else echo \"$progname: No Stata errors found\";\r\n fi\r\nelse\r\n echo \"$progname: ${statacode}.do does not exist\";\r\n exit 1;\r\nfi<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>J. Scott Long has written an interesting book on The Workflow of Data Analysis Using Stata. It&#8217;s good stuff but I was disappointed to see he makes no mention of\u00a0make and Makefiles. What&#8217;s make? It is a simple and powerful way of describing projects, designed initially for building complex C programs on Unix, but capable of &hellip; <a href=\"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=122\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Stata and Make<\/span> <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"_links":{"self":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/122"}],"collection":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=122"}],"version-history":[{"count":5,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/122\/revisions"}],"predecessor-version":[{"id":300,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/122\/revisions\/300"}],"wp:attachment":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}