{"id":365,"date":"2016-01-17T21:59:42","date_gmt":"2016-01-17T21:59:42","guid":{"rendered":"http:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=365"},"modified":"2016-01-17T21:59:42","modified_gmt":"2016-01-17T21:59:42","slug":"pseudo-r2-is-pseudo","status":"publish","type":"post","link":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=365","title":{"rendered":"Pseudo-R2 is pseudo"},"content":{"rendered":"<p>People like the R<sup>2<\/sup> stat from linear regression so much that they re-invent it in places it doesn&#8217;t naturally arise, such as logistic regression. The true R<sup>2<\/sup> has nice clean interpretations, as the proportion of variation explained or the square of the correlation between observed and predicted values. The fake or pseudo-R<sup>2<\/sup> statistics are often based on relating the loglikelihood of the current model against that of the null model (intercept only) in some way. There is a good overview at <a href=\"http:\/\/www.ats.ucla.edu\/stat\/mult_pkg\/faq\/general\/Psuedo_RSquareds.htm\">UCLA<\/a>.<\/p>\n<p>One of the most popular pseudo-R<sup>2<\/sup> is McFadden&#8217;s. This is defined as 1 &#8211; LLm\/LL0 where LLm is the log-likelihood of the current model, and LL0 that of the null model. This appears to have the range 0-1 though 1 will never be reached in practice.<\/p>\n<p>It is well known that if we fit linear regressions by maximum-likelihood, we get exactly the same parameter estimates as if we fit by ordinary least squares. We can demonstrate this in Stata:<\/p>\n<p><code>. sysuse auto<br \/>\n. reg price headroom mpg<br \/>\n. glm price headroom mpg<\/code><\/p>\n<p>Since the ML estimation of the linear regression gives us loglikelihoods, we can calculate pseudo-R2 and true R2 for the same model. 
This code does it for a range of simple models with Stata&#8217;s demonstration &#8220;auto&#8221; data set:<\/p>\n<p><code><br \/>\nsysuse auto, clear<br \/>\n* rep78 has missing values: drop those cases so the null model and every later model use the same sample<br \/>\ndrop if missing(rep78)<br \/>\nglm price<br \/>\nlocal basell = e(ll)<br \/>\nlocal vars \"mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign\"<br \/>\nlocal rhs = \"\"<\/p>\n<p>gen r2 = .<br \/>\ngen mcf = .<\/p>\n<p>local i 0<br \/>\nforeach var in `vars' {<br \/>\n  local i = `i'+1<br \/>\n  local rhs = \"`rhs' `var'\"<br \/>\n  qui glm price `rhs'<br \/>\n  local mcfad =  1 - (e(ll)\/ `basell')<br \/>\n  qui reg price `rhs'<br \/>\n  di %6.3f `=e(r2)' %6.3f `mcfad' \" : `rhs'\"<br \/>\n  qui replace r2 = `=e(r2)' in `i'<br \/>\n  qui replace mcf = `mcfad' in `i'<br \/>\n}<\/p>\n<p>label var mcf \"McFadden Pseudo-R2\"<br \/>\nlabel var r2 \"R-squared\"<br \/>\nscatter mcf r2<br \/>\n<\/code><\/p>\n<p>This generates the following graph, in which we see that there is a monotonic but non-linear relationship between the two measures. We can also see very clearly that pseudo-R2 is always substantially lower than R2. Thus it should be clear that while it emulates R2 in spirit, it doesn&#8217;t actually approximate it. So when people talk about proportion of variation explained in a logistic regression, shoot them down.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/teaching.sociology.ul.ie\/bhalpin\/pseudoR2.png\" alt=\"pseudo-R2 vs R2\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>People like the R2 stat from linear regression so much that they re-invent it in places it doesn&#8217;t naturally arise, such as logistic regression. The true R2 has nice clean interpretations, as the proportion of variation explained or the square of the correlation between observed and predicted values. 
The fake or pseudo-R2 statistics are often &hellip; <a href=\"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/?p=365\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Pseudo-R2 is pseudo<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/365"}],"collection":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=365"}],"version-history":[{"count":9,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/365\/revisions"}],"predecessor-version":[{"id":375,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/365\/revisions\/375"}],"wp:attachment":[{"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=365"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=365"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/teaching.sociology.ul.ie\/bhalpin\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=365"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}