I have a Stata program, metricp, which takes a distance matrix (square, symmetric, zero diagonal) and tests for the triangle inequality (that for all i, j, there is no k such that d[i,j] > d[i,k] + d[j,k]).
I needed something similar today in R, and found Matthew Vavrek’s fossil package, which includes a tri.ineq() function. However, it turned out to be very slow on the large matrix I threw at it, so I decided to speed it up with some of the techniques in metricp.ado.
This is the original tri.ineq():
`tri.ineq` <-
function(dist)
{
mat<-as.matrix(dist)
n<-dim(mat)[1]
ineq <- 0
for (i in 1:(n-2)) {
for (j in (i+1):(n-1)) {
for (k in (j+1):n) {
sds<-c(mat[j,i], mat[k,i], mat[k,j])
lng<-max(sds)
if (lng>(sum(sds)-lng)) ineq<-ineq+1
}
}
}
return(ineq==0)
}
It’s sensible code, but it uses three nested loops (for each i and j, look for an infringing k) and it doesn’t stop once it has found one. My code instead takes each i and j and adds the vector of distances to all ks and takes its minimum, and stops once it has found a minimum number of infringing ks.
My version:
`tri.ineq2` <-
function(dist)
{
mat<-as.matrix(dist)
n<-dim(mat)[1]
ineq <- 0
for (i in 1:(n-1)) {
for (j in (i+1):n) {
if (mat[i,j] > min(mat[i,]+mat[j,])) {
ineq <- ineq+1
}
if (ineq>=10) break
}
if (ineq>=10) break
}
return(ineq==0)
}
The term mat[i,]+mat[j,] represents a vector if distances i to k to j for all ks, so taking its minimum gets the distance for the “worst” k, if any infringes the triangle inequality. Breaking on ineq>=10 avoids the waste of time of checking every infringement; it could make sense to break after a single infringement (my metricp.ado command offers options for reporting the infringements).
Testing with a 1151 times 1151 matrix shows its about 20 times as fast (when the matrix is metric):
> system.time(tri.ineq(jm3))
user system elapsed
545.120 0.000 545.182
> source("/tmp/tri.ineq2.R")
> system.time(tri.ineq2(jm3))
user system elapsed
24.248 0.012 24.264
I left a note on Matthew’s blog but it didn’t format properly, hence this short note.
metricp.ado is part of SADI. Do ssc install sadi in Stata to install it.
Really interesting. I was searching for ways to speed up a script I am using, and this might do the trick. At least partially. Thank you!