Suppressor: the Midas Touch
Can adding an explanatory variable make previously insignificant ones significant?
You are doing regression analysis with lots of variables. You find one of them has a huge p-value. “Drop it,” a voice screams.
Not so fast. In this post, I show that there exists a certain category of explanatory variables (formally known as suppressors) whose inclusion increases the explanatory power of other existing variables, so much so that insignificant ones might end up being significant.
Find $x_1$, $x_2$ and $y$ such that $x_1$ is insignificant according to the specification:

$$y = \beta_0 + \beta_1 x_1 + \epsilon$$
but becomes significant in the presence of $x_2$:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
By definition, $x_2$ is the suppressor we are looking for.
A Stylized Example
One possible way to construct such a case is the following simulation:
```r
n <- 100
x1 <- rnorm(n, 0, 0.01)      # random variable drawn from a normal distribution
x2 <- runif(n, 0, 10)        # random variable drawn from a uniform distribution
epsilon <- rnorm(n, 0, 0.001)
y <- 3 + 1 * x1 + 1 * x2 + epsilon
```
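The two specifications can then be fit with `lm()` and compared via their coefficient tables. A minimal sketch (the `set.seed()` call is my addition for reproducibility; the original snippet uses none):

```r
set.seed(42)  # added for reproducibility; not in the original snippet
n <- 100
x1 <- rnorm(n, 0, 0.01)
x2 <- runif(n, 0, 10)
epsilon <- rnorm(n, 0, 0.001)
y <- 3 + 1 * x1 + 1 * x2 + epsilon

fit_short <- lm(y ~ x1)       # specification without the suppressor
fit_full  <- lm(y ~ x1 + x2)  # specification including x2

# x1's p-value under each specification
p_short <- summary(fit_short)$coefficients["x1", "Pr(>|t|)"]
p_full  <- summary(fit_full)$coefficients["x1", "Pr(>|t|)"]
```

Because including `x2` shrinks the residual standard deviation from roughly the scale of `x2` down to that of `epsilon`, `p_full` comes out many orders of magnitude smaller than `p_short`.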
Not surprisingly, $x_1$ by itself has negligible explanatory power:
| y ~ x1 | Estimate | Std. Error | t value | p-value |
| --- | --- | --- | --- | --- |
However, things change dramatically once we bring $x_2$ into the equation:
| y ~ x1 + x2 | Estimate | Std. Error | t value | p-value |
| --- | --- | --- | --- | --- |
The real reason behind this seemingly weird example is that $x_2$ completely outshines $x_1$ in its ability to explain the variation in $y$. In the absence of $x_2$, $x_1$'s usefulness is simply drowned out by the huge residuals.
Conversely, after controlling for $x_2$ (which already does an excellent job of explaining $y$), the small but non-zero remainder of $y$ is almost entirely driven by $x_1$.
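This partialling-out intuition can be checked directly. A sketch, regenerating the simulated data with an assumed seed: strip out the part of $y$ that $x_2$ explains, and the leftover variation lines up almost perfectly with $x_1$.

```r
set.seed(42)  # assumed seed, for reproducibility
n <- 100
x1 <- rnorm(n, 0, 0.01)
x2 <- runif(n, 0, 10)
epsilon <- rnorm(n, 0, 0.001)
y <- 3 + 1 * x1 + 1 * x2 + epsilon

# Remove the component of y explained by x2 ...
leftover <- residuals(lm(y ~ x2))

# ... and the small remainder correlates almost perfectly with x1
cor(leftover, x1)
```

Since `x1` and `x2` are drawn independently, the residuals of `y ~ x2` are essentially `x1` plus the tiny `epsilon` noise, so the correlation is close to 1.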
The existence of suppressors calls for caution when we delete explanatory variables solely on the basis of their insignificance, although it can also be argued that omitting such an important regressor in the first place ($x_2$ in this case) is the greater sin.