We want to test whether the parameter \(\beta_{1}\) is significant; the test statistic equals t = \(b_{1}\)/se(\(b_{1}\)).

The data are n = 30 observations on driver age and the maximum distance (feet) at which individuals can read a highway sign (Sign Distance data).

(Data source: Mind On Statistics, 3rd edition, Utts and Heckard)

The plot below gives a scatterplot of the highway sign data along with the least squares regression line.


Here is the accompanying Minitab output, which is found by performing Stat >> Regression >> Regression on the highway sign data.

Regression Analysis: Distance, Age

Coefficients

Predictor      Coef   SE Coef   T-Value   P-Value
Constant     576.68     23.47     24.57     0.000
Age         -3.0068    0.4243     -7.09     0.000

Regression Equation

Distance = 577 - 3.01 Age
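The fitted equation can be evaluated at any age. Here is a minimal Python sketch that does this. The function name `predicted_distance` and the ages 40 and 60 are hypothetical choices for illustration. The coefficients are the unrounded values from the Coef column, not the rounded 577 and 3.01.

```python
def predicted_distance(age, b0=576.68, b1=-3.0068):
    """Estimated mean sign-reading distance (feet) for drivers of a given age.

    b0 and b1 are the Coef values from the Minitab output above.
    """
    return b0 + b1 * age


for age in (40, 60):
    print(f"age {age}: about {predicted_distance(age):.1f} feet")

# A 20-year difference in age changes the estimate by 20 * b1 (about -60 feet).
print(predicted_distance(60) - predicted_distance(40))
```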

Hypothesis Test for the Intercept (\(\beta_{0}\))

This test is rarely of interest, but it does come up when one is considering a regression through the origin (which we touched on earlier in this lesson). In the Minitab output above, the row labeled Constant gives the information used to make inferences about the intercept. The null and alternative hypotheses for a hypothesis test about the intercept are written as:

\(H_{0} \colon \beta_{0} = 0\)
\(H_{A} \colon \beta_{0} \ne 0\)

In other words, the null hypothesis is testing if the population intercept is equal to 0 versus the alternative hypothesis that the population intercept is not equal to 0. In most problems, we are not particularly interested in hypotheses about the intercept. For instance, in our example, the intercept is the mean distance when the age is 0, a meaningless age. Also, the intercept does not give information about how the value of y changes when the value of x changes. Nevertheless, to test whether the population intercept is 0, the information from the Minitab output is used as follows:

  1. The sample intercept is \(b_{0}\) = 576.68, the value under Coef.
  2. The standard error (SE) of the sample intercept, written as se(\(b_{0}\)), is se(\(b_{0}\)) = 23.47, the value under SE Coef. The SE of any statistic is a measure of its accuracy. In this case, the SE of \(b_{0}\) gives, very roughly, the average difference between the sample \(b_{0}\) and the true population intercept \(\beta_{0}\), for random samples of this size (and with these x-values).
  3. The test statistic is t = \(b_{0}\)/se(\(b_{0}\)) = 576.68/23.47 = 24.57, the value under T-Value.
  4. The p-value for the test is p = 0.000, given under P-Value. Minitab rounds to three decimal places, so the actual p-value is very small but not exactly 0.
  5. At the 0.05 significance level, we reject the null hypothesis because p < 0.05. Thus, we conclude that there is statistically significant evidence that the population intercept is not equal to 0.

So how exactly is the p-value found? For simple regression, the p-value is determined using a t distribution with n − 2 degrees of freedom (df), written as \(t_{n-2}\). It is calculated as 2 × (area past |t|) under a \(t_{n-2}\) curve. In this example, df = 30 − 2 = 28. The p-value region is the type of region shown in the figure below. The negative and positive versions of the calculated t provide the interior boundaries of the two shaded regions. As |t| increases, the p-value (the total area in the shaded regions) decreases.

Figure: a t curve with the tails beyond −|t| and |t| shaded; the p-value is 2 × the area to the right of \(\mid t \mid\).
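The p-value calculation can be sketched in Python without Minitab. This is a minimal, standard-library-only sketch under our own assumptions. It uses the identity \(P(|T| > |t|) = I_{df/(df + t^{2})}(df/2,\ 1/2)\), where I is the regularized incomplete beta function. That function is evaluated here with a standard continued-fraction (modified Lentz) method. The helper names are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.t.sf` would normally be used.

```python
import math


def _nonzero(v, tiny=1e-300):
    # Guard against division by zero inside the continued fraction.
    return tiny if abs(v) < tiny else v


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """2 x (area to the right of |t|) under a t curve with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


# Intercept test from the Minitab output: t = 24.57 with df = 30 - 2 = 28.
print(f"p-value for t = 24.57: {two_sided_p(24.57, 28):.2e}")

# Larger |t| gives a smaller p-value.
for t in (1.0, 2.0, 3.0):
    print(t, round(two_sided_p(t, 28), 4))
```

For a large |t| like 24.57, the direct branch is used and nothing is subtracted from 1. As a result, the tiny p-value is not lost to rounding.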

Hypothesis Test for the Slope (\(\beta_{1}\))

This test can be used to test whether or not x and y are linearly related. The row pertaining to the variable Age in the Minitab output from earlier gives the information used to make inferences about the slope. The slope directly tells us about the link between the mean y and x. When the true population slope does not equal 0, the variables y and x are linearly related. When the slope is 0, there is no linear relationship because the mean y does not change when the value of x is changed. The null and alternative hypotheses for a hypothesis test about the slope are written as:

\(H_{0} \colon \beta_{1} = 0\)
\(H_{A} \colon \beta_{1} \ne 0\)

In other words, the null hypothesis is testing if the population slope is equal to 0 versus the alternative hypothesis that the population slope is not equal to 0. To test whether the population slope is 0, the information from the Minitab output is used as follows:

  1. The sample slope is \(b_{1}\) = −3.0068, the value under Coef in the Age row of the output.
  2. The SE of the sample slope, written as se(\(b_{1}\)), is se(\(b_{1}\)) = 0.4243, the value under SE Coef. Again, the SE of any statistic is a measure of its accuracy. In this case, the SE of \(b_{1}\) gives, very roughly, the average difference between the sample \(b_{1}\) and the true population slope \(\beta_{1}\), for random samples of this size (and with these x-values).
  3. The test statistic is t = \(b_{1}\)/se(\(b_{1}\)) = −3.0068/0.4243 = −7.09, the value under T-Value.
  4. The p-value for the test is p = 0.000 and is given under P-Value.
  5. At the 0.05 significance level, we reject the null hypothesis because p < 0.05. Thus, we conclude that there is statistically significant evidence that Distance and Age are linearly related.

As before, the p-value is the region illustrated in the figure above.
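The same decision can also be phrased with a critical value. At the 0.05 level, rejecting when p < 0.05 is equivalent to rejecting when |t| exceeds the 95% multiplier t* for 28 df. That multiplier is 2.05, the value used later in this lesson for the 95% confidence interval. The minimal sketch below applies this rule; the function name `slope_t_test` is hypothetical.

```python
def slope_t_test(b1, se_b1, t_crit):
    """Two-sided test of H0: beta1 = 0 versus HA: beta1 != 0.

    Rejecting when |t| > t_crit is equivalent to rejecting when p < alpha,
    provided t_crit is the t* multiplier for confidence level 1 - alpha.
    """
    t = b1 / se_b1  # matches the T-Value column
    return t, abs(t) > t_crit


# Age row of the Minitab output; t_crit = 2.05 is the 95% multiplier for df = 28.
t, reject = slope_t_test(-3.0068, 0.4243, t_crit=2.05)
print(f"t = {t:.2f}, reject H0: {reject}")
```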

Confidence Interval for the Slope (\(\beta_{1}\))

A confidence interval for the unknown value of the population slope \(\beta_{1}\) can be computed as

sample statistic ± multiplier × standard error of statistic

→ \(b_{1}\) ± t* × se(\(b_{1}\))

To find the t* multiplier, you can do one of the following:

  1. In simple regression, the t* multiplier is determined using a \(t_{n−2}\) distribution. The value of t* is such that the confidence level is the area (probability) between −t* and +t* under the t-curve.
  2. A table such as the one in the textbook can be used to look up the multiplier.
  3. Alternatively, software like Minitab can be used.
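The first option above can be sketched in Python by bisection. That option finds t* so that the area between −t* and +t* equals the confidence level. Because the two-sided tail area shrinks as t grows, we search for the t at which that area equals 1 − confidence. This is a minimal, standard-library-only sketch under our own assumptions, and the helper names are hypothetical. The tail area comes from the identity \(P(|T| > t) = I_{df/(df + t^{2})}(df/2,\ 1/2)\). A package routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math


def _nonzero(v, tiny=1e-300):
    return tiny if abs(v) < tiny else v


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even term
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Area outside [-|t|, |t|] under a t curve with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_multiplier(conf, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Find t* so that the area between -t* and +t* equals conf (bisection)."""
    alpha = 1.0 - conf
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if two_sided_p(mid, df) > alpha:  # tails still too large: t* is further out
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for level in (0.95, 0.99):
    print(f"{level:.0%}: t* = {t_multiplier(level, 28):.3f}")
```

Rounded to two decimals, these are the multipliers used below for df = 28.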

95% Confidence Interval

In our example, n = 30 and df = n − 2 = 28. For 95% confidence, t* = 2.05. A 95% confidence interval for \(\beta_{1}\), the true population slope, is:

−3.0068 ± (2.05 × 0.4243)
−3.0068 ± 0.870
or about −3.88 to −2.14.

Interpretation: With 95% confidence, we can say the mean sign reading distance decreases somewhere between 2.14 and 3.88 feet for each one-year increase in age. It is incorrect to say that with 95% probability the mean sign reading distance decreases somewhere between 2.14 and 3.88 feet for each one-year increase in age. Make sure you understand why!!!
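The interval arithmetic above can be checked with a short Python sketch. `confidence_interval` is a hypothetical helper name, and t* = 2.05 is the rounded multiplier from the text. The whole interval lies below 0, which agrees with the slope test's rejection of \(H_{0} \colon \beta_{1} = 0\) at the 0.05 level.

```python
def confidence_interval(estimate, se, t_star):
    """Return (lower, upper) for estimate ± t* × se."""
    margin = t_star * se
    return estimate - margin, estimate + margin


# Slope row: b1 = -3.0068, se(b1) = 0.4243; 95% multiplier for df = 28.
lower, upper = confidence_interval(-3.0068, 0.4243, t_star=2.05)
print(f"95% CI for beta1: {lower:.2f} to {upper:.2f}")
print("interval excludes 0:", upper < 0 or lower > 0)
```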

99% Confidence Interval

For 99% confidence, t* = 2.76. A 99% confidence interval for \(\beta_{1}\), the true population slope, is:

−3.0068 ± (2.76 × 0.4243)
−3.0068 ± 1.1711
or about −4.18 to −1.84.

Interpretation: With 99% confidence, we can say the mean sign reading distance decreases somewhere between 1.84 and 4.18 feet for each one-year increase in age. Notice that as we increase our confidence, the interval becomes wider. As the confidence level approaches 100%, the multiplier t* grows without bound, so the interval widens toward the whole real line.
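Since the interval is \(b_{1}\) ± t* × se(\(b_{1}\)), its width is 2 × t* × se(\(b_{1}\)). The width therefore grows in direct proportion to t*. Here is a short sketch comparing the 95% and 99% intervals, using the rounded multipliers quoted in the text:

```python
se_b1 = 0.4243
t_stars = {0.95: 2.05, 0.99: 2.76}  # multipliers for df = 28, as given in the text

# Width of b1 ± t* × se(b1) is 2 × t* × se(b1).
widths = {level: 2 * t_star * se_b1 for level, t_star in t_stars.items()}
for level, width in widths.items():
    print(f"{level:.0%} interval width: {width:.3f} feet per year")

print(f"99% interval is {widths[0.99] / widths[0.95]:.2f} times as wide")
```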

As a final note, the above procedures can be used to calculate a confidence interval for the population intercept. Just use \(b_{0}\) (and its standard error) rather than \(b_{1}\).
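For instance, here is a minimal sketch of a 95% interval for \(\beta_{0}\). It uses the Constant row of the Minitab output (\(b_{0}\) = 576.68, se(\(b_{0}\)) = 23.47) with the same 95% multiplier t* = 2.05 for df = 28. The helper name is hypothetical. As noted earlier, the intercept refers to age 0, so this interval has little practical meaning here.

```python
def confidence_interval(estimate, se, t_star):
    """Return (lower, upper) for estimate ± t* × se."""
    margin = t_star * se
    return estimate - margin, estimate + margin


# Constant row of the Minitab output; 95% multiplier for df = 28.
lower, upper = confidence_interval(576.68, 23.47, t_star=2.05)
print(f"95% CI for beta0: {lower:.1f} to {upper:.1f}")
```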