## Ann E. Watkins, Richard L. Scheaffer, George W. Cobb

## Chapter 3

## Relationships Between Two Quantitative Variables


Chapter Questions

For each of the lettered scatterplots in Display 3.6, give the trend (positive or negative), strength (strong, moderate, or weak), and shape (linear or curved). Which plots show varying strength?

Display 3.6 can't copy Eight scatterplots with various distributions.

James Kiss

Numerade Educator

For each set of cases and variables, tell whether you expect the relationship to be (i) positive or negative and (ii) strong, moderate, or weak.

$$
\begin{array}{|l|l|l|}
\hline \text{Cases} & \text{Variable 1} & \text{Variable 2} \\
\hline \text{a. Hens' eggs} & \text{Length} & \text{Width (diameter of cross-section)} \\
\hline \text{b. High school seniors} & \text{SAT I math score} & \text{SAT I critical reading score} \\
\hline \text{c. Trees} & \text{Age} & \text{Number of rings} \\
\hline \text{d. People} & \text{Age} & \text{Body flexibility} \\
\hline \text{e. U.S. states} & \text{Population} & \text{Number of representatives in Congress} \\
\hline \text{f. Countries of the United Nations} & \text{Land area} & \text{Population} \\
\hline \text{g. Olympic games} & \text{Year} & \text{Winning time in the women's 100-meter race} \\
\hline
\end{array}
$$

Figure can't copy

LaTasha Colander crosses the finish line of the women’s 100-meter dash final at the 2004 U.S. Olympic Team Track and Field Trials.

Jill Tolbert

Numerade Educator

Match each set of cases and variables (A–D) with the short summary (I–IV) of its scatterplot.

Table can't copy

I. strong negative relationship, somewhat curved

II. strong, curved positive relationship

III. moderate, roughly linear, positive relationship

IV. moderate negative relationship

Figure can't copy

James Kiss

Numerade Educator

SAT I math scores. In 2005, the average SAT I math score across the United States was 520. North Dakota students averaged 605, Illinois students averaged 606, and students from the nearby state of Iowa did even better, averaging 608. Why do states from the Midwest do so well? It is easy to jump to a false conclusion, but the scatterplot in Display 3.7 can help you find a reasonable explanation.

a. Estimate the percentage of students in Iowa and in Illinois who took the SAT I. New York had the highest percentage of students who took the SAT I. Estimate that percentage and the average SAT I math score for students in that state.

b. Describe the shape of the plot. Do you see any clusters? Are there any outliers? Is the relationship linear or curved? Is the overall trend positive or negative? What is the strength of the relationship?

c. Is the distribution of the percentage of students taking the SAT I bimodal? Explain how the scatterplot shows this. Is the distribution of SAT I math scores bimodal?

Display 3.7 can't copy Average SAT I math scores by state versus the percentage of high school graduates who took the exam. [Source: College Board, www.collegeboard.com.]

d. The cases used in this plot are the 50 U.S. states in 2005. Would you expect the pattern to generalize to some other set of cases? Why or why not?

e. Suggest an explanation for the trend. (Hint: The SAT is administered from Princeton, New Jersey. An alternative exam, the ACT, is administered from Iowa. Many colleges and universities in the Midwest either prefer the ACT or at least accept it in place of the SAT, whereas colleges in the eastern states tend to prefer the SAT.) Is there anything in the data that you can use to help you decide whether your explanation is correct?

Heena Haldankar

Numerade Educator

Each of the 51 cases plotted on the scatterplots in Display 3.8 is a top-rated university. The $y$-coordinate of a point tells the graduation rate, and the $x$-coordinate tells the value of some other quantitative variable: the percentage of alumni who gave that year, the student/faculty ratio, the 75th percentile of the SAT scores (math plus critical reading) for a recent entering class, or the percentage of incoming students who ranked in the top $10 \%$ of their high school graduating class.

Display 3.8 can't copy Scatterplots showing the relationship between graduation rate and four other variables for 51 top-rated universities.

[Source: U.S. News and World Report, 2000.]

a. Compare the shapes of the four plots.

i. Which plots show a linear shape? Which show a curved shape?

ii. Which plots show just one cluster? Which show more than one?

iii. Which plots have outliers?

b. Compare the trends of the relationships: Which plots show a positive trend? A negative trend? No trend?

c. Compare the strengths of the relationships: Which variables give more precise predictions of the graduation rate? Which variable is almost useless for predicting graduation rate?

d. Generalization. The cases in these plots are the 51 universities that happened to come out at the top of one particular rating scheme. Do you think the complete set of all U.S. universities would show pretty much the same relationships? Why or why not?

e. Explanation. Consider the two variables with the strongest relationship to graduation rates. Offer an explanation for the strength of these particular relationships. In what ways, if any, can you use the data to help you decide whether your explanation is in fact correct?

Jameson Kuper

Numerade Educator

Hat size. What does hat size really measure? A group of students investigated this question by collecting a sample of hats. They recorded the size of the hat and then measured the circumference, the major axis (the length across the opening in the long direction), and the minor axis. (See Display 3.9 on the next page. Hat sizes have been changed to decimals; all other measurements are in inches.) Is hat size most closely related to circumference, major axis, or minor axis? Answer this question by making appropriate plots and describing the patterns in those plots.

Figure can't copy

$$
\begin{array}{cccc}
\text{Hat Size} & \text{Circumference} & \text{Major Axis} & \text{Minor Axis} \\
\hline 6.625 & 20.00 & 7.00 & 5.75 \\
6.750 & 20.75 & 7.25 & 6.00 \\
6.875 & 20.50 & 7.50 & 6.00 \\
6.875 & 20.75 & 7.25 & 6.00 \\
6.875 & 20.75 & 7.50 & 6.00 \\
6.875 & 21.50 & 7.25 & 6.25 \\
7.000 & 21.25 & 7.50 & 6.00 \\
7.000 & 21.00 & 7.50 & 6.00 \\
7.000 & 21.00 & 7.50 & 6.25 \\
7.000 & 21.75 & 7.50 & 6.25 \\
7.125 & 21.50 & 7.75 & 6.25 \\
7.125 & 21.75 & 7.75 & 6.50 \\
7.125 & 21.50 & 7.75 & 6.25 \\
7.125 & 22.25 & 7.75 & 6.25 \\
7.250 & 22.00 & 7.75 & 6.25 \\
7.250 & 22.50 & 7.75 & 6.50 \\
7.375 & 22.25 & 7.75 & 6.50 \\
7.375 & 22.25 & 8.00 & 6.50 \\
7.375 & 22.50 & 8.00 & 6.50 \\
7.375 & 22.75 & 8.00 & 6.50 \\
7.375 & 23.00 & 8.00 & 6.50 \\
7.500 & 22.75 & 8.00 & 6.50 \\
7.500 & 22.50 & 8.00 & 6.50 \\
7.625 & 23.00 & 8.25 & 6.50 \\
7.625 & 23.00 & 8.25 & 6.50 \\
7.625 & 23.25 & 8.25 & 6.75 \\
\hline
\end{array}
$$

Display 3.9 Hat sizes, with circumference and axes in inches. [Source: Roger Johnson, Carleton College, data from student project.]

Sheryl Ezze

Numerade Educator

Westvaco, revisited. To determine whether Westvaco discriminated by age in laying off employees, you could investigate whether it might have discriminated in hiring. Display 3.10 shows the age at hire plotted against the year the person was hired.

a. Describe the pattern in the plot, following the six-step model.

b. Does this plot provide evidence that Westvaco discriminated by age in hiring?

Display 3.10 Figure can't copy Age at hire versus year of hire for the 50 employees in Westvaco Corporation's engineering department.

c. Display 3.11 shows the year of birth of the Westvaco employees plotted against the year they were hired. Open circles represent employees laid off, and solid circles represent employees kept. Does this scatterplot suggest a reason why older employees tended to be laid off more frequently?

Display 3.11 Figure can't copy Year of birth versus year of hire for Westvaco employees.


Passenger aircraft. Airplanes vary in their size, speed, average flight length, and cost of operation. You can probably guess that larger planes use more fuel per hour and cost more to operate than smaller planes, but the shapes of the relationships are less obvious. Display 3.12 lists data on the 33 most commonly used passenger airplanes in the United States. The variables are the number of seats, average cargo payload in tons, airborne speed in miles per hour, flight length in miles, fuel consumption in gallons per hour, and operating cost per hour in dollars.

$$
\begin{array}{|l|c|c|c|c|c|c|}
\hline \text{Aircraft} & \text{Number of Seats} & \text{Cargo (tons)} & \text{Speed (mi/h)} & \text{Flight Length (mi)} & \text{Fuel (gal/h)} & \text{Cost (\$/h)} \\
\hline \text{B747-200/300} & 370 & 16.6 & 520 & 3148 & 3625 & 9153 \\
\hline \text{B747-400} & 367 & 806 & 534 & 3960 & 3411 & 8143 \\
\hline \text{L-1011} & 325 & 0.04 & 494 & 2023 & 1981 & 8042 \\
\hline \text{DC-10} & 286 & 24.87 & 497 & 1637 & 2405 & 7374 \\
\hline \text{B767-400} & 265 & 6.26 & 495 & 1682 & 1711 & 3124 \\
\hline \text{B-777} & 263 & 9.43 & 525 & 3515 & 2165 & 5105 \\
\hline \text{A330} & 261 & 11.12 & 509 & 3559 & 1407 & 3076 \\
\hline \text{MD-11} & 261 & 45.07 & 515 & 2485 & 2473 & 7695 \\
\hline \text{A300-600} & 235 & 19.12 & 460 & 947 & 1638 & 6518 \\
\hline \text{B757-300} & 235 & 0.3 & 472 & 1309 & 985 & 2345 \\
\hline \text{B767-300ER} & 207 & 7.89 & 497 & 2122 & 1579 & 4217 \\
\hline \text{B757-200} & 181 & 1.41 & 464 & 1175 & 1045 & 3312 \\
\hline \text{B767-200ER} & 175 & 3.72 & 487 & 1987 & 1404 & 3873 \\
\hline \text{A321} & 169 & 0.44 & 45 & 1094 & 673 & 1347 \\
\hline \text{B737-800/900} & 151 & 0.37 & 454 & 1035 & 770 & 2248 \\
\hline \text{MD-90} & 150 & 0.25 & 446 & 886 & 825 & 2716 \\
\hline \text{B727-200} & 148 & 6.46 & 430 & 644 & 1289 & 4075 \\
\hline \text{A320} & 146 & 0.31 & 454 & 1065 & 767 & 2359 \\
\hline \text{B737-400} & 141 & 0.25 & 409 & 646 & 703 & 2595 \\
\hline \text{MD-80} & 134 & 0.19 & 432 & 791 & 953 & 2718 \\
\hline \text{B737-700LR} & 132 & 0.28 & 441 & 879 & 740 & 1692 \\
\hline \text{B737-300/700} & 132 & 0.22 & 403 & 542 & 723 & 2388 \\
\hline \text{A319} & 122 & 0.27 & 442 & 904 & 666 & 1913 \\
\hline \text{B737-100/200} & 119 & 0.11 & 396 & 465 & 824 & 2377 \\
\hline \text{B717-200} & 112 & 0.22 & 339 & 175 & 573 & 3355 \\
\hline \text{B737-500} & 110 & 0.19 & 407 & 576 & 756 & 2347 \\
\hline \text{DC-9} & 101 & 0.15 & 387 & 496 & 826 & 2071 \\
\hline \text{F-100} & 87 & 0.05 & 398 & 587 & 662 & 2303 \\
\hline \text{B737-200C} & 55 & 275 & 387 & 313 & 924 & 321 \\
\hline \text{ERJ-145} & 50 & 0 & 360 & 343 & 280 & 1142 \\
\hline \text{CRJ-145} & 49 & 0.01 & 397 & 486 & 369 & 1433 \\
\hline \text{ERJ-135} & 37 & 0 & 357 & 382 & 267 & 969 \\
\hline \text{SD340B} & 33 & 0 & 230 & 202 & 84 & 644 \\
\hline
\end{array}
$$

Display 3.12 Data on passenger aircraft. [Source: Air Transport Association of America, 2005, www.air-transport.org.]

a. cost per hour

i. Make scatterplots with cost per hour on the y-axis to explore this variable’s dependence on the other variables. Report your most interesting findings. Here are examples of some questions you could investigate: For which variable is the relationship to the cost per hour strongest? Is there any one airplane whose cost per hour, in relation to other variables, makes it an outlier?

ii. Do your results mean that larger planes are less efficient? Define your own variable, and plot it against other variables to judge the relative efficiency of the larger planes.

b. flight length

i. Make scatterplots with length of flight on the $x$-axis to explore this variable's relationship to the other variables.

Report your most interesting findings. Here is an example of a question you could investigate: Which variable, cargo or number of seats, shows a stronger relationship to flight length? Propose a reasonable explanation for why this should be so.

ii. Do planes with a longer flight length tend to use less fuel per mile than planes with a shorter flight length?

c. speed, seats, and cargo

i. Make scatterplots to explore the relationships between the variables speed, seats, and cargo. Report your most interesting findings. Here are some examples of questions you could investigate: For which variable, cargo or number of seats, is the relationship to speed more obviously curved? Explain why that should be the case. Which plane is unusually slow for the amount of cargo it carries? Which plane is unusually slow for the number of seats it has?

ii. The plot of cargo against seats has two parts: a flat stretch on the left and a fan on the right. Explain, in the language of airplanes, seats, and cargo, what each of the two patterns tells you.
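For part a.ii, one possible way to "define your own variable" is sketched below in Python. The measure used here, operating cost per seat per hour, is only an illustrative assumption, and the function name `cost_per_seat_hour` is hypothetical. The four rows are copied from Display 3.12 and were picked arbitrarily.

```python
# (aircraft, number of seats, operating cost in dollars per hour),
# copied from four rows of Display 3.12.
planes = [
    ("B-777", 263, 5105),
    ("A330", 261, 3076),
    ("DC-9", 101, 2071),
    ("ERJ-145", 50, 1142),
]


def cost_per_seat_hour(seats, cost):
    """Hypothetical efficiency measure: dollars per seat per hour."""
    return cost / seats


# Build the new variable for each plane so it can be plotted against seats.
efficiency = {name: cost_per_seat_hour(seats, cost) for name, seats, cost in planes}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} dollars per seat-hour")
```

Plotting a derived variable like this against the number of seats, for all 33 planes, is one way to judge relative efficiency. Fuel per seat or cost per seat-mile would be equally reasonable choices.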


Display 3.27 shows cost in dollars per hour versus number of seats for three aircraft models. Five lines, labeled A-E, are shown on the plot. Their equations, listed below, are labeled I-V.

a. Match each line (A-E) with its equation (I-V).

I. $\text{cost} = -290 + 15.8\,\text{seats}$

II. $\text{cost} = 400 + 15.8\,\text{seats}$

III. $\text{cost} = 1000 + 15.8\,\text{seats}$

IV. $\text{cost} = 370 + 25\,\text{seats}$

V. $\text{cost} = 900 + 10\,\text{seats}$

b. Match each line (A-E) with the appropriate verbal description (I-V):

I. This line overestimates cost.

II. This line underestimates cost.

III. This line overestimates cost for the smallest plane and underestimates cost for the largest plane.

IV. This line underestimates cost for the smallest plane and overestimates cost for the largest plane.

V. On balance, this line gives a better fit than the other lines.

Display 3.27 can't copy Cost in dollars per hour versus number of seats for three aircraft models.

James Kiss

Numerade Educator

Examine the scatterplot in Display 3.28.

Display 3.28 can't copy Calories versus fat, per 5-oz serving, for seven kinds of pizza. [Source: Consumer Reports, July 2003.]

a. Which two kinds of pizza in Display 3.28 have the fewest calories? Which two have the least fat? Which region of the graph has the pizzas with the most fat?

b. Display 3.29 shows the data again, with five possible summary lines. Match each equation (I-V) with the appropriate line (A-E).

I. $\text{calories} = 70 + 15\,\text{fat}$

II. $\text{calories} = 10 + 25\,\text{fat}$

III. $\text{calories} = 150 + 15\,\text{fat}$

IV. $\text{calories} = 110 + 15\,\text{fat}$

V. $\text{calories} = 170 + 10\,\text{fat}$

Display 3.29 can't copy Five possible fitted lines for the pizza data.

c. Consider the possible summary lines in Display 3.29.

i. Which line gives predicted values for calorie content that are too high? How can you tell this from the plot?

ii. Which line tends to give predicted calorie values that are too low?

iii. Which line tends to overestimate calorie content for lower-fat pizzas and underestimate calorie content for higher-fat pizzas?

iv. Which line has the opposite problem, underestimating calorie content when fat content is lower and overestimating calorie content when fat content is higher?

v. Which line fits the data best overall?

Sheryl Ezze

Numerade Educator

Heights of boys. The scatterplot in Display 3.30 shows the median height, in inches, for boys ages 2 through 14 years.

Display 3.30 can't copy Median height versus age for boys. [Source: National Health and Nutrition Examination Survey (NHANES), 2002, www.cdc.gov.]

a. Estimate the slope of the line that summarizes the relationship between age and median height.

b. Explain the meaning of the slope with respect to boys and their median height.

c. Write the equation of the line using the slope from part a and a point on the line.

d. Interpret the $y$-intercept. Does the interpretation make sense in this context?

Donald Albin

Numerade Educator

Pizza again. Display 3.31 shows the calorie and fat content of 5 oz of various kinds of pizza.

Display 3.31 can't copy Calories and fat content per 5-oz serving, for seven kinds of pizza. [Source: Consumer Reports, January 2002.]

a. Use the line on the scatterplot to predict the calorie content of a pizza with $10.5 \mathrm{~g}$ of fat. Then use the line to predict the calorie content of a pizza with $15 \mathrm{~g}$ of fat.

b. Use the two predictions in part a to estimate the slope of the line. Write the equation of the line using this slope and a point on the line.

c. There are 9 calories in a gram of fat. How is your estimated slope related to this number?

Sheryl Ezze

Numerade Educator

Stopping on a dime? In an emergency, the typical driver requires about 0.75 second to get his or her foot onto the brake pedal. The distance the car travels during this reaction time is called the reaction distance. Display 3.32 shows the reaction distances for cars traveling at various speeds.

$$

\begin{array}{cc}

\begin{array}{c}

\text { Speed } \\

\text { (mi/h) }

\end{array} & \begin{array}{r}

\text { Reaction Distance } \\

\text { (ft) }

\end{array} \\

\hline 20 & 22 \\

30 & 33 \\

40 & 44 \\

50 & 55 \\

60 & 66 \\

70 & 77 \\

\hline

\end{array}

$$

Display 3.32 Reaction distance at various speeds.

a. Plot reaction distance versus speed, with speed on the horizontal axis. Describe the shape of the plot.

b. What should the $y$-intercept be?

c. Find the slope of the line of best fit by calculating the change in $y$ per unit change in $x$. What does the slope represent in this situation?

d. Write the equation of the line that fits these data.

e. Use the equation of the line in part d to predict the reaction distance for a car traveling at a speed of $55 \mathrm{mi} / \mathrm{h}$ and at $75 \mathrm{mi} / \mathrm{h}$.

f. How would the equation change if it actually took 1 second, instead of 0.75 second, for drivers to react?
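The entries in Display 3.32 can be checked with a short unit conversion. The Python sketch below assumes that reaction distance is simply speed multiplied by reaction time, with speed converted from miles per hour to feet per second. The function name `reaction_distance_ft` is hypothetical.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def reaction_distance_ft(speed_mph, reaction_time_s=0.75):
    """Distance in feet covered at a constant speed during the reaction time."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second * reaction_time_s


# Speeds (mi/h) and reaction distances (ft) as listed in Display 3.32.
display_3_32 = {20: 22, 30: 33, 40: 44, 50: 55, 60: 66, 70: 77}

for speed, listed in display_3_32.items():
    computed = reaction_distance_ft(speed)
    print(f"{speed} mi/h: computed {computed:.1f} ft, listed {listed} ft")
```

Passing a different `reaction_time_s` shows how the distances scale when the reaction time changes.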

Tanishq Gupta

Numerade Educator

The scatterplot in Display 3.33 shows operating cost (in dollars per hour) versus fuel consumption (in gallons per hour) for a sample of commercial aircraft.

Display 3.33 can't copy Operating cost versus fuel consumption for commercial aircraft.

a. Which is the explanatory variable and which is the response variable?

b. Estimate the slope of the regression line from the graph, and interpret it in the context of this situation.

c. The $y$-intercept is 470. Does this value have a reasonable interpretation in this situation?

d. Use the line to predict the cost per hour for a plane that consumes $1500 \mathrm{gal} / \mathrm{h}$ of fuel.

Vaidik Stats

Numerade Educator

Arsenic is a potent poison sometimes found in groundwater. Long-term exposure to arsenic in drinking water can cause cancer. How much arsenic a person has absorbed can be measured from a toenail clipping. The plot in Display 3.34 shows the arsenic concentrations in the toenails of 21 people who used water from their private wells plotted against the arsenic concentration in their well water. Both measurements are in parts per million.

Display 3.34 can't copy Arsenic concentrations. [Source: M. R. Karagas et al. Toenail Samples as an Indicator of Drinking Water Arsenic Exposure, Cancer Epidemiology, Biomarkers and Prevention 5 (1996): 849-52.]

a. What is the predictor variable, and what is the response variable?

b. Describe the relationship.

c. Estimate the residual for the person with the highest concentration of arsenic in the well water.

d. Find the person on the plot with the largest residual. What was the concentration of arsenic in that person’s toenails?

e. The World Health Organization has set a standard that the concentration of arsenic in drinking water should be less than $0.01 \mathrm{mg} / \mathrm{L}$. ($1 \mathrm{mg} / \mathrm{L} = 1 \mathrm{ppm}$.) Is this standard exceeded in any of these wells?


More pizza. Refer to the pizza data in E12.

a. The least squares residuals for the pizza data are, in order from smallest to largest, $-40.58$, $-17.66$, $-15.95$, $-1.03$, $14.28$, $26.44$, and $34.50$. Match each residual with its pizza.

b. What does the residual for Pizza Hut's Pan pizza tell you about the pizza's number of calories versus fat content?

c. For Pizza Hut's Hand Tossed and Domino's Deep Dish, are the residuals positive or negative? How can you tell this from the scatterplot in Display 3.31?

Sheryl Ezze

Numerade Educator

The level of air pollution is indicated by a measure called the air quality index (AQI). An AQI greater than 100 means the air quality is unhealthy for sensitive groups such as children. The table and plot in Display 3.35 show the number of days in Detroit that the AQI was greater than 100 for the years 2001, 2002, and 2003.

$$
\begin{array}{cc}
\text{Years Since 2000} & \text{Number of Days AQI} > 100 \\
\hline 1 & 31 \\
2 & 28 \\
3 & 19 \\
\hline
\end{array}
$$

Display 3.35 can't copy Air quality index for 2001–2003. [Source: U.S. Environmental Protection Agency, www.epa.gov.]

a. By hand, compute the equation of the least squares line.

b. Interpret the slope in the context of this situation.

c. Which year has the largest residual? What is this residual?

d. Compute the SSE for this line.

e. Verify that the sum of the residuals is 0.

f. Find the SSE for the line that has the same slope as the least squares line but passes through the point for 2002. Is this SSE larger or smaller than the SSE for the least squares line? According to the least squares approach, which line fits better?

g. Find the slope of the line that passes through the points for 2001 and 2003. Then find the fitted value for 2002. Finally, find all three residuals and the value of the SSE for this line.

h. The least squares line doesn't pass through any of the points, and yet judging by the SSE that line fits better than the one in part g. Do you agree that the least squares line fits better than the lines in parts f and g? Explain why or why not.
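Hand computations like those in parts a through e can be checked with a few lines of Python. The sketch below uses the usual textbook formulas (slope equal to the sum of products of deviations divided by the sum of squared $x$-deviations, and intercept chosen so the line passes through the point of averages). The function names are hypothetical, and the data are the three points from Display 3.35.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # forces the line through (x_bar, y_bar)
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed y minus fitted y for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


years_since_2000 = [1, 2, 3]
days_over_100 = [31, 28, 19]

slope, intercept = least_squares(years_since_2000, days_over_100)
res = residuals(years_since_2000, days_over_100, slope, intercept)
sse = sum(r ** 2 for r in res)

print("slope:", slope, "intercept:", intercept)
print("residuals:", res, "sum:", sum(res), "SSE:", sse)
```

The same `residuals` helper can be called with any other slope and intercept, which makes it easy to compare the SSE of the alternative lines in parts f and g.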


Ankit Gupta

Numerade Educator

Even more pizza. Refer again to the table and scatterplot in Display 3.31 on page 134.

a. By hand, compute the equation of the least squares regression line for using fat to predict calories. How close was your estimate of the equation in E12?

b. Which of these values must be the SSE for this regression? Explain your answer.

$$

\begin{array}{llll}

0 & 29.3 & 861.4 & 4307

\end{array}

$$

Blank Blank

Numerade Educator

Heights of girls. Display 3.36 gives the median height in inches for girls ages 2-14.

a. Practice using your calculator by making a scatterplot, finding the equation of the least squares line for median height versus age, and graphing the equation on the plot.

$$
\begin{array}{rc}
\text{Age (yr)} & \text{Median Height (in.)} \\
\hline 2 & 35.1 \\
3 & 38.7 \\
4 & 41.3 \\
5 & 44.1 \\
6 & 46.5 \\
7 & 48.6 \\
8 & 51.7 \\
9 & 53.7 \\
10 & 56.1 \\
11 & 59.5 \\
12 & 61.2 \\
13 & 62.9 \\
14 & 63.6 \\
\hline
\end{array}
$$

Display 3.36 Median height for girls ages 2-14.

b. Judging from the plot, is the residual for 11-year-olds positive or negative? Compute this residual to check your answer.

c. Verify that the line contains the point of averages, $(\bar{x}, \bar{y})$.

d. How does the regression line for girls compare to the line for boys in E11?

James Kiss

Numerade Educator

Sum of residuals. In this exercise, you will show that the sum of the residuals is equal to 0 if and only if the regression line passes through the point of averages, $(\bar{x}, \bar{y})$.

a. Show that for a horizontal line the sum of the residuals will be 0 if and only if the line passes through the point of averages.

b. Show that no matter what the slope of the line is, the sum of the residuals will be 0 if and only if the line passes through the point of averages.

c. Why isn't it good enough to define the regression line as the line that makes the sum of the residuals equal 0 ?
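One way to organize the algebra for parts a and b is sketched here. The symbols are not part of the exercise and are introduced only for illustration: the line is written as $\hat{y} = a + bx$ with intercept $a$ and slope $b$, and $n$ is the number of data points.

$$
\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)=\sum_{i=1}^{n}\left(y_i-a-b x_i\right)=n \bar{y}-n a-b n \bar{x}=n\left[\bar{y}-(a+b \bar{x})\right]
$$

The bracket is zero exactly when $\bar{y} = a + b\bar{x}$, that is, when the line passes through $(\bar{x}, \bar{y})$. Setting $b = 0$ gives the horizontal-line case of part a.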

Sheryl Ezze

Numerade Educator

Height versus age. Display 3.37 shows a standard computer printout for the median height versus age data of E11.

Display 3.37 can't copy Computer output of median height versus age data.

a. Write the equation of the regression line. How does it compare to your estimate of the equation in E11?

b. What is the SSE for this least squares line? Does its value seem reasonable given the scatterplot in Display 3.30 on page 133?

Donald Albin

Numerade Educator

Part of a printout for the percentage of alumni who give to their colleges versus the student/faculty ratio is shown in Display 3.38. (These are the data in the scatterplot shown in Display 3.24 on page 131.)

Display 3.38 can't copy Computer output: regression analysis of percentage giving to alumni fund versus student/faculty ratio.

a. What equation is given in the printout for the least squares regression line?

b. Examine the table of unusual observations. What is the student/faculty ratio at the college with the largest residual (in absolute value)? Find this college in Display 3.24 on page 131.

c. Verify that the fit and the value of the largest residual were computed correctly.

d. Locate the SSE on the printout. Why is this value so large?

Jameson Kuper

Numerade Educator

For the least squares regression line you found in E19, calculate the residuals for girls ages 2, 8, and 14. What does this suggest about the pattern of growth beyond what is summarized in the equation of the regression line?

Sheryl Ezze

Numerade Educator

More about slope.

a. You and three friends, one right after the other, each buy the same kind of gas at the same pump. Then you make a scatterplot of your data, with one point per person, plotting the number of gallons on the $x$-axis and the total price paid on the $y$-axis. Will all four points lie on the same line? Explain.

b. You and the same three friends each drive $80 \mathrm{mi}$ but at different average speeds. Afterward, you plot your data twice, first as a set of four points with coordinates average speed, $x$, and elapsed time, $y$, and then as a set of points with coordinates average speed, $x$, and $y^*$ defined as $\frac{1}{\text { elapsed time }}$. Which plot will give a straight line? Explain your reasoning. Will the other plot be a curve opening up, a curve opening down, or neither?
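The two plots in part b can be explored numerically before drawing them. The Python sketch below uses made-up average speeds, since the exercise gives none, and compares the slopes between neighboring points for each version of the $y$-variable. Constant slopes indicate a straight line, and changing slopes indicate a curve. The helper name `consecutive_slopes` is hypothetical.

```python
DISTANCE_MI = 80  # distance each person drives, from the exercise

# Hypothetical average speeds in mi/h; the exercise does not specify them.
speeds = [40, 50, 60, 80]

elapsed_hours = [DISTANCE_MI / s for s in speeds]   # y
inverse_hours = [1 / t for t in elapsed_hours]      # y* = 1 / elapsed time


def consecutive_slopes(xs, ys):
    """Slope of the segment joining each pair of neighboring points."""
    points = list(zip(xs, ys))
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


time_slopes = consecutive_slopes(speeds, elapsed_hours)
inverse_slopes = consecutive_slopes(speeds, inverse_hours)

print("time vs. speed slopes:  ", [round(m, 4) for m in time_slopes])
print("1/time vs. speed slopes:", [round(m, 4) for m in inverse_slopes])
```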

James York

Numerade Educator

The data set in Display 3.39 is the pizza data of E12 augmented by other brands of cheese pizza typically sold in supermarkets.

a. Plot calories versus fat. Does there appear to be a linear association between calories and fat? If so, fit a least squares line to the data, and interpret the slope of the line.

b. Plot fat versus cost. Does there appear to be a linear association between cost and fat? If so, fit a least squares line to the data, and interpret the slope of the line.

c. Plot calories versus cost. Does there appear to be a linear association between cost and calories?

d. Write a summary of your findings.

Display 3.39 can't copy Food values and cost per 5-oz serving of pizza. [Source: Consumer

Sheryl Ezze

Numerade Educator

Poverty. What variables are most closely associated with poverty? Display 3.40 provides information on population characteristics of the 50 U.S. states plus the District of Columbia. Each variable is measured as a percentage of the state's population, as described here:

Percentage living in metropolitan areas

Percentage white

Percentage of adults who have graduated from high school

Percentage of families with incomes below the poverty line

Percentage of families headed by a single parent

Construct scatterplots to determine which variables are most strongly associated with poverty.

Write a letter to your representative in Congress about poverty in America, relying only on what you find in these data. Point out the variables that appear to be most strongly associated with poverty and those that appear to have little or no association with poverty.

Figure can't copy

$$

\begin{array}{|c|c|c|c|c|c|}

\hline \text { State } & \begin{array}{l}

\text { Metropolitan } \\

\text { Residence }

\end{array} & \text { White } & \text { Graduates } & \text { Poverty } & \begin{array}{l}

\text { Single } \\

\text { Parent }

\end{array} \\

\hline \text { Alabama } & 55.4 & 713 & 79.9 & 14.6 & 14.2 \\

\hline \text { Alaska } & 65.6 & 70.8 & 90.6 & 8.3 & 108 \\

\hline \text { Arizona } & 88.2 & 87.7 & 83.8 & 13.3 & 11.1 \\

\hline \text { Arkansas } & \$ 2.5 & 81 & 80.9 & 18 & 12.1 \\

\hline \text { California } & 94.4 & 77.5 & 81.1 & 12.8 & 126 \\

\hline \text { Colorado } & 84.5 & 90.2 & 88.7 & 94 & 9.6 \\

\hline \text { Connecticut } & 87.7 & 85.4 & 87.5 & 7.8 & 121 \\

\hline \text { Delaware } & 80.1 & 76.3 & 88.7 & 8.1 & 13.1 \\

\hline \text { District of Columbis } & 100 & 36.2 & 86 & 16.8 & 18.9 \\

\hline \text { Florida } & 89.3 & 80.6 & 84.7 & 12.1 & 12 \\

\hline \text { Georgla } & 71.6 & 67.5 & 85.1 & 12.1 & 14.5 \\

\hline \text { Hawaii } & 91.5 & 259 & 8 R .5 & 10.6 & 124 \\

\hline \text { Idaho } & 66.4 & 95.5 & 88.2 & 11.8 & 8.7 \\

\hline \text { Illinols } & 87.8 & 79.5 & 85.9 & 11.2 & 123 \\

\hline \text { Indiana } & 70.8 & 88.9 & 86.4 & 8.7 & 11.1 \\

\hline \text { Iowa } & 61.1 & 94.9 & 89.7 & 83 & 8.6 \\

\hline \text { Kanses } & 71.4 & 89.3 & 88.6 & 94 & 9.3 \\

\hline \text { Kentucky } & 55.8 & 90.3 & 82.8 & 13.1 & 11.8 \\

\hline \text { Louisiana } & 72.6 & 64.2 & 79.8 & 17 & 16.6 \\

\hline \text { Maine } & 40.2 & 97.1 & 86.6 & 11.3 & 95 \\

\hline \text { Maryland } & 86.1 & 65.6 & 87.6 & 73 & 14.1 \\

\hline \text { Massachusetts } & 91.4 & 87.2 & 87.1 & 9.6 & 11.9 \\

\hline \text { Michigan } & 74.7 & 81.5 & 87.6 & 10.3 & 12.5 \\

\hline \text { Minnesota } & 70.9 & 90.2 & 91.6 & 6.5 & 89 \\

\hline \text { Mississippt } & 48.8 & 61.2 & 81.2 & 17.6 & 173 \\

\hline \text { Missouri } & 69.4 & 85.3 & 88.3 & 9.6 & 11.6 \\

\hline \text { Montana } & 54.1 & 90.9 & 90.1 & 13.7 & 8.9 \\

\hline \text { Nebraska } & 69.8 & 92.1 & 90.8 & 9.5 & 9.1 \\

\hline \text { Nevada } & 91.5 & 84.1 & 85.6 & 83 & 11.1 \\

\hline \text { New Hampshire } & 59.3 & 96.3 & 92.1 & 5.6 & 9.1 \\

\hline \text { New Jersey } & 94.4 & 77.3 & 86.2 & 7.8 & 126 \\

\hline \text { New Mexice } & 75 & 84.9 & 81.7 & 17.8 & 13.2 \\

\hline \text { New York } & 87.5 & 73.6 & 84.2 & 14 & 14.7 \\

\hline \text { North Carolina } & 60.2 & 74.1 & 81.4 & 13.1 & 12.5 \\

\hline \text { North Dakota } & 55.9 & 92.5 & 89.7 & 11.9 & 7.8 \\

\hline \text { Ohio } & 77.4 & 85.4 & 87.2 & 10.1 & 12.1 \\

\hline \text { Oklahoma } & 65.3 & 78.4 & 85.7 & 14.7 & 11.4 \\

\hline \text { Oregon } & 78.7 & 90.8 & 86.9 & 11.2 & 9.8 \\

\hline \text { Pennsylvania } & 77.1 & 86.4 & 86 & 9.2 & 11.6 \\

\hline \text { Rhode Island } & 90.9 & 89.2 & 81 & 10.3 & 12.9 \\

\hline \text { South Carolina } & 60.5 & 67.7 & 80.8 & 13.5 & 14.8 \\

\hline \text { South Dakota } & 51.9 & 88.8 & 88.7 & 10.2 & 9 \\

\hline \text { Tennessee } & 63.6 & 80.8 & 81 & 14.2 & 12.9 \\

\hline \text { Texas } & 82.5 & 83.6 & 77.2 & 15.3 & 12.7 \\

\hline \text { Utah } & 88.2 & 93.6 & 89.4 & 9.3 & 9.4 \\

\hline \text { Vermont } & 38.2 & 96.9 & 88.9 & 9.9 & 9.3 \\

\hline \text { Virginia } & 73 & 73.9 & 87.8 & 8.7 & 11.9 \\

\hline \text { Washington } & 82 & 85.5 & 89.1 & 10.8 & 9.9 \\

\hline \text { West Virginia } & 46.1 & 95 & 78.7 & 16 & 10.7 \\

\hline \text { Wisconsin } & 68.3 & 90.1 & 88.6 & 8.6 & 9.6 \\

\hline \text { Wyoming } & 65.1 & 94.7 & 90.9 & 9.5 & 8.7 \\

\hline

\end{array}

$$

Display 3.40 Characteristics of state populations, as percentage of population. [Source: U.S. Census Bureau, www.census.gov.]

Brandon Cleary

Numerade Educator

Each scatterplot in Display 3.57 was made on the same set of axes. Match each scatterplot with its correlation, choosing from $-0.06$, $0.25$, $0.40$, $0.52$, $0.66$, $0.74$, $0.85$, and $0.90$.

Display 3.57 can't copy Eight scatterplots with various correlations.

James Kiss

Numerade Educator

Estimate the correlation between the variables in these scatterplots.

a. The proportion of the state population living in dorms versus the proportion living in cities in Display 3.4 on page 109.

b. The graduation rate versus the 75th percentile of SAT scores in E5 on page 113.

c. The college graduation rate versus the percentage of students in the top $10 \%$ of their high school graduating class in E5 on page 113.

Vaidik Stats

Numerade Educator

For each set of pairs, $(x, y)$, compute the correlation by hand, standardizing and finding the average product.

a. $(-2,-1),(-1,1),(0,0),(1,1),(2,1)$

b. $(-2,2),(0,2),(0,3),(0,4),(2,4)$
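The "standardize, then average the products" recipe can be sketched in a few lines of Python. This is only an illustration: the function name `correlation` is hypothetical, and it assumes the sample convention in which both the standard deviations and the final average divide by $n-1$.

```python
from math import sqrt

def correlation(pairs):
    """Correlation as the average product of standardized scores.

    Assumption: sample convention, so the SDs and the final
    average both divide by n - 1.
    """
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Product of z-scores for each pair
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs]
    return sum(z_products) / (n - 1)

set_a = [(-2, -1), (-1, 1), (0, 0), (1, 1), (2, 1)]
set_b = [(-2, 2), (0, 2), (0, 3), (0, 4), (2, 4)]
r_a = correlation(set_a)
r_b = correlation(set_b)
print(round(r_a, 3), round(r_b, 3))
```

Working the same steps by hand in a table (one column each for $z_x$, $z_y$, and their product) should reproduce the printed values.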

Sheryl Ezze

Numerade Educator

For each artificial data set in P11 on page 155, compute the correlation by hand, standardizing and finding the average product.

Raymond Matshanda

Numerade Educator

The scatterplot in Display 3.58 shows part of the hat size data of E6 on page 113. The plot is divided into quadrants by vertical and horizontal lines that pass through the point of averages, $(\bar{x}, \bar{y})$.

Display 3.58 can't copy Head circumference, in inches, versus hat size.

a. Estimate the value of the correlation.

b. Using the idea of standardized scores, explain why the correlation is positive.

c. Identify the point that contributes the most to the correlation. Explain why the contribution it makes is large.

d. Identify a point that contributes little to the correlation. Explain why the contribution it makes is small.

Kaylee Mcclellan

Numerade Educator

The ellipses in Display 3.59 represent scatterplots that have a basic elliptical shape.

Display 3.59 can't copy Three pairs of elliptical scatterplots.

a. Match these conditions with the corresponding pair of ellipses.

I. One $s_y$ is larger than the other, the $s_x$'s are equal, and the correlations are strong.

II. One of the correlations is stronger than the other, the $s_x$'s are equal, and the $s_y$'s are equal.

III. One $s_x$ is larger than the other, the $s_y$'s are equal, and the correlations are weak.

b. Draw a pair of elliptical scatterplots to illustrate each comparison.

i. One $s_y$ is larger than the other, the $s_x$'s are equal, and the correlations are weak.

ii. One $s_x$ is larger than the other, the $s_y$'s are equal, and the correlations are strong.

James Kiss

Numerade Educator

Several biology students are working together to calculate the correlation for the relationship between air temperature and how fast a cricket chirps. They all use the same crickets and temperatures, but some measure temperature in degrees Celsius and others measure it in degrees Fahrenheit. Some measure chirps per second, and others measure chirps per minute. Some use $x$ for temperature and $y$ for chirp rate, while others have it the other way around.

a. Will all the students get the same value for the slope of the least squares line? Explain why or why not.

b. Will they all get the same value for the correlation? Explain why or why not.

Figure can't copy

Tyler Moulton

Numerade Educator

For the sample of top-rated universities in E5 on page 113, the graduation rate has mean $82.7 \%$ and standard deviation $8.3 \%$. The student/faculty ratio has mean 11.7 and standard deviation 4.3. The correlation is $-0.5$.

a. Find the equation of the least squares line for predicting graduation rate from student/faculty ratio.

b. Find the equation of the least squares line for predicting student/faculty ratio from graduation rate.
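Both parts rest on the same two facts: the slope is $b_1 = r \cdot s_y / s_x$, and the line passes through the point of averages. A minimal sketch, where the helper name `least_squares_from_summary` is made up for illustration:

```python
def least_squares_from_summary(x_mean, x_sd, y_mean, y_sd, r):
    """Return (intercept, slope) of the least squares line for predicting y from x."""
    slope = r * y_sd / x_sd                # b1 = r * s_y / s_x
    intercept = y_mean - slope * x_mean    # line passes through (x_bar, y_bar)
    return intercept, slope

# Summary statistics quoted in the exercise
grad_mean, grad_sd = 82.7, 8.3      # graduation rate (%)
ratio_mean, ratio_sd = 11.7, 4.3    # student/faculty ratio
r = -0.5

# Part a: predict graduation rate (y) from student/faculty ratio (x)
b0_a, b1_a = least_squares_from_summary(ratio_mean, ratio_sd, grad_mean, grad_sd, r)
# Part b: swap the roles of the two variables
b0_b, b1_b = least_squares_from_summary(grad_mean, grad_sd, ratio_mean, ratio_sd, r)

print(f"a: grad = {b0_a:.2f} + ({b1_a:.3f}) * ratio")
print(f"b: ratio = {b0_b:.2f} + ({b1_b:.3f}) * grad")
```

Note that the two slopes are not reciprocals of each other; their product equals $r^2$.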

Brandon Cleary

Numerade Educator

These questions concern the relationship between the correlation, $r$, and the slope, $b_1$, of the regression line.

a. If $y$ is more variable than $x$, will the slope of the least squares line be greater (in absolute value) than the correlation? Justify your answer.

b. For a list of pairs $(x, y)$, $r=0.8$, $b_1=1.6$, and the standard deviations of $x$ and $y$ are 25 and 50 (not necessarily in that order). Which is the standard deviation for $x$? Justify your answer.

c. Students in a statistics class estimated and then measured their head circumferences in inches. The actual circumferences had SD 0.93, and the estimates had SD 4.12. The equation of the least squares line for predicting estimated values from actual values was $\hat{y}=11.97+0.36 x$. What was the correlation?

d. What would be the slope of the least squares line for predicting actual head circumferences from the estimated values?

Check back soon!

Lost final exam. After teaching the same history course for about a hundred years, an instructor has found that the correlation, $r$, between the students' total number of points before the final examination and the number of points scored on their final examination is 0.8. The pre-final-exam point totals for all students in this year's course have mean 280 and SD 30. The points on the final exam have mean 75 and SD 8. The instructor's dog ate Julie's final exam, but the instructor knows that her total number of points before the exam was 300. He decides to predict her final exam score from her pre-final-exam total. What value will he get?
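One way to organize the prediction is in standardized units: the predicted $z$-score for $y$ is $r$ times the $z$-score for $x$. The sketch below assumes that the instructor uses the least squares line; the function name `predict_from_z` is hypothetical.

```python
def predict_from_z(x, x_mean, x_sd, y_mean, y_sd, r):
    """Predict y from x with the least squares line, working in z-scores."""
    z_x = (x - x_mean) / x_sd       # how many SDs x lies above its mean
    z_y_hat = r * z_x               # the prediction is pulled toward the mean
    return y_mean + z_y_hat * y_sd  # convert back to final-exam points

julie = predict_from_z(300, x_mean=280, x_sd=30, y_mean=75, y_sd=8, r=0.8)
print(round(julie, 2))
```

Because $|r| < 1$, the predicted final exam score sits fewer SDs above its mean than the pre-final total does.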

Nick Johnson

Numerade Educator

Lurking variables. For each scenario, state a careless conclusion assuming cause and effect, and then identify a possible lurking variable.

a. For a large sample of different animal species, there is a strong positive correlation between average brain weight and average life span.

b. Over the last 30 years, there has been a strong positive correlation between the average price of a cheeseburger and the average tuition at private liberal arts colleges.

c. Over the last decade, there has been a strong positive correlation between the price of an average share of stock, as measured by the S&P 500, and the number of Web sites on the Internet.

Sherrie Fenner

Numerade Educator

Manufacturers of low-fat foods often increase the salt content in order to keep the flavor acceptable to consumers. For a sample of different kinds and brands of cheeses, Consumer Reports measured several variables, including calorie content, fat content, saturated fat content, and sodium content. Using these four variables, you can form six pairs of variables, so there are six different correlations. These correlations turned out to be either about 0.95 or about -0.5 .

a. List all six pairs of variables, and for each pair decide from the context whether the correlation is close to 0.95 or to -0.5 .

b. State a careless conclusion based on taking the negative correlations as evidence of cause and effect.

c. Explain the negative correlation using the idea of a lurking variable.

James Kiss

Numerade Educator

A study to determine whether ice cream consumption depends on the outside temperature gave the results shown in Display 3.60.

Figure can't copy

Display 3.60 can't copy Data table, scatterplot, and regression analysis for the effects of outside temperature on ice cream consumption. [Source: Koteswara Rao Kadiyala, “Testing for the Independence of Regression Disturbances,” Econometrica 38 (1970): 97–117.]

a. Use the values of SST and SSE in the regression analysis to compute $r$, the correlation for the relationship between the temperature in degrees Fahrenheit and the number of pints of ice cream consumed per person. Check your answer against R-sq in the analysis.

b. Compute the value of the residual that is largest in absolute value.

c. Is there a cause-and-effect relationship between the two variables?

d. What are the units for each of $x$, $y$, $b_1$, and $r$?

e. The letters MS stand for "mean square." How do you think the MS is computed?
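For part a, the relationship can be written out as below. It assumes the usual convention that SST is the total sum of squared deviations of $y$ from its mean and SSE is the sum of squared residuals; the display itself is not reproduced here, so no numbers are filled in.

$$
r^2=\frac{\mathrm{SST}-\mathrm{SSE}}{\mathrm{SST}}, \qquad r= \pm \sqrt{r^2}
$$

The sign of $r$ is taken to match the sign of the slope $b_1$.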

Tyler Moulton

Numerade Educator

The scatterplot in Display 3.61 shows part of the aircraft data of Display 3.12 on page 115. For these data, $r^2=0.83$. Should $r^2$ be used as a statistical measure for these data? If so, interpret this value of $r^2$ in the context of the data. If not, explain why not.

Display 3.61 can't copy Scatterplot of number of seats versus fuel consumption (gal/h) for passenger aircraft.

James Kiss

Numerade Educator

Suppose a teacher always praises students who score exceptionally well on a test and always scolds students who score exceptionally poorly. Use the notion of regression toward the mean to explain why the results will tend to suggest the false conclusion that scolding leads to improvement whereas praise leads to slacking off.

John Long

Numerade Educator

A few years ago, a school in New Jersey tested all its 4th graders to select students for a program for the gifted. Two years later, the students were retested, and the school was shocked to find that the scores of the gifted students had dropped, whereas the scores of the other students had remained, on average, the same. What is a likely explanation for this disappointing development?

Check back soon!

Extreme temperatures. The data in Display 3.78 provide the maximum and minimum temperatures ever recorded on each continent.

$$

\begin{array}{|c|c|c|}

\hline \text { Continent } & \begin{array}{l}

\text { Maximum } \\

\text { Temperature } \\

\left({ }^{\circ} \mathrm{F}\right)

\end{array} & \begin{array}{l}

\text { Minimum } \\

\text { Temperature } \\

\left({ }^{\circ} \mathrm{F}\right)

\end{array} \\

\hline \text { Africa } & 136 & -11 \\

\hline \text { Antarctica } & 59 & -129 \\

\hline \text { Asia } & 129 & -90 \\

\hline \text { Australia } & 128 & -9 \\

\hline \text { Europe } & 122 & -67 \\

\hline \text { North America } & 134 & -81 \\

\hline \text { Oceania } & 108 & 12 \\

\hline \text { South America } & 120 & -27 \\

\hline

\end{array}

$$

Display 3.78 Maximum and minimum recorded temperatures for the continents.

[Source: National Climatic Data Center, 2005, www.ncdc.noaa.gov .]

a. Construct a scatterplot of the data suitable for predicting the minimum temperature from a given maximum temperature. Is a straight line a good model for these points? Explain.

b. Fit a least squares line to the points and calculate the correlation, even if you thought in part a that a straight line was not a good model.

c. Explain, in words and numbers, what influence Antarctica has on the slope of the regression line and on the correlation. How could an account of these data be misleading if it were not accompanied by a plot?
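The influence question in part c can be explored numerically by fitting the line twice, once with all eight points and once with Antarctica left out. The sketch below uses only the values in Display 3.78; the helper names `fit_line` and `fit_without` are made up for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / sqrt(sxx * syy)

# (maximum, minimum) recorded temperatures in degrees F, from Display 3.78
temps = {
    "Africa": (136, -11), "Antarctica": (59, -129), "Asia": (129, -90),
    "Australia": (128, -9), "Europe": (122, -67), "North America": (134, -81),
    "Oceania": (108, 12), "South America": (120, -27),
}

def fit_without(excluded=None):
    """Fit minimum on maximum, optionally leaving one continent out."""
    kept = [v for k, v in temps.items() if k != excluded]
    return fit_line([mx for mx, _ in kept], [mn for _, mn in kept])

slope_all, intercept_all, r_all = fit_without()
slope_rest, intercept_rest, r_rest = fit_without("Antarctica")
print(f"all eight:          slope {slope_all:.3f}, r {r_all:.3f}")
print(f"without Antarctica: slope {slope_rest:.3f}, r {r_rest:.3f}")
```

Comparing the two printed lines shows how much a single point far from the rest in $x$ can change both summaries, which is the reason the plot matters.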

Figure can't copy

Two climbers stand on Mount Erebus, Antarctica, 12,500 ft above sea level.

Suman Saurav Thakur

Numerade Educator

The data and plot in Display 3.79 are from E15 on page 135. They show the arsenic concentrations in the toenails of 21 people who used water from their private wells. Both measurements are in parts per million.

$$

\begin{array}{ll}

\begin{array}{c}

\text { Arsenic in Water } \\

\text { (ppm) }

\end{array} & \begin{array}{c}

\text { Arsenic in Toenails } \\

\text { (ppm) }

\end{array} \\

\hline 0.00087 & 0.119 \\

0.00021 & 0.118 \\

0 & 0.099 \\

0.00115 & 0.118 \\

0 & 0.277 \\

0 & 0.358 \\

0.00013 & 0.08 \\

0.00069 & 0.158 \\

0.00039 & 0.31 \\

0 & 0.105 \\

0 & 0.073 \\

0.046 & 0.832 \\

0.0194 & 0.517 \\

0.137 & 2.252 \\

0.0214 & 0.851 \\

0.0175 & 0.269 \\

0.0764 & 0.433 \\

0 & 0.141 \\

0.0165 & 0.275 \\

0.00012 & 0.135 \\

0.0041 & 0.175 \\

\hline

\end{array}

$$

Display 3.79 can't copy Arsenic concentrations.

a. Which point do you think has the most influence on the slope and correlation? What would be the effect of removing this point? Perform the calculations to see if your intuition is correct.

b. Find a point that you think has almost no influence on the slope and correlation. Perform the calculations to see if your intuition is correct.

c. Find a point whose removal you think would make the correlation increase. Perform the calculations to see if your intuition is correct.

Check back soon!

How effective is a disinfectant? The data in Display 3.80 show (coded) bacteria colony counts on skin samples before and after a disinfectant is applied.

$$

\begin{array}{rrrr}

x & y & \text { Predicted } & \text { Residual } \\

\hline 11 & 6 & -?- & -?- \\

8 & 0 & -?- & -?- \\

5 & 2 & -?- & -?- \\

14 & 8 & -?- & -?- \\

19 & 11 & -?- & -?- \\

6 & 4 & -?- & -?- \\

10 & 13 & -?- & -?- \\

6 & 1 & -?- & -?- \\

11 & 8 & -?- & -?- \\

3 & 0 & -?- & -?- \\

\hline

\end{array}

$$

Display 3.80 Coded bacteria colony counts before $(x)$ and after $(y)$ treatment. [Source: Snedecor and Cochran, Statistical Methods (Iowa State University Press, 1967), p. 422.]

a. Plot the data, fit a regression line to them, and complete a copy of the table, filling in the predicted values and residuals.

b. Plot the residuals versus $x$, the count before the treatment. Comment on the pattern.

c. Use the residual plot to determine for which skin sample the disinfectant was unusually effective and for which skin sample it was not very effective.
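The table in part a can be filled in with a short script. This is a sketch using the ten $(x, y)$ pairs from Display 3.80; the variable names are illustrative only.

```python
# Coded colony counts before (x) and after (y) treatment, from Display 3.80
before = [11, 8, 5, 14, 19, 6, 10, 6, 11, 3]
after = [6, 0, 2, 8, 11, 4, 13, 1, 8, 0]

n = len(before)
x_bar = sum(before) / n
y_bar = sum(after) / n
sxx = sum((x - x_bar) ** 2 for x in before)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(before, after))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Predicted value and residual (observed minus predicted) for each sample
predicted = [intercept + slope * x for x in before]
residuals = [y - p for y, p in zip(after, predicted)]

print(f"y-hat = {intercept:.3f} + {slope:.3f} x")
for x, y, p, e in zip(before, after, predicted, residuals):
    print(f"{x:3d} {y:3d} {p:8.3f} {e:8.3f}")
```

A large negative residual means fewer colonies remained than the line predicts for that starting count, and a large positive residual means more remained.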

Victor Salazar

Numerade Educator

Textbook prices. Display 3.81 compares recent prices at a college bookstore to those of a large online bookstore.

a. The equation of the regression line is $\text { online }=-3.57+1.03 \text { college }$. Interpret this equation in terms of textbook prices.

$$

\begin{array}{lcr}

\text { Type of Textbook } & \begin{array}{c}

\text { College } \\

\text { Bookstore } \\

\text { Price (\$) }

\end{array} & \begin{array}{c}

\text { Online } \\

\text { Bookstore } \\

\text { Price (\$) }

\end{array} \\

\hline \text { Chemistry } & 93.40 & 94.18 \\

\text { Classic Fiction } & 9.95 & 7.96 \\

\text { English Anthology } & 46.70 & 48.75 \\

\text { Calculus } & 76.00 & 94.15 \\

\text { Biology } & 86.70 & 80.95 \\

\text { Statistics } & 7.95 & 6.36 \\

\text { Dictionary } & 24.00 & 16.80 \\

\text { Style Manual } & 12.70 & 10.66 \\

\text { Art History } & 66.00 & 45.50 \\

\hline

\end{array}

$$

Display 3.81 can't copy Prices for a sample of textbooks at a college bookstore and an online bookstore.

b. Construct a residual plot. Interpret it and point out any interesting features.

c. In comparing the prices of the textbooks, you might be more interested in a different line: $y=x$. Draw this line on a copy of the scatterplot in Display 3.81. What does it mean if a point lies above this line? Below it? On it?

d. A boxplot of the differences college price - online price is shown in Display 3.82. Interpret this boxplot.

Display 3.82 can't copy A boxplot of the differences between the college price and the online price for various textbooks.

Erika Bustos

Numerade Educator

Pizzas, again. Display 3.83 shows the pizza data from E12 on page 134, with its regression line.

Display 3.83 can't copy Calories versus fat, per 5-oz serving, for seven kinds of pizza.

a. Estimate the residuals from the graph, and use your estimates to sketch a rough version of a residual plot for this data set.

b. Which pizza has the largest positive residual? The largest negative residual? Are any of the residuals so extreme as to suggest that those pizzas should be regarded as exceptions?

c. Is any one of the pizzas a highly influential data point? If so, specify which one(s), and describe the effect on the slope of the fitted line and the correlation of removing the influential point or points from the analysis.

Sheryl Ezze

Numerade Educator

Aircraft. Look again at Display 3.76 on page 174, which shows a scatterplot of flight length versus number of seats.

a. Does the slope of the pattern increase, decrease, or stay roughly constant as you move from left to right across the plot?

b. Focusing on the variation (spread) in flight length, $y$, for planes with roughly the same seating capacity, compare the spreads for planes with few seats, a moderate number of seats, and a large number of seats. As you move from left to right across the plot, how does the spread change, if at all?

c. Suppose a friend chose a plane from the sample at random and told you the approximate number of seats. Could you guess its flight length to within 500 miles if the number of seats was between 50 and 150? If it was between 200 and 300? Explain.

d. What is the relationship between your answer in part b and residual plot I in Display 3.77?

e. Give an explanation for why the variation in flight length shows the pattern it does.

R M

Numerade Educator

Match each scatterplot (A–D) in Display 3.84 with its residual plot (I–IV) in Display 3.85. For which plots is a linear regression appropriate?

Display 3.84 can't copy Four scatterplots.

Display 3.85 can't copy Four residual plots.

Check back soon!

Can either of the plots in Display 3.86 be a residual plot? Explain your reasoning.

Display 3.86 can't copy Residual Plots?

Sophie Knight

Numerade Educator

Display 3.87 gives the data set for the three passenger jets from the example on page 123, along with a scatterplot showing the least squares line. (Values have been rounded.)

a. Use the equation of the line to find predicted values and residuals to complete the table in Display 3.87.

b. Use your numbers from part a to construct two residual plots, one with the predictor, $x$, on the horizontal axis and the other with the predicted value, $\hat{y}$, on the horizontal axis. How do the two plots differ?

$$

\begin{array}{lcccc}

\text { Aircraft } & \text { Seats } & \text { Cost } & \text { Predicted } & \text { Residual } \\

\hline \text { ERJ-145 } & 50 & 1100 & -?- & -?- \\

\text { DC-9 } & 100 & 2100 & -?- & -?- \\

\text { MD-90 } & 150 & 2700 & -?- & -?- \\

\hline

\end{array}

$$


Display 3.87 can't copy Cost per hour versus number of seats for three models of the passenger aircraft.

Tyler Moulton

Numerade Educator

Explain why a residual plot of ($x$, residual) and a plot of (predicted value, residual) have exactly the same shape if the slope of the regression line is positive. What changes if the slope is negative?

Tyler Moulton

Numerade Educator

Can you recapture the scatterplot from the residual plot? The residual plot in Display 3.88 was calculated from data showing the recommended weight (in pounds) for men at various heights over $64 \mathrm{in}$. The fitted weights ranged from $145 \mathrm{lb}$ to $187 \mathrm{lb}$. Make a rough sketch of the scatterplot of these data.

Display 3.88 can't copy Residuals of recommended weight versus height for men.

Check back soon!

The plot in Display 3.89 shows the residuals resulting from fitting a line to the data for female life expectancy (life exp) versus gross national product (GNP, in thousands of dollars per capita) for a sample of countries from around the world. The regression equation for the sample data was

$$

\text { life exp }=67.00+0.63 \mathrm{GNP}

$$

Sketch the scatterplot of life exp versus GNP.

Display 3.89 can't copy Residuals of female life expectancy versus gross national product.

Jameson Kuper

Numerade Educator

For the data in Display 3.97 on page 185, try fitting a straight line to the square root of flight length as a function of speed. Does this transformation work as well as the log transformation? Explain your reasoning.

James Kiss

Numerade Educator

More dying dice. Follow the same steps as in P26 on page 194 for these numbers of surviving dice: 200, 72, 28, 9, 5, 2, and 1. Use your data to estimate what the probability of "dying" was in order to generate these numbers.
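The steps of P26 are not reproduced on this page, so the following is only a sketch of one plausible approach: treat the counts as exponential decay, fit a line to the natural log of the number of survivors against the roll number, and convert the slope into a per-roll death probability. Numbering the rolls 0 through 6 is an assumption.

```python
from math import exp, log

survivors = [200, 72, 28, 9, 5, 2, 1]
rolls = list(range(len(survivors)))      # assumed roll numbers 0, 1, ..., 6
log_counts = [log(s) for s in survivors]

n = len(rolls)
x_bar = sum(rolls) / n
y_bar = sum(log_counts) / n
sxx = sum((x - x_bar) ** 2 for x in rolls)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rolls, log_counts))

slope = sxy / sxx             # slope of ln(survivors) versus roll number
survival_rate = exp(slope)    # estimated fraction surviving each roll
death_prob = 1 - survival_rate

print(f"estimated survival rate per roll: {survival_rate:.3f}")
print(f"estimated probability of dying:   {death_prob:.3f}")
```

The small counts at the end of the sequence are noisy, so an estimate based only on the first few rolls may differ somewhat.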

Sheryl Ezze

Numerade Educator

Growing kids. Median heights and weights of growing boys are presented in Display 3.113. What model would you choose to predict weight from a boy's known height?

$$

\begin{array}{rcc}

\text { Age (yr) } & \text { Height (in.) } & \text { Weight (lb) } \\

\hline 2 & 35.8 & 28.03 \\

3 & 39.1 & 31.68 \\

4 & 41.4 & 35.90 \\

5 & 44.2 & 40.66 \\

6 & 46.8 & 45.72 \\

\hline 7 & 49.6 & 50.97 \\

8 & 51.7 & 56.65 \\

9 & 54.1 & 63.10 \\

10 & 56.3 & 70.60 \\

11 & 58 & 79.35 \\

12 & 60.8 & 89.47 \\

13 & 6.7 & 100.78 \\

14 & 66.6 & 112.71 \\

\hline

\end{array}

$$

Display 3.113 can't copy Median heights and weights of growing boys. [Source: National Health and Nutrition Examination Survey (NHANES), 2002, www.cdc.gov.]

Adrian Co

Numerade Educator

Cost per seat per mile and flight length, revisited. As you saw in P25 on page 174, when cost per seat per mile is plotted against flight length, the pattern is not linear. The residual plot in Display 3.77 on page 174 strongly suggests that two line segments might provide a better model than a single line. Apparently, there is one relationship for aircraft meant for longer routes and another for aircraft meant for shorter routes. Look through the complete listing of the data (Display 3.12 on page 115) and the scatterplots from P25 to see whether any features other than flight length separate the aircraft into the same two groups.

Jon Southam

Numerade Educator

Chimp hunting parties. After Jane Goodall discovered that chimpanzees are not solely vegetarian, much research began into the behavior of chimpanzees as hunters. Some animals hunt alone or in small groups, while others hunt in large groups. Where does the chimp fit in, and what is the success rate of chimps' hunting parties? Not surprisingly, the success of the hunt depends in part on the size of the hunting party. Display 3.114 gives some data on the number of chimps in a hunting party and the success rate of parties of that size.

$$

\begin{array}{|c|c|}

\hline \text { Number of Chimps } & \text { Percentage Successful } \\

\hline 1 & 20 \\

\hline 2 & 30 \\

\hline 3 & 28 \\

\hline 4 & 42 \\

\hline 5 & 40 \\

\hline 6 & 58 \\

\hline 7 & 45 \\

\hline 8 & 62 \\

\hline 9 & 65 \\

\hline 10 & 63 \\

\hline 12 & 75 \\

\hline 13 & 75 \\

\hline 14 & 78 \\

\hline 15 & 75 \\

\hline 16 & 82 \\

\hline

\end{array}

$$

Display 3.114 Hunting party size and percentage of success. [Sources: Mathematics Teacher, August 2005, p. 13; C. B. Stanford, "Chimpanzee Hunting Behavior and Human Evolution," American Scientist 83 (1995).]

a. Plot the data in a way that allows the building of a model to predict success from size of hunting party. Describe the pattern you see.

b. Will a simple linear regression model work well here? Why or why not?

c. Look for a transformation that will produce a model with better predicting ability than the simple linear one. Fit the model to the data.

d. Investigate the residuals from the model in part c. Are you happy with the fit of that model?

Joanna Quigley

Numerade Educator

The data in Display 3.115 are the population of the United States from 1830 through 2000 and the number of immigrants entering the country in the decade preceding the given year.

$$

\begin{array}{ccc}

\text { Year } & \text { U.S. Population } & \begin{array}{c}

\text { Immigration } \\

\text { (in thousands) }

\end{array} \\

\hline 1830 & 12,866,020 & 152 \\

\hline 1840 & 17,069,453 & 599 \\

\hline 1850 & 23,191,876 & 1713 \\

\hline 1860 & 31,433,321 & 2598 \\

\hline 1870 & 39,818,449 & 2315 \\

\hline 1880 & 50,155,783 & 1812 \\

\hline 1890 & 62,947,714 & 5247 \\

\hline 1900 & 75,994,575 & 3688 \\

\hline 1910 & 91,972,266 & 8795 \\

\hline 1920 & 105,710,620 & 5736 \\

\hline 1930 & 122,775,046 & 4107 \\

\hline 1940 & 131,669,275 & 528 \\

\hline 1950 & 150,697,361 & 1035 \\

\hline 1960 & 179,323,175 & 2515 \\

\hline 1970 & 203,302,031 & 3322 \\

\hline 1980 & 226,542,199 & 4493 \\

\hline 1990 & 248,709,873 & 7338 \\

\hline 2000 & 281,421,906 & 9095 \\

\hline

\end{array}

$$

Display 3.115 Population and immigration in the United States, 1830-2000. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2004-2005.]

a. Find the population growth for each decade. Was the increase in population constant from decade to decade? How would you describe the pattern?

b. Fit a model to the (year, population) data and defend your model as representative of the major trend(s) in U.S. population growth.

c. Make a plot over time of the immigration by decade. Describe the pattern you see here. Can you fit one of the models from this section to data that look like this?

Carson Merrill

Numerade Educator

Display 3.116 gives data about passengers on United Airlines flight 815, Chicago-O'Hare to Los Angeles, on October 31, 1997. There were 186 passengers, but the data concern those 33 passengers who had tickets for the Chicago-to-Los Angeles leg only. The variables are

$X$: number of days before the flight that the ticket was purchased

$Y$: price of the ticket

Display 3.116 can't copy Number of days before the flight that the ticket was purchased and price of airline ticket. [Source: New York Times, Weekly Review, April 12, 1998.]

Because it is in the airline's interest to sell tickets early, you might expect $Y$ to be negatively associated with $X$.

It happens that the first four cases in Display 3.116 are for passengers who flew first class, and those passengers pay more than other passengers no matter when they purchase their ticket. So you can justify examining the data for the 29 economy-class passengers only.

Finally, one passenger paid $\$0$ because he or she used frequent-flyer miles. You are justified in deleting this value from the data set if the goal is to find a model that relates price to time of purchase.

Can you find a model that relates the cost of the ticket to the number of days in advance that the ticket was purchased? Explain the problems you encounter in doing this.

Ahmed Kamel

Numerade Educator

Different body organs use different amounts of oxygen, even when you take their mass into consideration. For example, the brain uses more oxygen per kilogram of tissue than the lungs do. Scientists are interested in how oxygen consumption is related to the mass of an animal and whether that relationship differs from organ to organ. The data in Display 3.117 show typical body mass, oxygen consumption in brain tissue, and oxygen consumption in lung tissue for a selection of animals. (Oxygen consumption often is measured in milliliters per hour per gram of tissue, but the actual units were not recorded for these data.)

a. As you can see from the table, as total body mass increases, the oxygen consumption in brain tissue tends to go down. Define a function that models this situation. Then find a way to describe the rate of decrease.

b. Repeat part a for lung tissue. How does this relationship differ from that of brain tissue?

c. It is known that the proportion of body mass concentrated in the brain decreases appreciably as the size of the animal increases, whereas the proportion concentrated in the lungs remains relatively constant. One possible theory on oxygen consumption is that the rates of consumption within organ tissue can be explained largely by the relative size of the organ within the body. Is this theory supported by the data? Explain your reasoning.

$$

\begin{array}{lccc}

\text { Animal } & \begin{array}{c}

\text { Body Mass } \\

(\mathrm{kg})

\end{array} & \begin{array}{c}

\text { Brain Oxygen } \\

\text { Consumption }

\end{array} & \begin{array}{c}

\text { Lung Oxygen } \\

\text { Consumption }

\end{array} \\

\hline \text { Mouse } & 0.021 & 32.9 & 12.0 \\

\text { Rat } & 0.210 & 26.3 & 8.6 \\

\text { Guinea pig } & 0.510 & 27.3 & 8.5 \\

\text { Rabbit } & 1.050 & 28.2 & 8.0 \\

\text { Cat } & 2.750 & 26.9 & 3.9 \\

\text { Dog } & 15.900 & 21.2 & 4.9 \\

\text { Sheep } & 49.000 & 19.7 & 5.4 \\

\text { Cattle } & 420.000 & 17.2 & 4.3 \\

\text { Horse } & 725.000 & 15.7 & 4.4 \\

\hline

\end{array}

$$

Display 3.117 Oxygen use by certain animal organs. The oxygen measurements are coded values (original measurements not given) but are still comparable.

[Source: K. Schmidt-Nielsen, Why Is Animal Size So Important? (Cambridge University Press, 1984), p. 94.]
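Body mass in Display 3.117 spans more than four orders of magnitude, so one candidate model (an assumption here, since the exercise leaves the choice open) is a power law, $\text{consumption} = a \cdot \text{mass}^{b}$, fitted as a straight line on log-log scales. The sketch below estimates the exponent $b$ for each organ; `loglog_fit` is a hypothetical helper name.

```python
from math import log10

# (animal, body mass in kg, brain O2 consumption, lung O2 consumption)
animals = [
    ("Mouse", 0.021, 32.9, 12.0), ("Rat", 0.210, 26.3, 8.6),
    ("Guinea pig", 0.510, 27.3, 8.5), ("Rabbit", 1.050, 28.2, 8.0),
    ("Cat", 2.750, 26.9, 3.9), ("Dog", 15.900, 21.2, 4.9),
    ("Sheep", 49.000, 19.7, 5.4), ("Cattle", 420.000, 17.2, 4.3),
    ("Horse", 725.000, 15.7, 4.4),
]

def loglog_fit(xs, ys):
    """Least squares fit of log10(y) on log10(x); returns (slope, intercept)."""
    lx = [log10(x) for x in xs]
    ly = [log10(y) for y in ys]
    n = len(lx)
    x_bar = sum(lx) / n
    y_bar = sum(ly) / n
    sxx = sum((x - x_bar) ** 2 for x in lx)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

masses = [a[1] for a in animals]
brain_slope, brain_intercept = loglog_fit(masses, [a[2] for a in animals])
lung_slope, lung_intercept = loglog_fit(masses, [a[3] for a in animals])

print(f"brain: exponent b = {brain_slope:.3f}")
print(f"lung:  exponent b = {lung_slope:.3f}")
```

Under this model, the slope describes the rate of decrease: a tenfold increase in body mass multiplies the predicted consumption by $10^{b}$. A residual plot on the log scale would still be needed to judge whether the power law is adequate for either organ.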

Carter Rogers

Numerade Educator

How is the birthrate of countries related to their economic output? Do richer countries have higher birthrates, perhaps because families can afford more children? Or do poorer countries have higher birthrates, perhaps due to the need for family workers and a lack of education? Display 3.118 shows the birthrates (number of births per thousand population) and the GNP (in thousands of dollars per capita) for a selection of countries from around the world.

a. Construct a scatterplot of these data and comment on the pattern you observe.

b. Fit a statistical model to these data and interpret the slope and intercept of the model in the context of the data.

$$

\begin{array}{lcr}

\text { Country } & \begin{array}{c}

\text { Birthrate } \\

\text { (per 1000) }

\end{array} & \text { GNP } \\

\hline \text { Algeria } & 18.9 & 1.7 \\

\hline \text { Argentina } & 17.7 & 4.2 \\

\hline \text { Australia } & 12.7 & 19.5 \\

\text { Brazil } & 18.1 & 2.8 \\

\text { Canada } & 11.1 & 22.4 \\

\hline \text { China } & 12.8 & 1.0 \\

\hline \text { Colombia } & 22 & 1.8 \\

\hline \text { Denmark } & 12 & 30.3 \\

\hline \text { Egypt } & 24.9 & 1.5 \\

\hline \text { France } & 12.7 & 22.2 \\

\hline \text { Germany } & 8.8 & 22.7 \\

\hline \text { India } & 23.8 & 0.5 \\

\hline \text { Indonesia } & 21.9 & 0.7 \\

\hline \text { Israel } & 18.9 & 16.0 \\

\hline \text { Japan } & 9.6 & 34.0 \\

\hline \text { Malaysia } & 24.2 & 3.5 \\

\hline \text { Mexico } & 22.3 & 5.9 \\

\hline \text { Nigeria } & 39.2 & 0.3 \\

\hline \text { Pakistan } & 32.8 & 0.4 \\

\hline \text { Philippines } & 26.8 & 1.0 \\

\hline \text { Russia } & 9.2 & 2.1 \\

\hline \text { South Africa } & 19.4 & 2.5 \\

\hline \text { Spain } & 10 & 14.6 \\

\hline \text { United Kingdom } & 11.1 & 25.5 \\

\hline \text { United States } & 14.2 & 35.4 \\

\hline

\end{array}

$$

Display 3.118 Birthrates and GNP for selected countries, 2002. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2004–2005.]

Check back soon!

According to the National Center for Health Statistics, the percentage of males who smoke has decreased markedly over the past 40 years, but there still may be some interesting trends to observe. Display 3.119 shows the percentage of males who smoke in selected years for various age groups.

a. Study the trend in the percentage of smokers for the entire male population age 18 and over. The points follow the pattern of exponential decay. How should you modify the percentages before taking their logarithms? Fit the model and interpret the slope.

b. Study the trend in the percentage of smokers for the group age 18 to 24. What model would you use to explain the relationship between the percentage of smokers and the year for this age group? Explain your reasoning. What feature makes this data set more difficult to model than the data set in part a?

c. Study the trend in the percentage of smokers for the group age 65 and over. Does this group show the same kind of trend as seen in the two groups studied in parts a and b? Explain.

$$

\begin{array}{ccccccc}

\text { Year } & 18-24 & 25-34 & 35-44 & 45-64 & 65+ & 18+ \\

\hline 1965 & 54.1 & 60.7 & 58.2 & 51.9 & 28.5 & 51.6 \\

\hline 1974 & 42.1 & 50.5 & 51.0 & 42.6 & 24.8 & 42.9 \\

\hline 1979 & 35.0 & 43.9 & 41.8 & 39.3 & 20.9 & 37.2 \\

\hline 1983 & 32.9 & 38.8 & 41.0 & 35.9 & 22.0 & 34.7 \\

\hline 1985 & 28.0 & 38.2 & 37.6 & 33.4 & 19.6 & 32.1 \\

\hline 1987 & 28.2 & 44.8 & 36.6 & 33.5 & 17.2 & 31.0 \\

\hline 1988 & 25.5 & 36.2 & 36.5 & 31.3 & 18.0 & 30.1 \\

\hline 1990 & 26.6 & 31.6 & 34.5 & 29.3 & 14.6 & 28.0 \\

\hline 1991 & 23.5 & 32.8 & 33.1 & 29.3 & 15.1 & 27.5 \\

\hline 1992 & 28.0 & 32.8 & 32.9 & 28.6 & 16.1 & 28.2 \\

\hline 1993 & 28.8 & 30.2 & 32.0 & 29.2 & 13.5 & 27.5 \\

1994 & 29.8 & 31.4 & 33.2 & 28.3 & 13.2 & 28.2 \\

1995 & 27.8 & 29.5 & 31.5 & 27.1 & 14.9 & 27.0 \\

1997 & 31.7 & 30.3 & 32.1 & 27.6 & 12.8 & 27.6 \\

\hline 1998 & 31.3 & 28.6 & 30.2 & 27.7 & 10.4 & 26.4 \\

\hline 1999 & 29.5 & 29.1 & 30.0 & 25.8 & 10.5 & 25.7 \\

\hline 2000 & 28.5 & 29.0 & 30.2 & 26.4 & 10.2 & 25.7 \\

\hline 2001 & 30.4 & 27.2 & 27.4 & 26.4 & 11.5 & 25.2 \\

\hline 2002 & 32.4 & 27.5 & 29.7 & 24.5 & 10.1 & 25.2 \\

\hline

\end{array}

$$

Display 3.119 Percentage of males who smoke by age group and year. [Source: National Center for Health Statistics, 2003.]
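One plausible reading of part a is that the percentages level off at some floor above zero, so the floor is subtracted before taking logarithms. The sketch below follows that reading for the 18+ column of Display 3.119. The floor of 20 percent and the helper name `fit_line` are assumptions made for illustration, not values or names from the source.

```python
import math

# Year and percentage of males age 18+ who smoke, copied from Display 3.119.
years = [1965, 1974, 1979, 1983, 1985, 1987, 1988, 1990, 1991, 1992,
         1993, 1994, 1995, 1997, 1998, 1999, 2000, 2001, 2002]
pct_18_plus = [51.6, 42.9, 37.2, 34.7, 32.1, 31.0, 30.1, 28.0, 27.5, 28.2,
               27.5, 28.2, 27.0, 27.6, 26.4, 25.7, 25.7, 25.2, 25.2]


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


FLOOR = 20.0  # hypothetical leveling-off percentage (an assumption)

# Log of the excess over the floor should be roughly linear in the year
# if the decay toward the floor is exponential.
log_excess = [math.log(p - FLOOR) for p in pct_18_plus]
intercept, slope = fit_line(years, log_excess)

# exp(slope) estimates the yearly multiplier for the excess over the floor.
yearly_factor = math.exp(slope)
print(f"slope = {slope:.4f}, yearly factor = {yearly_factor:.3f}")
```

Trying a few different floor values and comparing the residual plots is a reasonable way to judge which modification straightens the data best.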

Carson Merrill

Numerade Educator

Is global warming a reality? One measure of global warming is the amount of carbon dioxide ($\mathrm{CO}_2$) in the atmosphere. Display 3.120 gives the annual average carbon dioxide levels (in parts per million) in the atmosphere over Mauna Loa Observatory in Hawaii for the years 1959 through 2003.

a. Plot the data and describe the trend over the years.

b. Fit a straight line to the data and look at the residuals. Describe the pattern you see.

c. Suggest another model that might fit these data well. Fit the model and assess how well it removes the pattern from the residuals.

d. Use the model you like best to describe numerically the growth rate in atmospheric carbon dioxide over Hawaii.

Display 3.120 can't copy Carbon dioxide in the atmosphere.

[Source: Mauna Loa Observatory.]

Robin Corrigan

Numerade Educator

How does the average SAT math score for students in a state relate to the percentage of students taking the exam? Display 3.121 shows the average SAT math score for each state in 2005, along with the percentage of high school seniors taking the exam. Find a model that seems like a good predictor of average SAT math scores based on knowledge of the percentage of seniors taking the exam.

Display 3.121 can't copy Average SAT math scores by state.

[Source: College Board, www.collegeboard.com.]

Jerelyn Nevil

Numerade Educator

Leonardo’s rules. A class of 15 students recorded the measurements in Display 3.122 for Activity 3.3a.

$$

\begin{array}{ccccc}

\text { Student } & \text { Height } & \begin{array}{c}

\text { Arm } \\

\text { Span }

\end{array} & \begin{array}{c}

\text { Kneeling } \\

\text { Height }

\end{array} & \begin{array}{c}

\text { Hand } \\

\text { Length }

\end{array} \\

\hline 1 & 170.5 & 168.0 & 126.0 & 18.0 \\

2 & 170.0 & 172.0 & 129.5 & 18.0 \\

3 & 107.0 & 101.0 & 79.5 & 10.0 \\

4 & 159.0 & 161.0 & 116.0 & 16.0 \\

5 & 166.0 & 166.0 & 122.0 & 18.0 \\

6 & 175.0 & 174.0 & 125.0 & 19.5 \\

7 & 158.0 & 153.5 & 116.0 & 16.0 \\

8 & 95.5 & 95.0 & 71.5 & 10.0 \\

9 & 132.5 & 129.0 & 95.0 & 11.5 \\

10 & 165.0 & 169.0 & 124.0 & 17.0 \\

11 & 179.0 & 175.0 & 131.0 & 20.0 \\

12 & 149.0 & 154.0 & 109.5 & 15.5 \\

13 & 143.0 & 142.0 & 111.5 & 16.0 \\

14 & 158.0 & 156.5 & 119.0 & 17.5 \\

15 & 161.0 & 164.0 & 121.0 & 16.5 \\

\hline

\end{array}

$$

Display 3.122 Sample measurements, in centimeters, for Activity 3.3a.

a. Construct scatterplots and fit least squares lines for each of Leonardo's rules in Activity 3.3a. Do the rules appear to hold?

b. Interpret the slopes of your regression lines.

c. If appropriate, find the value of $r$ for each of the three relationships. Which correlation is strongest? Which is weakest?

Check back soon!

Space Shuttle Challenger. On January 28, 1986, because two O-rings did not seal properly, Space Shuttle Challenger exploded and seven people died. The temperature predicted for the morning of the flight was between $26^{\circ} \mathrm{F}$ and $29^{\circ} \mathrm{F}$. The engineers were concerned that the cold temperatures would cause the rubber O-rings to malfunction. On seven previous flights at least one of the twelve O-rings had shown some distress. The NASA officials and engineers who decided not to delay the flight had available to them data like those on the scatterplot in Display 3.123 before they made that decision.

Display 3.123 can't copy Flights when at least one O-ring showed some distress.

a. Why did it seem reasonable to launch despite the low temperature?

b. Display 3.123 contains information only about flights that had O-ring failures. Data for all flights were available on a table like the one in Display 3.124. Add the missing points to a copy of the scatterplot in Display 3.123. How do these data affect any trend in the scatterplot? Would you have recommended launching the space shuttle if you had seen the complete plot? Why or why not?

Display 3.124 can't copy Challenger O-ring data. [Source: Siddhartha R. Dalal et al., "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure," Journal of the American Statistical Association 84 (1989): 945-47.]

Christopher Stanley

Numerade Educator

Exam scores. Students' scores on two exams in a statistics course are given in Display 3.125 along with a scatterplot with regression line and a residual plot. The regression equation is $\text{Exam 2} = 51.0 + 0.430(\text{Exam 1})$, and the correlation, $r$, is 0.756.

$$

\begin{array}{cc}

\text { Exam 1 } & \text { Exam 2 } \\

\hline 80 & 88 \\

52 & 83 \\

87 & 87 \\

95 & 92 \\

67 & 75 \\

71 & 78 \\

97 & 97 \\

96 & 85 \\

88 & 93 \\

100 & 93 \\

88 & 86 \\

86 & 85 \\

81 & 81 \\

61 & 73 \\

97 & 92

\end{array}

$$

$$

\begin{array}{cc}

\text { Exam 1 } & \text { Exam 2 } \\

\hline 96 & 99 \\

78 & 90 \\

93 & 88 \\

92 & 92 \\

91 & 93 \\

96 & 92 \\

69 & 73 \\

76 & 87 \\

91 & 91 \\

98 & 97 \\

83 & 89 \\

96 & 83 \\

95 & 97 \\

80 & 86

\end{array}

$$

Display 3.125 can't copy Data for exam scores in a statistics class, with scatterplot and residual plot.

a. Is there a point that is more influential than the other points on the slope of the regression line? How can you tell from the scatterplot? From the residual plot?

b. How will the slope change if the scores for this one influential point are removed from the data set? How will the correlation change? Calculate the slope and correlation for the revised data to check your estimate.

c. Construct a residual plot of the revised data. Does a linear model fit the data well?

d. Refer to the scatterplot of Exam 2 versus Exam 1 in Display 3.125. Does this plot illustrate regression to the mean? Explain your reasoning.
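Parts a through c can be checked numerically with a short script. This is a sketch using the 29 score pairs from the two tables in Display 3.125. It assumes, for illustration, that the candidate influential point is the student at (52, 83); the helper name `summarize` is hypothetical.

```python
import math

# Exam scores copied from the two tables of Display 3.125 (29 students).
exam1 = [80, 52, 87, 95, 67, 71, 97, 96, 88, 100, 88, 86, 81, 61, 97,
         96, 78, 93, 92, 91, 96, 69, 76, 91, 98, 83, 96, 95, 80]
exam2 = [88, 83, 87, 92, 75, 78, 97, 85, 93, 93, 86, 85, 81, 73, 92,
         99, 90, 88, 92, 93, 92, 73, 87, 91, 97, 89, 83, 97, 86]


def summarize(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


all_fit = summarize(exam1, exam2)

# Assumed candidate for the influential point: the student at (52, 83).
keep = [(a, b) for a, b in zip(exam1, exam2) if (a, b) != (52, 83)]
trimmed_fit = summarize([a for a, _ in keep], [b for _, b in keep])

print("all points:      intercept=%.1f slope=%.3f r=%.3f" % all_fit)
print("without (52,83): intercept=%.1f slope=%.3f r=%.3f" % trimmed_fit)
```

Comparing the two printed lines shows how much a single point far from the mean of Exam 1 can pull the slope and the correlation.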

James Kiss

Numerade Educator

Suppose you have the Exam 1 and Exam 2 scores of all students enrolled in U.S. History.

a. The slope of the regression line for predicting the scores on Exam 2 from the scores on Exam 1 is 0.51. The standard deviation for Exam 1 scores is 11.6, and the standard deviation for Exam 2 scores is 7.0. Use only this information to find the correlation coefficient for these scores.

b. Suppose you know, in addition, that the means are 82.3 for Exam 1 and 87.8 for Exam 2. Find the equation of the least squares line for predicting Exam 2 scores from Exam 1 scores.
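Both parts rest on two standard facts about the least squares line: the slope satisfies $b_1 = r \cdot s_y / s_x$, and the line passes through the point of averages. A minimal sketch of the arithmetic follows; the function names are hypothetical.

```python
def correlation_from_slope(slope, s_x, s_y):
    """Return r = b1 * s_x / s_y, rearranged from b1 = r * s_y / s_x."""
    return slope * s_x / s_y


def line_through_means(slope, x_bar, y_bar):
    """Return (intercept, slope); the least squares line passes through (x_bar, y_bar)."""
    return y_bar - slope * x_bar, slope


# Part a: slope 0.51, SD of Exam 1 is 11.6, SD of Exam 2 is 7.0.
r = correlation_from_slope(0.51, 11.6, 7.0)

# Part b: means are 82.3 for Exam 1 and 87.8 for Exam 2.
intercept, slope = line_through_means(0.51, 82.3, 87.8)

print(f"r = {r:.3f}; Exam 2 = {intercept:.2f} + {slope:.2f}(Exam 1)")
```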

Sheryl Ezze

Numerade Educator

You are given a list of six values, $-1.5$, $-0.5$, $0$, $0$, $0.5$, and $1.5$, for $x$ and the same list of six values for $y$. Note that the list has mean 0 and standard deviation 1.

a. Match each $x$-value with a $y$-value so that the resulting six pairs $(x, y)$ have correlation 1.

b. Match the $x$- and $y$-values again so that the points have the largest possible correlation less than 1.

c. Match the values again, this time to get a correlation as close to 0 as possible.

d. Match the values a fourth time to get a correlation of $-1$.
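Because the list has mean 0 and standard deviation 1, the correlation of any matching reduces to $r = \sum x y / (n - 1)$. The sketch below uses that shortcut to search every possible matching by brute force; the variable names are illustrative only.

```python
from itertools import permutations

values = [-1.5, -0.5, 0.0, 0.0, 0.5, 1.5]


def correlation(x, y):
    """Correlation for lists with mean 0 and standard deviation 1: sum(x*y)/(n-1)."""
    return sum(a * b for a, b in zip(x, y)) / (len(x) - 1)


# Keep x fixed and try every ordering of the same values for y.
all_r = sorted({correlation(values, perm) for perm in permutations(values)})

largest = all_r[-1]                     # part a
second_largest = all_r[-2]              # part b
closest_to_zero = min(all_r, key=abs)   # part c
smallest = all_r[0]                     # part d
print(largest, second_largest, closest_to_zero, smallest)
```

The shortcut formula only applies here because the values are already standardized; for other lists the full correlation formula is needed.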

James Kiss

Numerade Educator

Display 3.126 lists the values of six variables with a "scatterplot matrix" showing all 30 possible scatterplots for these variables. For example, the first scatterplot in the first row has variable B on the $x$-axis and variable A on the $y$-axis. The first scatterplot in the second row has variable A on the $x$-axis and variable B on the $y$-axis.

a. For five pairs of variables the correlation is exactly 0, and for one other pair it is 0.02, or almost 0. Identify these six pairs of variables. What do they have in common?

b. At the other extreme, one pair of variables has correlation 0.87; the next highest correlation is 0.58, and the third highest is 0.45. Identify these three pairs, and put them in order from strongest to weakest correlation.

c. Of the remaining six pairs, four have correlations of about 0.25 (give or take a little) and two have correlations of about 0.1 (give or take a little). Which four pairs have correlations around 0.25?

$$

\begin{array}{rrrrrr}

\text { A } & \text { B } & \text { C } & \text { D } & \text { E } & \text { F } \\

\hline -4 & -2 & -2 & 0 & -6 & 0 \\

-3 & -1 & 0 & -2 & -5 & -1 \\

-2 & 0 & 0 & 0 & -5 & 0 \\

-1 & 1 & 0 & 2 & -5 & 1 \\

0 & 2 & 2 & 0 & -4 & 0 \\

0 & 2 & -1 & -1 & 4 & 0 \\

1 & 1 & -1 & 1 & 5 & -1 \\

2 & 0 & 1 & -1 & 5 & 0 \\

3 & -1 & 1 & 1 & 5 & 1 \\

4 & -2 & 0 & 0 & 6 & 0 \\

\hline

\end{array}

$$

Display 3.126 can't copy Data table for 6 variables and a “scatterplot matrix” of all 30 possible scatterplots for the variables.

d. Choose several scatterplots that you think best illustrate the phrase "the correlation measures direction and strength but not shape," and use them to show what you mean.
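Since the scatterplot matrix itself is not reproduced, the correlations can be computed directly from the data table in Display 3.126. The following is a sketch, with the dictionary layout and function name chosen for illustration; it lists the 15 unordered pairs from weakest to strongest.

```python
import math
from itertools import combinations

# Columns copied from the data table in Display 3.126.
data = {
    "A": [-4, -3, -2, -1, 0, 0, 1, 2, 3, 4],
    "B": [-2, -1, 0, 1, 2, 2, 1, 0, -1, -2],
    "C": [-2, 0, 0, 0, 2, -1, -1, 1, 1, 0],
    "D": [0, -2, 0, 2, 0, -1, 1, -1, 1, 0],
    "E": [-6, -5, -5, -5, -4, 4, 5, 5, 5, 6],
    "F": [0, -1, 0, 1, 0, 0, -1, 0, 1, 0],
}


def correlation(x, y):
    """Return the correlation coefficient r of two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# One entry per unordered pair; each pair appears twice in the 30-plot matrix.
pair_r = {(u, v): correlation(data[u], data[v]) for u, v in combinations(data, 2)}

for (u, v), r in sorted(pair_r.items(), key=lambda item: abs(item[1])):
    print(f"{u}-{v}: r = {r:+.2f}")
```

Pairs with equal correlations can still have very different shapes, which is the point of part d, so the numbers are best read alongside the plots.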

James Kiss

Numerade Educator

Decide whether each statement is true or false, and then explain your decision.

a. The correlation is to bivariate data what the standard deviation is to univariate data.

b. The correlation measures direction and strength but not shape.

c. If the correlation is near 0, knowing the value of one variable gives you a narrow interval of likely values for the other variable.

d. No matter what data set you look at, the correlation coefficient, $r$, and least squares slope, $b_1$, will always have the same sign.

Jerrah Biggerstaff

Numerade Educator

Look at the scatterplot of average SAT I math scores versus the percentage of students taking the exam in Display 3.7 on page 112.

a. Estimate the correlation.

b. What possibly important features of the plot are lost if you give only the correlation and the equation of the least squares line?

c. Sketch what you think the residual plot would look like if you fitted one line to all the points.

Lucas Finney

Numerade Educator

The correlation between in-state tuition and out-of-state tuition, measured in dollars, for a sample of public universities is 0.80 .

a. Rewrite the sentence above so that someone who does not know statistics can understand it.

b. Does the correlation change if you convert tuition costs to thousands of dollars and recompute the correlation? Does it change if you take logarithms of the tuition costs and recompute the correlation?

c. Does the slope of the least squares line change if you convert tuition costs to thousands of dollars and recompute the slope? Does it change if you take logarithms of the tuition costs and recompute the slope?
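Parts b and c can be explored empirically. The sketch below uses made-up tuition figures, since the source gives no data for this exercise, and compares the slope and correlation in dollars, in thousands of dollars, and after taking logarithms. The helper name `slope_and_r` is hypothetical.

```python
import math

# Hypothetical tuition figures in dollars (made up for illustration only).
in_state = [4200.0, 5100.0, 6300.0, 7000.0, 8800.0, 9500.0]
out_of_state = [11000.0, 14500.0, 15200.0, 19800.0, 21000.0, 26500.0]


def slope_and_r(x, y):
    """Return (least squares slope of y on x, correlation r)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


in_dollars = slope_and_r(in_state, out_of_state)
in_thousands = slope_and_r([v / 1000 for v in in_state],
                           [v / 1000 for v in out_of_state])
in_logs = slope_and_r([math.log(v) for v in in_state],
                      [math.log(v) for v in out_of_state])

print("dollars:   slope=%.4f r=%.4f" % in_dollars)
print("thousands: slope=%.4f r=%.4f" % in_thousands)
print("logs:      slope=%.4f r=%.4f" % in_logs)
```

Here both variables are rescaled by the same factor. Rescaling only one of them would leave the correlation alone but change the slope.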

Nick Johnson

Numerade Educator

Display 3.127 shows a scatterplot divided into quadrants by vertical and horizontal lines that pass through the point of averages, $(\bar{x}, \bar{y})$.

a. For each of the four quadrants, give the sign of $z_x$ (the standardized value of $x$), $z_y$ (the standardized value of $y$), and their product $z_x \cdot z_y$.

b. Which point(s) make the smallest contribution to the correlation? Explain why the contributions are small.

Display 3.127 can't copy A scatterplot divided into quadrants by lines passing through the means.

Kaylee Mcclellan

Numerade Educator

Rank these summaries for three sets of bivariate data by the strength of the relationship, from weakest to strongest.

A. $\hat{y}=90+100 x$

$s_x=5$

$s_y=1000$

B. $\hat{y}=\frac{x}{3}-12$

$s_x=0.9$

$s_y=1$

C. $\hat{y}=1.05+0.01 x$

$s_x=0.05$

$s_y=0.002$
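One way to compare the three summaries is to convert each slope into a correlation with the standard relation $r = b_1 \cdot s_x / s_y$. The sketch below does that arithmetic; the dictionary layout is an illustrative choice.

```python
# Slope and standard deviations read from summaries A, B, and C.
summaries = {
    "A": {"slope": 100.0, "s_x": 5.0, "s_y": 1000.0},
    "B": {"slope": 1 / 3, "s_x": 0.9, "s_y": 1.0},
    "C": {"slope": 0.01, "s_x": 0.05, "s_y": 0.002},
}

# r = b1 * s_x / s_y for each summary.
correlations = {
    name: s["slope"] * s["s_x"] / s["s_y"] for name, s in summaries.items()
}

# Strength is judged by the absolute value of r.
weakest_to_strongest = sorted(correlations, key=lambda name: abs(correlations[name]))
print(correlations)
print(weakest_to_strongest)
```

The intercepts play no role in the ranking, and neither does the raw size of the slope.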

Carson Merrill

Numerade Educator

There's an extremely strong relationship between the price of books online and the price at your local bookstore.

a. Does this mean the prices are almost the same?

b. Explain why it is wrong to say that the prices online "cause" the prices at your local bookstore. Why is the relationship so strong if neither set of prices causes the other?

Caleb Wood

Numerade Educator

Describe a set of cases and two variables for which you would expect to see regression toward the mean.

Check back soon!

Life spans. In Chapter 2, you looked at the characteristics of mammals (given in Display 2.24 on page 43) one at a time. Now you can look at the relationship between two variables. For example, is longevity associated with gestation period? The variables are average longevity in years, maximum longevity in years, gestation period in days, and speed in miles per hour.

a. Construct a scatterplot of gestation period versus maximum longevity. Describe what you see, including an estimate of the correlation.

b. Repeat part a, with average longevity in place of maximum longevity. Does the average longevity or the maximum longevity give a better prediction of the gestation period?

c. Does speed appear to be associated with average longevity?

Trent Speier

Numerade Educator

Spending for police. The data in Display 3.128 give the number of police officers, the total expenditures for police officers, the population, and the violent crime rate for a sample of states in 2000.

a. Explore and summarize the relationship between the number of police officers and total expenditures for police.

b. Explore and summarize the relationship between the population of the states and the number of police officers they employ.

c. Is the number of police officers strongly related to the rate of violent crime in these states? Explain. Find a transformation that straightens these data. Check the linearity of your transformed data with a residual plot.

$$

\begin{array}{|c|c|c|c|c|}

\hline \text { State } & \begin{array}{l}

\text { Number of } \\

\text { Police Officers } \\

\text { (in thousands) }

\end{array} & \begin{array}{l}

\text { Expenditures } \\

\text { for Police } \\

\text { (in millions } \\

\text { of dollars) }

\end{array} & \begin{array}{l}

\text { Population } \\

\text { (in millions) }

\end{array} & \begin{array}{l}

\text { Violent Crime } \\

\text { Rate (number per } \\

100,000 \text { of state } \\

\text { population) }

\end{array} \\

\hline \text { California } & 96.9 & 7653 & 34.0 & 622 \\

\hline \text { Colorado } & 12.0 & 753 & 4.3 & 34 \\

\hline \text { Florida } & 55.2 & 3371 & 16.0 & 812 \\

\hline \text { Illinois } & 44.1 & 2718 & 12.4 & 657 \\

\hline \text { Iowa } & 7.3 & 346 & 2.9 & 266 \\

\hline \text { Louisiana } & 16.1 & 635 & 4.5 & 681 \\

\hline \text { Maine } & 3.1 & 118 & 1.3 & 110 \\

\hline \text { Mississippi } & 8.6 & 337 & 2.8 & 361 \\

\hline \text { New Jersey } & 33.4 & 1829 & 8.4 & 384 \\

\hline \text { Tennessee } & 18.1 & 828 & 5.7 & 707 \\

\hline \text { Texas } & 58.9 & 2866 & 20.9 & 545 \\

\hline \text { Virginia } & 18.8 & 965 & 7.1 & 282 \\

\hline \text { Washington } & 14.1 & 854 & 5.9 & 370 \\

\hline

\end{array}

$$

Display 3.128 Number of police officers and related variables. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2004–2005.]

Tyler Moulton

Numerade Educator

House prices. Display 3.129 gives the selling prices for all houses sold in a Florida community in one month.

a. Construct a model to predict the selling price from the area, transforming any variables, if necessary. Would you use the same model for both new and used houses?

b. Are there any influential observations that have a serious effect on the model? If so, what would happen to the slope of the prediction equation and the correlation if you removed this (or these) point(s) from the analysis?

c. Predict the selling price of an old house measuring 1000 sq ft. Do the same for an old house measuring 2000 sq ft. Which prediction do you feel more confident about? Explain.

d. Explain the effect of the number of bathrooms on the selling price of the houses. Is it appropriate to fit a regression model to price as a function of the number of bathrooms and interpret the results in the usual way? Why or why not?

$$

\begin{array}{|c|c|c|c|c|c|}

\hline \text { House } & \begin{array}{l}

\text { Price } \\

\text { (\$ thousands) }

\end{array} & \begin{array}{l}

\text { Area } \\

\text { (thousands } \\

\text { of sq ft) }

\end{array} & \begin{array}{l}

\text { Number of } \\

\text { Bedrooms }

\end{array} & \begin{array}{l}

\text { Number of } \\

\text { Bathrooms }

\end{array} & \begin{array}{l}

\begin{array}{c}

\text { New (1), } \\

\text { Old (0) }

\end{array}

\end{array} \\

\hline 1 & 48.5 & 1.10 & 3 & 1 & 0 \\

\hline 2 & 55.0 & 1.01 & 3 & 2 & 0 \\

\hline 3 & 68.0 & 1.45 & 3 & 2 & 0 \\

\hline 4 & 137.0 & 2.40 & 3 & 3 & 0 \\

\hline 5 & 309.4 & 3.30 & 4 & 3 & 1 \\

\hline 6 & 17.5 & 0.40 & 1 & 1 & 0 \\

\hline 7 & 19.6 & 1.28 & 3 & 1 & 0 \\

\hline 8 & 24.5 & 0.74 & 3 & 1 & 0 \\

\hline 9 & 34.8 & 0.78 & 2 & 1 & 0 \\

\hline 10 & 32.0 & 0.97 & 3 & 1 & 0 \\

\hline 11 & 28.0 & 0.84 & 3 & 1 & 0 \\

\hline 12 & 49.9 & 1.08 & 2 & 2 & 0 \\

\hline 13 & 59.9 & 0.99 & 2 & 1 & 0 \\

\hline 14 & 61.5 & 1.01 & 3 & 2 & 0 \\

\hline 15 & 60.0 & 1.34 & 3 & 2 & 0 \\

\hline 16 & 65.9 & 1.22 & 3 & 1 & 0 \\

\hline 17 & 67.9 & 1.28 & 3 & 2 & 0 \\

\hline 18 & 68.9 & 1.29 & 3 & 2 & 0 \\

\hline 19 & 69.9 & 1.52 & 3 & 2 & 0 \\

\hline 20 & 70.5 & 1.25 & 3 & 2 & 0 \\

\hline 21 & 72.9 & 1.28 & 3 & 2 & 0 \\

\hline 22 & 72.5 & 1.28 & 3 & 1 & 0 \\

\hline 23 & 72.0 & 1.36 & 3 & 2 & 0 \\

\hline 24 & 71.0 & 1.20 & 3 & 2 & 0 \\

\hline 25 & 76.0 & 1.46 & 3 & 2 & 0 \\

\hline 26 & 72.9 & 1.56 & 4 & 2 & 0 \\

\hline 27 & 73.0 & 1.22 & 3 & 2 & 0 \\

\hline 28 & 70.0 & 1.40 & 2 & 2 & 0 \\

\hline 29 & 76.0 & 1.15 & 2 & 2 & 0 \\

\hline 30 & 69.0 & 1.74 & 3 & 2 & 0 \\

\hline 31 & 75.5 & 1.62 & 3 & 2 & 0 \\

\hline 32 & 76.0 & 1.66 & 3 & 2 & 0 \\

\hline 33 & 81.8 & 1.33 & 3 & 2 & 0 \\

\hline 34 & 84.5 & 1.34 & 3 & 2 & 0 \\

\hline 35 & 83.5 & 1.40 & 3 & 2 & 0 \\

\hline

\end{array}

$$

Display 3.129

$$

\begin{array}{|c|c|c|c|c|c|}

\hline \text { House } & \begin{array}{l}

\text { Price } \\

\text { (\$ thousands) }

\end{array} & \begin{array}{l}

\text { Area } \\

\text { (thousands } \\

\text { of sq ft) }

\end{array} & \begin{array}{l}

\text { Number of } \\

\text { Bedrooms }

\end{array} & \begin{array}{l}

\text { Number of } \\

\text { Bathrooms }

\end{array} & \begin{array}{l}

\begin{array}{c}

\text { New (1), } \\

\text { Old (0) }

\end{array}

\end{array} \\

\hline 36 & 86.0 & 1.15 & 2 & 2 & 1 \\

\hline 37 & 86.9 & 1.58 & 3 & 2 & 1 \\

\hline 38 & 86.9 & 1.58 & 3 & 2 & 1 \\

\hline 39 & 86.9 & 1.58 & 3 & 2 & 1 \\

\hline 40 & 87.9 & 1.71 & 3 & 2 & 0 \\

\hline 41 & 88.1 & 2.10 & 3 & 2 & 0 \\

\hline 42 & 85.9 & 1.27 & 3 & 2 & 0 \\

\hline 43 & 89.5 & 1.3 & 3 & 2 & 0 \\

\hline 44 & 87.4 & 1.25 & 3 & 2 & 0 \\

\hline 45 & 87.9 & 1.68 & 3 & 2 & 0 \\

\hline 46 & 88.0 & 1.55 & 3 & 2 & 0 \\

\hline 47 & 90.0 & 1.55 & 3 & 2 & 0 \\

\hline 48 & 96.0 & 1.36 & 3 & 2 & 1 \\

\hline 49 & 99.9 & 1.51 & 3 & 2 & 1 \\

\hline 50 & 95.5 & 1.54 & 3 & 2 & 1 \\

\hline 51 & 98.5 & 1.51 & 3 & 2 & 0 \\

\hline 52 & 100.1 & 1.85 & 3 & 2 & 0 \\

\hline 53 & 99.9 & 1.62 & 4 & 2 & 1 \\

\hline 54 & 101.9 & 1.40 & 3 & 2 & 1 \\

\hline 55 & 101.9 & 1.92 & 4 & 2 & 0 \\

\hline 56 & 102.3 & 1.42 & 3 & 2 & 1 \\

\hline 57 & 110.8 & 1.56 & 3 & 2 & 1 \\

\hline 58 & 105.0 & 1.43 & 3 & 2 & 1 \\

\hline 59 & 97.9 & 2.00 & 3 & 2 & 0 \\

\hline 60 & 106.3 & 1.45 & 3 & 2 & 1 \\

\hline 61 & 106.5 & 1.65 & 3 & 2 & 0 \\

\hline 62 & 116.0 & 1.72 & 4 & 2 & 1 \\

\hline 63 & 108.0 & 1.79 & 4 & 2 & 1 \\

\hline 64 & 107.3 & 1.85 & 3 & 2 & 0 \\

\hline 65 & 109.9 & 2.06 & 4 & 2 & 1 \\

\hline 66 & 110.0 & 1.76 & 4 & 2 & 0 \\

\hline 67 & 120.0 & 1.62 & 3 & 2 & 1 \\

\hline 68 & 115.0 & 1.80 & 4 & 2 & 1 \\

\hline 69 & 113.4 & 1.98 & 3 & 2 & 0 \\

\hline 70 & 114.9 & 1.57 & 3 & 2 & 0 \\

\hline 71 & 115.0 & 2.19 & 3 & 2 & 0 \\

\hline 72 & 115.0 & 2.07 & 4 & 2 & 0 \\

\hline 73 & 117.9 & 1.99 & 4 & 2 & 0 \\

\hline 74 & 110.0 & 1.55 & 3 & 2 & 0 \\

\hline 75 & 115.0 & 1.67 & 3 & 2 & 0 \\

\hline 76 & 124.0 & 2.40 & 4 & 2 & 0 \\

\hline 77 & 129.9 & 1.79 & 4 & 2 & 1 \\

\hline 78 & 124.0 & 1.89 & 3 & 2 & 0 \\

\hline 79 & 128.0 & 1.88 & 3 & 2 & 1 \\

\hline 80 & 132.4 & 2.00 & 4 & 2 & 1 \\

\hline 81 & 139.3 & 2.05 & 4 & 2 & 1 \\

\hline 82 & 139.3 & 2.00 & 4 & 2 & 1 \\

\hline 83 & 139.7 & 2.03 & 3 & 2 & 1 \\

\hline 84 & 142.0 & 2.12 & 3 & 3 & 0 \\

\hline 85 & 141.3 & 2.08 & 4 & 2 & 1 \\

\hline

\end{array}

$$

$$

\begin{array}{cccccc}

\text { House } & \begin{array}{c}

\text { Price } \\

\text { (\$ thousands) }

\end{array} & \begin{array}{c}

\text { Area } \\

\text { (thousands } \\

\text { of sq ft) }

\end{array} & \begin{array}{c}

\text { Number of } \\

\text { Bedrooms }

\end{array} & \begin{array}{c}

\text { Number of } \\

\text { Bathrooms }

\end{array} & \begin{array}{c}

\text { New (1), } \\

\text { Old (0) }

\end{array} \\

\hline 86 & 147.5 & 2.9 & 4 & 2 & 0 \\

87 & 142.5 & 2.40 & 4 & 2 & 0 \\

88 & 148.0 & 2.40 & 5 & 2 & 0 \\

89 & 149.0 & 3.05 & 4 & 2 & 0 \\

90 & 150.0 & 2.04 & 3 & 3 & 0 \\

91 & 172.9 & 2.25 & 4 & 2 & 1 \\

92 & 190.0 & 2.57 & 4 & 3 & 1 \\

93 & 280.0 & 3.85 & 4 & 3 & 0

\end{array}

$$

Display 3.129 Selling prices of houses in Gainesville, Florida. [Source: Gainesville Board of Realtors, 1995.]

Sheryl Ezze

Numerade Educator

Spending for schools. Display 3.130 provides data on spending and other variables related to public school education for 2001. The variables are defined as

ExpPP expenditure per pupil (in dollars)

ExpPC expenditure per capita (per person in the state, in dollars)

TeaSal average teacher salary (in thousands of dollars)

%Dropout percentage who drop out of school

Enroll number of students enrolled (in thousands)

Teachers number of teachers (in thousands)

a. Examine the association between per-pupil expenditure and average teacher salary, with the goal of predicting per-pupil expenditure. Is this a cause-and-effect relationship?

b. Analyze the effect of average teacher salary on per-capita expenditure (spending on public schools divided by the number of people in the state). Compare the association to the association in part a. Are the relative sizes of the correlations about what you would expect?

c. Are any variables good predictors of the percentage of dropouts? Explain your reasoning.

$$

\begin{array}{lrrrrrr}

\text { State } & \text { ExpPP } & \text { ExpPC } & \text { TeaSal } & \text { \%Dropout } & \text { Enroll } & \text { Teachers } \\

\hline \text { Alabama } & 6669 & 1097 & 38.2 & 4.1 & 737 & 46.5 \\

\text { Alaska } & 10366 & 2165 & 49.7 & 8.2 & 134 & 8.1 \\

\text { Arizona } & 6547 & 1109 & 40.9 & 10.9 & 922 & 45.1 \\

\text { Arkansas } & 7080 & 1177 & 37.8 & 5.3 & 450 & 31.8 \\

\text { California } & 8442 & 1507 & 56.3 & \text { n/a } & 6249 & 309.8 \\

\text { Colorado } & 9092 & 1499 & 42.7 & \text { n/a } & 742 & 45.4 \\

\text { Connecticut } & 12605 & 2078 & 55.4 & 3 & 570 & 43.2 \\

\text { Delaware } & 11776 & 1681 & 50.8 & 4.2 & 116 & 7.7 \\

\text { Florida } & 8192 & 1227 & 40.3 & 4.4 & 2500 & 141 \\

\text { Georgia } & 9727 & 1675 & 45.5 & 7.2 & 1471 & 95.9 \\

\text { Hawaii } & 8092 & 1207 & 44.5 & 5.7 & 185 & 11.2 \\

\text { Idaho } & 6883 & 1266 & 40.1 & 5.6 & 247 & 13.8 \\

\text { Illinois } & 11371 & 1871 & 51.5 & 6 & 2071 & 133.7 \\

\text { Indiana } & 10131 & 1639 & 45 & \text { n/a } & 996 & 59.5 \\

\hline

\end{array}

$$

$$

\begin{array}{|c|c|c|c|c|c|c|}

\hline \text { State } & \text { ExpPP } & \text { ExpPC } & \text { TeaSal } & \text { \%Dropout } & \text { Enroll } & \text { Teachers } \\

\hline \text { Iowa } & 8054 & 1335 & 39.1 & 2.7 & 486 & 34.6 \\

\hline \text { Kansas } & 8568 & 1485 & 37.8 & 3.2 & 470 & 32.6 \\

\hline \text { Kentucky } & 7619 & 1218 & 39 & 4.6 & 654 & 38.8 \\

\hline \text { Louisiana } & 8081 & 1320 & 37.2 & 8.3 & 731 & 50.3 \\

\hline \text { Maine } & 10170 & 1618 & 38.5 & 3.1 & 206 & 16 \\

\hline \text { Maryland } & 9242 & 1460 & 49.7 & 4.1 & 861 & 55.5 \\

\hline \text { Massachusetts } & 11403 & 1728 & 50.8 & 3.4 & 973 & 71.9 \\

\hline \text { Michigan } & 10377 & 1788 & 53.6 & \text { n/a } & 1731 & 92.2 \\

\hline \text { Minnesota } & 10888 & 1844 & 44.7 & 4 & 851 & 52.6 \\

\hline \text { Mississippi } & 6555 & 1129 & 34.6 & 4.6 & 494 & 30.6 \\

\hline \text { Missouri } & 8188 & 1314 & 37.7 & 4.2 & 910 & 66.2 \\

\hline \text { Montana } & \pi 17 & 1288 & 35.8 & 4.2 & 152 & 10.4 \\

\hline \text { Nebraska } & 8067 & 1331 & 37.9 & 4 & 285 & 20.7 \\

\hline \text { Nevada } & 8793 & 1448 & 41.8 & 5.2 & 357 & 19.5 \\

\hline \text { New Hampshire } & 9193 & 1493 & 41.9 & 5.4 & 207 & 15 \\

\hline \text { New Jersey } & 11822 & 1850 & 54.2 & 2.8 & 1342 & 105.4 \\

\hline \text { New Mexico } & 8722 & 1507 & 37 & 5.3 & 320 & 21 \\

\hline \text { New York } & 12918 & 1939 & 53 & 3.8 & 2872 & 226 \\

\hline \text { North Carolina } & 7925 & 1255 & 43.1 & 6.3 & 1315 & 86.1 \\

\hline \text { North Dakota } & 7538 & 1261 & 33.9 & 2.2 & 106 & 7.7 \\

\hline \text { Ohio } & 10028 & 1609 & 45.5 & 3.9 & 1831 & 122.2 \\

\hline \text { Oklahoma } & 6733 & 1200 & 34.9 & 5.2 & 622 & 40.6 \\

\hline \text { Oregon } & 8490 & 1329 & 47.6 & 5.3 & 551 & 27.1 \\

\hline \text { Pennsylvania } & 8735 & 1291 & 51.4 & 3.6 & 1822 & 118.3 \\

\hline \text { Rhode Island } & 10348 & 1530 & 51.1 & 5 & 158 & 13.4 \\

\hline \text { South Carolina } & 9009 & 1517 & 40.4 & 3.3 & 691 & 44.5 \\

\hline \text { South Dakota } & 8117 & 1367 & 32.4 & 3.9 & 128 & 9 \\

\hline \text { Tennessee } & 6284 & 1004 & 39.7 & 4.3 & 925 & 58.3 \\

\hline \text { Texas } & 8780 & 1682 & 40 & 4.2 & 4163 & 288.4 \\

\hline \text { Utah } & 6064 & 1268 & 38.3 & 3.7 & 485 & 21.6 \\

\hline \text { Vermont } & 11198 & 1834 & 41.5 & 4.7 & 101 & 8.8 \\

\hline \text { Virginia } & 7165 & 1143 & 43.2 & 3.5 & 1163 & 93.1 \\

\hline \text { Washington } & 9102 & 1514 & 45 & 4 & 1009 & 53.1 \\

\hline \text { West Virginia } & 9859 & 1546 & 38.5 & 4.2 & 283 & 19.9 \\

\hline \text { Wisconsin } & 10631 & 1718 & 42.8 & 2.3 & 879 & 60 \\

\hline \text { Wyoming } & 10943 & 1931 & 38.8 & 6.4 & 88 & 6.5 \\

\hline

\end{array}

$$

Display 3.130 Public school education by state in 2001. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2004–2005.]

Jerelyn Nevil

Numerade Educator