Statistical Inference — Practice Exam #1

Compute the sample mean $\overline{X}$ for the following 5 numbers: 1, 4, 2, 3, 4
1. 2.8 ✓
2. 3.00
3. 2.24
4. 1.4
5. None of the above are correct
$\begin{matrix} \overline{x} & = & \frac{\sum_{i = 1}^{n} x_{i}}{n} \end{matrix}$
Compute the sample variance $σ^{2}$ for the following 5 numbers: 3, 4, 2, 4, 5
1. 1.04
2. 1.5
3. 1.3 ✓
4. 1.10
5. None of the above answers are correct
$\begin{matrix} σ_{x}^{2} & = & \frac{\sum_{i = 1}^{|x|} {(x_{i} - \overline{x})}^{2}}{|x| - 1} \end{matrix}$
Suppose you have 10 numbers and have computed the mean to be 6.0. You then discover that the last number in the data was entered incorrectly. It was entered as 9.0 when it should have been 8.0. If you replace the incorrect value (9.0) with the correct one (8.0), and recompute the mean, you will obtain a new mean of:
1. 6.9
2. 4.9
3. It is impossible to determine
4. 5.9 ✓
One set of numbers that reproduces the original mean is:
$x_{1} = x_{2} = … = x_{n - 2} = \overline{x}$ , $x_{n - 1} = 2 \overline{x} - x_{n}$
This choice makes the average of $x_{n - 1}$ and $x_{n}$ equal to $\overline{x}$ , so the mean of the whole set is still $\overline{x}$ .

$x_{n}$ can then be replaced with the corrected value ${x'}_{n}$ to compute the new mean:
$\overline{x'} = \frac{(n - 2) \overline{x} + 2 \overline{x} - x_{n} + {x'}_{n}}{n} = \overline{x} + \frac{{x'}_{n} - x_{n}}{n}$
With $n = 10$ , $\overline{x} = 6.0$ , $x_{n} = 9.0$ and ${x'}_{n} = 8.0$ , this gives 6.0 - 0.1 = 5.9.
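The same correction can be made without reconstructing any of the other values, because only the total changes. A minimal Python sketch (the function name `corrected_mean` is illustrative and not part of the exam):

```python
def corrected_mean(old_mean, n, wrong_value, right_value):
    """Mean of n numbers after one mis-entered value is replaced."""
    # The total changes only by the difference between the two values.
    new_total = old_mean * n - wrong_value + right_value
    return new_total / n


new_mean = corrected_mean(old_mean=6.0, n=10, wrong_value=9.0, right_value=8.0)
print(round(new_mean, 4))
```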
Suppose two different scores, each with a minimum of zero and a maximum possible value of 25, are summed to produce a "total raw admission score" that is used to determine university admissions. Suppose that score A has a mean of $μ_{1}$ and a standard deviation of $σ$ , and score B has a mean of $μ_{2}$ and a standard deviation of $2 σ$ . Joseph has a Z-score of 1 on Score A and a Z-score of -1 on Score B, while Marilyn has a Z-score of -1 on Score A and a Z-score of 1 on Score B. Which student will have the higher total raw admission score?
1. Joseph
2. Both will have the same mark.
3. Marilyn ✓
You have 4 numbers: a, b, c, d. If you multiply them all by 4, the mean of the resulting data set will be
1. $16 (a+ b+ c+ d)$
2. $4 (a+ b+ c+ d)$
3. $(a+ b+ c+ d)$ ✓
4. $\frac{(a+ b+ c+ d)}{4}$
In general, if:
${x'}_{i} = k x_{i}$
Then:
$\overline{x'} = \frac{\sum k x_{i}}{n} = \frac{k \sum x_{i}}{n}$
In this particular situation, $k = n = 4$ , so the two factors cancel and the new mean is simply $\sum x_{i} = a + b + c + d$ .
You have 10 numbers with a sample mean of 9.0 and a sample variance of 11.0. You discover that the last number in the list was recorded as 10.0 when it should have been recorded as 14.0. If you correct your error and correctly recompute the sample variance, what value will you obtain?
1. 11.0
2. 14.988
3. 13.489 ✓
4. 12.14
5. None of the above answers are correct
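One way to reach the marked answer is to recover the sum and the sum of squares from the summary statistics, correct both, and recompute. The sketch below assumes the sample variance uses an n - 1 denominator, as elsewhere in this exam; the function name is illustrative.

```python
def corrected_variance(mean, var, n, wrong_value, right_value):
    """Sample variance (n - 1 denominator) after fixing one entry."""
    total = mean * n
    # Recover the raw sum of squares from (n - 1) * var = sum_sq - n * mean**2.
    sum_sq = (n - 1) * var + n * mean ** 2
    # Swap the wrong value for the right one in both running totals.
    new_total = total - wrong_value + right_value
    new_sum_sq = sum_sq - wrong_value ** 2 + right_value ** 2
    return (new_sum_sq - new_total ** 2 / n) / (n - 1)


new_var = corrected_variance(mean=9.0, var=11.0, n=10,
                             wrong_value=10.0, right_value=14.0)
print(round(new_var, 3))
```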
Consider the following data array:

5 3 1 2

5 9 9 8

5 7 15 2

4 7 6 1

Compute:
$\sum_{i = 2}^{4} \sum_{j = 1}^{3} X_{i, j}$
1. 59
2. 72
3. 66
4. 67 ✓
5. None of the above.
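The first subscript indexes the row and the second indexes the column:
$X_{i, j} = X_{row, column}$
The sum therefore covers rows 2 to 4 and columns 1 to 3: (5 + 9 + 9) + (5 + 7 + 15) + (4 + 7 + 6) = 67.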
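As a check on the indexing, the double sum can be written directly in Python. The list-of-rows layout is an assumption about how to store the array; the 1-based exam indices are shifted by one to match Python's 0-based lists.

```python
# Data array from the question, stored as a list of rows.
X = [
    [5, 3, 1, 2],
    [5, 9, 9, 8],
    [5, 7, 15, 2],
    [4, 7, 6, 1],
]

# i runs over rows 2..4 and j over columns 1..3 (1-based, as in the exam).
total = sum(X[i - 1][j - 1] for i in range(2, 5) for j in range(1, 4))
print(total)
```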
You have a sample of N observations, and you transform them to Z-scores. Assuming the original scores are not all equal, the quantity
$\sum_{i = 1}^{N} Z_{i}^{2}$
will always be equal to:
1. N - 1 ✓
2. 0
3. 1
4. N
5. Cannot be determined from the information given
This derivation relies on a transformation of the variance when dealing with z-scores: (Recall that $σ_{z} = 1$ and $μ_{z} = 0$ .)
$\begin{matrix} σ_{z}^{2} & = & \frac{\sum {(z_{i} - μ_{z})}^{2}}{n - 1} \\ 1 & = & \frac{\sum {(z_{i} - 0)}^{2}}{n - 1} \\ \sum {z_{i}}^{2} & = & n - 1 \end{matrix}$
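The result can be checked numerically for any data set whose values are not all equal. This sketch uses Python's standard `statistics` module, whose `stdev` uses the n - 1 denominator assumed in the derivation; the data values are arbitrary.

```python
import statistics


def sum_squared_z(values):
    """Sum of squared z-scores, using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator
    return sum(((v - mean) / sd) ** 2 for v in values)


data = [1, 5, 3, 7, 8]  # arbitrary illustrative data
total_z2 = sum_squared_z(data)
print(total_z2, len(data) - 1)
```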
You have the following set of data: 1,5,3,7,8. Find the median.
1. 5 ✓
2. 4
3. 7
4. 3
For the median, the elements must be in ascending or descending order, so the set is 1,3,5,7,8. Since the size of the set is odd, the median is simply the middle value.
Which of the following is not true of the median?
1. Multiplying all values by 5 multiplies the median by 5.
2. The value for combined groups cannot be determined from the sample sizes and group values of the individual groups
3. It is sensitive to outliers ✓
4. It is the middle value in a distribution
Outliers can shift the mean, but the median depends only on the rank order of the values: it is the central value, so making an extreme value even more extreme leaves it unchanged.
Widgets are weighed on a scale that is accurate to the nearest 100 grams. A group of 50 widgets is weighed. The heaviest widget is recorded as 9000 grams, the lightest at 4000 grams. The inclusive range is the difference between the upper real limit and lower real limit for the data. What is the inclusive range for these data?
1. 5100 ✓
2. 5001
3. 5200
4. 5050
If a widget weighs more than 3950 grams, it will register as 4000. Similarly, if it weighs less than 9050 grams, it will register as 9000, so the range for x is 3950 < x < 9050 and the inclusive range is 9050 - 3950 = 5100.
You have a set of data that have a mean of 58 and a standard deviation of 12. You wish them to have a mean of 67 and a standard deviation of 11, while retaining the shape of the present distribution. What values of a and b in the linear transformation formula Y = aX + b will produce a new set of data with the desired mean and standard deviation?
1. a = 0.91667, b = 13.833 ✓
2. a = 1.8333, b = 13.833
3. a = 1.0909, b = 12.833
4. a = 0.91667, b = 27.667
5. a = 1.9167, b = -13.833
The scale factor will be:
$a = \frac{σ'}{σ}$
The shift is found by scaling the mean and comparing it to the needed mean:
$b = μ' - μ a$
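The two formulas above can be sketched in a few lines of Python. The function name `linear_rescale` is hypothetical; it assumes a positive scale factor so that the shape and ordering of the distribution are preserved.

```python
def linear_rescale(old_mean, old_sd, new_mean, new_sd):
    """Coefficients (a, b) of Y = a*X + b giving the target mean and SD."""
    a = new_sd / old_sd          # scale factor
    b = new_mean - a * old_mean  # shift applied after scaling
    return a, b


a, b = linear_rescale(old_mean=58, old_sd=12, new_mean=67, new_sd=11)
print(round(a, 5), round(b, 3))
```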
Suppose you have evenly spaced numbers, and the distance between adjacent numbers is k. It is well known that, if N = 3, the sample mean of the 3 numbers is the middle value and their sample variance is $k^{2}$ . Suppose N = 5. Find an expression for the sample variance of the 5 numbers.
1. $5 k^{2}$
2. $2 k^{2}$
3. $k^{2}$
4. $2.5 k^{2}$ ✓
5. None of the above answers are correct
First, consider the sum of the first m integers (shown here for odd m, pairing terms from both ends):
$\begin{matrix} \sum_{i = 1}^{m} i & = & 1 + 2 + … + (m - 1) + m \\ & = & (1 + m) + (2 + (m - 1)) + … + (\frac{m - 1}{2} + \frac{m + 3}{2}) + \frac{m + 1}{2} \\ & = & \frac{m - 1}{2} (m + 1) + \frac{m + 1}{2} \\ & = & \frac{m^{2} + m}{2} \end{matrix}$
The sum of squares has the standard closed form:
$\begin{matrix} \sum_{i = 1}^{m} i^{2} & = & \frac{m (m + 1) (2 m + 1)}{6} \end{matrix}$
These can be used in the expansion of n elements evenly spaced by k units:
$\begin{matrix} \sum_{i = 1}^{n} x_{i} & = & x_{1} + (x_{1} + k) + (x_{1} + 2 k) + … + (x_{1} + (n - 1) k) \\ & = & n x_{1} + k + 2 k + … + (n - 1) k \\ & = & n x_{1} + k \sum_{i = 1}^{n - 1} i \\ & = & n x_{1} + k \frac{{(n - 1)}^{2} + (n - 1)}{2} \\ & = & n x_{1} + k \frac{n^{2} - n}{2} \end{matrix}$
And again, through a similar process:
$\begin{matrix} \sum_{i = 1}^{n} x_{i}^{2} & = & n x_{1}^{2} + 2 k x_{1} \sum_{i = 1}^{n - 1} i + k^{2} \sum_{i = 1}^{n - 1} i^{2} \\ & = & n x_{1}^{2} + k x_{1} (n^{2} - n) + k^{2} \frac{(n - 1) n (2 n - 1)}{6} \end{matrix}$
Consider then the mean:
$\begin{matrix} μ & = & \frac{\sum_{i = 1}^{n} x_{i}}{n} \\ & = & x_{1} + k \frac{n - 1}{2} \end{matrix}$
The variance then is (using $\sum x_{i} = n μ$ in the last step):
$\begin{matrix} σ^{2} & = & \frac{\sum {(x_{i} - μ)}^{2}}{n - 1} \\ & = & \frac{\sum ({x_{i}}^{2} - 2 μ x_{i} + μ^{2})}{n - 1} \\ & = & \frac{\sum {x_{i}}^{2} - 2 μ \sum x_{i} + n μ^{2}}{n - 1} \\ & = & \frac{\sum {x_{i}}^{2} - n μ^{2}}{n - 1} \end{matrix}$
Expanding the square of the mean gives $n μ^{2} = n x_{1}^{2} + k x_{1} (n^{2} - n) + k^{2} \frac{n {(n - 1)}^{2}}{4}$ , so the terms involving $x_{1}$ cancel:
$\begin{matrix} σ^{2} & = & \frac{k^{2}}{n - 1} (\frac{(n - 1) n (2 n - 1)}{6} - \frac{n {(n - 1)}^{2}}{4}) \\ & = & k^{2} n (\frac{2 n - 1}{6} - \frac{n - 1}{4}) \\ & = & \frac{k^{2} n (n + 1)}{12} \end{matrix}$
For N = 3 this gives $k^{2}$ , as stated in the question, and for N = 5 it gives $\frac{30 k^{2}}{12} = 2.5 k^{2}$ .
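A quick numerical check of the marked answer: the sketch below builds n evenly spaced values and computes their sample variance with the standard `statistics` module. The spacing and starting value are arbitrary choices for illustration; the results agree with the closed form $\frac{k^{2} n (n + 1)}{12}$ .

```python
import statistics


def evenly_spaced_variance(n, k, start=0.0):
    """Sample variance (n - 1 denominator) of n values spaced k apart."""
    values = [start + i * k for i in range(n)]
    return statistics.variance(values)


k = 3.0  # arbitrary spacing chosen for illustration
var3 = evenly_spaced_variance(3, k)
var5 = evenly_spaced_variance(5, k, start=10.0)
print(var3 / k ** 2, var5 / k ** 2)
```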
The probability distribution shown below is
1. Symmetric and Bimodal
2. Positively Skewed
3. None of the above answers are correct
4. Negatively Skewed
5. Symmetric and Unimodal ✓
For the following data array, compute ${\overline{X}}_{• 3}$ .

5 4 1 2

5 3 4 9

5 2 20 1

4 4 8 2
1. 11.0
2. 7.25
3. 6.1875
4. 8.25 ✓
${\overline{X}}_{• j} = \frac{Σ_{i = 1}^{n} X_{i, j}}{n}$
For the following data, compute $X_{• •}$ .

7 7 1 2

7 3 7 14

5 2 49 0

4 4 14 5
1. 32.75
2. 8.1875
3. 131 ✓
4. 130
5. 129
For the following frequency distribution of number of children for psychology faculty members at a major Eastern institution

X f

3 3

2 6

1 3

0 8

Compute the sample mean
1. 4.8
2. 1.2 ✓
3. 1.0
4. 2.2
Group A is composed of a sample of 10 observations, and has a sample mean of 3 and a sample variance of 30. Group B is composed of a sample of 10 observations, and has a sample mean of 5 and a sample variance of 10. If A and B are combined to form one large group, what will be the variance of this combined group?
1. 27.11
2. 24.0
3. 20.0 ✓
4. 4.4721
5. None of the above answers are correct
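The combined variance can be rebuilt from the group summaries: each group contributes its within-group sum of squares plus a term for the distance of its mean from the grand mean. A minimal sketch, assuming n - 1 denominators throughout (the function name is illustrative):

```python
def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Sample variance of two groups merged into one."""
    n = n1 + n2
    grand_mean = (n1 * mean1 + n2 * mean2) / n
    # Within-group sums of squares plus each group's shift from the grand mean.
    ss = ((n1 - 1) * var1 + n1 * (mean1 - grand_mean) ** 2
          + (n2 - 1) * var2 + n2 * (mean2 - grand_mean) ** 2)
    return ss / (n - 1)


combined = combined_variance(10, 3, 30, 10, 5, 10)
print(combined)
```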
Which of the following is not true?
1. $P_{50} = \frac{P_{75} + P_{25}}{2}$ ✓
2. $P_{50} = Q_{2}$
3. The 3rd decile is equal to $P_{30}$
4. $P_{50}$ = Median
Jamal took an exam where the class mean was $\overline{X} = 76$ and the class standard deviation was $σ = 10$ . What Z-score must Jamal exceed to have a grade that exceeds 75?
1. -0.31623
2. 0.4
3. -0.1 ✓
4. -0.2
5. None of the above answers are correct.
In the frequency distribution below, what is the cumulative relative frequency of the value X = 1?

X f

3 7

2 7

1 7

0 4
1. 72
2. 0.83
3. 0.44 ✓
4. 0.16
5. None of the above answers are correct
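With the values sorted in ascending order, the cumulative relative frequency of the k-th value is the total frequency of all values at or below it, divided by the total number of observations:
$\frac{Σ_{i = 1}^{k} f_{i}}{Σ_{i = 1}^{n} f_{i}}$
Here that is (4 + 7) / 25 = 0.44.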
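In code, the calculation sums the frequencies of all values at or below the target and divides by the total frequency. The dictionary layout and function name below are assumptions made for illustration.

```python
def cumulative_relative_frequency(freq, value):
    """Share of observations at or below `value`; freq maps value -> count."""
    total = sum(freq.values())
    at_or_below = sum(f for x, f in freq.items() if x <= value)
    return at_or_below / total


freq = {3: 7, 2: 7, 1: 7, 0: 4}  # table from the question
crf = cumulative_relative_frequency(freq, 1)
print(crf)
```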
In a normal distribution with a mean of 60 and a standard deviation of 20, which of the following scores is at the 10th percentile?
1. 34.369 ✓
2. 44.369
3. 43.855
4. 32.307
5. None of the above answers are correct
From the table in Glass & Hopkins on page 615, the z-score with a cumulative proportion of .10 is -1.282, so the score is -1.282(20) + 60 = 34.36.
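The table lookup can also be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), whose `inv_cdf` method returns the score at a given cumulative proportion. This is a sketch of an alternative to the printed table, not the method the exam expects.

```python
from statistics import NormalDist

scores = NormalDist(mu=60, sigma=20)
p10 = scores.inv_cdf(0.10)  # score at the 10th percentile
print(round(p10, 3))
```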
IQ scores have a distribution that is approximately normal in shape, with a mean of 100 and a standard deviation of 15. What percentage of scores is at or above an IQ of 116?
1. 12.464
2. 14.306 ✓
3. 15.737
4. 16.355
5. None of the above answers are correct
(116 - 100) / 15 ≊ 1.0667, and the proportion of a normal distribution at or above z = 1.0667 is about 0.14306, or 14.306%.
Group A has a mean of 0 and a standard deviation of 1. Group B has a mean of 0 and a standard deviation of 1.11. The populations have a normal distribution. Consider the proportion of scores at or above 3.0 in Group B, and the proportion of scores at or above 3.0 in Group A. What is the ratio of these proportions?
1. 1.2738
2. 3.5665
3. 2.5475 ✓
4. 3.057
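The ratio compares two upper-tail areas: a score of 3.0 is z = 3.0 in Group A but only z = 3.0 / 1.11 ≊ 2.70 in Group B. A short sketch using `statistics.NormalDist` from the standard library (the variable names are illustrative):

```python
from statistics import NormalDist

group_a = NormalDist(mu=0, sigma=1)
group_b = NormalDist(mu=0, sigma=1.11)

# Proportion of scores at or above 3.0 in each group.
tail_a = 1 - group_a.cdf(3.0)
tail_b = 1 - group_b.cdf(3.0)
ratio = tail_b / tail_a
print(round(ratio, 2))
```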
You have a set of 11 numbers with a mean of 18. If you add an additional number to the original group, and that new number has a value of 23, what will be the mean of the new set of 12 numbers?
1. 20.258
2. 22.1
3. 12.892
4. 18.417 ✓
5. None of the above answers are correct.
Adding the new number shifts the mean by (23 - 18) / 12 ≊ 0.417, giving 18 + 0.417 = 18.417.
Consider two columns of numbers, X and Y. The sum of cross-products of deviation scores of the numbers is equal to:
$SCP = \sum (x_{i} - \overline{x}) (y_{i} - \overline{y})$
SCP is always equal to which of the following quantities:
1. $\sum (x_{i} - \overline{x}) y_{i}$
2. $\sum x_{i} (y_{i} - \overline{y})$
3. $\sum x_{i} y_{i} - \frac{\sum x_{i} \sum y_{i}}{n}$
4. All 3 of the above are equal to SCP ✓
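The equivalence can be checked numerically. The sketch below computes SCP from its definition and from the three alternative forms, taking the third form as $\sum x_{i} y_{i} - \frac{\sum x_{i} \sum y_{i}}{n}$ ; the data are borrowed from the correlation question later in this exam, and the function name is illustrative.

```python
def scp_forms(xs, ys):
    """SCP by definition, followed by the three alternative forms."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    definition = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    form1 = sum((x - x_bar) * y for x, y in zip(xs, ys))
    form2 = sum(x * (y - y_bar) for x, y in zip(xs, ys))
    form3 = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return definition, form1, form2, form3


forms = scp_forms([1, 2, 4, 5, 2], [4, 5, 4, 3, 6])
print(forms)
```

The forms agree because the deviations $x_{i} - \overline{x}$ sum to zero, so multiplying them by the constant $\overline{y}$ contributes nothing.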
The table below, adapted from Glass and Hopkins, Table 5.1, shows the expected value of the range for random samples of size N from a normal distribution, when σ = 1. Suppose heights of men in the general population have a mean of 69 and a standard deviation of 2.9, and have a distribution that is very closely approximated by a normal distribution. What is the expected value of the tallest man in a random sample of size 100?

N	E(Range)
2	1.1
5	2.3
10	3.1
20	3.7
50	4.5
100	5.0
200	5.5
500	6.1
1000	6.5
1. 78.425
2. 68.625
3. 83.875
4. 76.25 ✓
5. None of the above answers are correct
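Because the normal distribution is symmetric, the expected maximum lies as far above the mean as the expected minimum lies below it, so the expected maximum is the mean plus half the expected range. A sketch of that reasoning, with the table stored as a dictionary (the names are illustrative):

```python
# Expected range for samples of size N from a normal distribution with sigma = 1.
EXPECTED_RANGE = {2: 1.1, 5: 2.3, 10: 3.1, 20: 3.7, 50: 4.5,
                  100: 5.0, 200: 5.5, 500: 6.1, 1000: 6.5}


def expected_max(mean, sd, n):
    """Expected largest value in a normal sample of size n."""
    # Half of the expected range lies above the mean, by symmetry.
    return mean + sd * EXPECTED_RANGE[n] / 2


tallest = expected_max(mean=69, sd=2.9, n=100)
print(tallest)
```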
IQ scores have a distribution that is approximately normal in shape, with a mean of 100 and a standard deviation of 15 in the general population. Assuming a normal distribution is a good approximation, what proportion of the general population has IQ scores between 81.0 and 107.0?
1. 0.67699
2. 0.57699 ✓
3. 0.63469
4. 0.47699
5. None of the above answers are correct
Fred took an exam and got a Z-score of 1.76. For which of the following exam metrics will Fred get the highest raw score?
1. Mean = 75.2, Standard Deviation = 11.3
2. Mean = 79.9, Standard Deviation = 10.1 ✓
3. Mean = 70.3, Standard Deviation = 11.0
You have 10 numbers with a mean of 11.0. You first add 5.0 to all the numbers, then divide them all by 3.0. What will the mean of the numbers be when you are finished?
1. 5.4333
2. 5.3333 ✓
3. 8.6667
4. 5.2333
5. None of the above answers are correct
When we compare the stem-leaf diagram with a standard frequency histogram plot, we see:
1. The stem-leaf diagram allows one to recover accurate estimates of summary statistics like the mean and variance more easily. ✓
2. The stem-leaf diagram does not allow one to see the basic shape of the distribution.
3. The frequency histogram is easier to construct with a typewriter
4. The stem-leaf diagram throws away information about the individual numbers, while the frequency histogram does not.
Four distributions of 25 numbers have the identical shape, except that the X distribution has a mean of 100 and a standard deviation of 15, while the Y distribution has a mean of 20 and a standard deviation of 10, the W distribution a mean of 50, and a standard deviation of 20, and the K distribution a mean of 500 and a standard deviation of 100. Which of the following scores has the highest percentile rank in its respective distribution?
1. A score of 70 in the W distribution.
2. A score of 30 in the Y distribution.
3. A score of 650 in the K distribution. ✓
4. A score of 119.5 in the X distribution.
After the first statistics exam, the professor posts the grades, and you compute your Z-score. It is +1.4. A day later, the professor announces that he forgot to scale the grades, and that he is linearly rescaling the grades so that the mean changes from 60 to 78, and the standard deviation changes from 10 to 12. What will your Z-score become?
1. 1.6
2. 1.4 ✓
3. 1.2
4. 1.0
5. It is impossible to determine from the information given
The maximum value of the regression coefficient b in the linear regression equation y = bx + c is
1. 1.00
2. $ρ_{x, y}$
3. the ratio $\frac{σ_{x}}{σ_{y}}$
4. the ratio $\frac{σ_{y}}{σ_{x}}$ ✓
Compute the correlation coefficient for the following data:

1 4

2 5

4 4

5 3

2 6
1. -0.53403
2. -0.1268
3. 0
4. -0.61382 ✓
5. -0.4329
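The marked answer can be reproduced from the definition of the correlation coefficient as SCP divided by the square root of the product of the two sums of squares. The sketch assumes the first column of the data is X and the second is Y; the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from deviation-score sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    scp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return scp / math.sqrt(ss_x * ss_y)


r = pearson_r([1, 2, 4, 5, 2], [4, 5, 4, 3, 6])
print(round(r, 5))
```

The n - 1 denominators of the covariance and the two variances cancel, so they do not appear in the function.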
X has a variance of 40. Y has a variance of 40. X and Y have a covariance of 21.6. Find the variance of X - Y.
1. 77.28
2. 36.8 ✓
3. 33.12
4. 123.2
Given three uncorrelated variables X, Y, and Z, all with means of 15 and variances of 100. Consider the linear combination W = 7X + 11Y + 9Z. What would be its mean?
1. 405 ✓
2. 19683/50
3. 2700
4. 810
5. None of the above answers are correct
Given three uncorrelated variables X, Y, and Z, all with means of 15 and variances of 100. Consider the linear combination W = 8X + 12Y - 10Z. What would be its variance?
1. 30799
2. 30805
3. 30800 ✓
4. 30792
5. None of the above.
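For uncorrelated variables, the mean of a linear combination is the weighted sum of the means, and the variance is the sum of the squared weights times the variances. The sketch below covers both of the linear-combination questions above; the function name is hypothetical.

```python
def linear_combination_moments(weights, means, variances):
    """Mean and variance of sum(w_i * X_i) for uncorrelated X_i."""
    mean = sum(w * m for w, m in zip(weights, means))
    # Covariance terms vanish because the variables are uncorrelated.
    variance = sum(w ** 2 * v for w, v in zip(weights, variances))
    return mean, variance


means = [15, 15, 15]
variances = [100, 100, 100]
mean_w1, _ = linear_combination_moments([7, 11, 9], means, variances)
_, var_w2 = linear_combination_moments([8, 12, -10], means, variances)
print(mean_w1, var_w2)
```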
X has a variance of 40. Y has a variance of 54. X and Y have a covariance of 21.379. Find the correlation $ρ_{Y, X}$ .
1. 0.46 ✓
2. 0.414
3. $9.8976 \times 10^{-3}$
4. 0.506
Suppose that you computed the covariance and the variances for 2 variables with N in the denominator of each of the 3 formulas, instead of N - 1, and you computed the correlation coefficient with these "new" definitions of sample variance and covariance. What effect would that have on the value of a positive correlation coefficient?
1. It would make it lower
2. It would stay the same ✓
3. It is impossible to say
4. It would make it higher
The correlation $ρ_{X, Y}$ between X and Y is 0.67. The variance of X is 105. The variance of Y is 181.0. What is the variance of the predicted scores $\hat{Y}$ when Y is predicted from X?
1. 92.626
2. 70.688
3. 86.938
4. 81.251 ✓
5. None of the above answers are correct
In a standard linear regression setup, you are predicting Y from X. The data are $r_{X, Y} = 0.48$ , $s_{Y}^{2} = 154.0$ , $s_{X}^{2} = 121.0$ , ${\overline{Y}}_{•} = 13.0$ , ${\overline{X}}_{•} = 22.0$ . In the linear regression equation Y = bX + c for predicting Y from X, what are the values of b and c respectively?
1. b = -0.54151, c = 1.0867
2. b = 0.61091, c = -0.44
3. b = 0.54151, c = 1.0867 ✓
4. b = 0.54151, c = -1.0867
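The slope is the correlation times the ratio of the standard deviations of Y and X, and the intercept makes the line pass through the two means. The sketch below assumes that the variance of Y is 154.0 and the variance of X is 121.0, an assignment inferred from the marked answer; the function name is illustrative.

```python
import math


def regression_line(r, var_x, var_y, mean_x, mean_y):
    """Slope b and intercept c for predicting Y from X."""
    b = r * math.sqrt(var_y / var_x)  # r * s_Y / s_X
    c = mean_y - b * mean_x           # line passes through (mean_x, mean_y)
    return b, c


# Assumed assignment: var_y = 154.0 and var_x = 121.0.
b, c = regression_line(r=0.48, var_x=121.0, var_y=154.0,
                       mean_x=22.0, mean_y=13.0)
print(round(b, 5), round(c, 4))
```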
Melinda has a Z-score of 1.0 on a standardized entrance exam used by a large university. The correlation between performance on this exam and first-year GPA at the university is 0.67. At the university, first-year GPAs average 2.8 with a standard deviation of 0.77. What is the predicted average GPA for students with the same score as Melinda on the entrance exam?
1. 3.6475
2. 3.9791
3. 3.3159 ✓
4. 2.3211
5. None of the above answers are correct.
Suppose W = bX + a, where b = $σ_{y, x}$ , the covariance between Y and X, and a is an unknown constant. In that case, what is the covariance of W and Y?
1. $σ_{y, x}^{2}$ ✓
2. $2 σ_{y, x}^{2}$
3. $2 σ_{y, x}^{2} + b a$
4. The answer cannot be determined from the information provided.
Which of the following expressions is a correct formula for the variance of W when W is obtained from X and Y from the formula W = 4X + 3Y?
1. $s_{W}^{2} = 4 s_{X}^{2} + s_{Y}^{2} + 14 s_{X, Y}$
2. $s_{W}^{2} = 16 s_{X}^{2} + 9 s_{Y}^{2} + 24 s_{X, Y}$ ✓
3. $s_{W}^{2} = 4 s_{X}^{2} + 9 s_{Y}^{2}$
4. $s_{W}^{2} = 2 s_{X}^{2} + 9 s_{Y}^{2} + 12 s_{X, Y}$
Suppose a large representative group of students in grade 7 were given a standardized intelligence test (X), and that the test was readministered (Y) in grade 8. On each occasion the mean was 100 and the standard deviation was 15. The correlation in this bivariate normal distribution was .80, and the linear regression error scores had a standard deviation of σ_E = 9. What percentage of examinees who scored 145 at grade 7 would be expected to do at least as well on the grade 8 test?
1. 50%
2. 34%
3. 16% ✓
4. 5%
5. It is impossible to determine from the information given
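One way to reach the marked answer: in a bivariate normal distribution, the grade 8 scores of examinees with a given grade 7 score are normally distributed around the regression prediction, with standard deviation equal to that of the error scores. The sketch below follows that reasoning with `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mean, r, sd_error = 100, 0.80, 9
x = 145

# Both tests have the same standard deviation, so the slope equals r.
predicted_y = mean + r * (x - mean)
conditional = NormalDist(mu=predicted_y, sigma=sd_error)

# Proportion of these examinees scoring at least 145 at grade 8.
share = 1 - conditional.cdf(x)
print(round(predicted_y, 1), round(share, 2))
```

The predicted score is 136, which places 145 one error standard deviation above the prediction.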

PSY 310: Statistical Inference

Will Holcomb

Practice Exam #1

Due: Wed., 17 October 2007
