Statistical Inference — Homework #1

Suppose you have a set of X numbers with a mean of 85 and a standard deviation of 14. What linear transformation of the form Y = aX + b (i.e., what values of a and b) will transform these numbers into a new set of Y values so that they have a mean of 70 and a standard deviation of 12?

Given:

\begin{matrix} \overline{x} & = & 85 \\ σ & = & 14 \\ \overline{x'} & = & 70 \\ σ' & = & 12 \\ {x'}_{i} & = & m x_{i} + c \end{matrix}

Find:

(m, c)

The three pertinent characteristics of a set of numbers are location, spread, and shape. Each is affected only by certain parts of a linear transformation:

Location: the position of the set on the number line; affected by both additive and multiplicative operations
Spread: the amount of the number line covered by the set; affected only by multiplicative operations
Shape: the relative distances between elements of the set; unaffected by linear transformations

Standard deviation is a measure of spread, so:

\begin{matrix} σ' & = & m σ \\ m & = & \frac{σ'}{σ} \end{matrix}

Mean is a measure of location, so:

\begin{matrix} \overline{x'} & = & m \overline{x} + c \\ c & = & \overline{x'} - m \overline{x} \end{matrix}

Substituting in from the original values:

\begin{matrix} m & = & \frac{σ'}{σ} \\ & = & \frac{12}{14} \\ & = & \frac{6}{7} \end{matrix}

\begin{matrix} c & = & \overline{x'} - m \overline{x} \\ & = & 70 - \frac{6}{7} (85) \\ & = & \frac{-20}{7} \\ & ≊ & -2.9 \end{matrix}
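The result can be checked numerically. The sketch below is only an illustration: the helper name `linear_rescale` and the four sample scores are made up for the check, and the scores were chosen so that their mean is 85 and their population standard deviation is 14.

```python
from fractions import Fraction
from statistics import mean, pstdev

def linear_rescale(old_mean, old_sd, new_mean, new_sd):
    """Return (m, c) such that y = m*x + c has the requested mean and sd."""
    m = Fraction(new_sd, old_sd)   # spread is changed only by the multiplier
    c = new_mean - m * old_mean    # the shift then fixes the location
    return m, c

m, c = linear_rescale(85, 14, 70, 12)
print(m, c)  # 6/7 -20/7

# Hypothetical scores with mean 85 and population sd 14.
xs = [71, 99, 71, 99]
ys = [float(m * x + c) for x in xs]
print(mean(xs), pstdev(xs))
print(mean(ys), pstdev(ys))
```

Using `Fraction` keeps m and c exact, so the transformed scores come out at 58 and 82, which have a mean of 70 and a standard deviation of 12.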

Suppose that, in the previous problem, a student had a Z-score of +2.5. What would her raw score be after the grades were scaled to a mean of 70 and a standard deviation of 12?

Given:

\begin{matrix} \overline{x} & = & 70 \\ σ & = & 12 \\ z_{i} & = & 2.5 \end{matrix}

Find:

(x_{i})

The Z-score is the score standardized to a mean of 0 and standard deviation of 1.

\begin{matrix} z_{i} & = & \frac{x_{i} - \overline{x}}{σ} \\ x_{i} & = & z_{i} σ + \overline{x} \end{matrix}

Substituting in the given values this is:

x_{i} = z_{i} σ + \overline{x} = 2.5 (12) + 70 = 100
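As a cross-check, the same answer should come from taking the student's score on the original scale (mean 85, standard deviation 14) and pushing it through the linear transformation from the first problem. The function name below is illustrative only.

```python
def raw_from_z(z, mean, sd):
    """Invert z = (x - mean) / sd."""
    return z * sd + mean

# Directly on the rescaled distribution.
scaled = raw_from_z(2.5, 70, 12)
print(scaled)  # 100.0

# Via the original scale and the transformation y = (6/7) x - 20/7.
original = raw_from_z(2.5, 85, 14)
rescaled = (6 / 7) * original - 20 / 7
print(original, round(rescaled, 6))
```

Both routes agree because a linear transformation with a positive multiplier leaves z-scores unchanged.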

Suppose two columns of scores, labeled X and Y, have a covariance σ_x,y = 10, and that X and Y both have standard deviations of 12. What is the correlation between X and Y?

Given:

\begin{matrix} σ_{x, y} & = & 10 \\ σ_{x} & = & σ_{y} & = & 12 \end{matrix}

Find:

(ρ_{x, y})

This follows directly from the definition of the correlation:

\begin{matrix} ρ_{x, y} & = & \frac{σ_{x, y}}{σ_{x} σ_{y}} \\ & = & \frac{10}{12^{2}} \\ & = & \frac{5}{72} \\ & ≊ & 0.069 \end{matrix}

A student received a grade of 78 in a course where the class average was 70, and the standard deviation 10. If the class distribution was approximately normal in shape, what was the student's approximate percentile rank?

Given:

\begin{matrix} x & = & 78 \\ \overline{x} & = & 70 \\ σ & = & 10 \end{matrix}

Find the percentile rank, $r (z)$ .

The first step is to normalize the numeric score as a z-score:

\begin{matrix} z & = & \frac{x - \overline{x}}{σ} \\ & = & \frac{78 - 70}{10} \\ & = & 0.8 \end{matrix}

Then using a z-table, it is possible to look up $r (0.8) ≊ 78.81%$ .
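The table lookup can be reproduced in software. This sketch assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` provides the normal cumulative distribution function.

```python
from statistics import NormalDist

z = (78 - 70) / 10
rank = NormalDist().cdf(z)   # area under the standard normal curve below z
print(f"z = {z:.2f}, percentile rank = {rank:.2%}")  # z = 0.80, percentile rank = 78.81%
```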

If SAT scores have a mean of 500 and a standard deviation of 100, approximately what percentage of students obtain SAT scores between 550 and 650?

Given:

\begin{matrix} \overline{x} & = & 500 \\ σ & = & 100 \\ x_{a} & = & 550 \\ x_{b} & = & 650 \end{matrix}

Find:

(\int_{x_{a}}^{x_{b}} \frac{1}{\sqrt{2 π} σ} e^{- \frac{1}{2} {(\frac{x - \overline{x}}{σ})}^{2}} \, dx)

That integral can be evaluated numerically, but doing so is not necessary to get an approximation. An alternative is to convert the scores to z-scores and use a table lookup to find the areas:

\begin{matrix} z_{a} & = & 0.5 \\ z_{b} & = & 1.5 \end{matrix}

Then using a z-table, one can find:

\begin{matrix} r (z_{a}) & = & r (0.5) & ≊ & 0.6915 \\ r (z_{b}) & = & r (1.5) & ≊ & 0.9332 \end{matrix}

The quantity between them then is:

\begin{matrix} Δ_{a, b} & = & r (z_{b}) - r (z_{a}) & ≊ & 0.9332 - 0.6915 & = & 0.2417 & ≊ & 24% \end{matrix}
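The same area can be computed without a table. A minimal sketch, again assuming `statistics.NormalDist` from Python 3.8 or later:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)
share = sat.cdf(650) - sat.cdf(550)   # area between the two scores
print(f"{share:.4f}")  # 0.2417
```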

Suppose that the class distribution in a large course is almost exactly normal in shape. Joe got an 88 and had a percentile rank of 79.8, while Felicia got a 75 and had a percentile rank of 40.1. From this information, estimate the mean and standard deviation of the class distribution.

Given:

\begin{matrix} x_{a} & = & 88 \\ r (z_{a}) & = & 79.8% \\ x_{b} & = & 75 \\ r (z_{b}) & = & 40.1% \end{matrix}

Find:

(\overline{x}, σ)

First, use the z-table to get the z-scores for the given percentile ranks:

\begin{matrix} r (0.83) & ≊ & 0.7967 & < & r (z_{a}) & = & 0.798 & < & r (0.84) & ≊ & 0.7995 \\ z_{a} & ≊ & 0.83 \\ r (-0.26) & ≊ & 0.3974 & < & r (z_{b}) & = & 0.401 & < & r (-0.25) & ≊ & 0.4013 \\ z_{b} & ≊ & -0.25 \end{matrix}

Each z-score gives an equation relating the score, the mean, and the standard deviation. That makes two equations in two unknowns, and subtracting one from the other eliminates the mean:

\begin{matrix} z_{a} - z_{b} & = & \frac{x_{a} - \overline{x}}{σ} - \frac{x_{b} - \overline{x}}{σ} & = & \frac{x_{a} - x_{b}}{σ} \end{matrix}

\begin{matrix} σ & = & \frac{x_{a} - x_{b}}{z_{a} - z_{b}} & ≊ & \frac{88 - 75}{0.83 - (-0.25)} & ≊ & 12 \end{matrix}

\begin{matrix} \overline{x} & = & x_{a} - σ z_{a} & ≊ & 88 - 12 (0.83) & ≊ & 78 \end{matrix}
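The reverse table lookup corresponds to the inverse of the normal cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` from the standard library (Python 3.8 or later) in place of interpolating in the table; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
z_joe = std_normal.inv_cdf(0.798)       # Joe: 88, percentile rank 79.8
z_felicia = std_normal.inv_cdf(0.401)   # Felicia: 75, percentile rank 40.1

# Subtracting the two z-score equations eliminates the mean.
class_sd = (88 - 75) / (z_joe - z_felicia)
class_mean = 88 - class_sd * z_joe

print(round(z_joe, 2), round(z_felicia, 2))
print(round(class_sd, 1), round(class_mean, 1))
```

Without the rounding of the z-scores to two places, the estimates still land very close to a standard deviation of 12 and a mean of 78.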

You have 9 numbers with a mean of 10 and a variance of 100. If you add a 10th number to this group, and this number is 15, what will be the mean and variance of the new list of numbers?

Given:

\begin{matrix} \overline{x} & = & 10 \\ σ_{x}^{2} & = & 100 \\ |x| & = & 9 \\ y_{n} & = & 15 \\ y & = & x ⋃ \{y_{n}\} \end{matrix}

Find:

(\overline{y}, σ_{y}^{2})

Finding the new mean is straightforward:

\begin{matrix} \overline{y} & = & \frac{\sum_{i = 1}^{|y|} y_{i}}{|y|} & = & \frac{\sum_{i = 1}^{|x|} x_{i} + y_{n}}{|x| + 1} & = & \frac{|x| \overline{x} + y_{n}}{|x| + 1} & = & \frac{(9) (10) + 15}{(9) + 1} & = & 10.5 \end{matrix}

The variance is a little trickier. First a transformation of $σ_{x}^{2}$ is needed:

\begin{matrix} σ_{x}^{2} & = & \frac{\sum {(x_{i} - \overline{x})}^{2}}{|x| - 1} \\ \sum {(x_{i} - \overline{x})}^{2} & = & σ_{x}^{2} (|x| - 1) \end{matrix}

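This equation can then be used to find the new variance. Because the mean has moved from $\overline{x}$ to $\overline{y}$ , the old deviations must be re-centered: $\sum {(x_{i} - \overline{y})}^{2} = \sum {(x_{i} - \overline{x})}^{2} + |x| {(\overline{x} - \overline{y})}^{2}$ , where the cross term vanishes because $\sum (x_{i} - \overline{x}) = 0$ .

\begin{matrix} σ_{y}^{2} & = & \frac{\sum_{i = 1}^{|y|} {(y_{i} - \overline{y})}^{2}}{|y| - 1} \\ & = & \frac{\sum_{i = 1}^{|x|} {(x_{i} - \overline{y})}^{2} + {(y_{n} - \overline{y})}^{2}}{|x|} \\ & = & \frac{σ_{x}^{2} (|x| - 1) + |x| {(\overline{x} - \overline{y})}^{2} + {(y_{n} - \overline{y})}^{2}}{|x|} \\ & = & \frac{(100) (9 - 1) + (9) {(10 - 10.5)}^{2} + {(15 - 10.5)}^{2}}{9} \\ & = & \frac{1645}{18} \\ & ≊ & 91 \end{matrix}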
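A numerical check helps here, because the sum of squares has to be taken about the new mean of 10.5 rather than the old mean of 10. The sketch below is an illustration under stated assumptions: `add_point` is a made-up helper, and the nine raw values are arbitrary numbers rescaled to have a mean of 10 and a sample variance of 100.

```python
from statistics import mean, stdev, variance

def add_point(n, old_mean, old_var, new_value):
    """Update a sample mean and (n - 1)-denominator variance with one new value."""
    new_mean = (n * old_mean + new_value) / (n + 1)
    ss_old = old_var * (n - 1)                   # squares about the old mean
    ss_shift = n * (old_mean - new_mean) ** 2    # re-center on the new mean
    ss_new = ss_old + ss_shift + (new_value - new_mean) ** 2
    return new_mean, ss_new / n                  # new size is n + 1, so divide by n

new_mean, new_var = add_point(9, 10, 100, 15)
print(new_mean, round(new_var, 2))  # 10.5 91.39

# Cross-check on hypothetical data rescaled to mean 10 and variance 100.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 9]
raw_mean, raw_sd = mean(raw), stdev(raw)
xs = [10 + 10 * (r - raw_mean) / raw_sd for r in raw]
ys = xs + [15]
print(round(mean(ys), 2), round(variance(ys), 2))
```

Both routes give a mean of 10.5 and a variance of about 91.4, which rounds to 91.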

Suppose that SAT verbal scores for students at Vanderbilt have a mean of 670 and a standard deviation of 70, and that first year GPA at Vanderbilt correlates .57 with SAT verbal score. Suppose further that first year GPA has a mean of 3.3, a standard deviation of 0.7, and that SAT and GPA have a bivariate normal distribution. Sandy got a 595 on the SAT, and a GPA of 3.62. What is the percentile rank of her GPA among people with SAT's as high as hers?

Given:

\begin{matrix} \overline{x} & = & 670 \\ σ_{x} & = & 70 \\ ρ_{x, y} & = & 0.57 \\ \overline{y} & = & 3.3 \\ σ_{y} & = & 0.7 \\ x_{i} & = & 595 \\ y_{i} & = & 3.62 \end{matrix}

Find:

(r ({y_{i} |}_{x_{i}}))

Finding the answer to this question requires finding the conditional mean and deviation. Those equations are:

\begin{matrix} μ_{{y |}_{x = a}} & = & m a + b \\ m & = & ρ_{x, y} (\frac{σ_{y}}{σ_{x}}) \\ b & = & \overline{y} - m \overline{x} \\ σ_{{y |}_{x = a}}^{2} & = & (1 - ρ_{x, y}^{2}) σ_{y}^{2} \end{matrix}

These equations reduce significantly when dealing with z-scores where $\overline{x} = \overline{y} = 0$ and $σ_{x} = σ_{y} = 1$ .

\begin{matrix} μ_{{z_{y} |}_{z_{x} = a}} & = & ρ_{x, y} z_{x} \\ σ_{{z_{y} |}_{z_{x} = a}} & = & \sqrt{1 - ρ_{x, y}^{2}} \end{matrix}

So the question becomes, what is the rank of $z_{y}$ given $μ_{{z_{y} |}_{z_{x} = a}}$ and $σ_{{z_{y} |}_{z_{x} = a}}$ ? This requires one more normalization to a standard mean and then it's a simple table lookup.

\begin{matrix} z_{x_{i}} & = & \frac{x_{i} - \overline{x}}{σ_{x}} & = & - \frac{15}{14} & ≊ & -1.07 \\ μ_{{z_{y} |}_{z_{x}}} & = & ρ_{x, y} z_{x} & ≊ & .57 (-1.07) & ≊ & -0.610 \\ σ_{{z_{y} |}_{z_{x}}} & = & \sqrt{1 - ρ_{x, y}^{2}} & = & \sqrt{1 - {.57}^{2}} & ≊ & 0.822 \\ z_{y_{i}} & = & \frac{y_{i} - \overline{y}}{σ_{y}} & = & \frac{16}{35} & ≊ & 0.457 \\ {z_{y} |}_{z_{x}} & = & \frac{z_{y} - μ_{{z_{y} |}_{z_{x}}}}{σ_{{z_{y} |}_{z_{x}}}} & ≊ & \frac{0.457 - (-0.610)}{0.822} & ≊ & 1.30 \\ r ({y_{i} |}_{x_{i}}) & = & r ({z_{y} |}_{z_{x}}) & ≊ & r (1.30) & ≊ & 90.3% \end{matrix}
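The chain of standardizations can be checked in a few lines. This is a sketch assuming `statistics.NormalDist` (Python 3.8 or later); it reads the question as asking for Sandy's rank among students with the same SAT score as hers, which is what the conditional distribution describes.

```python
from math import sqrt
from statistics import NormalDist

rho = 0.57
z_sat = (595 - 670) / 70     # Sandy's SAT as a z-score
z_gpa = (3.62 - 3.3) / 0.7   # Sandy's GPA as a z-score

# Conditional distribution of the GPA z-score given the SAT z-score.
conditional = NormalDist(mu=rho * z_sat, sigma=sqrt(1 - rho ** 2))
rank = conditional.cdf(z_gpa)
print(f"{rank:.1%}")  # 90.3%
```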

If you sample 100 students from a population with a mean of 120 and a standard deviation of 15 on measure X, what would you expect the highest score in your sample to be?

The method of figuring this out is beyond the scope of this class. To derive an answer, one simply uses table 5.1 on pg. 74 of Glass and Hopkins. It gives a value of 5 for the expected range with a sample size of 100. This 5 is for a deviation of 1, so the range where σ = 15 is (5)(15) = 75. This range extends on either side of the mean, so the maximum is:

\begin{matrix} μ + \frac{σ R_{100}}{2} & = & 120 + \frac{(15) (5)}{2} & = & 157.5 \end{matrix}
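The tabled value can be sanity-checked numerically. The sketch below is not the method behind the Glass and Hopkins table; it assumes the scores are normally distributed and integrates the density of the largest of n independent draws, n·φ(z)·Φ(z)^(n-1), with a simple trapezoid rule. The function name, integration limits, and step count are arbitrary choices for the illustration.

```python
from statistics import NormalDist

def expected_max_z(n, lo=-8.0, hi=8.0, steps=16000):
    """Approximate E[max of n standard normal draws] with the trapezoid rule."""
    std = NormalDist()
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # half weight at the two ends
        total += weight * z * n * std.pdf(z) * std.cdf(z) ** (n - 1)
    return total * h

z_max = expected_max_z(100)
print(round(z_max, 2), round(120 + 15 * z_max, 1))  # 2.51 157.6
```

The expected maximum sits about 2.5 standard deviations above the mean, in line with half of the tabled range of 5.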

The solution is not what I did initially, which finds the largest possible score rather than the expected one. Given:

\begin{matrix} \overline{x} & = & 120 \\ σ & = & 15 \\ |x| & = & 100 \end{matrix}

Find:

(max (x_{i}))

Using the equation from #13 for the maximum z-score for a given sample size, we find:

\begin{matrix} max (z_{i}) & = & \frac{n - 1}{\sqrt{n}} & = & \frac{99}{\sqrt{100}} & = & 9.9 \end{matrix}

Given that, it is straightforward to find the score:

\begin{matrix} x & = & σ z + \overline{x} & = & 15 (9.9) + 120 & = & 268.5 \end{matrix}

If X and Y have standard deviations of 10 and 5, respectively, what is the largest possible value for the covariance between X and Y?

Given:

\begin{matrix} σ_{x} & = & 10 \\ σ_{y} & = & 5 \end{matrix}

Find:

(max (σ_{x, y}))

For fixed $σ_{x}$ and $σ_{y}$ , $σ_{x, y}$ increases as $ρ_{x, y}$ increases, and $-1 ≤ ρ_{x, y} ≤ 1$ , so:

\begin{matrix} max (ρ_{x, y}) & = & \frac{max (σ_{x, y})}{σ_{x} σ_{y}} \\ max (σ_{x, y}) & = & σ_{x} σ_{y} max (ρ_{x, y}) \\ & = & (1) (10) (5) \\ & = & 50 \end{matrix}
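The bound is reached when Y is an exact increasing linear function of X. The sketch below uses made-up data for the illustration: three X values with a sample standard deviation of 10, and Y = X/2 + 3, which has a standard deviation of 5.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance with an (n - 1) denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

xs = [0, 10, 20]               # hypothetical scores, sd 10
ys = [x / 2 + 3 for x in xs]   # perfectly correlated with xs, sd 5

print(stdev(xs), stdev(ys), covariance(xs, ys))  # 10.0 5.0 50.0
```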

Suppose you have 5 scores: 1, 5, 8, 9, 11. Evaluate the following expressions:

Given:

\begin{matrix} x & = & \{1, 5, 8, 9, 11\} \end{matrix}

Find:

$\begin{matrix} \sum_{i = 2}^{5} \frac{x_{i}}{x_{i - 1}} & = & \frac{5}{1} + \frac{8}{5} + \frac{9}{8} + \frac{11}{9} & = & \frac{3221}{360} \end{matrix}$
$\begin{matrix} \sum_{i = 1}^{5} \sqrt{x_{i}} & = & \sqrt{1} + \sqrt{5} + \sqrt{8} + \sqrt{9} + \sqrt{11} & ≊ & 12.38 \end{matrix}$
$\begin{matrix} {(\sum_{i = 1}^{5} x_{i})}^{2} & = & {(1+ 5+ 8+ 9+ 11)}^{2} & = & 1156 \end{matrix}$
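The three sums are easy to confirm in code. Note that Python lists are indexed from 0, so the sum from i = 2 to 5 becomes a loop over indices 1 to 4; `Fraction` is used to keep the first result exact.

```python
from fractions import Fraction
from math import sqrt

x = [1, 5, 8, 9, 11]

# Sum of each score divided by the one before it.
ratio_sum = sum(Fraction(x[i], x[i - 1]) for i in range(1, len(x)))
# Sum of the square roots.
root_sum = sum(sqrt(v) for v in x)
# Square of the sum (not the sum of the squares).
square_of_sum = sum(x) ** 2

print(ratio_sum, round(root_sum, 2), square_of_sum)  # 3221/360 12.38 1156
```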

The correlation between height and weight is .65. Describe, in your own words, what would happen to this correlation coefficient if you changed all heights from inches to centimeters, and why.

It would not change. The correlation is computed from scores that have been standardized to a common mean and standard deviation. Converting inches to centimeters is a linear transformation with a positive multiplier, so it cancels out when the heights are standardized.
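A short demonstration of the invariance follows. The heights and weights are invented for the example, and `pearson` is a plain implementation of the usual correlation formula rather than a library call.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

heights_in = [62, 65, 68, 70, 73]        # hypothetical heights in inches
weights_lb = [120, 150, 145, 180, 175]   # hypothetical weights in pounds
heights_cm = [2.54 * h for h in heights_in]

r_inches = pearson(heights_in, weights_lb)
r_cm = pearson(heights_cm, weights_lb)
print(abs(r_inches - r_cm) < 1e-12)  # True
```

The factor of 2.54 multiplies the numerator and the denominator equally, so it drops out of the ratio.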

If you are in a class of 127 students, what is the highest possible Z-score you can get?

Given:

\begin{matrix} |x| & = & 127 \end{matrix}

Find:

(lim_{σ \to ∞} z)

This derivation relies on a transformation of the variance when dealing with z-scores: (Recall that $σ_{z} = 1$ and $μ_{z} = 0$ .)

\begin{matrix} σ_{z}^{2} & = & \frac{\sum {(z_{i} - \overline{z})}^{2}}{n - 1} \\ 1 & = & \frac{\sum {(z_{i} - 0)}^{2}}{n - 1} \\ \sum {z_{i}}^{2} & = & n - 1 \end{matrix}

Assume then that to take $σ \to ∞$ for a set $x$ , you take the element $x_{n} \to ∞$ . As you do this the z-scores for the elements $x_{1}$ to $x_{n - 1}$ will approach some uniform value, $- k$ . Since the mean for the set of z-scores is by definition 0, the z-score for $x_{n}$ has to be $(n - 1) k$ . From the previous equation then:

\begin{matrix} n - 1 & = & \sum {z_{i}}^{2} \\ & = & \sum_{i = 1}^{n - 1} {z_{i}}^{2} + {z_{n}}^{2} \\ & = & \sum_{i = 1}^{n - 1} {(- k)}^{2} + {((n - 1) k)}^{2} \\ & = & (n - 1) k^{2} + {(n - 1)}^{2} k^{2} & = & n - 1 \\ k & = & \sqrt{\frac{n - 1}{(n - 1) + {(n - 1)}^{2}}} \\ & = & \frac{1}{\sqrt{n}} \end{matrix}

So, substituting back into the original equation:

\begin{matrix} lim_{σ \to ∞} z & = & (n - 1) k & = & \frac{n - 1}{\sqrt{n}} \end{matrix}

The original values can then be substituted in to get:

\begin{matrix} lim_{σ \to ∞} z & = & \frac{126}{\sqrt{127}} & ≊ & 11.18 \end{matrix}
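The bound can be illustrated with a concrete class. In the sketch below, which uses made-up scores, 126 students tie at 0 and one student scores 1. With the sample standard deviation (n - 1 denominator), the top student's z-score matches the formula.

```python
from math import sqrt
from statistics import mean, stdev

n = 127
scores = [0] * (n - 1) + [1]   # hypothetical class: everyone ties except one student

top_z = (scores[-1] - mean(scores)) / stdev(scores)
print(round(top_z, 2), round((n - 1) / sqrt(n), 2))  # 11.18 11.18
```

The size of the gap between the top score and the rest does not matter here, since z-scores are unchanged by rescaling.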

PSY 310: Statistical Inference

Will Holcomb

Homework Problems #1

Due: Fri., 28 September 2007