1. Suppose you have a set of X numbers with a mean of 85 and a standard deviation of 14. What linear transformation of the form Y = aX + b (i.e., what values of a and b) will transform these numbers into a new set of Y values so that they have a mean of 70 and a standard deviation of 12?

    Given:

    \[ \bar{x} = 85, \quad \sigma = 14, \quad \bar{x}' = 70, \quad \sigma' = 12, \quad x'_i = m x_i + c \]

    Find (here $m$ and $c$ play the roles of $a$ and $b$ in the problem statement):

    \[ m, \quad c \]

    The three pertinent characteristics of a set of numbers are location, spread, and shape. Each is affected only by certain linear transformations:

    • Location: the position of the set on the number line; affected by both additive and multiplicative operations
    • Spread: the amount of the number line covered by the set; affected only by multiplicative operations
    • Shape: the relative distances between elements of the set; unaffected by linear transformations

    Standard deviation is a measure of spread, so:

    \[ \sigma' = m\sigma \quad\Longrightarrow\quad m = \frac{\sigma'}{\sigma} \]

    Mean is a measure of location, so:

    \[ \bar{x}' = m\bar{x} + c \quad\Longrightarrow\quad c = \bar{x}' - m\bar{x} \]

    Substituting in from the original values:

    \[ m = \frac{\sigma'}{\sigma} = \frac{12}{14} = \frac{6}{7}, \qquad c = \bar{x}' - m\bar{x} = 70 - \frac{6}{7}(85) = -\frac{20}{7} \approx -2.9 \]
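
The two relations above (spread fixes the multiplier, location then fixes the offset) can be checked with a short sketch. The function name `rescale_coefficients` is an illustrative assumption, not part of the original solution, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def rescale_coefficients(mean, sd, new_mean, new_sd):
    """Return (m, c) such that x' = m*x + c has mean new_mean and SD new_sd."""
    m = Fraction(new_sd, sd)   # spread is changed only by the multiplier
    c = new_mean - m * mean    # location is then fixed by the additive constant
    return m, c


m, c = rescale_coefficients(mean=85, sd=14, new_mean=70, new_sd=12)
print(m, c)  # exact fractions for the multiplier and the offset
```
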
  2. Suppose that, in the previous problem, a student had a Z-score of +2.5. What would her raw score be after the grades were scaled to a mean of 70 and a standard deviation of 12?

    Given:

    \[ \bar{x} = 70, \quad \sigma = 12, \quad z_i = 2.5 \]

    Find:

    \[ x_i \]

    The Z-score is the score standardized to a mean of 0 and standard deviation of 1.

    \[ z_i = \frac{x_i - \bar{x}}{\sigma} \quad\Longrightarrow\quad x_i = z_i\sigma + \bar{x} \]

    Substituting in the given values this is:

    \[ x_i = z_i\sigma + \bar{x} = (2.5)(12) + 70 = 100 \]
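
As a quick check of the inversion above, the following sketch converts a z-score back to a raw score. The helper name `raw_score` is an assumption made for illustration.

```python
def raw_score(z, mean, sd):
    """Invert z = (x - mean) / sd to recover the raw score x."""
    return z * sd + mean


scaled = raw_score(z=2.5, mean=70, sd=12)
print(scaled)
```
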
  3. Suppose two columns of scores, labeled X and Y, have a covariance $\sigma_{xy} = 10$, and that X and Y both have standard deviations of 12. What is the correlation between X and Y?

    Given:

    \[ \sigma_{xy} = 10, \quad \sigma_x = \sigma_y = 12 \]

    Find:

    \[ \rho_{xy} \]

    This solution follows pretty simply from the definition:

    \[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{10}{12^2} = \frac{5}{72} \approx 0.069 \]
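
The definition can be evaluated directly. This is a minimal sketch; the function name `correlation_from_covariance` is an assumption for illustration.

```python
from fractions import Fraction


def correlation_from_covariance(cov_xy, sd_x, sd_y):
    """Correlation is the covariance divided by the product of the standard deviations."""
    return Fraction(cov_xy) / (Fraction(sd_x) * Fraction(sd_y))


rho = correlation_from_covariance(10, 12, 12)
print(rho, round(float(rho), 3))
```
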
  4. A student received a grade of 78 in a course where the class average was 70, and the standard deviation 10. If the class distribution was approximately normal in shape, what was the student's approximate percentile rank?

    Given:

    \[ x = 78, \quad \bar{x} = 70, \quad \sigma = 10 \]

    Find the percentile rank, $r_z$.

    The first step is to normalize the numeric score as a z-score:

    \[ z = \frac{x - \bar{x}}{\sigma} = \frac{78 - 70}{10} = 0.8 \]

    Then, using a z-table, it is possible to look up $r_{0.8} \approx 78.81\%$.
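
The table lookup can be reproduced with Python's standard library, whose `statistics.NormalDist` class provides the normal cumulative distribution function. This is a sketch that assumes an exactly normal class distribution; `percentile_rank` is an illustrative name.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Proportion of a normal(mean, sd) distribution falling at or below score."""
    return NormalDist(mu=mean, sigma=sd).cdf(score)


rank = percentile_rank(78, mean=70, sd=10)
print(f"{rank:.2%}")
```
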

  5. If SAT scores have a mean of 500 and a standard deviation of 100, approximately what percentage of students obtain SAT scores between 550 and 650?

    Given:

    \[ \bar{x} = 500, \quad \sigma = 100, \quad x_a = 550, \quad x_b = 650 \]

    Find:

    \[ \int_{x_a}^{x_b} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \bar{x}}{\sigma}\right)^2} \, dx \]

    It is possible to solve that integral, but not necessary to get an approximation. An alternative is to convert the scores to z-scores and use a table lookup to find the areas:

    \[ z_a = 0.5, \quad z_b = 1.5 \]

    Then using a z-table, one can find:

    \[ r_{z_a} = r_{0.5} \approx 0.6915, \quad r_{z_b} = r_{1.5} \approx 0.9332 \]

    The proportion of scores between them is then:

    \[ \Delta_{ab} = r_{z_b} - r_{z_a} \approx 0.9332 - 0.6915 = 0.2417 \approx 24\% \]
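
The same area can be obtained without a table by differencing the normal CDF at the two endpoints. The sketch below assumes exactly normal SAT scores; `proportion_between` is an illustrative name.

```python
from statistics import NormalDist


def proportion_between(low, high, mean, sd):
    """Area under a normal(mean, sd) curve between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


share = proportion_between(550, 650, mean=500, sd=100)
print(f"{share:.1%}")
```
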
  6. Suppose that the class distribution in a large course is almost exactly normal in shape. Joe got an 88 and had a percentile rank of 79.8, while Felicia got a 75 and had a percentile rank of 40.1. From this information, estimate the mean and standard deviation of the class distribution.

    Given:

    \[ x_a = 88, \quad r_{z_a} = 79.8\%, \quad x_b = 75, \quad r_{z_b} = 40.1\% \]

    Find:

    \[ \bar{x}, \quad \sigma \]

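    First, use the z-table to get the z-scores for the given percentile ranks:

    \[ r_{0.83} \approx 0.7967 < r_{z_a} = 0.798 < r_{0.84} \approx 0.7995 \quad\Longrightarrow\quad z_a \approx 0.83 \]
    \[ r_{-0.26} \approx 0.3974 < r_{z_b} = 0.401 < r_{-0.25} \approx 0.4013 \quad\Longrightarrow\quad z_b \approx -0.25 \]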

    Each z-score gives an equation relating the raw score, the mean, and the standard deviation. That makes two equations in two unknowns; subtracting one from the other eliminates the mean:

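    \[ z_a - z_b = \frac{x_a - \bar{x}}{\sigma} - \frac{x_b - \bar{x}}{\sigma} = \frac{x_a - x_b}{\sigma} \]
    \[ \sigma = \frac{x_a - x_b}{z_a - z_b} \approx \frac{88 - 75}{0.83 - (-0.25)} \approx 12 \]
    \[ \bar{x} = -(\sigma z_a - x_a) \approx -\big((12)(0.83) - 88\big) \approx 78 \]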
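
The two-equation solve can be sketched in code, replacing the table lookup with the inverse normal CDF (`NormalDist.inv_cdf`). Because the inverse CDF is not rounded to two decimals the way table entries are, the results differ slightly from the hand calculation. The function name is an assumption for illustration.

```python
from statistics import NormalDist


def mean_sd_from_two_ranks(x_a, rank_a, x_b, rank_b):
    """Recover the mean and SD of a normal distribution from two (score, percentile rank) pairs."""
    std_normal = NormalDist()
    z_a = std_normal.inv_cdf(rank_a)   # z-score matching the first percentile rank
    z_b = std_normal.inv_cdf(rank_b)   # z-score matching the second percentile rank
    sd = (x_a - x_b) / (z_a - z_b)     # subtracting the two z equations eliminates the mean
    mean = x_a - sd * z_a              # back-substitute into the first equation
    return mean, sd


mean, sd = mean_sd_from_two_ranks(88, 0.798, 75, 0.401)
print(round(mean, 1), round(sd, 1))
```
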
  7. You have 9 numbers with a mean of 10 and a variance of 100. If you add a 10th number to this group, and this number is 15, what will be the mean and variance of the new list of numbers?

    Given:

    \[ \bar{x} = 10, \quad \sigma_x^2 = 100, \quad |x| = 9, \quad y_n = 15, \quad y = x \cup \{y_n\} \]

    Find:

    \[ \bar{y}, \quad \sigma_y^2 \]

    Finding the new mean is straightforward:

    \[ \bar{y} = \frac{\sum_{i=1}^{|y|} y_i}{|y|} = \frac{\sum_{i=1}^{|x|} x_i + y_n}{|x| + 1} = \frac{|x|\,\bar{x} + y_n}{|x| + 1} = \frac{(9)(10) + 15}{9 + 1} = 10.5 \]

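    The variance is a little trickier. First, the definition of $\sigma_x^2$ is rearranged to give the sum of squared deviations:

    \[ \sigma_x^2 = \frac{\sum (x_i - \bar{x})^2}{|x| - 1} \quad\Longrightarrow\quad \sum (x_i - \bar{x})^2 = \sigma_x^2 (|x| - 1) \]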

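    This equation can then be used to find the new variance. Because the deviations of the original scores are now taken about $\bar{y}$ rather than $\bar{x}$, their sum of squares picks up an extra term: $\sum (x_i - \bar{y})^2 = \sum (x_i - \bar{x})^2 + |x|(\bar{x} - \bar{y})^2$.

    \[ \sigma_y^2 = \frac{\sum_{i=1}^{|y|} (y_i - \bar{y})^2}{|y| - 1} = \frac{\sum_{i=1}^{|x|} (x_i - \bar{y})^2 + (y_n - \bar{y})^2}{|x|} = \frac{\sigma_x^2 (|x| - 1) + |x|(\bar{x} - \bar{y})^2 + (y_n - \bar{y})^2}{|x|} \]
    \[ = \frac{100(9 - 1) + 9(10 - 10.5)^2 + (15 - 10.5)^2}{9} = \frac{822.5}{9} \approx 91.4 \]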
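
One way to sanity-check an update like this is to compare it against a brute-force calculation. The sketch below applies the update using the sample variance (divisor n - 1, as in the solution) and includes the term that arises from re-centering the old scores on the new mean. The nine data points are hypothetical, constructed only so that they have mean 10 and variance 100; the problem does not give the actual numbers.

```python
import math
import statistics


def add_one_value(n, mean, variance, new_value):
    """Update the mean and sample variance (n - 1 divisor) after appending one value."""
    new_mean = (n * mean + new_value) / (n + 1)
    # Old sum of squares about the old mean, re-centered on the new mean.
    sum_sq = (n - 1) * variance + n * (mean - new_mean) ** 2
    # Contribution of the appended value itself.
    sum_sq += (new_value - new_mean) ** 2
    return new_mean, sum_sq / n


new_mean, new_variance = add_one_value(n=9, mean=10, variance=100, new_value=15)

# Hypothetical data: 0..8 rescaled to have mean 10 and sample variance 100.
base = list(range(9))
scale = math.sqrt(100 / statistics.variance(base))
data = [10 + scale * (b - statistics.mean(base)) for b in base]
check_variance = statistics.variance(data + [15])

print(new_mean, round(new_variance, 2), round(check_variance, 2))
```
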
  8. Suppose that SAT verbal scores for students at Vanderbilt have a mean of 670 and a standard deviation of 70, and that first year GPA at Vanderbilt correlates .57 with SAT verbal score. Suppose further that first year GPA has a mean of 3.3, a standard deviation of 0.7, and that SAT and GPA have a bivariate normal distribution. Sandy got a 595 on the SAT, and a GPA of 3.62. What is the percentile rank of her GPA among people with SAT's as high as hers?

    Given:

    \[ \bar{x} = 670, \quad \sigma_x = 70, \quad \rho_{xy} = 0.57, \quad \bar{y} = 3.3, \quad \sigma_y = 0.7, \quad x_i = 595, \quad y_i = 3.62 \]

    Find:

    \[ r_{y_i \mid x_i} \]

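    Answering this question requires the conditional mean and standard deviation of $y$ given $x$. Those equations are:

    \[ \mu_{y \mid x = a} = ma + b, \quad m = \rho_{xy}\frac{\sigma_y}{\sigma_x}, \quad b = \bar{y} - m\bar{x}, \quad \sigma_{y \mid x = a}^2 = (1 - \rho_{xy}^2)\,\sigma_y^2 \]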

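    These equations reduce significantly when dealing with z-scores, where $\bar{x} = \bar{y} = 0$ and $\sigma_x = \sigma_y = 1$.

    \[ \mu_{z_y \mid z_x = a} = \rho_{xy} z_x, \quad \sigma_{z_y \mid z_x = a} = \sqrt{1 - \rho_{xy}^2} \]

    So the question becomes: what is the rank of $z_y$ given $\mu_{z_y \mid z_x = a}$ and $\sigma_{z_y \mid z_x = a}$? This requires one more standardization, to a mean of 0 and a standard deviation of 1, and then it's a simple table lookup.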

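    \[ z_{x_i} = \frac{x_i - \bar{x}}{\sigma_x} = \frac{595 - 670}{70} = -\frac{15}{14} \approx -1.07 \]
    \[ \mu_{z_y \mid z_x} = \rho_{xy} z_{x_i} \approx (0.57)(-1.07) \approx -0.610 \]
    \[ \sigma_{z_y \mid z_x} = \sqrt{1 - \rho_{xy}^2} = \sqrt{1 - 0.57^2} \approx 0.822 \]
    \[ z_{y_i} = \frac{y_i - \bar{y}}{\sigma_y} = \frac{3.62 - 3.3}{0.7} = \frac{16}{35} \approx 0.457 \]
    \[ z_{y \mid x} = \frac{z_{y_i} - \mu_{z_y \mid z_x}}{\sigma_{z_y \mid z_x}} \approx \frac{0.457 - (-0.610)}{0.822} \approx 1.30 \]
    \[ r_{y_i \mid x_i} = r_{z_{y \mid x}} \approx r_{1.30} \approx 90.3\% \]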
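
The chain of standardizations can be collected into one function. This is a sketch under the problem's bivariate-normal assumption; the function and parameter names are illustrative, and the normal CDF from `statistics.NormalDist` stands in for the table lookup.

```python
import math
from statistics import NormalDist


def conditional_percentile(x, y, mean_x, sd_x, mean_y, sd_y, rho):
    """Percentile rank of y among cases with the same x, assuming bivariate normality."""
    z_x = (x - mean_x) / sd_x
    z_y = (y - mean_y) / sd_y
    cond_mean = rho * z_x                 # conditional mean of z_y given z_x
    cond_sd = math.sqrt(1 - rho ** 2)     # conditional SD of z_y given z_x
    return NormalDist().cdf((z_y - cond_mean) / cond_sd)


rank = conditional_percentile(595, 3.62, mean_x=670, sd_x=70,
                              mean_y=3.3, sd_y=0.7, rho=0.57)
print(f"{rank:.1%}")
```
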
  9. If you sample 100 students from a population with a mean of 120 and a standard deviation of 15 on measure X, what would you expect the highest score in your sample to be?

    The method of figuring this out is beyond the scope of this class. To derive an answer, one simply uses table 5.1 on pg. 74 of Glass and Hopkins. It gives a value of 5 for the expected range with a sample size of 100. This 5 is for a deviation of 1, so the range where σ = 15 is (5)(15) = 75. This range extends on either side of the mean, so the maximum is:

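    \[ \mu + \frac{\sigma R_{100}}{2} = 120 + \frac{(15)(5)}{2} = 157.5 \]

    My initial approach, shown below, was not the right one: it gives the largest score that is possible in a sample of 100 rather than the expected highest score. Given: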

    \[ \bar{x} = 120, \quad \sigma = 15, \quad |x| = 100 \]

    Find:

    \[ \max x_i \]

    Using the equation from #13 for the maximum z-score for a given sample size, we find:

    \[ \max z_i = \frac{n - 1}{\sqrt{n}} = \frac{99}{\sqrt{100}} = 9.9 \]

    Given that, it is straightforward to find the score:

    \[ x = \sigma z + \bar{x} = (15)(9.9) + 120 = 268.5 \]
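
The table-based value of 157.5 can be cross-checked numerically. The sketch below is not the method from Glass and Hopkins; it swaps in a direct numerical integration of the density of the largest of n independent standard normal draws, n·φ(z)·Φ(z)^(n−1), and assumes the population is exactly normal. The function name, integration limits, and step count are illustrative choices.

```python
from statistics import NormalDist


def expected_max_z(n, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule estimate of the expected largest of n standard normal draws."""
    std = NormalDist()
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * width
        # Density of the maximum of n draws: n * pdf(z) * cdf(z)^(n - 1)
        total += z * n * std.pdf(z) * std.cdf(z) ** (n - 1) * width
    return total


expected_max = 120 + 15 * expected_max_z(100)
print(round(expected_max, 1))
```

The result should land close to the table-based answer, since the table's range of 5 is a rounded value.
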
  10. If X and Y have standard deviations of 10 and 5, respectively, what is the largest possible value for the covariance between X and Y?

    Given:

    \[ \sigma_x = 10, \quad \sigma_y = 5 \]

    Find:

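    \[ \max \sigma_{xy} \]

    For fixed standard deviations, $\sigma_{xy}$ increases as $\rho_{xy}$ increases, and $-1 \le \rho_{xy} \le 1$, so:

    \[ \max \rho_{xy} = \frac{\max \sigma_{xy}}{\sigma_x \sigma_y} \quad\Longrightarrow\quad \max \sigma_{xy} = \sigma_x \sigma_y \max \rho_{xy} = (1)(10)(5) = 50 \]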
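
The bound is reached when the two variables are perfectly positively correlated. The following sketch illustrates this with a tiny hypothetical data set (two points each, chosen only so that the population standard deviations are 10 and 5); the helper names are assumptions.

```python
import math


def pop_sd(values):
    """Population standard deviation (divisor n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def pop_cov(xs, ys):
    """Population covariance (divisor n)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


xs = [-10.0, 10.0]            # hypothetical X values with SD 10
ys = [0.5 * x for x in xs]    # Y is an exact positive multiple of X, so rho = 1 and SD is 5
cov = pop_cov(xs, ys)
print(pop_sd(xs), pop_sd(ys), cov)
```
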
  11. Suppose you have 5 scores: 1,5,8,9,11. Evaluate the following expressions

    Given:

    \[ x = \{1, 5, 8, 9, 11\} \]

    Find:

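    1. $\sum_{i=2}^{5} \frac{x_i}{x_{i-1}} = \frac{5}{1} + \frac{8}{5} + \frac{9}{8} + \frac{11}{9} = \frac{3221}{360}$
    2. $\sum_{i=1}^{5} \sqrt{x_i} = \sqrt{1} + \sqrt{5} + \sqrt{8} + \sqrt{9} + \sqrt{11} \approx 12.38$
    3. $\left(\sum_{i=1}^{5} x_i\right)^2 = (1 + 5 + 8 + 9 + 11)^2 = 34^2 = 1156$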
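
The three sums can be evaluated mechanically. In this sketch the list is 0-indexed, so the problem's index i = 2..5 corresponds to Python positions 1..4; the variable names are illustrative.

```python
import math
from fractions import Fraction

x = [1, 5, 8, 9, 11]

# Sum of each score divided by the one before it (problem indices i = 2..5).
ratio_sum = sum(Fraction(x[i], x[i - 1]) for i in range(1, len(x)))

# Sum of the square roots of the scores.
root_sum = sum(math.sqrt(v) for v in x)

# Square of the sum of the scores.
square_of_sum = sum(x) ** 2

print(ratio_sum, round(root_sum, 2), square_of_sum)
```
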
  12. The correlation between height and weight is .65. Describe, in your own words, what would happen to this correlation coefficient if you changed all heights from inches to centimeters, and why.

    It would not change. The correlation is computed from scores that have been standardized to a common mean and standard deviation. Converting inches to centimeters multiplies every height by the same positive constant (2.54), a linear transformation whose effect cancels out when the heights are standardized.
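
The invariance can be demonstrated on any data set. The heights and weights below are made-up values used purely for illustration, and `pearson_r` is a hand-written helper rather than a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


heights_in = [62, 65, 68, 70, 73]          # hypothetical heights in inches
weights_lb = [120, 150, 145, 170, 180]     # hypothetical weights in pounds
heights_cm = [2.54 * h for h in heights_in]

r_in = pearson_r(heights_in, weights_lb)
r_cm = pearson_r(heights_cm, weights_lb)
print(r_in, r_cm)  # the two values agree up to floating-point rounding
```
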

  13. If you are in a class of 127 students, what is the highest possible Z-score you can get?

    Given:

    \[ |x| = n = 127 \]

    Find:

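    \[ \lim_{\sigma \to \infty} z \]

    This derivation relies on a rearrangement of the variance of a set of z-scores. (Recall that $\sigma_z = 1$ and $\mu_z = 0$.)

    \[ \sigma_z^2 = \frac{\sum (z_i - \bar{z})^2}{n - 1} \quad\Longrightarrow\quad 1 = \frac{\sum (z_i - 0)^2}{n - 1} \quad\Longrightarrow\quad \sum_{i=1}^{n} z_i^2 = n - 1 \]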

    Assume then that, to take $\sigma \to \infty$ for a set $x$, you take the element $x_n \to \infty$. As you do this, the z-scores of the elements $x_1$ to $x_{n-1}$ approach some uniform value, $-k$. Since the mean of a set of z-scores is by definition 0, the z-score of $x_n$ has to be $(n-1)k$. From the previous equation, then:

    \[ n - 1 = \sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n-1} z_i^2 + z_n^2 = \sum_{i=1}^{n-1} (-k)^2 + \big((n-1)k\big)^2 = (n-1)k^2 + (n-1)^2 k^2 \]
    \[ k = \sqrt{\frac{n-1}{(n-1) + (n-1)^2}} = \frac{1}{\sqrt{n}} \]

    So, substituting back into the original equation:

    \[ \lim_{\sigma \to \infty} z = (n-1)k = \frac{n-1}{\sqrt{n}} \]

    The original values can then be substituted in to get:

    \[ \lim_{\sigma \to \infty} z = \frac{126}{\sqrt{127}} \approx 11.18 \]
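
The bound can be illustrated with an extreme hypothetical class in which 126 students tie and one student scores far above them. The scores are invented for the demonstration, and the sample standard deviation (divisor n - 1) is used, matching the derivation above.

```python
import math
import statistics


def max_z_bound(n):
    """Largest z-score attainable in a sample of size n, using the n - 1 divisor."""
    return (n - 1) / math.sqrt(n)


n = 127
scores = [0.0] * (n - 1) + [1000.0]   # hypothetical: everyone ties except one outlier
z_top = (scores[-1] - statistics.mean(scores)) / statistics.stdev(scores)
print(round(z_top, 2), round(max_z_bound(n), 2))
```
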