For all problems, unless otherwise indicated, assume a Type 1 error rate = .05.

  1. Consider an experiment that you anticipate conducting or that you think somebody should conduct — an experiment that strikes you as a good idea in your primary area of research or an area of research that you are familiar with. Try to come up with an experiment in which at least one factor is experimentally manipulated (i.e., under the experimenter’s control). Now consider this experimentally manipulated factor or factors.

    One of my research interests is recommender systems, which attempt to make predictions about human preferences. Experiments in this area tend to involve only random factors, because the research interest is usually in identifying general trends across types of people; however, it is possible to imagine experiments that involve fixed factors.

    Imagine that I'm a marketer producing a website that samples tracks from an artist's unreleased CD to build hype for the release. I believe that particular tracks will be more appealing at different times of day. (Perhaps people will prefer a melodic track in the morning and a more energetic one in the evening.) I decide to test this hypothesis by randomly serving users different tracks at different times of day and recording how long they listen as a measure of their interest.

    My null hypotheses will be:

    • Main Effect for Time — The time of day does not affect how much users like tracks from this CD.
    • Main Effect for Track — Users like all tracks from this CD equally.
    • Interaction Effect for Time and Track — No particular track is more or less interesting at a particular time of day.
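
    As a sketch of how such an experiment might be instrumented (all names, track IDs, and values here are hypothetical, not part of any real system), the site could randomly assign each first-time visitor a track, record the time-of-day period, and log listening time:

```python
import random
from dataclasses import dataclass, field

TRACKS = ["track_a", "track_b", "track_c"]     # hypothetical track IDs
PERIODS = ["morning", "afternoon", "evening"]  # levels of the time-of-day factor

@dataclass
class Observation:
    user_id: str
    track: str
    period: str
    listen_seconds: float

@dataclass
class ExperimentLog:
    observations: list = field(default_factory=list)

    def assign_track(self) -> str:
        # Random assignment balances unobserved user differences across cells.
        return random.choice(TRACKS)

    def record(self, user_id: str, period: str, listen_seconds: float) -> Observation:
        obs = Observation(user_id, self.assign_track(), period, listen_seconds)
        self.observations.append(obs)
        return obs

log = ExperimentLog()
obs = log.record("user_001", "evening", 142.5)
```

    Each logged observation supplies one cell entry for the track-by-time analysis.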
    1. Should these factors optimally be manipulated on a between-subjects or within-subjects basis? Discuss the reasons for your choice.

      These factors should be manipulated on a between-subjects basis. Steps should be taken to ensure that the users being considered are all visiting the site for the first time. It is important experimentally that the tracks come from an unreleased album with which listeners have no previous experience: hearing a song for the second time would have an undetermined effect on a user's preferences.

      Similarly, hearing a different song from an album on which the listener previously liked or disliked a track would affect the preference measurement.

      A different but also interesting experiment would be a within-subjects design examining whether a particular ordering of tracks could increase overall satisfaction with the album.

    2. Should these factors be considered fixed or random? Discuss the reasons for your choice.

      The experiment was deliberately structured to examine only a specific album, rather than something more general such as genre versus time of day, so the tracks are a fixed effect.

      The time of day is a somewhat more difficult question. The experimental goal was to divide the samples into socially defined periods: morning, afternoon, evening, and so on. The boundaries of these periods are ill-defined, in part because the sets are fuzzy rather than discrete: people might disagree about whether 11:30am is morning or afternoon, but there will be much less disagreement about 8am.

      For this experiment, though, the boundaries are chosen at random within bounds based on survey data, so the factor is random.

  2. The table below shows cerebrospinal fluid (CSF) measurements of a neurotransmitter metabolite (LZ34) taken from 12 volunteer subjects at 4-hour intervals (5 measurements in all). The researcher is interested in the pattern of changes in LZ34 over time.

    Time of Measurement
    Subject ID    12am    4am     8am     12pm    4pm
    01            13.3    23.9    26.7    24.7    22.0
    02            12.6    22.6    28.3    21.9    23.0
    03            23.9    24.1    24.0    22.3    14.0
    04            33.3    43.0    46.2    42.8    33.9
    05            24.0    23.8    35.9    32.9    23.4
    06            13.2    10.2    14.6    12.6    10.5
    07            23.0    33.0    36.9    32.5    31.9
    08            24.1    26.9    28.4    27.4    23.6
    09            34.1    37.2    48.0    43.4    42.0
    10            43.7    57.1    60.1    75.2    41.9
    11            53.7    53.6    60.5    52.3    48.0
    12            23.2    33.2    44.9    39.6    28.3
    1. Using SAS or hand calculations, test the null hypothesis that the population means across each of the 5 times of measurement are equal. Conduct and report both univariate (i.e., mixed model) and multivariate repeated measures tests of this hypothesis. Report the test statistics that you deem most appropriate (i.e., the most appropriate test statistic for the univariate test). In each case, indicate the observed F’s, the critical values (interpolate if you need to), and your decision concerning rejection of the null hypothesis.

      The univariate analysis from SAS produces:

      The multivariate analysis from SAS produces:

      Statistic                        Value    F Value    Num DF    Den DF    Pr > F
      Wilks' Lambda               0.13267694      13.07         4         8    0.0014
      Pillai's Trace              0.86732306      13.07         4         8    0.0014
      Hotelling-Lawley Trace      6.53710458      13.07         4         8    0.0014
      Roy's Greatest Root         6.53710458      13.07         4         8    0.0014

      The p-value (.0014) is well below α = .05, so there is statistically significant support to reject the null hypothesis that the five population means are equal, supporting the experimenter's belief that LZ34 levels change over time.
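
      The univariate (mixed-model) F can also be computed by hand from the sums-of-squares decomposition; a minimal pure-Python sketch, using the data table above:

```python
# LZ34 measurements: 12 subjects (rows) x 5 time points (12am, 4am, 8am, 12pm, 4pm)
data = [
    [13.3, 23.9, 26.7, 24.7, 22.0],
    [12.6, 22.6, 28.3, 21.9, 23.0],
    [23.9, 24.1, 24.0, 22.3, 14.0],
    [33.3, 43.0, 46.2, 42.8, 33.9],
    [24.0, 23.8, 35.9, 32.9, 23.4],
    [13.2, 10.2, 14.6, 12.6, 10.5],
    [23.0, 33.0, 36.9, 32.5, 31.9],
    [24.1, 26.9, 28.4, 27.4, 23.6],
    [34.1, 37.2, 48.0, 43.4, 42.0],
    [43.7, 57.1, 60.1, 75.2, 41.9],
    [53.7, 53.6, 60.5, 52.3, 48.0],
    [23.2, 33.2, 44.9, 39.6, 28.3],
]
n, J = len(data), len(data[0])                 # 12 subjects, 5 times

grand = sum(sum(row) for row in data) / (n * J)
subj_means = [sum(row) / J for row in data]
time_means = [sum(row[j] for row in data) / n for j in range(J)]

# Partition the total SS into subjects + time + (time x subject) residual
ss_subj = J * sum((m - grand) ** 2 for m in subj_means)
ss_time = n * sum((m - grand) ** 2 for m in time_means)
ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_err = ss_total - ss_subj - ss_time          # error term for the time effect

# F for time, tested on (J - 1, (n - 1)(J - 1)) = (4, 44) df
F_time = (ss_time / (J - 1)) / (ss_err / ((n - 1) * (J - 1)))
```

      The observed F would then be compared against the critical F with (4, 44) degrees of freedom (or Greenhouse-Geisser-adjusted df if sphericity is in doubt).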

    2. Regarding the univariate approach, what is the expected value of:

      The following equations are intentionally symbolic. The expected values are theoretical quantities and cannot be calculated; only the observed mean squares can be computed from the data.

      1. The Mean Square Time in the above design?

        $E(MS_{\text{time}}) = \sigma^2_{\varepsilon} + \sigma^2_{\text{time}\times\text{subject}} + n\,\theta^2_{\text{time}}$

        So the expected value is a combination of the sampling error, the interaction effect, and the actual main effect of time.

        Note that under the null hypothesis, $\theta^2_{\text{time}} = 0$, the expected value is:

        $E(MS_{\text{time}} \mid H_0) = \sigma^2_{\varepsilon} + \sigma^2_{\text{time}\times\text{subject}}$
      2. The Mean Square Time × Subjects in the above design?

        $E(MS_{\text{time}\times\text{subject}}) = \sigma^2_{\varepsilon} + \sigma^2_{\text{time}\times\text{subject}}$

        Note that for repeated measures designs using an additive model there is no test for the interaction effect, because no mean square has an expected value that estimates only $\sigma^2_{\varepsilon}$. The mean square within, which served as the error term in between-subjects designs, no longer exists because each cell contains only one measurement.

      3. The Mean Square Subjects in the above design?

        $E(MS_{\text{subject}}) = \sigma^2_{\varepsilon} + J\,\sigma^2_{\text{subject}}$, where $J$ is the number of time points.

        Note that for repeated measures designs using an additive model there is no test for the main effect of subjects. This is largely unimportant because the researcher is rarely asking a question only of these specific subjects.

    3. Why does SAS not include a statistical test of the Mean Square Subjects in the output?

      Because there is no mean square whose expected value is $\sigma^2_{\varepsilon}$ alone, it is not possible to form an F-ratio to test whether there is a meaningful difference between subjects.
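
      The asymmetry can be made explicit: an F-ratio requires a denominator whose expected mean square matches the numerator's under the null hypothesis. For the time effect such a denominator exists:

```latex
F_{\text{time}}
  = \frac{MS_{\text{time}}}{MS_{\text{time}\times\text{subject}}},
\qquad
\frac{E(MS_{\text{time}})}{E(MS_{\text{time}\times\text{subject}})}
  = \frac{\sigma^2_{\varepsilon} + \sigma^2_{\text{time}\times\text{subject}}
          + n\,\theta^2_{\text{time}}}
         {\sigma^2_{\varepsilon} + \sigma^2_{\text{time}\times\text{subject}}}
```

      For subjects, however, $E(MS_{\text{subject}}) = \sigma^2_{\varepsilon} + J\,\sigma^2_{\text{subject}}$ (with $J$ the number of time points), and no mean square in the design has expectation $\sigma^2_{\varepsilon}$ alone, so no valid denominator exists.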

    4. What would be the minimum and maximum possible values of epsilon for a repeated measures study assessing changes across these 5 time points?

      An assumption in the univariate analysis of repeated measures data is the uniformity of the variances of the differences between all pairs of levels — a property known as sphericity. ε is a measure of the degree to which sphericity is violated for a particular dataset. In general:

      $0 \le \varepsilon \le 1$

      With 1 being no violation of sphericity and 0 being a complete violation. For a repeated factor with J levels, however, the bounds are narrower:

      $\tfrac{1}{J-1} \le \varepsilon \le 1$

      So, for this data those bounds are:

      $\tfrac{1}{5-1} = .25 \le \varepsilon \le 1$
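
      The actual $\hat{\varepsilon}$ for these data can be estimated with Box's (Greenhouse-Geisser) formula applied to the sample covariance matrix of the five time points; a pure-Python sketch, reusing the data table above:

```python
# LZ34 data: 12 subjects x 5 time points (from the table above)
data = [
    [13.3, 23.9, 26.7, 24.7, 22.0],
    [12.6, 22.6, 28.3, 21.9, 23.0],
    [23.9, 24.1, 24.0, 22.3, 14.0],
    [33.3, 43.0, 46.2, 42.8, 33.9],
    [24.0, 23.8, 35.9, 32.9, 23.4],
    [13.2, 10.2, 14.6, 12.6, 10.5],
    [23.0, 33.0, 36.9, 32.5, 31.9],
    [24.1, 26.9, 28.4, 27.4, 23.6],
    [34.1, 37.2, 48.0, 43.4, 42.0],
    [43.7, 57.1, 60.1, 75.2, 41.9],
    [53.7, 53.6, 60.5, 52.3, 48.0],
    [23.2, 33.2, 44.9, 39.6, 28.3],
]
n, J = len(data), len(data[0])

# Sample covariance matrix S of the J time points
col_means = [sum(row[j] for row in data) / n for j in range(J)]
S = [[sum((row[i] - col_means[i]) * (row[j] - col_means[j]) for row in data) / (n - 1)
      for j in range(J)] for i in range(J)]

# Box / Greenhouse-Geisser epsilon-hat from the elements of S
mean_all = sum(S[i][j] for i in range(J) for j in range(J)) / J ** 2
mean_diag = sum(S[i][i] for i in range(J)) / J
row_means = [sum(S[i]) / J for i in range(J)]

num = J ** 2 * (mean_diag - mean_all) ** 2
den = (J - 1) * (sum(S[i][j] ** 2 for i in range(J) for j in range(J))
                 - 2 * J * sum(m ** 2 for m in row_means)
                 + J ** 2 * mean_all ** 2)
eps_hat = num / den
```

      By construction the estimate always falls inside the $[1/(J-1),\,1]$ bounds derived above.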
    5. Using SAS (and any supplementary calculations you might need), conduct 4 pairwise comparisons for these data that compare the means for each of the last 4 time points to the 12 AM mean. Assume that these contrasts are planned and two-tailed and that you want to preserve the familywise error rate at α =.05 by using the Bonferroni procedure. Indicate the observed values of your test statistics (t’s or F’s), the critical values, and your conclusions.

      For this data, SAS produces:

      Contrast Variable: time_2
        Source                     DF    Type III SS    Mean Square   F Value   Pr > F
        Mean                        1    368.5208333    368.5208333     11.78   0.0056
        Error                      11    344.0291667     31.2753788
      
      Contrast Variable: time_3
        Source                     DF    Type III SS    Mean Square   F Value   Pr > F
        Mean                        1    1460.813333    1460.813333     34.47   0.0001
        Error                      11     466.226667      42.384242
      
      Contrast Variable: time_4
        Source                     DF    Type III SS    Mean Square   F Value   Pr > F
        Mean                        1    927.5208333    927.5208333     11.19   0.0065
        Error                      11    912.1091667     82.9190152
      
      Contrast Variable: time_5
        Source                     DF    Type III SS    Mean Square   F Value   Pr > F
        Mean                        1     34.6800000     34.6800000      0.83   0.3814
        Error                      11    458.8000000     41.7090909

      Using the Bonferroni procedure to preserve the familywise error rate gives a per-comparison error rate of $\alpha_{pc} = \alpha_{fw}/k = .05/4 = .0125$. For these data, the correction does not change which tests are statistically significant.

      There is a significant difference between the 12am measurement and the measurements at 4am, 8am, and 12pm. There is not a significant difference between the 12am and 4pm measurements.
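
      These F's can be reproduced from the data table as paired-difference contrasts (each F = t² on 1 and 11 df); a pure-Python sketch:

```python
# LZ34 data: 12 subjects x 5 time points (12am, 4am, 8am, 12pm, 4pm)
data = [
    [13.3, 23.9, 26.7, 24.7, 22.0],
    [12.6, 22.6, 28.3, 21.9, 23.0],
    [23.9, 24.1, 24.0, 22.3, 14.0],
    [33.3, 43.0, 46.2, 42.8, 33.9],
    [24.0, 23.8, 35.9, 32.9, 23.4],
    [13.2, 10.2, 14.6, 12.6, 10.5],
    [23.0, 33.0, 36.9, 32.5, 31.9],
    [24.1, 26.9, 28.4, 27.4, 23.6],
    [34.1, 37.2, 48.0, 43.4, 42.0],
    [43.7, 57.1, 60.1, 75.2, 41.9],
    [53.7, 53.6, 60.5, 52.3, 48.0],
    [23.2, 33.2, 44.9, 39.6, 28.3],
]
n = len(data)

def contrast_F(col):
    """F (= t^2) for the paired contrast of time point `col` against 12am (column 0)."""
    d = [row[col] - row[0] for row in data]
    mean_d = sum(d) / n
    ss_mean = n * mean_d ** 2                      # hypothesis SS on 1 df
    ss_error = sum((x - mean_d) ** 2 for x in d)   # error SS on n - 1 = 11 df
    return ss_mean / (ss_error / (n - 1))

F_values = {label: contrast_F(col)
            for label, col in [("4am", 1), ("8am", 2), ("12pm", 3), ("4pm", 4)]}
```

      Comparing each F against the critical F(1, 11) at the Bonferroni-corrected α = .0125 reproduces the conclusions above.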

    6. Let’s say that the researcher hypothesizes that the pattern of LZ34 over time is an inverted U. Specifically, he predicts that the values will rise from 12 AM to 8 AM, peak at 8 AM, and then begin falling progressively across the 8 AM to 4 PM time interval. What would be an appropriate set of contrast coefficients for testing this specific hypothesis?

      There are infinitely many inverted-U curves consistent with the researcher's prediction. The steepness of the curve will affect the placement of the intermediate (4am and 12pm) coefficients. Assume that the researcher wants the curve with the minimum rate of change over time.

      Alternatively, the researcher could model the change as a simple linear increase followed by a simple linear decrease. These coefficients are easier to calculate:

      \begin{align*}
      x_1 &= x_5 \\
      x_2 = x_4 &= \tfrac{x_3 + x_1}{2} \\
      x_1 + x_2 + x_3 + x_4 + x_5 &= 0 \\
      2x_1 + 2x_2 + x_3 &= 0 \\
      2x_1 + 2\cdot\tfrac{x_3 + x_1}{2} + x_3 &= 0 \\
      2x_1 + x_3 + x_1 + x_3 &= 0 \\
      3x_1 + 2x_3 &= 0 \\
      x_3 &= -\tfrac{3x_1}{2}
      \end{align*}

      A set of coefficients meeting this ratio is $(-1, \tfrac{1}{4}, \tfrac{3}{2}, \tfrac{1}{4}, -1)$. Note that these coefficients represent the neurochemically unlikely possibility that there is an immediate, sharp change at 8am.
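
      A quick exact-arithmetic check that these coefficients satisfy the derived constraints:

```python
from fractions import Fraction

# Contrast coefficients for (12am, 4am, 8am, 12pm, 4pm)
c = [Fraction(-1), Fraction(1, 4), Fraction(3, 2), Fraction(1, 4), Fraction(-1)]

assert sum(c) == 0                      # valid contrast: coefficients sum to zero
assert c[0] == c[4] and c[1] == c[3]    # symmetric rise and fall
assert c[1] == (c[2] + c[0]) / 2        # 4am lies on the 12am-to-8am line
assert c[2] == -3 * c[0] / 2            # x3 = -3*x1/2, as derived
```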

    7. Using SAS or hand calculations, conduct a trend analysis testing whether:

      1. the hypothesized pattern of change occurs;

        For a second-degree polynomial, SAS produces the following output:

        Contrast Variable: time_2
          Source                     DF    Type III SS    Mean Square   F Value   Pr > F
          Mean                        1    933.4285714    933.4285714     30.85   0.0002
          Error                      11    332.8157143     30.2559740
      2. other patterns of change that could be captured by trend analyses also occur. Consistent with tradition (admittedly not great justification), let’s assume there is no need to Bonferroni correct here (each trend can be evaluated at a per comparison alpha rate = .05). For each trend, indicate the observed values of your test statistics (t’s or F’s), the critical values, and your conclusions.

        The SAS output for other degree polynomials is:

        Contrast Variable: time_1
          Source                     DF    Type III SS    Mean Square   F Value   Pr > F
          Mean                        1     53.0670000     53.0670000      2.68   0.1296
          Error                      11    217.4190000     19.7653636
        
        Contrast Variable: time_3
          Source                     DF    Type III SS    Mean Square   F Value   Pr > F
          Mean                        1     27.6480000     27.6480000      1.52   0.2429
          Error                      11    199.6960000     18.1541818
        
        Contrast Variable: time_4
          Source                     DF    Type III SS    Mean Square   F Value   Pr > F
          Mean                        1     19.1407619     19.1407619      1.18   0.3014
          Error                      11    179.0249524     16.2749957

        The second-degree (quadratic) polynomial is the only statistically significant trend; the linear, cubic, and quartic trends are not significant. This supports the researcher's model of how the levels change over time.

    8. For the comparisons that you conducted in steps e and g, did you use pooled or contrast-specific error terms? Why?

      SAS uses contrast-specific error terms. Violations of sphericity affect the reliability of the pooled error term. Using a contrast-specific term reduces the amount of data determining the error term (11 denominator df rather than 44), but it also reduces the impact of sphericity violations.

    9. Let’s say that a research assistant analyzed the data above. The R.A. mistakenly believed that a between-groups design was used (i.e., not realizing that each subject contributed 5 data points). That is, the R.A. believed 60 subjects were used, with each subject contributing one and only one data point. Thus, the R.A. performed a one-way between-groups ANOVA. In general, what would be the consequences for Type 1 and Type 2 errors of analyzing the data this way instead of the correct way?

      The primary effect is to violate the assumption of independence of observations about as completely as possible: the five "groups" contain the same 12 subjects. The between-groups error term then includes the stable subject-to-subject differences, which for these data are large, so the mean square error is inflated, power is reduced, and the Type 2 error rate rises. At the same time, the analysis claims 55 denominator degrees of freedom when far fewer independent observations exist, and systematic trends arising from the same subjects being sampled at every time point can be misinterpreted as trends in the population, distorting the Type 1 error rate.
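
      The consequence is visible directly in the sums of squares: the mistaken between-groups "within" term absorbs the subject variance. A pure-Python sketch, reusing the data table:

```python
# LZ34 data: 12 subjects x 5 time points (12am, 4am, 8am, 12pm, 4pm)
data = [
    [13.3, 23.9, 26.7, 24.7, 22.0],
    [12.6, 22.6, 28.3, 21.9, 23.0],
    [23.9, 24.1, 24.0, 22.3, 14.0],
    [33.3, 43.0, 46.2, 42.8, 33.9],
    [24.0, 23.8, 35.9, 32.9, 23.4],
    [13.2, 10.2, 14.6, 12.6, 10.5],
    [23.0, 33.0, 36.9, 32.5, 31.9],
    [24.1, 26.9, 28.4, 27.4, 23.6],
    [34.1, 37.2, 48.0, 43.4, 42.0],
    [43.7, 57.1, 60.1, 75.2, 41.9],
    [53.7, 53.6, 60.5, 52.3, 48.0],
    [23.2, 33.2, 44.9, 39.6, 28.3],
]
n, J = len(data), len(data[0])
grand = sum(sum(row) for row in data) / (n * J)
time_means = [sum(row[j] for row in data) / n for j in range(J)]
subj_means = [sum(row) / J for row in data]

ss_time = n * sum((m - grand) ** 2 for m in time_means)
ss_subj = J * sum((m - grand) ** 2 for m in subj_means)
ss_total = sum((x - grand) ** 2 for row in data for x in row)

# R.A.'s mistaken one-way between-groups ANOVA: subject variance stays in the error
ss_within = ss_total - ss_time
F_wrong = (ss_time / (J - 1)) / (ss_within / (n * J - J))      # df = (4, 55)

# Correct repeated-measures analysis: subject variance removed from the error
ss_err = ss_total - ss_time - ss_subj
F_rm = (ss_time / (J - 1)) / (ss_err / ((n - 1) * (J - 1)))    # df = (4, 44)
```

      For these data the subject sum of squares dwarfs the time sum of squares, so the mistaken error term is much larger and the between-groups F is far smaller than the repeated-measures F.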