For all problems, unless otherwise indicated, assume a Type 1 error rate = .05.
Consider an experiment that you anticipate conducting or that you think somebody should conduct — an experiment that strikes you as a good idea in your primary area of research or an area of research that you are familiar with. Try to come up with an experiment in which at least one factor is experimentally manipulated (i.e., under the experimenter’s control). Now consider this experimentally manipulated factor or factors.
One of my research interests is recommender systems, which attempt to make predictions about human preferences. In general, these types of experiments tend to involve only random factors, because the research interest is in identifying general trends about types of people; however, it is possible to imagine experiments that involve fixed factors.
Imagine that I'm a marketer producing a website that samples tracks from an artist's unreleased CD to drive up hype for the release. It is my belief that particular tracks will be more appealing at different times of day. (Perhaps people will like a melodic track in the morning and a more energetic one in the evening.) I decide to run an experiment to test my hypothesis by randomly serving users different tracks at different times of day and recording the length of time that they listen as a measure of their interest.
My null hypotheses will be that mean listening time does not differ across tracks, that it does not differ across times of day, and that there is no track × time-of-day interaction. The interaction is the hypothesis of primary interest, since my belief is that particular tracks appeal more at particular times.
Should these factors optimally be manipulated on a between-subjects or within-subjects basis? Discuss the reasons for your choice.
These factors are best manipulated on a between-subjects basis. Steps should be taken to ensure that the users being considered are all viewing the site for the first time. It is important experimentally that these tracks come from an unreleased album with which listeners have no previous experience; hearing a song for the second time would have an undetermined effect on the user's preferences.
Similarly, a listener who has already liked or disliked one song from the album will carry that impression into hearing a different song from the same album, which will affect preference.
A different but also interesting experiment would be a within-subjects experiment examining whether a particular ordering of tracks could drive up overall satisfaction with the album.
Should these factors be considered fixed or random? Discuss the reasons for your choice.
The experiment was structured to look at only a specific album, rather than something more general such as genre versus time of day, so that the tracks would be a fixed effect.
The time of day is a somewhat more difficult question. The experimental goal was to divide the samples into socially defined periods: morning, afternoon, evening, etc. The boundaries of these periods are ill-defined, in part because the sets are fuzzy rather than discrete. People might disagree on whether 11:30am is morning or afternoon, but there will be much less disagreement about 8am.
For this experiment, though, the boundaries are chosen at random within ranges based on survey data, so the factor is random.
The table below shows cerebrospinal fluid (CSF) measurements of a neurotransmitter metabolite (LZ34) taken from 12 volunteer subjects at 4-hour intervals (5 measurements in all). The researcher is interested in the pattern of changes in LZ34 over time.
Subject ID | 12am | 4am | 8am | 12pm | 4pm |
---|---|---|---|---|---|
01 | 13.3 | 23.9 | 26.7 | 24.7 | 22.0 |
02 | 12.6 | 22.6 | 28.3 | 21.9 | 23.0 |
03 | 23.9 | 24.1 | 24.0 | 22.3 | 14.0 |
04 | 33.3 | 43.0 | 46.2 | 42.8 | 33.9 |
05 | 24.0 | 23.8 | 35.9 | 32.9 | 23.4 |
06 | 13.2 | 10.2 | 14.6 | 12.6 | 10.5 |
07 | 23.0 | 33.0 | 36.9 | 32.5 | 31.9 |
08 | 24.1 | 26.9 | 28.4 | 27.4 | 23.6 |
09 | 34.1 | 37.2 | 48.0 | 43.4 | 42.0 |
10 | 43.7 | 57.1 | 60.1 | 75.2 | 41.9 |
11 | 53.7 | 53.6 | 60.5 | 52.3 | 48.0 |
12 | 23.2 | 33.2 | 44.9 | 39.6 | 28.3 |
Using SAS or hand calculations, test the null hypothesis that the population means across each of the 5 times of measurement are equal. Conduct and report both univariate (i.e., mixed model) and multivariate repeated measures tests of this hypothesis. Report the test statistics that you deem most appropriate (i.e., the most appropriate test statistic for the univariate test). In each case, indicate the observed F’s, the critical values (interpolate if you need to), and your decision concerning rejection of the null hypothesis.
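A minimal SAS sketch that can produce both the univariate and multivariate tests; the dataset name (lz34) and variable names (t1 through t5) are assumptions for illustration:

```sas
/* One row per subject; t1-t5 hold the five LZ34 measurements
   taken at 12am, 4am, 8am, 12pm, and 4pm. */
data lz34;
  input subject t1 t2 t3 t4 t5;
  datalines;
 1 13.3 23.9 26.7 24.7 22.0
 2 12.6 22.6 28.3 21.9 23.0
 3 23.9 24.1 24.0 22.3 14.0
 4 33.3 43.0 46.2 42.8 33.9
 5 24.0 23.8 35.9 32.9 23.4
 6 13.2 10.2 14.6 12.6 10.5
 7 23.0 33.0 36.9 32.5 31.9
 8 24.1 26.9 28.4 27.4 23.6
 9 34.1 37.2 48.0 43.4 42.0
10 43.7 57.1 60.1 75.2 41.9
11 53.7 53.6 60.5 52.3 48.0
12 23.2 33.2 44.9 39.6 28.3
;
run;

proc glm data=lz34;
  model t1-t5 = / nouni;     /* no between-subjects factors */
  repeated time 5 / printe;  /* univariate and multivariate tests of time */
run;
```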
The univariate analysis from SAS produces:
The multivariate analysis from SAS produces:
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
---|---|---|---|---|---|
Wilks' Lambda | 0.13267694 | 13.07 | 4 | 8 | 0.0014 |
Pillai's Trace | 0.86732306 | 13.07 | 4 | 8 | 0.0014 |
Hotelling-Lawley Trace | 6.53710458 | 13.07 | 4 | 8 | 0.0014 |
Roy's Greatest Root | 6.53710458 | 13.07 | 4 | 8 | 0.0014 |
The probability of observing differences this large if the five population means were equal is very small: all four multivariate criteria yield F(4, 8) = 13.07, p = .0014, which exceeds the critical value F.05(4, 8) ≈ 3.84. There is therefore statistically significant support for rejecting the null hypothesis and for the experimenter's belief that the levels of LZ34 change over time.
Regarding the univariate approach, what is the expected value of:
The following expressions are intentionally symbolic: expected values cannot be calculated from the data; only the observed mean squares for these data can be computed.
The Mean Square Time in the above design?
$E(MS_{\text{Time}}) = \sigma^2_\varepsilon + \sigma^2_{\pi\tau} + n\theta^2_\tau$. So the expected value is a combination of the sampling error ($\sigma^2_\varepsilon$), the Time × Subjects interaction ($\sigma^2_{\pi\tau}$), and the main effect of time ($n\theta^2_\tau$, where $n$ is the number of subjects).
Note that under the null hypothesis, $\theta^2_\tau = 0$, the expected value is: $E(MS_{\text{Time}}) = \sigma^2_\varepsilon + \sigma^2_{\pi\tau}$.
The Mean Square Time × Subjects in the above design?
$E(MS_{\text{Time} \times \text{Subjects}}) = \sigma^2_\varepsilon + \sigma^2_{\pi\tau}$. Note that for repeated measures designs using an additive model there is no test for the interaction effect, because there is no mean square whose expected value estimates only $\sigma^2_\varepsilon$. The mean square within, which was used in between-subjects designs, no longer exists because each subject-by-time cell contains only one measurement.
The Mean Square Subjects in the above design?
$E(MS_{\text{Subjects}}) = \sigma^2_\varepsilon + J\sigma^2_\pi$, where $J$ is the number of time points. Note that for repeated measures designs using an additive model there is no test for the main effect of subjects. This is largely unimportant, because the researcher is rarely asking a question about only these specific subjects.
Why does SAS not include a statistical test of the Mean Square Subjects in the output?
Because there is no quantity whose expected value is only $\sigma^2_\varepsilon$, it is not possible to form an F-ratio to test whether there is a meaningful difference between subjects.
What would be the minimum and maximum possible values of epsilon for a repeated measures study assessing changes across these 5 time points?
An assumption in the univariate analysis of repeated measures data is the uniformity of the variances of the differences between all pairs of levels, a property known as sphericity. $\varepsilon$ is a measure of the degree to which sphericity is violated for a particular dataset. In general:

$0 < \varepsilon \le 1$

with 1 indicating no violation of sphericity and values approaching 0 indicating complete violation. For a repeated factor with $J$ levels, however, the bounds are narrower:

$\frac{1}{J-1} \le \varepsilon \le 1$

So, for these data those bounds are:

$\frac{1}{5-1} = 0.25 \le \varepsilon \le 1$
Using SAS (and any supplementary calculations you might need), conduct 4 pairwise comparisons for these data that compare the means for each of the last 4 time points to the 12 AM mean. Assume that these contrasts are planned and two-tailed and that you want to preserve the familywise error rate at α =.05 by using the Bonferroni procedure. Indicate the observed values of your test statistics (t’s or F’s), the critical values, and your conclusions.
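One way to request these contrasts is through the CONTRAST transformation on the REPEATED statement; a sketch, reusing the assumed lz34 dataset from above:

```sas
proc glm data=lz34;
  model t1-t5 = / nouni;
  /* contrast(1): compare each later level of time to the first (12 AM);
     summary prints an ANOVA table for each contrast variable */
  repeated time 5 contrast(1) / summary;
run;
```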
For this data, SAS produces:
Contrast Variable | Source | DF | Type III SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|---|
time_2 (4am vs 12am) | Mean | 1 | 368.5208333 | 368.5208333 | 11.78 | 0.0056 |
| Error | 11 | 344.0291667 | 31.2753788 | | |
time_3 (8am vs 12am) | Mean | 1 | 1460.813333 | 1460.813333 | 34.47 | 0.0001 |
| Error | 11 | 466.226667 | 42.384242 | | |
time_4 (12pm vs 12am) | Mean | 1 | 927.5208333 | 927.5208333 | 11.19 | 0.0065 |
| Error | 11 | 912.1091667 | 82.9190152 | | |
time_5 (4pm vs 12am) | Mean | 1 | 34.6800000 | 34.6800000 | 0.83 | 0.3814 |
| Error | 11 | 458.8000000 | 41.7090909 | | |
Using Bonferroni to preserve the familywise error rate simply means using a per-comparison error rate of $\alpha_{PC} = .05/4 = .0125$. For these data, the correction does not change which tests are statistically significant: the first three p-values (.0056, .0001, .0065) fall below .0125, and the fourth (.3814) was not significant even at .05.
There is a significant difference between the 12am measurement and the measurements at 4am, 8am, and 12pm. There is not a significant difference between the 4pm measurement and the 12am measurement.
Let’s say that the researcher hypothesizes that the pattern of LZ34 over time is an inverted U. Specifically, he predicts that the values will rise from 12 AM to 8 AM, peak at 8 AM, and then begin falling progressively across the 8 AM to 4 PM time interval. What would be an appropriate set of contrast coefficients for testing this specific hypothesis?
There are infinitely many second-degree polynomial curves consistent with a peak at 8 AM; the steepness of the curve affects the placement of the intermediate (4 AM and 12 PM) points. Assume that the researcher wants the curve with the minimum rate of change over time (minimum second derivative); this corresponds to the standard orthogonal quadratic trend coefficients, sign-reversed for an inverted U: $(-2, 1, 2, 1, -2)$.
Another alternative is for the researcher to model the change as a simple linear increase from 12 AM to 8 AM followed by a simple linear decrease. These numbers are easier to calculate:
A set of coefficients meeting this ratio is $(-4, 1, 6, 1, -4)$, derived in the sketch below. Note that these coefficients represent the neurochemically unlikely possibility that there is an immediate, sharp change of direction at 8am.
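A short worked derivation of those coefficients, assuming relative heights of (0, 1, 2, 1, 0) for the tent pattern across the five time points:

```latex
% Relative heights under a linear rise to 8am and a linear fall after:
%   (0, 1, 2, 1, 0) at 12am, 4am, 8am, 12pm, 4pm.
% Contrast coefficients must sum to zero, so subtract the mean height:
\bar{h} = \tfrac{0 + 1 + 2 + 1 + 0}{5} = \tfrac{4}{5}
\quad\Rightarrow\quad
\left(-\tfrac{4}{5},\; \tfrac{1}{5},\; \tfrac{6}{5},\; \tfrac{1}{5},\; -\tfrac{4}{5}\right)
% Scale by 5 to clear the fractions:
\quad\Rightarrow\quad
(-4,\; 1,\; 6,\; 1,\; -4), \qquad \textstyle\sum_j c_j = 0.
```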
Using SAS or hand calculations, conduct a trend analysis testing whether:
the hypothesized pattern of change occurs;
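A sketch of the trend analysis, again assuming the lz34 dataset from above; the POLYNOMIAL transformation generates orthogonal polynomial contrasts, with time_1 = linear, time_2 = quadratic, time_3 = cubic, and time_4 = quartic:

```sas
proc glm data=lz34;
  model t1-t5 = / nouni;
  /* (0 4 8 12 16) gives the equally spaced measurement times in hours;
     polynomial requests orthogonal polynomial trend contrasts */
  repeated time 5 (0 4 8 12 16) polynomial / summary;
run;
```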
For a second-degree polynomial, SAS produces the following output:
Contrast Variable | Source | DF | Type III SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|---|
time_2 (quadratic) | Mean | 1 | 933.4285714 | 933.4285714 | 30.85 | 0.0002 |
| Error | 11 | 332.8157143 | 30.2559740 | | |
other patterns of change that could be captured by trend analyses also occur. Consistent with tradition (admittedly not great justification), let’s assume there is no need to Bonferroni correct here (each trend can be evaluated at a per comparison alpha rate = .05). For each trend, indicate the observed values of your test statistics (t’s or F’s), the critical values, and your conclusions.
The SAS output for other degree polynomials is:
Contrast Variable | Source | DF | Type III SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|---|
time_1 (linear) | Mean | 1 | 53.0670000 | 53.0670000 | 2.68 | 0.1296 |
| Error | 11 | 217.4190000 | 19.7653636 | | |
time_3 (cubic) | Mean | 1 | 27.6480000 | 27.6480000 | 1.52 | 0.2429 |
| Error | 11 | 199.6960000 | 18.1541818 | | |
time_4 (quartic) | Mean | 1 | 19.1407619 | 19.1407619 | 1.18 | 0.3014 |
| Error | 11 | 179.0249524 | 16.2749957 | | |
The second-degree (quadratic) polynomial is the only statistically significant trend: the observed F(1, 11) = 30.85 exceeds the critical value F.05(1, 11) = 4.84 (p = .0002), while the linear, cubic, and quartic trends do not reach significance. This supports the researcher's model of how the levels change.
For the comparisons that you conducted in steps e and g, did you use pooled or contrast-specific error terms? Why?
SAS uses contrast-specific error terms. Violations of sphericity affect the validity of the pooled error term. Using a contrast-specific term reduces the amount of data (and degrees of freedom) available for estimating the error, but it also reduces the impact of sphericity violations on each test.
Let’s say that a research assistant analyzed the data above. The R.A. mistakenly believed that a between-groups design was used (i.e., not realizing that each subject contributed 5 data points). That is, the R.A. believed 60 subjects were used, with each subject contributing one and only one data point. Thus, the R.A. performed a one-way between-groups ANOVA. In general, what would be the consequences for Type 1 and Type 2 errors of analyzing the data this way instead of the correct way?
The primary effect is to violate the assumption of independence of observations about as completely as possible. This violation should have a substantial effect on Type I error rates: trends across the time periods that arise because the same subjects were sampled repeatedly will be incorrectly interpreted as trends within the population. In addition, the between-groups error term now includes the substantial variability between subjects that the repeated measures analysis partitions out, so the test also loses power, increasing the Type II error rate.
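For illustration, a sketch of the R.A.'s mistaken analysis, reusing the assumed lz34 dataset: the 60 measurements are stacked into long format and then treated as 60 independent observations.

```sas
/* Restructure to long format: 60 rows, one per (subject, time) pair */
data lz34_long;
  set lz34;
  array t{5} t1-t5;
  do time = 1 to 5;
    csf = t{time};
    output;
  end;
  keep subject time csf;
run;

/* The incorrect one-way between-groups ANOVA: it ignores that the five
   "groups" contain repeated measurements of the same 12 subjects */
proc glm data=lz34_long;
  class time;
  model csf = time;
run;
```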