For all problems, unless otherwise indicated, assume a Type 1 error rate = .05.
A cognitive psychologist is studying the effects of different stimulus-exposure durations on visual search performance. He is also interested in whether such effects differ in middle-aged vs. older individuals. There are 3 levels of the Exposure factor (short, medium, long) and 2 levels of the Age factor (middle-age/old-age). There are 8 participants per group. The means and variances (unbiased estimates of population variance) on the visual search performance measure for the 6 cells of this design are as follows (higher scores on this measure indicate better performance):
Age \ Exposure | Short | Medium | Long | Marginal |
---|---|---|---|---|
Middle | 29 | 33 | 44 | 35.333 |
Old | 25 | 29 | 36 | 30 |
Marginal | 27 | 31 | 40 | 32.667 |
s² (unbiased), Age \ Exposure | Short | Medium | Long |
---|---|---|---|
Middle | 105 | 102 | 108 |
Old | 110 | 94 | 111 |
Fill in the marginal means in the first table above.
This assignment will use a new notational shorthand to represent summations and averages:
The table for two factors, A and B, then represents the following information about the set of samples, where sample $y_{i,j,k}$ is the ith sample from the jth group of factor A and the kth group of factor B. Frequently, in the data table, the subscript for i is omitted.
Factor B | ||||||||
---|---|---|---|---|---|---|---|---|
B1 | B2 | … | Bk | … | Bnk | Marginal | ||
Factor A | A1 | … | … | |||||
A2 | … | … | ||||||
⋮ | ⋮ | ⋮ | ⋱ | ⋮ | ⋱ | ⋮ | ⋮ | |
Aj | … | … | ||||||
⋮ | ⋮ | ⋮ | ⋱ | ⋮ | ⋱ | ⋮ | ⋮ | |
Anj | … | … | ||||||
Marginal | … | … |
Find the effects corresponding to each of the marginal means of the Age factor. That is, if we considered Age to be "Factor A" here and Exposure to be "Factor B", find a1 and a2. Include the formula that you are using to determine the effects. Verify that the effects sum to 0.
Consider two population quantities, the marginal means for the two age groups: $\mu_{Middle,\cdot}$ and $\mu_{Old,\cdot}$.
To say that there is a "main effect" for age is to say that these two quantities differ, so that age in some way plays a part in the response to visual stimulus.
Note that this is not the only way for age to play a part. These two means could be equal while the pattern of cell means across the exposure levels differs between the age groups. This situation would be captured by interaction effects.
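If there is an actual population-level effect for age, it is represented for each level of the factor (e.g. "Middle" and "Old") by a deviation from the grand mean:

$$\alpha_j = \mu_{j,\cdot} - \mu_{\cdot,\cdot}$$

An unbiased estimate of this quantity is simply derived from the sample means, where $\bar{y}_{j,\cdot}$ is the marginal mean for level $j$ of Factor A and $\bar{y}_{\cdot,\cdot}$ is the grand mean:

$$a_j = \bar{y}_{j,\cdot} - \bar{y}_{\cdot,\cdot}$$

For the factor "Age," these quantities are:

$$a_{Middle} = 35.333 - 32.667 = 2.667 \qquad a_{Old} = 30 - 32.667 = -2.667$$

The set of effects for a factor will always sum to 0 (since they are simply the set of unweighted deviations from the mean). This can be easily verified with these two:

$$a_{Middle} + a_{Old} = 2.667 + (-2.667) = 0$$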
Find the effects corresponding to each of the marginal means of the Exposure factor. Include the formula that you are using to determine the effects. Verify that the effects sum to 0.
The effects of the row-wise factor, called "Factor A," are represented with a variable α. The effects for the column-wise factor, called "Factor B," are conceptually identical, but represented with a variable β. Likewise, the estimate of the effect of factor A is a and the estimate of the effect of factor B is b.
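For exposure, the estimates of these effects, each computed as the marginal mean for an exposure level minus the grand mean ($b_k = \bar{y}_{\cdot,k} - \bar{y}_{\cdot,\cdot}$), are:

$$b_{Short} = 27 - 32.667 = -5.667 \qquad b_{Medium} = 31 - 32.667 = -1.667 \qquad b_{Long} = 40 - 32.667 = 7.333$$

As expected, these effects also sum to zero (exactly so before rounding):

$$b_{Short} + b_{Medium} + b_{Long} = -\tfrac{17}{3} - \tfrac{5}{3} + \tfrac{22}{3} = 0$$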
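The main-effect estimates for both factors can be checked numerically. The following is a minimal R sketch that uses only the cell means from the first table and relies on the equal cell sizes; the variable names (`cell.means`, `a`, `b`) are illustrative and not part of the assignment.

```r
# Cell means from the first table: rows are Age, columns are Exposure.
cell.means <- matrix(c(29, 33, 44,
                       25, 29, 36),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Age = c("Middle", "Old"),
                                     Exposure = c("Short", "Medium", "Long")))

# With equal cell sizes, marginal and grand means are plain averages
# of the cell means.
grand.mean <- mean(cell.means)

# Estimated main effects: marginal mean minus grand mean.
a <- rowMeans(cell.means) - grand.mean  # Age effects
b <- colMeans(cell.means) - grand.mean  # Exposure effects

print(round(a, 3))
print(round(b, 3))

# Each set of effects should sum to zero, up to floating-point error.
print(abs(sum(a)) < 1e-9)
print(abs(sum(b)) < 1e-9)
```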
Find the interaction effects, that is each of the (ab)j,k terms. Include the formula that you are using to determine the effects. Verify that the effects sum to 0 within each column and each row.
The main effects model described above measures the relationship of factorwise groups to the entire population: for example, the effect of being old (the main effect for age, αOld) or the effect of short exposure (βShort). What the main effects model doesn't capture is the situation where being both old and given short exposure, as opposed to, say, old and given long exposure, has a particular effect on the outcome.
If exposure time and age are completely unrelated, then to predict something about an old short-exposure person, one would only need the effect of being old and the effect of short exposure. If, however, there is some connection between these two characteristics, then the factors are said to "interact." In order to properly make an estimate about old short-exposure people, one would then need to know an additional "interaction effect": (αβ)Old,Short. The complete model is represented mathematically as:

$$y_{i,j,k} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{j,k} + \varepsilon_{i,j,k}$$

The estimate of the interaction term is $(ab)_{j,k} = \bar{y}_{j,k} - \bar{y}_{j,\cdot} - \bar{y}_{\cdot,k} + \bar{y}_{\cdot,\cdot}$ (cell mean minus both marginal means plus the grand mean), and for this particular set of data, those estimates are:
Age \ Exposure | Short | Medium | Long | Sum |
---|---|---|---|---|
Middle | -0.667 | -0.667 | 1.333 | 0 |
Old | 0.667 | 0.667 | -1.333 | 0 |
Sum | 0 | 0 | 0 | 0 |
Note that these values sum to zero both row-wise and column-wise.
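The interaction estimates and their zero row and column sums can be reproduced as follows. This is a sketch under the same assumptions as the rest of the problem (equal cell sizes, cell means taken from the first table); `ab` and the other names are illustrative.

```r
# Cell means from the first table: rows are Age, columns are Exposure.
cell.means <- matrix(c(29, 33, 44,
                       25, 29, 36),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Age = c("Middle", "Old"),
                                     Exposure = c("Short", "Medium", "Long")))

grand.mean <- mean(cell.means)
a <- rowMeans(cell.means) - grand.mean  # Age main effects
b <- colMeans(cell.means) - grand.mean  # Exposure main effects

# Interaction estimate: what is left of each cell mean after removing
# the grand mean and both main effects.
ab <- cell.means - grand.mean - outer(a, b, "+")

print(round(ab, 3))

# Row sums and column sums should all be zero (up to rounding error).
print(round(rowSums(ab), 10))
print(round(colSums(ab), 10))
```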
Express the mean of the old age/long exposure cell as a linear combination of the grand mean, the two marginal effects, and the interaction effect and verify that this model reproduces the cell mean.
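Under the full model, the old age/long exposure cell mean is the grand mean plus the effect for Old, the effect for Long, and their interaction effect: 32.667 + (-2.667) + 7.333 + (-1.333) = 36. A short R check of that arithmetic, using the means from the first table (variable names are illustrative), might look like this:

```r
# Quantities taken from the table of means.
grand.mean <- (29 + 33 + 44 + 25 + 29 + 36) / 6  # 32.667
mean.old   <- 30   # marginal mean for Old
mean.long  <- 40   # marginal mean for Long
cell.old.long <- 36

# Estimated effects.
a.old       <- mean.old - grand.mean                            # -2.667
b.long      <- mean.long - grand.mean                           #  7.333
ab.old.long <- cell.old.long - mean.old - mean.long + grand.mean # -1.333

# The linear combination should reproduce the cell mean.
fitted <- grand.mean + a.old + b.long + ab.old.long
print(fitted)
print(isTRUE(all.equal(fitted, cell.old.long)))
```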
Compute the following sums of squares:
SSAge
SSExposure
SS(Age × Exposure)
SSWithin (remember that the contribution of each cell to the SSW can be computed as $(n_{j,k} - 1)s_{j,k}^2$)
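The four sums of squares follow directly from the effect estimates and the cell variances. The sketch below assumes the balanced design stated in the problem (8 participants per cell); the variable names are illustrative.

```r
n <- 8  # participants per cell

cell.means <- matrix(c(29, 33, 44,
                       25, 29, 36),
                     nrow = 2, byrow = TRUE)
cell.vars <- matrix(c(105, 102, 108,
                      110,  94, 111),
                    nrow = 2, byrow = TRUE)

J <- nrow(cell.means)  # levels of Age
K <- ncol(cell.means)  # levels of Exposure

grand.mean <- mean(cell.means)
a  <- rowMeans(cell.means) - grand.mean
b  <- colMeans(cell.means) - grand.mean
ab <- cell.means - grand.mean - outer(a, b, "+")

# Each squared effect is weighted by the number of scores it applies to.
ss.age      <- n * K * sum(a^2)
ss.exposure <- n * J * sum(b^2)
ss.inter    <- n * sum(ab^2)

# Within-cell sum of squares: (n - 1) * s^2, summed over the six cells.
ss.within <- sum((n - 1) * cell.vars)

print(c(Age = ss.age, Exposure = ss.exposure,
        Interaction = ss.inter, Within = ss.within))
```

These values agree with the Sum of Squares column of the source table in the next part.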
Conduct a two-way ANOVA (alpha = .05, 2-tailed) on these data testing for all relevant main effects and interactions. Show your results in an ANOVA source table. What do you conclude about the effects of Exposure and Age on visual search performance here?
The same form is used for all the results and produces the following table:
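Term | DoF | Sum of Squares | Mean Square | F Value | P(<F) |
---|---|---|---|---|---|
age | 1 | 341.33333 | 341.33333 | 3.2507937 | 0.9214333 |
exposure | 2 | 1418.66667 | 709.33333 | 6.7555556 | 0.9971407 |
age:exposure | 2 | 42.66667 | 21.33333 | 0.2031746 | 0.1830669 |
Within | 42 | 4410 | 105 |
These results support only a significant main effect for exposure. The last column gives the cumulative probability of each F value, so the corresponding p-values are 1 - 0.9971 ≈ .003 for exposure, 1 - 0.9214 ≈ .079 for age, and 1 - 0.1831 ≈ .817 for the interaction; only the first is below .05. Looking at the cell means, this means that longer exposure to the visual stimulus improved search performance, and there is no evidence that this effect differs between middle-aged and older participants.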
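The F ratios and their probabilities can be recomputed from the sums of squares in the table. In this sketch the sums of squares are copied from the table rather than derived, and both the cumulative probability and the conventional upper-tail p-value are shown so the two can be compared; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the source table.
ss <- c(age = 341.33333, exposure = 1418.66667, "age:exposure" = 42.66667)
df <- c(age = 1, exposure = 2, "age:exposure" = 2)

df.within <- 42
ms.within <- 4410 / df.within

f.value <- (ss / df) / ms.within

# Cumulative probability P(< F) and upper-tail p-value P(> F).
cumulative <- pf(f.value, df, df.within)
p.value    <- pf(f.value, df, df.within, lower.tail = FALSE)

# Critical F values for a test at alpha = .05.
f.crit <- qf(0.95, df, df.within)

print(data.frame(F = round(f.value, 4),
                 F.crit = round(f.crit, 4),
                 cumulative = round(cumulative, 4),
                 p = round(p.value, 4),
                 significant = f.value > f.crit,
                 row.names = names(ss)))
```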
Consider the data from Maxwell and Delaney problem #9 on pp. 346-347. These data are from a study assessing the effects of 3 different treatments for phobias (Desensitization, Implosion, and Insight) and 2 differing levels of fear severity (mild and severe) on Behavioral Avoidance test scores among a group of phobics.
You can load in the data in R using the following statements:
    # Number of participants per cell.
    snake.phobia.n = 8
    # The phobia column has 16 entries and is recycled across the three therapies.
    snake.phobia.frame = data.frame(
      avoidance = c(16, 13, 12, 15, 11, 12, 14, 13, 16, 10, 11, 12, 6, 8, 14, 12,
                    14, 16, 17, 15, 13, 17, 15, 16, 13, 7, 3, 10, 4, 2, 4, 9,
                    15, 15, 12, 14, 13, 11, 11, 12, 15, 10, 11, 7, 5, 12, 6, 8),
      therapy = rep(c("Desensitization", "Implosion", "Insight"), each = snake.phobia.n * 2),
      phobia = rep(c("Mild", "Severe"), each = snake.phobia.n))
Use R for the questions below that involve computations. At some points you may have to go beyond the R print-out and do hand calculations.
Conduct a two-way ANOVA on these data. Indicate whether each effect tested is statistically significant.
To start off with a general understanding of the data, a means table is useful:
Degree of Phobia \ Type of Therapy | Desensitization | Implosion | Insight | Marginal |
---|---|---|---|---|
Mild | 13.250 | 15.375 | 12.875 | 13.833 |
Severe | 11.125 | 6.500 | 9.250 | 8.958 |
Marginal | 12.1875 | 10.9375 | 11.0625 | 11.396 |
The higher the score, the less a person is afraid of snakes.
Looking at the data, we can guess at a general model for the types of therapies:
The answers to some of these questions depend on the variances:
Degree of Phobia \ Type of Therapy | Desensitization | Implosion | Insight | Marginal |
---|---|---|---|---|
Mild | 2.786 | 1.982 | 2.696 | 3.536 |
Severe | 10.125 | 15.143 | 11.357 | 14.911 |
Marginal | 7.229 | 28.996 | 10.062 | 7.348 |
R produces the following output for the ANOVA:
                    Df  Sum Sq Mean Sq F value    Pr(>F)
    therapy          2  15.167   7.583  1.0320  0.365152
    phobia           1 285.188 285.188 38.8104 1.855e-07
    therapy:phobia   2 100.500  50.250  6.8384  0.002686
    Residuals       42 308.625   7.348
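A table of this form can be obtained with `aov`. The sketch below rebuilds the data frame from the loading statements and uses the model formula that appears in the `Fit:` line of the Tukey output; the object names `snake.phobia.aov` and `snake.phobia.table` are illustrative, and the exact call that produced the output above is an assumption.

```r
snake.phobia.n <- 8
snake.phobia.frame <- data.frame(
  avoidance = c(16, 13, 12, 15, 11, 12, 14, 13, 16, 10, 11, 12, 6, 8, 14, 12,
                14, 16, 17, 15, 13, 17, 15, 16, 13, 7, 3, 10, 4, 2, 4, 9,
                15, 15, 12, 14, 13, 11, 11, 12, 15, 10, 11, 7, 5, 12, 6, 8),
  therapy = rep(c("Desensitization", "Implosion", "Insight"),
                each = snake.phobia.n * 2),
  phobia = rep(c("Mild", "Severe"), each = snake.phobia.n))

# Two-way ANOVA with both main effects and their interaction.
snake.phobia.aov <- aov(avoidance ~ 1 + therapy + phobia + therapy:phobia,
                        data = snake.phobia.frame)
snake.phobia.table <- anova(snake.phobia.aov)
print(snake.phobia.table)

# Cell means (rows: phobia, columns: therapy) for reference.
print(with(snake.phobia.frame, tapply(avoidance, list(phobia, therapy), mean)))
```

Because the design is balanced, the order in which the terms enter the formula does not change the sums of squares.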
There are three experimental questions addressed here:
Main Effect for Therapy: Does the type of therapy have an effect on the effectiveness of treating a phobia of snakes?
If there were no true differences among the therapy means, differences at least as large as those observed would arise from sampling error alone about 36.5% of the time (p = 0.365). Because this is well above 0.05, the researcher cannot reject the null hypothesis that the three therapy marginal means are equal.
Main Effect for Phobia: Does the severity of a phobia affect how strongly an individual will react to therapy?
The p-value is extremely small (p ≈ 1.9 × 10⁻⁷): if severity made no difference, a gap this large between mild and severe phobics would almost never arise from sampling error alone. So, the experimenter is able to reject the null hypothesis and support the experimental belief that severity of phobia affects avoidance scores. Because the severity factor has only two levels, this main effect needs no follow-up comparison: mild phobics score higher on average (13.833 vs. 8.958). Whether that gap is the same under all three types of therapy, however, cannot be read from the main effect. To delve to that depth would require additional analysis; perhaps a contrast.
Interaction Effect for Therapy × Phobia: Are certain therapies more effective for people who have stronger or weaker phobias?
The interaction is significant (p = 0.0027 < 0.05), so the null hypothesis of no interaction between the type of therapy and the severity of the phobia is rejected. As before, it will take further analysis to determine specifically how this interaction takes place.
Conduct a Tukey HSD test assessing whether each of the three pairwise comparisons among the marginal means of the Therapy factor are significantly different. Set the familywise alpha for this set of contrasts at .05. Indicate the minimum mean difference necessary for a comparison to be judged significant and indicate whether each comparison is statistically significant.
R produces the following output:
      Tukey multiple comparisons of means
        95% family-wise confidence level
        factor levels have been ordered

    Fit: aov(formula = avoidance ~ 1 + therapy + phobia + therapy:phobia, data = snake.phobia.frame)

    $therapy
                               diff       lwr      upr     p adj
    Insight-Implosion         0.125 -2.203422 2.453422 0.9906676
    Desensitization-Implosion 1.250 -1.078422 3.578422 0.4007739
    Desensitization-Insight   1.125 -1.203422 3.453422 0.4751470
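All of the confidence intervals contain 0, so none of the pairwise comparisons among the therapy marginal means is significant.
The minimum mean difference confirms these findings. With $n = 16$ scores behind each therapy marginal mean, it is

$$HSD = q_{.05;\,3,\,42}\sqrt{MS_W / n} \approx 2.328$$

which is also the half-width of each confidence interval in the output above. The largest observed difference (1.250) falls well short of it.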
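The minimum significant difference can be computed with the studentized range quantile function `qtukey`. This sketch takes the mean square within and the pairwise differences from the output above; the variable names are illustrative.

```r
# Values taken from the ANOVA and Tukey output above.
ms.within  <- 308.625 / 42
df.within  <- 42
n.per.mean <- 16  # 2 severity levels x 8 participants per therapy

# Tukey HSD: studentized range quantile times the standard error of a mean.
hsd <- qtukey(0.95, nmeans = 3, df = df.within) * sqrt(ms.within / n.per.mean)
print(round(hsd, 3))

# Observed differences among the therapy marginal means.
diffs <- c("Insight-Implosion"         = 0.125,
           "Desensitization-Implosion" = 1.250,
           "Desensitization-Insight"   = 1.125)

# A comparison is significant only if its absolute difference exceeds the HSD.
print(abs(diffs) > hsd)
```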
In general, would the Tukey HSD test be the optimal multiple comparison procedure for the planned pairwise comparisons you did in part b? If not, what would be the optimal procedure?
For a factor with three levels (two degrees of freedom), Fisher's LSD is permissible and will be the most powerful procedure. For these data, however, Fisher's LSD does not allow any contrasts on the type of therapy, because the main effect for Therapy in the ANOVA was not significant.
In general, Tukey's test should not succeed when Fisher's LSD fails, regardless of the degrees of freedom. The reason for avoiding Fisher's LSD with more than three groups is not that other tests are more powerful, but that it does not adequately control Type I errors. The ANOVA has already said that there is no reason to believe that any of these means differ. Using Tukey's test to determine which of these apparently equal means differ will, not surprisingly, generally fail.
Using the Bonferroni procedure, conduct 3 planned interaction comparisons assessing whether the difference between Desensitization and Implosion is conditional upon severity, whether the difference between Desensitization and Insight is conditional upon severity, and whether the difference between Implosion and Insight is conditional upon severity. Set the familywise alpha level for this set of contrasts at .05. Indicate your relevant critical values t or F values for each comparison and whether each comparison is statistically significant.
The question "is the difference between desensitization and implosion therapies conditional upon the severity of the phobia?" can be restated as "is the difference between implosion and desensitization for mild phobics different from the same difference for severe phobics?" Or, mathematically, as a test of the null hypothesis:

$$H_0: (\mu_{Implo:Mild} - \mu_{Desen:Mild}) - (\mu_{Implo:Sev} - \mu_{Desen:Sev}) = 0$$
To make these contrasts easier to represent in a table, the combination of factors will be listed as a 1×6 matrix rather than a 2×3:
Contrast | Desen:Mild | Desen:Sev | Implo:Mild | Implo:Sev | Insight:Mild | Insight:Sev |
---|---|---|---|---|---|---|
desensitization v. implosion is conditional on severity | -1 | 1 | 1 | -1 | 0 | 0 |
desensitization v. insight is conditional on severity | -1 | 1 | 0 | 0 | 1 | -1 |
implosion v. insight is conditional on severity | 0 | 0 | -1 | 1 | 1 | -1 |
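Each of these comparisons is done using a two-dimensional general form of the unbiased estimate, where $c_{j,k}$ is the contrast coefficient for cell $(j, k)$:

$$\hat{\psi} = \sum_j \sum_k c_{j,k}\,\bar{y}_{j,k}$$

The distribution of the contrast remains the same:

$$t = \frac{\hat{\psi}}{\sqrt{MS_W \sum_j \sum_k c_{j,k}^2 / n_{j,k}}} \sim t_{df_W} \qquad F = t^2 \sim F_{1,\,df_W}$$

For the contrast of "the effectiveness of desensitization v. implosion is conditional on severity of phobia" these calculations are:

$$\hat{\psi} = -13.250 + 11.125 + 15.375 - 6.500 = 6.75$$

$$t = \frac{6.75}{\sqrt{7.348 \times 4 / 8}} \approx 3.52 \qquad F \approx 12.40$$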
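The calculation for this first contrast can be sketched in R from the cell means and the mean square within reported earlier. The variable names are illustrative; the coefficients are those in the first row of the contrast table.

```r
# Cell means in the 1x6 ordering used by the contrast table.
cell.means <- c("Desen:Mild" = 13.250, "Desen:Sev" = 11.125,
                "Implo:Mild" = 15.375, "Implo:Sev" = 6.500,
                "Insight:Mild" = 12.875, "Insight:Sev" = 9.250)

# Desensitization v. implosion is conditional on severity.
coefs <- c(-1, 1, 1, -1, 0, 0)

n         <- 8             # participants per cell
ms.within <- 308.625 / 42  # from the ANOVA table
df.within <- 42

psi.hat <- sum(coefs * cell.means)                # contrast estimate
se      <- sqrt(ms.within * sum(coefs^2) / n)     # its standard error
t.value <- psi.hat / se
f.value <- t.value^2

print(c(psi = psi.hat, t = t.value, F = f.value))

# Cumulative probabilities, as reported in the results table.
print(c(P.t = pt(t.value, df.within), P.F = pf(f.value, 1, df.within)))
```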
Using this process, R produces the following result:
Contrast | P(< t) | P(< F) |
---|---|---|
desensitization v. implosion is conditional on severity | 0.9995 | 0.9990 |
desensitization v. insight is conditional on severity | 0.7809 | 0.5617 |
implosion v. insight is conditional on severity | 0.0045 | 0.9910 |
Recall that $t_{\nu}^2 = F_{1,\nu}$. Also note that the t-test is two-tailed whereas the F-test is one-tailed: a large negative t (cumulative probability near 0) and a large positive t (cumulative probability near 1) both map to a large F. This is why the implosion v. insight comparison has an extremely small t-value probability and an extremely high F-value probability.
Because multiple comparisons are being performed, it is necessary to correct for multiplicity. Recall that the Bonferroni procedure simply reduces the per-comparison error rate sufficiently to bring the familywise error rate to the desired level. Specifically, for c comparisons:

$$\alpha_{PC} = \frac{\alpha_{FW}}{c} = \frac{.05}{3} \approx .0167$$
This means that a test will be considered significant if the cumulative probability for the F-score is greater than 0.9833. Two of these contrasts are significant: "desensitization v. implosion is conditional on severity" and "implosion v. insight is conditional on severity."
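All three interaction contrasts and the Bonferroni critical values can be computed together. The sketch below uses the coefficients from the contrast table, the cell means, and the mean square within from the ANOVA; the names `contrast.coefs`, `t.crit`, and `f.crit` are illustrative.

```r
cell.means <- c("Desen:Mild" = 13.250, "Desen:Sev" = 11.125,
                "Implo:Mild" = 15.375, "Implo:Sev" = 6.500,
                "Insight:Mild" = 12.875, "Insight:Sev" = 9.250)

# One row of coefficients per planned interaction contrast.
contrast.coefs <- rbind(
  "desensitization v. implosion" = c(-1, 1,  1, -1, 0,  0),
  "desensitization v. insight"   = c(-1, 1,  0,  0, 1, -1),
  "implosion v. insight"         = c( 0, 0, -1,  1, 1, -1))

n         <- 8
ms.within <- 308.625 / 42
df.within <- 42

psi.hat <- as.vector(contrast.coefs %*% cell.means)
names(psi.hat) <- rownames(contrast.coefs)
se      <- sqrt(ms.within * rowSums(contrast.coefs^2) / n)
t.value <- psi.hat / se
f.value <- t.value^2

# Bonferroni: split the familywise alpha across the three contrasts.
alpha.pc <- 0.05 / nrow(contrast.coefs)
t.crit   <- qt(1 - alpha.pc / 2, df.within)  # two-tailed critical t
f.crit   <- qf(1 - alpha.pc, 1, df.within)   # equals t.crit^2

significant <- f.value > f.crit
print(round(c(t.crit = t.crit, f.crit = f.crit), 3))
print(data.frame(psi = psi.hat, t = round(t.value, 3),
                 F = round(f.value, 3), significant = significant))
```

Comparing each F to `f.crit` is equivalent to comparing its cumulative probability to 0.9833.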
Given the tests performed so far, what specific experimental beliefs are supported?
Consider the contrast "the effectiveness of desensitization v. implosion is conditional on severity of phobia" while examining the table of means:
Degree of Phobia \ Type of Therapy | Desensitization | Implosion | Insight | Marginal |
---|---|---|---|---|
Mild | 13.250 | 15.375 | 12.875 | 13.833 |
Severe | 11.125 | 6.500 | 9.250 | 8.958 |
Marginal | 12.1875 | 10.9375 | 11.0625 | 11.396 |
All that the contrast says is that the degree of phobia affects the difference between these two therapies. For these particular numbers we can look at Desensitization:Severe (11.125) and Implosion:Severe (6.500) and say that there is likely a significant difference there. Perhaps there is one for mild phobics as well, or perhaps it is simply sampling error. Specifically, the contrast only really tells the experimenter that a difference exists dependent on severity. It does not say whether a particular therapy is more effective.
Conduct two simple-effects analyses testing for overall (i.e., omnibus) differences among the three treatment groups at each of the two levels of severity. Indicate your relevant critical values and whether each simple-effect analysis is statistically significant.
Simple effects analysis is an alternative method for exploring interaction effects. It examines each of the two levels of phobia severity separately and determines whether there is a significant difference among the three therapies at that level.
Because there are two degrees of freedom and the difference based on strength of phobia was supported by the ANOVA, these comparisons can be done with α = 0.05 by the reasoning of the Fisher LSD.
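The sum of squares for the simple effect of therapy at severity level $j$, with $n$ scores per cell, is:

$$SS_{Therapy\ at\ j} = n \sum_k (\bar{y}_{j,k} - \bar{y}_{j,\cdot})^2$$

For mild phobics, this is:

$$SS_{Therapy\ at\ Mild} = 8\left[(13.250 - 13.833)^2 + (15.375 - 13.833)^2 + (12.875 - 13.833)^2\right] \approx 29.083$$

This is essentially an ANOVA done on these numbers. However, to get a better estimate of the population variance, the mean square within for the entire data set is used rather than only for the samples being considered. Given this, the following distribution makes sense:

$$F = \frac{SS_{Therapy\ at\ j} / 2}{MS_W} \sim F_{2,\,42}$$

For the two levels of phobia, the simple effects are:

$$F_{Mild} = \frac{29.083 / 2}{7.348} \approx 1.98 \qquad F_{Severe} = \frac{86.583 / 2}{7.348} \approx 5.89$$

The critical value is $F_{.05;\,2,\,42} \approx 3.22$, so only the simple effect for severe phobics is significant.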
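Both simple effects can be computed from the cell means and the pooled mean square within. This is a sketch with illustrative variable names, assuming 8 participants per cell and α = 0.05 for each test as argued above.

```r
# Cell means: rows are severity, columns are therapy.
cell.means <- matrix(c(13.250, 15.375, 12.875,
                       11.125,  6.500,  9.250),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(phobia = c("Mild", "Severe"),
                                     therapy = c("Desensitization",
                                                 "Implosion", "Insight")))

n         <- 8
ms.within <- 308.625 / 42  # pooled over all six cells
df.within <- 42
df.effect <- ncol(cell.means) - 1

# Sum of squares for therapy within each level of severity:
# n times the squared deviations of cell means from their row mean.
ss.simple <- n * rowSums((cell.means - rowMeans(cell.means))^2)

f.simple <- (ss.simple / df.effect) / ms.within
f.crit   <- qf(0.95, df.effect, df.within)
p.simple <- pf(f.simple, df.effect, df.within, lower.tail = FALSE)

print(round(f.crit, 3))
print(data.frame(SS = round(ss.simple, 3), F = round(f.simple, 3),
                 p = round(p.simple, 4), significant = f.simple > f.crit))
```

The two simple-effect sums of squares add up to the therapy and interaction sums of squares from the ANOVA (15.167 + 100.500), which is a useful arithmetic check.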
So, this decomposition breaks down a general belief that the strength of phobia affects the effectiveness of treatment into a more specific belief: for severe phobics there is a difference in the effectiveness of the types of therapy, whereas for mild phobics there isn't evidence that this is the case.