-
Notifications
You must be signed in to change notification settings - Fork 4
Using FMUT
FMUT uses the GND and GRP structs from the Mass Univariate Toolbox. These variables store the data for analysis and also store the results of the tests you run.
Instructions for creating a GND and the meaning of the various fields in the GND are described in the MUT documentation.
A GRP struct gathers the data from multiple groups, each with their own GND and is created via the GNDs2GRP
function. More information is available in the MUT documentation.
See also Example Scripts.
FMUT implements one-way and factorial within-subjects, between-subjects, and split plot ANOVAs. Fmax, cluster mass, and several false discovery rate (FDR) corrections are supported (see Mass Univariate Statistics and Corrections).
The main functions of FMUT are:
-
FmaxGND
: Fmax test for fully within-subjects designs -
FmaxGRP
: Fmax test for designs with a between-subjects factor -
FclustGND
: Cluster mass test for fully within-subjects designs -
FclustGRP
: Cluster mass test for designs with a between-subjects factor -
FfdrGND
: FDR corrected mass univariate test for fully within-subjects designs -
FfdrGRP
: FDR corrected mass univariate test for designs with a between-subjects factor
For more information on these tests see Mass Univariate Statistics and Corrections and References and Further Reading.
The sections below describe how to specify the ANOVA model and the various options for these tests.
Let’s say you are interested in the effects of semantic priming and corpus frequency on the ERPs elicited by words. Your study uses a 3 (Frequency: low, medium, high) X 2 (Priming: primed, unprimed) design and you want to conduct an ANOVA that will test the two main effects as well as the two-way interaction. In your GND, the bins representing the six relevant conditions are as follows:
- Bin1: unprimed low frequency
- Bin2: unprimed medium frequency
- Bin3: unprimed high frequency
- Bin4: primed low frequency
- Bin5: primed medium frequency
- Bin6: primed high frequency
A cluster mass test of this design can be conducted with the following:
GND = FclustGND(GND, ...
'bins', 1:6, ...
'factor_names', {'Frequency', 'Priming'}, ...
'factor_levels', [3, 2]);
The first argument to the function is the GND variable containing your data (see above). The output of the function is generally the same GND variable. This will simply add the results of the test to the F_tests field of the GND struct (see below).
'bins', 1:6
specifies that bins 1 through 6 contain the six conditions of the 3 x 2 design. In MATLAB 1:6
is short hand for the array [1, 2, 3, 4, 5, 6]
. If you have non-consecutive bins, or need to re-order your bins, you can specify them via the full array notation.
For the 'factor_names'
and 'factor_levels'
inputs, the factors are listed in fastest to slowest moving order within the bins specified.
Whenever you use FmaxGRP
, FclustGRP
, or FfdrGRP
, it is assumed that there is a between-subjects factor and this does not have to be explicitly specified.
FMUT only supports designs with a single between-subjects factor. By default, GRP-based functions will include all groups in your GRP variable as the levels of the between-subjects factor. If you would like to use only a subset of the groups, you can use the 'use_groups'
input.
The name of the between-subjects factor defaults to 'Group'. If you would like to use a different name, you can use the 'bg_factor_name'
input.
As an example, imagine you are conducting the Frequency x Priming paradigm described above and are interested in how these effects differ across age. You have three groups: teens, young adults, and older adults. You want to conduct a test with only the young adults and older adults. Here is the code for a cluster mass test of this design:
GRP = FclustGRP(GRP, ...
'use_groups', {'young_adults', 'old_adults'}, ...
'bg_factor_name', 'Age', ...
'bins', 1:6, ...
'wg_factor_names', {'Frequency', 'Priming'}, ...
'wg_factor_levels', [3, 2]);
There are a number of options and choices to be made for each of the three main mass univariate approaches. All options are specified as keyword-input arguments to the main functions. For example, here is an Fmax analysis with several options specified:
GND = FmaxGND(GND, ...
'bins', 1:6, ...
'factor_names', {'probability', 'emotion'}, ...
'factor_levels', [3, 2], ...
'time_wind', [500, 800], ...
'include_chans', {'Fz', 'Cz', 'Pz'}, ...
'n_perm', 1e4, ...
'alpha', 0.05);
For further examples, see Example Scripts.
You can see all the options for each function with a brief description of each by calling the function documentation at the MATLAB command line. For example:
>>help FmaxGND
You don't need to specify all of these optional inputs, as many have defaults that you will not need to change.
This section describes those options and defaults and gives some recommendations where relevant.
-
time_wind: The time window within which to conduct the test. For example,
[500, 800]
specifies that you want the test conducted at all time points between 500 and 800 ms (or the mean amplitude between those points if the'mean_wind'
option is specified). For Fmax and FDR functions, you can specify multiple time windows by separating them with a semicolon: e.g.,[300, 500; 501, 800]
. Default: zero to the end of the epoch. Recommendation: Restricting the time window will result in greater power. However, if an effect appears outside the time window you selected, you will not be able to analyze it without inflating the Type I error rate. You will generally have enough a priori information or hypotheses to provide some restriction in the time window you are interested in–i.e., except in the case of completely exploratory analyses, you rarely want to conduct an analysis across the entire epoch. How to best balance flexibility and power depends heavily on your design, hypotheses, and a priori knowledge about the temporal location of the effects you want to detect. -
mean_wind: If
'yes'
, the test will be conducted on the mean amplitude in the time window provided rather than each individual time point. Default:'No'
. Recommendation: Using the mean_wind option can increase power, but this depends on the mass univariate method you use, the nature of your effect, and how noisy your data is. For cluster mass tests, simulations suggest that using mean amplitude may actually decrease power, at least for relatively long-lasting effects. For Fmax and FDR, the mean_wind option can increase power, but may decrease power for very focal effects with a large effect at the peak (see Simulation Results). The mean_wind option may also increase power for data with a lot of high frequency noise (although a low pass filter maybe be a better option in many cases). In any case, the mean_wind option decreases flexibility to find effects that were not where you expected them. How to best balance flexibility and power depends heavily on your design, hypotheses, and a priori knowledge about the temporal location of the effects you want to detect. -
exclude_chans: Name of channels to exclude from the test, e.g.,
{'HEOG', 'VEOG', 'A1'}
. Default: Use all channels. Recommendation: You should make sure to exclude any non-data channels from you test (e.g., EOG, mastoids, etc.). Like with the time window option above, you can also use this option to define a spatial ROI, which will increase power in comparison to looking for effects at all electrodes. However, if an effect appears outside the ROI you selected, you will not be able to analyze it without inflating the Type I error rate. How to best balance flexibility and power depends heavily on your design, hypotheses, and a priori knowledge about the spatial location of the effects you want to detect. -
include_chans: Same as
'exclude_chans'
, except the input given are the channels you want to use in the test. You cannot use both'include_chans'
and'exclude_chans'
. -
plot_raster: If
'yes'
, a raster plot will be generated for each effect. Default:'yes'
. -
save_GND or save_GRP: If
'yes'
, the GND or GRP variable will be saved to disk. Default: User is prompted about whether to save when the test ends. - output_file: The name of an .xlsx file to output the results to. See Output to spreadsheet below. Default: No output.
-
verblevel: A value 0 through 3 which specifies how much information will be reported at the command window. If
0
, only errors and warnings will be reported. Default:2
.
-
q: The expected false discovery rate proportion. Default:
0.05
. -
method: Which of the three main FDR corrections to use.
'bh'
is the classic Benjamini & Hochberg (1995) procedure, which is guaranteed to control FDR when the tests are independent or positively dependent (e.g., positively correlated Gaussians).'by'
is a much more conservative version that always controls FDR (regardless of the dependency structure of the tests: Benjamini & Yekutieli, 2001).'bky
' is a "two-stage" version that is more powerful when a lot of the null hypotheses are false (Benjamini, Krieger, & Yekutieli, 2006).'bky'
is guaranteed to control FDR when the tests are independent and tends to be slightly less powerful than'bh'
when few or no null hypothese are false. Default:'bh'
. Recommendation: See Groppe et al. (2011a). -
sphericity_corr: Correction for violation of sphericity to use when calculating p-values before applying the FDR correction.
'none'
calculates the repeated measures ANOVA with no sphericity correction. Correction options are'gg'
for Greenhouse-Geisser,'hf'
for Hyunh-Feldt, and'lb'
for the lower bound correction. Default:'none'
-
n_perm: Number of permutations used to estimate the null distribution. Default:
10000
. Recommendation: Manly (1997) recommends at least 1,000 permutations for tests with an alpha of 0.05 and 5,000 permutations for tests with an alpha of 0.01. However, these recommendations were written when computing power was much less than it is now, and estimates of p-values can still be somewhat noisy with these recommendations. With modern computers, it is easy to do many more permutations. The default of 10,000 will give good estimates. For results I intend to publish or present, I usually use at least 100,000 permutations (for complex designs, this may take a few hours). -
alpha: The family-wise alpha of the test. See Mass Univariate Statistics and Corrections for a description of how family-wise error control operates for each test. Default:
0.05
. -
reproduce_test: Since permutation tests are based on a random subset of all the possible permutations, each time you run the test, even with the same data, the results will be slightly different (the difference will be smaller the more permutations you use–see above). This option sets the random number generator of the computer to exactly reproduce a previous test. For example,
'reproduce_test', 2
will reproduce the test inGND.F_tests(2)
(assuming all the other options are also set the same). This is mostly useful for software development and testing purposes. Default:0
.
-
thresh_p: The uncorrected p-value threshold for including an electrode/time point in a cluster. Default:
0.05
. Recommendation: Most research and simulation work has used a threshold of 0.05. You may have better power for more focal effects (or in exploratory analyses looking for effects in long time windows) with a threshold of 0.01, but for truly focal effects Fmax is preferable. See Simulation Results. -
chan_hood: Defines which electrodes are considered neighbors for the purpose of clustering. Generally you provide a value giving the maximum distance between neighboring electrodes. The units of this value depend on the units of your chanlocs information; e.g., it may be in percentage of the radius of the head, millimeters, or some other value. This value is used to create a # electrodes x # electrodes chan_hood matrix. This matrix gives Boolean values indicating whether channel pairs are neighbors. For example if
chan_hood(3, 20) == 1
, then channels 3 and 20 are neighbors; if it is0
, they are not. You can input this matrix directly as thechan_hood
argument rather than providing a distance value. Default:0.61
. Recommendation: Use common sense. Electrodes that are intuitively adjacent should be neighbors. You will likely use smaller values for a 128 electrode array than a 32 electrode array, so there is no universal default. To create a chan_hood matrix from a distance value, you can use the MUT functionspatial_neighbors
. You can play around with different values until the resulting matrix makes sense (although, of course, it would not be appropriate to try different values and see which one gives the most desirable statistical results!). - head_radius: The radius of the head in whatever units the Cartesian coordinates in GND.chanlocs are in. This is used to convert scalar values of chan_hood into centimeters. Default: Estimated from chanlocs.
The results of every FMUT test you run is stored in the F_tests
field of your GND
or GRP
variable. Here are what the fields of the F_tests
struct mean.
-
bins: Bin numbers in the
GND
orGRP
variable used in the test -
use_groups: For designs with a between-subjects factor, the names of the groups used. These names match those found in
GRP.group_desc
- group_n: The number of subjects in each group
- factors: The names of all the factors in the ANOVA design
- factor_levels: The number of levels of each factor in the ANOVA design
- time_wind: The time window(s) tested
-
used_tpt_ids: All the samples used in the test. For example, if your sampling rate is 500 Hz and you have a 100 ms baseline, then a 300 to 500 ms time window is samples 201 to 301. The time point in ms of each sample can be found in
GND.time_pts
- mean_wind: Indicates whether the mean_wind option was used. If 'yes', the test was conducted on the mean amplitude across the time window(s) listed above rather than each individual time point
- include_chans: Names of all electrodes included in the test
-
used_chan_ids: Index within
GND.chanlocs
of all electrodes included in the test - mult_comp_method: The multiple comparison correction used for the test
- n_perm: For permutation-based tests, the number of permutations performed to estimate the null distribution
- desired_alphaORq: The user specified family-wise alpha or false discovery rate
- estimated_alpha: For permutation-based tests, the estimated actual alpha. For approximate tests, no estimated alpha is given
- seed_state: The state of the random number generator at the beginning of the test. This allows for exact reproduction of permutation-based tests
- exact_test: For permutation-based tests, a 1 indicates an exact test and a 0 indicates an approximate test for each effect. (See Permutation Tests for Factorial ANOVA.)
- sphericity_corr: For FDR-based tests, the sphericity correction used with the ANOVA
- null_test: An electrode x time point array with 1 for locations where the null was rejected and 0 for locations where it was not for each effect
- adj_pval: An electrode x time point array of multiple comparison corrected p-values for each effect
- F_obs: An electrode x time point array of F-observed
- F_crit: The F-critical value for significance for each effect
- df: Degrees of freedom for each effect
-
chan_hood: For cluster-based tests, an electrode x electrode array specifying which electrodes were considered neighbors for clustering. For example if
GND.F_tests.chan_hood(1,4) == 1
, then channels 1 and 4 were considered neighbors. - clust_info: For cluster-based tests, information on each cluster found: whether it was significant, its p-value, the cluster mass statistic, and an id number for each cluster
By default, a summary of the test and results will be output at the MATLAB command window when you run a test. If you would like tests to run quietly, use the 'verblevel'
input (see above).
If you would like to generate a report of a previously run test, you can use the report_results
function.
By default, a raster a plot summarizing the results for each effect will be generated when you run a test. If you do not want to see a raster plot, you can use the 'plot_raster'
input (see above).
If you would like to generate a raster plot for a previously run test, you can use the F_sig_raster
function. This function also has a number of options. For example, you can specif the scaling of the plot if you want to make plots across multiple effects with the same scale. For more information see:
>>help F_sig_raster
FMUT functions can output the full results of your analysis to a spreadsheet by using the 'output_file'
input and specifying the name of an .xlsx file.
This spreadsheet will contain the full results including the design and options of the test, cluster statistics and ids, F-values, and p-values. It should also be formatted for convenient viewing (but see below for Mac and Linux users). You may find it easier to view and work with the results in this spreadsheet than in MATLAB.
This spreadsheet is also intended be something that can be included as supplementary materials with publications to report full results of mass univariate analyses (which are difficult to impossible to summarize or report completely in a paper).
If you would like to produce an output spreadsheet after a test has already been run and stored in the GND variable, you can use the Ftest2xls
function.
If you would like to produce an FMUT-style output spreadsheet for t-test results from the MUT, you can use the ttest2xls
function.
Note for Mac and Linux users
Due to limitations with MATLAB’s xlswrite
function, FMUT is distributed with a third-party function (xlwrite
) for writing to spreadsheets on non-Windows system. This function seems to work for basic output purposes, but I haven’t tested it nearly as much as the Windows equivalent.
One issue with this solution is that the code that applies formatting to spreadsheet output has problems working with xlwrite
created spreadsheets. As a result, spreadsheets are output without formatting applied by default. Here is a slightly annoying work-around if you want to add the formatting.
- Create the spreadsheet without formatting (this is the default on non-Windows systems).
- Open the spreadsheet in Excel and re-save it (this fixes some issues with the way formatting is encoded).
- Run
format_xls('filepath/your_spreadsheet.xlsx')
at the MATLAB command line.
This should work on Mac. It will only work on Linux if you have a Python distribution with openpyxl that is properly linked to MATLAB.
Consider that you obtain a main effect of Frequency in the factorial design described above. Since Frequency has three levels, you may want to conduct follow-up tests to determine which conditions differ. To do so, you will need to create a bin representing each Frequency condition, collapsed across the two Priming conditions. You can do this via the MUT function bin_mean
. For, example (using the bin structure described above):
GND = bin_mean(GND, 1, 4, 'Low Frequency');
GND = bin_mean(GND, 2, 5, 'Medium Frequency');
GND = bin_mean(GND, 3, 6, 'High Frequency');
The bin_mean
function will add bins sequentially. So if your original GND had 6 bins, the new bins you just created will be bins 7 through 9. Now you can run pair-wise contrasts to follow-up your main effect:
%Low Frequency vs Medium Frequency
GND = FclustGND(GND, 'bins', [7, 8], ...
'factor_names', 'Frequency', ...
'factor_levels', 2);
%Low Frequency vs High Frequency
GND = FclustGND(GND, 'bins', [7, 9], ...
'factor_names', 'Frequency', ...
'factor_levels', 2);
%Medium Frequency vs HighFrequency
GND = FclustGND(GND, 'bins', [8, 9], ...
'factor_names', 'Frequency', ...
'factor_levels', 2);
Above I conducted pair-wsie follow-ups as F-tests rather than t-tests. Here I explain why.
Traditionally, you would conduct pair-wise contrasts as t-tests. However, you could just as well conduct them as ANOVA. The answer would be the same. For the equivalent ANOVA and t-test, F = t2 and the p-values are identical.
This also holds true for the tmax/Fmax and FDR approaches. You can conduct pairwise tests using tmaxGND
or FmaxGND
and you should get the same answer (with slight differences due to random permutations). Similarly, tfdrGND
and FfdrGND
should give you the same answer.
However, for cluster mass tests, t-tests and F-tests are not equivalent. There are two reasons for this.
First, the MUT function clustGND
forms separate positive and negative clusters. Because F-values are squared and always positive, there are only positive clusters for FclustGND
. If a positive and negative cluster were temporally or spatially adjacent, they would be separate clusters for clustGND
but one large cluster for FclustGND
.
Second, as noted above, in the pairwise case, F-values are squared t-scores. Since the squaring occurs before the scores are summed to form a cluster mass statistic, the values don’t contribute in the same way to t-based and F-based cluster mass statistics. Consider the two simple clusters below.
Cluster | t-scores | F-scores | t cluster mass | F cluster mass |
---|---|---|---|---|
A | 1, 3, 5 | 1, 9, 25 | 9 | 35 |
B | 1, 1, 6 | 1, 1, 36 | 8 | 38 |
Cluster A has a larger cluster mass statistic when t-scores are used, but cluster B is larger when F-scores are used. This is because larger values contribute more to the cluster mass statistic when they are squared. The larger peak value in cluster B leads it to over-take cluster A when the squared F-values are used. Thus F-based cluster mass statistics are move heavily influenced by large values. Given the nature of ERPs, this means they are generally more biased toward the peak of the effect.
Which approach is preferable will likely depend on the nature of the effect you are trying to detect. (Both will have the same Type I error rate, but the F-test approach may have greater power for effects with a larger peak. How big the difference is in practice is not clear.) For present purposes, the point is that clustGND
and FclustGND
will not give identical answers for the same pairwise comparison.
If you are conducting pairwise tests to follow-up a main effect in an ANOVA, you probably want to test the pairwise contrasts in the same way you tested the original main effect. Otherwise, it may be difficult to interpret the pattern and (spatial and temporal) distribution of the pairwise results in light of the main effect. I would therefore recommend that you use ANOVA (and therefore FclustGND
) rather than t-tests to conduct pairwise follow-ups of main effects when using the cluster mass approach.
When multiple pairwise tests are conducted to follow-up an omnibus main effect, a multiple comparisons correction is usually required.
FMUT does not currently include functionality for applying such multiple comparisons corrections.
However, some common corrections can be applied to directly to p-values (e.g., Bonferroni or Dunn-Šidák) and thus are easy for researchers to apply themselves to results from FMUT functions.
An excellent overview and discussion of follow-up tests and multiple comparison corrections is available in Chapter 5 of Roger Kirk's Experimental Design.