Four things you might not know (but should) about false discovery rate control

Massive increases in the amount of data scientists can acquire and analyze over the past two decades have driven the development of new statistical tools that better handle the challenges of “big data.” One such set of tools consists of procedures for controlling the “false discovery rate” (FDR) across a set of statistical tests. The FDR is simply the expected proportion of statistically significant test results that are actually false positives. As you may recall from your introductory statistics course, when you perform multiple statistical tests, the probability of obtaining false positive results rises rapidly. For example, if one performs a single test at an alpha level of 5% and there truly is no effect of the factor being tested, there is only a 5% chance of a false positive result. However, if one performs 10 such tests at an alpha level of 5% each, there is about a 40% chance of one or more false positive results (again assuming that the factor being analyzed has no effect).

FDR control, like Bonferroni correction, reduces the probability of false positive results by applying a more conservative alpha level to each test. The advantage of FDR control over Bonferroni correction is that it is generally more powerful (i.e., better at detecting true effects). This stems from the fact that Bonferroni correction is effectively designed to prevent any false positives, whereas FDR control only aims to prevent a large proportion of the false positives and can therefore afford to be less conservative. In other words, FDR control simply attempts to ensure that the great majority of statistically significant results are accurate, while generally letting in a small proportion of false positives in the process.
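
As an aside, here is a minimal sketch in Python (using NumPy) of where the ~40% figure above comes from, assuming the ten tests are independent; the short simulation is just a sanity check on the formula.

```python
import numpy as np

alpha, m = 0.05, 10

# Probability of at least one false positive across m independent tests
# when no true effects exist: 1 - (1 - alpha)^m
print(1 - (1 - alpha) ** m)              # ~0.401, i.e., the ~40% quoted above

# Sanity check by simulation: p-values are uniform under the null
rng = np.random.default_rng(0)
p = rng.uniform(size=(100_000, m))       # 100,000 "experiments" of m null tests
print((p < alpha).any(axis=1).mean())    # also ~0.40
```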

Because of FDR control’s relatively good statistical power and its ease of application, it has rapidly become commonplace in neuroscience. However, from talking to other researchers, I fear that many people who use FDR control do not understand the following simple, but important facts:

1. There are multiple FDR control algorithms: Although some papers refer to FDR control as if there were only one procedure for controlling FDR, there are actually several different algorithms with different strengths and weaknesses (for reviews see Farcomeni, 2007; Groppe, Urbach, & Kutas, 2011a; Romano & Shaikh, 2006). The most popular is the algorithm derived by Benjamini & Hochberg in 1995, which is relatively easy to implement (see the code sketch after Table 1 below) and statistically powerful. When you use FDR control, be sure to cite which algorithm you’re using.

2. The most popular FDR algorithm is not generally guaranteed to work: Benjamini and Hochberg’s FDR control algorithm (BH-FDR) is only guaranteed to control the false discovery rate when the statistical tests it is applied to are independent or exhibit “positive regression dependency.” When the data being analyzed are normally distributed, positive regression dependency means that none of the variables tested are negatively correlated. Note that although BH-FDR is not generally guaranteed to work, Clarke and Hall (2009) have shown that for normally distributed data BH-FDR will accurately control FDR as the number of tests it is applied to grows. Indeed, several studies using simulated data that violate BH-FDR’s assumptions have shown that in practice it still works quite well (e.g., Groppe, Urbach, & Kutas, 2011b). That being said, there is at least one FDR control algorithm that is always guaranteed to work. It was derived by Benjamini and Yekutieli (2001), but it is not commonly used because it is much more conservative than the BH-FDR procedure.

3. FDR control provides the same degree of assurance as Bonferroni correction that there is indeed some effect: If, after performing an appropriate FDR control procedure, you obtain some statistically significant results, you can be as certain as if you had performed Bonferroni correction that there is at least one true positive in your set of tests. In other words, if you control FDR at an alpha level of 5% and in truth the factor you’re analyzing has no effect on any of the variables tested, there is only a 5% chance that you will get any erroneously significant results. Given that FDR control is usually more powerful than Bonferroni correction, this makes FDR control a much better tool for simply determining whether there is an effect at all. The disadvantage of FDR control, however, is that because some false positives are allowed, you cannot be certain that any single significant result is accurate. If it is important to establish the significance of every single test result, a technique like Bonferroni correction or a permutation test that provides “strong control of the family-wise error rate” is necessary.

4. Increasing the number of tests in your analysis can actually produce more significant results: Intuitively, the more individual tests in your set of analyses, the greater the chance of getting a false positive test result and, thus, the more stringently you should correct for multiple comparisons. This is the way Bonferroni correction works, but it is not necessarily true of FDR control. Adding more tests to your FDR control procedure can actually make it less conservative if the added tests exhibit an effect (i.e., the added tests are likely to have small p-values). This is because when tests are added that are very likely true discoveries, FDR control can let in more false discoveries, since it is simply controlling the proportion of significant results that are false (see Table 1 for a concrete example, and the code sketch below). Consequently, one should exclude known effects from an analysis when using FDR control, as including them will make you less certain about the presence of any other effects.

[Table 1: image of an FDR example]

Table 1: A concrete example of how adding more tests to an FDR control procedure can increase the significance of other tests. The left column shows five p-values from five hypothetical statistical tests. The middle column shows the FDR-adjusted p-values (Benjamini & Hochberg, 1995) if only the first four tests are included. The right column shows the FDR-adjusted p-values if all five tests are included. * indicates “significant” p-values less than 0.05. Note that the first test result is not significant after FDR correction when four tests are included but is when five are included.
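
To make points 1 and 4 concrete, here is a minimal sketch in Python (using NumPy) of the Benjamini & Hochberg (1995) step-up procedure, expressed as FDR-adjusted p-values (the helper name bh_adjust is simply mine). The five p-values are illustrative values I made up for this post, not the values from Table 1, but they reproduce the same phenomenon: with only the first four tests, nothing survives FDR correction at the 5% level, whereas adding a fifth test with a very small p-value makes the first test significant as well.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini & Hochberg (1995) FDR-adjusted p-values (step-up procedure).

    A test is "significant" if its adjusted p-value is <= the desired FDR level.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                         # ranks 1..m, smallest p first
    raw = p[order] * m / np.arange(1, m + 1)      # p_(i) * m / i
    adj = np.minimum.accumulate(raw[::-1])[::-1]  # enforce monotonicity from the top
    out = np.empty(m)
    out[order] = np.clip(adj, 0.0, 1.0)           # return in the original test order
    return out

# Illustrative (made-up) p-values; the first test is the one of interest.
four_tests = [0.018, 0.20, 0.35, 0.60]
five_tests = four_tests + [0.002]                 # add one near-certain effect

print(bh_adjust(four_tests))  # first test's adjusted p ~= 0.072 -> not significant
print(bh_adjust(five_tests))  # first test's adjusted p  = 0.045 -> significant
```

As an aside on point 2, the much more conservative Benjamini & Yekutieli (2001) procedure differs from the sketch above only in that the adjustment is inflated by the factor 1 + 1/2 + … + 1/m, which is what makes it so conservative.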

-David Groppe

P.S. If you want to know more about false discovery rate control and other contemporary techniques for correcting for multiple comparisons (e.g., permutation tests), I have a review paper that might be of use.


References:

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), 57(1), 289–300.

Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4), 1165–1188.

Clarke, S., & Hall, P. (2009). Robustness of multiple testing procedures against dependence. Annals of Statistics DOI: 10.1214/07-AOS557

Farcomeni, A. (2007). A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Statistical Methods in Medical Research DOI: 10.1177/0962280206079046

Groppe, D. M., Urbach, T. P., & Kutas, M. (2011a). Mass univariate analysis of event‐related brain potentials/fields I: A critical tutorial review. Psychophysiology DOI: 10.1111/j.1469-8986.2011.01273.x

Groppe, D. M., Urbach, T. P., & Kutas, M. (2011b). Mass univariate analysis of event‐related brain potentials/fields II: Simulation studies. Psychophysiology DOI: 10.1111/j.1469-8986.2011.01272.x

Romano, J. P., & Shaikh, A. M. (2006). On stepdown control of the false discovery proportion. Institute of Mathematical Statistics Lecture Notes – Monograph Series DOI: 10.1214/074921706000000383


~ by eeging on September 7, 2013.

8 Responses to “Four things you might not know (but should) about false discovery rate control”

  1. From what I can tell (quick scan only) the basic FDR is just the Holm procedure from 1979, and even he wasn’t the first to publish the method. Certainly the idea is the same, whether the exact computation produces the exact same adjusted significance criteria or not.

  2. The Benjamini & Hochberg FDR procedure is similar to Holm’s (1979) multiple comparison correction procedure (the Bonferroni-Holm method) in that you sort the p-values of the individual tests and declare significant the smallest p-values that fall below a threshold that increases with each p-value. However, the FDR thresholds are generally higher (less stringent) than the Bonferroni-Holm thresholds, and the procedure provides only “weak” control of the family-wise error rate. In contrast, Bonferroni-Holm correction, just like Bonferroni correction, provides strong control of the family-wise error rate.

  3. A lovely XKCD comic illustrating the multiple comparison problem:
    http://xkcd.com/882/

  4. Reblogged this on Yannick's bioscience musings and commented:
    Useful advice on controlling the False Discovery Rate (FDR)!

  5. FDR is touted for ‘Big Data’, but what about small data? Such as 10-20 tests for example. Is there guidance on how ‘few’ tests are acceptable for FDR?

    • In principle, FDR control algorithms can be applied to any number of tests. In practice, however, FDR control performance will depend on the FDR algorithm you’re using and the structure of your data. For example, with small numbers of comparisons the Benjamini & Hochberg algorithm may not be accurate if variables are negatively correlated. Moreover, since the Benjamini & Hochberg algorithm doesn’t exploit dependencies between variables to increase statistical power, you might get more power from multiple comparison correction methods like a tmax permutation test that do exploit such dependencies.

      • Hi Matt,
        The Benjamini-Hochberg Linear Step-Up FDR procedure was published in 1995, before the familiar large-scale genetics revolution, so the BH idea was established long before the large-scale genomic revolution. The aim was to perform well in any multiple hypotheses setting (from m = 2 to m very large). The FDR is a potentially smaller error rate, since FDR <= FWER (FWER is the family-wise error rate, controlled by, e.g., Bonferroni, Holm, etc.). Therefore, for any multiple testing problem, a solution that controls the FDR will be more powerful than one controlling the FWER.

        To see why FDR <= FWER, just go to the original 1995 paper.
        Note that FDR = FWER is true only if m = m_0, and in general FDR << FWER if m_0 << m. This means that FDR control will be considerably more powerful in situations where m_0/m is very small, so people cannot ignore the FDR’s benefits in those situations. However, the BH FDR procedure will still work well for small m, and it will be conservative for general types of positive dependence (I think this is Benjamini and Yekutieli, 2001); for other kinds of dependence it will be conservative if you correct it by dividing alpha by the factor 1 + 1/2 + … + 1/m (i.e., the sum of 1/i for i = 1, …, m).

        Vered Madar

  6. A recent method worth mentioning is Independent Hypothesis Weighting: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4930141/
