Differential expression allows identifying features (genes, proteins, metabolites…) that are significantly affected by explanatory variables. For example, we might be interested in identifying proteins that are differentially expressed between healthy and diseased individuals. In this kind of study, the data are often very large ( = high-throughput data). At this stage, we may talk about omics data analyses, in reference to analyses performed over the genome (genomics), the transcriptome (transcriptomics), the proteome (proteomics), the metabolome (metabolomics), etc.

In order to test whether features are differentially expressed, we often use traditional statistical tests. However, the size of the data may cause problems in terms of computation time as well as readability and statistical reliability of results. Those tools must therefore be slightly adapted in order to overcome these problems.

The statistical tests proposed in the differential expression tool in XLSTAT are traditional parametric or non-parametric tests: Student t-test, ANOVA, Mann-Whitney, Kruskal-Wallis.

The p-value represents the risk that we take of being wrong when stating that an effect is statistically significant. Running a test several times increases the number of computed p-values, and consequently the risk of detecting significant effects which are not significant in reality. Considering a significance level alpha of 5%, we would expect about 5 significant p-values by chance out of 100 computed p-values, even if no feature were truly affected. When working with high-throughput data, we often test the effect of an explanatory variable on the expression of thousands of genes, thus generating thousands of p-values. Consequently, p-values should be corrected ( = increased = penalized) as their number grows.

XLSTAT proposes three common p-value correction methods:

Benjamini-Hochberg: this procedure makes sure that p-values increase both with their number and the proportion of non-significant p-values. The Benjamini-Hochberg correction is not very conservative ( = not very severe). It is therefore suited to situations where we are looking for a large number of genes which are likely affected by the explanatory variables. It is widely used in differential expression studies. The corrected p-value according to the Benjamini-Hochberg procedure is defined by:

p BenjaminiHochberg = min( p * nbp / j, 1)

Where p is the original (uncorrected) p-value, nbp is the number of computed p-values in total and j is the rank of the original p-value when p-values are sorted in ascending order.

Benjamini-Yekutieli: this procedure makes sure that p-values increase both with their number and the proportion of non-significant p-values. Like Benjamini-Hochberg, it is part of the FDR (False Discovery Rate) correction procedure family. Compared with Benjamini-Hochberg's approach, it additionally takes into account a possible dependence between the tested features, making it more conservative than that procedure. However, it is far less stringent than the Bonferroni approach, which we describe just after. The corrected p-value according to the Benjamini-Yekutieli procedure is defined by:

p BenjaminiYekutieli = min( p * nbp * (1 + 1/2 + ... + 1/nbp) / j, 1)

Where p is the original p-value, nbp is the number of computed p-values in total and j is the rank of the original p-value when p-values are sorted in ascending order.

Bonferroni: p-values increase only with their number. It is part of the FWER (Familywise error rate) correction procedure family. It is rarely used in differential expression analyses. It is useful when the goal of the study is to select a very low number of differentially expressed features. The corrected p-value according to the Bonferroni procedure is defined by:

p Bonferroni = min( p * nbp, 1)

Where p is the original p-value and nbp is the number of computed p-values in total.

Multiple pairwise comparisons

After one-way ANOVAs or Kruskal-Wallis tests, it is possible to perform multiple pairwise comparisons for each feature taken separately.