Chapter 7 Statistics

7.1 An Introduction to Statistical Learning

  • So we realize, as is often the case, that you can pick up some of the signal pretty quickly. But getting down to a very good error rate, which here means correctly classifying the harder examples, is difficult.

7.2 Foundations of Statistical Natural Language Processing (by Christopher D. Manning & Hinrich Schütze)

  • Zipf’s law: Human behavior and the principle of least effort. The principle of least effort: people act so as to minimize their probable average rate of work (i.e. not only minimizing the work they would have to do immediately, but also taking due account of future work that might result from doing work poorly in the short term).

  • Chomsky’s dictum: “Probability theory is inappropriate for formalizing the notion of grammaticality.”
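The rank-frequency form of Zipf's law (a word's frequency is roughly proportional to 1/rank, so log frequency against log rank has slope near -1) can be checked with a small simulation. This is a minimal sketch on a synthetic corpus: the vocabulary size, token count, and random seed are arbitrary assumptions, and the sample is drawn from an exactly Zipfian distribution rather than real text.

```python
import math
import random

random.seed(0)

# Zipfian target distribution over a synthetic vocabulary of 50 "words":
# P(word of rank r) is proportional to 1/r.
V = 50
words = [f"w{r}" for r in range(1, V + 1)]
weights = [1.0 / r for r in range(1, V + 1)]

# Draw a synthetic corpus of 200,000 tokens from that distribution.
corpus = random.choices(words, weights=weights, k=200_000)

# Count empirical frequencies and sort into rank order (most frequent first).
counts = {}
for w in corpus:
    counts[w] = counts.get(w, 0) + 1
freqs = sorted(counts.values(), reverse=True)

# Least-squares slope of log(frequency) vs. log(rank);
# Zipf's law predicts a slope near -1.
xs = [math.log(r) for r in range(1, len(freqs) + 1)]
ys = [math.log(f) for f in freqs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
print(f"log-log slope: {slope:.2f}")  # close to -1 for a Zipfian sample
```

On real text the fit is only approximate, especially at the very highest and lowest ranks, which is part of what Manning & Schütze discuss.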

7.3 ASA on p-value

It is science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation. (Siegfried, 2010)

The problem is not that people use p-values poorly; it is that the vast majority of data analysis is not performed by people properly trained to perform data analysis. (Leek, 2014)

The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions.

The statement clarifies some widely agreed principles:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
    The incompatibility can be interpreted as casting doubt on, or providing evidence against, the null hypothesis or the underlying assumptions.

  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
    A p-value is a statement about data in relation to a specified hypothetical explanation; it is not a statement about the explanation itself.

  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
    Other factors to consider: (1) the design of the study, (2) the quality of the measurements, (3) external evidence for the phenomenon under study, (4) the validity of the assumptions that underlie the data analysis.

  4. Proper inference requires full reporting and transparency
    • Also need to disclose: number of hypotheses explored, all data collection decisions, all statistical analyses conducted, p-values computed
    • Avoid: cherry-picking (aka: data dredging, significance chasing and significance questing, selective inference, p-hacking)
  5. A p-value or statistical significance does not measure the size of an effect or the importance of a result.
    Smaller p-values do not necessarily imply larger or more important effects, and larger p-values do not imply a lack of importance or even a lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough; conversely, large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise. (Does a large p-value indicate uncertainty in drawing conclusions from the data under the current model?)

  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
    • Also try other available methods
    • Other approaches: confidence, credibility or prediction intervals (?), Bayesian methods, likelihood ratios (?) or Bayes Factors, decision-theoretic modeling (?) and false discovery rates (?).
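Principle 5 (p-values do not measure effect size) can be illustrated with a quick simulation. This is a minimal sketch, not a proper analysis: it uses a normal-approximation z-test rather than a real t-test, and the effect sizes, sample sizes, and seed are arbitrary assumptions. A tiny effect with a huge sample yields a very small p-value, while a large effect with a tiny sample often does not.

```python
import math
import random

random.seed(1)

def p_value_two_sample(a, b):
    """Two-sided p-value for a difference in means, using a z-test
    with the normal approximation (a rough sketch, not a full t-test)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

# Tiny effect (0.05 sd shift), huge samples: the p-value comes out tiny.
big_a = [random.gauss(0.00, 1) for _ in range(100_000)]
big_b = [random.gauss(0.05, 1) for _ in range(100_000)]

# Large effect (0.8 sd shift), tiny samples: the p-value is often unimpressive.
small_a = [random.gauss(0.0, 1) for _ in range(5)]
small_b = [random.gauss(0.8, 1) for _ in range(5)]

print(f"tiny effect,  n=100000: p = {p_value_two_sample(big_a, big_b):.3g}")
print(f"large effect, n=5:      p = {p_value_two_sample(small_a, small_b):.3g}")
```

The first p-value is "significant" despite a practically negligible effect; the second test is badly underpowered even though the true effect is large, which is exactly the distinction between statistical significance and practical importance.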