Statistical analysis is the process of generating statistics from stored data and analyzing the results to deduce or infer meaning about the underlying dataset or the truth that it attempts to describe.
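As a concrete illustration, the following minimal sketch generates descriptive statistics from a small stored sample and then uses them to infer something about the underlying population; the blood pressure readings here are hypothetical placeholders:

```python
import math
import statistics

# Hypothetical stored data: ten systolic blood pressure readings.
readings = [118, 125, 132, 121, 140, 128, 135, 122, 130, 127]

# Generate statistics that describe the sample itself...
mean_bp = statistics.mean(readings)
sd_bp = statistics.stdev(readings)

# ...then infer something about the population the sample came from:
# an approximate 95% confidence interval for the true mean
# (normal approximation; a t-interval is more exact for small n).
se = sd_bp / math.sqrt(len(readings))
low, high = mean_bp - 1.96 * se, mean_bp + 1.96 * se
print(f"mean={mean_bp:.1f}, sd={sd_bp:.1f}, 95% CI=({low:.1f}, {high:.1f})")
```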
Statistics is defined as “…the study of the collection, analysis, interpretation, presentation, and organization of data.” That is essentially the same as the definition of data science, and in fact the term data science was initially coined in 2001 by Purdue statistician William S. Cleveland in the title of his paper “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”
Statistical methods are discussed in greater detail in a separate chapter of this book. Three of the most prevalent statistical errors about which to be vigilant are (1) statistical analysis methods and sample size determinations being made after data collection (a posteriori) instead of a priori; (2) lack of significance being interpreted to imply lack of difference [studies with negative (not statistically different) results should include power calculations so that the probability of type II errors (real differences going undetected) can be assessed]; and (3) multiple outcome measurements, multiple comparisons, and subgroup comparisons [in the absence of appropriate multivariable procedures and clear a priori hypotheses, suspect the presence of type I errors (differences detected by chance when none exist)].
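For error (2), a power calculation makes the type II error risk of a negative study explicit. The following is a minimal sketch, assuming Python with the statsmodels package; the effect size, group sizes, and significance level are hypothetical placeholders:

```python
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()

# Hypothetical negative study: two groups of 30, targeting a medium
# standardized effect size (Cohen's d = 0.5) at alpha = 0.05.
achieved_power = power_calc.solve_power(effect_size=0.5, nobs1=30,
                                        alpha=0.05, ratio=1.0,
                                        alternative='two-sided')
print(f"Achieved power: {achieved_power:.2f}")
print(f"Type II error probability (beta): {1 - achieved_power:.2f}")

# Solving in the other direction: sample size per group needed
# a priori to reach the conventional 80% power target.
needed_n = power_calc.solve_power(effect_size=0.5, power=0.8,
                                  alpha=0.05, ratio=1.0,
                                  alternative='two-sided')
print(f"Required n per group (round up): {needed_n:.0f}")
```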
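For error (3), the inflation of type I error is easy to quantify: with k independent tests each run at significance level α, the probability of at least one false positive is 1 − (1 − α)^k, which is roughly 64% for twenty tests at α = 0.05. A minimal sketch of one common remedy, again assuming statsmodels and using hypothetical p-values with a Bonferroni correction:

```python
from statsmodels.stats.multitest import multipletests

# Family-wise error rate across k independent tests at alpha = 0.05:
# probability of at least one false positive is 1 - (1 - alpha)**k.
alpha, k = 0.05, 20
print(f"FWER across {k} tests: {1 - (1 - alpha)**k:.2f}")  # ~0.64

# Hypothetical raw p-values from multiple outcome and subgroup comparisons.
p_values = [0.003, 0.021, 0.048, 0.072, 0.190, 0.350]

# Bonferroni adjustment controls the family-wise error rate.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                         method='bonferroni')
for raw, adj, rej in zip(p_values, p_adjusted, reject):
    print(f"raw p={raw:.3f} -> adjusted p={adj:.3f}, reject H0: {rej}")
```

Note that after adjustment only the smallest p-value remains significant; several “significant” raw p-values in the 0.02 to 0.05 range are exactly the kind of chance findings this error describes.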