Week #6 SOCRMx – Quantitative analysis

This section of the SOCRMx MOOC offers a fair introduction to statistics and the analysis of quantitative data. At least, enough to get a grasp of what is needed to gather meaningful data and of what it looks like when statistics are misused or misrepresented. (This bit in particular should be a core unit in the mandatory media and information literacy training that everyone has to take in my imaginary ideal world.)

The more I think about my research, the more likely I think it is to be primarily qualitative, but I can still see the value of a proper methodology for processing the quantitative data that will help to contextualise the rest. I took some scattered notes that I'll leave here to refer back to down the road.

Good books to consider – Charles Wheelan, Naked Statistics: Stripping the Dread from the Data (2014), and Daniel Levitin, A Field Guide to Lies and Statistics: A Neuroscientist on How to Make Sense of a Complex World (2016).

Mean / Median / Mode

Mean – the straightforward average: add up all the values and divide by how many there are.

Median – put all the results in a line, in order, and take the one in the middle. (Better for average incomes, as a few high earners distort the mean.)

Mode – the value (or category) that occurs most often.
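A quick sketch of how the three averages diverge, using Python's standard library and some made-up income figures (the numbers here are purely illustrative, not from the course):

```python
import statistics

# Hypothetical annual incomes; one high earner at the end
incomes = [22_000, 25_000, 27_000, 27_000, 30_000, 32_000, 250_000]

# Mean: the straightforward average, dragged upward by the high earner
print(statistics.mean(incomes))    # 59000

# Median: the middle value once sorted; a better "typical" income here
print(statistics.median(incomes))  # 27000

# Mode: the most frequently occurring value
print(statistics.mode(incomes))    # 27000
```

The gap between the mean (59,000) and the median (27,000) is exactly the income-distortion point made above: one outlier moves the mean a long way but leaves the median untouched.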

Student’s t-test – a method for interpreting what can be extrapolated from a small sample of data. It is the primary way to understand the likely error of an estimate given your sample size.

It is the source of the concept of “statistical significance.”
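To make the idea less abstract for myself: the one-sample t statistic just measures how many (estimated) standard errors the sample mean sits from some hypothesised population mean. A minimal sketch, with made-up sample values, assuming the standard formula t = (x̄ − μ₀) / (s / √n):

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic: how many estimated standard errors
    the sample mean lies from the hypothesised mean mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation
    se = statistics.stdev(sample) / math.sqrt(n)
    return (mean - mu0) / se

# Hypothetical measurements; is the true mean plausibly 5.0?
sample = [5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2]
print(round(t_statistic(sample, 5.0), 3))  # 1.225
```

A small sample keeps the standard error large, so even a sample mean noticeably above 5.0 only produces a modest t value here, which is exactly the "likely error depends on sample size" point.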

A p-value is a probability: a measure summarising the incompatibility between a particular set of data and a proposed model for the data (the null hypothesis). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5366529/

“a significance level is an indication of the probability of an observed result occurring by chance under the null hypothesis; so the more you repeat an experiment, the higher the probability you will see a statistically significant result.”
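That quote clicked for me once I worked through the arithmetic: if each repetition has a 5% chance of a fluke "significant" result, the chance of seeing at least one fluke across many repetitions climbs fast. A small sketch (the simulation leans on the fact that p-values are uniformly distributed under the null hypothesis; the trial counts are arbitrary):

```python
import random

ALPHA = 0.05   # conventional significance level
TESTS = 20     # number of independent repetitions of the experiment

# Theoretical chance of at least one "significant" result by luck alone
theoretical = 1 - (1 - ALPHA) ** TESTS
print(round(theoretical, 3))  # 0.642

# Monte Carlo check: under the null, a p-value is just Uniform(0, 1)
random.seed(1)
trials = 10_000
hits = sum(
    any(random.random() < ALPHA for _ in range(TESTS))
    for _ in range(trials)
)
print(hits / trials)  # close to the theoretical value
```

So twenty repetitions give you roughly a 64% chance of a spurious "statistically significant" finding, which is why repeating an experiment until it "works" is such a problem.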

Overall, this entire domain is one where I think I'll only really appreciate the core concepts when I have a specific need for them. The idea of a distribution curve where the mean of all data points sits at the peak, and standard deviations (calculated by a formula) tell us where the majority of the other data points fall, seems potentially useful but, again, until I can practically apply it to a problem, it remains just tantalisingly beyond my grasp.
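As a small exercise to make the distribution-curve idea concrete for future me: for normally distributed data, about 68% of points fall within one standard deviation of the mean and about 95% within two. A sketch with simulated data (the IQ-style mean of 100 and standard deviation of 15 are just illustrative choices):

```python
import random
import statistics

random.seed(0)
# Simulate 100,000 draws from a normal distribution
data = [random.gauss(100, 15) for _ in range(100_000)]

mu = statistics.mean(data)
sd = statistics.stdev(data)

# Proportion of points within one and two standard deviations of the mean
within_1sd = sum(mu - sd <= x <= mu + sd for x in data) / len(data)
within_2sd = sum(mu - 2 * sd <= x <= mu + 2 * sd for x in data) / len(data)

print(round(within_1sd, 2))  # ≈ 0.68
print(round(within_2sd, 2))  # ≈ 0.95
```

Those proportions (the "68–95–99.7 rule") are what the standard-deviation formula is buying you: a quick way to say where the bulk of the data sits relative to the peak of the curve.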