Quick Intro to Statistics — Power Your Stories with Data

by Justin Baker (@justindesign), November 1st, 2017

Not a statistician? No problem! Learn the basics of statistical analysis in a few minutes

Storytelling with Statistics

When working in the tech world (or at any job, for that matter), knowing how to harness statistics empowers you to make data-driven decisions. Whether you’re a marketer, designer, or developer, it is absolutely critical that you understand statistical terminology, how to interpret findings, and when to transform those findings into action.

The most important takeaway is that statistics alone will not necessarily make your arguments better. Statistics are fuel for your stories, but they are not stories in themselves. Frame your findings in a way that persuasively moves your audience, enriching your data with meaning and a call to action.


“Once something has occurred and we can put together a story to explain it, it starts to seem like the outcome was predestined. Statistics don’t appeal to our need to understand cause and effect, which is why they are so frequently ignored or misinterpreted. Stories, on the other hand, are a rich means to communicate precisely because they emphasize cause and effect.” ― Michael J. Mauboussin, The Success Equation

Population and Sample

A population is any large collection of objects or individuals, such as Americans, students, or trees, about which information is desired.

A parameter is any summary number, like an average or percentage, that describes the entire population.

A sample is a subset drawn from the population, ideally one that is representative of it.

A statistic is any summary number, like an average or percentage, that describes the sample.
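
To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch. The population of 10,000 numbered values, the sample size of 100, and the seed are all made-up assumptions for illustration, not data from any study.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up values (assumed for illustration).
population = list(range(1, 10_001))

# Parameter: a summary number computed over the entire population.
parameter = statistics.mean(population)

# Sample: 100 members drawn at random, without replacement.
rng = random.Random(42)  # fixed seed so the draw is repeatable
sample = rng.sample(population, 100)

# Statistic: the same summary number, computed over the sample only.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter}")
print(f"sample mean (statistic):     {statistic}")
```

The two numbers will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter.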

National Center for Education Statistics

Measures of Central Tendency

Mean

The mean of a set of numbers, sometimes simply called the average, is the sum of the values divided by the number of values.

Median

The median of a set of numbers is the middle number in the set after the numbers have been arranged from least to greatest. If the set has an even number of values, the median is the average of the middle two.

Mode

The mode of a set of numbers is the number which occurs most often.

Range

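The difference between the highest and lowest values in a set. Unlike the mean, median, and mode, the range describes how spread out the data is rather than where its center lies.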
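
The four measures above can be computed with Python's standard `statistics` module. This is a small sketch on a made-up data set, assumed here purely for illustration.

```python
import statistics

# Made-up data set, assumed for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)         # sum of the values / number of values
median = statistics.median(data)     # even count, so the two middle values are averaged
mode = statistics.mode(data)         # the value that occurs most often
value_range = max(data) - min(data)  # highest value minus lowest value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", value_range)
```

For this data the mean is 5, the median is 4 (the average of 3 and 5), the mode is 3, and the range is 8.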


Hypothesis Testing

The general idea of hypothesis testing involves:

  1. Making an initial assumption.
  2. Collecting evidence (data).
  3. Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.

Null Hypothesis

A null hypothesis proposes that no effect or relationship exists in a set of given observations, so that any apparent pattern is due to chance. It is the hypothesis that the researcher is trying to reject.

Alternative Hypothesis

An alternative hypothesis is the opposite of the null hypothesis. If the null hypothesis says there is no relationship between two or more variables, the alternative hypothesis says that a relationship does exist.

Errors

A Type 1 Error is the incorrect rejection of a true null hypothesis (also known as a “false positive” finding).

A Type 2 Error is the incorrect retention of a false null hypothesis (also known as a “false negative” finding).
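
Both error rates can be worked out exactly for a simple coin-flip experiment. The sketch below is only an illustration: the 100 flips, the decision rule in `rejects_null`, and the 60% heads rate used as the "true" alternative are all assumptions, not part of any standard procedure.

```python
from math import comb

N_FLIPS = 100  # assumed experiment size


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejects_null(heads):
    """Hypothetical decision rule: reject 'the coin is fair' on extreme counts."""
    return heads <= 39 or heads >= 61


# Type 1 error rate: the null is true (p = 0.5) but the rule rejects it.
type_1_rate = sum(
    binomial_pmf(k, N_FLIPS, 0.5) for k in range(N_FLIPS + 1) if rejects_null(k)
)

# Type 2 error rate: the null is false (the coin actually lands heads 60%
# of the time, an assumed alternative) but the rule fails to reject it.
type_2_rate = sum(
    binomial_pmf(k, N_FLIPS, 0.6) for k in range(N_FLIPS + 1) if not rejects_null(k)
)

print(f"Type 1 (false positive) rate: {type_1_rate:.3f}")
print(f"Type 2 (false negative) rate: {type_2_rate:.3f}")
```

Widening the rejection region lowers the Type 2 rate but raises the Type 1 rate, so the two errors trade off against each other.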

Further Learning

LibGuides at La Trobe University

Statistical Significance

P-Value

The p-value is a number between 0 and 1: the probability of seeing results at least as extreme as yours if the null hypothesis were true. It can be interpreted as follows:

  • A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
  • A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
  • p-values very close to the cutoff (0.05) are considered to be marginal (could go either way). Always report the p-value so your readers can draw their own conclusions.
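
As a rough illustration of the interpretation above, the sketch below computes an exact two-sided p-value for a coin-flip experiment, where the null hypothesis is that the coin is fair. The function name `two_sided_p_value` and the example counts are hypothetical; real analyses typically rely on a statistics library rather than hand-rolled code.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Exact two-sided p-value for the null hypothesis 'the coin is fair'.

    Adds up the probability of every outcome that is no more likely
    than the observed one, assuming the null hypothesis is true.
    """
    observed_ways = comb(flips, heads)
    extreme_ways = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if comb(flips, k) <= observed_ways
    )
    return extreme_ways / 2**flips


ALPHA = 0.05  # the conventional cutoff mentioned above

for heads, flips in [(10, 10), (60, 100), (52, 100)]:
    p = two_sided_p_value(heads, flips)
    verdict = "reject" if p <= ALPHA else "fail to reject"
    print(f"{heads}/{flips} heads: p = {p:.4f} -> {verdict} the null hypothesis")
```

Ten heads in ten flips gives a very small p-value, while 60 heads in 100 flips lands just above 0.05, which is the kind of marginal result where reporting the p-value itself matters most.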

Further Learning

Explain XKCD

RealClearScience

Regression

Regression is a technique for estimating the statistical relationship between two or more variables, where a change in a dependent variable is associated with a change in one or more independent variables. It is closely related to correlation.

Independent Variable: a variable that stands alone and isn’t changed by the other variables you are trying to measure. For example, someone’s age might be an independent variable.

Dependent Variable: the variable being tested and measured in a scientific experiment. It is ‘dependent’ on the independent variable: as the experimenter changes the independent variable, the effect on the dependent variable is observed and recorded.

Regression Analysis: a statistical method that estimates the form and strength of the relationship between one dependent variable and one or more other changing variables (the independent variables).

Simple Linear Regression: regression that uses only one independent variable and models the relationship between the independent and dependent variables as a straight line.
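
One common way to find that straight line is ordinary least squares, which picks the slope and intercept that minimize the squared vertical distances to the data points. The article does not name a fitting method, so treat this as one reasonable choice; the helper `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: xs is the independent variable, ys the dependent variable.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

For this data the fitted line has a slope of 0.6 and an intercept of 2.2, so each one-unit increase in x is associated with an average increase of 0.6 in y.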

Correlation Coefficient (r): the correlation coefficient r measures the strength and direction of a linear relationship between two variables. It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two variables follow a straight line. If r is close to 0, there is little or no linear relationship between the variables, although a nonlinear relationship may still exist.

R-Squared: a statistical measure of how close the data are to the fitted regression line. It is the percentage of the variation in the dependent variable that is explained by the linear model.

  • R-squared = Explained variation / Total variation
  • R-squared is always between 0% and 100%.
  • 0% indicates that the model explains none of the variability of the response data around its mean.
  • 100% indicates that the model explains all of the variability of the response data around its mean.
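
The correlation coefficient and R-squared can both be computed from the same sums of squares. The sketch below uses a made-up data set and a least-squares line (an assumed fitting method), and follows the "explained variation / total variation" definition given above.

```python
from math import sqrt

# Made-up data, assumed for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y

# Correlation coefficient: strength and direction of the linear relationship.
r = sxy / sqrt(sxx * syy)

# Least-squares line, then the variation in y that the line accounts for.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]
explained = sum((p - mean_y) ** 2 for p in predicted)

r_squared = explained / syy

print(f"r = {r:.4f}")
print(f"R-squared = {r_squared:.0%}")
```

Here R-squared comes out to 60%, and in simple linear regression it equals the square of r.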

Further Learning

AnalyticsBuddhu

Percentage Change

Calculate a Percent Increase

First: work out the difference (increase) between the two numbers you are comparing.

Increase = New Number - Original Number

Then: divide the increase by the original number and multiply the answer by 100.

% increase = Increase ÷ Original Number × 100.

If your answer is a negative number then this is a percentage decrease.
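
The two steps above fit in a short Python function. The name `percent_change` and the example numbers are hypothetical, and the zero check is an added safeguard because the formula is undefined when the original number is 0.

```python
def percent_change(original, new):
    """Signed percent change from original to new.

    A positive result is an increase; a negative result is a decrease.
    """
    if original == 0:
        raise ValueError("percent change is undefined when the original is 0")
    increase = new - original          # step 1: the difference
    return increase / original * 100   # step 2: divide by the original, times 100


print(percent_change(200, 250))  # 25.0
print(percent_change(200, 150))  # -25.0
```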

Source

Calculate a Percent Decrease

First: work out the difference (decrease) between the two numbers you are comparing.

Decrease = Original Number - New Number

Then: divide the decrease by the original number and multiply the answer by 100.

% Decrease = Decrease ÷ Original Number × 100

If your answer is a negative number then this is a percentage increase.
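
The decrease calculation looks like this in Python. As before, the function name `percent_decrease` and the numbers are made up for illustration, and the zero check is an added safeguard.

```python
def percent_decrease(original, new):
    """Percent decrease from original to new.

    A negative result means the value actually increased.
    """
    if original == 0:
        raise ValueError("percent decrease is undefined when the original is 0")
    decrease = original - new          # step 1: the difference
    return decrease / original * 100   # step 2: divide by the original, times 100


print(percent_decrease(80, 60))   # 25.0
print(percent_decrease(80, 100))  # -25.0, i.e. a 25% increase
```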

Source

Corbettmaths

Learn More!

Here are some great online statistics guides to help you.