Statistics

Live Online (VILT) & Classroom Corporate Training Course

Statistics is the science in data science. Without it, your "data-driven" decision-making may be driving you off a cliff edge. A solid grasp of statistical reasoning ensures that you tease only valid insights from your data.

How can we help you?

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

Statistics is an essential course for anyone, whether technical, managerial or administrative, who is interested in using data to inform their decision-making.

Objectives

By the end of the Statistics training course, participants will be able to:

  • Visualize data
  • Draw conclusions about the features and quality of data sets
  • Summarize your data
  • Determine correlation
  • Think of numbers as distributions
  • Understand sampling and its importance in statistical inference
  • Use the power of computers to generate distributions for any problem
  • Calculate confidence intervals and p-values
  • Make valid statistical inferences using a range of hypothesis tests
  • Critique statistical analyses
  • Design and execute your own statistical projects

Prerequisites

There are no formal prerequisites for attending this course.

Course Outline

  • Course philosophy
  • Software
  • Contents

  • Definition
  • Types of statistician
  • Variability
  • Probability
  • Die roll outcomes
  • Why is knowledge of statistics important?
  • Descriptive vs inferential statistics
  • Inferring population parameters
  • Quantitative data
  • Qualitative data
  • R statistical software
  • RStudio
  • Interactive exercise manual demo

  • What is exploratory data analysis (EDA)
  • Histograms and bar charts
  • Bar chart vs histogram
  • Central tendency and spread
  • Bin width is crucial
  • Right-skewed data
  • Outliers
  • Left-skewed data
  • Bimodal data
  • Separate subpopulations for analysis
  • Individual value plot
  • Subpopulation individual value plots
  • Benefits of boxplots
  • Boxplot
  • Boxplot vs histogram
  • Left-skewed boxplot
  • Compare subpopulations using boxplots
  • Swedish salaries by level of education
  • Measures of central tendency
  • Mean vs median
  • Mean vs median for skewed data
  • Mode
  • Measures of spread
  • Range and IQR
  • Standard deviation
  • Six figure summary
  • Central tendency and spread equations
  • Quantiles
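
The summary measures listed above can be sketched in a few lines. The course itself uses R; the Python below is only an illustration, and it assumes that "six figure summary" means the minimum, first quartile, median, mean, third quartile and maximum. The salary figures are made up for the example.

```python
import statistics


def six_figure_summary(values):
    """Return min, Q1, median, mean, Q3 and max for a list of numbers."""
    # method="inclusive" interpolates between observed values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical, right-skewed sample (made-up numbers, not course data).
salaries = [21, 23, 24, 25, 26, 28, 30, 33, 41, 95]

summary = six_figure_summary(salaries)
iqr = summary["q3"] - summary["q1"]   # spread of the middle half of the data
sd = statistics.stdev(salaries)       # sample standard deviation

for name, value in summary.items():
    print(f"{name:>6}: {value}")
print(f"   IQR: {iqr}")
print(f"    SD: {sd:.2f}")
```

In this sample the single large value pulls the mean well above the median, which is the usual pattern for right-skewed data and one reason the median and IQR are often preferred there.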

  • Numbers are mostly reckless estimates
  • Random variables
  • Male life expectancy in UK distribution
  • What’s the probability that a US man is 6’ tall or more?
  • What is a probability distribution?
  • Populations vs samples
  • Sampling the heights of 10 random American men
  • Sampling the heights of 100,000 random American men
  • Discrete probability distributions
  • Roll two dice and histogram the results
  • Poisson distribution
  • Binary probability distributions
  • Probability distribution for cars/household in the UK
  • Binomial distribution
  • Geometric distribution
  • Negative Binomial distribution
  • Continuous probability distributions
  • Uniform distribution
  • Triangular distribution
  • Normal distribution
  • Properties of the normal distribution
  • Distribution of IQ scores
  • Different means (same standard deviation)
  • Different standard deviations (same mean)
  • z-distribution
  • 68–95–99.7 (empirical) rule
  • Quantile-Quantile (Q-Q) plot
  • Q-Q plot of non-normal data
  • Common probability distributions “family tree”
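
Two of the topics above lend themselves to a short illustration: the exact distribution of the sum of two dice, and the 68–95–99.7 rule for a normal distribution. This is a minimal Python sketch rather than course material (the course uses R). It assumes IQ scores are modelled as normal with mean 100 and standard deviation 15; `prob_within` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import NormalDist

# Discrete: exact distribution of the sum of two fair six-sided dice,
# obtained by enumerating all 36 equally likely outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
two_dice = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

# Continuous: a normal model for IQ (assumed mean 100, sd 15).
iq = NormalDist(mu=100, sigma=15)


def prob_within(dist, k):
    """Probability of falling within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


empirical_rule = [round(prob_within(iq, k), 4) for k in (1, 2, 3)]

for total, p in two_dice.items():
    print(f"P(sum = {total:2d}) = {p}")
print("Within 1, 2, 3 sd:", empirical_rule)
```

Using exact fractions for the dice keeps the probabilities free of rounding, so they can be checked by hand against the 36 possible outcomes.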

  • Samples are proxies for the population of interest
  • Unfortunately, samples vary
  • Larger samples exhibit less variation
  • Statistics vs parameters
  • Distributions involved in statistical inference
  • Sampling distribution of mean IQ
  • Collecting more IQ samples
  • Sampling distribution of mean die roll
  • Sampling distribution of mean project duration
  • Create a sampling distribution
  • Central limit theorem
  • Implications of the central limit theorem
  • Standard error of the mean (SEM)
  • Impact of sample size on SEM
  • What is a confidence interval?
  • 95% confidence interval
  • Bigger samples give greater precision
  • Smaller confidence levels result in tighter intervals
  • How should we interpret the confidence interval?
  • Random sampling
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • What is bootstrapping?
  • Estimating median life expectancy
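
The ideas of a sampling distribution, the standard error of the mean and bootstrapping can be sketched with a small simulation. The Python below is illustrative only (the course uses R). The sample size, the replication counts, the helper names `sample_means` and `bootstrap_ci`, and the list of ages are all assumptions made for the example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def sample_means(n, replications):
    """Means of `replications` samples, each of n fair die rolls."""
    return [
        statistics.mean([rng.randint(1, 6) for _ in range(n)])
        for _ in range(replications)
    ]


n = 30
means = sample_means(n, 5000)

# Theory: a fair die has mean 3.5 and variance 35/12, so the
# standard error of the mean is sqrt(35/12) / sqrt(n).
theoretical_sem = (35 / 12) ** 0.5 / n ** 0.5
observed_sem = statistics.stdev(means)


def bootstrap_ci(sample, stat, replications=2000, level=0.95):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    estimates = sorted(
        stat(rng.choices(sample, k=len(sample))) for _ in range(replications)
    )
    lower = estimates[int((1 - level) / 2 * replications)]
    upper = estimates[int((1 + level) / 2 * replications) - 1]
    return lower, upper


# Hypothetical ages at death (made-up numbers, not course data).
ages = [61, 67, 70, 72, 74, 75, 77, 79, 80, 81, 83, 84, 86, 88, 93]
median_ci = bootstrap_ci(ages, statistics.median)

print(f"Mean of sample means: {statistics.mean(means):.3f}")
print(f"Observed SEM: {observed_sem:.3f}  theoretical SEM: {theoretical_sem:.3f}")
print(f"Bootstrap 95% interval for the median: {median_ci}")
```

The bootstrap is used for the median because, unlike the mean, it has no simple standard-error formula; resampling the observed data with replacement stands in for drawing fresh samples from the population.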

  • What is statistical inference?
  • Why must we use samples?
  • Why do we need to conduct hypothesis tests?
  • What is hypothesis testing?
  • Null hypothesis
  • Alternative hypothesis
  • Rejecting the null hypothesis
  • One- vs Two-tailed hypothesis tests
  • Choosing between one- and two-tailed tests
  • What are p-values?
  • Significance level (α)
  • Types of errors
  • Confidence levels vs significance levels
  • Performing hypothesis tests
  • p-value controversy
  • When to use a t-test
  • t-value
  • t-distribution
  • t-distributions
  • Slot machine observed “Return to Player”
  • Are slot machine payouts within tolerance?
  • Perform a t-test on RTP data using R
  • Two-sample t-test
  • When to use a z-test
  • Conducting hypothesis tests using z-scores
  • When to use a χ² test
  • Education and Brexit vote
  • Brexit vote breakdown
  • χ² value
  • χ² distributions
  • Are education and Brexit vote related?
  • When to use an F-test
  • Conducting hypothesis tests using F-values
  • F-distributions
  • Height distribution by sex
  • Does height variation differ by sex?
  • When to use analysis of variance (ANOVA)
  • Determining the F-value
  • Are all diets the same?
  • All diets are apparently not the same
  • Normality hypothesis tests
  • Statistically significant treatments?
  • What is statistical power?
  • Calculating statistical power
  • Statistical power curve
  • Improving statistical power of hypothesis tests
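
As a rough illustration of how a test statistic, a p-value and a significance level fit together, here is a minimal two-tailed, one-sample z-test in Python (the course performs its tests in R). Everything specific is an assumption for the example: the RTP readings are made up, the target of 95% and the known population standard deviation of 1.5 are hypothetical, and `z_test` is not a library function.

```python
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test, assuming a known population sd."""
    # How many standard errors the sample mean lies from the null value.
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    # Probability of a result at least this extreme if the null is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical observed "Return to Player" percentages (made-up numbers).
rtp = [93.1, 95.4, 94.2, 96.0, 92.8, 94.9, 93.7, 95.1, 94.4, 93.4]

z, p_value, reject_null = z_test(rtp, mu0=95.0, sigma=1.5)

print(f"z = {z:.2f}, p = {p_value:.3f}, reject null at 5%: {reject_null}")
```

When the population standard deviation is not known and has to be estimated from a small sample, a t-test is the usual choice instead; the z-test is shown here only because it needs nothing beyond Python's standard library.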