Course Description
This course covers the basic concepts and methods of descriptive and inferential statistics for data science. Topics include measures of central tendency and dispersion, probability distributions, sampling distributions, confidence intervals, and hypothesis testing. The fundamental concepts of correlation and of simple and multiple regression are also discussed. An optional module introduces the basic ideas of Bayesian statistics through real-world examples. All modules are designed to teach students how to perform statistical analysis using the open-source statistical language R within RStudio.
Course Outline
• Block 2 is a recommended prerequisite for Block 4.
• The modules are designed to be taken in the following order: Introduction and Prerequisites Review; Probability; Inferential Statistics; Regression; Bayesian Statistics.
• Delivery: Asynchronous and self-paced, 10 to 12 hours per week, with a suggested pace of one week per module. Each module follows the same sequence: instructional videos, suggested readings, lecture slides, practice assignments with answers, video-embedded practice quizzes, a short quiz after each sub-module, and a final quiz for the module.
• Students who are confident about the material can test out of the module and advance to the next one by completing the final quiz for that module.
• Students must pass the final quiz for each module (passing score: 80%) to proceed to the next module.
Students will be able to:
• Find and interpret appropriate measures of center, dispersion, and other important statistics numerically and graphically.
• Understand the concepts and rules of probability, as well as Bayes' theorem.
• Recognize important probability distributions (Binomial and Normal).
• Understand point estimates, sampling distributions, confidence intervals, and hypothesis testing.
• Identify a linear relationship between one response variable and one or more quantitative explanatory variables.
• Interpret the results of confidence intervals, hypothesis tests, and regression analysis.
• Perform statistical analysis using the open-source statistical language R within RStudio.
Specifically, students will take away from the course:
• Introduction to R focusing on statistical functions (Module 0).
• Review of numerical summaries (Module 0).
• Review of statistical graphs (Module 0).
• Review of probability (Module 0).
• Understanding random variables and their distributions (Module 1).
• Recognizing discrete distributions, with the binomial distribution as an example (Module 1).
• Identifying continuous distributions, with the normal distribution as an example (Module 1).
• The concept of service distributions, e.g., the t, χ², and F distributions (Module 1).
• Understanding the central limit theorem (Module 2).
• Learning about sampling distributions (Module 2).
• Understanding point estimation and confidence intervals (Module 2).
• Comprehension of hypothesis testing (Module 2).
• Understanding correlation (Module 3).
• Performing linear regression in R and interpreting the software output (Module 3).
• Introduction to conditional probability (Module 4).
• Introduction to Bayesian statistics, e.g., the posterior distribution (Module 4).