Programming for Business Analytics
This course introduces the basics of programming using R for business applications.
Course Description
This course leverages R to equip students with the fundamental skills required for robust business data analysis. It addresses both practical programming skills and the conceptual understanding necessary to apply data science effectively in business contexts.
The goal is to inspire a passion for data analysis and foster a community among students to deepen their learning and enhance their collaborative skills.
Git and GitHub
Statistical computing lives in the world of plain-text tools (like R, Quarto/RMarkdown, and Git). While tools like Excel and Word are familiar, they often trap us in a cycle of filenames like report_final_v2.docx and forgotten steps (“how did I make this figure?”). For a conceptual overview, see Kieran Healy’s The Plain Person’s Guide to Plain Text Social Science.
We write code that is portable and reproducible, so an entire analysis can be recreated with a single command. To manage this work, we use Git, a version control system that tracks the history of the entire project folder rather than just “tracking changes” inside a single file. This workflow keeps the mess in check and makes collaboration easier.
Mastering these tools also lets you build a professional identity on GitHub. Your profile serves as a living portfolio, showcasing your actual code and contributions to prospective employers.
Recommended Textbooks
We will use the following books in this class.
- [MD] Ismay, Chester and Albert Y. Kim. 2022. Statistical Inference via Data Science: A ModernDive into R and the Tidyverse.
- [QSS] Imai, Kosuke and Nora Webb Williams. 2022. Quantitative Social Science: An Introduction in tidyverse. Princeton University Press.
- [IMS] Çetinkaya-Rundel, Mine and Johanna Hardin. 2021. Introduction to Modern Statistics. OpenIntro.
- [AAG] Lander, Jared P. 2017. R for Everyone: Advanced Analytics and Graphics, 2nd Edition. O’Reilly Media.
- [VT] Yau, Nathan. 2011. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. John Wiley & Sons.
Final Project
A specific focus of this course is the production of a polished, portfolio-ready project. Students develop a clear research question, locate and prepare data, and apply analytical techniques to answer it. The final deliverable is a publicly accessible article that demonstrates data fluency to prospective employers and peers.
| Deliverable | Timeline |
|---|---|
| Creating a GitHub Repository | Week 4 |
| Data and Proposal | Week 6 |
| First Visualization | Week 9 |
| First Analysis | Week 11 |
| Final Report | Week 15 |
Learning Objectives
- Data Visualization and Wrangling: Students will learn to summarize and visualize data, transforming messy data into tidy, analyzable formats.
- Causality and Regression Analysis: The course emphasizes evaluating claims about causality and using linear regression to conduct data analysis.
- Statistical Uncertainty: Understanding and quantifying uncertainty in data analysis are core components of the curriculum.
- Professional Tools: Students will become proficient with professional tools such as R, RStudio, Git, and GitHub.
Course Highlights
Data Wrangling & Viz
Transforming messy data into tidy formats and communicating insights.
Causality & Regression
Evaluating causal claims and conducting rigorous regression analysis.
Statistical Uncertainty
Simulation-based (bootstrap) vs. conventional (Central Limit Theorem) approaches to quantifying uncertainty.
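As a rough illustration of the two approaches named above, the sketch below computes a 95% confidence interval for a mean in both ways using base R only. The data are simulated, the variable name `order_value` is hypothetical, and the choice of 5,000 resamples is arbitrary; this is a minimal sketch under those assumptions, not the course's official code.

```r
# Simulated, right-skewed data standing in for a real business variable
# (hypothetical order values; not course data).
set.seed(42)
order_value <- rexp(200, rate = 1 / 50)

n <- length(order_value)
x_bar <- mean(order_value)

# Conventional (CLT) interval: sample mean +/- z * standard error.
se <- sd(order_value) / sqrt(n)
clt_ci <- x_bar + c(-1, 1) * qnorm(0.975) * se

# Simulation-based interval (percentile bootstrap): resample the data with
# replacement, recompute the mean each time, and keep the middle 95% of
# the resampled means.
boot_means <- replicate(
  5000,
  mean(sample(order_value, size = n, replace = TRUE))
)
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975), names = FALSE)

print(round(clt_ci, 2))
print(round(boot_ci, 2))
```

With a sample of this size the two intervals should land close to each other; the bootstrap version simply replaces the normal-approximation formula with repeated resampling.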
Professional Workflow
Mastery of R, RStudio, Git, and GitHub for collaborative analytics.
Lecture Materials
- Lecture 1: Course Intro
- Lecture 2: R Programming Basics
- Lecture 3: Data Types & Visualization I
- Lecture 4: User Defined Functions & Visualization II
- Lecture 5: Data Wrangling
- Lecture 6: Causality
- Lecture 7: Bivariate Relationships & Tidying Data
- Lecture 8: Prediction and Iteration
- Lecture 9: Regression and Model Fit
- Lecture 10: More on Regression
- Lecture 11: Sampling and Sampling Distributions
- Lecture 12: The Bootstrap and CIs
- Lecture 13: Hypothesis Testing
- Lecture 14: Models of Uncertainty
- Lecture 15: Inference for Regression