ACM 118, Fall 2009

Methods in Applied Statistics and Data Analysis


Basics:

Professor: Dr. Joel A. Tropp
E-mail: jtropp * acm.caltech.edu
Office: Annenberg 307
Office Hours: W 1:00--2:30 pm

Lectures: Guggenheim 101, TTh 1:00--2:25 pm
Recitation: Annenberg 105, F 3:00--4:00 pm
Textbook: Montgomery, Peck, and Vining, Introduction to Linear Regression Analysis, 4th ed., Wiley, 2006.
Website: http://www.acm.caltech.edu/~jtropp/courses/acm118-f09/

I gladly accept appointments, but only via email with advance notice of at least one business day. Time management, you know.

Teaching Assistants:

Assistant: John Bruer
E-mail: john.bruer * gmail.com
Office Hours: Annenberg 313, M 1:00--2:30 pm

Assistant: Michael McCoy
E-mail: michael.b.mccoy * gmail.com
Office Hours: Annenberg 313, Th 9:00--10:30 am

Prerequisites:

A complete calculus sequence, including multivariate calculus, and a strong background in linear algebra (e.g., Math 1abc). Introductory probability and statistics (e.g., Math 2ab). Exposure to basic concepts of programming (e.g., CS 1). This preparation is necessary.


Background and goals:

Introduction to fundamental ideas and techniques of statistical modeling, with an emphasis on conceptual understanding and on the analysis of real data sets. Assignments will draw on data analysis problems in various science and engineering fields.

Content:

We will cover the following topics:

  1. Simple linear regression (least squares estimation, analysis of residuals)
  2. Inferences about model parameters, multiple linear regression
  3. Analysis of variance, comparison of models, model selection
  4. Assessing goodness-of-fit, outliers, influential observations
  5. Collinearity and rank-deficiency, singular value decomposition
  6. Regularization (truncated SVD, ridge regression)
  7. Choosing regularization parameters (generalized cross-validation, L-curve)
  8. Principal component analysis
  9. Linear discriminant analysis
  10. Resampling methods and the bootstrap


Homework:

There will be approximately seven homework sets. Expect to spend 10 hours on each one. Assignments will consist of R programming, statistical computations, data analysis, written explanations, and the occasional proof. You are encouraged to discuss homework problems with other students, but you must prepare and turn in your own assignment independently. There is a separate handout on homework format.

We will have a weekly recitation during which you can ask questions about assignments or lectures. I also plan to provide solution guides. It is a violation of the honor code to use solution guides from current or previous terms when preparing your work.

Homework will (usually) be assigned on Wednesday and will come due the following Wednesday at 5 pm. Assignments will be returned the next Tuesday at the end of class. There will be no assignment due during the midterm exam week. On-time assignments should be delivered to the ACM 118 Mailbox in the lobby of Firestone Labs.

Due dates are strict. Late assignments must be delivered during regular business hours to Sheila Shull (217 Firestone), who will record the date the assignment is turned in. You will (automatically) receive three grace days over the course of the term. After you have consumed these grace days, each late assignment will receive zero credit.

At my discretion, extensions may be granted for academic reasons (e.g., conference trips), provided you give at least one week's notice. I provide extensions for documented medical or psychological reasons. I will never grant an extension after an assignment comes due, unless you have proof of a genuine emergency. The teaching assistants are not empowered to grant extensions.

Exams:

The course will conclude with a take-home final exam. The exam will be due at noon on Wednesday, December 9. Other policies will be stated clearly before the exam.

Grading:

The course grade will be determined by a weighted average of the homework grade (70%) and the exam grade (30%). There is no fixed distribution of letter grades. Each student with a total score of at least 90% (resp., 80%, 70%, 60%) is guaranteed a final grade of A (resp., B or higher, C or higher, D or higher). I usually adjust these boundaries downward (i.e., in the favorable direction).
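As a rough illustration of the arithmetic, the sketch below computes the weighted total and the minimum guaranteed letter grade. It assumes that both component grades are percentages on a 0--100 scale and that each cutoff means "at least" that score. The function names and the sample scores are hypothetical, and the boundaries may be adjusted downward, so the actual grade can be higher than the one reported here.

```python
def course_score(homework_pct, exam_pct, hw_weight=0.70, exam_weight=0.30):
    """Weighted average of the homework and exam percentages (0-100 scale)."""
    return hw_weight * homework_pct + exam_weight * exam_pct


def guaranteed_grade(total_pct):
    """Minimum letter grade guaranteed by the stated cutoffs (None below 60%)."""
    # Assumption: a cutoff is met when the total is at least that value.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if total_pct >= cutoff:
            return letter
    return None


# Hypothetical student: 88% on homework, 75% on the final exam.
total = course_score(homework_pct=88.0, exam_pct=75.0)
print(round(total, 1), guaranteed_grade(total))
```

For this hypothetical student the total is 0.70 * 88 + 0.30 * 75 = 84.1, which guarantees a B or higher.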

Grade review policy:

If you have a question or concern about an assignment grade, please discuss it first with the teaching assistant who graded your assignment. If you are unable to resolve the problem, you may attend my office hours or schedule an appointment. For your privacy, I will not discuss grades in the classroom.


Resources:

Introductory textbooks on applied statistics and data analysis on reserve:

You may find the following resources helpful in your quest to learn R:

Some students may prefer to use general-purpose software to perform statistical computations, such as Matlab or GNU Octave. Reference material for these packages:


Schedule (subject to change at any moment)

# | Date | Topics | Reading | Due | Assignments, etc.
1 | 9/29 | Course intro, statistical inference | Wainer article | | Homework 1. HW 1 Solutions.
2 | 10/1 | Sample statistics, hypothesis tests | | |
3 | 10/6 | Hypothesis tests, confidence intervals | | | Homework 2. HW 2 Solutions.
** | 10/7 | | | HW 1 |
4 | 10/8 | Simple linear regression | MPV, Ch. 2, Sec. 4.1, 4.2.3 | |
5 | 10/13 | Properties of LS parameter estimates | | |
** | 10/14 | | | HW 2 | Homework 3. HW 3 Solutions.
6 | 10/15 | Regression toward the mean, model comparison (ANOVA) | | |
7 | 10/20 | Model assessment and outliers | MPV, Ch. 4 | |
** | 10/21 | | | HW 3 | Homework 4. HW 4 Solutions.
8 | 10/22 | Multiple regression | MPV, Ch. 3 | |
9 | 10/27 | Geometry of linear models | | |
** | 10/28 | | | HW 4 | Midterm week: No assignment.
10 | 10/29 | Joint confidence regions | | |
11 | 11/3 | Leverage, influence, outliers | MPV, Ch. 6 | |
** | 11/4 | | | | Homework 5
12 | 11/5 | Model building, variable selection | MPV, Ch. 9 | |
13 | 11/10 | Collinearity, SVD, least squares again | MPV, Ch. 11 | |
** | 11/11 | | | HW 5 | Homework 6
14 | 11/12 | Ridge regression, choosing regularization parameters | | |
15 | 11/17 | Binomial random variables, tests for odds | MPV, Ch. 14 | |
** | 11/18 | | | HW 6 | Homework 7
16 | 11/19 | Logistic regression | | |
17 | 11/24 | Logistic regression | | |
18 | 12/1 | Principal component analysis | | |
** | 12/2 | | | HW 7 |
19 | 12/3 | Linear discriminant analysis | | |

Handouts, etc.


Data