ACM 118, Fall 2009: Methods in Applied Statistics and Data Analysis
Professor: Dr. Joel A. Tropp
E-mail: jtropp * acm.caltech.edu
Office: Annenberg 307
Office Hours: W 1:00--2:30 pm
Lectures: Guggenheim 101, TTh 1:00--2:25 pm
Recitation: Annenberg 105, F 3:00--4:00 pm
Textbook: Montgomery, Peck, and Vining, Introduction to Linear Regression Analysis, 4th ed., Wiley, 2006.
Website: http://www.acm.caltech.edu/~jtropp/courses/acm118-f09/
I gladly accept appointments---but only via email with advance notice of at least one business day. Time management, you know.
Assistant: John Bruer
E-mail: john.bruer * gmail.com
Office Hours: Annenberg 313, M 1:00--2:30 pm
Assistant: Michael McCoy
E-mail: michael.b.mccoy * gmail.com
Office Hours: Annenberg 313, Th 9:00--10:30 am
A complete calculus sequence, including multivariate calculus, and a strong background in linear algebra (e.g., Math 1abc). Introductory probability and statistics (e.g., Math 2ab). Exposure to basic concepts of programming (e.g., CS 1). This preparation is necessary.
Introduction to fundamental ideas and techniques of statistical modeling, with an emphasis on conceptual understanding and on the analysis of real data sets. Assignments will draw on data analysis problems in various science and engineering fields.
We will cover the following topics:
There will be approximately seven homework sets. Expect to spend 10 hours on each one. Assignments will consist of R programming, statistical computations, data analysis, written explanations, and the occasional proof. You are encouraged to discuss homework problems with other students, but you must prepare and turn in your own assignment independently. There is a separate handout on homework format.
We will have a weekly recitation during which you can ask questions about assignments or lectures. I also plan to provide solution guides. It is a violation of the honor code to use solution guides from current or previous terms when preparing your work.
Homework will (usually) be assigned on Wednesday and will come due the following Wednesday at 5 pm. Assignments will be returned the next Tuesday at the end of class. There will be no assignment due during the midterm exam week. On-time assignments should be delivered to the ACM 118 Mailbox in the lobby of Firestone Labs.
Due dates are strict. Late assignments must be delivered during regular business hours to Sheila Shull (217 Firestone), who will record the date the assignment is turned in. You will (automatically) receive three grace days over the course of the term. After you have consumed these grace days, each late assignment will receive zero credit.
At my discretion, extensions may be granted for academic reasons (e.g., conference trips), provided that you give at least one week's notice. I provide extensions for documented medical or psychological reasons. I will never grant an extension after an assignment comes due, unless you have proof of a genuine emergency. The teaching assistants are not empowered to grant extensions.
The course will conclude with a take-home final exam. The exam will be due at noon on Wednesday, December 9. Other policies will be stated clearly before the exam.
The course grade will be determined by a weighted average of the homework grade (70%) and the exam grade (30%). There is no fixed distribution of letter grades. Each student with a total score of 90% (resp., 80%, 70%, 60%) is guaranteed a final grade of A (resp., B or higher, C or higher, D or higher). I usually adjust these boundaries downward (i.e., in the favorable direction).
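As an illustration of the arithmetic only, here is a minimal Python sketch of the weighted average and the guaranteed cutoffs stated above. The function names `course_score` and `guaranteed_grade` are hypothetical, and the sketch assumes both component grades are expressed as percentages. Because the boundaries are usually adjusted downward, the result is a lower bound on the letter grade, not a prediction of it.

```python
def course_score(homework_pct: float, exam_pct: float) -> float:
    """Weighted average with the stated weights: homework 70%, exam 30%."""
    # Integer weights avoid small floating-point errors at the cutoffs.
    return (70 * homework_pct + 30 * exam_pct) / 100


def guaranteed_grade(total_pct: float) -> str:
    """Minimum letter grade guaranteed by the stated cutoffs.

    Actual boundaries may be adjusted downward, so the final grade
    can be higher than the value returned here.
    """
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if total_pct >= cutoff:
            return letter
    return "no guarantee"


if __name__ == "__main__":
    total = course_score(homework_pct=92, exam_pct=80)
    print(total, guaranteed_grade(total))
```

For example, a homework grade of 92% and an exam grade of 80% give a total of about 88.4%, which guarantees a B or higher.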
If you have a question or concern about an assignment grade, please discuss it first with the teaching assistant who graded your assignment. If you are unable to resolve the problem, you may attend my office hours or schedule an appointment. For your privacy, I will not discuss grades in the classroom.
Introductory textbooks on applied statistics and data analysis on reserve:
| # | Date | Topics | Reading | Due | Assignments, etc. |
|---|---|---|---|---|---|
| 1 | 9/29 | Course intro, statistical inference | Wainer article | | Homework 1. HW 1 Solutions. |
| 2 | 10/1 | Sample statistics, hypothesis tests | | | |
| 3 | 10/6 | Hypothesis tests, confidence intervals | | | Homework 2. HW 2 Solutions. |
| ** | 10/7 | | | HW 1 | |
| 4 | 10/8 | Simple linear regression | MPV, Ch. 2, Sec. 4.1, 4.2.3 | | |
| 5 | 10/13 | Properties of LS parameter estimates | | | |
| ** | 10/14 | | | HW 2 | Homework 3. HW 3 Solutions. |
| 6 | 10/15 | Regression toward the mean, model comparison (ANOVA) | | | |
| 7 | 10/20 | Model assessment and outliers | MPV, Ch. 4 | | |
| ** | 10/21 | | | HW 3 | Homework 4. HW 4 Solutions. |
| 8 | 10/22 | Multiple regression | MPV, Ch. 3 | | |
| 9 | 10/27 | Geometry of linear models | | | |
| ** | 10/28 | | | HW 4 | Midterm week: No assignment. |
| 10 | 10/29 | Joint confidence regions | | | |
| 11 | 11/3 | Leverage, influence, outliers | MPV, Ch. 6 | | |
| ** | 11/4 | | | | Homework 5 |
| 12 | 11/5 | Model building, variable selection | MPV, Ch. 9 | | |
| 13 | 11/10 | Collinearity, SVD least squares again | MPV, Ch. 11 | | |
| ** | 11/11 | | | HW 5 | Homework 6 |
| 14 | 11/12 | Ridge regression, choosing regularization parameters | | | |
| 15 | 11/17 | Binomial random variables, Tests for odds | MPV, Ch. 14 | | |
| ** | 11/18 | | | HW 6 | Homework 7 |
| 16 | 11/19 | Logistic regression | | | |
| 17 | 11/24 | Logistic regression | | | |
| 18 | 12/1 | Principal component analysis | | | |
| ** | 12/2 | | | HW 7 | |
| 19 | 12/3 | Linear discriminant analysis | | | |