## Computational Statistics

**Math 154, Fall 2017**

Jo Hardin

2351 Millikan

jo.hardin@pomona.edu

**Office Hours: Tuesday 1:30-4pm, Thursday 10am-noon, or by appointment**

**Mentor Sessions:** Noah Keshner, Chris Donnay, and Neel Kumar

Sunday 7-10pm & Tuesday 8-10pm

Millikan 1021 (Emmy Noether Room)

**Texts:**

An Introduction to Statistical Learning; James, Witten, Hastie, Tibshirani (http://www-bcf.usc.edu/~gareth/ISL/)

Recommended: Modern Data Science with R; Baumer, Kaplan, and Horton (free chapters and other information at: https://github.com/beanumber/mdsr and http://mdsr-book.github.io/)

Recommended: Visual and Statistical Thinking: Displays of Evidence for Making Decisions; Tufte (http://www.edwardtufte.com/tufte/books_textb)

Website for: Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving; Nolan and Temple Lang (http://rdatasciencecases.org/)

**Important Dates:**

- 10/11/17 Exam 1
- 10/20/17 (Friday) Take home 1 due (on GitHub by 5pm)
- 10/23/17 Initial Project Proposal due
- 10/30/17 Final Project Proposal due
- 11/13/17 Data Science Panel
- 11/15/17 Exam 2
- 11/20/17 Project Update due
- 11/22/17 Take home 2 due (on GitHub by 5pm)
- 12/12/17 and/or 12/15/17 Group Presentations (during the final exam time, 2-5pm)

**Handouts:**

- Git help: http://happygitwithr.com/
- R documentation / help
- Google for R: http://www.rseek.org/
- R tutorial
- An Introduction to R, Venables & Smith
- R Language Definition, R Core Team
- Another tutorial, with exercises & solutions
- Mosaic Reference Guide (requires installing the mosaic package)
- A Student’s Guide to R; Horton, Pruim, Kaplan (click on “Raw” to download)

- Clicker Questions
- WU1 – wrangling 1
- WU2 – wrangling 2
- WU3 – hypothesis testing
- WU4 – pruning trees
- WU5 – support vector machines

**Homework:**

- Homework will be assigned from the text with some additional problems. One homework grade will be dropped. Homework will be done using the statistical software package R and posted on GitHub. All homework must be done in R Markdown (or Sweave if you want to use LaTeX).
- **Homework will be due on Wednesdays by midnight to GitHub.**
- HW should be turned in to your GitHub repository by Wednesday at noon.
- Always post both a PDF and R Markdown (or Sweave) file, unless otherwise requested.
- HW is graded on a scale of 5/4/3/2/1. See the first HW assignment for more information.
- HW file should be in the format of: ma154-hw#-lname-fname.pdf

**Projects:**

- There will be a semester-long group project. Your task is to use data to tell us something interesting. This project is deliberately open-ended to allow you to fully explore your creativity. There are three main rules that must be followed: (1) be data-centered, (2) tell us something, (3) do something new. The project information is available here: semester project.

**Computing:**

- GitHub will be used as a way to practice reproducible and collaborative science. There may be a slight learning curve, but knowing Git will be an extremely useful skill as you venture on after this class.
- R will be used for all homework assignments. R is freely available at http://www.r-project.org/ and is already installed on college computers. Additionally, you need to install RStudio (http://rstudio.org/) in order to use R Markdown. If you are not already familiar with R, please work through some of the materials provided ASAP. In particular, http://swirlstats.com/ is a great way to walk through learning the basics of R.
- You are encouraged to use Pomona’s RStudio server at https://rstudio.campus.pomona.edu/ (or https://rstudio.pomona.edu if you are off campus). If you use the server, you can connect directly to your GitHub account without installing Git locally on your own computer.

**Participation:**

- This class will be interactive, and your participation is expected (every day in class). Although notes will be posted, your participation is an integral part of the in-class learning process.

**Course Goals:**

- to be able to work through the entire computational statistics flow chart as a data analyst.
- to be able to use graphical representations of data to communicate ideas about the data.
- to critically evaluate analyses / graphics of data (typically big data or dynamic data).
- to communicate results effectively.

**Academic Honesty:**

You are encouraged to work together on homework assignments. Additionally, the group presentation will require close collaboration with a group of your peers. Everything you turn in must represent your own work. Copying and pasting code (or text) from your colleagues constitutes plagiarism and will not be tolerated. All exams (including take-home) will be closed person. You may not collaborate (discuss, complain, etc.) with other individuals about the exams. Pomona’s academic honesty policy is given below and will be taken seriously.

- Pomona College is an academic community, all of whose members are expected to abide by ethical standards both in their conduct and in their exercise of responsibilities toward other members of the community. The college expects students to understand and adhere to basic standards of honesty and academic integrity. These standards include, but are not limited to, the following:
- In projects and assignments prepared independently, students never represent the ideas or the language of others as their own.
- Students do not destroy or alter either the work of other students or the educational resources and materials of the College.
- Students neither give nor receive assistance in examinations.
- Students do not take unfair advantage of fellow students by representing work completed for one course as original work for another or by deliberately disregarding course rules and regulations.
- In laboratory or research projects involving the collection of data, students accurately report data observed and do not alter these data for any reason.

**Advice:**

- Please feel free to stop by, email, or call if you have any questions about or difficulty with the material, the computing, the projects, or the course. Come see me as soon as possible if you find yourself struggling. The material will build on itself, so it will be much easier to catch up if the concepts get clarified earlier rather than later. Enjoy!

**Grading:**

- 25% Homework
- 45% Midterms
- 25% Group Project & Presentation
- 5% Class Participation