## Computational Statistics

**Math 154, Fall 2019**

Jo Hardin

2351 Millikan

jo.hardin@pomona.edu

**Office Hours: Monday 1:30-3:30pm, Wednesday 9-11:30am, or by appointment**

**Mentor Sessions:** Jack Hanley

Wednesday 8-10pm

Millikan 1021 (Emmy Noether Room)

**Texts:**

An Introduction to Statistical Learning; James, Witten, Hastie, Tibshirani (http://www-bcf.usc.edu/~gareth/ISL/)

Recommended: Modern Data Science with R; Baumer, Kaplan, and Horton (free chapters and other information at: https://github.com/beanumber/mdsr and http://mdsr-book.github.io/)

Recommended: Visual and Statistical Thinking: Displays of Evidence for Making Decisions; Tufte (http://www.edwardtufte.com/tufte/books_textb)

Website for: Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving; Nolan and Temple Lang (http://rdatasciencecases.org/)

**Important Dates:**

- 10/17/19 Exam 1
- 10/25/19 Take home 1 due (on GitHub by midnight)
- 11/1/19 Initial Project Proposal due (via email to me by midnight)
- 11/8/19 Final Project Proposal due (on GitHub by midnight)
- **November 14, 4:15-5:30 2019 Data Science Panel**
- 11/21/19 Project Update due (on GitHub by midnight)
- 11/26/19 Take home 2 due (on GitHub by midnight)
- 12/5/19 Exam 2
- 12/13/19 (Friday) or 12/18/19 (Wednesday) Group Presentations (2-5pm)
- 12/18/19 Final write-up due (on GitHub by midnight)

**Handouts:**

- Best and most comprehensive Git help: http://happygitwithr.com/
- Version control with Git: http://swcarpentry.github.io/git-novice/
- More on Git & RStudio
- Online Git book with lots of info: https://git-scm.com/book/en/v2

- R documentation / help
- Great tutorials through the Coding Club
- Google for R: http://www.rseek.org/
- R tutorial
- An Introduction to R, Venables & Smith
- R Language Definition, R Core Team
- Another tutorial, with exercises & solutions
- Mosaic Reference Guide, need to install the mosaic package
- A Student’s Guide to R; Horton, Pruim, Kaplan (click on “Raw” to download)

- Clicker Questions

**Homework:**

- Homework will be assigned from the text with some additional problems. One homework grade will be dropped. Homework will be done using the statistical software package R and posted on GitHub. All homework must be done in R markdown (or R Sweave if you want to use LaTeX).
- __Homework will be due on Thursdays by midnight to GitHub.__
- HW should be turned in to your GitHub repository by Thursday.
- Always post both a PDF and R Markdown (or Sweave) file, unless otherwise requested.
- HW is graded on a scale of 5/4/3/2/1. See the first HW assignment for more information.
- HW file should be in the format of: ma154-hw#-lname-fname.pdf

**HW advice:** General notes on homework assignments (also see syllabus for policies and suggestions):

- please be neat and organized, which will help me, the grader, and you (in the future) to follow your work.
- write your name on your assignment.
- please include at least the number of the problem, or a summary of the question (which will also be helpful to you in the future to prepare for exams).
- it is strongly recommended that you look through the questions as soon as you get the assignment. This will help you to start thinking about how to solve them!
- for R problems, it is required to use R Markdown (or R Sweave)
- in case of questions, or if you get stuck please don’t hesitate to email me (though I’m much less sympathetic to such questions if I receive emails within 24 hours of the due date for the assignment).

**Homework assignments will be graded** out of 5 points, which are based on a combination of accuracy and effort. Below are rough guidelines for grading.

[5] All problems completed with detailed solutions provided and 75% or more of the problems are fully correct. **Additionally, there are no extraneous messages, warnings, or printed lists of numbers.**

[4] All problems completed with detailed solutions and 50-75% correct; OR close to all problems completed and 75%-100% correct; **OR all problems are completed and there are extraneous messages, warnings, or printed lists of numbers.**

[3] Close to all problems completed with less than 75% correct.

[2] More than half but fewer than all problems completed and > 75% correct.

[1] More than half but fewer than all problems completed and < 75% correct; OR less than half of problems completed.

[0] No work submitted, OR half or less than half of the problems submitted and without any detail/work shown to explain the solutions. You will get a zero if your file is not compiled and submitted on GitHub.
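The rubric above penalizes extraneous messages, warnings, and long printed lists of numbers. One common way to keep these out of a knitted R Markdown file is sketched below; it assumes the file is compiled with `knitr` (the usual engine behind R Markdown), and the vector `x` is made-up data used only for illustration. Note that hiding warnings does not fix whatever caused them, so it is still worth reading them while you work.

```r
# Place this inside a setup chunk near the top of the .Rmd file.
# It applies to every later chunk: messages (e.g. package start-up
# notes) and warnings are no longer written into the compiled output.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Hypothetical data for illustration only.
x <- rnorm(1000)

# Instead of printing all 1000 values, show a short summary
# and only the first few entries.
summary(x)
head(x, 5)
```

The same options can also be set on a single chunk header (for example `message=FALSE`) if only one chunk is noisy.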

**Projects:**

- There will be a semester-long group project. Your task is to use data to tell us something interesting. This project is deliberately open-ended to allow you to fully explore your creativity. There are three main rules that must be followed: (1) data centered, (2) tell us something, (3) do something new. The project information is available here: semester project.

**Computing:**

- GitHub will be used as a way to practice reproducible and collaborative science. There may be a slight learning curve, but knowing Git will be an extremely useful skill as you venture on after this class.
- R will be used for all homework assignments. R is freely available at http://www.r-project.org/ and is already installed on college computers. Additionally, you need to install RStudio (http://rstudio.org/) in order to use R Markdown. If you are not already familiar with R, please work through some of the materials provided ASAP. In particular, http://swirlstats.com/ is a great way to walk through learning the basics of R.
- You are welcome to use Pomona’s RStudio server at https://rstudio.campus.pomona.edu/ (or https://rstudio.pomona.edu if you are off campus). If you use the server, you can connect directly to your Git account without installing Git locally on your own computer. [If you are not a Pomona student, you will need to get an account from Pomona’s ITS. Go to ITS, tell them that you are taking a Pomona course, and ask for an account for using RStudio.]

**Participation:**

- This class will be interactive, and your participation is expected (every day in class). Although notes will be posted, your participation is an integral part of the in-class learning process.
- **In class:** after answering one question, **wait until 5 other people have spoken before answering** another question. [Feel free to **ask as many questions as often as you like**!]
- For each midterm, one point will be given for having done the following (before 10/17 and again before 11/26):
    - log on to Piazza (link will be sent via email)
    - using `reprex`, ask a question about R (can be anonymous to peers, name must be visible to instructor for credit): https://teachdatascience.com/reprex/
    - respond / help a peer who has asked a question about anything (can be anonymous to peers, name must be visible to instructor for credit)

**Course Goals:**

- to be able to work through the entire computational statistics flow chart as a data analyst.
- to be able to use graphical representations of data to communicate ideas about the data.
- to critically evaluate analyses / graphics of data (typically big data or dynamic data).
- to communicate results effectively.

**Academic Honesty:**

You are encouraged to work together on homework assignments. Additionally, the group presentation will require close collaboration with a group of your peers. Everything you turn in must represent your own work. Copying and pasting code (or text) from your colleagues constitutes plagiarism and will not be tolerated. All exams (including take-home) will be closed person. You may not collaborate (discuss, complain, etc.) with other individuals about the exams. Pomona’s academic honesty policy is given below and will be taken seriously.

- Pomona College is an academic community, all of whose members are expected to abide by ethical standards both in their conduct and in their exercise of responsibilities toward other members of the community. The college expects students to understand and adhere to basic standards of honesty and academic integrity. These standards include, but are not limited to, the following:
- In projects and assignments prepared independently, students never represent the ideas or the language of others as their own.
- Students do not destroy or alter either the work of other students or the educational resources and materials of the College.
- Students neither give nor receive assistance in examinations.
- Students do not take unfair advantage of fellow students by representing work completed for one course as original work for another or by deliberately disregarding course rules and regulations.
- In laboratory or research projects involving the collection of data, students accurately report data observed and do not alter these data for any reason.

**Advice:**

- Please feel free to stop by, email, or call if you have any questions about or difficulty with the material, the computing, the projects, or the course. Come see me as soon as possible if you find yourself struggling. The material will build on itself, so it will be much easier to catch up if the concepts get clarified earlier rather than later. Enjoy!

**Grading:**

- 25% Homework
- 45% Midterms
- 25% Group Project & Presentation
- 5% Class Participation
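
As a rough illustration of how the weights above combine, the sketch below computes a weighted course percentage in base R. The function name `course_grade` and the example scores are hypothetical, and the sketch assumes each category has already been converted to a 0-100 percentage; how that conversion is done (for example, after dropping one homework grade) is not specified here.

```r
# Category weights taken from the Grading list (they sum to 1).
weights <- c(homework = 0.25, midterms = 0.45,
             project = 0.25, participation = 0.05)

# Hypothetical helper: combine category percentages (0-100)
# into a single weighted course percentage.
course_grade <- function(scores, w = weights) {
  # Every category must be present exactly once, by name.
  stopifnot(setequal(names(scores), names(w)))
  sum(scores[names(w)] * w)
}

# Made-up scores for illustration only.
example_scores <- c(homework = 90, midterms = 80,
                    project = 85, participation = 100)
course_grade(example_scores)
```

With these made-up scores the weighted total works out to 84.75, since 22.5 + 36 + 21.25 + 5 = 84.75.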