Computational Statistics
Math 154 HW Schedule, Fall 2017
Jo Hardin
2351 Millikan
jo.hardin@pomona.edu
Office Hours: Tuesday 1:30-4pm, Thursday 10am-noon, or by appointment
Mentor Sessions: Noah Keshner & Chris Donnay & Neel Kumar
Sunday 7-10pm & Tuesday 8-10pm
Millikan 1021 (Emmy Noether Room)
Texts:
Required: An Introduction to Statistical Learning (ISL); James, Witten, Hastie, Tibshirani (freely available: http://www-bcf.usc.edu/~gareth/ISL/)
Recommended: Modern Data Science (MDS) with R; Baumer, Kaplan, and Horton (free chapters and other information at: https://github.com/beanumber/mdsr and http://mdsrbook.github.io/)
Recommended: Visual and Statistical Thinking (VST): Displays of Evidence for Making Decisions; Tufte (http://www.edwardtufte.com/tufte/books_textb)
Website for: Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving; Nolan and Temple Lang (http://rdatasciencecases.org/)
Homework:
 Homework will be assigned from the text with some additional problems. One homework grade will be dropped. Homework will be done using the statistical software package R and posted on GitHub. All homework must be done in R Markdown (or R Sweave if you want to use LaTeX). Homework will be due on Wednesdays by midnight to GitHub. Non-homework activities (e.g., from the text) may be collected and added to your participation grade.
 HW should be turned in to your GitHub repository by Wednesday at noon.
 Always post both a PDF and R Markdown (or Sweave) file, unless otherwise requested.
 HW is graded on a scale of 5/4/3/2/1. See the first HW assignment for more information.
 HW file should be in the format of: ma154hw#lnamefname.pdf
Important Dates:
 10/11/17 Exam 1
 10/20/17 (Friday) Take home 1 due (on GitHub by 5pm)
 10/23/17 Initial Project Proposal due
 11/3/17 Final Project Proposal due
 11/13/17 Data Science Panel
 11/20/17 Project Update due
 11/22/17 Take home 2 due (on GitHub by midnight)
 11/29/17 Exam 2
 12/8/17 or 12/12/17 or 12/15/17 Group Presentations (2-5pm)
Handouts:
 Git help: http://happygitwithr.com/
 R documentation / help
 Google for R: http://www.rseek.org/
 R tutorial
 An Introduction to R, Venables & Smith
 R Language Definition, R Core Team
 Another tutorial, with exercises & solutions
 Mosaic Reference Guide (requires installing the mosaic package)
 A Student’s Guide to R; Horton, Pruim, Kaplan (click on “Raw” to download)
 Clicker Questions
 WU1 – wrangling 1
 WU2 – wrangling 2
 WU3 – simulating CIs
 WU4 – hypothesis testing
 WU5 – pruning trees
 WU6 – support vector machines
Date  Topic / Chapter  Handouts  Links 
Wed 8/30  data science & statistics (ISL1) 
http://algorithmstour.stitchfix.com/
https://www.youtube.com/watch?v=s3JldKoA0zw&feature=youtu.be
http://www.datasciencecentral.com/profiles/blogs/datasciencewarsrversuspython
http://cardiobrief.org/2016/04/06/pnaspaperbyprominentcardiologistanddeanretracted/


Mon 9/4  visualization (VST & optional: MDS 2) 
https://www.flickr.com/photos/walkingsf/sets/72157627140310742/
http://rstudio.grinnell.edu/Global_Terrorism_Plots/ http://rstudio.grinnell.edu/Global_Terrorism_Map_Basic/
http://www.census.gov/dataviz/visualizations/055/
http://www.visualisingdata.com/2017/07/10significantvisualisationdevelopmentsjanuaryjune2017/
https://www.nytimes.com/column/whatsgoingoninthisgraph?
https://medium.com/@kennelliott/39studiesabouthumanperceptionin30minutes4728f9e31a73 

Mon 9/11  data wrangling (MDS 4) 

Mon 9/18  simulating (optional: MDS 8) 


Mon 9/25  permutation tests 
https://www.youtube.com/watch?v=5Dnw46eC0o


Mon 10/2  bootstrapping (ISL 5) 

Mon 10/9  catch-up & review  
Wed 10/11  exam 1  
Mon 10/16  fall break / ethics
(MDS 6) 
https://www.nytimes.com/2015/07/10/upshot/whenalgorithmsdiscriminate.html?mcubz=0&_r=0
https://www.nytimes.com/2015/06/24/opinion/isspecialeducationracist.html?mcubz=0 

Friday 10/20  take home 1 due (noon) 
https://soundcloud.com/nssdpodcast/episode4agajilliontimeseries/ 

Mon 10/23  initial project proposal due
knn, ROC, trees 
http://fivethirtyeight.com/datalab/whythebronxreallyburned/


Mon 10/30  final project proposal due (Friday 11/3)
bagging, random forests 
http://www.wired.com/2008/06/pbtheory/
http://simplystatistics.org/2014/05/22/10thingsstatisticstaughtusaboutbigdataanalysis/ 

Mon 11/6  support vector machines (ISL 9) 

http://simplystatistics.org/2013/08/01/theroccurvesofscience/ 
Mon 11/13  Monday: Dataing in the Real World

http://www.stat.ucdavis.edu/seminars/conferences/index.html
http://fivethirtyeight.com/datalab/thestudentsmostlikelytotakeourjobs/ 

Monday 11/20  Monday: project update due
Wednesday: take home 2 due 
https://www.youtube.com/watch?v=JLs01Z5baSU
https://stat545ubc.github.io/packages00_index.html


Mon 11/27  clustering (ISL 10)
Wednesday: exam 2 


Mon 12/4  text analysis? EM algorithm? 


Friday 12/8 & Tuesday 12/12 & Friday 12/15
2-5pm 
Group Presentations (schedule to be arranged)  Some project examples / ideas:
