Fall 2017 Math 154 Homework Schedule

Computational Statistics

Jo Hardin
2351 Millikan
jo.hardin@pomona.edu
Office Hours: Tuesday 1:30-4pm, Thursday 10am-noon, or by appointment

Mentor Sessions: Noah Keshner, Chris Donnay, and Neel Kumar
Sunday 7-10pm & Tuesday 8-10pm
Millikan 1021 (Emmy Noether Room)

Texts:
Required: An Introduction to Statistical Learning (ISL); James, Witten, Hastie, Tibshirani (freely available: http://www-bcf.usc.edu/~gareth/ISL/)

Recommended: Modern Data Science with R (MDS); Baumer, Kaplan, and Horton (free chapters and other information at: https://github.com/beanumber/mdsr and http://mdsr-book.github.io/)

Recommended: Visual and Statistical Thinking (VST): Displays of Evidence for Making Decisions; Tufte (http://www.edwardtufte.com/tufte/books_textb)

Website for: Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving; Nolan and Temple Lang (http://rdatasciencecases.org/)

Homework:

  • Homework will be assigned from the text with some additional problems. One homework grade will be dropped. Homework will be done using the statistical software package R and posted on GitHub. All homework must be done in R Markdown (or R Sweave if you want to use LaTeX). Homework is due on GitHub by midnight on Wednesdays. Non-homework activities (e.g., from the text) may be collected and added to your participation grade.
    • HW should be turned in to your GitHub repository by Wednesday at noon.
    • Always post both a PDF and R Markdown (or Sweave) file, unless otherwise requested.
    • HW is graded on a scale of 5/4/3/2/1. See the first HW assignment for more information.
    • HW files should be named in the format: ma154-hw#-lname-fname.pdf

Important Dates:

  • 10/11/17 Exam 1
  • 10/20/17 (Friday) Take home 1 due (on GitHub by 5pm)
  • 10/23/17 Initial Project Proposal due
  • 10/30/17 Final Project Proposal due
  • 11/13/17 Data Science Panel
  • 11/15/17 Exam 2
  • 11/20/17 Project Update due
  • 11/22/17 Take home 2 due (on GitHub by 5pm)
  • 12/12/17 and/or 12/15/17 Group Presentations (during the final exam time, 2-5pm)

Handouts:

Date Topic / Chapter    Handouts Links
Wed 8/30 data science & statistics
(ISL 1)
  • Great algorithm for the whole process

http://algorithms-tour.stitchfix.com/

  • R vs. Python?  (My personal opinion is that neither of the languages is “best”.)

http://www.datasciencecentral.com/profiles/blogs/data-science-wars-r-versus-python

  • PNAS paper retracted due to problems with figure and reproducibility (April 2016):

http://cardiobrief.org/2016/04/06/pnas-paper-by-prominent-cardiologist-and-dean-retracted/

  • Analysis of Trump’s tweets with evidence that someone else tweets from his account using an iPhone

http://varianceexplained.org/r/trump-tweets/

http://varianceexplained.org/r/trump-followup/

Mon 9/4 visualization
(VST & optional: MDS 2)
  • See Something or Say Something

https://www.flickr.com/photos/walkingsf/sets/72157627140310742/

  • Global terrorism trends (created by students at Grinnell)

http://rstudio.grinnell.edu/Global_Terrorism_Plots/

http://rstudio.grinnell.edu/Global_Terrorism_Map_Basic/

  • Census trends visualized:

http://www.census.gov/dataviz/visualizations/055/

  • Visualization Internship (summer 2016) at 538:

http://fivethirtyeight.com/features/fivethirtyeight-is-hiring-a-data-visualization-intern-for-summer-2016/

  • Best Data Visualizations

http://www.visualisingdata.com/2017/07/10-significant-visualisation-developments-january-june-2017/

  • A new NYT column on visualizations

https://www.nytimes.com/column/whats-going-on-in-this-graph?

  • Studies about visualizations and perception

https://medium.com/@kennelliott/39-studies-about-human-perception-in-30-minutes-4728f9e31a73

Mon 9/11 data wrangling
(MDS 4)
 
Mon 9/18 simulating
(optional: MDS 8)
  • Simulating who will be in the first GOP debate (NYT 7/29/15)

http://www.nytimes.com/interactive/2015/07/21/upshot/election-2015-the-first-gop-debate-and-the-role-of-chance.html

Mon 9/25 permutation tests
  • The algorithm that could end partisan gerrymandering

https://www.youtube.com/watch?v=gRCZR_BbjTo&t=125s

Mon 10/2 bootstrapping
(ISL 5)
 
Mon 10/9 ethics
(MDS 6)
  • When algorithms discriminate:

https://www.nytimes.com/2015/07/10/upshot/when-algorithms-discriminate.html?mcubz=0&_r=0

  • Is special education racist?

https://www.nytimes.com/2015/06/24/opinion/is-special-education-racist.html?mcubz=0

Wed 10/11 exam 1
Mon 10/16 fall break
Fri 10/20 take home 1 due (noon)
  • What is it that we can learn (or not) from statistical models & machine learning? Series of podcasts by Hilary Parker (PO ’08) and Roger Peng.

https://soundcloud.com/nssd-podcast/episode-4-a-gajillion-time-series/

Mon 10/23 initial project proposal due
k-NN, ROC, trees
(ISL 4, 5, 8)
  • Why the Bronx really burned: “adjusting” data to give the wrong information

http://fivethirtyeight.com/datalab/why-the-bronx-really-burned/

  • SF vs. NYC housing (trees)

http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

Mon 10/30 final project proposal due
bagging, random forests
(ISL 8)
  • The end of science:

http://www.wired.com/2008/06/pb-theory/

  • Maybe not so fast:

http://simplystatistics.org/2014/05/22/10-things-statistics-taught-us-about-big-data-analysis/

Mon 11/6 support vector machines
(ISL 9)
  • see section 9.6 in ISL for SVM code
  • data science panel: Argue Colloquium Room, 1:15-2:30pm
  • ROC curve of science

http://simplystatistics.org/2013/08/01/the-roc-curves-of-science/

Mon 11/13 Monday: Dataing in the Real World
Wednesday: exam 2
  • Data Science jobs in high demand:

https://www.bloomberg.com/news/articles/2017-08-21/here-s-a-retail-job-that-s-still-in-high-demand-data-scientist

  • SAMSI: Workshop on Distributed Data Analysis with Applications in Finance and Healthcare (March 2016)

http://www.samsi.info/workshop/workshop-distributed-data-analysis-applications-finance-and-healthcare-march-21-22-2016

  • 2016 Statistical Sciences Symposium on Statistical Machine Learning: Theory and Methods, UC Davis, April 23, 2016

http://www.stat.ucdavis.edu/seminars/conferences/index.html

  • UCLA Datafest (May 4-6, 2017)

http://fivethirtyeight.com/datalab/the-students-most-likely-to-take-our-jobs/

http://datafest.stat.ucla.edu/

Mon 11/20 Monday: project update due
Wednesday: take home 2 due
  • The Statistics Identity Crisis

https://www.youtube.com/watch?v=JLs01Z5baSU

  • Write your own R package

https://stat545-ubc.github.io/packages00_index.html

Mon 11/27 clustering
(ISL 10)
Mon 12/4 text analysis? EM algorithm?
Tue 12/12 & Fri 12/15, 2-5pm
Group Presentations (schedule to be arranged)
Previous Project Shiny Apps: