Computing in Optimization and Statistics 2017

Offered by the Operations Research Center at MIT

This project is maintained by PhilChodrow

About

Welcome to the course repository for the 2017 offering of 15.S60, Computing in Optimization and Statistics. This is an advanced course offered by and for practicing researchers in fields relating to operations research, computer science, applied mathematics, and computational engineering. For an overview of the course, including logistics and session content, please consult the syllabus. Advanced users may wish to access the course repository directly. Below, you can learn more about each of the sessions and access primary materials.

In 2017, 15.S60 was organized by Brad Sturt and Phil Chodrow.

Course Materials

1. Introduction – Jackie Baek and Brad Sturt

Introduces foundational concepts and computing tools, including terminal navigation and basic commands; version control with git and GitHub; and elementary data inspection and manipulation in R.

Slides

  1. Introduction to Terminal
  2. Introduction to git and GitHub

Code and Data

2. Data Wrangling and Visualization – Steve Morse

This session introduces basic techniques for data wrangling and visualization in R. We first cover some basic tools using out-of-the-box R commands, then introduce the “tidyverse” framework for both wrangling and visualizing data, and finally discuss the philosophy behind this framework to set up deeper exploration of our data. Throughout, we use a publicly available dataset of Airbnb listings.

Sessions 2-4 all use a data set of Boston Airbnb rentals, provided courtesy of Airbnb and Kaggle. You can download each of the three components by clicking the links below:

  1. Session Notes.
  2. Practice script, and a version with the code filled in.
  3. Exercises and solutions.

3. Statistical Modeling and Machine Learning – Clark Pixton and Colin Pawlowski

This session introduces basic concepts of machine learning and their implementation in R. Topics include elementary regression; regularization and model selection; natural language processing; and model diagnostics. Throughout the session, students use data manipulation and exploration skills to visualize and evaluate models.
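As a rough illustration of how regularization and model selection fit together, the sketch below fits a one-feature ridge regression for several penalty values and picks the penalty with the lowest error on held-out data. It is written in plain Python rather than the R used in the session, and it is not taken from the course materials: the data values, the penalty grid, and the helper names `ridge_fit` and `mse` are all assumptions made for this example.

```python
# Toy illustration of regularization and model selection (not course code).
# All data values and the candidate penalty grid are made up for illustration.

def ridge_fit(xs, ys, penalty):
    """One-feature ridge fit with an unpenalized intercept.

    Minimizes sum((y - a - b*x)^2) + penalty * b^2, which gives
    b = Sxy / (Sxx + penalty) and a = mean(y) - b * mean(x).
    """
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)
    intercept = y_bar - slope * x_bar
    return slope, intercept

def mse(xs, ys, slope, intercept):
    """Mean squared prediction error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training and validation sets.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [1.2, 1.9, 3.2, 3.9]
valid_x = [5.0, 6.0]
valid_y = [5.1, 5.8]

# Fit one model per candidate penalty, then score each on the validation set.
penalties = [0.0, 0.1, 1.0, 10.0]
fits = {p: ridge_fit(train_x, train_y, p) for p in penalties}
valid_errors = {p: mse(valid_x, valid_y, *fits[p]) for p in penalties}

# Model selection: keep the penalty with the smallest validation error.
best_penalty = min(valid_errors, key=valid_errors.get)
```

Larger penalties shrink the fitted slope toward zero, and the validation error decides how much shrinkage is worth keeping.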

Presentation Materials

R Code

4. Advanced Topics in Data Science – Phil Chodrow

Motivation

Data science is rarely cut-and-dried; each analysis typically provides answers but also raises new questions. This makes the data scientific process fundamentally cyclical:

The Data Science Pipeline

Image credit: Hadley Wickham

A skilled analyst needs to be able to move smoothly from data manipulation to visualization to modeling and back. In this session, we focus on using the tidyverse set of packages to navigate this cycle of data science.

Learning Objectives

Topics covered include:

  1. Reinforcement of Session 2 tools like dplyr and tidyr.
  2. Efficient, tidy iteration with purrr and map.
  3. Tidy model inspection and selection with broom.

While learning these tools, we work through a complex case study that requires multiple iterations of manipulation, visualization, and modeling to test a data scientific hypothesis.
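The iteration pattern behind these tools (split the data by group, map one model-fitting function over every group, then collect the results into one tidy table) can be sketched in plain Python. This is only an analog of the R workflow with purrr and broom, not the session's code: the records, the field names, and the helper `fit_line` are hypothetical.

```python
# Python analog of the split -> map -> tidy pattern; not the session's R code.
# The records and field names below are hypothetical.
from collections import defaultdict

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

records = [
    {"group": "A", "x": 1.0, "y": 2.0},
    {"group": "A", "x": 2.0, "y": 4.0},
    {"group": "A", "x": 3.0, "y": 6.0},
    {"group": "B", "x": 1.0, "y": 5.0},
    {"group": "B", "x": 2.0, "y": 4.0},
    {"group": "B", "x": 3.0, "y": 3.0},
]

# Step 1 (manipulate): split the records by group.
by_group = defaultdict(list)
for r in records:
    by_group[r["group"]].append((r["x"], r["y"]))

# Step 2 (model): map the same fitting function over every group.
models = {g: fit_line(pts) for g, pts in by_group.items()}

# Step 3 (tidy): one flat row per group, ready for plotting or filtering.
tidy_rows = [{"group": g, **m} for g, m in sorted(models.items())]
```

Keeping the fitted results as one flat row per group makes it easy to feed them back into the next round of manipulation or visualization.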

Materials

5. Introduction to Julia and JuMP, Linear Optimization, and Engaging – Sebastien Martin and Joey Huchette

Introduces the Julia programming language, elementary optimization with the JuMP module, and interacting with MIT Sloan’s computing cluster, Engaging.

Materials

6. Nonlinear and Integer Optimization in JuMP – Miles Lubin and Yee Sian Ng

A discussion of more advanced optimization techniques with Julia and JuMP, including nonlinear and mixed integer techniques.

Materials

  1. Introduction - Notebook and solutions
  2. Using Callbacks - Notebook and solutions
  3. Nonlinear optimization - Notebook

7. Excel for Operations Research – Charlie Thraves

Introduces key functions in Excel, using concrete examples and case studies. Students learn to efficiently handle a wide variety of problems based on spreadsheet challenges faced by real companies. Topics include elementary formulas, reference types, matrix manipulation, pivot tables, and macros.

Materials

  1. Shortcuts and References
  2. Session Workbooks

8. Deep Learning in TensorFlow Python – Eli Gutin

Introduces TensorFlow, a state-of-the-art library from Google for working with neural networks and other deep-architecture computations. Students deploy neural networks on a series of challenging machine learning problems, with a focus on classification tasks.

Materials

  1. Obtaining TensorFlow
  2. Slides
  3. Elementary Nonlinear Classification - exercises and solutions
  4. Teaching a Computer Binary - exercises and solutions
  5. Handwritten Digit (MNIST) Classification - exercises and solutions