Welcome to the course repository for the 2017 offering of 15.S60, Computing in Optimization and Statistics. This is an advanced course offered by and for practicing researchers in fields relating to operations research, computer science, applied mathematics, and computational engineering. For an overview of the course, including logistics and session content, please consult the syllabus. Advanced users may wish to access the course repository directly. Below, you can learn more about each of the sessions and access primary materials.
Introduces foundational concepts and computing tools, including terminal navigation and basic commands; version control with
git and GitHub; and elementary data inspection and manipulation in
Code and Data
This session will introduce basic techniques in data wrangling and visualization in R. Specifically, we will cover some basic tools using out-of-the-box R commands, then introduce the powerful framework of the “tidyverse” (both in wrangling and visualizing data), and finally gain some understanding of the philosophy of this framework to set up deeper exploration of our data. Throughout, we will be using a publicly available dataset of AirBnB listings.
Sessions 2-4 all use a data set of Boston AirBnB rentals, provided courtesy of AirBnB and Kaggle. You can download each of the three components by clicking the links below:
This session introduces basic concepts of machine learning and their implementation in
R. Topics include elementary regression; regularization and model selection; natural language processing; and model diagnostics. Throughout the session, students use data manipulation and exploration skills to visualize and evaluate models.
Data science is rarely cut-and-dried; each analysis typically provides answers but also raises new questions. This makes the data scientific process fundamentally cyclical:
Image credit: Hadley Wickham
A skilled analyst needs to be able to smoothly transition from data manipulation to visualization to modeling and back. In this session, we focus on using the
tidyverse set of packages to smoothly navigate the Cycle of Data Science.
Topics covered include:
While learning these tools, we work a complex case study that will require multiple iterations of manipulation, visualization, and modeling to test a data scientific hypothesis.
Introduces the Julia programming language, elementary optimization in with the JuMP module, and interacting with MIT Sloan’s computing cluster, Engaging.
A discussion of more advanced optimization techniques with Julia and JuMP, including nonlinear and mixed integer techniques.
Introduces key functions in Excel, using concrete examples and case studies. Students learn to handle efficiently a wide variety of problems based on spreadsheet challenges faced by real companies. Topics include elementary formulas, reference types, matrix manipulation, pivot tables, and macros.
Introduces TensorFlow, a state-of-the-art library from Google for working with neural networks and other deep-architecture computations. Students deploy neural networks on a series of challenging machine learning problems, with a focus on classification tasks.