# Exercises

# FIRST SET OF EXERCISES: Linear Regression

# 1. Regress price on review_scores_rating. Plot the regression line and the actual training
# points, and find the out-of-sample RMSE. (Read below for more details if you need them.
# A sketch appears at the bottom of this file.)
# DETAILS:
# - Remove rows where review_scores_rating is NA
# - Create a new train/test split using `resample_partition()`
# - Use `lm()` to learn the linear relationship
# - Use `add_predictions()` and ggplot tools for the plotting
# - Calculate RMSE on the test data

# 2. Try to beat the out-of-sample performance of the price ~ accommodates model by adding
# other variables (see the sketch at the bottom of this file). You can use `names(listings)`
# to explore potential predictors. If you start getting errors or unexpected behavior, make
# sure the predictors are in the format you think they are. You can check this by calling the
# `summary()` and `str()` functions on the relevant columns of `listings`.

# Median Regression. Since we're dealing with data on price, we expect that there will be high
# outliers. While least-squares regression is reliable in many settings, its estimates can
# depend quite a bit on the outliers.
# One alternative, median regression, minimizes *absolute* error rather than squared error.
# This has the effect of estimating the conditional median rather than the conditional mean,
# which makes it more robust to outliers.
# In R, it can be implemented using the `quantreg` package.

# 3. Install the quantreg package, and compare the behavior of the median regression fit
# (using the `rq()` function) to the least squares fit from `lm()` on the original listings
# data, which includes all the price outliers. You can enter `?rq` for info on the rq function.
# (More details: One easy way to compare the two fits is by plotting the two regression
# lines together. You can use the `gather_predictions()` function as we did in class.
# A sketch appears at the bottom of this file.)

# SECOND SET OF EXERCISES: glmnet

# 1. The glmnet package is actually more versatile than just LASSO regression. It also does
# ridge regression (with the l2 norm), and any mixture of LASSO and ridge. The mixture is
# controlled by the parameter alpha: alpha = 1 is the default and corresponds to LASSO,
# alpha = 0 is ridge, and values in between are mixtures of the two. One could use
# cross-validation to choose this parameter as well. For now, try just a few different values
# of alpha on the model we built for LASSO using `cv.glmnet()` (which does not cross-validate
# over alpha automatically). How do the new models do on out-of-sample RMSE? (See the sketch
# at the bottom of this file.)

# THIRD SET OF EXERCISES (time permitting): Classification

# 1. Try to beat the out-of-sample performance of the logistic regression of elevators
# on price by adding new variables. (See the sketch at the bottom of this file.)
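
# EXAMPLE SKETCHES
# The code below gives minimal sketches for the exercises above, not definitive solutions;
# each notes its own assumptions.

# Sketch for Exercise 1 (first set). Assumes `listings` is already loaded and that price is
# a numeric column (if it is a string like "$150.00", convert it first).
library(dplyr)
library(modelr)
library(ggplot2)

listings_clean <- listings %>% filter(!is.na(review_scores_rating))

set.seed(1)  # for a reproducible split
part <- resample_partition(listings_clean, c(train = 0.7, test = 0.3))

fit <- lm(price ~ review_scores_rating, data = part$train)

# Plot the training points together with the fitted regression line
as.data.frame(part$train) %>%
  add_predictions(fit) %>%
  ggplot(aes(x = review_scores_rating)) +
  geom_point(aes(y = price)) +
  geom_line(aes(y = pred), color = "blue")

# Out-of-sample RMSE on the held-out test data
rmse(fit, part$test)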
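
# Sketch for Exercise 2 (first set), reusing the split `part` from above. The extra
# predictors are illustrations only (`bathrooms` is assumed to exist in your listings data);
# rows with NAs in any predictor may need to be filtered out first.
fit2 <- lm(price ~ accommodates + bathrooms + review_scores_rating, data = part$train)
rmse(fit2, part$test)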
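
# Sketch for Exercise 3 (first set). Assumes the model from class regressed price on
# accommodates; swap in whichever formula you used. Fits both models on the full
# `listings` data, outliers included, and plots the two lines together.
library(quantreg)

fit_ls  <- lm(price ~ accommodates, data = listings)
fit_med <- rq(price ~ accommodates, tau = 0.5, data = listings)  # tau = 0.5 is the median

listings %>%
  gather_predictions(least_squares = fit_ls, median = fit_med) %>%
  ggplot(aes(x = accommodates)) +
  geom_point(aes(y = price), alpha = 0.2) +
  geom_line(aes(y = pred, color = model))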
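
# Sketch for the glmnet exercise (second set). Assumes the design matrices x_train/x_test
# and response vectors y_train/y_test were already built for the LASSO model from class
# (e.g. via model.matrix()). The loop only varies alpha; lambda is still chosen by
# cross-validation inside cv.glmnet().
library(glmnet)

set.seed(1)  # cross-validation folds are random
for (a in c(0, 0.25, 0.5, 0.75, 1)) {
  cv_fit <- cv.glmnet(x_train, y_train, alpha = a)
  preds  <- predict(cv_fit, newx = x_test, s = "lambda.min")
  cat("alpha =", a, " test RMSE =", sqrt(mean((y_test - preds)^2)), "\n")
}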
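
# Sketch for the classification exercise (third set). Assumes a logical or 0/1 column
# `elevator` exists in the data (e.g. created in class from the amenities field) and reuses
# the split `part` from above; the added predictor is just an illustration.
fit_base <- glm(elevator ~ price, data = part$train, family = "binomial")
fit_more <- glm(elevator ~ price + accommodates, data = part$train, family = "binomial")

# Compare out-of-sample accuracy at a 0.5 threshold
test_df <- as.data.frame(part$test)
for (fit in list(fit_base, fit_more)) {
  p <- predict(fit, newdata = test_df, type = "response")
  cat("accuracy =", mean((p > 0.5) == test_df$elevator), "\n")
}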