Predicting Car Mileage Using Machine Learning
Audience
Data Scientists
Purpose
This article explores whether a car's model year and weight are good predictors of its mileage.
Purpose of ML Model
Predict a car's mileage, in miles per gallon, from its weight and model year using a KNN (k-nearest neighbors) regression model.
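KNN regression predicts a target value by averaging the targets of the k training points closest to the query point. The original analysis was done in R, so the following is only a minimal pure-Python illustration of the idea, not the author's code. It assumes Euclidean distance on already-standardized features. The function name `knn_regress` is hypothetical.

```python
import math

def knn_regress(train_X, train_y, query, k):
    """Predict a target for `query` as the mean target of its k nearest
    training points under Euclidean distance (hypothetical helper)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's distance to the query with its target, then sort by distance.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    # Average the targets of the k closest points.
    return sum(y for _, y in dists[:k]) / k
```

With k = 1 the prediction is simply the mpg of the single nearest car. Larger k smooths the prediction over more neighbors.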
About the Dataset
The Auto dataset from the ISLR package in R was used for this analysis. It contains 392 observations of cars with the following 9 attributes:
mpg : miles per gallon
cylinders : number of cylinders, between 4 and 8
displacement : engine displacement in cubic inches
horsepower : engine horsepower
weight : vehicle weight in pounds
acceleration : time to accelerate from 0 to 60 mph, in seconds
year : model year (modulo 100, e.g. 70 for 1970)
origin : origin of car as American (1), European (2) or Japanese (3)
name : vehicle name
ML Modeling
- Split the dataset into training and test sets containing approximately 65% and 35% of the observations.
- Standardize the weight and year columns of the training set. The two features have very different standard deviations, and KNN is distance-based, so unscaled weight would dominate the distance.
- Standardize the test set columns using the mean and standard deviation of the training set, not the test set's own statistics.
- Run KNN regression with k = 1. This gave a mean squared error (MSE) of 15.25 on the test set.
- Apply 10-fold cross-validation to KNN regression for 50 values of k and compute the mean squared error for each.
Analysis
A k value of 17 was chosen because it gave the lowest cross-validated MSE, 9.05. A larger k averages over more neighbors, which yields a lower-variance (but higher-bias) model.
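Choosing k by 10-fold cross-validation can be sketched as follows. Like the sketch above, this is only an illustration in Python, not the author's R code. It makes three assumptions:

* Fold MSEs are averaged into one score per k.
* The same shuffled folds (fixed seed) are reused for every k, so the scores are comparable.
* Features are re-standardized within each fold using that fold's training portion.

Function names are hypothetical.

```python
import math
import random
import statistics

def knn_regress(train_X, train_y, query, k):
    """Mean target of the k nearest training points (Euclidean distance)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    return sum(y for _, y in dists[:k]) / k

def standardize(train_X, other_X):
    """Scale both sets with the training portion's column means and sample SDs."""
    cols = list(zip(*train_X))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]

    def scale(X):
        return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in X]

    return scale(train_X), scale(other_X)

def cv_mse(X, y, k, folds=10, seed=1):
    """Average validation MSE of KNN regression across `folds` folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_errors = []
    for f in range(folds):
        val = idx[f::folds]  # every folds-th shuffled index forms one fold
        val_set = set(val)
        trn = [i for i in idx if i not in val_set]
        Xtr, Xval = standardize([X[i] for i in trn], [X[i] for i in val])
        ytr = [y[i] for i in trn]
        preds = [knn_regress(Xtr, ytr, q, k) for q in Xval]
        fold_errors.append(statistics.mean((p - y[i]) ** 2 for p, i in zip(preds, val)))
    return statistics.mean(fold_errors)

def best_k(X, y, k_values, folds=10, seed=1):
    """Return (k, cv_mse) for the k with the lowest cross-validated MSE."""
    scores = {k: cv_mse(X, y, k, folds, seed) for k in k_values}
    k = min(scores, key=scores.get)
    return k, scores[k]
```

Calling `best_k(X, y, range(1, 51))` on the scaled weight/year features and mpg targets would scan 50 values of k. The specific result reported above (k = 17, MSE 9.05) comes from the original analysis, not from this sketch.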