Predicting Car Mileage Using Machine Learning

Poonam Rao
2 min read · Oct 13, 2021


Photo credit: Why Kei.

Data Scientists

This article explores whether a car's year of manufacture and weight are good predictors of its mileage.

Purpose of ML Model
Predict a car's mileage in miles per gallon (mpg) from two features: weight and year of manufacture. A KNN (K-Nearest Neighbors) regression model is used.

About the Dataset
The Auto dataset from the ISLR package in R was used for this analysis. It contains 392 observations of cars with 9 attributes, as follows:

mpg : miles per gallon
cylinders : number of cylinders, between 4 and 8
displacement : in cubic inches
horsepower : engine horsepower
weight : in pounds
acceleration : time to accelerate from 0 to 60 mph, in seconds
year : model year
origin : origin of car as American (1), European (2) or Japanese (3)
name : vehicle name

ML Modeling
- Divide the dataset into train and test sets, approximately 65% and 35%.
- Scale the weight and year columns of the training set, since the two features have very different standard deviations.
- Standardize the same columns of the test set using the mean and standard deviation of the training set.
- Run KNN regression with a k-value of 1. This resulted in an MSE of 15.25.
- Apply 10-fold cross validation, running KNN regression for 50 k-values and computing the mean squared error for each.
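The first four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original analysis, which was done in R: the rows are synthetic values generated on the spot rather than the Auto data, and the helper names (`train_test_split`, `fit_scaler`, `standardize`, `knn_predict`, `mse`) are made up for this example. The point to notice is that the test set is scaled with the training set's mean and standard deviation.

```python
import math
import random
import statistics

def train_test_split(rows, train_frac=0.65, seed=0):
    """Shuffle the rows and split them into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def fit_scaler(X):
    """Column means and sample standard deviations of the given features."""
    cols = list(zip(*X))
    return [statistics.mean(c) for c in cols], [statistics.stdev(c) for c in cols]

def standardize(X, means, sds):
    """Apply (value - mean) / sd column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def knn_predict(X_train, y_train, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return sum(y_train[i] for i in order[:k]) / k

def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic (weight, year, mpg) rows: made-up values, NOT the Auto dataset.
rng = random.Random(42)
rows = []
for _ in range(40):
    weight = rng.uniform(1800, 4800)
    year = rng.randint(70, 82)
    mpg = 45 - 0.007 * weight + 0.5 * (year - 70) + rng.gauss(0, 1.5)
    rows.append((weight, year, mpg))

train, test = train_test_split(rows)
X_train = [[w, yr] for w, yr, _ in train]
y_train = [m for _, _, m in train]
X_test = [[w, yr] for w, yr, _ in test]
y_test = [m for _, _, m in test]

# Fit the scaler on the training set only, then reuse it for the test set.
means, sds = fit_scaler(X_train)
X_train_std = standardize(X_train, means, sds)
X_test_std = standardize(X_test, means, sds)

preds = [knn_predict(X_train_std, y_train, x, k=1) for x in X_test_std]
test_mse = mse(y_test, preds)
print(f"test MSE with k=1: {test_mse:.2f}")
```

Without the scaling step, weight (in the thousands of pounds) would dominate the Euclidean distance and year would have almost no influence on which neighbors are selected.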

A k-value of 17 was chosen since it had the lowest cross-validated MSE, 9.05. A larger k-value averages over more neighbors, which gives a lower-variance model at the cost of higher bias.
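The search over k can be sketched as follows. Again, this is a hedged Python illustration rather than the code behind the numbers above: the data are synthetic and assumed to be already standardized, and `cv_mse` and `knn_predict` are hypothetical helper names. For each k from 1 to 50, the data are split into 10 folds, each fold is predicted from the other nine, and the per-fold MSE values are averaged.

```python
import math
import random

def knn_predict(X_train, y_train, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return sum(y_train[i] for i in order[:k]) / k

def cv_mse(X, y, k, n_folds=10, seed=0):
    """Mean of the per-fold MSE for KNN regression with k neighbors."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        sq = [(y[i] - knn_predict(X_tr, y_tr, X[i], k)) ** 2 for i in held_out]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / n_folds

# Synthetic, already-standardized features and a noisy target:
# made-up values for illustration, NOT the Auto dataset.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
y = [25 - 5 * a + 2 * b + rng.gauss(0, 2) for a, b in X]

# Cross-validated MSE for k = 1..50, then pick the k with the lowest error.
cv_errors = {k: cv_mse(X, y, k) for k in range(1, 51)}
best_k = min(cv_errors, key=cv_errors.get)
print(f"best k: {best_k}, CV MSE: {cv_errors[best_k]:.2f}")
```

Plotting `cv_errors` against k typically shows the error falling quickly from k = 1 and then flattening or rising slowly, which is the trade-off between variance and bias described above.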

Author created image


