Predicting Iris Flower Species with Machine Learning

About the Dataset

This famous (Fisher’s or Anderson’s) iris data set gives the measurements, in centimeters, of four variables (sepal length, sepal width, petal length, and petal width) for 50 flowers from each of 3 species of iris. The species are Iris setosa, Iris versicolor, and Iris virginica. The data were collected by Edgar Anderson (1935, The irises of the Gaspé Peninsula, Bulletin of the American Iris Society, 59, 2–5).
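As a quick check of the description above, here is a minimal sketch that loads the data and counts the flowers per species. It assumes the copy of the iris data bundled with scikit-learn (`load_iris`); the tooling is an assumption, since the article does not name a library.

```python
from collections import Counter

from sklearn.datasets import load_iris

# Load the copy of the iris data bundled with scikit-learn (assumed source).
iris = load_iris()
X = iris.data      # shape (150, 4): the four measurements in cm
y = iris.target    # integer species codes 0, 1, 2

# Map the integer codes back to species names and count flowers per species.
species = [str(iris.target_names[code]) for code in y]
counts = Counter(species)

print(iris.feature_names)
print(X.shape)
print(dict(counts))
```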

Exploratory Data Analysis

Before we begin clustering, let us visualize the dataset. The following are a few ways to do so.

The graph shows that the setosa species can be linearly separated from the other two, whereas versicolor and virginica overlap.
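The separability claim can also be checked numerically. The sketch below compares the range of a single feature across species. Petal length is an assumption here, chosen for illustration, since the features plotted in the graph are not stated; the data again come from scikit-learn's `load_iris`.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
petal_length = X[:, 2]   # column 2 is petal length (cm)

# Smallest and largest petal length observed for each species.
ranges = {
    str(name): (float(petal_length[y == code].min()),
                float(petal_length[y == code].max()))
    for code, name in enumerate(iris.target_names)
}

for name, (low, high) in ranges.items():
    print(f"{name:<11} petal length: {low:.1f} to {high:.1f} cm")

# Setosa is separable on this feature if its largest value lies below
# the smallest value of both other species.
setosa_separable = ranges["setosa"][1] < min(
    ranges["versicolor"][0], ranges["virginica"][0]
)
# Versicolor and virginica overlap if their ranges intersect.
others_overlap = ranges["versicolor"][1] > ranges["virginica"][0]
print(setosa_separable, others_overlap)
```

A single threshold on one feature is the simplest kind of linear separator, so a gap between the setosa range and the others is enough to confirm the claim for that species.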

The density plot shows the distribution of the observed sepal lengths.
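A density plot is typically drawn from a kernel density estimate. The sketch below computes one for sepal length using SciPy's `gaussian_kde` with its default bandwidth. Both the library and the bandwidth are assumptions, and the curve is computed rather than plotted.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]   # column 0 is sepal length (cm)

# Fit a Gaussian kernel density estimate (default Scott bandwidth).
kde = gaussian_kde(sepal_length)

# Evaluate the estimated density on an evenly spaced grid that extends
# 1 cm beyond the observed range on both sides.
grid = np.linspace(sepal_length.min() - 1.0, sepal_length.max() + 1.0, 200)
density = kde(grid)

# Grid point with the highest estimated density.
peak = float(grid[np.argmax(density)])
print(f"highest density near {peak:.2f} cm")
```

Passing `grid` and `density` to a plotting call such as matplotlib's `plt.plot(grid, density)` would draw the curve.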

Clustering Using K-means
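A minimal sketch of k-means on the four measurements, assuming scikit-learn's `KMeans` with three clusters (one per species). The choice of `n_init`, `random_state`, and the adjusted Rand index as a comparison measure are illustrative assumptions, not settings taken from the article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X, y = iris.data, iris.target

# Three clusters, one per species; the species labels are NOT used for fitting.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Cross-tabulate true species (rows) against cluster ids (columns).
table = np.zeros((3, 3), dtype=int)
for species_code, cluster_id in zip(y, labels):
    table[species_code, cluster_id] += 1
print(table)

# Adjusted Rand index: 1.0 means the clusters match the species exactly.
ari = adjusted_rand_score(y, labels)
print(f"adjusted Rand index: {ari:.2f}")
```

Cluster ids are arbitrary, so the table should be read row by row: a row concentrated in one column means that species was recovered as a single cluster.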

Decision Tree

From the confusion matrix we see that 6 observations are incorrectly classified.

The decision tree gives us 85% accuracy on unseen data.
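The confusion matrix and the accuracy figure are linked: accuracy is the sum of the diagonal divided by the number of test observations. The sketch below shows that workflow with scikit-learn's `DecisionTreeClassifier`. The 70/30 stratified split and the random seed are assumptions, so it should not be expected to reproduce the 6 misclassifications or the 85% reported above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the flowers as unseen test data (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the tree on the training data only, then predict the held-out flowers.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

# Rows are true species, columns are predicted species.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Off-diagonal entries are the misclassified observations.
misclassified = int(cm.sum() - cm.trace())
accuracy = accuracy_score(y_test, y_pred)
print(f"misclassified: {misclassified} of {cm.sum()}")
print(f"accuracy: {accuracy:.2%}")
```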

References

Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179–188.

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (has iris3 as iris.)

Poonam Rao
