Predicting Iris Flower Species with Machine Learning

Nov 12, 2021

Photo Credits:
Christina Brinza. https://unsplash.com/photos/TXmV4YYrzxg

About the Dataset

The famous iris data set (Fisher’s or Anderson’s) gives four measurements in centimeters (sepal length, sepal width, petal length, and petal width) for 50 flowers from each of 3 species of iris, 150 observations in total. The species are Iris setosa, Iris versicolor, and Iris virginica. The data were collected by Edgar Anderson (1935, The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5).

Exploratory Data Analysis

Before we begin clustering, let us visualize the dataset. The following are a few ways to look at the data.

From the graph we can see that setosa can be linearly separated from the other two species, whereas versicolor and virginica overlap.
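
To make the separability claim concrete, here is a minimal sketch in Python (an assumption; the post does not show its code). It uses fifteen rows of the standard iris data, five per species. The 2.5 cm petal-length threshold and all helper names are illustrative choices, not taken from the original analysis.

```python
# Fifteen rows (five per species) from the standard iris data set.
# Columns: sepal length, sepal width, petal length, petal width (cm), species.
SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "setosa"),
    (5.0, 3.6, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (5.5, 2.3, 4.0, 1.3, "versicolor"),
    (6.5, 2.8, 4.6, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
    (6.3, 2.9, 5.6, 1.8, "virginica"),
    (6.5, 3.0, 5.8, 2.2, "virginica"),
]

SEPAL_LENGTH, PETAL_LENGTH, SPECIES = 0, 2, 4
THRESHOLD_CM = 2.5  # hypothetical cut-off, chosen by eye for this sketch


def is_setosa_by_rule(row):
    """A single straight-line rule: short petals mean setosa."""
    return row[PETAL_LENGTH] < THRESHOLD_CM


def value_range(species, column):
    """Smallest and largest value of one column for one species."""
    values = [row[column] for row in SAMPLE if row[SPECIES] == species]
    return min(values), max(values)


# The rule agrees with the true label on every row of the sample.
rule_matches = all(
    is_setosa_by_rule(row) == (row[SPECIES] == "setosa") for row in SAMPLE
)

# Versicolor and virginica share part of their sepal-length range.
versicolor_range = value_range("versicolor", SEPAL_LENGTH)  # (5.5, 7.0)
virginica_range = value_range("virginica", SEPAL_LENGTH)    # (5.8, 7.1)
ranges_overlap = (
    versicolor_range[0] <= virginica_range[1]
    and virginica_range[0] <= versicolor_range[1]
)

print("threshold separates setosa:", rule_matches)
print("versicolor/virginica sepal lengths overlap:", ranges_overlap)
```

One threshold on one feature is enough to isolate setosa in this sample, while no such cut on sepal length splits the other two species.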

Density Plot

The density plot shows how the observed sepal lengths are distributed.
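
A density plot is typically a kernel density estimate: each observation contributes a small bump, and the bumps are summed into a smooth curve. The sketch below shows the idea in plain Python on fifteen sepal lengths from the standard iris data. The bandwidth of 0.4, the grid, and the function names are assumptions made for illustration; the plotting tool behind the original figure may use different defaults.

```python
import math

# Sepal lengths (cm) of fifteen flowers, five per species, from the iris data.
SEPAL_LENGTHS = [
    5.1, 4.9, 4.7, 4.6, 5.0,   # setosa
    7.0, 6.4, 6.9, 5.5, 6.5,   # versicolor
    6.3, 5.8, 7.1, 6.3, 6.5,   # virginica
]


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


density = gaussian_kde(SEPAL_LENGTHS, bandwidth=0.4)  # bandwidth is an assumption

# Crude text rendering of the curve between 4 cm and 8 cm.
for i in range(17):
    x = 4.0 + 0.25 * i
    print(f"{x:4.2f} cm  {'#' * round(40 * density(x))}")

# Sanity check: a density should integrate to about 1 (Riemann sum, 2 cm to 10 cm).
STEP = 0.01
area = sum(density(2.0 + STEP * i) * STEP for i in range(800))
print("area under the curve:", round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual observations, and a larger one gives a smoother curve that can hide separate peaks.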

Clustering Using K-means
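
K-means alternates between assigning each flower to its nearest cluster center and moving each center to the mean of its assigned flowers, until the assignments stop changing. As a minimal sketch of that loop (not the code used for this post), the example below runs k-means with k = 3 on fifteen rows of the standard iris data. The deterministic farthest-point initialization, the sample, and all function names are assumptions made so that the example is reproducible.

```python
from collections import Counter

# Fifteen rows (five per species): sepal length, sepal width,
# petal length, petal width, all in cm.
POINTS = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (4.6, 3.1, 1.5, 0.2), (5.0, 3.6, 1.4, 0.2),            # setosa
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
    (5.5, 2.3, 4.0, 1.3), (6.5, 2.8, 4.6, 1.5),            # versicolor
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1),
    (6.3, 2.9, 5.6, 1.8), (6.5, 3.0, 5.8, 2.2),            # virginica
]
SPECIES = ["setosa"] * 5 + ["versicolor"] * 5 + ["virginica"] * 5


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from all centers chosen so far (deterministic, no randomness)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


labels, centers = kmeans(POINTS, k=3)

# Cross-tabulate cluster labels against the true species.
for (cluster, species), count in sorted(Counter(zip(labels, SPECIES)).items()):
    print(f"cluster {cluster}: {count} x {species}")
```

On this sample the setosa rows end up in a cluster of their own. The remaining two clusters need not line up with versicolor and virginica, which matches the overlap seen during exploration. K-means never sees the species labels, so the cross-tabulation is only a check made afterwards.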

Decision Tree

From the confusion matrix, we see that 6 observations are incorrectly classified.

The decision tree gives us 85% accuracy on unseen data.
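
Accuracy is the share of observations on the diagonal of the confusion matrix. The short sketch below shows that arithmetic. The two reported figures (6 errors, 85% accuracy) would be consistent with a test set of 40 observations, but that size is inferred rather than stated in the post, and the 3x3 matrix is made up purely to illustrate the calculation.

```python
def accuracy_from_confusion(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Figures reported in the text.
reported_errors = 6
reported_accuracy = 0.85

# If both figures describe the same test set, its size follows directly:
# errors = (1 - accuracy) * n, so n = errors / (1 - accuracy).
implied_test_size = round(reported_errors / (1 - reported_accuracy))
print("implied test set size:", implied_test_size)

# HYPOTHETICAL confusion matrix (rows: actual, columns: predicted) with
# 6 off-diagonal entries out of 40. These counts are invented for
# illustration and are NOT the post's results.
HYPOTHETICAL_MATRIX = [
    [13, 0, 0],   # setosa
    [0, 11, 3],   # versicolor
    [0, 3, 10],   # virginica
]
print("accuracy:", accuracy_from_confusion(HYPOTHETICAL_MATRIX))
```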

References

Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179–188.

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (has iris3 as iris.)

Written by Poonam Rao

Exec Director StratEx - I bring to the table a blend of data science, finance and strategy management skills, with 20+ years of experience in insurance & fintech.
