Predicting Iris Flower Species with Machine Learning

Poonam Rao
3 min read · Nov 12, 2021


Photo credit: Christina Brinza.

About the Dataset

The famous iris data set (Fisher’s or Anderson’s) gives the sepal length, sepal width, petal length, and petal width, measured in centimeters, for 50 flowers from each of 3 species of iris: Iris setosa, Iris versicolor, and Iris virginica. The data were collected by Edgar Anderson (1935, The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5).

Exploratory Data Analysis

Before we begin clustering, let us visualize the dataset. Here are a few ways to do so.

From the graph we can see that the setosa species can be linearly separated from the other two, whereas versicolor and virginica overlap.
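To make the idea of linear separability concrete, here is a minimal sketch on a handful of rows from the data set. It assumes petal length is the feature being compared (the original graph may have used other measurements), and the cut-off of 2.5 cm is a hypothetical value picked by eye, not something taken from the analysis above.

```python
# A hand-picked sample of iris rows: (petal length in cm, species).
# This is a small subset for illustration, not the full 150 observations.
SAMPLE = [
    (1.4, "setosa"), (1.4, "setosa"), (1.3, "setosa"), (1.5, "setosa"),
    (4.7, "versicolor"), (4.5, "versicolor"), (4.9, "versicolor"), (5.1, "versicolor"),
    (6.0, "virginica"), (5.1, "virginica"), (5.9, "virginica"), (4.5, "virginica"),
]

# Hypothetical cut-off chosen by eye: an assumption, not a fitted value.
PETAL_LENGTH_CUTOFF = 2.5


def is_setosa(petal_length_cm):
    """Classify a flower as setosa using a single threshold on petal length."""
    return petal_length_cm < PETAL_LENGTH_CUTOFF


def value_range(species):
    """Return (min, max) petal length for one species in the sample."""
    values = [length for length, name in SAMPLE if name == species]
    return min(values), max(values)


# Count flowers for which the one-line rule disagrees with the true label.
errors = sum(is_setosa(length) != (name == "setosa") for length, name in SAMPLE)
print("misclassified by the single cut-off:", errors)

# The setosa range sits well below the others, while the versicolor and
# virginica ranges share values, so no single cut-off can split those two.
for species in ("setosa", "versicolor", "virginica"):
    print(species, value_range(species))
```

On this sample the single threshold isolates setosa, while the versicolor and virginica ranges intersect, which is the overlap described above.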

Density Plot

The density plot shows the distribution of sepal lengths across the observations.
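A density plot is typically a kernel density estimate: each observation contributes a small bell curve, and the curves are averaged. The sketch below shows that calculation with Gaussian kernels on the sepal lengths of the first five rows of each species. The bandwidth of 0.4 is an assumed value for illustration, and the text rendering merely stands in for a real plot.

```python
import math

# Sepal lengths (cm) from the first five rows of each species: a small
# subset of the data set, used only to keep the example short.
SEPAL_LENGTHS = [
    5.1, 4.9, 4.7, 4.6, 5.0,   # setosa
    7.0, 6.4, 6.9, 5.5, 6.5,   # versicolor
    6.3, 5.8, 7.1, 6.3, 6.5,   # virginica
]

BANDWIDTH = 0.4  # assumed smoothing width in cm, chosen for illustration


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


density = gaussian_kde(SEPAL_LENGTHS, BANDWIDTH)

# Crude text rendering: one row per grid point, bar length proportional
# to the estimated density at that sepal length.
for i in range(17):
    x = 4.0 + 0.25 * i
    print(f"{x:4.2f} {'#' * round(40 * density(x))}")
```

A smaller bandwidth gives a bumpier curve that follows individual observations, and a larger one gives a smoother curve that can hide separate peaks.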

Clustering Using K-means
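As a rough sketch of what K-means does, the code below alternates between assigning each flower to its nearest centroid and moving each centroid to the mean of its members. It is not the code used for this article: the data is only the petal length and width of the first five rows of each species, and the farthest-first choice of starting centroids is an assumption made here to keep the run deterministic.

```python
# (petal length, petal width) in cm for the first five rows of each species.
POINTS = [
    (1.4, 0.2), (1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (1.4, 0.2),   # setosa
    (4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (4.0, 1.3), (4.6, 1.5),   # versicolor
    (6.0, 2.5), (5.1, 1.9), (5.9, 2.1), (5.6, 1.8), (5.8, 2.2),   # virginica
]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centroids(points, k):
    """Deterministic start: take the first point, then repeatedly add the
    point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iterations=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


labels, centroids = k_means(POINTS, k=3)
print(labels)
print(centroids)
```

On this tiny sample the three clusters line up with the three species. On the full data set, the overlap between versicolor and virginica noted earlier means some flowers of those two species can be expected to land in the wrong cluster.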

Decision Tree

The confusion matrix shows that 6 observations are misclassified.

The decision tree achieves 85% accuracy on unseen (held-out) data.
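The two figures are linked: accuracy is the sum of the diagonal of the confusion matrix divided by the total number of observations. The sketch below works through that arithmetic. The size of the hold-out set is not stated here, so the 40 flowers are an assumption chosen because 6 errors out of 40 gives exactly 85%. The labels and the placement of the errors are hypothetical too and are not the actual predictions.

```python
SPECIES = ["setosa", "versicolor", "virginica"]


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of observations on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical hold-out set of 40 flowers (an assumption, not real results).
# The 6 errors are placed between versicolor and virginica for illustration.
actual = ["setosa"] * 14 + ["versicolor"] * 13 + ["virginica"] * 13
predicted = (
    ["setosa"] * 14
    + ["versicolor"] * 10 + ["virginica"] * 3
    + ["virginica"] * 10 + ["versicolor"] * 3
)

matrix = confusion_matrix(actual, predicted, SPECIES)
misclassified = sum(sum(row) for row in matrix) - sum(
    matrix[i][i] for i in range(len(SPECIES))
)

for name, row in zip(SPECIES, matrix):
    print(f"{name:<12}{row}")
print("misclassified:", misclassified)
print("accuracy:", accuracy(matrix))
```

Reading the off-diagonal cells also shows which species are being confused with each other, something the single accuracy figure hides.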


Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179–188.

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (has iris3 as iris.)


