Text, Sentiment and Emotion Mining for a Women’s eCommerce Clothing Store

Poonam Rao
Nov 18, 2021

Text Analytics and Visualization Purpose

This analysis for a Women’s eCommerce Clothing store primarily explores the sentiments and emotions expressed in customer reviews. These insights would enable the store manager, key stakeholders and employees to take corrective action in areas of negative sentiment and to sustain or optimize the activities that result in positive feedback.

Overall, understanding customer sentiments can have a positive effect on customer service, product development, support, and operations, ultimately helping the business grow and stay relevant.

What is Text Analytics?

Text Analytics, Text Mining and Text Analysis are interchangeable terms. Text analysis is a machine learning technique that helps mine enormous volumes of data efficiently, in a scalable, unbiased, and consistent fashion, to extract valuable insights, trends, and patterns. Text mining leverages statistical pattern learning to obtain these insights. Backed by visualizations, they help determine the best course of action and support informed decisions.

What are some possible use cases?

In this Big Data era, businesses that automate text analysis have an edge over the competition. Text analytics can be applied in real time to many areas of business operations. A few use cases include:

  • Marketing & Advertising: Tracking and monitoring brand mentions in social media platforms and analyzing marketing funnels to determine what sparks customers’ interest and what results in closed deals (conversational AI). Mentions could be analyzed to see if they indicate a desire to purchase the product or complaints.
  • Public Relations & Brand Reputation Analysis: Tracking sentiments and opinion polarity (positive, negative, and neutral) from reviews, tweets, blogs, forums, audio, documents, and conversations on portals and social media platforms such as Amazon, Google, Yelp, Facebook, Twitter, Instagram, YouTube, LinkedIn, or just emails and aggregator portals. It can help detect urgent matters 24/7 to take timely action.
  • Customer Support & Relationship Management: Monitoring customer support notes, comments, and chatbot conversation logs can help an organization learn from its customers. Text mining can be leveraged to find trends in customer tickets and to categorize them by topic and sentiment. Categorization can help determine whether tickets are truly support-related or are repeated product complaints on a certain topic. It can also help separate urgent from low-priority tickets beyond the basic criterion of ticket origination time.
  • Customer Experience: Analyzing NPS (Net Promoter Score) survey responses to understand VoC (Voice of the Customer), what went wrong, in addition to triaging IVR responses. Keeping NPS at the forefront will help retain those customers that took considerable time investment to acquire.
  • Strategy Formulation: Analyzing news reports, investor analysis reports, white papers, competitive intelligence, market research, and literature with machine learning models to supplement data-based strategy formulation, crafting go-to-market strategies, guiding new product launches, and improving products and services.
  • Customer training and self-help: Finding the most relevant topics (topic modeling) to include in product demonstrations, product training guides, chatbot topics, or a knowledge base.

About the Dataset

The dataset was selected from Kaggle and includes 23486 rows and 10 feature variables. Each row corresponds to a customer review and includes the variables listed below. The data author anonymized the store name and indicates that the data comes from a real store; however, the data seems to be curated to some extent, given that it has reviews from women aged 80+ up to age 99. This finding was made late in the analysis, and since the purpose of the project is mostly text analytics, the review text was the key focus area.

  • Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.
  • Age: Positive Integer variable of the reviewer’s age.
  • Title: String variable for the title of the review.
  • Review Text: String variable for the review body.
  • Rating: Positive Ordinal Integer variable for the product score granted by the customer from 1 Worst, to 5 Best.
  • Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
  • Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.
  • Division Name: Categorical name of the product high level division.
  • Department Name: Categorical name of the product department.
  • Class Name: Categorical name of the product class.

Subset Used for Exploratory Data Analytics

As part of exploratory data analysis to determine the review categories, ratings for departments and divisions, and age of the respondents, the following variables from the entire dataset have been used.

  • Age: Positive Integer variable of the reviewer’s age.
  • Rating: Positive Ordinal Integer variable for the product score granted by the customer from 1 Worst, to 5 Best.
  • Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
  • Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.
  • Division Name: Categorical name of the product high level division, with values General, General Petite and Intimate.
  • Department Name: Categorical name of the product department. Values include Tops, Trend, Bottoms, Dresses, Intimate, Jackets.

Subset Used for Text Analytics

For text analytics that includes determining polarity of reviews, emotions, and sentiments, the following variables have been used for the analysis of 150 reviews.

  • Title: String variable for the title of the review.
  • Review Text: String variable for the review body.

Data Exploration Topics

The goal is to explore the following areas for the Women’s eCommerce Clothing Store:

  • How many women recommended in all vs those who did not?
  • What are the average ratings for each department?
  • What are the department categories and how many positive reviews and recommendations for each?
  • How much positive feedback for each department and division?
  • How many recommendations for each department and division?
  • What is the age group that is providing high ratings and for which categories?
  • What are the words used in positive and negative feedback?
  • What are the most common words used in feedback regardless of sentiments?
  • What emotions are the reviews conveying?
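The article answers these questions in Tableau. As a rough programmatic equivalent, the per-department questions (average rating, share of recommendations) could be sketched in Python as below. This is only an illustration: the sample rows are made up and merely reuse the dataset's column names, and the function name `summarize_by_department` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows that only mimic the dataset's column names;
# they are not taken from the real Kaggle file.
SAMPLE_CSV = """Department Name,Rating,Recommended IND
Tops,5,1
Tops,4,1
Dresses,3,0
Dresses,5,1
Bottoms,2,0
"""


def summarize_by_department(csv_text):
    """Return {department: (average rating, share recommended)}."""
    ratings = defaultdict(list)
    recommended = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        dept = row["Department Name"]
        ratings[dept].append(int(row["Rating"]))
        recommended[dept].append(int(row["Recommended IND"]))
    return {
        dept: (
            sum(ratings[dept]) / len(ratings[dept]),
            sum(recommended[dept]) / len(recommended[dept]),
        )
        for dept in ratings
    }


summary = summarize_by_department(SAMPLE_CSV)
for dept, (avg_rating, rec_share) in sorted(summary.items()):
    print(f"{dept}: average rating {avg_rating:.1f}, recommended {rec_share:.0%}")
```

The same grouping could be repeated with "Division Name" as the key to compare divisions.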

Text Analytics Methods and Techniques

Tableau has been used for storytelling and the R programming language for text analytics. R packages like sentimentr have been used for sentiment and emotion analysis. This package has been specifically designed for psychological or sociological studies and uses a lexicon-based approach. Groups of words in a sentence are classified based on an overall sentiment score. The package does a good job of distinguishing “I am disappointed with this purchase.”, a negative sentiment, from “You won’t be disappointed!”, a positive sentiment.

Results of SentimentR for variations in sentences (R language)
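To illustrate why a lexicon-based scorer needs to look at neighboring words, here is a toy Python sketch that flips a word's polarity when a negator precedes it. It is not the sentimentr algorithm and not the R code used for this analysis: the lexicon, the negator list, the two-word window, and the square-root normalization are all assumptions made for the example.

```python
import re

# Toy lexicon and negator list, invented for illustration only.
# sentimentr ships much larger lexicons and also handles amplifiers
# and de-amplifiers, which this sketch ignores.
POLARITY = {"disappointed": -1.0, "love": 1.0, "flimsy": -1.0, "soft": 1.0}
NEGATORS = {"not", "never", "no", "won't", "don't", "isn't"}


def sentence_score(sentence, window=2):
    """Sum word polarities, flipping the sign when an odd number of
    negators appears within `window` words before the polarized word,
    then divide by the square root of the word count."""
    # Normalize curly apostrophes so "won’t" matches "won't".
    text = sentence.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z']+", text)
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        polarity = POLARITY.get(word)
        if polarity is None:
            continue
        preceding = words[max(0, i - window):i]
        negations = sum(1 for w in preceding if w in NEGATORS)
        if negations % 2 == 1:
            polarity = -polarity
        total += polarity
    return total / len(words) ** 0.5


for text in ("I am disappointed with this purchase.",
             "You won\u2019t be disappointed!"):
    print(f"{sentence_score(text):+.2f}  {text}")
```

With this toy lexicon, the first sentence scores below zero and the second above zero, which is the distinction described above.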

The following steps were followed for text, sentiment and emotion mining:

  • Initial data eyeballing was done in Google Sheets. Null values in “Department Name” and “Division Name” were imputed with the value “Trend”.
  • The “Review Text” and “Title” columns were exported to a .txt file for analysis in R. This was done by hand to avoid additional R code for splitting out and saving the columns. Optionally, this step can be done programmatically.
  • Bubble charts, bar graphs, and word clouds have been used for visualization using Tableau. A story compiles all the graphics and summaries into a single tabbed navigable and interactive deck, providing annotations of key insights and observations from each graph.
  • For emotion and sentiment mining, a mix of 175 reviews has been used, since the full file is very large (~23K reviews) and the laptop's in-memory processing is limited. In a production environment that enables large data processing, the complete analysis can be done.
  • A sentiment score is returned by the sentimentr package. For our analysis, any score below 0 is considered negative; between 0.1 and 0.2 is considered neutral; and positive scores are those greater than 0.2. These thresholds can be adjusted based on the sensitivity the store wants to measure. For example, any score below 0.3 could be considered negative. There are pros and cons to this approach, as too many negatives would lead to unfocused areas of corrective action.
  • Another approach could be to have 5 levels of rating: Worse, Slightly Negative, Neutral, Mostly Positive, Excellent. Each of these approaches has pros and cons. In my opinion, the best outcomes can be derived from the 3-point polarity rating.
  • Rough plots were created within R and Tableau for basic analysis. These are not part of the final story but kept in the Tableau workbook and R code for reference.
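The 3-point polarity rule from the steps above can be sketched as a small Python helper. This is an illustration rather than the R code used in the analysis. The text leaves scores between 0 and 0.1 unassigned, so the sketch assumes that everything from 0 up to and including 0.2 is neutral. The parameter names `neutral_floor` and `positive_floor`, the example scores, and the 0.5 cutoff in the stricter variant are hypothetical.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def polarity_label(score, neutral_floor=0.0, positive_floor=0.2):
    """Map a sentiment score to a 3-point polarity label.

    Assumption: scores from `neutral_floor` up to and including
    `positive_floor` count as neutral; the article does not say how
    scores between 0 and 0.1 were handled.
    """
    if score < neutral_floor:
        return "negative"
    if score > positive_floor:
        return "positive"
    return "neutral"


def polarity_shares(scores, **thresholds):
    """Return the share of each label across a list of scores."""
    if not scores:
        return {label: 0.0 for label in LABELS}
    counts = Counter(polarity_label(s, **thresholds) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}


# Hypothetical scores, not taken from the article's data.
example_scores = [-0.4, 0.05, 0.15, 0.3, 0.6]
default_shares = polarity_shares(example_scores)
# Stricter variant from the text: anything below 0.3 is negative.
# The 0.5 positive cutoff is an assumption made for this example.
strict_shares = polarity_shares(
    example_scores, neutral_floor=0.3, positive_floor=0.5
)
print(default_shares)
print(strict_shares)
```

Raising the negative cutoff moves more reviews into the negative bucket, which is the trade-off noted above: more sensitivity, but a less focused list of issues to act on.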

Visualizations

Following are some of the visualizations from the analysis. The Tableau packaged workbook is best suited to review the storyline. It includes annotations and presents insights in a logical format, unveiling layers of data. Insights and observations have been described in the following section.

In terms of design, a purple/pink theme has been used for the graphs, in keeping with the theme of women’s clothing. An overall minimalistic design style has been used.

Insights from Text Analytics

- Most positive ratings: the “Tops” department ranks highest, followed by “Dresses”. “Trend” ranks lowest.

- Of the 23K customers, 83% recommended the product and 17% did not.

- “General” division has the most recommendations, followed by “General Petite”.

- Most women in age group 35–40 have provided recommendations.

- 25K people have agreed with the reviews provided and found them positive.

- It seems that petite women are generally happy with the dress fit, color and fabric. This is indicated by the positive word cloud and the frequent mentions of the word “petite”. “Love” is also one of the most popular words and indicates an overall high level of customer satisfaction with the online store.

- Sentiment polarity shows 10% negative, 34% neutral and 56% positive.

- Associated positive words: soft, comfy, nice, love, warm, quality, fine, fit.

- Associated negative words: wait, stuck, cheap, flimsy, snags, itchy, awkward, sticky, wrinkle, flaw.

- Top emotions: Trust, Joy and Anticipation. Negative emotions: Anger, Sadness, Disgust and Fear.

Conclusion

It is safe to conclude that the majority of the reviews the clothing store has received are positive. Additional research into product quality and operational aspects is needed to conduct root cause analysis and to identify strategic and tactical initiatives and projects.

Next Steps

Additional research could be done to analyze feedback for each department and division to see how the sentiments and emotions are trending between departments. Here we have done a post-mortem analysis of reviews from the past. This analysis could be performed in real time, or perhaps daily, by the clothing store so that corrective action is taken promptly. The process could be automated with a machine learning algorithm that has been trained on past reviews and predicts whether a new review is positive, negative, or neutral. Negative reviews could be escalated to the right people within the organization, those who have decision-making power and can execute corrective action.
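As one possible shape for that automation, the sketch below trains a tiny multinomial Naive Bayes classifier in pure Python and flags reviews predicted as negative. Naive Bayes is just one candidate algorithm chosen for this illustration; the article does not name one. The training reviews are made up, and the class name `NaiveBayesPolarity` and the `route` helper are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesPolarity:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training reviews for illustration; a real model would be
# trained on the store's labeled historical reviews.
train_texts = [
    "love the soft fabric and the fit",
    "comfy and warm, great quality",
    "cheap flimsy material, very itchy",
    "awkward fit and the fabric snags",
    "it is okay, nothing special",
    "average dress, fits as expected",
]
train_labels = ["positive", "positive", "negative",
                "negative", "neutral", "neutral"]

model = NaiveBayesPolarity().fit(train_texts, train_labels)


def route(review):
    """Return (label, escalate); escalate is True for negative reviews."""
    label = model.predict(review)
    return label, label == "negative"


print(route("the material feels cheap and itchy"))
print(route("love how soft and warm it is"))
```

In practice the training labels could come from the polarity scores already computed for past reviews, and the escalation rule would be set by the store.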

Figures: word cloud of all reviews; top 10 words used in reviews; number of words used in reviews and sentiment score; emotion analysis; words conveying positive sentiments; words conveying negative sentiments; sentiment polarity.
