Text Analytics & Text Mining: the next Big thing in Data Science

Concept introduction, application, approach, and methodology

Poonam Rao
Nerd For Tech


Photo credits Piotr Łaskawski, Unsplash.com

Most businesses have untapped volumes of structured, semi-structured, and unstructured text-based data from internal and external sources. In a small-shop setup, the owner or proprietor could eyeball such data to get a pulse of customer sentiment, but at scale that approach is both inaccurate and expensive. Given the storm of data brought by Big Data, it is cumbersome, time-consuming, and nearly impossible for humans to do this manually.

What is Text Analysis, Text Analytics or Text Mining?

The three terms are often used interchangeably. Text analysis is a machine learning technique that helps mine enormous volumes of data efficiently, in a scalable, unbiased, and consistent fashion, extracting valuable insights, trends, and patterns. Text mining leverages statistical pattern learning to obtain these insights. Backed with visualizations, the insights help determine the best course of action and support informed decisions.

Is there any difference between Text Analysis and Text Analytics?

The nuance is that text analysis delivers qualitative insights (ideas & opinions) while text analytics is quantitative (numerical data). For example, counting the tickets handled by an individual customer support representative and presenting the result in graphs is quantitative text analytics. However, if the manager wants to know whether the outcomes of those tickets were positive, negative, or neutral, then the text within each ticket needs to be analyzed to understand how the rep is influencing customer satisfaction and the customer experience.
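The ticket example above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the ticket records, the rep names, and the tiny positive/negative word lists are all made up for the example, and a word-count score stands in for the trained sentiment model a real project would likely use.

```python
import re
from collections import Counter

# Hypothetical ticket records: (support rep, ticket text).
TICKETS = [
    ("alice", "Thanks, the issue was resolved quickly. Great support!"),
    ("alice", "Still broken after two calls. Very frustrating."),
    ("bob", "Please update my billing address."),
    ("bob", "Excellent help, problem solved."),
    ("bob", "Terrible experience, the app keeps crashing."),
]

# Tiny illustrative word lists (an assumption for this sketch); a real
# project would use a published lexicon or a trained classifier.
POSITIVE = {"thanks", "great", "resolved", "excellent", "solved"}
NEGATIVE = {"broken", "frustrating", "terrible", "crashing"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def label_sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    words = tokenize(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Quantitative view (text analytics): how many tickets did each rep handle?
tickets_per_rep = Counter(rep for rep, _ in TICKETS)

# Qualitative view (text analysis): what were the outcomes of those tickets?
outcomes_per_rep = {}
for rep, text in TICKETS:
    outcomes_per_rep.setdefault(rep, Counter())[label_sentiment(text)] += 1

print(dict(tickets_per_rep))
print({rep: dict(counts) for rep, counts in outcomes_per_rep.items()})
```

The first count needs no understanding of the text at all, while the second depends entirely on reading what each ticket says, which is the distinction drawn above.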

What are some possible use cases?

In this Big Data era, online content analysis can help gain actionable insights by mining the volumes of information available. Businesses that automate text analysis have an edge over the competition. Text analytics can be applied in real time in a variety of domains, including FMCG, banking, fintech, retail, restaurants, government, and insurance, among others. A few use cases include:

  • Marketing & Advertising: Tracking and monitoring brand mentions on social media platforms and analyzing marketing funnels to determine what sparks customers' interest and what results in closed deals (conversational AI). Mentions can be analyzed to see whether they indicate a desire to purchase the product or a complaint.
  • Public Relations & Brand Reputation Analysis: Tracking sentiments and opinion polarity (positive, negative, and neutral) from reviews, tweets, blogs, forums, audio, documents, and conversations on portals and social media platforms such as Amazon, Google, Yelp, Facebook, Twitter, Instagram, YouTube, LinkedIn, or just emails and aggregator portals. It can help detect urgent matters 24/7 to take timely action.
  • Customer Support & Relationship Management: Monitoring customer support notes, comments, and chatbot conversation logs can help an organization learn from its customers. Text mining can be leveraged to find trends in customer tickets and to categorize them by topic and sentiment. Categorization can help determine whether tickets are truly support-related or are repetitive product complaints on a certain topic. It can also help separate urgent from low-priority tickets, going beyond basic criteria such as the time a ticket was opened.
  • Customer Experience: Analyzing NPS (Net Promoter Score) survey responses to understand VoC (Voice of the Customer) and what went wrong, in addition to triaging IVR responses. Keeping NPS at the forefront will help retain customers who took a considerable investment of time to acquire.
  • Strategy Formulation: Analyzing news reports, investor analysis reports, white papers, competitive intelligence, market research, and literature with machine learning models to supplement data-based strategy formulation, crafting go-to-market strategies, guiding new product launches, and improving products and services.
  • Customer training and self-help: Finding the most relevant topics (topic modeling) to include in product demonstrations, product training guides, chatbot topics, or a knowledge base.
  • Natural Language Processing (NLP): NLP is closely related to text analysis and helps computers interpret human speech and written communication, for example through language detection and intent detection.
  • Text summarization: Condense complex multi-page texts into short summaries of fewer than 300 words.
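To make the customer-support use case above more concrete, here is a rough rule-based sketch of categorizing tickets by topic and urgency. The topic names, keyword sets, and urgency words are hypothetical and chosen only for illustration; in practice these rules are usually learned from labeled tickets rather than written by hand.

```python
import re

# Hypothetical keyword rules for this sketch.
TOPIC_KEYWORDS = {
    "billing": {"invoice", "refund", "charged", "payment"},
    "login": {"password", "login", "locked", "account"},
    "shipping": {"delivery", "shipping", "package", "late"},
}
URGENT_WORDS = {"urgent", "immediately", "asap", "outage"}


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text):
    """Return (topic, priority) for a ticket using simple keyword overlap."""
    words = tokenize(text)
    # Count keyword hits per topic and keep the topic with the most hits.
    hits = {topic: len(words & keywords) for topic, keywords in TOPIC_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    topic = best if hits[best] > 0 else "other"
    # Any urgency word marks the ticket as urgent, regardless of its age.
    priority = "urgent" if words & URGENT_WORDS else "normal"
    return topic, priority


for ticket in [
    "I was charged twice, please refund the payment immediately",
    "How do I reset my password?",
    "Love the new design",
]:
    print(categorize(ticket), "-", ticket)
```

Keyword rules like these are easy to audit but brittle, so they are best seen as a starting point before moving to a trained classifier.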

Big Text Mining Approach

  • Gather data, have a data strategy for your organization. Use all relevant data from all sources.
  • Mine & model data with a focus on key insights to be obtained.
  • Start small then scale.
  • Act on key insights.

Text Analytics Methodology

  • Data gathering: Everyday data is gathered from business processes and operations, covering both internal and external sources. Data can be gathered via API or exported in CSV, text, JSON, XML, and other formats. Visual web scraping tools can also be used. Open datasets and APIs can be leveraged to gather external data; Facebook, Twitter, the New York Times, the Guardian, and others offer their own APIs to extract data.
  • Data cleaning: This involves removing duplicates and imputing missing data. Online reviews from Amazon may sometimes include duplicate or fake entries meant to boost review counts and ratings. Such data needs to be dropped to avoid a skewed and incorrect analysis.
  • Data preparation: Tokenization is the process of breaking the text down into words (tokens) that can be analyzed. Once tokens are identified, they are tagged with their parts of speech. At this point the language model and the categorized tokens are available. Next, based on the grammar of the language, parsing determines the syntactic structure of the text. Stemming and lemmatization are then applied to reduce each word to its stem or root form. Conjunctions could help identify auto-categorization rules. Stopwords are removed where appropriate for the problem.
  • Analyzing data: Python and R both offer extensive packages for text analytics to implement the two popular techniques of text extraction and text classification. R packages include koRpus, OpenNLP, Stringr, TM framework, Wordcloud, Text2vec, RWeka, Tidytext, Spacyr, and Quanteda, among others. Simpler techniques include word frequency, clustering, collocation (words commonly occurring together), concordance (context of the keyword), and word sense disambiguation (resolving words that have more than one meaning).
  • Visualizing data: Visualization makes it easier to see trends and patterns in the data and gather insights. Tools include Google Data Studio, Looker, and Tableau, in addition to Python and R libraries that generate word clouds and statistical graphs.
  • Modeling & Testing Algorithms: Training and testing machine learning models.
  • Deployment: Deploying in the real world, either on-premises or in the cloud, followed by ongoing monitoring and refinement of the model.
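The cleaning, preparation, and analysis steps above can be sketched with nothing but the Python standard library. Everything specific here is an assumption for illustration: the sample reviews, the short stopword list, and especially the crude suffix-stripping `stem` function, which only stands in for a real stemmer or lemmatizer from a library such as NLTK or spaCy.

```python
import re
from collections import Counter

# Hypothetical reviews; the second is an exact duplicate that cleaning should drop.
REVIEWS = [
    "The battery life is great and the battery charges fast.",
    "The battery life is great and the battery charges fast.",
    "Battery life was shorter than expected, and the screen scratches easily.",
    "Great screen, but customer service was slow.",
]

# Small illustrative stopword list; NLP libraries ship fuller ones.
STOPWORDS = {"the", "is", "and", "was", "than", "but", "a", "an", "of"}


def clean(docs):
    """Data cleaning: drop empty entries and exact duplicates, keeping order."""
    seen, kept = set(), []
    for doc in docs:
        key = doc.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(doc.strip())
    return kept


def tokenize(text):
    """Tokenization: split lowercase text into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Very crude suffix stripping, not a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def prepare(text):
    """Data preparation: tokenize, remove stopwords, then stem."""
    return [stem(w) for w in tokenize(text) if w not in STOPWORDS]


def concordance(docs, keyword, width=2):
    """Show each occurrence of keyword with `width` words of context per side."""
    lines = []
    for doc in docs:
        tokens = tokenize(doc)
        for i, token in enumerate(tokens):
            if token == keyword:
                lines.append(" ".join(tokens[max(0, i - width): i + width + 1]))
    return lines


docs = clean(REVIEWS)
prepared = [prepare(doc) for doc in docs]

# Word frequency across all documents.
freq = Counter(token for tokens in prepared for token in tokens)

# Collocation: adjacent pairs of prepared tokens. Stopwords are already
# removed, so a pair may span a dropped word in the original text.
bigrams = Counter(pair for tokens in prepared for pair in zip(tokens, tokens[1:]))

print(freq.most_common(3))
print(bigrams.most_common(1))
print(concordance(docs, "screen"))
```

Part-of-speech tagging and parsing are left out because they need a trained language model, which is where the packages listed above come in.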

Conclusion

An often-cited estimate is that 80% of business data is unstructured and untapped for insights. In a world where emojis are used to express feelings about products and services, text mining offers tremendous power to transform your business beyond the vision of traditional approaches. A combination of machine-driven and user-guided approaches will be needed for analysis. Given the changing demographics, it will even be important to analyze English in different dialects. Growing interest in multilingual text mining is predicted.

Text analytics, a branch of data science, is an emerging market. The potential scope of Big Text is promising. What makes it attractive is the ability to sense sentiment. With the rise and adoption of social media, we can expect Big Data to grow exponentially. The global text analytics market is presently valued at approximately $7 billion and is expected to grow to $20 billion by 2024.

Future use cases could involve analyzing the strength of social ties; analyzing changes in speakers over time; identifying mindsets (optimistic/pessimistic, conservative/liberal); identifying informal hierarchies in social conversations; personality analysis; higher-level analysis of conversations to identify humor, satire, secrets, shame, disgust, anger, and self-revelation; a deeper understanding of communities and informal groups; and recommendation engine enhancements based on conversations. Obviously, there are many ethical aspects to consider before we integrate such technology.


Poonam Rao
Nerd For Tech

Exec Director StratEx - I bring to the table a blend of data science, finance, and strategy management skills with 20+ years of experience in insurance & fintech.