Analysis of the Internet Movie Database (source: Kaggle) in support of an imaginary strategic objective: expanding the Disney+ platform.
Disney+ (pronounced Disney Plus) is an on-demand streaming service and a division of The Walt Disney Company. It distributes movies and TV series produced in-house, in addition to the Fox catalog of new and classic movies and shows.
Target audience & why it is apt for this presentation
The primary audience is Disney+ executives, the key decision-makers for Disney+ strategy and execution. The executives are looking for an analysis of the IMDb database as input to their upcoming strategy planning exercise…
This paper shares my takeaways, best practices, and steps for mitigating unfairness and bias when designing machine learning algorithms.
Data science projects go wrong because of flawed models, insufficiently or incorrectly trained algorithms, or emergent bias in new or unanticipated contexts. Fairness is a human decision, not a mathematical one, grounded in shared ethical beliefs. While machine learning does not make decisions based on feelings and emotions, it does inherit many human biases, leading to disparate impact. In this era where consequential decisions are algorithm-based, it is imperative that they are fair and that bias is not perpetuated without users' knowledge. …
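As a rough illustration of how disparate impact can be quantified, the sketch below compares the rate of favorable outcomes between two groups. The group names, the outcome lists, and the 0.8 threshold (the commonly cited "four-fifths" rule of thumb) are assumptions chosen for illustration; they do not come from the paper.

```python
# Minimal sketch: disparate impact as a ratio of favorable-outcome rates.
# All data below is hypothetical and exists only to show the arithmetic.

def selection_rate(outcomes):
    """Share of favorable outcomes (1 = favorable, 0 = unfavorable)."""
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(protected, reference):
    """Favorable-outcome rate of the protected group divided by the reference group's."""
    return selection_rate(protected) / selection_rate(reference)


# Hypothetical model decisions for two groups of ten people each.
group_a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 of 10 favorable
group_b = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # 6 of 10 favorable

ratio = disparate_impact_ratio(group_a, group_b)

# Assumed threshold: ratios under 0.8 are often treated as a warning sign.
flagged = ratio < 0.8
print(f"disparate impact ratio: {ratio:.2f}, flagged: {flagged}")
```

A low ratio does not settle whether a model is unfair; it only flags a disparity that people then have to interpret.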
This paper uses the Fargo Health Group dataset to forecast the demand for heart examinations expected in 2014 at Abbeville Health Center. It outlines how a business problem can be solved using a data-driven decision-making approach and explains the methodology, the model used, the ethical implications, and recommendations for Fargo Health Group.
Fargo Health Group faces the following business problems:
Diamond pricing is a complex mechanism influenced by multiple factors such as carat, cut, and color. This article analyzes the correlation between these factors and price and illustrates it with visualizations.
Exploratory data analysis
The R diamond.csv dataset includes approximately 54K observations of 10 variables: carat, cut, color, clarity, depth, table, price, x (length in mm), y (width in mm), and z (depth in mm). Overall, it is a clean dataset with no missing values or messy data.
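The checks described above (column count, missing values, and a first look at how carat relates to price) could be sketched in Python as follows. The five inline rows are illustrative stand-ins rather than the article's data, and the helper names `load_rows`, `count_missing`, and `pearson` are hypothetical.

```python
import csv
import io
import math

# Illustrative rows using the ten column names listed above (not the real file).
SAMPLE_CSV = """\
carat,cut,color,clarity,depth,table,price,x,y,z
0.23,Ideal,E,SI2,61.5,55,326,3.95,3.98,2.43
0.21,Premium,E,SI1,59.8,61,326,3.89,3.84,2.31
0.29,Premium,I,VS2,62.4,58,334,4.20,4.23,2.63
0.31,Good,J,SI2,63.3,58,335,4.34,4.35,2.75
0.70,Ideal,G,VS1,61.0,56,2800,5.70,5.72,3.48
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Count cells that are absent or blank."""
    return sum(
        1
        for row in rows
        for value in row.values()
        if value is None or value.strip() == ""
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rows = load_rows(SAMPLE_CSV)
columns = list(rows[0].keys())
missing = count_missing(rows)
carat_price_r = pearson(
    [float(r["carat"]) for r in rows],
    [float(r["price"]) for r in rows],
)

print(f"{len(rows)} rows, {len(columns)} columns, {missing} missing cells")
print(f"carat vs. price correlation: {carat_price_r:.2f}")
```

To run this against the full dataset, replace `SAMPLE_CSV` with the contents of the actual file.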
Structure of the dataset (R lang)
Data visualizations created for academic purposes in Python
PayPal has long been the go-to for online as well as in-store payments, with a wallet share of 48% across all payment categories. Its nearest competitor, Apple Pay, has a 15% market share, followed by Google Pay at 11%.
On the platform end, Android smartphone users dominate the market at 73%, followed by the affluent Apple users at 26%.
In the modern world, our face serves as our identity, much like a fingerprint. Face recognition technology has gained attention in the last decade. It uses biometric techniques to detect faces (of both humans and pets) in an image or at any location, mimicking how humans recognize faces and classify gender and race. Artificial Intelligence (AI) enables detecting faces with cameras and comparing them with a searchable database of faces, using sophisticated methods that analyze the depth of the eyes, the angle of the jawline, and other facial traits. The photos are usually passport-size images used for driver's licenses, passports, and other IDs. Additional…
This paper critically reviews the literature on the ethics of web scraping.
Web Scraping Explained
Big Web Data is dynamic content that includes HTML tables, blogs, tweets, photos, audio, videos, and other structured and unstructured data. It evolves at extreme velocity and has high volume and variety.
Web scraping, a practice that is revolutionizing research, is described as the automated method of extracting and harvesting publicly available web data (Luscombe et al., 2021). Macapinlac (2019) defines a web scraper as a bot involving components of website analysis, web crawling, and data organization. A web request to retrieve data is sent to the…
Most businesses have untapped volumes of structured, semi-structured, and unstructured text-based data from internal and external sources. In a small-shop setup, the owner or proprietor would eyeball such data to get a pulse of customer sentiment. That approach, however, would be both inaccurate and expensive. Given the storm of data brought by Big Data, it is cumbersome, time-consuming, and nearly impossible for humans to do this manually.
These terms all mean the same thing and are used interchangeably. Text analysis is a machine learning technique that helps mine enormous volumes of data efficiently, in a scalable, unbiased, and consistent fashion, extracting valuable insights, trends, and patterns…
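As a very small illustration of mining text for patterns, the sketch below counts the most frequent terms across a handful of customer reviews. It is simple term counting rather than a machine learning model, and the reviews, the stop-word list, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Assumed minimal stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "and", "is", "was", "it", "to", "of", "but"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(documents, n=3):
    """Return the n most common non-stop-word terms across all documents."""
    counts = Counter(
        token
        for doc in documents
        for token in tokenize(doc)
        if token not in STOP_WORDS
    )
    return counts.most_common(n)


# Hypothetical customer feedback.
reviews = [
    "The delivery was slow but the product is great",
    "Great product and great price",
    "Slow delivery, the support was helpful",
]

for term, count in top_terms(reviews):
    print(f"{term}: {count}")
```

Applied consistently across thousands of documents, even this kind of counting surfaces recurring themes (here, "great" and "slow delivery") that would be tedious to find by eye.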