Data Descriptions (fb_ad_data.csv):
ad_id: a unique ID for each ad.
xyz_campaign_id: an ID associated with each ad campaign of xyz company.
fb_campaign_id: an ID associated with how Facebook tracks each campaign.
age: age of the person to whom the ad is shown.
gender: gender of the person to whom the ad is shown.
interest: a code specifying the category to which the person’s interest belongs (interest areas are those mentioned in the person’s Facebook public profile).
Impressions: the number of times the ad was shown.
Clicks: the number of clicks on that ad.
Spent: the amount paid by xyz company to Facebook to show that ad.
Total_Conversion: Total number of people who enquired about the product after seeing the ad.
Approved_Conversion: Total number of people who bought the product after seeing the ad.
Part A: Statistical Inference Tasks
Compute the following:
Probability of an ad having one click
Probability of an ad being shown to a male
Probability of an ad being shown to a female
Probability of an ad having one click given that it is shown to a male
Probability of an ad having one click given that it is shown to a female
Probability of an ad having one click and being shown to a male
Based on the probabilities computed above, draw inferences regarding the independence of events, if any. In particular, do you think that the event “an ad having one click” is independent of the events “the ad is shown to Male” or “the ad is shown to Female”?
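The probabilities in this list can be estimated as relative frequencies over the ads. The following is a minimal sketch under stated assumptions: the function names are illustrative, the two input sequences stand for the Clicks and gender columns, and the gender label "M" is an assumption about how the file codes gender, so it should be checked against fb_ad_data.csv before use.

```python
def relative_frequency(flags):
    """Share of True values in a sequence of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


def click_gender_probabilities(clicks, gender, label="M", k=1):
    """Estimate P(click = k), P(gender = label), their joint and the conditional.

    `label` is an assumed gender code; `k` is the click count of interest.
    """
    has_k_clicks = [c == k for c in clicks]
    is_label = [g == label for g in gender]
    both = [a and b for a, b in zip(has_k_clicks, is_label)]

    p_click = relative_frequency(has_k_clicks)
    p_label = relative_frequency(is_label)
    p_joint = relative_frequency(both)
    # P(A | B) = P(A and B) / P(B); undefined if no ad was shown to `label`
    p_click_given_label = p_joint / p_label if p_label > 0 else float("nan")

    return {
        "p_click": p_click,
        "p_label": p_label,
        "p_joint": p_joint,
        "p_click_given_label": p_click_given_label,
    }


# Made-up toy values, only to show the call; not taken from the data set.
toy_clicks = [1, 0, 1, 3, 1, 0, 2, 1]
toy_gender = ["M", "M", "F", "F", "M", "F", "M", "F"]
result = click_gender_probabilities(toy_clicks, toy_gender, label="M")
```

Two events A and B are independent when P(A and B) = P(A) P(B), or equivalently when P(A | B) = P(A). With sample estimates the two sides will rarely match exactly, so the comparison is a judgement about how close they are.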
Compute the mean number of clicks on an ad.
Compute 95% and 99% confidence intervals for the true mean number of clicks. Note that a 100(1−α)% confidence interval for the mean can be computed (for n ≥ 30) as
$$\bar{x} - Z_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
where n is the number of observations, $\bar{x}$ is the sample average, s is the sample standard deviation, α is the significance level (0.05 for a 95% confidence interval and 0.01 for a 99% confidence interval), and $Z_{\alpha/2}$ is the value from a standard normal distribution such that the area to its right is α/2.
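The interval formula above translates directly into code. This is a sketch using only the Python standard library; the function name is illustrative, and `values` stands for the Clicks column loaded as a list of numbers.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean (intended for n >= 30)."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    alpha = 1 - confidence
    # Z_{alpha/2}: the point with area alpha/2 to its right under the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width
```

Calling the function with confidence=0.95 and confidence=0.99 gives the two requested intervals. The 99% interval is always the wider of the two, because $Z_{\alpha/2}$ grows as α shrinks.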
Suppose a data scientist is designing a recommendation system based on the available data. One of the crucial parameters in the algorithm is the mean number of clicks on an ad. She believes that the algorithm might perform arbitrarily badly for low or high values of the mean number of clicks. Based on past experience, she believes that the mean number of clicks on an ad is 31. She wants to formally test this hypothesis based on the given advertising data. What should be the null and alternative hypotheses? What statistical conclusion can she draw?
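One possible reading of this task is a two-sided test, since both low and high values of the mean are a concern: H0: μ = 31 against H1: μ ≠ 31. The sketch below assumes a large-sample z-test, matching the normal approximation used for the confidence intervals; the function name and the default significance level of 0.05 are illustrative choices, not part of the assignment.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_sided_z_test(values, mu0, alpha=0.05):
    """Large-sample z-test of H0: mu = mu0 against H1: mu != mu0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    z = (mean(values) - mu0) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_h0 = p_value < alpha
    return z, p_value, reject_h0
```

A call such as two_sided_z_test(clicks, mu0=31) returns the test statistic, the p-value and the decision. Rejecting H0 at level α is equivalent to 31 falling outside the 100(1−α)% confidence interval for the mean.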
Suppose the data scientist wants, in the future, to predict several probabilities regarding the number of clicks on an ad. So, she wants to fit a probability density function to the given click data. Help her by doing the following:
Fit an exponential pdf. [One way to do this is to use the following function in Python: scipy.stats.expon.fit()]
Compute the probability of the number of clicks exceeding 100 for an ad using,
Analyse the difference between the computed probabilities in 2, if any.
Repeat the exercise mentioned in 2 and 3 for the number of clicks exceeding 400 for an ad.
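A sketch of the exponential fit and the tail probabilities follows. Two points are assumptions rather than part of the task statement: the location parameter is fixed at 0 with floc=0 (click counts cannot be negative; without it, scipy.stats.expon.fit() also estimates a shift), and the fitted probability is compared against the empirical relative frequency, since the text does not say which methods are to be compared. The function names are illustrative.

```python
import numpy as np
from scipy import stats


def exponential_tail_probability(clicks, threshold):
    """P(X > threshold) under an exponential pdf fitted by maximum likelihood."""
    # With floc=0 the fitted scale equals the sample mean of the data
    loc, scale = stats.expon.fit(clicks, floc=0)
    # sf is the survival function, 1 - cdf
    return float(stats.expon.sf(threshold, loc=loc, scale=scale))


def empirical_tail_probability(clicks, threshold):
    """Share of ads whose click count exceeds the threshold."""
    clicks = np.asarray(clicks)
    return float(np.mean(clicks > threshold))
```

Calling both functions with threshold=100 and then threshold=400 covers the two cases. A large gap between the fitted and empirical values, especially far out in the tail, suggests that the exponential shape does not describe the click data well there.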
Part B: Exploratory Data Analysis
To analyse business performance, one can construct various Key Performance Indicators (KPIs), such as click-through rate, conversion rate, and return on advertising spend. The choice depends mainly on which KPI best answers the question the business is asking. For the current study, we will focus on the "Cost per conversion (CPC)". It is defined as follows:
CPC = (Amount spent by the company on an ad) / (Total number of people who enquired about the product + Total number of people who bought the product)
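The definition above can be written as a small helper. One case the formula leaves open is an ad with no enquiries and no purchases, where the denominator is zero; returning NaN for such ads is an assumption made here so that they can be excluded or examined separately, not something the definition prescribes. The function name is illustrative.

```python
import math


def cost_per_conversion(spent, total_conversion, approved_conversion):
    """CPC = Spent / (Total_Conversion + Approved_Conversion)."""
    conversions = total_conversion + approved_conversion
    if conversions <= 0:
        # Assumed handling: CPC is undefined when there are no conversions
        return math.nan
    return spent / conversions
```

For example, an ad with Spent = 10.0, Total_Conversion = 3 and Approved_Conversion = 2 has CPC = 10.0 / 5 = 2.0.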
Exploratory Data Analysis Tasks:
Carry out detailed Exploratory Data Analysis on the assigned data set where the target KPI is CPC and draw meaningful insights from various data displays, pictorial representations, measures, and present your findings. It must include the following (but don’t necessarily restrict yourself to this):
Scatter plot and correlation: CPC vs Clicks, CPC vs Spent, CPC vs Total_Conversion
CPC analysis by age, gender, and interest. Then draw conclusions about the groups to target (and not to target) in these categories, so that the business performance can be improved.
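The correlation and group comparisons can be sketched with pandas as below. The sketch assumes the data has been loaded into a DataFrame with the column names from the data description and that a CPC column has already been added; the function names and the choice of summary statistics are illustrative.

```python
import pandas as pd


def cpc_correlations(df, columns=("Clicks", "Spent", "Total_Conversion")):
    """Pearson correlation of each listed column with CPC (rows with NaN are skipped)."""
    return df[list(columns)].corrwith(df["CPC"])


def cpc_by_group(df, column):
    """Mean, median and number of ads with a defined CPC, per level of a column."""
    summary = df.groupby(column)["CPC"].agg(["mean", "median", "count"])
    # Lowest mean CPC first: the cheapest groups per conversion come out on top
    return summary.sort_values("mean")
```

Calling cpc_by_group(df, "age"), cpc_by_group(df, "gender") and cpc_by_group(df, "interest") gives one table per grouping. For the scatter plots, df.plot.scatter(x="Clicks", y="CPC") is one option (it requires matplotlib). Groups with few ads can show extreme means, so the count column is worth reading alongside the mean before deciding which groups to target.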