[Q3 ~ Q7]
Exploring and analyzing coffee shop transaction data.
1.
(7 points) Create the following panel data set that consists of the unique customer ID and customers’ weekly total spending (in USD) by coffee type.
2.
(8 points) Draw a boxplot for spending by coffee type and interpret your boxplot.
3.
(10 points) The following table shows the unit price (in USD) for each coffee type. Please add a new variable,
quantity, to the “coffee_weekly_trans” data set.
americano
|
espresso
|
latte
|
cappuccino
|
drip
|
1.5
|
1.25
|
2.95
|
2.45
|
1
|
[Hint: You may first create a new variable that represents the unit price for each coffee type and add it to the data set. Then you may calculate the quantity sold by using the following equation:
Quantity sold = Total spending / Unit price.]
4.
(12 points) To increase sales of cappuccino, the owner of the coffee shop plans to initiate a promotion that provides a free drink with customers who buy more than three cappuccinos in a certain week.
To identify the customers on a weekly basis, you are asked to add a dummy variable “promo” that is equal to 1 if a customer bought
more than three cappuccinos
in a certain week and 0 otherwise to the “coffee_weekly_trans” data set.
Please also identify the customers who are eligible for a free drink (i.e., customers whose promo = 1) in each week. Specify the customers’ IDs and Weeks. For example, “ID in Week #.”
5.
(12 points) Create a new data set “coffee_weekly_trans_sp” (from “coffee_weekly_trans” data set) that presents the total spending over the four weeks for each customer. Who is the most profitable customer? Specify the customer’s ID.
[Q8 ~ Q12]
LJ is a multichannel clothing retailer in Bloomington-Normal. The owner of LJ has collected sales data by city and detailed customers’ transaction data for the last two years. To understand the shopping behavior of its customers, he pulled up the transaction data for a month and requested you to analyze it. Please access one of his datasets,
LJ.trans.csv
(posted in “Assignments” → “Exam I” @ ReggieNet), explore it and analyze it through the following steps.
Variable description
cust.id: unique customer ID
age: customer age
email: “yes” if LJ has the customer’s email address; “no” otherwise
city: living in Bloomington or Normal
sex: female or male
distance.to.store: distance between home to the closest offline store (mile)
online.visits: total number of visits to online store for a month
online.trans: total number of transactions via online store for a month
online.spending: total spending ($) in online store for a month
online.return: total product return amount ($) via online store for a month
offline.trans: total number of transactions via offline store for a month
offline.spending: total spending ($) in offline store for a month
offline.return: total product return amount ($) via offline store for a month
review: “yes” if the customer posted her/his reviews; “no” otherwise
income: income ($)
loyalty: customer segments (high, med, low) based on customer’s level of loyalty
6.
(5 points) How many variables and observations in the
LJ.trans
dataset?
7.
(8 points) What are the average, median, standard deviation, and variance of the following variables?
-
age
-
distance.to.store
-
offline.spending
-
online.return
8.
(13 points) Suppose we have the two customer groups’ profiles as below. What is the
average offline spending
for the customer group A who match all of the following conditions? What is the
average offline spending
for the customer group B who match all of the following conditions? Which group of customers spend more in the offline store on average?
Customer group A’s profile
a. Age >= 20
b. Living in Normal
c. Female
d. Customers who posted reviews
e. Customers who live more than 10 miles (> 10) away from the store
f. Customers who never returned products through offline store
Customer group B’s profile
a. Age
b. Income
c. Loyalty level = low
d. Living in Bloomington
e. Customers who live within 40 miles (
f. Customers who never visited online store
g. Customers who purchased products through offline store at least once
9.
(10 points) Create a new variable,
generation_group, that represents the generation of the LJ customers and add it to the
LJ.trans
dataset. More specifically, generation_group variable takes a value “Post-Millennials” for ages between 1 and 21 (including 1 and 21), “Millennials” for ages between 22 and 37 (including 22 and 37), “Generation X” for ages between 38 and 53 (including 38 and 53), and “Baby Boomers” for ages between 54 and 72 (including 54 and 72). Please use for() and if else() functions or ifelse() function to create the
generation_group
variable.
10.
(5 points) Boxplots are a compact way to represent a distribution of variable and good for initial data visualization. Boxplots are not appropriate for:
- age
- loyalty
- online.return
- offline.spending
- income
11.
(7 points) Write a R function that converts “kcal” to “kJ” and show how many kilojoules (kJ) of energy are found in a 700 kilocalorie (kcal) meal?
12.
(12 points) When you multiply the square of the cumulative sum of natural numbers 1, 2, …,
n, i.e., 12 ,
(1+2)2 ,
(1+2+3)2, …
(1+2+…+n)2
find
n
that produces the product which is more than 10,000 and the closest to 10,000”
Hint: Use either while() or repeat() loop.
[Q15 ~ Q18]
LJ is a multichannel clothing retailer in Bloomington-Normal. The owner of LJ has collected sales data by city and detailed customers’ transaction data for a year. Specifically, he has collected net sales information by city and sales channel. After taking Marketing Analytics course at ISU, he realized that his way of collecting and storing data was wrong. That is, he has created “untidy” datasets. Please access one of his datasets,
LJ.sales.csv
(posted in “Assignments” → “Exam I” @ ReggieNet), and transform it into “tidy” dataset through the following steps.
Variable description
channel: offline or online
normal: net sales (= total sales ($) - total product return amount ($)) in Normal
bloomington: net sales in Bloomington
peoria: net sales in Peoria
champaign: net sales in Champaign
springfield: net sales in Springfield
joliet: net sales in Joliet
13.
(8 points) Step 1: Import “LJ.sales.csv” file into R first. Transform the original untidy data (LJ.sales) into “step1.data” as follows:
13.[Hint: You can load “tidyr” package by using library(tidyr)]
original
channel
|
bloomington
|
champaign
|
joliet
|
normal
|
peoria
|
springfield
|
offline
|
1130 - 313
|
883 - 413
|
1143 - 201
|
1978 - 225
|
2236 - 296
|
2004 - 123
|
online
|
1122 - 285
|
950 - 204
|
1098 - 187
|
2534 - 145
|
3245 - 300
|
2542 - 135
|
→ step1.data
channel
|
city
|
netsales
|
offline
|
bloomington
|
1130 - 313
|
online
|
bloomington
|
1122 - 285
|
offline
|
champaign
|
883 - 413
|
online
|
champaign
|
950 - 204
|
offline
|
joliet
|
1143 - 201
|
online
|
joliet
|
1098 - 187
|
offline
|
normal
|
1978 - 225
|
online
|
normal
|
2534 - 145
|
offline
|
peoria
|
2236 - 296
|
online
|
peoria
|
3245 - 300
|
offline
|
springfield
|
2004 - 123
|
online
|
springfield
|
2542 - 135
|
14.
(10 points) Step 2: Transform the “step1.data” you created in Step 1 into the “step2.data” as follows:
step1.data
channel
|
city
|
netsales
|
offline
|
bloomington
|
1130 - 313
|
online
|
bloomington
|
1122 - 285
|
offline
|
champaign
|
883 - 413
|
online
|
champaign
|
950 - 204
|
offline
|
joliet
|
1143 - 201
|
online
|
joliet
|
1098 - 187
|
offline
|
normal
|
1978 - 225
|
online
|
normal
|
2534 - 145
|
offline
|
peoria
|
2236 - 296
|
online
|
peoria
|
3245 - 300
|
offline
|
springfield
|
2004 - 123
|
online
|
springfield
|
2542 - 135
|
→ step2.data
city
|
offline.netsales
|
online.netsales
|
bloomington
|
1130 - 313
|
1122 - 285
|
champaign
|
883 - 413
|
950 - 204
|
joliet
|
1143 - 201
|
1098 - 187
|
normal
|
1978 - 225
|
2534 - 145
|
peoria
|
2236 - 296
|
3245 - 300
|
springfield
|
2004 - 123
|
2542 - 135
|
15.
(10 points) Step 3: Transform the “step2.data” you created in Step 2 into the “tidy.data” as follows:
step2.data
city
|
offline.netsales
|
online.netsales
|
bloomington
|
1130 - 313
|
1122 - 285
|
champaign
|
883 - 413
|
950 - 204
|
joliet
|
1143 - 201
|
1098 - 187
|
normal
|
1978 - 225
|
2534 - 145
|
peoria
|
2236 - 296
|
3245 - 300
|
springfield
|
2004 - 123
|
2542 - 135
|
→ tidy.data
city
|
offline.sales
|
offline.return
|
online.sales
|
online.return
|
bloomington
|
1130
|
313
|
1122
|
285
|
champaign
|
883
|
413
|
950
|
204
|
joliet
|
1143
|
201
|
1098
|
187
|
normal
|
1978
|
225
|
2534
|
145
|
peoria
|
2236
|
296
|
3245
|
300
|
springfield
|
2004
|
123
|
2542
|
135
|
[Hint: You may take two steps to get the
tidy.data. At first, try to create “offline.sales” and “offline.return” columns, and then create “online.sales” and “online.return” columns.]