[Q3 ~ Q7] Exploring and analyzing coffee shop transaction data. 1. (7 points) Create the following...

Question

[Q3 ~ Q7] Exploring and analyzing coffee shop transaction data. 1. (7 points) Create the following panel data set that consists of the unique customer ID and customers’ weekly total spending (in USD)...

[Q3 ~ Q7]
Exploring and analyzing coffee shop transaction data.

1.
(7 points) Create the following panel data set that consists of the unique customer ID and customers’ weekly total spending (in USD) by coffee type.

R Code:

2.
(8 points) Draw a boxplot for spending by coffee type and interpret your boxplot.

R Code:

Answer:

3.
(10 points) The following table shows the unit price (in USD) for each coffee type. Please add a new variable,
quantity, to the “coffee_weekly_trans” data set.

americano	espresso	latte	cappuccino	drip
1.5	1.25	2.95	2.45	1

[Hint: You may first create a new variable that represents the unit price for each coffee type and add it to the data set. Then you may calculate the quantity sold by using the following equation:
Quantity sold = Total spending / Unit price.]

R Code:

4.
(12 points) To increase sales of cappuccino, the owner of the coffee shop plans to initiate a promotion that provides a free drink with customers who buy more than three cappuccinos in a certain week.

To identify the customers on a weekly basis, you are asked to add a dummy variable “promo” that is equal to 1 if a customer bought
more than three cappuccinos
in a certain week and 0 otherwise to the “coffee_weekly_trans” data set.

Please also identify the customers who are eligible for a free drink (i.e., customers whose promo = 1) in each week. Specify the customers’ IDs and Weeks. For example, “ID in Week #.”

R Code:

Answer:

5.
(12 points) Create a new data set “coffee_weekly_trans_sp” (from “coffee_weekly_trans” data set) that presents the total spending over the four weeks for each customer. Who is the most profitable customer? Specify the customer’s ID.

R Code:

Answer:

[Q8 ~ Q12]
LJ is a multichannel clothing retailer in Bloomington-Normal. The owner of LJ has collected sales data by city and detailed customers’ transaction data for the last two years. To understand the shopping behavior of its customers, he pulled up the transaction data for a month and requested you to analyze it. Please access one of his datasets,
LJ.trans.csv
(posted in “Assignments” → “Exam I” @ ReggieNet), explore it and analyze it through the following steps.

Variable description

cust.id: unique customer ID

age: customer age

email: “yes” if LJ has the customer’s email address; “no” otherwise

city: living in Bloomington or Normal

sex: female or male

distance.to.store: distance between home to the closest offline store (mile)

online.visits: total number of visits to online store for a month

online.trans: total number of transactions via online store for a month

online.spending: total spending ($) in online store for a month

online.return: total product return amount ($) via online store for a month

offline.trans: total number of transactions via offline store for a month

offline.spending: total spending ($) in offline store for a month

offline.return: total product return amount ($) via offline store for a month

review: “yes” if the customer posted her/his reviews; “no” otherwise

income: income ($)

loyalty: customer segments (high, med, low) based on customer’s level of loyalty

6.
(5 points) How many variables and observations in the
LJ.trans
dataset?

R Code:

Answer:

7.
(8 points) What are the average, median, standard deviation, and variance of the following variables?

-
age

-
distance.to.store

-
offline.spending

-
online.return

R Code:

Answer:

8.
(13 points) Suppose we have the two customer groups’ profiles as below. What is the
average offline spending
for the customer group A who match all of the following conditions? What is the
average offline spending
for the customer group B who match all of the following conditions? Which group of customers spend more in the offline store on average?

Customer group A’s profile

a. Age >= 20

b. Living in Normal

c. Female

d. Customers who posted reviews

e. Customers who live more than 10 miles (> 10) away from the store

f. Customers who never returned products through offline store

Customer group B’s profile

a. Age

b. Income

c. Loyalty level = low

d. Living in Bloomington

e. Customers who live within 40 miles (

f. Customers who never visited online store

g. Customers who purchased products through offline store at least once

R Code:

Answer:

9.
(10 points) Create a new variable,
generation_group, that represents the generation of the LJ customers and add it to the
LJ.trans
dataset. More specifically, generation_group variable takes a value “Post-Millennials” for ages between 1 and 21 (including 1 and 21), “Millennials” for ages between 22 and 37 (including 22 and 37), “Generation X” for ages between 38 and 53 (including 38 and 53), and “Baby Boomers” for ages between 54 and 72 (including 54 and 72). Please use for() and if else() functions or ifelse() function to create the
generation_group
variable.

R Code:

10.
(5 points) Boxplots are a compact way to represent a distribution of variable and good for initial data visualization. Boxplots are not appropriate for:

age

loyalty

online.return

offline.spending

income

Answer:

11.
(7 points) Write a R function that converts “kcal” to “kJ” and show how many kilojoules (kJ) of energy are found in a 700 kilocalorie (kcal) meal?

R Code:

Answer:

12.
(12 points) When you multiply the square of the cumulative sum of natural numbers 1, 2, …,
n, i.e., 1^{2 ,}(1+2)^{2 ,}
(1+2+3)^{2, …}(1+2+…+n)²find
n
that produces the product which is more than 10,000 and the closest to 10,000”
Hint: Use either while() or repeat() loop.

R Code:

Answer:

[Q15 ~ Q18]
LJ is a multichannel clothing retailer in Bloomington-Normal. The owner of LJ has collected sales data by city and detailed customers’ transaction data for a year. Specifically, he has collected net sales information by city and sales channel. After taking Marketing Analytics course at ISU, he realized that his way of collecting and storing data was wrong. That is, he has created “untidy” datasets. Please access one of his datasets,
LJ.sales.csv
(posted in “Assignments” → “Exam I” @ ReggieNet), and transform it into “tidy” dataset through the following steps.

Variable description

channel: offline or online

normal: net sales (= total sales ($) - total product return amount ($)) in Normal

bloomington: net sales in Bloomington

peoria: net sales in Peoria

champaign: net sales in Champaign

springfield: net sales in Springfield

joliet: net sales in Joliet

13.
(8 points) Step 1: Import “LJ.sales.csv” file into R first. Transform the original untidy data (LJ.sales) into “step1.data” as follows:

13.[Hint: You can load “tidyr” package by using library(tidyr)]

original

channel	bloomington	champaign	joliet	normal	peoria	springfield
offline	1130 - 313	883 - 413	1143 - 201	1978 - 225	2236 - 296	2004 - 123
online	1122 - 285	950 - 204	1098 - 187	2534 - 145	3245 - 300	2542 - 135

→ step1.data

channel	city	netsales
offline	bloomington	1130 - 313
online	bloomington	1122 - 285
offline	champaign	883 - 413
online	champaign	950 - 204
offline	joliet	1143 - 201
online	joliet	1098 - 187
offline	normal	1978 - 225
online	normal	2534 - 145
offline	peoria	2236 - 296
online	peoria	3245 - 300
offline	springfield	2004 - 123
online	springfield	2542 - 135

R Code:

14.
(10 points) Step 2: Transform the “step1.data” you created in Step 1 into the “step2.data” as follows:

step1.data

channel	city	netsales
offline	bloomington	1130 - 313
online	bloomington	1122 - 285
offline	champaign	883 - 413
online	champaign	950 - 204
offline	joliet	1143 - 201
online	joliet	1098 - 187
offline	normal	1978 - 225
online	normal	2534 - 145
offline	peoria	2236 - 296
online	peoria	3245 - 300
offline	springfield	2004 - 123
online	springfield	2542 - 135

→ step2.data

city	offline.netsales	online.netsales
bloomington	1130 - 313	1122 - 285
champaign	883 - 413	950 - 204
joliet	1143 - 201	1098 - 187
normal	1978 - 225	2534 - 145
peoria	2236 - 296	3245 - 300
springfield	2004 - 123	2542 - 135

R Code:

15.
(10 points) Step 3: Transform the “step2.data” you created in Step 2 into the “tidy.data” as follows:

step2.data

city	offline.netsales	online.netsales
bloomington	1130 - 313	1122 - 285
champaign	883 - 413	950 - 204
joliet	1143 - 201	1098 - 187
normal	1978 - 225	2534 - 145
peoria	2236 - 296	3245 - 300
springfield	2004 - 123	2542 - 135

→ tidy.data

city	offline.sales	offline.return	online.sales	online.return
bloomington	1130	313	1122	285
champaign	883	413	950	204
joliet	1143	201	1098	187
normal	1978	225	2534	145
peoria	2236	296	3245	300
springfield	2004	123	2542	135

[Hint: You may take two steps to get the
tidy.data. At first, try to create “offline.sales” and “offline.return” columns, and then create “online.sales” and “online.return” columns.]

R Code:

Oct 08, 2021

SOLUTION.PDF

[Q3 ~ Q7] Exploring and analyzing coffee shop transaction data. 1. (7 points) Create the following panel data set that consists of the unique customer ID and customers’ weekly total spending (in USD)...

Get Answer To This Question

Related Questions & Answers

Submit New Assignment