4M Scanner Data
Many supermarkets offer customers discounts in return for using a shopper’s card. These stores also have scanner equipment that records the items purchased by the customer. The combination of a shopper’s card and scanner software allows the supermarket to track which items customers regularly purchase.
The data in this table are based on items purchased in 54,055 shopping trips made by families participating in a test of scanners. As part of the study, the families also reported a few things about themselves, such as the number of children and pets. These data are the basis of the following probabilities.
The table shows in columns the number of dogs owned by the person who applied for the shopping card. It shows in rows the number of dog food items purchased at the supermarket during each shopping trip in the time period of the study. Each cell gives the probability of that combination.
Motivation
(a) Should markets anticipate the row and column events described in this table to be independent?
(b) What could it mean that the probability of customers with no dogs buying dog food is larger than zero?
Method
(c) Which conditional probability will be most useful here? Will it be more useful to markets to calculate row probabilities or column probabilities?
(d) The smallest probabilities in the table are in the last column. Does this mean that owners of more than three dogs buy relatively little dog food?
(e) These probabilities come from counts of actual data. What would it mean if the probabilities added up to a value slightly different from 1?
Mechanics
(f) Expand the table by adding row and column marginal probabilities to the table.
(g) Find the conditional probability of buying more than three dog food items among customers reported to own no dogs. Compare this to the conditional probability of buying more than three dog food items among customers reported to own more than three dogs.
(h) If a customer just bought eight cans of dog food, do you think that she or he owns dogs? Give a probability.
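The mechanics in parts (f) through (h) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the joint probabilities and the category labels below are hypothetical placeholders, not the values from the study's table, and the function names are invented for this sketch. Substitute the actual table before drawing any conclusions.

```python
# Hypothetical joint probabilities for illustration only; these are NOT the
# values from the study's table. Rows: dog food items bought per trip.
# Columns: number of dogs owned.
dogs = ["0", "1", "2", "3", ">3"]
joint = {
    "0":   [0.700, 0.100, 0.040, 0.010, 0.010],
    "1-3": [0.030, 0.040, 0.020, 0.005, 0.005],
    ">3":  [0.010, 0.015, 0.010, 0.003, 0.002],
}

# Part (f): marginal probabilities, summing across each row and down each column.
row_marginal = {items: sum(probs) for items, probs in joint.items()}
col_marginal = {
    dog: sum(probs[j] for probs in joint.values())
    for j, dog in enumerate(dogs)
}


def p_items_given_dogs(items, dog):
    """Column probability: P(items bought | dogs owned)."""
    return joint[items][dogs.index(dog)] / col_marginal[dog]


def p_dogs_given_items(dog, items):
    """Row probability: P(dogs owned | items bought)."""
    return joint[items][dogs.index(dog)] / row_marginal[items]


# Part (g): heavy buying among non-owners versus owners of more than three dogs.
heavy_no_dogs = p_items_given_dogs(">3", "0")
heavy_many_dogs = p_items_given_dogs(">3", ">3")

# Part (h): probability that a heavy buyer owns at least one dog,
# computed as the complement of owning no dogs.
owns_dog_given_heavy = 1 - p_dogs_given_items("0", ">3")

print(f"Sum of all cells        = {sum(row_marginal.values()):.4f}")
print(f"P(>3 items | no dogs)   = {heavy_no_dogs:.4f}")
print(f"P(>3 items | >3 dogs)   = {heavy_many_dogs:.4f}")
print(f"P(owns a dog | >3 items) = {owns_dog_given_heavy:.4f}")
```

Parts (g) and (h) condition in opposite directions: (g) divides a cell by its column total (dogs owned), while (h) divides by the row total (items bought). Keeping the two functions separate makes it harder to confuse them.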
Message
(i) What do you conclude about the use of this version of the scanner data to identify customers likely to purchase dog food?