Identifying Substitutable Goods using Large-scale Shopping Cart Basket Data across Retailers & Geography
A fundamental challenge in analyzing consumer behavior data is identifying substitutable goods (substitutes). Large-scale shopping cart data offer several opportunities for analyzing consumer purchase behavior at a granular level, such as uncovering product substitutes. In a previous blog post, we demonstrated how these data can be applied in a sequential probabilistic model of shopping baskets known as “SHOPPER,” developed by Ruiz, Athey, and Blei, to identify substitutable goods for several products at one retailer [1]. That analysis produced two findings. First, the identified substitutes typically belong to the same category as the queried product. For instance, Doritos Nacho Cheese returned other chips as substitutes, Cinnamon Toast Crunch yielded other cereals, and Fresh Strawberries yielded other fruit. Second, within the same category we observed intuitively related substitutes. Many cheese-based chips were suggested as substitutes for Doritos Nacho Cheese; Premium Strawberries had the highest substitution score for Fresh Strawberries; and many other high-sugar cereals were suggested as substitutes for Cinnamon Toast Crunch (rather than healthier cereals such as Cheerios).
In this blog post, we expand on our previous analysis by analyzing shopping cart data from several retailers across various geographical regions. Similar products are queried to investigate how the ranked lists of substitutable products vary across different retailers and geographies. These results provide insights into what products to substitute at various retailers and for the same retailers in different geographies. The next section discusses the data used for this analysis. The results section reports our findings and interprets them. The conclusion section provides insights gained from analyzing these results.
Brief Description of Data and Substitution Scores
The large-scale shopping cart data we analyze are a subset of Numerator’s extensive consumer panel data. Previously, we examined a small slice of Numerator’s data from 2021-2022 for one retailer in the United States, which we anonymize for data privacy and call retailer “A” [2]. For this study, we first analyze how results vary across regions by subsampling the previously collected data from this single retailer for two states, Indiana and Ohio, as well as for the United States as a whole. This helps us understand whether substitutable products vary within the same retailer across different states and at the national level. Next, we collect additional data from two other retailers (anonymized for data privacy and referred to as retailers “B” and “C”) for the same 2021-2022 timeframe across the United States. This augmented dataset includes over 10,000 items, 320,000 shoppers, 3,400,000 shopping trips, and 13,000,000 purchases. These additional data allow us to compare how ranked lists of substitutable products vary between retailers within the same geography (i.e., the United States) and during the same time period.
As mentioned in the methodology section of our previous blog post, we implement a sequential probabilistic model that exploits shopping trip data to estimate substitution scores, which range from 1 (most substitutable) to 0 (not substitutable). These substitution scores are based on estimating the conditional probability of selecting a product, given what other goods are already in a household’s basket. We estimate the model for these additional retailers and geographical regions to observe the variation in substitutes retrieved, with the results reported in the next section. Although the estimated substitution scores cannot be directly compared across retailers and geographical regions, the ranked lists provide considerable insight into what is being substituted and into the variation in consumer preferences for certain products.
Examples of Substitutes for Various Products across Retailers and Geographies
Previously, we provided a list of the top substitutes for three different products from the same retailer A. Our current analysis illustrates variation in substitutes for Doritos Nacho Cheese depending on location. Table 1 presents the top 5 substitutes at state and national levels within the same retailer.
There are several insights to highlight. First, the lists of substitutes often include the same products across regions: for example, Cheetos Crunchy appears in both the Ohio and national lists, and Fritos Original appears in both the Indiana and national lists. This finding suggests that across certain regions the next best products tend to be very similar. However, these results could look different if the geographical regions chosen were farther apart, physically or culturally. For example, analyzing different countries, or states that are farther apart, may yield a different ranked list. Nonetheless, the substitutable products are in the same category (i.e., chips) as the queried product. Second, cheese-based products frequently appear on the lists of substitutes, which makes sense given that the queried product is a cheese-based chip. Third, while some of the same products are included in the lists of substitutes, their exact rankings differ slightly, suggesting there are regional effects on product preference. These results provide insights into the next best option if a particular product, in this case Doritos Nacho Cheese, is not available in the store or is priced too high relative to competing brands.
Table 1: Substitutions for Doritos Nacho Cheese 9.25 OZ based on location

Indiana: Item Substitute | Indiana: Score | Ohio: Item Substitute | Ohio: Score | United States: Item Substitute | United States: Score |
---|---|---|---|---|---|
Lays Classic, 8 OZ | 0.875 | Doritos Cool Ranch, 9.25 OZ | 0.905 | Ruffles Original, 8.5 OZ | 0.944 |
Fritos Original, 9.25 OZ | 0.830 | Doritos Spicy Sweet Chili, 9.25 OZ | 0.872 | Doritos Spicy Nacho, 9.25 OZ | 0.935 |
Ruffles Original, 8.5 OZ | 0.823 | Ruffles Cheddar and Sour Cream, 8 OZ | 0.863 | Fritos Chili Cheese, 9.25 OZ | 0.934 |
Lays Barbecue, 7.75 OZ | 0.815 | Cheetos Crunchy, 8.5 OZ | 0.849 | Cheetos Crunchy, 8.5 OZ | 0.928 |
Lays Wavy Original, 7.75 OZ | 0.790 | Cheetos Flamin’ Hot, 8.5 OZ | 0.847 | Fritos Original, 9.25 OZ | 0.927 |
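One way to quantify how much the regional lists in Table 1 have in common is to compare them as sets. The sketch below is illustrative only and is not part of the SHOPPER model: it transcribes the top-5 lists from Table 1 (package sizes omitted) and computes, for each pair of regions, the shared items and the Jaccard overlap. The names `top5_by_region`, `jaccard`, and `overlaps` are hypothetical and chosen for this example.

```python
from itertools import combinations

# Top-5 substitutes for Doritos Nacho Cheese, transcribed from Table 1
# (package sizes omitted for readability).
top5_by_region = {
    "Indiana": ["Lays Classic", "Fritos Original", "Ruffles Original",
                "Lays Barbecue", "Lays Wavy Original"],
    "Ohio": ["Doritos Cool Ranch", "Doritos Spicy Sweet Chili",
             "Ruffles Cheddar and Sour Cream", "Cheetos Crunchy",
             "Cheetos Flamin' Hot"],
    "United States": ["Ruffles Original", "Doritos Spicy Nacho",
                      "Fritos Chili Cheese", "Cheetos Crunchy",
                      "Fritos Original"],
}


def jaccard(a, b):
    """Number of items common to both lists divided by all distinct items."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# For every pair of regions, record the shared items and the Jaccard overlap.
overlaps = {}
for r1, r2 in combinations(top5_by_region, 2):
    shared = sorted(set(top5_by_region[r1]) & set(top5_by_region[r2]))
    score = jaccard(top5_by_region[r1], top5_by_region[r2])
    overlaps[(r1, r2)] = (shared, score)
    print(f"{r1} vs {r2}: shared={shared}, Jaccard={score:.2f}")
```

Because a top-5 list is a short window, a measure like this is sensitive to where the cutoff falls; applying it to longer ranked lists would likely give a steadier picture of regional similarity.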
The next set of results illustrates variation in Doritos Nacho Cheese substitutes across different retailers at the national level. Table 2 displays these results for three retailers and highlights the variation in the lists of substitutes obtained. As in Table 1, comparable products are identified as top substitutes across retailers, and all the suggested substitutes are in the same category as the queried product. However, there is slightly more variation in the substitutes retrieved. This variation could be due to differences across retailers including, but not limited to, product assortment/display, product availability, pricing, and promotion/advertising strategies. Regarding product assortment across retailers, products highlighted in blue are available in all three retailers, those in yellow are available in two retailers, and those in red are available in only one retailer. In addition, there could be more variation in the ranked list of substitutes depending on the channel of the retailer being analyzed.
Table 2: Substitutions for Doritos Nacho Cheese 9.25 OZ based on retailer

Retailer A: Item Substitute | Retailer A: Score | Retailer B: Item Substitute | Retailer B: Score | Retailer C: Item Substitute | Retailer C: Score |
---|---|---|---|---|---|
Ruffles Original, 8.5 OZ | 0.944 | Funyun Onion Rings, 6 OZ | 0.918 | Lays Salt & Vinegar, 7.75 OZ | 0.971 |
Doritos Spicy Nacho, 9.25 OZ | 0.935 | Fritos Original, 9.25 OZ | 0.917 | Lays Sour Cream and Onion, 7.75 OZ | 0.968 |
Fritos Chili Cheese, 9.25 OZ | 0.934 | Lays Barbecue, 12.5 OZ | 0.891 | Ruffles Sour Cream and Onion, 8 OZ | 0.968 |
Cheetos Crunchy, 8.5 OZ | 0.928 | Lays Sour Cream and Onion, 7.75 OZ | 0.890 | Doritos Cool Ranch, 9.25 OZ | 0.965 |
Fritos Original, 9.25 OZ | 0.927 | Lays Sour Cream and Onion, 12.5 OZ | 0.890 | Lays Original, 7.75 OZ | 0.965 |
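Since the substitution scores are not directly comparable across retailers, one option is to compare items by their position in each retailer's ranked list rather than by score. The following is a minimal sketch under that assumption, using the lists transcribed from Table 2 in table order; `top5_by_retailer`, `rank_positions`, and `shared_ranks` are hypothetical names for this illustration, and tied scores are ranked by their order in the table.

```python
# Top-5 substitutes for Doritos Nacho Cheese by retailer, in table order (Table 2).
top5_by_retailer = {
    "A": ["Ruffles Original, 8.5 OZ", "Doritos Spicy Nacho, 9.25 OZ",
          "Fritos Chili Cheese, 9.25 OZ", "Cheetos Crunchy, 8.5 OZ",
          "Fritos Original, 9.25 OZ"],
    "B": ["Funyun Onion Rings, 6 OZ", "Fritos Original, 9.25 OZ",
          "Lays Barbecue, 12.5 OZ", "Lays Sour Cream and Onion, 7.75 OZ",
          "Lays Sour Cream and Onion, 12.5 OZ"],
    "C": ["Lays Salt & Vinegar, 7.75 OZ", "Lays Sour Cream and Onion, 7.75 OZ",
          "Ruffles Sour Cream and Onion, 8 OZ", "Doritos Cool Ranch, 9.25 OZ",
          "Lays Original, 7.75 OZ"],
}


def rank_positions(lists):
    """Map each item to {retailer: 1-based position in that retailer's list}."""
    positions = {}
    for retailer, items in lists.items():
        for rank, item in enumerate(items, start=1):
            positions.setdefault(item, {})[retailer] = rank
    return positions


positions = rank_positions(top5_by_retailer)

# Keep only items that appear in more than one retailer's top 5.
shared_ranks = {item: ranks for item, ranks in positions.items() if len(ranks) > 1}
for item, ranks in shared_ranks.items():
    print(item, ranks)
```

Comparing positions instead of scores keeps the comparison within what the model output supports: each rank is only meaningful relative to the other items at the same retailer.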
Tables 3 and 4 provide additional results across the three retailers for two products in different categories, Cinnamon Toast Crunch and Reese’s Peanut Butter Cups (PBC). We observe similar results for these two queried products. Cereal and candy products were retrieved as substitutes for Cinnamon Toast Crunch and Reese’s PBC, respectively. Interestingly, for Reese’s PBC, a product of the same brand but in a different package size is found as a potential substitute at two of the three retailers. Also, in Table 4 the most substitutable item at Retailer B, Zero, 1.85 OZ, is not carried by the other two retailers. These additional results highlight that variation in the ranked lists and the substitutes obtained could again be due to differences in product assortment/display across retailers.
Table 3: Substitution Scores for Cinnamon Toast Crunch, 12 OZ based on retailer

Retailer A: Item Substitute | Retailer A: Score | Retailer B: Item Substitute | Retailer B: Score | Retailer C: Item Substitute | Retailer C: Score |
---|---|---|---|---|---|
Frosted Flakes, 11 OZ | 0.932 | Frosted Flakes, 13.5 OZ | 0.866 | Reese’s Peanut Butter Puffs, 11.5 OZ | 0.965 |
Lucky Charms, 10.5 OZ | 0.915 | Cinnamon Toast Crunch, Chocolate, 12.4 OZ | 0.848 | Fruit Loops, 10.1 OZ | 0.959 |
Trix, 10.7 OZ | 0.914 | Cookie Crisp, 10.6 OZ | 0.827 | Lucky Charms, 14.9 OZ | 0.951 |
Apple Jacks, 10.1 OZ | 0.896 | Reese’s Peanut Butter Puffs, 11.5 OZ | 0.827 | Apple Jacks, 10.1 OZ | 0.950 |
Fruit Loops, 10.1 OZ | 0.896 | Trix, 10.7 OZ | 0.821 | Frosted Flakes, 13.5 OZ | 0.939 |
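Several cereals in Table 3 appear at more than one retailer but in different package sizes, so matching on the full item label would miss them. As a rough sketch, assuming each label ends in a ", size" suffix as it does in the table, the size can be stripped before counting how many retailers list each product. The helper `product_name` and the variable names are hypothetical and used only for this illustration.

```python
from collections import defaultdict

# Top-5 substitutes for Cinnamon Toast Crunch by retailer, transcribed from Table 3.
cereal_top5 = {
    "A": ["Frosted Flakes, 11 OZ", "Lucky Charms, 10.5 OZ", "Trix, 10.7 OZ",
          "Apple Jacks, 10.1 OZ", "Fruit Loops, 10.1 OZ"],
    "B": ["Frosted Flakes, 13.5 OZ", "Cinnamon Toast Crunch, Chocolate, 12.4 OZ",
          "Cookie Crisp, 10.6 OZ", "Reese's Peanut Butter Puffs, 11.5 OZ",
          "Trix, 10.7 OZ"],
    "C": ["Reese's Peanut Butter Puffs, 11.5 OZ", "Fruit Loops, 10.1 OZ",
          "Lucky Charms, 14.9 OZ", "Apple Jacks, 10.1 OZ",
          "Frosted Flakes, 13.5 OZ"],
}


def product_name(item):
    """Drop the trailing ', <size>' part of an item label (assumed format)."""
    return item.rsplit(",", 1)[0].strip()


# Collect the set of retailers whose top 5 contains each product, ignoring size.
retailers_by_product = defaultdict(set)
for retailer, items in cereal_top5.items():
    for item in items:
        retailers_by_product[product_name(item)].add(retailer)

# Print products listed by the most retailers first, then alphabetically.
for product, retailers in sorted(retailers_by_product.items(),
                                 key=lambda kv: (-len(kv[1]), kv[0])):
    print(f"{product}: {sorted(retailers)}")
```

Whether two package sizes should count as the same product is a modeling choice; treating them separately, as the tables do, preserves information about which sizes shoppers switch between.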
Table 4: Substitution Scores for Reese’s Peanut Butter Cup 1.5 OZ based on retailer

Retailer A: Item Substitute | Retailer A: Score | Retailer B: Item Substitute | Retailer B: Score | Retailer C: Item Substitute | Retailer C: Score |
---|---|---|---|---|---|
Reese’s PBC, King Size, 2.8 OZ | 0.975 | Zero, 1.85 OZ | 0.958 | M&Ms, 1.74 OZ | 0.985 |
Snickers, 52.7 GM (1.86 OZ) | 0.975 | Hershey Cookies n Creme, 1.55 OZ | 0.957 | Kit Kat, 1.5 OZ | 0.984 |
Kit Kat, 1.5 OZ | 0.974 | Reese’s PBC, King Size, 2.8 OZ | 0.956 | Snickers, 52.7 GM (1.86 OZ) | 0.984 |
M&Ms, 1.74 OZ | 0.973 | Twix, King Size, 3.02 OZ | 0.954 | Twix, 1.79 OZ | 0.983 |
York, 39 GM (1.38 OZ) | 0.964 | Reese’s Big Cup, 1.4 OZ | 0.951 | Hershey Almond, 1.45 OZ | 0.981 |
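The two observations made about Table 4, same-brand substitutes and items that show up at only one retailer, can be checked mechanically from the lists. The sketch below is illustrative: it assumes that a label starting with "Reese's" identifies the queried brand, and it only tests membership in each retailer's top 5, which is not the same as whether a retailer carries the item. All variable names are hypothetical.

```python
from collections import Counter

# Top-5 substitutes for Reese's Peanut Butter Cup by retailer, transcribed from Table 4.
candy_top5 = {
    "A": ["Reese's PBC, King Size, 2.8 OZ", "Snickers, 52.7 GM (1.86 OZ)",
          "Kit Kat, 1.5 OZ", "M&Ms, 1.74 OZ", "York, 39 GM (1.38 OZ)"],
    "B": ["Zero, 1.85 OZ", "Hershey Cookies n Creme, 1.55 OZ",
          "Reese's PBC, King Size, 2.8 OZ", "Twix, King Size, 3.02 OZ",
          "Reese's Big Cup, 1.4 OZ"],
    "C": ["M&Ms, 1.74 OZ", "Kit Kat, 1.5 OZ", "Snickers, 52.7 GM (1.86 OZ)",
          "Twix, 1.79 OZ", "Hershey Almond, 1.45 OZ"],
}

QUERY_BRAND = "Reese's"  # assumed label prefix for the queried product's brand

# Substitutes from the same brand as the queried product, per retailer.
same_brand = {
    retailer: [item for item in items if item.startswith(QUERY_BRAND)]
    for retailer, items in candy_top5.items()
}

# Each item appears at most once per list, so its count is the number of
# retailer lists that contain it.
list_count = Counter(item for items in candy_top5.values() for item in items)

# Items that appear in exactly one retailer's top 5.
exclusive = {
    retailer: [item for item in items if list_count[item] == 1]
    for retailer, items in candy_top5.items()
}

for retailer in candy_top5:
    print(retailer, "same brand:", same_brand[retailer],
          "| only in this list:", exclusive[retailer])
```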
The results shown above provide retailers and consumer packaged goods (CPG) companies with insights into which products are most likely to be substituted for a product of interest, based on an analysis of millions of shopping trips.
Although the substitution scores cannot be directly compared across retailers and geographical regions, we have illustrated that the ranked lists can vary depending on the geographical region or retailer being analyzed. This allows a potential user to understand which products are the next best to stock on a retailer’s shelf if a particular product is unavailable.
Conclusion
This article expands on the previous substitution results by obtaining substitutes across retailers and geographical regions. The expanded analysis illustrates that a ranked list of substitutes can vary depending on the retailer or geographical region being analyzed. By comparing retailers and locations with different product assortments, we can identify alternative substitutes that may score higher than those in a retailer’s current assortment, or discover different product assortments that may perform better in certain regions. These insights allow a user to understand which products are most likely to be the next best purchases, which could differ depending on the retailer or geographical region. If you’re interested in learning more about our substitutes solutions, how substitutes are constructed, and what insights we can uncover, please feel free to reach out at info@tickr.com. We hope you found this article helpful and look forward to hearing from you!
Citations
[1] Ruiz, F. J. R., Athey, S., & Blei, D. M. (2020). SHOPPER: A probabilistic model of consumer choice with substitutes and complements. The Annals of Applied Statistics, 14(1). https://doi.org/10.1214/19-aoas1265
[2] Numerator (2024). Numerator OmniPanel Data. Numerator. https://www.numerator.com/omnipanels/
Publish Date: July 23rd, 2024
Abstract: In a previous post, substitutable goods were identified by leveraging large-scale shopping cart data to estimate a sequential probabilistic model called “SHOPPER,” developed by Ruiz, Athey, and Blei. In this post, the analysis is expanded by analyzing shopping cart data from several retailers across various geographical regions. Similar products are queried to investigate how the ranked lists of substitutable products vary across different retailers and geographies. These results provide insights into what products are substituted at various retailers and for the same retailers in different geographies.