Identifying Substitutable Goods using Large-scale Shopping Cart Basket Data

This post addresses the challenge of identifying substitutable goods in consumer behavior analysis using large-scale shopping cart data. We apply “SHOPPER,” a sequential probabilistic model developed by Ruiz, Athey, and Blei, to hundreds of thousands of shopping trips encompassing millions of transactions. The model captures the complexity of consumer purchase behavior by accounting for factors such as shopping purpose, seasonal variations, in-store promotions, and personal preferences.
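To make “sequential probabilistic model” concrete, here is a heavily simplified sketch of the kind of per-step choice probability SHOPPER is built on, loosely following the structure described by Ruiz, Athey, and Blei. A basket is treated as an ordered sequence. At each step, the next item is drawn from a softmax over the items not yet in the basket. Each item's score combines a popularity intercept, a customer-preference term, and an interaction with the items already chosen. The parameter names and toy dimensions are assumptions made for illustration. Price sensitivity, seasonal effects, and the end-of-trip (checkout) choice from the full model are deliberately omitted.

```python
import numpy as np

def next_item_probs(lam, alpha, rho, theta_u, basket):
    """Probability of each item being chosen next, given the items already in the basket.

    Simplified, hypothetical parameterization (not the full SHOPPER model):
    lam:     (n_items,) item popularity intercepts
    alpha:   (n_items, k) item attribute embeddings
    rho:     (n_items, k) item interaction embeddings
    theta_u: (k,) customer preference vector
    basket:  list of item indices already chosen, in order
    """
    # Base score per item: popularity plus customer-item affinity.
    score = lam + alpha @ theta_u
    if basket:
        # Interaction with the basket so far: rho_c dotted with the mean alpha of chosen items.
        context = alpha[basket].mean(axis=0)
        score = score + rho @ context
        # Items already in the basket cannot be chosen again.
        score[basket] = -np.inf
    # Numerically stable softmax over the remaining items.
    score = score - score.max()
    p = np.exp(score)
    return p / p.sum()

def basket_log_prob(lam, alpha, rho, theta_u, items):
    """Log-probability of an ordered basket: sum of log next-item probabilities."""
    total = 0.0
    for i, item in enumerate(items):
        p = next_item_probs(lam, alpha, rho, theta_u, list(items[:i]))
        total += np.log(p[item])
    return total

# Toy example with random (hypothetical) parameters.
rng = np.random.default_rng(0)
n_items, k = 5, 3
lam = rng.normal(size=n_items)
alpha = rng.normal(size=(n_items, k))
rho = rng.normal(size=(n_items, k))
theta_u = rng.normal(size=k)

p = next_item_probs(lam, alpha, rho, theta_u, basket=[2])
print(p.round(3))  # p[2] is 0 because item 2 is already in the basket
print(basket_log_prob(lam, alpha, rho, theta_u, [2, 0, 4]))
```

The interaction term is what lets a model of this kind learn that some items make others more or less likely once they are already in the basket. Patterns in those learned interactions are what substitute and complement analyses build on.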

Identifying Substitutable Goods using Large-scale Shopping Cart Basket Data across Retailers & Geography

In a previous post, we identified substitutable goods by estimating “SHOPPER,” a sequential probabilistic model developed by Ruiz, Athey, and Blei, on large-scale shopping cart data. This post extends that analysis to shopping cart data from several retailers across various geographic regions. We query similar products in each dataset to investigate how the ranked lists of substitutable products vary across retailers and geographies. The results show which products are substituted at different retailers, and at the same retailer in different geographies.
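One simple way to quantify how much two ranked substitute lists for the same query product differ is the overlap of their top-k entries. The lists might come from two retailers, or from one retailer in two regions. The sketch below uses Jaccard overlap at several cutoffs. Both the metric and the product names are illustrative assumptions, not necessarily what the analysis itself used.

```python
def topk_jaccard(list_a, list_b, k):
    """Jaccard overlap between the top-k entries of two ranked lists."""
    a, b = set(list_a[:k]), set(list_b[:k])
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical ranked substitutes for the same query product at two retailers.
retailer_a = ["oat milk", "almond milk", "soy milk", "2% milk", "lactose-free milk"]
retailer_b = ["almond milk", "2% milk", "oat milk", "coconut milk", "whole milk"]

for k in (1, 3, 5):
    print(k, round(topk_jaccard(retailer_a, retailer_b, k), 3))
```

Jaccard overlap ignores order within the top k. If the exact ranking matters, a rank-aware measure such as rank-biased overlap would be a natural alternative.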

Digital Twins – Generative AI Extensions of Qualitative Survey Data

By building a retrieval-augmented generation (RAG) pipeline with OpenAI’s GPT-4o, we explored the ability of LLMs to ‘twin’ real respondents in a publicly available survey. When given a particular respondent’s answers to other questions as context, the model predicted that respondent’s answers to masked questions with far better than random accuracy. This opens the door to using LLMs to extend known respondent sentiment beyond the original question set. It also provides a far more flexible framework for working with qualitative data from real respondents and their ‘digital twins’.
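The masked-question evaluation can be sketched as a leave-one-out loop. For each respondent, hide one answer, give the model that respondent's other answers as context, and compare the prediction with the true answer. A uniform random guess on a question with k options is correct with probability 1/k, which gives the baseline. In this sketch, `predict` is a hypothetical stand-in for the RAG + GPT-4o call, and `toy_predict` is a trivial stub so the example runs offline. For simplicity, `build_context` passes all of the respondent's other answers rather than performing a retrieval step. The survey data are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Respondent = Dict[str, str]  # question id -> chosen answer

def build_context(respondent: Respondent, masked_q: str) -> str:
    """Format a respondent's other answers as context (no retrieval step in this sketch)."""
    lines = [f"Q: {q}\nA: {a}" for q, a in respondent.items() if q != masked_q]
    return "\n".join(lines)

def evaluate_masked(
    respondents: List[Respondent],
    options: Dict[str, List[str]],
    predict: Callable[[str, str, List[str]], str],
) -> Tuple[float, float]:
    """Return (model accuracy, expected random-guess accuracy) over all masked questions."""
    correct, baseline, n = 0, 0.0, 0
    for r in respondents:
        for q, truth in r.items():
            guess = predict(build_context(r, q), q, options[q])
            correct += int(guess == truth)
            baseline += 1 / len(options[q])  # uniform guess over the answer options
            n += 1
    return correct / n, baseline / n

def toy_predict(context: str, question: str, choices: List[str]) -> str:
    # Hypothetical stand-in for the LLM: if the respondent answered positively
    # elsewhere, pick the first option; otherwise pick the last option.
    positive = "A: yes" in context or "A: agree" in context
    return choices[0] if positive else choices[-1]

# Invented toy survey.
options = {"q1": ["yes", "no"], "q2": ["agree", "neutral", "disagree"]}
respondents = [{"q1": "yes", "q2": "agree"}, {"q1": "no", "q2": "disagree"}]

acc, base = evaluate_masked(respondents, options, toy_predict)
print(f"model accuracy {acc:.3f} vs random baseline {base:.3f}")
```

Averaging 1/k per masked question, rather than using a single fixed rate, keeps the baseline fair when questions have different numbers of answer options.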