
Amazon upgrades AI model for 'Just Walk Out'

Amazon multimodal AI model. (Source: Amazon)

Amazon is streamlining its frictionless shopping solution with advanced artificial intelligence capabilities.

The omnichannel giant is deploying a new multimodal AI foundation model to support its "Just Walk Out" checkout-free platform, which Amazon uses in its own physical stores and licenses to other retailers. The new model analyzes data from cameras and sensors throughout the store simultaneously, rather than examining which items shoppers pick up and put back in a linear sequence.

This enables the Just Walk Out system, which is based on generative AI technology, to analyze multiple in-store input streams, such as camera feeds and weight sensors, and prioritize the most important data to accurately determine the variety and quantity of items selected.

It also uses continuous self-learning and transformer technology, a neural network architecture that, in this case, converts inputs such as sensor and image data into receipts for checkout-free shopping.

Previously, the AI system underlying Just Walk Out analyzed shopper behavior sequentially, processing signals such as movement and location in the store, what shoppers picked up, and the quantity of each item one after another.

However, in unusual or novel shopping scenarios, such as if a camera view was obscured due to bad lighting or a nearby shopper, the sequential approach could take time to determine purchases with confidence, and sometimes required manual retraining of the model.

The new Just Walk Out AI system is designed to achieve higher levels of accuracy by analyzing all sensor data simultaneously, rather than sequentially. For example, a shopper might pick up and put down multiple varieties of yogurt in different combinations, and as they are doing so, another customer might reach for the same item, or the freezer door could fog up, obscuring the cameras' view.

In complex situations like these, the new model is designed to quickly and accurately determine which items each shopper actually took: it simultaneously processes inputs from multiple sources, such as weight sensors on the fridge shelves, continuously learns from those inputs, and decides which are most important for an accurate determination.
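The prioritization described above can be sketched in a few lines of Python: a softmax over per-sensor confidence scores stands in for the model's learned weighting, so a fogged-up camera contributes less to the final answer than a clean weight-sensor reading. The sensor names, scores, and fusion logic here are illustrative assumptions, not Amazon's actual system.

```python
# Toy sketch of confidence-weighted sensor fusion. Each sensor stream
# reports an estimated item count plus a confidence score; a softmax
# over the confidences down-weights unreliable streams (e.g. an
# obscured camera) when combining the estimates.
import math

def fuse_estimates(readings):
    """readings: list of (source_name, estimated_count, confidence)."""
    weights = [math.exp(conf) for _, _, conf in readings]
    total = sum(weights)
    fused = sum(
        (w / total) * count
        for w, (_, count, _) in zip(weights, readings)
    )
    return round(fused)

readings = [
    ("camera_aisle_3", 1, 0.2),   # freezer door fogged up: low confidence
    ("weight_shelf_12", 2, 0.9),  # weight delta matches two yogurts
    ("camera_overhead", 2, 0.7),
]
print(fuse_estimates(readings))  # -> 2
```

In a real transformer-based model the weighting would be learned attention over high-dimensional features rather than hand-set confidence scores, but the effect is the same: the most trustworthy evidence dominates the decision.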


“A single, generalized AI model can produce results equally as good as a model that would be overtrained or overfit on a subset of the data,” Jon Jenkins, VP of Just Walk Out technology at Amazon, said during an invitation-only press tour attended by Chain Store Age at Amazon offices in Seattle. “We have been working on the underlying transformer technology for years. The model allows us to generate receipts faster and more accurately and efficiently."

"Previously, there was a model to detect if someone's hand went into a product space," Jenkins explained during the presentation. "Then there was a model to determine if the image of the item that came out of that product space looked like the item that we thought was there. Then there might be a model for counting the number of items that came out of that space. And what Amazon has learned is like what has been learned in the large language model (LLM) AI space elsewhere, which is you can actually combine all of these data inputs into a single model."

According to Amazon, Just Walk Out is currently available in over 170 third-party locations such as airports, stadiums, universities, and hospitals in the U.S., the U.K., Australia and Canada. Amazon plans to launch more Just Walk Out stores in 2024 than any year prior, more than doubling the number of third-party stores with the technology this year.  

"The new, Just Walk Out multi-modal foundation model for physical stores is a significant advancement in the evolution of checkout-free shopping," Jenkins said in a corporate blog post. "It increases the accuracy of Just Walk Out technology even in complex shopping scenarios with variables such as camera obstructions, lighting conditions, and the behavior of other shoppers, while allowing us to simplify the system."
