The Shopping Cart Abandonment Problem: How Machine Learning Can Help!

Published in Product Coalition

9 min read · Feb 28, 2022

One of the biggest challenges for an ecommerce store is shopping cart abandonment. As a store owner, you have done all the work to get customers interested and engaged enough to come to your web site or app, only to lose them just when they were ready to make a purchase. That makes it one of the biggest nemeses of the e-commerce store owner.

Industry Stats

The average cart abandonment rate across all industries is north of 70%. It is even worse for mobile users, at 85% (15 percentage points higher).

E-commerce vendors lose more than $18 billion in revenue every year because of this problem.

Consumer goods have the lowest shopping cart abandonment rate, while travel, automotive, fashion and luxury products have the most abandoned carts.

Top Reasons for abandonment

The reasons for cart abandonment can vary depending on the industry and depending on the product being sold:

Cart abandonment happens to big and small retailers alike. As a first goal, retailers can look at the reasons above and make their checkout process as smooth as possible to reduce cart abandonment.

Key questions

Two important questions that need to be asked are:

Can we predict cart abandonment and take proactive action before it happens? Can we also recommend the type of proactive action to take?
Can we predict the best way to convert customers who have abandoned their carts?

In this article, we will focus on the first question.

Predict cart abandonment and take proactive action before it happens

Most users follow a pattern in how they shop. How do they research the products they want to buy? What kinds of products do they want to buy? What pattern do they follow when they buy a product? What pattern do they follow when they abandon the cart? For an existing customer especially, the past purchase history can further help us predict their default behaviour.

The first step is to identify which behaviours lead people to abandon the cart, and to track those actions automatically on the ecommerce site. We need to know the positive behaviours shown by users who buy the product and the negative behaviours shown by users who abandon the cart.

Critical metrics to assemble for predictive models

The best way to detect cart abandonment is to assemble all the business-level KPIs and data points, use them to train a machine learning system, and analyse the patterns that exist. Let’s take a look at a few example data points that need to be assembled for this analysis and prediction. This is a long list.

Product page metrics

How long does the user stay on the page of a product that is in the cart?
How many times did the user visit the product page in the cart in the current visit?
How many times did the user visit the product page in the last 3 visits?
How many product pages did the user visit during the session?

Website metrics

How long has the user been on the web site?
Did the user click on the help documentation?
Any website errors during the current visit?
Any web site performance issues?

Historical stats

Does the product or its category rank among the most frequently abandoned products or categories?

Promotion metrics

Is the product on promotion?
Did the user try, unsuccessfully, to apply a coupon?

Cart Metrics

How many varieties of product are in the cart?
How many products do they normally check out vs the current cart?
How much order value do they normally check out vs the current cart?
How many times has the customer gone to the cart page in the current visit?

User profile metrics

Geographical location of the user?
How often does the user buy products that are on promotion vs not on promotion?
Registered user vs guest user?
Mobile, desktop or tablet?

Past User abandonment metrics

How many times has this user left items in the cart?
Has this user abandoned more than once in the last 30 days? 90 days? 1 year?
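As a minimal sketch of how the look-back questions above could be turned into model inputs, the snippet below counts one user's abandoned carts in 30, 90 and 365 day windows. The function name, the feature names and the sample timestamps are all hypothetical and only serve the illustration; a real store would read the timestamps from its own order and cart logs.

```python
from datetime import datetime, timedelta


def abandonment_counts(abandoned_at, now, windows_days=(30, 90, 365)):
    """Count a user's abandoned carts inside each look-back window."""
    counts = {}
    for days in windows_days:
        cutoff = now - timedelta(days=days)
        # Keep only abandon events that fall between the cutoff and "now".
        counts[f"abandons_last_{days}d"] = sum(
            1 for ts in abandoned_at if cutoff <= ts <= now
        )
    return counts


# Hypothetical abandon timestamps for a single user.
now = datetime(2022, 2, 28)
history = [
    datetime(2022, 2, 20),
    datetime(2022, 1, 5),
    datetime(2021, 6, 1),
    datetime(2020, 12, 25),
]

features = abandonment_counts(history, now)
# "Abandoned more than once in the last 30 days?" as a yes/no feature.
features["repeat_abandoner_30d"] = features["abandons_last_30d"] > 1
print(features)
```

Each window becomes one numeric column per user, which can then sit next to the other metrics in the training data.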

User purchase metrics

How often does this user make a purchase in a month?
Are the purchases timed closer to the beginning or the end of the month?
What is the typical dollar amount this user spends in a month?

Time metrics

Session on a weekday or weekend?
Session 1 to 2 PM or 6 to 9 PM?
Session in the month of November or December?

These are all examples of data points to assemble. Depending on the ecommerce store, not all of them may be captured. Together, these data points provide a very good view of customer behaviour. They make use of the user’s past behaviour, product stats, category stats, user group stats, etc.

Please note that the time metrics are based on the hypothesis that weekdays, specific times of day and specific months see more abandonment. Such hypotheses can be validated with real data from a store.
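To make the time metrics concrete, here is a small sketch that derives them from a session start time. The function name and the feature names are assumptions made for this example; the time buckets simply mirror the questions listed above, and whether they matter is still something to validate with store data.

```python
from datetime import datetime


def time_features(session_start):
    """Derive the hypothesised time-of-session flags from a timestamp."""
    hour = session_start.hour
    return {
        # Monday is 0, so 5 and 6 are Saturday and Sunday.
        "is_weekend": session_start.weekday() >= 5,
        "is_1_to_2_pm": 13 <= hour < 14,
        "is_6_to_9_pm": 18 <= hour < 21,
        "is_nov_or_dec": session_start.month in (11, 12),
    }


# Hypothetical session: Saturday, Nov 27, 2021 at 7:30 PM.
example = time_features(datetime(2021, 11, 27, 19, 30))
print(example)
```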

This kind of data needs to be assembled from a combination of your Google Analytics data and the ecommerce data from your platform, such as Shopify, Magento, BigCommerce, WooCommerce or Weebly.
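The sketch below shows one way such a combination could look once both sources have been exported: analytics rows and cart rows are joined on a shared session id. The field names (`session_id`, `pages_viewed`, `cart_value`, `checked_out`) and the sample rows are hypothetical; real exports from an analytics tool or an ecommerce platform will have their own schemas, and finding a common key is often the hard part.

```python
def merge_sessions(analytics_rows, cart_rows):
    """Join analytics and cart data on session_id, keeping sessions that had a cart."""
    carts_by_session = {row["session_id"]: row for row in cart_rows}
    merged = []
    for row in analytics_rows:
        cart = carts_by_session.get(row["session_id"])
        if cart is None:
            # No cart in this session, so there is nothing to abandon.
            continue
        merged.append({
            **row,
            "cart_value": cart["cart_value"],
            # Target column: a cart that was never checked out counts as abandoned.
            "abandoned": not cart["checked_out"],
        })
    return merged


# Hypothetical rows standing in for the two exports.
analytics = [
    {"session_id": "s1", "pages_viewed": 7, "device": "mobile"},
    {"session_id": "s2", "pages_viewed": 3, "device": "desktop"},
    {"session_id": "s3", "pages_viewed": 1, "device": "tablet"},
]
carts = [
    {"session_id": "s1", "cart_value": 59.0, "checked_out": False},
    {"session_id": "s2", "cart_value": 120.5, "checked_out": True},
]

training_rows = merge_sessions(analytics, carts)
print(training_rows)
```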

Applying machine learning to predict shopping abandonment

Using anomaly detection to flag possible cart abandonment

Some of the variables above are real-time variables. For such variables, the best way to identify issues is anomaly detection. For example, say we keep track of web site errors, the number of transactions per minute or the number of users on the web site per minute.

If any of those values changes significantly from its normal level, that is a cause for concern and may trigger a user to abandon the cart. For example, slow responses or web site errors are common causes of cart abandonment. When such changes are noticed, proactive action needs to be taken to prevent the users currently on your ecommerce site from abandoning their carts.

Here is a good example of a spike in one of the real-time variables being tracked (response time). That is a sign to monitor closely and take action for the users who currently have products in their carts.
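One simple way to flag such a spike is a rolling z-score: compare each new reading with the mean and standard deviation of the readings just before it. This is only a sketch of one possible anomaly detection approach, not necessarily the method behind the chart described above; the window size, the threshold of 3 and the response times are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_spikes(values, window=10, threshold=3.0):
    """Return indexes whose value sits far above the preceding window."""
    spikes = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # One-sided check: only unusually slow responses are flagged.
        if sigma > 0 and (values[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical response times in milliseconds, one reading per minute.
response_ms = [210, 205, 198, 220, 215, 207, 212, 201, 209, 214, 206, 950, 211, 208]

spikes = find_spikes(response_ms)
print(spikes)
```

The indexes returned can be mapped back to timestamps, and the sessions with items in the cart at those moments become the candidates for proactive action.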

Using a classification model to predict abandonment

The second approach is to build a predictive model from all these data points using past data.

To illustrate, let’s take a simple data set that depicts this problem. For illustration purposes I have taken a smaller subset of the input variables and a smaller subset of the data, and trained the model using RandomForest and GBM.

Here the goal is to assemble the historical data discussed earlier to train a predictive model. This includes data from the web site (Google Analytics), ecommerce data, past usage analytics and web site performance data, all properly merged. The training process involves taking the historical data and splitting it into train and test segments. You use the training segment to train the model, and the test segment to verify the model and calculate its accuracy. Once you are happy with the accuracy, the model can be used to make predictions on new data coming from the system.
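The split, train and verify steps can be sketched in a few lines. To keep the example self-contained, a one-feature threshold rule (a decision stump) stands in for RandomForest or GBM, and the data is a synthetic toy set in which sessions with two or fewer cart page visits are labelled as abandoned. Both the stand-in model and the data are assumptions for illustration, not the models or dataset used in this article.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_stump(train, feature):
    """Pick the threshold that maximises training accuracy.

    The rule predicts "abandoned" when the feature value is <= threshold.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted({row[feature] for row in train}):
        correct = sum((row[feature] <= candidate) == row["abandoned"] for row in train)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def accuracy(rows, feature, threshold):
    """Share of rows where the threshold rule matches the true label."""
    hits = sum((row[feature] <= threshold) == row["abandoned"] for row in rows)
    return hits / len(rows)


# Synthetic toy data: 100 sessions, abandoned when cart_page_visits <= 2.
sessions = [{"cart_page_visits": v, "abandoned": v <= 2} for v in [1, 2, 3, 4, 5] * 20]

train, test = split_train_test(sessions)
threshold = fit_stump(train, "cart_page_visits")
test_accuracy = accuracy(test, "cart_page_visits", threshold)
print(len(train), len(test), threshold, test_accuracy)
```

The important design point is that the threshold is learned from the training segment only, and the test segment is touched just once, to score the finished model.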

Here is a screenshot of the sample dataset.

For each session, the data set covers:

Metrics from products viewed
Metrics from cart usage
Metrics from any checkouts
Past behaviour metrics
Whether that session resulted in an abandoned cart or not (the target column to predict)

I have tried to keep the technical part below simple to keep the scope of the article reasonable. Those interested in learning more can read up on handling missing values, data augmentation, churn models, imbalanced data and picking the right model.

For the sample dataset I am using, the split between the two outcome classes is 85% to 15%. This is called a class imbalance problem, as one class is represented a lot more than the other. You can read more about that here.

There are many techniques for dealing with the imbalance problem. I ended up using a technique based on SMOTE to balance the data and then calculated the accuracy metrics for the dataset.
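The core idea of SMOTE is to create synthetic minority-class rows by interpolating between a minority row and one of its nearest minority neighbours. The snippet below is a simplified sketch of that idea in plain Python, not the exact technique or library used for this article; the function name, the two-feature points and the parameter values are assumptions for illustration.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class rows with two numeric features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]

synthetic = smote_like(minority, n_new=6)
print(synthetic)
```

A common precaution is to oversample only the training segment, so that the test segment still reflects the real class distribution.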

Two models were created: one that included both the number of successful checkouts and the total checkouts as inputs, and another that included only the total checkouts initiated.

The following are the weighted feature importances for those two models. Accuracy was good and comparable for both, but sensitivity was high for the first model and low for the second. Both models show promise and can be further improved with more data and additional parameter tuning.
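To see how two models can have comparable accuracy yet very different sensitivity, consider this small worked example. The numbers are made up for illustration and are not the results of the two models above; the example also assumes the positive class is the rarer 15% outcome. Sensitivity is the share of actual positives that the model catches.

```python
def confusion_metrics(actual, predicted):
    """Compute accuracy and sensitivity (recall on the positive class)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# 100 hypothetical sessions: 15 in the positive class, 85 in the negative class.
actual = [True] * 15 + [False] * 85

# Model A catches 12 of the 15 positives but raises 12 false alarms.
model_a = [True] * 12 + [False] * 3 + [True] * 12 + [False] * 73
# Model B always predicts the majority class.
model_b = [False] * 100

metrics_a = confusion_metrics(actual, model_a)
metrics_b = confusion_metrics(actual, model_b)
print(metrics_a, metrics_b)
```

Both hypothetical models score the same accuracy, but only the first one actually finds the positive cases, which is why accuracy alone is not enough on imbalanced data.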

Here is the feature importance for this anonymized data. It tells you which of the data columns contribute the most to predicting the target variable.

Model 1: Weighted variable importance with both successful and total checkouts included as input

Model 2: Weighted variable importance with only total checkouts included as input

One practical way to use variable importance: say this dataset has more than 50 columns. One can pick the top 10 variables from the variable importance list and show those in a dashboard.
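Picking the top variables is a simple sort once the importances are available as name and score pairs. The column names and scores below are hypothetical placeholders, not the importances from the two models above.

```python
def top_features(importances, n=10):
    """Return the n feature names with the highest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:n]


# Hypothetical importance scores keyed by column name.
importances = {
    "total_checkouts": 0.31,
    "cart_page_visits": 0.24,
    "product_page_seconds": 0.18,
    "past_abandons_90d": 0.15,
    "is_weekend": 0.07,
    "device_mobile": 0.05,
}

dashboard_columns = top_features(importances, n=3)
print(dashboard_columns)
```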

Free shipping?
Discount on another related product.

That is the beauty of machine learning. Once we assemble the relevant data to illustrate the past behaviour, that data can be used to predict the right course of action. Something to consider for another article.

Solving real-world problems is done by studying past and current behavior and training multiple models that work in conjunction to predict the problem and to predict the specific actions to take.

Predict Shopping Cart Abandonment

With 70 to 85% cart abandonment and $18 billion in lost revenue, there are two steps to handling the problem. The first step is to predict who will abandon and take proactive action, which was the focus of this article. The second step is to take follow-up action to convert those who abandoned the cart.

We walked through the process of identifying the metrics to use and building a predictive model for whether a user will abandon the cart or not. The simple model, built using metrics from product views, cart usage, checkouts and past behaviour, showed enough promise to keep iterating with more data.

It does not end there. Once we predict who will abandon, the next process starts: figuring out the best action to prevent them from abandoning the cart. That is another predictive model we can add to the pipeline to make the entire ecommerce system intelligent and responsive to user actions 24/7.

Special thanks to Tremis Skeete, Executive Editor at Product Coalition for the valuable input which contributed to the editing of this article.