Ecommerce and Data Science: It’s Not as Simple as You Might Think

Alex Jonas
Published in Product Coalition
4 min read · Jun 10, 2019

Background

Machine learning and data mining seem to be all the rage these days in web-tech circles. Looking back, though, there is nothing actually new or cutting-edge about the science behind these terms. They’ve been around since the 1960s and were used to get us to the Moon. At their core, machine learning and data mining are rooted in simple statistical equations created to assess large data sets in order to find patterns and make predictions. What has changed recently, however, is their accessibility to the non-mathematician. Whereas dividing data sets into training and testing data to run complex regression or classification simulations was once a very manual process, automation is now widening the playing field with the appearance of programs such as R and Tableau. So what’s stopping you from downloading these programs and putting them to use for your website? Nothing. But don’t expect an immediate lift in sales or conversions. Here’s why:

As mentioned above, data analysis is about using an existing dataset to create a model to predict an outcome. Think about that for a second. How much value does prediction using historical data really have on the web? Although clearly useful for static simulations such as predicting weather or traffic, such a backward-looking approach may not be as applicable to the dynamic structure of online activity. This hasn’t deterred many sites, though. In fact, investments in personalization and user segmentation are very of-the-moment. They are no longer distant goals on the horizon, but are expected to drive digital growth today and into the future, especially as ecommerce and media consumption continue to get more and more competitive.
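To make the "existing dataset, model, prediction" loop concrete, here is a minimal sketch in plain Python of the manual process described earlier: split historical data into training and testing sets, fit a simple regression on one, and check it against the other. The data (pages viewed versus order value) is made up and deliberately noise-free, and the function names are my own illustrations, not part of R, Tableau, or any other tool.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(rows, slope, intercept):
    """Average absolute gap between predicted and actual y."""
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)

# Hypothetical history: (pages viewed in a session, order value in dollars).
# Noise-free on purpose so the fit is exact; real traffic will not be.
history = [(x, 5.0 * x + 10.0) for x in range(1, 21)]

train, test = train_test_split(history)
slope, intercept = fit_line(train)
error = mean_abs_error(test, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} test error={error:.4f}")
```

The model only knows what the history shows. If user behavior shifts after the data was collected, the error on yesterday's test set says little about tomorrow's visitors.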

Consider this: data mining allows you to answer the following questions: Who are you? And what are you likely to do on my site? It can predict, but it can’t infer. Perhaps it’s better to answer the questions: Who are you? And what do you likely want to do on my site? The first set of questions assumes that the website already lets the user do anything he or she pleases. In reality, the UX and design push the individual in certain directions. If you don’t find the same answer for the two sets of questions above, then your site may be designed the wrong way.

Audience Segmentation and Personalization

Photo credit: Piwik Pro Blog

Before beginning any kind of machine learning project, it makes sense to set some high-level goals and targets. These can be as simple as trying to accurately answer a few questions like the ones mentioned above. Gaining insights into some of the tasks your users are trying to accomplish can be very valuable. Acting on them through personalization can prove to be very difficult, however, as Paul Boag from ecommerce platform Shopify describes: “Many seem to see providing a personalized ecommerce experience as a guarantee of success. But the reality is more complex.” Simply looking at who someone is or where they are searching from may not be enough to result in a noticeable sales lift. Boag continues, “It doesn’t matter if you recommend products if I’m busy doing something else. It’s more important that your content is timely and relevant, than personalized.” Timely and relevant suggests that it may be more valuable to focus on soft or variable traits rather than hard or static ones. For example, try segmenting by how long someone has been on the site rather than by whether he is a retired police officer and train enthusiast. Unless, that is, you sell toy trains. The point here is that various segmented experiences can quickly get complex and difficult to maintain.
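As a rough illustration of segmenting on soft traits, the sketch below assigns a visitor to a segment using only what they are doing right now (time on site and cart contents) and ignores the static profile field entirely. The `Visitor` fields, segment names, and the 120-second threshold are all hypothetical choices for the example, not recommendations from Boag or Shopify.

```python
from dataclasses import dataclass

@dataclass
class Visitor:
    occupation: str       # hard/static trait (unused by the segmenter below)
    seconds_on_site: int  # soft/variable trait
    cart_items: int       # soft/variable trait

def segment(visitor, engaged_after_seconds=120):
    """Pick a segment from current behavior only; the threshold is an assumption."""
    if visitor.cart_items > 0:
        return "ready_to_buy"
    if visitor.seconds_on_site >= engaged_after_seconds:
        return "engaged"
    return "just_arrived"

# Two visitors with the same static profile land in different segments
# because their current behavior differs.
newcomer = Visitor("retired police officer", seconds_on_site=15, cart_items=0)
browser = Visitor("retired police officer", seconds_on_site=300, cart_items=0)
shopper = Visitor("retired police officer", seconds_on_site=300, cart_items=2)

for v in (newcomer, browser, shopper):
    print(v.seconds_on_site, v.cart_items, segment(v))
```

Even this toy version has three experiences to design, test, and keep current, which hints at how quickly the maintenance burden grows as segments multiply.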

Sarah Chambers from A/B testing giant VWO expands upon this idea of complexity by remarking how “Personalization isn’t a set-it-and-forget-it tactic.” Not only do you have to make sure that each individual experience is optimized for each unique user, but you also need to make sure data sources are up to date. After all, data mining is only useful if your model can provide actionable intel to predict customer activity. If the training dataset is not recent, then the model can potentially hurt more than it can help. This is something that Utsav Kaushish, a former Facebook engineer, has come face-to-face with on many occasions. He remarks how it is difficult to maintain “sample sizes necessary for a rigorous quantitative analysis.” His solution is to keep things simple. And despite the power of machine learning, he suggests a more personal relationship with the user: “there are times when you have to dig deeper than possible with a query or a regression model […] simply talk to the people you want to use your product.” Speaking to customers directly can be expensive and time-consuming, but it has the advantage of providing clarity. There is a smaller chance of misinterpreting the customer’s expectations from a face-to-face interview than from tracking their activity online.

Now, don’t mistake hesitancy about pursuing machine learning for the assumption that it provides no value. If executed diligently and with sufficient resources, data solutions such as digital targeting and activity anticipation can have a strong positive impact on KPIs. Just be careful when gauging the level of effort required to do this right. From truly understanding customer intentions, to estimating ongoing maintenance costs, to pinpointing sufficiently descriptive input variables, there is a lot that can go wrong and throw your model off course. After all of that investment, are you really going to see a strong return? Maybe. But maybe there is lower-hanging fruit that can have just as big an impact on your business, if not a bigger one.


Digital Product for Comcast Business. MBA — Johns Hopkins Carey Business School.