How to Measure & Optimise Your Predictive Model for Prime Time?

Sriram Parthasarathy
Published in Product Coalition
10 min read · Feb 16, 2022


Most companies these days have a healthy dose of Artificial Intelligence at the centre of their technology stack. Artificial Intelligence is no longer just a buzzword: many companies are rolling out applications that use this technology at their core.

A product manager identifies the needs of the customer and the business objectives, communicates clearly defined metrics of success and collaborates with the team to make it a reality. Great product managers help teams find and prioritise the most impactful ideas to work on.

A product has to solve a real-world problem; technology is just a means to solve it. The important questions are: what problem are we solving, who are we solving it for, and how is our solution better than the alternatives? In this context, when your product has a predictive model at the heart of the solution, product managers need to really understand and evaluate what the model is being optimised for.

Questions a product manager needs to think about when evaluating a predictive model for the product are:

  1. What is the business problem the predictive model is solving?
  2. If the model incorrectly predicts true, what is the impact on the business? What is the cost of that incorrect prediction?
  3. If the model incorrectly predicts false, what is the impact on the business? What is the cost of that incorrect prediction?
  4. Which has the more severe impact: incorrectly predicting true, or incorrectly predicting false?

Depending on the business problem, one has to identify the tradeoffs and decide on what to optimise to reduce the impact of the wrong predictions.

Use Case example

Let’s take a simple example to illustrate this problem. Let’s say our predictive model identifies high risk patients (say, for cancer).

We have two problems here:

  • The predictive model sometimes tags a low risk patient as a high risk patient.
  • The predictive model sometimes tags a high risk patient as a low risk patient.

Which of these problems do we want to optimise for?

  • If the model misses a lot of high risk patients, that may lead to catastrophic outcomes, which is not good.
  • On the other hand, if it goes overboard and tags a lot of patients as high risk, it does capture quite a number of the high risk patients, which is good. But on the downside, a lot of low risk patients are tagged as high risk, which is not good either and is an unnecessary nuisance for those patients.

The best model is one that catches as many high risk patients as possible while reducing the number of low risk patients incorrectly tagged as high risk.

The four possible outcomes can be classified as:

  • Patients predicted as high risk who are actually high risk patients. They are called True Positives.
  • Patients predicted as high risk who are actually low risk patients. They are called False Positives.
  • Patients predicted as low risk who are actually low risk patients. They are called True Negatives.
  • Patients predicted as low risk who are actually high risk patients. They are called False Negatives.

Visualise the predictive problem

Let’s illustrate that with real numbers.

To visually explain this problem, let’s say my data has 20 patients.

Out of those 20 patients, 12 are low risk patients (green) and 8 are high risk patients (red). Those are the actual outcomes. Now, let’s apply the trained model to predict who the high risk patients are.

Predicted result from the model

These are the predicted results from the model.

It predicted some correctly and a few incorrectly. The wrong predictions are marked in purple. For example, it marked some of the high risk patients as low risk and some of the low risk patients as high risk.

The four outcomes mentioned earlier, for this scenario, are:

  • Patients predicted as high risk who are actually high risk patients. They are called True Positives = 6
  • Patients predicted as high risk who are actually low risk patients. They are called False Positives = 4
  • Patients predicted as low risk who are actually low risk patients. They are called True Negatives = 8
  • Patients predicted as low risk who are actually high risk patients. They are called False Negatives = 2
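To make these counts concrete, here is a minimal Python sketch (my illustration, not part of the original article) that derives the four counts from the actual and predicted labels of the 20 patients, with 1 meaning high risk and 0 meaning low risk. The label lists are made up, constructed only so that the counts match the numbers above.

```python
# Illustrative labels for the 20 patients (1 = high risk, 0 = low risk),
# constructed to match the example: 8 actual high risk, 12 actual low risk.
actual    = [1]*8 + [0]*12
predicted = [1]*6 + [0]*2 + [1]*4 + [0]*8  # model output for the same patients

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

print(tp, fp, tn, fn)  # -> 6 4 8 2
```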

Tip

The way to remember these buzzwords: a False Positive is incorrectly predicted as positive (aka incorrectly predicted as high risk), and a False Negative is incorrectly predicted as negative (aka incorrectly predicted as low risk).

Let’s analyse the outcomes in practical terms:

10 patients are predicted as high risk, and of those only 6 are correctly predicted as high risk while 4 are incorrectly predicted as high risk. The metric at play here is called Precision, which tells you what fraction of the predicted high risk patients are actually high risk. It’s simply the ratio of correct positive predictions to all positive predictions made. So here we have 6 correct predictions out of 10 positive predictions, i.e. 6 / 10 = 0.6 = 60%. This means 60% of the predicted high risk cancer patients actually are high risk and the remaining 40% are not. This is not the best model, as 40% of the flagged patients are subjected to unnecessary hardship.

6 high risk patients are correctly predicted as high risk, and 2 high risk patients are incorrectly predicted as low risk. The metric at play here is called Recall, which tells you how many of all the actual high risk patients are correctly predicted as high risk. It’s simply the ratio of correct positive predictions to all the actual positive cases. So here we have 6 correct high risk cancer predictions out of 8 actual high risk cancer patients, i.e. 6 / 8 = 0.75 = 75%. This means 75% of the actual high risk cancer patients were identified and 25% were missed. This is not the best model either, as 25% of the high risk patients are missed.
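In code, the two metrics follow directly from the counts above. Here is a minimal Python sketch (again my illustration, not from the article) computing both from the true positive, false positive and false negative counts:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Counts from the first model: TP = 6, FP = 4, FN = 2
precision, recall = precision_recall(tp=6, fp=4, fn=2)
print(f"Precision: {precision:.0%}")  # 60%
print(f"Recall:    {recall:.0%}")     # 75%
```

In practice, libraries such as scikit-learn offer precision_score and recall_score that compute the same quantities directly from the actual and predicted label lists.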

As a product manager, you want both precision and recall to be high. Meaning you want fewer people incorrectly tagged as high risk cancer patients, and you want all the high risk patients correctly tagged as high risk.

Model retraining / tweaking

This is when the product manager provides this feedback to the engineering team, who tweak the model or retrain it with additional data and come back with an updated model. Note that in reality model training never stops. This is because your model is sensitive to changes in the real world, and user behaviour keeps changing with time. Although all machine learning models decay, the speed of decay varies.

The next section talks about the predictions made by the new updated model.

Updated model evaluation — 1

The following is an output of the updated model. Let’s review how this model performs.

Precision is the fraction of the predicted high risk patients who are actually high risk. 9 patients are predicted as high risk, and of them 7 are actually high risk. So Precision is 7 / 9 ≈ 0.778 = 77.8%.

Recall is how many of all the actual high risk patients are correctly predicted as high risk. 8 patients are actually high risk, and 7 of them were correctly identified as high risk. So Recall is 7 / 8 = 0.875 = 87.5%.
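Plugging the updated model’s counts into the same formulas (again just a quick illustrative check) confirms the numbers:

```python
tp, fp, fn = 7, 2, 1  # updated model: 9 predicted high risk, 7 of them correct, 1 high risk patient missed
print(f"Precision: {tp / (tp + fp):.1%}")  # 77.8%
print(f"Recall:    {tp / (tp + fn):.1%}")  # 87.5%
```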

You want fewer people incorrectly tagged as high risk cancer patients, and you want all the high risk patients correctly tagged as high risk. We are getting there with this model.

Let’s say the engineering team continue to tweak the model and further improve the accuracy.

Updated model evaluation — 2

The following is an output of the new updated model. Let’s review how this model performs.

Precision is the fraction of the predicted high risk patients who are actually high risk. 9 patients are predicted as high risk, and of them 8 are actually high risk. So Precision is 8 / 9 ≈ 0.889 = 88.9%.

Recall is how many of all the actual high risk patients are correctly predicted as high risk. 8 patients are actually high risk, and all 8 of them were correctly identified as high risk. So Recall is 8 / 8 = 1 = 100%.
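And the same quick check for this second updated model:

```python
tp, fp, fn = 8, 1, 0  # second update: 9 predicted high risk, 8 of them correct, no high risk patient missed
print(f"Precision: {tp / (tp + fp):.1%}")  # 88.9%
print(f"Recall:    {tp / (tp + fn):.1%}")  # 100.0%
```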

You want fewer people incorrectly tagged as high risk cancer patients, and you want all the high risk patients correctly tagged as high risk.

We are almost there. We are able to identify 100% of the high risk patients, and only 11% of the patients tagged as high risk are tagged incorrectly.

Can we get to 100% on both? That is very hard. So what are we optimising for here?

We want to make sure we get every high risk cancer patient correctly identified so we do not miss any of the high risk cancer patients. While doing that we also want to minimise the number of patients incorrectly marked as high risk cancer patients. That is the optimization we are doing with this model.

High Precision vs High Recall

Which optimization you do is very dependent on the problem you are trying to solve. There is always a trade-off, and it very much depends on your product and what your customer wants.

High Recall, Low Precision. This means all the high risk patients are tagged as high risk, but many low risk patients are also tagged as high risk. That is not good.

Low Recall, High Precision. This means all the predicted high risk patients are actually high risk, but many high risk patients are incorrectly tagged as low risk. That is not good either.

For your business problem, you need to evaluate what you want to optimise for: high recall or high precision. If it’s high recall, how much precision can you live with, and at what cost? Medium precision? Medium-high precision? And if it’s high precision, how much recall can you live with, and at what cost? Medium recall? Medium-high recall?
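In practice, one common lever the engineering team has for navigating this trade-off is the decision threshold applied to the model’s risk scores. The sketch below uses made-up scores (purely an illustration, not the article’s model) to show how raising the threshold typically increases precision at the cost of recall, and lowering it does the opposite:

```python
# Hypothetical risk scores from a model (higher = more likely high risk)
# and the actual outcomes (1 = high risk, 0 = low risk) for 20 illustrative patients.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.55, 0.40,   # actual high risk patients
          0.60, 0.50, 0.45, 0.35, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.05]  # actual low risk patients
actual = [1] * 8 + [0] * 12

def precision_recall_at(threshold):
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Higher thresholds flag fewer patients: precision goes up, recall goes down.
for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(threshold)
    print(f"threshold {threshold:.1f}: precision {p:.0%}, recall {r:.0%}")
```

The right threshold is ultimately a product decision: it encodes how much precision you are willing to give up for recall, or vice versa.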

For the same problem, depending on the customer, you will have different optimizations.

A patient would rather be tagged as high risk and evaluated and treated early than be tagged as low risk, miss the opportunity for early treatment, and find out they have cancer very late in the cycle.

On the other hand, insurance companies may not want the unnecessary cost of screening and treating too many patients who turn out not to be at risk. Or would they rather catch patients early so they can save on treatment costs by treating early? Both are valid optimization questions.

What does optimization mean for your business?

Recall Optimization: You want 100% of high risk patients captured. The downside is that we end up including a larger number of patients as high risk, and some of those patients may actually be low risk.

Precision Optimization: You want 100% of the patients predicted as high risk to actually be high risk. The downside is that we miss a number of patients who are actually high risk because they are incorrectly tagged as low risk.

Search results example: Let’s take another example. You run a search using a search engine and get a set of documents back as the search results. Precision is what percentage of the returned documents are relevant. Recall is what percentage of all the relevant documents are returned by the search engine, i.e. how well the search finds relevant documents. You want to return as many relevant documents as possible (high recall), but be good enough to avoid returning too many irrelevant documents (which would mean low precision). If the precision is low, the user has to manually read the returned documents to weed out the irrelevant ones (the false positives). The dream is to return all the relevant documents and only the relevant documents, which is high recall and high precision, and that is hard to achieve.
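The same arithmetic applies here, just over sets of documents instead of patients. A minimal sketch with made-up document IDs (purely hypothetical):

```python
# Hypothetical document IDs: what the search engine returned vs. what is actually relevant.
returned = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d2", "d3", "d6", "d7", "d8"}

hits = returned & relevant             # relevant documents that were actually returned
precision = len(hits) / len(returned)  # 3 / 5 = 60% of returned docs are relevant
recall = len(hits) / len(relevant)     # 3 / 6 = 50% of relevant docs were returned
print(f"precision {precision:.0%}, recall {recall:.0%}")
```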

Bottom line

As a product manager, you need to clearly understand the business problem you are trying to solve with the predictive model and what your end customer wants to optimise for. What is the cost of a False Positive? What is the cost of a False Negative? All of those are important factors in deciding what to optimise for, and which trade-offs to accept so you can optimise for one problem while keeping the other at a manageable level.
