IoT Product Manager — how to preach about data quality

Boost buy-in for your Industry 4.0 or IoT Projects

Daniel Sontag
Product Coalition

--

People support what they understand

Data is our lifeblood — at least in Industry 4.0 projects.

It gives us the insights needed to advance value creation.

When you get started with Industry 4.0 projects, buy-in from your organization is key. For one, this requires great people skills on your part. But a basic understanding of data throughout the company is just as important.

Some departments might not know where your hunger for data comes from.

In my experience, talking about data basics is the way to go. It brings everyone on board and boosts interest in tech across departments. Benefits of a higher “data literacy” include:

  • A common understanding of why data is important, and less confusion
  • Creative input from your co-workers, who may have good use cases for data
  • Easier collection of relevant data sets of good quality

Basics

What is data anyway?

Data can be qualitative or quantitative raw information.

We can gather electric signals from sensors or from manual user input.

Why should we care about data?

“Data is the new oil” (The Economist)

Whether you agree with The Economist (and A. Banga, CEO of MasterCard) or not (like the WEF), data is a great source of value. But to realize this value, much as with oil, some groundwork is necessary.

Instead of oil wells and refineries, you will need the following to fuel your data-driven IoT projects:

  • Access to relevant data
  • Access to data sets of significant size
  • Access to quality data, or the ability to refine the data

Easy access to the raw data is a major enabler for your IoT project’s success.

But how to achieve that?

Your odds are best when you can leverage your organization to help you gather data and insights.

The basis is a common understanding across the organization of what data quality means. The opportunities that come from access to relevant data also need to be clear.

Add in some basic knowledge about data in general and about the ways we can work with it.

This is what we can call basic “data literacy”.

Putting Data to Work

What makes good data?

Unfortunately, we live in a messy world, and this also applies to data. Where you would expect nicely structured and complete data sets, you might (will!) be disappointed. So, when talking about data sources, we should always ask about the quality of the raw information. By definition, quality data should be:

  • Correct (unbiased, not skewed and no duplications)
  • Complete (no gaps in the data)
  • Significant (enough data to perform simple analytics)
  • Relevant (related to the analytics you want to perform)
  • Timely (fresh data helps us in time-critical decisions)

As you won’t get this sort of quality in your initial data set, the first task of data science is “cleaning up” the data.

If your organization is aware of this, it can help you obtain data sets of higher quality.

Where does the data come from?

Data can come from sensors, in which case we need to be sure that no data is lost on the way to the database.

Also, we want to ask in which intervals or at which time the sensors send data.
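One simple check along these lines: given the sensor timestamps and the expected sending interval, flag stretches where readings appear to have been lost. This is an illustrative sketch under the assumption of evenly spaced sending, not a production-grade check.

```python
def find_gaps(timestamps, expected_interval, tolerance=0.5):
    """Return (previous, current) timestamp pairs whose spacing exceeds
    the expected interval by more than the given tolerance fraction."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > expected_interval * (1 + tolerance):
            gaps.append((prev, curr))  # data likely lost in this window
    return gaps
```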

Another data source is human input — but to err is human. So, depending on where the data comes from, it may be biased. If data comes from people or institutions, you should know their intentions to judge its credibility.

If data comes from sample investigations, make sure the sample is representative.

How large should the minimum dataset be?

With a small dataset, your analysis might be skewed, leading to wrong insights and decisions.

A rough guideline for the minimum dataset comes from the number of variables:

minimum number of rows = (x * y) + (z * 10)

where x = number of categorical variables, y = levels per category, and z = number of continuous variables.

Example:

12 categories with 5 levels each and 15 continuous variables, leads to a minimum dataset of (12 * 5 + 15 * 10) = 210 rows.
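The rule of thumb is easy to encode. This tiny helper only restates the formula above and is, of course, just a rough guideline, not a statistical guarantee:

```python
def minimum_rows(categories, levels_per_category, continuous_vars):
    """Rough minimum dataset size: one row per category level combination
    plus ten rows per continuous variable."""
    return categories * levels_per_category + continuous_vars * 10
```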

What makes data relevant?

Data is everywhere.

But when selecting the data for your dataset, it pays to take a little time to think about relevance. So, what data should you pick?

For example:

1) A table with “sensor data”, “operator age”, “opening speed of front gate” → irrelevant combination of variables

2) A table with “timestamp”, “sensor data”, “machine status” → potentially relevant combination

Most probably, it will not be as simple as in the examples above to distinguish relevant from irrelevant data.

Common practice is to start with the problem you’re trying to solve and to identify the influencing factors. Or, ask how a human would solve the problem and what data they would probably look up to do it.


How do we work with data?

So, why do we put in this effort to gather and clean data? Here are five typical questions we might be able to answer better with the help of data:

  • “Which answer is most probable?” Classification, which is useful for two classes (Will the machine fail soon?) or multiple classes (Which pricing model is the most profitable one?)
  • “How much/many?” Using regression to forecast or predict a value. (What is the expected average productivity of a certain machine over a year?)
  • “Is something different?” Anomaly detection, e.g. to figure out whether data is out of the expected range. (Is the pressure critical?)
  • “How is the data structured?” Forming data categories by clustering. (Which machine types fail in a similar way?)
  • “What should I do now?” Either guiding humans in making decisions or making autonomous decisions. (Should the drive speed be reduced to enhance the longevity of the part?)
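As a small illustration of the anomaly-detection question, here is a basic z-score check using only the standard library. The three-standard-deviation threshold is a common convention, not a universal rule, and the pressure readings are made up:

```python
import statistics

def is_anomaly(history, new_value, threshold=3.0):
    """Flag a reading whose distance from the historical mean
    exceeds `threshold` standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return new_value != mean  # no variation in history at all
    return abs(new_value - mean) / stdev > threshold
```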

Final thoughts

Further basic concepts you can introduce around data management:

  • Statistical basics (Median, mean, standard deviation, trends, correlations)
  • Data visualization (How to show data in a comprehensible format to tell a story)
  • Data management technologies (SQL database and communication standards)

Digitizing an organization has a much wider scope than the outline developed above.

In the scope of an enterprise, the needs of several stakeholders, such as the different departments, need to be considered.

Forbes suggests a “democratization of data”, where all stakeholders can access a common data pool and derive relevant insights from it.

Daniel Sontag connects the bots: As Industry 4.0 lead and manager for connected products, he does what he loves — tying business to tech, and theory to practice.


Stay tuned: On The Industry 4.0 Blog and on LinkedIn
