I interviewed Edith Simchi-Levi, who discussed how to deal with imperfect data in supply chain analytics.
Could you please provide a brief background on yourself?
I have a background in computer science and software development, and over the last twenty years my work has been devoted primarily to supply chain management applications. This came about through my collaboration with my husband, Professor David Simchi-Levi of MIT, who is a well-known supply chain expert. Our collaboration has been both on the business side and on some very well-received publications, including the textbook “Designing and Managing the Supply Chain”.
In our first company, LogicTools, we developed software for network design, inventory optimization and a few other applications. The company was sold to ILOG in 2007 and is now part of IBM Business Analytics Solutions. Our software has been used by over 350 companies and by over 50% of the AMR/Gartner top 50.
I am currently VP Operations of our second venture, OPS Rules, a consulting company focused on analytics and optimization. Our goal is to help companies identify and capture hidden opportunities in their supply chains and operating models by becoming more data-driven.
Many of our ideas are based on David’s recent book “Operations Rules”, which highlights the scientific rules governing how supply chains work and presents a new approach to crucial operations topics such as complexity, risk and flexibility.
What are some of the main causes of imperfect data in supply chain analytics?
In the operations area, great strides have been made in consolidating master data through Enterprise Resource Planning (ERP) systems, allowing for a single source of truth and providing more real-time updates. But companies still struggle with serious data quality issues. This is one of the most common problems we see, and it is true across many industries.
There are three major symptoms:
First, Inconsistency: you get different results depending on whom you ask about the data, when you pull it, and which source the extract comes from. That is why the mantra of a “single source of truth” is so critical.
Second, Inaccessibility: while it may be easy to identify the source of your data, can you actually use the data and sample it whenever you need it? Good, clean data is of no benefit to your business if it is locked in a vault with no means of access, or if you have to jump through hoops to reach it. To become data-driven, master and transactional data should be easily accessible to a wide base of users who can gain value from the information.
Third, Incompleteness: imagine you have found the right source of data and you have access to it. There is often a moment of truth when you start analyzing and reviewing the data and find that what’s supposed to be there isn’t. Master and transactional data sets with large chunks missing are almost as worthless as no data at all.
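Two of these symptoms, inconsistency and incompleteness, can be measured directly once an extract is in hand; inaccessibility cannot be checked this way. The following is a minimal Python sketch under assumed data: records from two hypothetical systems (called `erp` and `wms` here) keyed by SKU, with a hypothetical `weight_kg` field. `completeness` reports the share of records that have a value, and `inconsistent_keys` lists SKUs whose values disagree between the sources by more than an assumed 1% relative tolerance.

```python
from typing import Optional

# Hypothetical shape: SKU -> {field name -> value}; None marks a missing value.
Records = dict[str, dict[str, Optional[float]]]


def completeness(records: Records, field: str) -> float:
    """Share of records with a non-missing value for `field` (incompleteness check)."""
    if not records:
        return 0.0
    filled = sum(1 for row in records.values() if row.get(field) is not None)
    return filled / len(records)


def inconsistent_keys(a: Records, b: Records, field: str, rel_tol: float = 0.01) -> list[str]:
    """SKUs present in both sources whose values for `field` differ by more than
    `rel_tol` relative to the larger value (inconsistency check)."""
    mismatches = []
    for sku in sorted(a.keys() & b.keys()):
        va, vb = a[sku].get(field), b[sku].get(field)
        if va is None or vb is None:
            continue  # gaps are reported by completeness(), not here
        if abs(va - vb) > rel_tol * max(abs(va), abs(vb)):
            mismatches.append(sku)
    return mismatches


# Illustrative extracts from two hypothetical systems.
erp = {"SKU1": {"weight_kg": 2.0}, "SKU2": {"weight_kg": None}, "SKU3": {"weight_kg": 5.0}}
wms = {"SKU1": {"weight_kg": 2.0}, "SKU2": {"weight_kg": 1.2}, "SKU3": {"weight_kg": 5.5}}

print(f"{completeness(erp, 'weight_kg'):.2f}")  # 0.67
print(inconsistent_keys(erp, wms, "weight_kg"))  # ['SKU3']
```

Running the same two checks on each critical field gives a rough, repeatable picture of where the gaps are.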
What do you do when faced with imperfect data?
Our approach is that you are already making important operations decisions, and those decisions will definitely improve if they are more data-driven. Therefore, alongside projects to improve data quality, you should use the data you have and, at the same time, gain a better understanding of the gaps.
We will provide some examples from the area we are most familiar with: end-to-end optimization. To do this kind of analysis (network design, supplier risk or inventory optimization), you need data on your entire network as well as external information on customers and suppliers.
The internal data includes information on products, plants, warehouses and stores, including various costs, capacities, bills of material, inventory levels, service times and transportation. No company has all of this data in one place, and there will inevitably be inconsistencies between different parts of the company, different facilities and so on.
Based on our extensive experience with this type of modeling, we have seven observations:
1) The process of collecting data provides a lot of insight into how the company works and where there are performance issues. Several years ago, we worked with a paint company that was about to close one of its plants because it was old and inefficient. After collecting data on the plant’s production line costs and performance, which had not existed before, setting up the model and running optimization scenarios, the company, surprisingly, decided to expand this plant for certain paints where it was more efficient than the others.
2) Missing or inaccurate data can be an issue, but some of it can be estimated from similar data and some of it can be ignored as too small to be meaningful. One example of missing data is the weight of a product, which is required for transportation calculations. In many cases, the missing information relates to low-volume products, which can be ignored or estimated without affecting the results. For the few important items, it is essential to find the right information.
3) An important part of the process is building the baseline and validating it against the company’s overall financial data. This process provides insight into where the company data has discrepancies and how large they are, in effect providing a measure of the data problems. This information is invaluable because companies may not be aware of these discrepancies, and it offers the additional benefit of a way to prioritize which data issues to tackle first.
4) End-to-end optimization helps companies understand the drivers of cost. Even if the data is not perfect, it can show them the direction in which they should focus their investment efforts. They can also run scenarios based on different assumptions to analyze the impact of changes, or even of data quality. Can you trust these results? As long as they are consistent with intuition and experience, or, where they are not, the difference is understood, they are very likely much better than having no way to evaluate decisions at all.
5) Consider that most planning analysis is for the future supply chain. The demand forecast is not all that accurate, many costs could change over time, and other factors may not be considered. Therefore, the rest of the data does not need to be perfect, just close enough to support meaningful decisions.
6) An end-to-end model enables analysis of various risks in the supply chain. You can systematically explore what happens if a facility goes down and how much time you have to recover it before you start losing money. The data needed for this does not have to be completely accurate, only good enough to spot trouble areas in the supply chain.
7) Not all data is equally critical to the decision. A common phenomenon in optimization is that the result is flat around the optimum. This means there is a range of options with similar costs, not just one “perfect” optimized solution. Therefore, the model is not necessarily sensitive to all of the data. Sensitivity analysis through different scenarios will help the modeler decide how much data is enough.
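The second observation, estimating missing values from similar data while chasing the real values only for important items, can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed method: “similar” is assumed to mean the same hypothetical product `family`, the estimate is that family’s median known weight, and “important” is assumed to mean at least 1% of total annual units (the `min_share` parameter).

```python
from dataclasses import dataclass, replace
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Product:
    sku: str
    family: str                        # hypothetical grouping that defines "similar" products
    annual_units: float
    weight_kg: Optional[float] = None  # None means the weight is missing
    estimated: bool = False            # True if weight_kg was filled in by estimate


def fill_missing_weights(products: list[Product], min_share: float = 0.01):
    """Estimate missing weights of low-volume products from the median weight of
    their family; return the updated list and the SKUs whose real weight must be found."""
    total_units = sum(p.annual_units for p in products)

    known: dict[str, list[float]] = {}
    for p in products:
        if p.weight_kg is not None:
            known.setdefault(p.family, []).append(p.weight_kg)
    family_median = {fam: median(ws) for fam, ws in known.items()}

    result: list[Product] = []
    needs_lookup: list[str] = []
    for p in products:
        if p.weight_kg is not None:
            result.append(p)
            continue
        share = p.annual_units / total_units if total_units else 0.0
        if share >= min_share or p.family not in family_median:
            # Important (or unestimable) item: leave it missing and flag it for lookup.
            needs_lookup.append(p.sku)
            result.append(p)
        else:
            result.append(replace(p, weight_kg=family_median[p.family], estimated=True))
    return result, needs_lookup


products = [
    Product("A", "paint", 1000, 4.0),
    Product("B", "paint", 800, 6.0),
    Product("C", "paint", 5),    # low volume, missing weight -> estimated
    Product("D", "paint", 300),  # high volume, missing weight -> look up
    Product("E", "primer", 2),   # no similar product with a known weight -> look up
]
filled, lookup = fill_missing_weights(products)
print(lookup)  # ['D', 'E']
```

Keeping the `estimated` flag on each filled record makes it easy to rerun the model later with the real values and see whether the estimates mattered.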
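The baseline validation in the third observation boils down to comparing modeled costs with the financials, category by category, and ranking the discrepancies. A minimal sketch, assuming costs are already aggregated into matching categories; the category names and figures are illustrative only.

```python
import math


def baseline_gaps(modeled: dict[str, float], actual: dict[str, float]) -> list[tuple[str, float]]:
    """Relative gap (modeled - actual) / actual for each cost category in the
    financials, sorted so the largest discrepancy in absolute terms comes first."""
    gaps = []
    for category, actual_cost in actual.items():
        model_cost = modeled.get(category, 0.0)  # a category missing from the model counts as 0
        if actual_cost == 0:
            gap = 0.0 if model_cost == 0 else math.inf
        else:
            gap = (model_cost - actual_cost) / actual_cost
        gaps.append((category, gap))
    return sorted(gaps, key=lambda item: abs(item[1]), reverse=True)


# Illustrative numbers only (e.g. $M per year).
financials = {"production": 100.0, "transportation": 40.0, "warehousing": 25.0}
baseline = {"production": 97.0, "transportation": 52.0, "warehousing": 24.0}

for category, gap in baseline_gaps(baseline, financials):
    print(f"{category:15s} {gap:+.1%}")
```

The ranked list doubles as the prioritization the observation describes: the category at the top is where data cleanup is likely to pay off first.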
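The risk question in the sixth observation, how long a facility can be down before you lose money, can be roughed out by comparing an estimated recovery time with how many days the available inventory covers demand. This sketch makes a strong simplifying assumption that an end-to-end model would not: inventory is the only buffer, with no alternate sourcing or rerouting. All field names and values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    recovery_days: float       # hypothetical estimate of the time to bring the facility back
    buffer_units: float        # downstream inventory that can keep serving its demand
    daily_demand_units: float


def coverage_days(f: Facility) -> float:
    """Days the existing inventory can cover demand while the facility is down."""
    if f.daily_demand_units <= 0:
        return math.inf
    return f.buffer_units / f.daily_demand_units


def exposed(facilities: list[Facility]) -> list[str]:
    """Facilities whose recovery would take longer than the inventory can cover,
    i.e. where a disruption starts to cost lost sales."""
    return [f.name for f in facilities if f.recovery_days > coverage_days(f)]


plants = [
    Facility("Plant A", recovery_days=10, buffer_units=500, daily_demand_units=100),
    Facility("Plant B", recovery_days=3, buffer_units=1000, daily_demand_units=100),
    Facility("Plant C", recovery_days=30, buffer_units=0, daily_demand_units=0),
]
print(exposed(plants))  # ['Plant A']
```

Because the output is only a flag per facility, rough recovery and inventory estimates are usually enough to separate the clear trouble spots from the rest.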
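The flatness described in the seventh observation can be seen even in a much simpler model than an end-to-end network. The sketch below swaps in the textbook economic order quantity (EOQ) model, chosen only because its cost curve is known to be flat near the optimum; the demand, ordering cost and holding cost values are illustrative assumptions. It computes the order quantity from a deliberately wrong holding-cost estimate and reports how much more that choice costs than the true optimum.

```python
import math


def eoq(demand: float, order_cost: float, holding_cost: float) -> float:
    """Economic order quantity Q* = sqrt(2 D K / h)."""
    return math.sqrt(2 * demand * order_cost / holding_cost)


def annual_cost(q: float, demand: float, order_cost: float, holding_cost: float) -> float:
    """Annual ordering cost D K / Q plus average holding cost h Q / 2."""
    return demand * order_cost / q + holding_cost * q / 2


def penalty(demand: float, order_cost: float, true_h: float, assumed_h: float) -> float:
    """Relative cost increase from ordering with a Q computed from a wrong holding cost."""
    q_used = eoq(demand, order_cost, assumed_h)
    q_best = eoq(demand, order_cost, true_h)
    used_cost = annual_cost(q_used, demand, order_cost, true_h)
    best_cost = annual_cost(q_best, demand, order_cost, true_h)
    return used_cost / best_cost - 1


# Illustrative inputs: the true holding cost is 2.0 per unit-year; try several wrong estimates.
for assumed_h in (1.0, 1.5, 2.0, 3.0, 4.0):
    print(f"assumed h={assumed_h}: cost penalty {penalty(10_000, 50.0, 2.0, assumed_h):.1%}")
```

In this model the penalty depends only on the ratio of assumed to true holding cost: with r = sqrt(h_true / h_assumed), total cost rises by a factor of (r + 1/r) / 2, so a 50% overestimate costs about 2% and even a twofold error about 6%. A scenario sweep like this on a real model is one way to see which inputs deserve extra data-collection effort.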
When is it ok to work with imperfect data?
It is interesting to take a step back from supply chain analytics and look at what is going on in big data analytics, which is gaining a lot of traction and attention. In this space, data-driven decision making is all the rage, thanks to the wide availability of big data from the internet and of analytics methods that glean better associations, predictions and recommendations.
But one of the challenges is still matching the data that is available to the data that is needed for analytics. A recent NYT article titled “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” noted that “Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in the mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
The NYT article also pointed out that “Data scientists emphasize that there will always be some hands-on work in data preparation, and there should be. Data science, they say, is a step-by-step process of experimentation.” This type of work is called “data wrangling”.
In real life, you are going to work with poor-quality data. Not only is the data imperfect, but you also need to understand how it is used and how the analytics work, so don’t wait for perfect data; start experimenting with supply chain analytics as soon as you can.
To quote Professor Jeffrey Heer, “It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.”
About Edith Simchi-Levi
VP Operations at OPS Rules