An Intelligent Data Discovery to Analyse Supply Chain Risks & Impacts

Authors: Abhijeet Singh Bais1 and Dr. Navneet Sharma2

1Data Science Solution Lead, Elevance Health, Chicago, USA, and Research Scholar at Department of CS & IT, IIS (Deemed to be University), Jaipur

2Department of CS & IT, IIS (Deemed to be University), Jaipur


Data is the backbone of any organization and is critical to its overall operational success. Web mining, data warehousing and knowledge management (KM) are logical extensions of existing operational activities. For timely, accurate decisions, an important element is obtaining the best DIKs possible to support appropriate and effective courses of action. The concept of Web mining originated with the advent of data mining.

The only difference between data mining and Web mining is that in the latter the underlying database is the entire World Wide Web. As a readily accessible resource, the Web can be seen as a huge data warehouse containing volatile information that is gathered and extracted into something valuable for the organization. Using traditional data mining methodologies and techniques (Tech Reference, 2003), Web mining is the process of extracting data from the Web and sorting it into identifiable patterns and relationships.

Intelligent data discovery and analysis help in identifying impacts on business activities, in prediction, and in decision making. In this research, supply chain management is the use case. Today, worldwide supply chain ties enable organizations to build competitive advantages, increase manufacturing flexibility, and reduce costs through a wider selection of suppliers. Despite these advantages, an inadequate understanding of uncertain regional differences and changes often increases risk in supply chain network operations and can even result in the complete disruption of a supply chain.

Increased risk exposure levels, technological developments, and the growing information overload in supply chain networks drive organizations to adopt data-driven approaches to Supply Chain Risk Management (SCRM). Data mining (DM) uses a range of analytical techniques for intelligent and timely decision making.

The increasing demand for mass customization in many industries has made today’s supply chains more complex in a non-linear, interconnected, and interdependent global setting. When an organization operates over a large geographical area, it exposes itself to the risk of business disruption from unanticipated events such as natural disasters.

Online media today plays an increasingly significant part in broadcasting natural disasters in real time. Analysis of such media has been used to capture public attention to the risks associated with different hazards. With the growing number of social networking sites, online media data such as tweets or audio and video on YouTube about a specific issue of interest are continually being collected. Many organizations also use social networking platforms to publish event information in a timely way. This has led to increasingly wide use of social media data and has motivated researchers to investigate its application in supply chain risk management.

The system then alerts supply chain managers to the potential risks posed to the various supply chain facilities. In addition, earthquake-affected regions are analysed and shown on the platform to visualize the detailed risk information. The organization’s network topology can also be adjusted to plot the specific location of each supply chain facility on the world map.
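
As an illustration of how facilities could be matched against an earthquake-affected region, the following is a minimal sketch that flags facilities within a given radius of an event epicentre using the haversine great-circle distance. The `Facility` type, the facility names and coordinates, and the alert radius are assumptions made for this example; they are not taken from the platform itself.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Facility:
    """A supply chain facility plotted on the map (hypothetical type)."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def facilities_at_risk(facilities, event_lat, event_lon, radius_km):
    """Return (name, distance_km) for facilities inside the radius, nearest first."""
    hits = []
    for f in facilities:
        d = haversine_km(f.lat, f.lon, event_lat, event_lon)
        if d <= radius_km:
            hits.append((f.name, round(d, 1)))
    return sorted(hits, key=lambda h: h[1])


# Illustrative data only: three facilities and an event near Delhi.
FACILITIES = [
    Facility("Jaipur warehouse", 26.9124, 75.7873),
    Facility("Delhi plant", 28.6139, 77.2090),
    Facility("Chicago office", 41.8781, -87.6298),
]

for name, dist in facilities_at_risk(FACILITIES, 28.6, 77.2, radius_km=300):
    print(f"ALERT: {name} is {dist} km from the event")
```

A fixed radius is the simplest possible stand-in for an affected region; a real deployment would more likely use the affected-area geometry published with the event.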

Big data source                                   Potential SC risks          Description
Public news                                       Disaster and uncertainties  Date, focus, contents, source, potential risks
Policies of economies, politics, industries etc.  Policy changes              Date, regions, focus, contents, source, potential risk
Weather records                                   Extreme weather             Date and time, regions, weather, future weather forecast, sources
Natural disaster records                          Natural disasters           Date, regions, pre-disaster forecast, real situation
Social networks and other social media            Disaster and uncertainties  Date, focus, contents, source, impacts to SC

SC external information is gathered from public news, online media, and similar sources. It is more complicated than SC internal data, and it reveals possible disasters and uncertainties in the external environment. Media data is diverse not only in format and content but also in language and reliability, which makes the information difficult to filter and analyse. Our intention is to find potential risks and ongoing disasters in SC external big data as early as possible.


Category: News Classification and Its Techniques: A Review

  • Jimmy Boon Som Ong, Zhaoxia Wang, Rick Siow Mong Goh, Xiao Feng Yin, Xin Xin, Xiuju Fu (2014): With the growing trend of global outsourcing, organizations now face increasingly complex supply chains. In this paper, the authors consider natural disasters as a form of supply chain risk and propose to aid its management by analysing Web data collected in real time. Using Twitter “tweets” as the primary source of Web data, a real-time data crawler is developed to collect and analyse tweets identified as relevant to natural disasters. The applicability of such a system and its effectiveness for making informed decisions in risk mitigation are then discussed via a case study.
  • Ponis, Stavros & Ntalla, Athanasia & Koronis, Epaminondas (2014) This paper focuses on a literature review of available SCRM frameworks and models. Identified frameworks and models are studied and analyzed according to their method of validation and the normative elements that constitute the conceptual construct, be it a framework or model. In each case, the constraints and limitations of the modeling effort are identified resulting in the determination of two major issues, which must be addressed by researchers in the future, these being the absence of a holistic approach for SCRM and the frequent oversight of behavioral aspects, such as the risk behavior of decision makers.
  • Choudhary, N.A., Singh, S., Schoenherr, T. (2022): The year 2020 can be earmarked as the year of global supply chain disruption owing to the outbreak of the coronavirus (COVID-19). It is however not only because of the pandemic that supply chain risk assessment (SCRA) has become more critical today than it has ever been. With the number of supply chain risks having increased significantly over the last decade, particularly during the last 5 years, there has been a flurry of literature on supply chain risk management (SCRM), illustrating the need for further classification so as to guide researchers to the most promising avenues and opportunities. We therefore conduct a bibliometric and network analysis of SCRA publications to identify research areas and underlying themes, leading to the identification of three major research clusters for which we provide interpretation and guidance for future work.
  • Ou Tang, S. Nurmaya Musa (2011): This review led the authors to identify and classify the potential risks associated with different flows, namely material, cash, and information flows, and consequently to identify some research gaps. Even though there is a pressing need for and awareness of SCRM from the industrial side, quantitative models in the field are relatively lacking and information flow risk has received less attention.
  • Amulya Gurtu, Jestin Johny (2021): This paper aims to review the existing literature on risk factors in supply chain management in an uncertain and competitive business environment. Papers that contained the word “risk” in their titles, keywords, or abstracts were selected for conducting the theoretical analyses. Supply chain risk management is an integral function of the supply network. It faces unpredictable challenges due to nations’ economic policies and globalization, which have raised uncertainty and challenges for supply chain organizations. These significantly affect the financial performance of the organizations and the economy of a nation.
  • Abla CHAOUNI BENABDELLAH, Asmaa BENGHABRIT, Imane BOUHADDOU, El Moukhtar ZEMMOURI (2016): This paper outlines the value that Big Data offers for supply chains that are increasingly complex. Indeed, Big Data have the potential to revolutionize supply chain dynamics. In this survey, we analyze Big Data applications, their opportunities and challenges in the different supply chain processes of the SCOR model.

Category: Text Mining Framework for Supply Chain Risk

  • Tonya Boone, Ram Ganeshan, Aditya Jain, Nada R. (2019): This paper surveys the impact of the explosion of data on product forecasting and how it is improving it. Much of the review focuses on time series data.
  • Anna Corinna Cagliano, Alberto De Marco, Sabrina Grimaldi & Carlo Rafele (2012): This work develops a risk identification and analysis methodology that integrates widely adopted supply chain and risk management tools. In particular, process analysis is performed by means of the standard framework provided by the Supply Chain Operations Reference Model, the risk identification and analysis tasks are accomplished by applying the Risk Breakdown Structure and the Risk Breakdown Matrix, and the effects of risk occurrence on activities are assessed by indicators that are already measured by companies in order to monitor their performances. In this way, the framework helps increase companies’ awareness of and communication about risk, which are essential components of the management of modern supply chains.
  • Paul Murray (2015): This study begins by reviewing the relevant literature, then attempts to support the key findings using two forecasting case studies. Our findings are in stark contrast to those in the previous literature, as we find that established univariate forecasting benchmarks, such as exponential smoothing, consistently perform better than those that include online information. Our research underlines the need for a thorough forecast evaluation and argues that the usefulness of online platform data for supporting operational decisions may be limited.
  • Kijung Park (2012): This paper proposes a text-mining based global supply chain risk management framework involving two phases. First, the extant literature about global supply chain risks was collected and analyzed using text-based approaches, including term frequency, correlation, and bi-gram analysis. The results of these analyses revealed whether the term-related content is important in the studied literature, and correlated topic model clustering further assisted in defining potential supply chain risk factors.
  • Yavuz (2016): This paper is a case study of forecasting method selection for a global manufacturer of lubricants and fuel additives, products usually classified as specialty chemicals. We model the supply chain using actual demand data and both optimization and simulation techniques. The optimization, a mixed integer program, depends on demand forecasts to develop production, inventory, and transportation plans that will minimize the total supply chain cost. Tradeoff curves between total costs and customer service are used to compare exponential smoothing methods. The damped trend method produces the best tradeoffs.

Category: Risk Prediction Models for Supply Chain Risk

  • Chih-Yuan Chu, Gül E. Kremer (2020) present a text-mining based supply chain risk management framework. In this study, a total of seven global supply chain risk types and their risk factors were developed. Sentiment analysis was also performed based on the risk factors.
  • Qi Li, Ang Liu (2019) review the development history of SC management and then illustrate a proposed framework. The study presents a new framework to support data-driven SC.
  • Fahimnia, B., Tang, C. S., Davarzani, H., & Sarkis, J. (2015) examine quantitative and analytical models for handling supply chain risks. They use bibliometric and network analysis tools to generate insights. One of the findings is that SC risks are increasing rapidly.
  • Kuldeep Lamba, Surya Prakash Singh (2017): In this paper, the literature relating to the integration of big data with operations and supply chain management is reviewed. The review focuses primarily on three key areas of operations and supply chain management where big data has been applied, namely manufacturing, procurement, and logistics. In addition to reviewing past literature, the paper also proposes applications of big data in operations and supply chain management.
  • Deepak Arunachalam (2018): This study evaluates the performance of different data clustering approaches for searching the profitable consumer segments in the UK hospitality industry. The paper focuses on three aspects of datasets including the ordinal nature of data, high dimensionality and outliers. Data collected from 513 sample points are analysed in this paper using four clustering approaches: Hierarchical clustering, K-Medoids, fuzzy clustering, and Self-Organising Maps (SOM). The findings suggest that Fuzzy and SOM based clustering techniques are comparatively more efficient than traditional approaches in revealing the hidden structure in the data set.

Category: Data Mining based Supply Chain risk Management

  • Er Kara, M., Ümit Oktay Fırat, S., & Ghadge, A. (2018): The paper develops a DM-based framework for the identification, assessment, and mitigation of various kinds of risk in supply chains. Increased risk exposure, technological developments, and the growing information overload in supply chains drive organizations to adopt data-driven approaches to Supply Chain Risk Management (SCRM). Data mining (DM) uses multiple analytical techniques for intelligent and timely decision making. The framework is validated with a case study based on a series of semi-structured interviews and discussions.
  • Stefanovic, N. (2014): Today’s business environment requires supply chains to be proactive rather than reactive, which calls for a new approach that incorporates data mining predictive analytics. This paper presents a predictive supply chain performance management model which combines process modelling, performance measurement, data mining models, and web portal technologies into a unique model. It presents a supply chain modelling approach based on a specialized metamodel which allows modelling of any supply chain configuration at different levels of detail.
  • Paul W Murray (2015): Demand forecasts are essential for managing supply chain activities but are difficult to create when collaborative information is absent. Many traditional and advanced forecasting tools are available, but applying them to a large number of customers is not manageable. In our research, we use data mining techniques to identify segments of customers with similar demand behaviors. Historical usage is used to cluster customers with similar demands. Once customer segments are identified, a manageable number of forecasting models can be built to represent the customers within the segments.
  • Julian, Trek (2015): This paper proposes a concept for prescriptive control of business processes using event-based process predictions. In this regard, it explores new potentials through the application of predictive analytics to big data while focusing on production planning and control in the context of the process manufacturing industry. This type of industry is an adequate application domain for the conceived concept, since it features several characteristics that are opposed to conventional industries such as assembling ones. These specifics include divergent and cyclic material flows, high diversity in end products’ qualities, as well as non-linear production processes that are not fully controllable.
  • Yidan Shu, Liang Ming, Feifan Cheng, Zhanpeng Zhang, Jinsong Zhao (2016): In this paper, developments in chemical process FDD are briefly reviewed, and the reason why FDD has not been widely implemented in the chemical process industry is discussed. One of the insights gained is that some basic problems in FDD, such as how to define faults and how many faults to diagnose, have not been addressed well while researchers tirelessly try to invent new fault diagnosis methods. A new framework is proposed, based on big data in the cloud computing environment of a large chemical corporation, for addressing the challenging issues in ASM.

Finally, a specialized analytical web portal which offers collaborative performance monitoring and decision making is presented. The results show that these models give very accurate KPI projections and provide valuable insights into newly emerging trends, opportunities, and problems. This should lead to more intelligent, predictive, and responsive supply chains capable of adapting to the future business environment.

Literature Review Comparative study based on Parameters (2010 – 2022)

Risk Score – 15 
Prediction – 20
Intelligent Inference – 15
Unstructured Data – 5
Structured Data – 30
News/Stocks/Exchange/Generally available data – 7
Big Data – 15

The comparative study above shows that few frameworks address news, stock, exchange, and other generally available data, or unstructured data.

Table 1: Comparative Analysis of Research Papers

An Intelligent Data Discovery to Analyze Supply Chain Risks & Impacts: Knowledge and understanding of the topic can proceed from a common base of assumptions, definitions, and frameworks that guide the formulation of interesting and relevant research questions. The results of such efforts will enable the researcher not only to intelligently identify stand-alone factors of successful supply chain design but also to meet customer needs. Supply chain disruptions are growing day by day.

The past few years have seen multiple notable types of catastrophes:

  • Natural calamities such as floods and earthquakes
  • News inputs and security-related disruption
  • Tariff and tax disruption

To address these catastrophes, a framework will be designed to meet the hour-by-hour needs of the supply chain. The framework will provide faster notice when a disruption occurs. An organization will be able to track hurricanes, delayed flights, and many other incidents and accidents.

Research Gap:

The reviewed papers show only how DM supports the discovery of hidden and useful information in unstructured risk data for intelligent risk management decisions, how risk hierarchies and sentiment analysis can be applied, and how the applicability and effectiveness of such systems for informed risk mitigation decisions can be discussed via case studies. An intelligent data discovery framework to analyse supply chain risks and impacts was missing from those papers.


A comparative analysis was carried out on the basis of the literature review. The research gap it revealed motivated me to propose a framework, “An Intelligent Supply Chain Risk Management Platform”, for unstructured data and generally available data.

The framework needs data discovery and inference to detect events and understand relevant risks so that mitigation can be planned.


1. To detect events and understand relevant risks using data analysis.

2. To develop a framework for the identification, assessment, and mitigation of at least two types of risk.

3. To validate the framework by demonstrating the risk prediction application to supply chain consultants.

Tools and technology:

Server: Azure, Location: Central India, Operating system: Linux Ubuntu, RAM: 8 GB, vCPUs: 2

Software configuration: Apache, Technology: Python Django, HTML, CSS

Database: MySQL, Neo4j

Dream Risk Management System – Flow chart

Proposed Framework Architecture: 

Dream Risk Management System — (An Intelligent Supply Chain Risk Management Platform)

This framework works on 3 factors:

  • External Data
  • Internal Data 
  • Intelligence

1. Data Analysis Module:

This module collects data from various sources through web crawling.

  • Understanding External Data

Definition of External Data

  • External data refers to information and datasets that are sourced from external or third-party sources outside the organization. This data is generated by various entities, such as social media platforms, news outlets, government agencies, compliance portals, and other external sources. It encompasses a wide range of data types, including textual, numerical, multimedia, and geospatial data.

Examples of External Data Sources

  • Examples of external data sources include social media platforms like Twitter and Facebook, where user-generated content and discussions can provide valuable insights into customer preferences, opinions, and market trends. News data from reputable news outlets offers real-time information on current events, industry developments, and economic indicators that can influence business strategies. External data can also include information related to natural disasters, such as earthquakes or weather data, which is essential for risk assessment and contingency planning. Legal issues and compliance portals provide regulatory and legal information that organizations need to stay compliant with laws and regulations.

Importance of External Data for Organizations

  • External data holds immense value as it provides insights and information that can significantly impact decision-making processes within organizations. It offers a broader perspective by incorporating external factors, trends, events, and sentiments that are relevant to the organization’s ecosystem. By integrating external data into organizational systems, businesses can gain a more comprehensive understanding of their environment, identify emerging opportunities and risks, and make informed strategic decisions.

However, the extraction of relevant information from these vast external datasets presents a significant challenge. The volume, variety, and velocity of data make it difficult to filter out noise and identify the most pertinent insights. It requires advanced techniques and methodologies to analyse, process, and correlate the data effectively. The proposed approach in this research paper, the Newspaper Reading Method, aims to address this challenge by applying a context-based approach utilizing a Vector-based Knowledge Graph.

 The Newspaper Correlation Approach

Newspaper Reading Method as an Analogy

  • The Newspaper Reading Method is an approach to integrating external data into organizational systems. Just as individuals reading a newspaper first skim the headlines to identify relevant topics before delving into the details, this method filters and correlates external data based on its relevance to the organization’s ecosystem or interests.
  • When reading a newspaper, readers naturally gravitate towards articles and news items that align with their specific areas of concern or curiosity. They focus on topics that are likely to impact their lives or the domains they are interested in. In a similar vein, the Newspaper Reading Method applies this filtering and relevance-based approach to external data analysis.

Extracting Relevant Data through Correlation

  • The vast array of external data sources, ranging from social media platforms to news outlets, presents a challenge in extracting the most pertinent information for organizational decision-making. To overcome this challenge, the proposed context-based approach utilizes a Vector-based Knowledge Graph. This knowledge graph acts as a framework for representing and organizing relevant concepts and relationships within the data, enabling efficient correlation analysis.
  • By constructing a context upfront, organizations can establish a foundation for assessing the relevance of external data events as they occur. The Vector-based Knowledge Graph provides a structured representation of the data, facilitating the identification of correlations and enabling informed decision-making.
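
The correlation step described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the organizational context is built upfront as a small set of nodes, each with a vector, and an incoming headline is kept only if its cosine similarity to some node exceeds a threshold. Plain bag-of-words counts stand in for learned embeddings, and the node names, descriptions, and the threshold value are all hypothetical.

```python
import math
import re
from collections import Counter


def text_to_vector(text):
    """Bag-of-words term counts; a simple stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical organizational context, constructed before any event arrives.
CONTEXT = {
    "supplier:Acme Steel": "steel supplier plant located in Osaka Japan",
    "route:Chennai port": "sea freight shipping route through Chennai port India",
    "part:lithium cell": "lithium battery cell sourced from Chile mines",
}
CONTEXT_VECTORS = {node: text_to_vector(desc) for node, desc in CONTEXT.items()}


def correlate(headline, threshold=0.2):
    """Return (node, similarity) pairs relevant to the headline, best first."""
    event = text_to_vector(headline)
    scored = [(node, cosine(event, vec)) for node, vec in CONTEXT_VECTORS.items()]
    relevant = [(node, round(score, 3)) for node, score in scored if score >= threshold]
    return sorted(relevant, key=lambda pair: -pair[1])


print(correlate("Earthquake strikes Osaka, Japan; steel plant halted"))
print(correlate("Local football team wins championship"))  # no overlap with the context
```

Like a reader skimming headlines, the filter discards events that share nothing with the context, so only the first headline reaches the later impact and risk modules.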

Figure: explaining events relevant to an organization

Figure: Showing natural disaster news

Figure: Showing correlation between events and supply chain components

 2. Finding Impacts:

This module identifies the impacts on suppliers, customers, products, and parts based on the data discovered for a particular event.
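
One way to picture this module is as a traversal of the supply chain graph, starting from the entity an event touches and following downstream links. The sketch below uses an in-memory dictionary in place of a graph database and a breadth-first search; the graph contents, the `kind:name` node naming, and the `find_impacts` function are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical supply chain graph; every edge points downstream.
EDGES = {
    "location:Osaka": ["supplier:Acme Steel"],
    "supplier:Acme Steel": ["part:steel frame"],
    "part:steel frame": ["product:e-bike", "product:scooter"],
    "product:e-bike": ["customer:RideCo"],
    "product:scooter": ["customer:RideCo", "customer:UrbanGo"],
}


def find_impacts(start):
    """Breadth-first search from the affected node, grouping hits by kind."""
    impacted = {}
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt in seen:
                continue  # already reached through another path
            seen.add(nxt)
            kind, name = nxt.split(":", 1)
            impacted.setdefault(kind, []).append(name)
            queue.append(nxt)
    return impacted


# An event located in Osaka propagates to suppliers, parts, products, customers.
for kind, names in find_impacts("location:Osaka").items():
    print(f"{kind}: {', '.join(names)}")
```

The `seen` set ensures that an entity reachable through several paths, such as a customer buying two affected products, is reported once.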

3. Risk Analysis:

This module classifies risk factors as internal or external. An internal risk score is an assessment of any risk factor that comes from within the company.

Common Internal Risks:

  • Human error, such as unintentional data leaks, union strikes, or ineffective management.
  • Inadequate organizational structure and reporting responsibilities
  • Asset loss, including damage or destruction of company property or unforeseen costs of doing business.

Common External Risks:

  • Natural Disasters—everything from hurricanes and flooding to droughts and earthquakes
  • Economic Change, including recessions and industry disruption.
  • Political Factors: changes in governmental policies and regulations
  • Cyber Attacks, such as data theft by hackers, ransomware attacks, and the like
  • Many more

Data Collection and Risk Analysis. External environment data is characterized by large volume, lack of structure, and growth over time. Data collection techniques such as web crawling and text mining are therefore used to extract information from websites and web services, and advanced data analysis technology is required to analyse SC external big data. The purpose of external risk analysis is to identify external threats and the parameters of each threat. External threats include bad weather, policy changes, economic changes, social changes, terrorist attacks, etc. The parameters of each threat may include its geographic region, probability, and severity.

Risk Analysis. External risk reports should at least include information on uncertainties at the locations of SC partners and during the transportation of products. This encompasses the probability, duration, and impact of uncertain events. The cost of each uncertain event should be defined and calculated by the SC itself, since it depends on emergency plans that should be decided by the companies in the SC. The risk is calculated as: Risk = probability of event × magnitude of loss

Probability of Occurrence
  High probability          80 to 100%
  Medium-high probability   60 to 80%
  Medium-low probability    30 to 60%
  Low probability           0 to 30%

Risk Impact
  High – Catastrophic       Rating A – 100
  Medium – Critical         Rating B – 50
  Low – Marginal            Rating C – 10
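
The formula and the two rating scales can be combined in a few lines. The sketch below is illustrative: the function names are hypothetical, and because the probability bands share their boundary values, the choice to assign a boundary value to the higher band is an assumption of this example.

```python
# Impact ratings from the Risk Impact scale: A = catastrophic, B = critical, C = marginal.
IMPACT_RATING = {"A": 100, "B": 50, "C": 10}


def probability_band(p):
    """Map a probability in [0, 1] to its band (boundaries go to the higher band)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p >= 0.8:
        return "High"
    if p >= 0.6:
        return "Medium-high"
    if p >= 0.3:
        return "Medium-low"
    return "Low"


def risk_score(p, rating):
    """Risk = probability of event x magnitude of loss (the impact rating)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return p * IMPACT_RATING[rating]


# A flood judged 50% likely with a critical (B) impact.
print(probability_band(0.5), risk_score(0.5, "B"))
```

With these scales the score ranges from 0 to 100, so a catastrophic event with low probability can score below a critical event that is almost certain.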
  1. ERP:
  • At its core, an ERP is an application that automates business processes, and provides insights and internal controls, drawing on a central database that collects inputs from departments including accounting, manufacturing, supply chain, sales, marketing and human resources (HR).
  • We use the ERP to obtain the organization’s supply chain data in order to deliver the event-driven risk score and the impact of an event on the supply chain.
  2. Azure SQL Database:
  • Azure SQL Database is an intelligent, scalable, relational database service built for the cloud. It optimizes performance and durability with automated, AI-powered features that are always up to date.
  • Azure’s layers of protection, built-in controls, and intelligent threat detection keep data secure.
  • The SQL database stores events, user data, and, if an ERP is not available, supply chain data.
  3. Web Crawler:
  • A web crawler, spider, or search engine bot downloads and indexes content from all over the Internet.
  • The goal of such a bot is to learn what (almost) every webpage on the web is about, so that the information can be retrieved when it’s needed.
  • They’re called “web crawlers” because crawling is the technical term for automatically accessing a website and obtaining data via a software program.
  • Our system uses the web crawler to gather data relevant to the organization, to fetch news, and to scrape the details of an event, which are then extracted using NLP.
  4. NLP:
  • Natural language processing (NLP) refers to the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
  5. News Fetch:
  • News is fetched using APIs and a Python script to get the relevant news.
  6. Event Categorization:
  • Events are categorized using a machine learning model, trained on a set of data, to determine the type and impact of each event.
  • The ML model is continually trained from a stream of data. In practice, this means supporting the ability of the model to autonomously learn and adapt in production as new data comes in.
  7. Python (Django):
  • Django is a high-level Python web framework that enables rapid development of secure and maintainable websites.
  • Our dashboard is built with Django. It visualizes the data so that the impacts and risks of events can be understood easily.

The components listed above are used in the framework.
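
To make the event categorization component more concrete, the following sketch shows a classifier that can keep learning as new labelled events arrive. It is a small, pure-Python multinomial Naive Bayes with add-one smoothing, chosen here only because it updates incrementally; it is a stand-in and not the model used in the framework. The class name, the labels, and the training headlines are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one example at a time."""

    def __init__(self):
        self.class_docs = Counter()              # labelled events seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z]+", text.lower())

    def learn(self, text, label):
        """Incorporate one labelled event without retraining from scratch."""
        self.class_docs[label] += 1
        for word in self.tokens(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        """Return the class with the highest smoothed log-probability."""
        if not self.class_docs:
            raise ValueError("model has not seen any labelled events yet")
        total_docs = sum(self.class_docs.values())
        words = self.tokens(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = OnlineNaiveBayes()
model.learn("earthquake hits coastal city", "natural_disaster")
model.learn("flood damages warehouse roads", "natural_disaster")
model.learn("government raises import tariff", "tariff")
model.learn("new tax on steel imports", "tariff")
print(model.predict("severe flood after earthquake"))

# Later, in production, a new kind of event is labelled and learned on the fly.
model.learn("port workers strike halts shipping", "labour")
print(model.predict("dock strike halts port"))
```

Because `learn` only increments counts, the model adapts to a new category from a single example, which is the behaviour the continual training described above relies on.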


  • Real-time notifications of all events for better visibility
  • Simplified dashboard with customizable features
  • Comprehensive risk analysis across 12 categories
  • Comprehensive risk reports based on real-time analysis
  • AI-enabled virtual assistant that monitors 24×7
  • Use of AI technology such as Natural Language Processing (NLP) and sentiment analysis to analyse RSS feeds
  • 360-degree risk analysis
  • Deep impact analysis through easy-to-read reports on the impact of risks on individual business assets
  • Prioritized notifications: alerts and recommendations are sent to the appropriate personnel based on the risk level of an event
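
The prioritization of notifications can be sketched with a priority queue, so that the highest-risk events are delivered first and each is routed to recipients matching its risk level. The thresholds, level names, recipient roles, and event titles below are hypothetical and serve only to illustrate the idea.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical routing table: (minimum score, level, recipients), highest first.
ROUTING = [
    (70.0, "critical", ["head_of_supply_chain", "regional_manager"]),
    (30.0, "elevated", ["regional_manager"]),
    (0.0, "watch", ["risk_analyst"]),
]


def route(score):
    """Pick the risk level and recipients for a risk score."""
    for minimum, level, recipients in ROUTING:
        if score >= minimum:
            return level, recipients
    raise ValueError("risk score must be non-negative")


@dataclass(order=True)
class Alert:
    sort_key: float                     # negated score, so the heap pops highest risk first
    title: str = field(compare=False)
    level: str = field(compare=False)
    recipients: list = field(compare=False)


def build_queue(events):
    """Turn (title, score) pairs into a heap of routed alerts."""
    heap = []
    for title, score in events:
        level, recipients = route(score)
        heapq.heappush(heap, Alert(-score, title, level, recipients))
    return heap


def drain(heap):
    """Yield alerts from highest to lowest risk."""
    while heap:
        yield heapq.heappop(heap)


events = [("Port strike", 25.0), ("Earthquake near supplier", 80.0), ("New tariff", 45.0)]
for alert in drain(build_queue(events)):
    print(f"[{alert.level}] {alert.title} -> {', '.join(alert.recipients)}")
```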



Supply chains affect and enhance the lifestyle of individuals in society. Many issues impact the supply chain directly or indirectly, and events occurring in the environment impact it as well. As supply chain management continues to grow in prominence across both practitioner and academic communities, we observe that truly effective supply chain management is planned and purposive. The concept of supply chain design lies at the very heart of these investment decisions and continuation models.

Yet, supply chain design presents managers and researchers with its own set of issues, concerns and obstacles. As this concept is relatively new, the salient issues that define its content, scope and boundaries are still emerging. We now recognize that ‘one size’ does not fit all when it comes to supply chain design. What works well in one setting may not work well in another.