Why should you care about Big Data and Advanced Analytics?

There has been a lot of “Big Data” talk in industry for several years now.  If you work for a corporation of any size, you have most likely heard the words “Big Data,” seen a demo, or observed the glazed-over look on the faces of management when the term comes up.  The last five years have been spent educating the majority of the market on the definition of Big Data.  Additionally, there have been strong efforts to peel back the covers within companies and reveal the massive treasure trove of data sitting on servers, desktops, laptops, and other sources.  Connecting Big Data to the data already inside a corporation is where competitive advantage will come from in the coming years.  The companies that make this connection a priority will be the ones that can do more with less and gain the key insights needed to make the next step change in their operations.

Big Data, at its core, describes a volume of data that is beyond traditional information technology’s capability to store, manage, analyze and process efficiently.  Most companies either lack, or are only now investing in, the technology needed to handle the data they hold on premises, so by this definition most companies already have big data.  Other definitions emphasize data with high volume, velocity, or variety.  Here too, given the many automated systems, equipment, applications, and third-party data feeds in use, companies will find that they have entered the big data landscape, whether they realize it or not.

The market for Big Data is expected to exceed $60B US by 2020 (from $7.6B in 2011).  Executive teams are under pressure to do something with Big Data, and several companies are chasing technology: installing big data platforms for millions of dollars without understanding how to use them or how best to gain value from their data assets.  Several vendors are entering the space, offering both application solutions and consulting expertise to help companies navigate this new technology option.

The Big Value from Big Data is the ability to effectively combine Information Technology (IT) with Operational Technology (OT).  In addition, this technology enables companies to combine and democratize the data held in their siloed, disparate, best-of-breed operations applications, putting new insights in the hands of the user base within a company.  The Big Data platform allows these advances to happen, and can also move companies towards new levels of analytics: from descriptive to predictive and on to prescriptive.

So, why should you care?

  • Companies have continually improved operations through modernization of equipment, adopting new technology for data capture and reporting, automating data feeds, and applying process improvement methodologies to cut waste and add value. The effect of improved operations can be seen in increased productivity and safety, decreased energy usage and cost, better situational awareness and better two-way insight with customers, partners and vendors.  Big Data and the opportunities it enables will be the next step change in improvement to operations.
  • Competitors are going to embrace the technology, particularly as it gets more approachable. Companies that have a good data strategy and stewardship foundation will gain value sooner, but many companies are on a relatively level playing field when it comes to stewardship (poor). Developing a data strategy and reviewing the health of your own data assets now is key to quick wins and to potentially gaining a competitive advantage in the near future.
  • Technology is always improving, and the adoption of paradigm-shifting toolsets is inevitable.
  • Adopting Big Data and Advanced Analytics within your organization will involve cultural, procedural and technological shifts in the way your company operates. Developing a plan now will ease some of the growing pains that come with this new technology.

There are several compelling reasons (and value cases) to adopt this new technology.  Big Data and its value don’t have to be scary.  In fact, developing a plan that focuses on your operations priorities will make the transition relatively painless.  Big Data complements your existing applications and systems, and makes them more valuable and more powerful in the hands of your employees.  Don’t ignore the value that you’ve accumulated in your Data Assets over the years.  Unleash it!



Gartner IT Glossary – Big Data

Wikibon Big Data Vendor Revenue and Market Forecast, 2020

Revonos – Three Categories of Analytics

Three Categories of Analytics

The different levels of analytics can be summarized into three categories. These are descriptive, predictive and prescriptive analytics.

The first category, descriptive analytics, is a prerequisite for any further analysis. The tools and processes in this category are not necessarily simple, though sometimes they amount to simple statistics describing data. Traditional business intelligence (BI) falls under this category. Interestingly enough, even “alarming” falls under this category. Alarms are often considered advanced tools. They are certainly valuable, but they only tell a business what has already happened, based on hard data and rules. Descriptive analytics are necessary for the other two categories because achieving them prepares the organization’s data for consumption. Even when analytics implemented in systems are not part of an organization’s regular business, personnel take in the descriptive analytic products and perform additional processing themselves to reach decisions. Data exploration during data science also falls under descriptive analytics. It is a manual, iterative process that generates statistics, which lead to insight on how to build models useful for implementation in a process. This is the data scientist’s first step.
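
As a rough illustration of how simple descriptive analytics can be, the sketch below computes summary statistics and a rule-based alarm using only Python's standard library. The sensor readings and the `ALARM_LIMIT` threshold are hypothetical values made up for this example.

```python
import statistics

# Hypothetical temperature readings from a single sensor (illustrative only).
readings = [71.2, 70.8, 72.1, 69.9, 88.4, 71.5]

# Hypothetical fixed alarm rule: flag any reading above this limit.
ALARM_LIMIT = 80.0

# Simple statistics describing the data.
summary = {
    "count": len(readings),
    "mean": statistics.mean(readings),
    "stdev": statistics.stdev(readings),  # sample standard deviation
    "min": min(readings),
    "max": max(readings),
}

# "Alarming": report what has already happened, based on hard data and a rule.
alarms = [(index, value) for index, value in enumerate(readings)
          if value > ALARM_LIMIT]

print(summary)
print(alarms)
```

Both outputs only describe the past: the summary characterizes the readings, and the alarm list reports which readings already broke the rule.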

Predictive analytics is the more common of the two proactive categories. This category is vast. In statistics, prediction means estimation in general, not necessarily forecasting. Predictive analytics are concerned with unknown quantities and factors. Factors can be flags, levels, classes or clusters. Predictive analytics are performed with any of a multitude of tools. A simple example is ordinary least squares (OLS) estimation. The machine learning wave mostly lives in this category. Hierarchical and k-means clustering, random forest regression and classification, support vector machines and ridge regression are some of the tools that exist in this category. These tools each have their strengths and weaknesses, and it takes a knowledgeable person (or an automated evaluation of each tool) to determine which is best for a specific problem. These problems include finding production excursions, detecting fraud, estimating power output, demand forecasting, risk estimation and claims prediction. This list barely scratches the surface of the use cases for predictive analytics.
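
To make the OLS example concrete, here is a minimal sketch of one-variable ordinary least squares written with plain Python. The data (wind speed against power output) and the function names `ols_fit` and `predict` are hypothetical, chosen only to illustrate estimating an unknown quantity from a known one. In practice a library such as scikit-learn would normally be used instead.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate y for a new x using the fitted line."""
    return intercept + slope * x


# Hypothetical observations: wind speed (x) and measured power output (y).
wind_speed = [1, 2, 3, 4, 5]
power_output = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = ols_fit(wind_speed, power_output)
estimate = predict(6, slope, intercept)
print(slope, intercept, estimate)
```

The fitted line estimates power output at a wind speed that was never observed, which is the sense of "prediction" used here.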

The final category, prescriptive analytics, is the other proactive category. Two general classes of tools serve it: machine learning and operations research. Both make extensive use of mathematical optimization techniques. Artificial neural networks and deep learning methods are machine learning techniques capable of recommending decisions in certain applications. The problem domain for deep learning is continually expanding. This technology is what makes autonomous vehicles possible. There are other techniques that can be applied to recommend decisions through mathematical optimization, such as stochastic gradient descent and other unconstrained minimization techniques. When the rules of the decisions are known ahead of time, a machine learning method can be applied to recommend a decision. Often, however, decisions must be made within more general constraints rather than rules. For instance, a supplier might need to deliver product from different plants to a variety of depots. Each depot has a demand for product and each plant has a maximum supply. The transportation costs between plants and depots can be taken into account and a minimum-cost solution found using linear programming. This use case has had an efficient mathematical solution since World War II. The applications of linear programming have since been explored and expanded throughout many industries. Two examples are manufacturing (blending) and finance (portfolio balancing).
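
The plant-and-depot scenario can be sketched on a toy instance. Note that the code below does not use linear programming: to stay dependency-free, it swaps in an exhaustive search over whole-unit shipment plans, which is only practical for tiny problems. A real application would hand the same supplies, demands and costs to a linear programming solver. All numbers and the function name `cheapest_plan` are hypothetical.

```python
from itertools import product

# Hypothetical data: two plants, three depots.
supply = [60, 50]        # maximum units each plant can ship
demand = [30, 40, 35]    # units each depot requires
cost = [[4, 6, 9],       # cost[i][j]: cost per unit shipped from plant i to depot j
        [5, 3, 7]]


def cheapest_plan(supply, demand, cost):
    """Brute-force search (not LP) over whole-unit shipments for two plants.

    Plant 0's shipment to each depot is enumerated; plant 1 ships the
    remainder so that every depot's demand is met exactly.
    """
    best_cost, best_plan = None, None
    for from_p0 in product(*(range(d + 1) for d in demand)):
        from_p1 = [d - x for d, x in zip(demand, from_p0)]
        # Skip plans that exceed either plant's maximum supply.
        if sum(from_p0) > supply[0] or sum(from_p1) > supply[1]:
            continue
        total = (sum(c * x for c, x in zip(cost[0], from_p0))
                 + sum(c * x for c, x in zip(cost[1], from_p1)))
        if best_cost is None or total < best_cost:
            best_cost, best_plan = total, (list(from_p0), from_p1)
    return best_cost, best_plan


best_cost, best_plan = cheapest_plan(supply, demand, cost)
print(best_cost, best_plan)
```

The result is itself a decision: how many units each plant should send to each depot. That is what makes the technique prescriptive rather than predictive.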

These categories of analytics can be regarded as a progression, but they need not be, except that an organization needs a handle on descriptive analytics before beginning any proactive analytics. The two proactive categories overlap somewhat in technique, but one distinguishing characteristic is the amount of additional insight needed to make a decision. For instance, if a rule must be applied to the results of a machine learning method, then the method is not prescriptive. If the results of a technique are decision points by themselves, then it is prescriptive.


Linear Models with R, Second Edition

Ford and Fulkerson: Solving the Transportation Problem

Python | scikit-learn