In the ever-evolving landscape of technology, many boards and CEOs find themselves overwhelmed by the pace of innovation and the complexity of AI and ML. The stakes are particularly high: data is the last untapped asset on the balance sheet and quite possibly the key driver of enterprise value going forward.
How can non-technical executives and board members understand the strategic importance of data, AI, and ML, particularly as they relate to their businesses and industries? CEOs and boards must feel comfortable asking critical questions like this one and exploring the foundations of what enables a company to drive enterprise value with such cutting-edge technologies.
There is no mystery in this process, nor is there any reason to feel inadequate. Data is tough, and extracting business value from data is even harder. Data is spread across multiple systems and is often incomplete and inaccurate. And the tsunami of data continues to grow exponentially. Structured data is the easier case: it is neatly organized and easily searchable in relational databases. But what about the massive growth in unstructured data, in particular voice, video, and documents such as PDFs? How do we extract the nuggets of wisdom from all that data and put them into a form to which we can apply machine learning and natural language processing (NLP), so that we can create consequential value quickly?
The first step is to build a data catalog – an itemization of the company’s data. If we ask a CEO to list all the locations of their manufacturing facilities, most will have no problem doing so. The same goes for IP or other critical assets. But ask the same about data assets, and the answers will not be satisfying. In fact, most companies in America don’t even have a data catalog. If data is the last untapped asset on the balance sheet and the key to competitive success, why don’t most companies know what data they have, much less maintain a catalog of it?
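To make the idea concrete, a data catalog can start as nothing more than a structured list of what data exists, where it lives, and who is accountable for it. The sketch below is a minimal illustration in Python, not a recommendation of any particular catalog product. The `DataAsset` fields, the `summarize_by_kind` helper, and the three sample entries are all hypothetical and were chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """One line item in the catalog (all fields are illustrative assumptions)."""
    name: str    # what the data set is called
    system: str  # where the data lives
    owner: str   # the team accountable for it
    kind: str    # "structured" or "unstructured"


def summarize_by_kind(catalog):
    """Group asset names by kind, so leadership can see what exists at a glance."""
    summary = defaultdict(list)
    for asset in catalog:
        summary[asset.kind].append(asset.name)
    return dict(summary)


# Hypothetical entries; a real catalog would be filled in by the business.
catalog = [
    DataAsset("customer_orders", "ERP database", "Finance", "structured"),
    DataAsset("support_call_recordings", "call center archive", "Customer Service", "unstructured"),
    DataAsset("supplier_contracts_pdf", "shared drive", "Legal", "unstructured"),
]

for kind, names in summarize_by_kind(catalog).items():
    print(kind, "->", ", ".join(names))
```

Even a list this simple answers the question a board should be able to ask: what data do we have, where is it, and who owns it?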
Once data assets are cataloged, it is essential for management teams and boards to assess data quality. Data is often incomplete, inaccurate, or messy, a challenge faced by organizations across various industries. However, there is hope for improving data quality through the creation of data pipelines.
What is a data pipeline, and why should boards and CEOs care? The data pipeline is the process through which the enterprise prepares data for analysis: ingesting the data, checking its quality, cleaning it, and enriching it as appropriate for the task at hand. Most companies get this foundational data preparation step wrong, and the consequences are predictable: every report and model built on flawed data inherits those flaws. Building data pipelines consistently and at scale is critical because enterprises have a lot of data, and all of it needs to be accessed and prepared for visualization and for more sophisticated tasks such as machine learning. Get the data pipeline wrong, and a world of problems follows. Some enterprises build thousands of data pipelines each year. The importance of this cannot be overstated. So it is critical to start with these foundational steps (the data catalog and the data pipeline) and have “control” over your data in order to turn it into a competitive weapon that drives business value.
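The four pipeline steps described above (ingest, quality check, clean, enrich) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular enterprise builds its pipelines. The sample records, the field names, the function names, and the revenue threshold of 1000 are all hypothetical.

```python
def ingest():
    """Stand-in for reading records from a source system (sample data is made up)."""
    return [
        {"customer": " Acme Corp ", "region": "west", "revenue": "1200.50"},
        {"customer": "Globex", "region": "", "revenue": "980"},
        {"customer": "", "region": "east", "revenue": "not available"},
    ]


def quality_check(rows):
    """Report missing fields and non-numeric revenue without changing the data."""
    issues = []
    for index, row in enumerate(rows):
        for field in ("customer", "region", "revenue"):
            if not row.get(field, "").strip():
                issues.append((index, field, "missing"))
        try:
            float(row.get("revenue", ""))
        except ValueError:
            issues.append((index, "revenue", "not a number"))
    return issues


def clean(rows):
    """Trim whitespace, convert revenue to a number, and drop unusable rows."""
    cleaned = []
    for row in rows:
        customer = row.get("customer", "").strip()
        try:
            revenue = float(row.get("revenue", ""))
        except ValueError:
            continue  # revenue cannot be interpreted, so the row is dropped
        if not customer:
            continue  # a row without a customer cannot be attributed
        region = row.get("region", "").strip() or "unknown"
        cleaned.append({"customer": customer, "region": region, "revenue": revenue})
    return cleaned


def enrich(rows, threshold=1000.0):
    """Add a derived 'tier' field; the threshold is an arbitrary illustrative value."""
    return [
        {**row, "tier": "large" if row["revenue"] >= threshold else "standard"}
        for row in rows
    ]


def run_pipeline():
    """Run the four steps in order and return the issue report and prepared data."""
    raw = ingest()
    issues = quality_check(raw)
    prepared = enrich(clean(raw))
    return issues, prepared


issues, prepared = run_pipeline()
print(f"{len(issues)} quality issues found; {len(prepared)} records ready for analysis")
```

The quality check is deliberately kept separate from cleaning: the issue report tells management how good the source data is, while the cleaned and enriched records are what analysts and machine learning models actually consume.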