Why Data Management

The data ecosystem is arguably the biggest value generator for the next decade. Data is foundational to various organisational functions, such as product development, marketing, and operations. It is also fundamental for improving areas of technology like AI/ML and cybersecurity, and for delivering richer and clearer business insights so orgs can remain competitive and grow.

On a daily basis, enterprises have terabytes or petabytes of data flowing in, but they are losing the vast majority of its value. System fragmentation results in numerous data silos, inhibiting the generation of insights and impeding decision-making. This disconnect between voluminous data and inadequate data usage is a consequence of orgs being unable to adapt their infrastructure to keep pace with the Cambrian-like explosion of data since the iPhone was launched back in 2007.

Orgs can attempt to rectify things themselves; however, doing so requires high capex, high opex, and the recruitment of high-calibre engineering talent, which is increasingly hard to come by. Thankfully, a vibrant ecosystem has emerged, brimming with innovation, to help orgs fully access and make sense of their data in a simplified and cost-effective manner.

At the heart of the blossoming ecosystem are data warehouses and data lakes, which are merging into what is referred to as a data lakehouse architecture, and data streaming processors for real-time ingestion and analytics.

Data warehouses have revolutionised the database industry. They abstract away the many complexities of managing multiple databases for different applications, making it super efficient to ingest and transform structured and semi-structured data, and to deliver useful and timely data for large-scale analysis in BI and other front-end applications. Their downsides, however, are that they do not scale very efficiently and they cannot house the more abundant form of data, which is unstructured data.

Data lakes have made it possible to process and generate insights from unstructured big data, such as video, images, and social media posts. They connect to real-time data streams, store extremely large volumes of data, interoperate with data warehouses, serve the needs of data scientists, and feed data into ML models. The disadvantages, however, include poor data governance and inaccessibility for the broader data consumer base (like business analysts).

Data lakehouses have very recently come to the fore to address the limitations of warehouses and lakes: storing data in a way that is cheap, scalable, and explorable, while also making the data manageable and queryable by common business intelligence tools. Ultimately, lakehouses reduce many overheads for enterprises managing an abundance of different data types, and serve needs related to data science, machine learning, and business decision making.

Data streaming processors connect to data sources to deliver data in real time. Use cases may involve industrial sensors delivering data for predictive maintenance, stock market data for a hedge fund's value-at-risk calculations, or social media data to assess a company's brand image.
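
To make the predictive maintenance use case a little more concrete, here is a minimal Python sketch of the kind of logic a streaming pipeline might apply to sensor data as it arrives. It is purely illustrative: the function name flag_anomalies, the window and tolerance parameters, and the sample vibration readings are all assumptions made up for this example, and a real deployment would read from a streaming platform rather than an in-memory list.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def flag_anomalies(
    readings: Iterable[float], window: int = 5, tolerance: float = 0.2
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings that stray from the rolling mean.

    A reading is flagged when it differs from the mean of the previous
    `window` readings by more than `tolerance` (a fraction of that mean).
    Both parameters are hypothetical defaults chosen for illustration.
    """
    recent = deque(maxlen=window)  # holds only the latest `window` readings
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if baseline != 0 and abs(value - baseline) / abs(baseline) > tolerance:
                yield i, value
        recent.append(value)


# Hypothetical vibration readings from an industrial sensor.
vibration = [1.0, 1.1, 0.9, 1.0, 1.0, 1.05, 1.8, 1.0, 0.95]
alerts = list(flag_anomalies(vibration))
print(alerts)  # [(6, 1.8)]
```

Because the function consumes readings one at a time and keeps only a small window in memory, the same logic works on an unbounded stream, which is the essential difference between stream processing and batch analysis in a warehouse.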

Around these fundamental blocks is a plethora of innovative tools created by startups to assist DevOps teams in connecting all the pieces together. By having Snowflake (data warehouse), Databricks (data lake), and Confluent (data streaming) in their stack, with supporting tools for things like data governance and data observability, orgs can develop a holistic view and make better decisions in an affordable and low-stress way.

This is what makes the key names in the data ecosystem so valuable, both right now and in the future. Tales of business giants that have failed or lost substantial competitiveness can often be attributed to a lack of good data at their disposal. Would Blockbuster have collapsed if they could have generated richer insights from their business environment? Likewise, would Polaroid and Kodak have stayed at the forefront of their industry if they had had a better understanding of the digital camera evolution? AOL, Toys R Us, and Borders are some of the many other examples of past failures at managing data.

This is certainly an intriguing area of technology for investors, because the ecosystem is still developing, and the majority of orgs have yet to fully commit to transforming to a modern data stack. At present, both the platform vendors, which are building an ecosystem around themselves, and the smaller niche players will likely generate long-term excess returns for investors.