2025 DataOps Predictions - Part 1

As part of APMdigest's 2025 Predictions Series, industry experts offer predictions on how DataOps and related technologies will evolve and impact business in 2025.

2025: REAL-TIME DATA IS KEY FOR AI

Real-time data will be a key differentiator for competitive advantage: Industries will increasingly rely on real-time or near real-time data to maintain a competitive edge. Companies that can integrate up-to-date data into their AI systems will provide superior customer experiences with fewer issues and more personalized solutions. The ability to capture and analyze data in real-time will separate industry leaders from those who struggle to modernize their data infrastructure.
Ayman Sayed
CEO, BMC Software

Enterprises Will Augment GenAI with Real-Time Data: The true value of GenAI is realized when integrated into enterprise applications at scale. While enterprises have been cautious with trial deployments, 2025 will be a turning point as they begin to scale GenAI across critical systems like customer support, supply chain, manufacturing, and finance. This will require tools to manage data and track GenAI models, ensuring visibility into data usage. GenAI must be supplemented with specific real-time data, such as vectors and graphs, to maximize effectiveness. In 2025, leading vendors will begin rolling out applications that leverage these advancements.
Lenley Hensarling
Technical Advisor, Aerospike

MULTIMODAL DATA

Multimodal data will be very big, unlocking corporate value: Back in 2004, Tim O'Reilly coined the phrase, "Data is the Intel Inside." We don't think quite as much about Intel these days, but Tim was absolutely right about data. We became obsessed with data. We've been talking about data science, being data-driven, and building data-driven organizations ever since. Artificial Intelligence is the current expression of the importance of data.

One problem with being data-driven is that most of any organization's data is locked up in ways that aren't useful. Being data-driven works well if you have nicely structured data in a database. Most companies have that, but they're also sitting on a mountain of unstructured data: PDF files, videos, meeting recordings, real-time data feeds, and more. They aren't even used to thinking of this as data; it's not amenable to SQL and database-centric "business intelligence."

That will change in 2025. It will change because AI will give us the ability to unlock this data as well as the ability to analyze it. It will be able to give structure to the information in PDFs, in videos, in meeting transcripts, and in raw data coming in from sensors. In his Generative AI in the Real World interview, Robert Nishihara asked us to think of the video generated by an autonomous vehicle. Most of that is of limited value — but every now and then, there's a traffic situation that is extremely valuable. Humans aren't going to watch hours of video to extract the value; that's a job for AI. Multimodal AI will help companies to unlock the value of data like this. We're at the start of a new generation of tools for data acquisition, cleaning, and curation that will make this unstructured data accessible.
Laura Baldwin
President, O'Reilly Media

AI DRIVES NEW FOCUS ON DATA QUALITY

AI will renew the focus on data quality, for two reasons: First, high-quality data is required for training and fine-tuning models. Second, AI-powered analytics tools will offer a higher-resolution view of data, revealing previously undetected quality issues.
Ryan Janssen
CEO, Zenlytic

Enterprises that ready their data for AI will pull ahead competitively: In 2025, companies will focus on building an organized, high-quality data ecosystem to maximize AI's effectiveness and to pull ahead of their competition. This includes managing metadata through structured data catalogs, ensuring data accuracy with rigorous cleansing and validation, and establishing robust governance practices to safeguard data privacy and security. By implementing clear, ethical guidelines, organizations will create a trustworthy AI framework, empowering data scientists with easy access to reliable data for generating precise, impactful insights across business functions. Enterprises that do this will be hard to compete with. 
Scott Voigt
CEO and Founder, Fullstory
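The cleansing-and-validation step described above can be made concrete with a minimal sketch. The field names and rules below are invented for illustration; a real pipeline would derive its rules from the data catalog's schema.

```python
# Hypothetical sketch of "rigorous cleansing and validation": each record
# is checked against simple rules before it is admitted downstream.
# Field names (customer_id, email, amount) and rules are illustrative.

def validate_record(record: dict) -> list:
    """Return a list of validation errors (empty list = record is clean)."""
    errors = []
    if not record.get("customer_id"):
        errors.append("missing customer_id")
    email = record.get("email", "")
    if "@" not in email:
        errors.append("malformed email: %r" % email)
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors

def cleanse(records: list) -> tuple:
    """Split records into (valid, rejected-with-reasons)."""
    valid, rejected = [], []
    for r in records:
        errs = validate_record(r)
        if errs:
            rejected.append((r, errs))
        else:
            valid.append(r)
    return valid, rejected
```

Rejected records keep their error list, so quality issues can be reported back to the owning team instead of silently dropped.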

AI DRIVES DATA PIPELINE AUTOMATION

GenAI and as-code-first technologies will drive data pipeline automation: The ubiquitous use of Kubernetes has led to a configuration-first experience for defining data pipelines: it's as simple as selecting a container image and adding configuration. We'll increasingly see GenAI, trained on processing and execution engines, generate this configuration and deploy pipelines automatically from natural language prompts alone. Traditional visual ETL tooling, and even low-code platforms, are now at risk of disruption. What a power user can do in a few days (remember, you still need to learn these platforms), GenAI does in seconds, emitting configuration for real-time pipelines. This raises a question: what is the wider future of any UX if the interface is a prompt? Just viewing data results and metrics? Engineers may as well go back to the command line!
Andrew Stevenson 
CTO, Lenses.io
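To illustrate the "container image plus configuration" model the prediction describes, here is a sketch of a declarative pipeline definition and the kind of pre-deployment check a platform might run on GenAI-emitted configuration. The schema and field names are invented for the sketch; real platforms each define their own.

```python
# Illustrative configuration-first pipeline definition: the whole pipeline
# is a container image plus declarative config. All keys are hypothetical.

PIPELINE_CONFIG = {
    "name": "orders-enrichment",
    "image": "example.com/stream-processor:1.4",    # container image to run
    "source": {"topic": "orders"},                  # read from this stream
    "transform": "SELECT * FROM orders WHERE total > 0",
    "sink": {"topic": "orders-clean"},              # write results here
}

REQUIRED_KEYS = {"name", "image", "source", "sink"}

def validate_pipeline(config: dict) -> None:
    """Reject configs missing required fields before deployment."""
    missing = REQUIRED_KEYS - set(config.keys())
    if missing:
        raise ValueError("pipeline config missing keys: %s" % sorted(missing))

validate_pipeline(PIPELINE_CONFIG)  # passes for the example above
```

Whether the config is typed by an engineer or generated from a prompt, the same validation gate applies before anything is deployed.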

AI-ENHANCED DATA MANAGEMENT AND GOVERNANCE

AI is changing how companies manage and govern their data. Organizations now use data lakehouses to support data scientists and AI engineers working with large language models (LLMs). These lakehouses simplify data access, helping teams avoid juggling multiple storage systems. AI is also helping to automate manual processes like data cleaning and reconciliation—a pain point for many professionals. As AI continues to scale, automated governance will allow companies to manage data more effectively with less manual work.
Emmanuel Darras
CEO and Co-Founder, Kestra
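The reconciliation task mentioned above, matching near-duplicate records across systems, can be sketched in a few lines. Real AI-assisted tools use learned matchers; here the standard library's string-similarity ratio stands in as the matcher, and the threshold is arbitrary.

```python
# Minimal sketch of record reconciliation: pair each left-hand record with
# its closest right-hand match, if the match clears a similarity threshold.
# difflib's ratio is a stand-in for a learned similarity model.

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def reconcile(left: list, right: list, threshold: float = 0.85) -> list:
    """Return (left, best-right) pairs whose similarity >= threshold."""
    matches = []
    for item in left:
        if not right:
            break
        best = max(right, key=lambda r: similarity(item, r))
        if similarity(item, best) >= threshold:
            matches.append((item, best))
    return matches
```

Unmatched records fall through for human review, which is where the manual pain point the prediction mentions actually lives.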

UNIFIED DATA ACCESS AND FEDERATION

A unified approach to data access is high on the agenda for enterprises that plan to consolidate analytics data into a single, accessible source. Data lakehouses support this by providing federated access, allowing teams across the organization to tap into the same data without duplicating it. This approach is expected to drive cross-functional analytics and reduce latency, making it easier for teams to work together on the same shared data.
Emmanuel Darras
CEO and Co-Founder, Kestra

TRUST IN DATA

Establishing trust in data will become the top priority for leaders: In the AI era, data is no longer just a byproduct of operations; it's the foundation for resilience and innovation. Without strong trust in the data that organizations have and use, businesses will continue to struggle to make informed decisions or leverage emerging technologies like AI. Building this trust will go beyond technology and require leaders to boost data literacy and choose a data strategy that emphasizes both capability and quality.
Daniel Yu
SVP, SAP Data and Analytics

DATA LABELING

Microscopic lens on the source of data labeling: In technical circles, there are constant discussions around how to get the right dataset — and in turn, how to label that dataset. The reality is that this labeling is outsourced on a global scale. In many cases, it's happening internationally, and often in developing countries, with questionable conditions and levels of pay. You may have task-based workers assessing hundreds of thousands of images and being paid for the number accurately sorted. While AI engineers may be highly in demand and paid well above the market rate, there are questions about this subeconomy.
Gordon Van Huizen
SVP of Strategy, Mendix

EXTENSIVE DATA SETS

Retaining Extensive Data Sets Will Become Essential: GenAI depends on a wide range of structured, unstructured, internal, and external data. Its potential relies on a strong data ecosystem that supports training, fine-tuning, and Retrieval-Augmented Generation (RAG). For industry-specific models, organizations must retain large volumes of data over time. As the world changes, relevant data becomes apparent only in hindsight, revealing inefficiencies and opportunities. By retaining historical data and integrating it with real-time insights, businesses can turn AI from an experimental tool into a strategic asset, driving tangible value across the organization.
Lenley Hensarling
Technical Advisor, Aerospike
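The retrieval step of RAG mentioned above can be sketched in a few lines: retained documents are stored as vectors, and the ones closest to the query vector are handed to the model as context. The toy 3-dimensional embeddings below are made up; real systems use learned embeddings with hundreds of dimensions.

```python
# Bare-bones sketch of RAG retrieval: rank stored document vectors by
# cosine similarity to the query vector and return the top k.

import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list, store: dict, k: int = 2) -> list:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, store[d]), reverse=True)
    return ranked[:k]

# Toy store: ids and embeddings are invented for the example.
store = {
    "q3-report": [0.9, 0.1, 0.0],
    "hr-policy": [0.0, 0.2, 0.9],
    "q4-report": [0.8, 0.3, 0.1],
}
print(retrieve([1.0, 0.2, 0.0], store, k=2))  # -> ['q3-report', 'q4-report']
```

This is why the prediction ties retention to value: a document only surfaces as context if it was retained long enough to be in the store when the relevant question is finally asked.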

SMALL DATA

The past few years have seen a rise in data volumes, but 2025 will bring the focus from "big data" to "small data." We're already seeing this mindset shift with large language models giving way to small language models. Organizations are realizing they don't need to bring all their data to solve a problem or complete an initiative — they need to bring the right data. The overwhelming abundance of data, often referred to as the "data swamp," has made it harder to extract meaningful insights. By focusing on more targeted, higher-quality data — or the "data pond" — organizations can ensure data trust and precision. This shift towards smaller, more relevant data will help speed up analysis timelines, get more people using data, and drive greater ROI from data investments.
Francois Ajenstat
Chief Product Officer, Amplitude

Go to: 2025 DataOps Predictions - Part 2
