The Evolution of Data Integration
Martijn Veldkamp
“Strategic Technology Leader | Customer’s Virtual CTO | Salesforce Expert | Helping Businesses Drive Digital Transformation”
November 1, 2024
In today’s interconnected applications landscape, we rely on data shared across multiple systems and services. Over my career as an architect, I have watched integration strategies evolve from basic ETL database copies to monolithic middleware, operational databases, API-driven microservices, and now ZeroCopy patterns.
As I’m almost a year older again, I look back at all the stuff I built, supported and architected, and I see roughly four big phases in data integration architecture (cool idea for a t-shirt):
ETL -> Middleware -> Microservices -> ZeroCopy
Phase 1: Database Copies and ETL
It looks so spiffy, but generating images and getting the text right is difficult
In the early days, applications were often monolithic: self-contained systems that handled all functions internally. Database copies or ETL (Extract, Transform, Load) processes were the standard way to share data between these applications. We created copies of parts of a database, or of the entire thing, so other applications could access the data without impacting the source system. ETL batch jobs would extract data, transform it into compatible formats, and load it into the destination system on a scheduled basis.
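To make the rhythm concrete, here is a minimal sketch of what such a nightly batch job boiled down to. The databases, tables and currency conversion are made up, but the Extract, Transform, Load beat is the real thing:

```python
import sqlite3

# A minimal sketch of a nightly ETL batch job; database and table names are made up.
source = sqlite3.connect("orders_system.db")
target = sqlite3.connect("reporting_copy.db")
source.execute("CREATE TABLE IF NOT EXISTS orders (id, customer, amount_cents, created_at)")
target.execute("CREATE TABLE IF NOT EXISTS orders_copy (id, customer, amount_eur, created_at)")

def run_nightly_batch():
    # Extract: pull the data out of the source system.
    rows = source.execute(
        "SELECT id, customer, amount_cents, created_at FROM orders"
    ).fetchall()

    # Transform: reshape it into what the destination expects (here, cents to euros).
    transformed = [(i, c, cents / 100.0, ts) for i, c, cents, ts in rows]

    # Load: wipe and replace the copy wholesale; until tomorrow night, it only gets staler.
    target.execute("DELETE FROM orders_copy")
    target.executemany("INSERT INTO orders_copy VALUES (?, ?, ?, ?)", transformed)
    target.commit()

run_nightly_batch()
```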
I still remember the C++ libraries I wrote to access the mainframe and kick off a batch job. Those integrations were hard-coded and oh so fragile.
While this approach enabled some data sharing, it had many limitations. Copied data went stale between runs, creating "data silos" and thus inconsistent information across applications. And running those ETL batch jobs introduced costs of its own.
As the number of applications grew, and as they became more specialized, the need for real-time data grew with them. Together with the technology push of the time, this drove the rise of middleware layers.
Phase 2: Middleware Layers
I’m not sure where DALL-E gets its inspiration from, but I love the nonsensical images
Middleware allowed applications to connect through predefined contracts without requiring direct database copies, and service calls provided a standardized, flexible way to request specific data without duplicating parts of databases.
The amount of SOAP and XSDs was staggering. There were so many debates on the best way to build XSDs; I still remember the Venetian Blind and Russian Doll patterns. This phase also started the integration wars: MQ Series versus Microsoft BizTalk and their "guaranteed delivery". We had some lovely discussions around idempotency. There were even companies where the middleware layer was so crucial that it was never allowed to be updated! Ahhh, the good old days of Technical Debt.
This stage also marked the initial shift towards a more modular design (Service-Oriented Architecture). Applications began interfacing with smaller, more specialized data services, which allowed us developers to create and maintain parts of a system landscape independently. APIs provided a way for these modules to interact without tight coupling (or so we thought). They allowed applications to fetch data on demand, reducing the delays of scheduled ETL batch jobs. We could access specific data without creating additional copies, improving data consistency, and new applications could easily connect to the existing ecosystem, providing a more flexible integration model.
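For contrast with the nightly copies, here is a minimal sketch of the on-demand style. The endpoint and payload shape are hypothetical; the point is that the caller asks the owning system for exactly the record it needs, right now:

```python
import json
import urllib.request

# A hypothetical internal endpoint; in the middleware era this would be a SOAP
# service, later a REST API, exposed by the system that owns the data.
CRM_API = "https://crm.internal.example/api/v1/customers/{customer_id}"

def get_customer(customer_id: str) -> dict:
    # Ask the owning system for the record on demand instead of reading a stale copy.
    with urllib.request.urlopen(CRM_API.format(customer_id=customer_id)) as response:
        return json.load(response)

# Usage (commented out because the endpoint above is made up):
# customer = get_customer("C-1042")
# print(customer["name"], customer["segment"])
```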
But if I’m honest, this traditional middleware and its service calls also had their limitations. Requests could create latency, especially when handling complex queries or high-frequency traffic. Managing versions and dependencies became increasingly challenging, often requiring additional infrastructure like service locators and gateways.
What also started around that time frame were Operational Databases: a sort of specialized data warehouse where all the different applications’ data was stored to offload the stress of all these API calls. The trouble was that by storing the application data in another database, the internal security model of those applications was lost, and we had to come up with an extra security layer to keep a measure of control over who could access what.
Phase 3: Event-Driven Architecture or Microservices
I’m not sure what DALL-E picked up, but we now have more colour and not the cool 60s vibe
Around this timeframe we started to leave the company datacenter and move towards cloud infrastructure. As applications evolved into microservices architectures, our traditional integration strategies fell short.
Microservices require real-time data access and loose coupling to function effectively and scale independently. This need gave rise to event-driven architectures as a central integration strategy, enabling microservices to communicate efficiently while maintaining their autonomy.
In an event-driven architecture, instead of requesting data through direct API calls or relying on periodic updates, microservices subscribe to events—discrete records of state changes within the system. Each time a relevant change occurs, such as a new order, it generates an event. This event is then picked up by any microservice that subscribes to it, allowing them to react instantly.
The event-driven model offers significant advantages. Microservices are designed to function independently, with each service subscribing only to events it needs. This decoupling ensures that services are not tightly bound to one another, improving resilience and allowing changes or updates in one service without impacting others.
Events propagate in real-time, so when a state change occurs, any relevant microservice can act immediately. For example, in an order processing flow, a payment service can trigger an inventory update and notify a shipping service as soon as an order is confirmed.
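A minimal in-process sketch of that order flow follows; a real system would sit on a message broker, and the event and service names here are made up:

```python
from collections import defaultdict
from typing import Callable

# A tiny in-process event bus; in production this role is played by a message broker.
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    subscribers[event_type].append(handler)

def publish(event_type: str, payload: dict) -> None:
    # Every service that subscribed to this event type reacts on its own.
    for handler in subscribers[event_type]:
        handler(payload)

# Inventory and shipping never call each other or the payment service directly;
# they only react to the events they care about.
subscribe("order.confirmed", lambda e: print(f"inventory: reserve items for {e['order_id']}"))
subscribe("order.confirmed", lambda e: print(f"shipping: prepare a label for {e['order_id']}"))

# The payment service only announces what happened.
publish("order.confirmed", {"order_id": "A-1001", "amount_eur": 49.95})
```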
Microservices also introduce new challenges: managing their lifecycle, handling high volumes, and ensuring data consistency across events. I still remember the talk by Uber’s CIO back when they were the flagship microservices implementer: “I love the weekends, all the developers are off and they are not breaking everything, everywhere at once.”
Managing and scaling event-driven systems can be challenging, particularly as the microservice count rises, setting the stage for the next level of integration: ZeroCopy.
Phase 4: The Rise of ZeroCopy Patterns
It’s probably my state of mind that I find these images so incredibly funny. But really? Wata access?
ZeroCopy patterns represent the latest and greatest in data integration (according to the suppliers of said technology). This approach enables applications to access data directly in shared memory or in isolated, secure environments without needing to copy it across systems. ZeroCopy offers a streaming approach, allowing applications to interact with a single, real-time data source.
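As a rough, single-machine illustration of the principle, and not of how any vendor actually implements it across platforms, here is a minimal sketch in which two "applications" read the same bytes instead of each keeping a copy:

```python
from multiprocessing import shared_memory

# "Producer" application writes its data into a named shared segment (names made up).
producer = shared_memory.SharedMemory(create=True, size=32, name="latest_order")
producer.buf[:13] = b"order:A-1001 "

# "Consumer" application attaches to the very same bytes: no extract, no transform,
# no load, and nothing that can go stale.
consumer = shared_memory.SharedMemory(name="latest_order")
print(bytes(consumer.buf[:13]).decode())

consumer.close()
producer.close()
producer.unlink()
```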
As my focus as an architect drifted over time from building integrations to just using what the different apps offer, I don’t have much to say about ZeroCopy, other than that it reduces storage costs and bandwidth usage by eliminating data duplication. Also, with a single data source accessible by all applications, ZeroCopy avoids the synchronization issues common in the earlier integration phases.
So what is next? Where are we headed?
Future Directions: Hybrid Approaches and Secure ZeroCopy Models
As organizations continue to embrace microservices together with their big applications like Salesforce and ERP, future data integration strategies are likely to blend traditional approaches with ZeroCopy for a more adaptable, performance-oriented model.
Here’s what I think we can expect moving forward:
- Hybrid Integration Models: ZeroCopy patterns will coexist with APIs, operational databases, and event-driven models, giving microservices the flexibility to choose the best integration method based on data access requirements. And making life hard for architects who like their models simple and their complexity decreasing.
- Advanced Security and Access Controls for Shared Data: As ZeroCopy adoption grows, ensuring data privacy and security will be crucial. Innovations in data isolation, encryption, and access control will protect shared data in environments with multi-tenant applications and services. Just look at Salesforce’s DataCloud.
- Data Mesh and Data Fabric Architectures: With the rise of data mesh and data fabric architectures, organizations will start to move towards decentralized data ownership and access, reducing the need for data duplication and aligning well with ZeroCopy principles. These architectures emphasize local data access within domains while supporting broader, seamless data sharing across the organization.
The evolution of data integration strategies, from early shared database copies to ZeroCopy, tells me that we do not (yet) have a cohesive approach to data sharing in which applications and microservices can interact with real-time data securely and efficiently. But let’s keep it positive: we are starting to find our way. I think.





