The Evolution of Customer Data Integration

By Ken Rugg

Maintaining high-quality information about customers is an imperative for today’s enterprises. Organizations require access to the most current and complete view of their customers. Enterprises may possess a substantial amount of customer data, but much of it is locked in silos distributed throughout the enterprise. Often the data is accessible only from a single application and for a single purpose.

Merging available information about customers into a single coherent view is the domain of Customer Data Integration (CDI) technology. There are a number of approaches to delivering this integration, ranging from Enterprise Application Integration (EAI), which delivers process-level integration, to Enterprise Information Integration (EII), or virtual data federation, which leaves customer data in place but distributes queries across all the data sources. CDI Hubs, which pull all customer data together into a single, centralized database, have been widely promoted by industry experts as a better approach.

CDI Hubs provide a number of advantages over alternative approaches. First, CDI Hubs are information-centric and not process-centric. There are many reasons to integrate the processes that drive customer-facing applications, particularly when introducing a service-oriented redesign to those processes. However, if the goal is a single view of customer information that can support those processes, an information-centric approach makes sense. Second, the single centralized database allows new applications to immediately access data without negatively impacting existing applications. Any new approach to CDI must preserve these benefits while addressing the shortcomings of CDI Hubs.

Customer information is everywhere
The demand for up-to-date, consistent customer information spans the enterprise. Any operation that touches the customer (sales, marketing and customer service) and any application that supports those operations requires timely, accurate data. To meet such requirements, CDI must deliver data wherever and whenever necessary and to whomever, regardless of the application that originated the data.

While current CDI Hubs are adept at pulling together a single view of the customer, they often fail to deliver the data where it is really needed. CDI Hubs provide a centralized view of customers in support of new applications, but those applications must be built to use the hub as their data source. Effectively, they overcome the problems of customer information trapped in silos by introducing yet another silo. While this enables new applications to be built with a consolidated view of the customer, it leaves older applications languishing with existing data stores and schemas. A better approach would deliver a projection of integrated customer data to current production applications as well.

Beyond the challenge of supporting enterprise applications at the central office, the right approach to CDI recognizes that customer data has utility in many different places. For example, it might be at a branch office or a call center that provides the first line customer service, yet uses enterprise applications that may have only intermittent connectivity to the central office, or on a plane where a sales person reviews account information on the way to a sales meeting. To handle such situations, the CDI infrastructure of the future must deliver complete and accurate customer information to corporate users and recognize that they will not always be connected to a central CDI Hub.

As the value of integrated customer information becomes recognized, additional users will invariably emerge. This increases the demand for scalability in a hub-based approach. Such users will likely require access to additional information to augment the core customer information and will want it in a consistent view with the customer information. This will drive extensions to the data schema to support new use cases and new data from different originating systems. Organizations may be tempted to avoid this, believing it goes beyond the charter of customer data integration. A CDI approach that provides this extensibility will continue to increase in value as its user base grows. Over time it could emerge as the basis of a coherent enterprise information management strategy. It is shortsighted to miss this tremendous opportunity simply due to concerns about “whose job it is” to do this integration.

Professional data management
While high-quality information about customers is one of the most valuable assets a company can possess, it isn’t always managed as such. Database management systems have long provided a significant level of quality of service for the information they contain. These capabilities range from simply being able to see up-to-date information, to being able to modify that data when necessary, to ensuring that data integrity and availability are preserved even in the face of adverse conditions. CDI Hubs, on the other hand, are only beginning to offer these same levels of service and in some cases face significant hurdles in fully achieving them. Let’s examine why this is the case.

As the enterprise moves inexorably toward “zero-latency” operations, the availability of up-to-date information becomes critical. It is no longer good enough to have information about your customer that was accurate as of some time yesterday. Many of the tools for moving customer data into CDI Hubs, however, have their roots in batch-oriented Extract, Transform and Load (ETL). ETL has long been used for constructing data warehouses. Unfortunately, systems that are great for quickly loading vast amounts of data are generally not well suited for delivering continuously up-to-date information from many sources with near-zero latency to systems that are always online. To support these new operational parameters, new approaches that are optimized for real-time operations will be needed.

The data management requirements of CDI do not stop with delivering access to the most up-to-date customer information. While it is certainly valuable to be able to view the data, it is even more valuable to be able to update that information as well. Wouldn’t it be nice to be able to add a new contact or edit an existing customer record? While users could certainly log on to the appropriate operational systems and make those changes, this is an additional step that slows them down and one more opportunity for the new information to be lost. For these and other similar situations, support for updates is an important next step in the evolution of CDI Hubs.

This is where another complication comes in. Most users simply assume that the customer information they interact with is consistent. All modern database management systems provide ACID properties, which make this a pretty good assumption. The problem, in the context of CDI Hubs, is that the way a hub exchanges data with other systems can prevent the database management system from upholding these guarantees. Why?

To better understand, let’s go back to that model of how information is moved to the CDI Hubs using ETL-based tools. These tools deliver a snapshot of the data from the source system to the hub. How do the updates get from the hub to the systems of record? Often it is simply by running another pipe “in reverse,” delivering a snapshot of the data that was modified back to its source. In this scenario, conflicting changes can happen in both systems while neither one is aware of the changes occurring in the other. This needs to be managed carefully to ensure no information is lost or corrupted. For example, if someone updates a customer’s billing address at the hub while at the same time someone else posts a charge against that customer’s account, you need to make sure that neither of those changes is lost. Otherwise the bill might not include the new charge, or it might not be sent to the new billing address. The problem is that once you allow two database management systems to update the same data independently, neither system can ensure that the actions are consistent. CDI systems of the future must address this concern if they are to provide the same level of support users have come to expect from systems managing their data.
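
The billing-address scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names, and the two helper functions (push_snapshot and merge_changed_fields) are hypothetical and do not describe any particular CDI product. The sketch contrasts a whole-row snapshot pushed "in reverse" with a merge that sends back only the fields the hub actually changed.

```python
from copy import deepcopy

# Hypothetical system-of-record row for one customer (illustrative fields).
source = {"customer_id": 42, "billing_address": "1 Old Road", "balance": 100.0}

# The hub receives an ETL-style snapshot of the row.
hub = deepcopy(source)
baseline = deepcopy(source)  # the state both sides shared at snapshot time

# Two independent updates; neither system is aware of the other.
hub["billing_address"] = "9 New Street"  # address edited at the hub
source["balance"] += 25.0                # charge posted at the source


def push_snapshot(target, snapshot):
    """Naive reverse pipe: replace the target row with the hub snapshot."""
    return dict(snapshot)


def merge_changed_fields(target, snapshot, baseline):
    """Apply only the fields the sender changed since the shared baseline.

    Raises ValueError when both sides changed the same field to different
    values; resolving that case needs a policy this sketch does not define.
    """
    merged = dict(target)
    for field, value in snapshot.items():
        if value == baseline[field]:
            continue  # the sender did not touch this field
        if target[field] != baseline[field] and target[field] != value:
            raise ValueError(f"conflicting updates to {field!r}")
        merged[field] = value
    return merged


naive = push_snapshot(source, hub)
merged = merge_changed_fields(source, hub, baseline)

print(naive)   # balance is back to the stale snapshot value
print(merged)  # keeps both the new address and the posted charge
```

Even the field-level merge only narrows the problem. When both systems change the same field, something still has to decide which change wins, which is exactly the consistency guarantee that neither database can provide on its own.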

Making CDI easier
If CDI is recognized as a key initiative in the enterprise today, why isn’t it universally adopted? During a recent trip to a cell phone store, the clerk was unable to help me because they didn’t have my customer information available. They had merged with another cell phone company a year ago, so they recommended I drive to a location that had been part of the company from whom I purchased the phone.

The reason enterprises tolerate situations like this is that it is not yet easy enough to implement and evolve CDI solutions. The sad truth is that for large corporations today, the time it takes to integrate the data from an acquisition can be longer than the time it takes to complete the next acquisition.

To make it easier to introduce CDI, systems must become more agile. It is important that the hub can be used to integrate a little bit of data today and then expand on that as needed. Creating a unified master schema that includes all information about the customer that may ever be needed is a highly complex task and introduces too high a barrier to CDI adoption. In contrast, an agile approach to CDI that accommodates extensions and changes to the customer schema over time lets integration begin with only the customer schema required to perform the initial integration. Once this is in place, the CDI schema can evolve from there to support each new system to be integrated.

To further ease the burden of integrating customer data, the next generation of CDI infrastructure must be compatible with, and ideally able to leverage, service-oriented architectures (SOA). Most would agree that SOA is a significant change and a big step forward in the application architectures being developed today. Existing enterprise applications are being refactored into “services.” New applications are being implemented with service orientation in mind, and some are even being outsourced to third parties as web services over the internet. Tools are being developed to reassemble these services into new “composite” applications to provide new capabilities. This has big implications for CDI.

All services assembled into a composite application need to work on consistent customer information. Ideally they could all just access the information in a CDI Hub, but since services come from many different sources, including legacy systems and applications executing outside the walls of the company, this is unrealistic. As with the applications mentioned previously, the unified customer information must be delivered to the application or service where it is needed. And with so many services involved, each may have its own idea of how the customer information should look.

The modularity of services within an SOA also allows them to have independent life cycles. This means that even if a set of services initially shares a common schema when developed, over time their requirements for customer information could diverge or evolve at different rates. This further drives the need for the next generation of CDI to deliver data in the format of the applications and services that it is supporting. While these requirements of SOA are really the same ones discussed above in the context of more traditional applications, the dynamics of service orientation will dramatically accelerate the need for these capabilities.
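
One way to picture delivering data "in the format of the applications" is as a set of per-consumer projections over a single unified record. The Python sketch below is an assumption-laden illustration, not a description of any real CDI infrastructure: the unified record, the two view definitions (BILLING_VIEW and SUPPORT_VIEW), and the project function are all hypothetical names invented for this example.

```python
# Hypothetical unified customer record held by the integration layer.
unified = {
    "customer_id": 42,
    "first_name": "Ada",
    "last_name": "Lovelace",
    "billing_address": "9 New Street",
    "phone": "555-0100",
}

# Each consuming service declares its own view of the customer:
# target field name -> function that derives it from the unified record.
BILLING_VIEW = {
    "acct_no": lambda c: c["customer_id"],
    "bill_to": lambda c: c["billing_address"],
}

SUPPORT_VIEW = {
    "id": lambda c: c["customer_id"],
    "display_name": lambda c: f'{c["last_name"]}, {c["first_name"]}',
    "callback_number": lambda c: c["phone"],
}


def project(record, view):
    """Build the consumer-specific shape of a unified customer record."""
    return {target: derive(record) for target, derive in view.items()}


billing_record = project(unified, BILLING_VIEW)
support_record = project(unified, SUPPORT_VIEW)

print(billing_record)
print(support_record)
```

Because each view is defined separately, one service can change its view as its requirements evolve without forcing a schema change on the others, which is the independence of life cycles described above.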

Conclusion
CDI Hubs clearly represent a significant step forward, providing all users with constant access to the most current and complete view of customer information available. To deliver on the full promise of customer data integration, however, there remain a number of challenges to overcome. CDI must evolve to deliver customer information wherever and whenever it is needed, and to whoever needs it. It must provide the qualities of service that are expected of more traditional database management systems: the information must be up-to-date, updatable and guaranteed to be consistent. Finally, CDI must continue to get easier to adopt and evolve, both by becoming more flexible and by adapting to leverage the latest trends that make software development in general easier. Providing all this will require a new type of infrastructure that can build on the advances made in the development of CDI Hubs.

About the Author
Ken Rugg is vice president of products for the Progress Real Time Division. In this role, he is responsible for the strategic direction and development of the product line, DataXtend. Previously, Rugg was chief technology officer with eXcelon Corporation. In his 10 years at eXcelon, Rugg served in many leadership roles, ranging from technical support and education to documentation, QA and product development. Under his leadership, eXcelon established its position as the leader in XML database and XML integration technologies while maintaining its dominant position in the object database market. He can be reached at krugg@progress.com.