Reference data, meaning information about counterparties, financial products, issuers, exchange rates, corporate actions and prices, is a critical input for all key business systems of capital markets firms. Every financial transaction involves reference data, and it is used across front, middle and back office systems. Given its key role, capital markets firms rely heavily on sourcing reference data from external data providers.
However, there are several challenges involved in sourcing the right reference data. First among them is siloed data. In most firms, business units maintain their own separate reference data stores, each sourced from external data providers. This results in data duplication and excessive spending, because the same data is bought more than once from expensive providers. It also leads to poor data quality, conflicting versions of the same data, and a lack of data governance policies.
The resulting inconsistent, inaccurate, and incomplete reference data causes major disruptions. Capital markets firms use straight through processing (STP) to reduce transaction time and to facilitate seamless initiation and settlement of securities trades without manual intervention. Poor quality reference data causes STP failures that lead to monetary losses and increased operational costs. The impact is not limited to STP. Poor quality reference data affects all critical business functions, as teams have to spend time investigating and correcting errors.
Another challenge is regulatory compliance. Stricter financial regulation requires higher levels of reporting from capital markets firms. They are required to report data on trades and counterparties to reduce systemic risk. Consistent and accurate reference data across the organization is necessary for firms to better assess the risk profiles of the legal entities they trade with and the financial instruments they trade.
To overcome the challenges of traditional data sources, enterprises use a host of technologies to consolidate reference data in the cloud. Gartner has said that enterprises with a cohesive strategy incorporating data hubs, lakes and warehouses will support 30% more use cases than competitors. Deploying a single source of reference data across the organization will eliminate most of the issues related to reference data, leading to a significant reduction in operating costs.
Moving reference data to cloud
The key to building a successful data paradigm in the cloud is taking a holistic approach that maps business drivers to a unified data service model, one that assures end-to-end data lineage, data quality and compliance. This will help enterprises move away from the overheads of on-demand data management to a self-service mode. The data lake should be accessible across the organization and delivered through a secure, robust and scalable platform, such as AWS Cloud. Some of the key steps in the process include:
The first step is creating a centralized, enterprise-wide data lake. For a growing pool of data that is consumed by several downstream applications, a cloud deployment makes a compelling case because it provides near-infinite compute and storage capacity. AWS DynamoDB and Redshift can be used to build reference data lakes and manage associated metadata.
Next comes centralized sourcing of all external and internal data. All data transformations and custom data ingestion rules can be applied at one location, ensuring reference data integrity and data governance across the enterprise. AWS Glue/EMR can be used for ingestion and transformation of data from a variety of sources, and AWS Athena and QuickSight can be used for analytics and reporting.
To provide access to reference data, APIs need to be developed that all downstream systems will use. AWS Lambda and Amazon API Gateway provide the API layer for all downstream systems.
Finally, access to reference data needs to be controlled and monitored. Solutions such as AWS KMS, which manages encryption keys, and CloudWatch, which provides monitoring and logging, serve as efficient gatekeepers.
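The ingestion and API steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the record fields (isin, name, currency), the function names, the sample ISIN and the in-memory dictionary are all hypothetical, and the dictionary stands in for whatever central store (for example a DynamoDB table) a firm actually uses. The handler follows the general shape of an AWS Lambda function behind an API Gateway proxy integration, reading a path parameter and returning a status code, headers and a JSON body.

```python
import json
from typing import Any, Dict, Iterable, Optional

# Hypothetical in-memory stand-in for the central reference-data store.
# In the architecture described above this role would be played by a
# managed store such as a DynamoDB table.
REFERENCE_STORE: Dict[str, Dict[str, Any]] = {}


def normalize(record: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Apply one shared set of ingestion rules to a raw vendor record."""
    isin = str(record["isin"]).strip().upper()
    if len(isin) != 12:  # ISINs are 12 characters long
        raise ValueError(f"invalid ISIN from {source}: {isin!r}")
    return {
        "isin": isin,
        "name": " ".join(str(record["name"]).split()),  # collapse whitespace
        "currency": str(record["currency"]).strip().upper(),
        "source": source,  # keep lineage: which feed supplied the record
    }


def ingest(records: Iterable[Dict[str, Any]], source: str,
           store: Dict[str, Dict[str, Any]] = REFERENCE_STORE) -> int:
    """Normalize and upsert records into the store; return the count stored."""
    count = 0
    for raw in records:
        clean = normalize(raw, source)
        store[clean["isin"]] = clean
        count += 1
    return count


def _response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response in the API Gateway proxy format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def handler(event: Dict[str, Any], context: Optional[Any] = None,
            store: Dict[str, Dict[str, Any]] = REFERENCE_STORE) -> Dict[str, Any]:
    """Lambda-style handler for a hypothetical GET /instruments/{isin} route."""
    params = event.get("pathParameters") or {}
    isin = (params.get("isin") or "").upper()
    item = store.get(isin)
    if item is None:
        return _response(404, {"error": "instrument not found", "isin": isin})
    return _response(200, item)


if __name__ == "__main__":
    # Made-up sample record; "XX0000000001" is not a real ISIN.
    ingest([{"isin": " xx0000000001 ", "name": "Example  Corp",
             "currency": "usd"}], source="vendor_a")
    print(handler({"pathParameters": {"isin": "XX0000000001"}}))
```

Because every feed passes through the same normalize function, a rule change is made once rather than in each business unit's own database, and every downstream system reads the same cleaned record through the same handler.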
Advantages of a centralized cloud-based data lake
Data operations should enable business-aligned scalability, security and performance for enterprises of all sizes. The following benefits can be realized immediately with the implementation of a cloud-based solution:
Eliminate and rationalize duplicate data sourcing from expensive data providers, reducing operational costs.
Ensure data integrity and establish an enterprise-wide data governance framework with centralized data ingestion processes.
Reduce operational risks and costs due to system failures stemming from reference data issues.
Store and access vast quantities of historical records needed for regulatory compliance.
Monetize investments by selling enriched data to customers and stakeholders of capital markets entities.
Realize the general benefits of cloud deployment, such as scalability, cheaper storage, pay-as-you-go pricing and a wide range of managed services.
Over the long term, a cloud-based centralized reference data source offers even more opportunities, such as:
Develop state-of-the-art data analytics platforms to turn vast quantities of data into a valuable source of information.
Build more effective AI/ML models, leveraging significant volumes of ‘training data’, for use cases such as portfolio analytics, price predictions, scenario expansions, exposure analysis, and so on.
Effectively back test algorithmic trading strategies, leveraging the near unlimited capacity for data and processing power of cloud platforms.
In summary, reference data usage presents many challenges in capital markets. A cloud deployment will help resolve most of the challenges that plague reference data usage, while also providing flexibility and long-term savings.