How Successful Investment Firms Deal with Bad Data



The search for alpha has pushed investment management firms into a domain where data reigns supreme. This has made the need for accurate, valid, consistent, relevant, and reliable data (“good data”) more important than ever.

Most successful investment management firms realize that high-quality data plays an integral part in fund management, especially when it comes to minimizing operational risk and maximizing alpha. Data informs nearly every aspect of day-to-day fund operations, from the communication with current and potential investors to your investment strategies, post-trade reconciliations, analytics, and reporting.  Therefore, using bad data can come at a high cost, and not just for the investment management firm industry.

According to a study from the Harvard Business Review, “Bad Data Costs the U.S. $3 Trillion Per Year”. Who can forget the infamous “London Whale” trade-in 2013? Leaving the significant financial cost, and the increased regulatory scrutiny aside, the loss of reputation among peers and investors can be almost impossible to repair in the short term. As a result, bad data can radically impede an investment management firm’s ability to maintain its place in an already competitive market.   On the other hand, good data leads to better-informed investment decisions and increased investor confidence, resulting in higher returns, increased client satisfaction, and growth in assets.

Microsoft Excel is a powerful tool for aggregating and analyzing data.  As a result, it has become entrenched in the business processes of both emerging and established investment management firms. Microsoft Excel is used to store historical data and for performing data analysis, including financial modeling, what-if scenarios, reporting, and visual analytics, among other daily, front, middle, and back-office tasks.  However, due to the need for a low learning curve, Microsoft Excel, by design, has limitations that are obvious to most seasoned fund professionals. These limitations increase the likelihood of storing and generating bad data within Microsoft Excel.

What Is Covered in This Article?

Based on our experience working with a multitude of investment management firms, we have identified some of the most common data quality issues. Our goal is to provide a better understanding of the types of data related frustration that operations and investment teams might be facing, and the steps you could take to address these issues, as part of the long-term data strategy for your fund.

What Exactly Is ‘Bad’ Data Within Your Investment Management Firm Operations?

Bad data, or dirty data as it is often referred to, is erroneous data that negates the integrity of the whole data set, in some cases making it unusable. The most common four types that we have come across are discussed below.

Missing or Incomplete Data

Missing or incomplete data is one of the most common types of bad data. This occurs when important cells within a master (source) workbook are left blank (null) due to unforeseeable reasons.  A common example of this is missing corporate action data for equity when calculating its market value.    This could happen because the CSV file that you received from the counterparty does not yet have this information available at the time of download from their FTP server. Another example could be a missing currency conversion rate for a specific trading day due to the market being closed when working with multi-currency portfolios.

Inconsistent Data

Inconsistent data is one of the biggest problems that exist for data-driven businesses and can affect not only your bottom line but your reputation in a competitive market. If data is entered into a Microsoft Excel workbook in inconsistent formats, analyzing or storing the information becomes difficult and requires back-office analysts to clean the data to allow for accurate interpretation. For example, different users have entered the pricing information into multiple workbooks and have used a different format type in multiple worksheets for the same information.     Another example is the formatting of a date field is not consistent among multiple workbooks. In one workbook it’s entered as MM-DD-YYYY, whereas in another it’s entered as DD-MM-YYYY. The information entered into these fields may be accurate but if not in the pre-determined format, it will lead to issues when performing analysis on this data.   While most investment management firms have predetermined requirements for the format of each column, it is easy for these guidelines to not be followed, especially, when working under pressure.

Invalid Data

Invalid data can occur when the data entered is not within the predetermined range. For example, a subscription was entered with a negative sign and redemption with a positive.   Another example of invalid data could be gross margin values that are not between 0 and .9 in certain instances. Again, this can result in the investment management firm operations or investment team spending a significant amount of time and resources cleaning the data.    This can be a potentially costly business interruption, as analysts may be spending the majority of their time focused on data quality, rather than providing useful analysis and reporting.

Duplicate Data

Duplicate data refers to the same data being entered into a file more than once. Duplicate data is typically due to human error and can occur when one or more people with access to the same file enter the same data several times.    Or, it can be an automated system process that has gone wrong. For example, an asset is listed multiple times in the position data you received from a counterparty. This error usually occurs due to human oversight, and there is no way to identify this in Microsoft Excel, without manually reviewing the file for these types of issues. In most instances, this is caught by the investment manager or the person looking at the analysis or report, which is typically too late in the process to be of benefit to the team or prevent wasted resource time.

Bad Raw Data Vs Bad Calculated Data – What’s the Difference?

Raw data or source data is primarily unprocessed data used to extract meaningful information for analyzing and processing, drawing conclusions, and making projections, etc. The end product of this processed data is visual analytics and various reports.   Raw data is of little use to your company before being processed and turned into information. Therefore, if you begin with dirty source data you will end up with unreliable, inaccurate information, hindering your ability to make sound decisions for your investors.   Calculated data is a numeric or date field that derives its data from the calculation of other fields using calculated columns in Microsoft Excel tables. Calculated columns in Microsoft Excel is an incredible tool, as it allows you to enter a single formula in one cell, and that formula will automatically expand to the rest of the columns. Although this can create great efficiencies for the operations team, it can also create significant inefficiencies. For example, if you have multiple rows of data and make an error in a formula, that change will expand to the rest of the calculated columns, corrupting all of the data throughout the Microsoft Excel workbook.

Bad Data Is Wasting Your Operations Team’s Valuable Time

Garbage in, garbage out — your investment management firm’s operational cost, risk, and even more importantly your reputation, depends on the accuracy of your data.  Poor data quality can significantly reduce productivity and the business impact of dirty data is staggering.    It is not uncommon for operations teams to spend as much as 2-3 hours daily reconciling issues created by bad data. Modern techniques and technology can minimize this impact, allowing your operations team to be more agile while reducing inefficiencies and operational costs.   Bad data results in poor investment decisions, lead to reputational consequences and loss of revenue, greater inefficiencies as valuable time is spent cleaning data, wasted resources due to wrong decisions, and unprofitable communications. Experian reports that on average, companies across the globe feel that 26% of their data is dirty, costing the average business 15% to 25% of revenue, and the US economy over $3 trillion annually.

What’s Are Some Promising Solutions?

Most investment management firms use considerable amounts of diverse data. Once you have the data, cleaning and aggregating the data into a usable format presents a significant challenge within most investment management firms.   Often, the most instinctive solution is a short-term fix, which includes adding more people to the operations or investment team to focus on cleaning the data. At Cognivo, we are taking two approaches to help investment management firms address this challenge for the long-term.

Approach #1 – Building on Top of Your Current Microsoft Excel Reliant Set up 

Microsoft Excel is not going anywhere, and investment management firms will continue to use Microsoft Excel as one of their primary tools, due to the familiarity and comfort level that users have with it.   Investment management firms on a growth trajectory, realize that they need to introduce more advanced and creative solutions in addition to Microsoft Excel, to minimize operational cost and risk. It is imperative to employ solutions that ensure your raw data is accurate and reliable. This elicits confidence in the knowledge gleaned from the data, ensuring it’s an asset and not a long-term liability.   In this approach, we utilize proprietary methods, software, and tools that we have developed over the years. These tools are built on top of your current Microsoft Excel datasets, allowing us to extend Microsoft Excel, and provide more advanced visual analytics, dashboards, and reporting, while utilizing your current Microsoft Excel workbooks, CSV files, emails, and other data sources.   Realizing the substantial value inherent in integrating highly customized business intelligence solutions, with your current Microsoft Excel set up, to support your stores of data, will eliminate inefficient processes and minimize the frustration your operation team experiences. In turn, this approach makes it much easier to manipulate, analyze, and summarize data, for reporting to the investment management team, compliance reporting, as well as investor reporting.

Approach #2 – Cleaning, Aggregation & Centralization of Your Internal Fund Data

A Robust Data-Driven Strategy Is Vital to Your Investment Management Firm’s Success

While data is abundant and inexpensive to store, it will be a burden and put you at a disadvantage if it is not clean, correct, and readily available in a usable format. There is no getting around the critical process of ensuring you have clean data. Therefore, modernizing your processes is a necessary investment if you want your data to be an asset and not a liability.   While investment firms understand the importance of analyzing data and data quality, data aggregation is often overlooked when developing a data strategy. This, in turn, wastes time and resources, while hindering their ability to mitigate risk.    Implementing a data aggregation solution can help enhance your data analysis and reporting capabilities, by adding features like real-time P/L, drill-downs, interactive charts, and one-click export to multiple formats for sharing. Over and over, we see that automating critical manual processes improves the fund’s ROI, as it frees up time to focus on the analytics process.   As the investment management industry becomes more data-driven, it is crucial for investment management firms to have strong data plans in place that guarantee data retrieved is clean and ensures analysts can turn it into reliable insights in an efficient manner.   We hope this article helps you formulate your thoughts as you think through your investment management firm’s data strategy and challenges in the coming year.