Managing The Data Supply Chain
sup·ply chain /səˈplī CHān/ noun: the sequence of processes involved in the production and distribution of a commodity.
Motivated by the definition above, and noting that data is not strictly a commodity (even though many have compared it to oil, since data is thought of as fueling the modern information economy), we can offer the following broad summary: Data Supply Chain Management is the selection, collection, organization, streamlining and flow-control of data, including any pre-processing, repair and normalization steps, to make it usable, guided by domain knowledge, for the subsequent downstream process. Typically, that next step is analysis via traditional statistical or contemporary machine learning tools. The end goal of the exercise is to generate insights that can imply customer value, inform revenue or pricing metrics, optimize costs and help gain a competitive advantage in the marketplace.

Why is the Data Supply Chain important? The outcome of analytical engines in general, and machine learning in particular, can be highly dependent on the quality and other attributes of the ingested data. The exponential increase in the quantity and variety of data provides opportunities as well as challenges with regard to the sourcing, selection and preparation (cleansing, aggregating and normalizing) of that data. Another dimension of the data supply chain is its integrity, as well as its compliance with many evolving regulations regarding privacy, security and ethical use. Trustworthy data is key to trusted outcomes.
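To make the pre-processing, repair and normalization steps above concrete, the following is a minimal Python sketch (using pandas) of a preparation pass over a hypothetical alternative-data feed. The field names (ticker, timestamp, sentiment), the per-ticker z-score normalization and the daily aggregation are illustrative assumptions, not a prescription from this piece.

```python
import pandas as pd

def prepare_feed(raw: pd.DataFrame) -> pd.DataFrame:
    """Minimal data-supply-chain preparation pass: repair, de-duplicate,
    normalize and aggregate a raw alternative-data feed (illustrative only)."""
    df = raw.copy()

    # Repair: parse timestamps and numeric fields, coercing bad values to NaT/NaN.
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce", utc=True)
    df["sentiment"] = pd.to_numeric(df["sentiment"], errors="coerce")

    # Cleansing: drop rows that are unusable downstream, and exact duplicates.
    df = df.dropna(subset=["ticker", "timestamp", "sentiment"])
    df = df.drop_duplicates(subset=["ticker", "timestamp"])

    # Normalization: z-score sentiment per ticker so signals are comparable.
    # (Groups with zero variance would need special handling in production.)
    grouped = df.groupby("ticker")["sentiment"]
    df["sentiment_z"] = (df["sentiment"] - grouped.transform("mean")) / grouped.transform("std")

    # Aggregation: one observation per ticker per day for the downstream model.
    daily = (
        df.set_index("timestamp")
          .groupby("ticker")["sentiment_z"]
          .resample("1D")
          .mean()
          .dropna()
          .reset_index()
    )
    return daily
```

In practice, each of these steps would be guided by domain knowledge, e.g., whether de-duplication, per-ticker normalization or daily aggregation is appropriate for the feed and the downstream model in question.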
The moniker 'Big Data' has been popularized over the last decade. In the following section, we describe its nuances, especially as they relate to performing due diligence on the data supply chain.

Key data dimensions: These dimensions are considered standard across many domains. One key dimension is the source of the data: datasets may be generated by individuals, by business processes, or by sensors (e.g., via radar or satellite imagery, sensors installed in industrial facilities or on equipment, drones). Datasets generated by individuals are often in unstructured textual formats and commonly require natural language processing. Sensor-generated data can be produced in both structured and unstructured formats. Many business-generated datasets, such as credit card transactions and company 'exhaust' data, may need to comport with existing and emerging legal and privacy considerations.

In addition to the source of the data, collection techniques can be passive or active, including proactively seeking additional elements. Additional data elements may be collected following a cost-benefit analysis, e.g., when it is estimated that spending $X on additional data collection and processing will yield a benefit exceeding $2X (a simple version of this check is sketched at the end of this section). Other attributes of data (besides the preceding classification) may be important from an intended-use standpoint. Investment professionals may want to map a dataset to an asset class or investment style, after considering its quality, technical specifications and alpha potential. Several related issues come to mind and are discussed in the sections that follow.
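The cost-benefit rule of thumb above (spend $X on additional data only if the expected benefit exceeds $2X) can be made explicit. The sketch below is purely illustrative: the DatasetCandidate fields, the 2x hurdle default and the example figures are hypothetical, and in practice the benefit estimate would come from domain knowledge, a pilot study or a back-test.

```python
from dataclasses import dataclass

@dataclass
class DatasetCandidate:
    """Hypothetical metadata for a dataset under consideration."""
    name: str
    acquisition_cost: float   # $X: licensing plus collection and processing cost
    estimated_benefit: float  # expected incremental value, e.g., from a pilot back-test

def worth_acquiring(candidate: DatasetCandidate, hurdle: float = 2.0) -> bool:
    """Acquire only if the estimated benefit exceeds `hurdle` times the cost,
    mirroring the $X-versus-$2X rule of thumb in the text."""
    return candidate.estimated_benefit > hurdle * candidate.acquisition_cost

# Illustrative usage with made-up figures.
satellite_feed = DatasetCandidate("parking-lot satellite imagery", 250_000, 600_000)
print(worth_acquiring(satellite_feed))  # True: 600,000 > 2 * 250,000
```

A fuller version of such triage would also score a dataset's quality, technical specifications and alpha potential against the intended asset class or investment style, but those inputs are judgment calls that depend on the investor's process.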
Why the buzz around Alternative Data in particular and Big Data in general? Alternative data promises to provide an "edge" to investment professionals. As more investors adopt alternative datasets, the market will react faster and will increasingly anticipate traditional or 'old' data sources such as quarterly corporate earnings or low-frequency macroeconomic data. This change gives a potential edge to quant managers and to those willing to adapt, learn about new datasets and methods, and deploy them rapidly. Eventually, 'old' datasets will lose most of their predictive value, and new alternative datasets that anticipate official, reported values may increasingly become standardized. There will be an ongoing effort to uncover new, higher-frequency datasets and to refine or supplement old ones.

Machine learning techniques will become a standard tool for quantitative investors, and perhaps for some fundamental investors too. Systematic strategies deployed by quants, such as risk premia, trend following and equity long-short, will increasingly adopt machine learning tools and methods, acting on information much of which may originate in alternative datasets.

Hedge fund managers are positioned to extract much value from big data. While big data may provide the most value to short-term investors, long-term institutional investors can also benefit from its systematic use. More importantly, they may not have to compete with hedge fund managers to acquire the alternative datasets best suited for short-term trading. By focusing on defensive or longer-duration alpha strategies, long-term investors should be able to acquire alternative datasets at reasonable cost, as those datasets may not be in high demand among trading-oriented asset managers. For example, ESG investing is becoming a major force, with the twin goals of medium-term returns and long-term positive impact on the planet and society.
Big data and machine learning algorithms could also be used to improve compliance at the firm level and to aid in performing due diligence on managers hired by long-term investors. As pointed out by Lopez de Prado in a Bloomberg article (1), the majority of the hedge funds that have consistently beaten the market averages in recent years have employed highly mathematical, data-intensive strategies. The same article reports that several traditional hedge funds have closed, while quantitative funds such as Renaissance Technologies (2) and Two Sigma continue to attract new assets and clients.

In summary, the data supply chain is something a traditional as well as an alternative portfolio analyst or manager should pay attention to. It can help inform security selection, asset allocation, portfolio tilting (towards desired factors) and risk management, to name a few tasks.

Companion reference: Alternative Data: Don't bark up the wrong tree (too early)! by Ganesh Mani, PhD

References
(1) https://www.bloomberg.com/news/articles/2018-10-09/the-big-problem-with-machine-learning-algorithms
(2) The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution, by Gregory Zuckerman

Contact us: Mehrzad Mahdavi | Ganesh Mani, Ph.D., Adjunct Faculty Carnegie Mellon, FDP Advisory Board

Ganesh Mani is on the adjunct faculty of Carnegie Mellon University and is considered a thought leader in the areas of investment management and AI/FinTech. He has been a pioneer in applying innovative techniques to multiple asset classes and has worked with large asset management firms, including hedge funds, after selling one of the earliest AI/ML-based investment management boutiques to SSgA (in the late nineties), nucleating the Advanced Research Center there. Ganesh has been featured on ABC Nightline and in a Barron's cover story titled "New Brains on the Block". Mr. Mani has an MBA in Finance and a PhD in Artificial Intelligence from the University of Wisconsin-Madison, as well as an undergraduate degree in Computer Science from the Indian Institute of Technology, Bombay. Ganesh is a charter member of TiE (www.TiE.org), was an early member of the Chicago Quantitative Alliance and is on the advisory board of the Journal of Financial Data Science.

Dr. Mahdavi is a technology entrepreneur focused on breakthrough digital transformation in the FinTech and energy sectors. He is a recognized expert and frequent keynote speaker on the application of AI, IoT and cloud computing in the financial sector and other industries. Dr. Mahdavi has managed major global businesses in the energy sector. He is currently the executive director of the Financial Data Professionals Institute (FDPI), a non-profit organization founded with CAIA. Mehrzad holds a PhD in Nuclear Science and Technology from the University of Michigan and a Bachelor of Science in Electrical and Electronics Engineering from the University of Illinois at Urbana-Champaign.