Web Crawling as alternative data, a regulatory perspective.
Eagle Alpha has released a new white paper, written by our advisory board member Gene Ekster, CFA, on the topic of web crawling and the associated compliance risks. The paper is split into five sections:
1. An understanding of the evolving law surrounding web harvesting;
2. Review of the terms and conditions associated with the websites crawled;
3. Control over the potential interference with harvested websites;
4. Review of web harvesting projects;
5. Review of vendors utilized for web harvesting.
There are fewer than 50 known web crawling legal cases, and none of them directly relates to the asset management community. The only well-known case involving financial institutions is Barclays Capital v. Theflyonthewall.com, and it turned on a copyright issue that is not applicable to funds. Mr. Ekster thinks that website operators are not motivated to take legal action: “In part, this reluctance is due to a small likelihood of a win in court and an absence of a clear reward to the plaintiff in case of a win. Thus, few companies are willing to deploy their legal resources to pursue a web crawler even if the crawler is in violation.”
Tonia Klausner, a partner at Wilson Sonsini Goodrich & Rosati specializing in internet data, shares the sentiment: “The value of legal claims against web crawlers is low where the crawler does not crash or otherwise harm the website, and the crawled data is not used in competition with the website operator. This is one reason why we don’t see many claims being filed in court against web-crawlers, and why the claims that are filed tend to be driven by the crawlers’ somehow damaging the business of the data owners, whether directly or due to opportunity cost.”
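For readers who run crawlers in-house, the point about not harming the crawled website can be made concrete. The following is a minimal Python sketch, not a method taken from the white paper, of two common courtesy controls: checking a site's robots.txt rules before fetching a page, and spacing out requests. The robots.txt content, the crawler name `example-research-bot`, and the `Throttle` helper are all hypothetical and exist only for illustration. Honoring robots.txt is not a substitute for a legal review of a site's terms and conditions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-research-bot"  # hypothetical crawler name


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(policy: RobotFileParser, url: str) -> bool:
    """Return True if the site's rules permit our crawler to fetch the URL."""
    return policy.can_fetch(USER_AGENT, url)


def delay_seconds(policy: RobotFileParser, default: float = 1.0) -> float:
    """Use the site's Crawl-delay if it declares one, else a default."""
    delay = policy.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


class Throttle:
    """Hypothetical helper that enforces a minimum gap between requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # timestamp of the most recent request, if any

    def wait_time(self, now: float) -> float:
        """Seconds the crawler should still wait before its next request."""
        if self._last is None:
            return 0.0
        return max(0.0, self._last + self.min_interval - now)

    def mark(self, now: float) -> None:
        """Record that a request was sent at time `now`."""
        self._last = now
```

In a real crawler, the loop would call `is_allowed` before each request, sleep for `wait_time(time.monotonic())` seconds, and then call `mark` once the request is sent. The timestamps are passed in explicitly here so that the logic can be checked without a network connection or real delays.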
Mr. Ekster prepared the graphic below showing most of the known web crawling cases to date, along with their key issues, outcomes, and relevance to the investment community.
Mr. Ekster discusses vendor management and mentions that “using a third party vendor to collect web data can offset some of the liability.” A cost-benefit analysis is needed when deciding whether to gather data internally or to buy or license it. Gathering data in-house offers more control and independence, helped by new tools such as Diffbot that structure raw data during crawling. On the other hand, the advantages of buying or licensing are the availability of back data and the time saved by leveraging vendors’ data expertise. A strong relationship with data vendors can reduce the cost of doing business and mitigate risk. Vendors benefit from insights and analytics in return, so it is always a two-way conversation.