Hadoop has become synonymous with Big Data. More recently, it has also become an increasingly polarizing topic, much like the term Big Data itself.
Hadoop did not originate in finance. It is slower with real-time data and, compared with the relational databases that dominate finance, less suited to slicing and dicing data. Hadoop does, however, have its own advantages, not least its distributed nature, which allows for faster, larger-scale analytics. Another major advantage is its file system, the Hadoop Distributed File System (HDFS). HDFS is excellent at storing unstructured and semi-structured data that lack a well-defined, predefined schema. That flexibility makes it especially well suited to new and novel data sources.
It is important to stress that when people refer to Hadoop, they are often mixing terms. Some use Hadoop to mean map-reduce (the distributed, batch-oriented processing model), others mean the file system, HDFS, and still others are really talking about the Hadoop ecosystem, which today includes many other open-source Big Data projects that integrate with or build on the Hadoop infrastructure.
Going forward, map-reduce appears likely to lose traction. It is inherently slow for real-time analysis, but it remains well suited to certain larger-scale calculations that are less time-critical. Over time, batch analysis will become less important, especially in finance, but it will continue to be used; for overnight calculations and end-of-day analysis in particular, map-reduce will continue to dominate.
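To make the batch nature of map-reduce concrete, here is a minimal, in-memory Python sketch of the pattern applied to a hypothetical end-of-day calculation: computing each ticker's volume-weighted average price (VWAP) from a day's trades. The trade format, the function names, and the VWAP use case are illustrative assumptions. This is not Hadoop's API; it only shows the map, shuffle, and reduce phases that a framework such as Hadoop distributes across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical trade record for illustration: (ticker, price, quantity).
Trade = Tuple[str, float, int]
Pair = Tuple[str, Tuple[float, int]]


def map_phase(trades: Iterable[Trade]) -> Iterator[Pair]:
    """Map: emit one (key, value) pair per trade, ticker -> (notional, quantity)."""
    for ticker, price, qty in trades:
        yield ticker, (price * qty, qty)


def shuffle_phase(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[float, int]]]:
    """Shuffle: group every emitted value by key, as the framework does between map and reduce."""
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[Tuple[float, int]]) -> Tuple[str, float]:
    """Reduce: combine all values for one key into that ticker's VWAP."""
    notional = sum(n for n, _ in values)
    volume = sum(q for _, q in values)
    return key, notional / volume


def end_of_day_vwap(trades: Iterable[Trade]) -> Dict[str, float]:
    """Run the three phases over a complete day's worth of trades."""
    groups = shuffle_phase(map_phase(trades))
    return dict(reduce_phase(k, v) for k, v in groups.items())


if __name__ == "__main__":
    day = [("ABC", 10.0, 100), ("XYZ", 50.0, 10), ("ABC", 12.0, 300)]
    print(end_of_day_vwap(day))  # {'ABC': 11.5, 'XYZ': 50.0}
```

The reduce step cannot produce a final answer for a ticker until all of that ticker's trades have been mapped and grouped, so the job naturally runs over a complete, already-collected data set. That is what makes the model a good fit for overnight and end-of-day work and a poor fit for real-time analysis.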
Many new projects are emerging to fill the gap between streaming Big Data analytics and Hadoop. Projects such as Storm and Spark provide increasingly real-time coverage and resolve many of the issues surrounding Hadoop's slower real-time analysis. Summingbird, a newer open-source project, focuses specifically on bridging Hadoop and real-time processing by providing a platform that can switch back and forth between the two paradigms. In short, Hadoop's map-reduce will likely decline in usage and importance, but it will not disappear: it will be used for less time-critical cases and will be supplemented by emerging open-source projects built to pick up where map-reduce leaves off.
Its file system, HDFS, by contrast, appears to have become firmly ingrained in the Big Data world. Even as map-reduce diminishes in importance over time, the file system seems likely to remain. That is not to say that all data must land in HDFS before analytics take place. In fact, there are considerable financial use cases in which streaming data is first analyzed outside Hadoop and only later lands in HDFS for longer-term storage and batch processing.
As for finance in general, most firms will likely continue to rely on relational databases. This will be especially true for highly structured data with very well-defined use cases. If you need fast access to data for the same or similar kinds of straightforward analysis, relational databases are likely a good choice and will continue to be. If you want to leverage new data sources, especially those containing unstructured or semi-structured data whose use cases and analytical paths are less well defined, you will likely end up using Hadoop and its related ecosystem, or at least wanting your service provider to use them.
Nothing on this site constitutes an offer to sell any securities or the solicitation of an offer to purchase any securities. The content, commentary, posts, links, tweets, re-tweets, presentations, documents, videos, and messages (collectively, "Information") on this site are presented for informational purposes only. The intention of presenting such Information is to provoke thought by the reader as to the potential usefulness of big data, social media, text analytics, social mood, sentiment, user generated content, geo-location, news, events, publicly available information, financial data, investment trends and other data as a way to improve and/or support investment analysis and decision making. The reader should make an informed decision as to his/her investment strategy with his/her registered investment advisor ("RIA") or as an independent investor. Zettacap is not an RIA and does not provide such services. This site may contain forward-looking statements and conditional statements using such words as "expects", "appears", "may", "should", "anticipates", or similar words and expressions referring to potential future outcomes. Such statements do not constitute guarantees or recommendations of any kind. Information presented on this site may be incomplete and out-of-date. Additionally, the reader understands that the algorithms used by Zettacap as well as the sources of information used may change periodically, and that the reader will not receive notifications or explanations concerning such changes. By reviewing the Information on this site, the reader agrees to the aforementioned statements and agrees to discharge and exonerate the creators of the Information as well as Zettacap, its owners, and its related parties, from any liability associated with the taking or not taking of actions due to the reader reviewing Information on this site.
© 2017 Zettacap // All rights reserved.