The Zettacap Blog

Zettacap allows investors to leverage the power of Big Data.


To Hadoop or not to Hadoop

Published: April 29, 2014

By: Kevin Coogan

Hadoop has become synonymous with Big Data. More recently, it has also become an increasingly polarizing topic, in many ways like the term Big Data itself.

Hadoop did not originate in finance. It is slower with real-time data, and compared to the relational databases that dominate finance, it is less apt at slicing and dicing data. Hadoop, however, has its own advantages, not least of which is its distributed nature, which allows for faster, larger-scale analytics. Another major advantage is its file system, the Hadoop Distributed File System, or HDFS. It is excellent at storing unstructured and semi-structured data that lack a well-defined, predetermined schema, and that flexibility fits new and novel data sources especially well.
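As a small, hypothetical illustration of that schema flexibility (the records and field names below are invented), the "schema on read" approach stores data raw and imposes structure only when an analysis asks for it:

```python
# schema_on_read.py -- semi-structured records are stored as raw JSON
# lines, with no schema declared up front; each analysis pulls out only
# the fields it needs and tolerates the ones that are missing.
import json

raw_records = [
    '{"ticker": "AAPL", "sentiment": 0.7, "source": "twitter"}',
    '{"ticker": "GOOG", "headline": "Earnings beat estimates"}',
]

for line in raw_records:
    record = json.loads(line)
    # A relational table would force every row into one schema;
    # here, missing fields simply fall back to a default.
    print(record["ticker"], record.get("sentiment", "n/a"))
```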

It is important to stress that when people refer to Hadoop, they are often mixing terms. Some mean map-reduce (the distributed, batch-oriented processing engine), others mean the file system, HDFS, and still others are really talking about the Hadoop ecosystem, which today includes many other open-source Big Data projects that integrate with or leverage the Hadoop infrastructure.
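To make the first of those senses concrete, here is a minimal sketch of the map-reduce idea using Hadoop Streaming conventions, under which any executable that reads stdin and writes tab-separated key/value lines can serve as a mapper or reducer. The word-count task and the file names are illustrative only, not anything specific to finance.

```python
# mapper.py -- reads raw text from stdin and emits one "word<TAB>1"
# line per word, the key/value convention Hadoop Streaming expects.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
# reducer.py -- Hadoop sorts mapper output by key before the reduce
# phase, so identical words arrive consecutively and can be summed
# in a single pass.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

The appeal of the model is that the same pair of scripts runs unchanged whether the input is a single file piped through them on a laptop (cat input.txt | python mapper.py | sort | python reducer.py) or terabytes spread across a cluster.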

Going forward, it would appear that map-reduce will lose traction. It is inherently slow for real-time analysis, though well suited to certain larger-scale calculations that are not as time critical. Over time, batch analysis will become less important, especially in finance, but it will not go away: for overnight calculations and end-of-day analysis, map-reduce will continue to dominate.

Many new projects are emerging to fill the gap between streaming Big Data analytics and Hadoop. Projects such as Storm and Spark allow for increasingly real-time coverage and resolve many of the issues surrounding Hadoop’s slower real-time analysis. Summingbird, a newer open-source project, focuses on bridging the gap between Hadoop and real-time processing by creating a platform to dial back and forth between the two paradigms. In short, Hadoop’s map-reduce will likely decrease in usage and importance, but it will not disappear: it will be used for less time-critical cases and will be buttressed by emerging open-source projects designed to pick up where map-reduce leaves off.
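For a flavor of the streaming side, here is a minimal sketch using Spark’s DStream API in Python (which, for the record, shipped shortly after this post was written). The host, port, and word-count task are placeholders for a real data feed and a real analysis.

```python
# streaming_counts.py -- counts words arriving on a socket in
# one-second micro-batches using Spark's DStream API.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StreamingWordCount")
ssc = StreamingContext(sc, batchDuration=1)  # 1-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)  # placeholder feed
counts = (lines.flatMap(lambda l: l.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each micro-batch's counts as they arrive

ssc.start()
ssc.awaitTermination()
```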

Hadoop’s file system, HDFS, in contrast, appears to have become fully ingrained in the Big Data world. Even as map-reduce diminishes in importance over time, the file system will remain. That is not to say that all data will necessarily be required to land there before analytics take place. In fact, there are considerable financial use cases where streaming data is first analyzed outside of Hadoop, only to end up in HDFS later for longer-term storage and batch processing.

As for finance in general, most firms will likely continue to rely upon relational databases. This will be especially true for highly structured data with very well-defined use cases. If you need data fast for the same or similar types of straightforward analysis, relational databases are a good choice and will continue to be. If you want to leverage new data sources, especially those containing unstructured or semi-structured data where the use cases and analytical paths are less well defined, you will likely end up using Hadoop and its related ecosystem, or at least wanting your service provider to use them.

Risks of Big Data and Investing

Published: March 20, 2014

By: Kevin Coogan

Of course, there is both a bright side and a dark side to any radical shift.  The shift that Big Data is producing and will continue to produce in society will create winners and losers.  Finance and investing will be no different.

Those embracing Big Data and figuring out the best ways to leverage it will capture alpha like never before. The very best of these will in fact become the new Buffetts, Soroses, and Simonses.

The downside is that many, if not most, will continue along the same traditional path. They will follow the inertia of their investment process and hope for better days. They will likely ignore new data and analytical techniques until they are forced upon them.

At this stage, there is risk in all directions. This is exactly what finance guys do not want to hear. Diving into new data and analytics is risky, as is staying out. Really, the question becomes which is riskier, and at what point the decision has to be made.

In many cases, it is difficult to fully back-test newer sources of data and concepts due to restrictions on historical data (meaning the data just does not go back very far).  Additionally, many of the current datasets are yet to be fully standardized and vetted.  Lastly, although academic research is starting to pick apart Big Data in general, it still pales in comparison to the decades of acknowledged research supporting traditional financial data and analysis.

But the other choice, putting your head in the sand at this important turning point in financial history, does not appear to be such a low-risk move either. Big Data is no longer coming; it is already here. To ignore it at this stage is the real risk.

Zettacap, Naming a Company

Published: March 20, 2014

By: Kevin Coogan

Naming a company is always a difficult process. Anyone who has gone through it has their own war stories. Hitting seemingly endless “oh, drat, they got that name too” moments is way too common.

As for Zettacap, it took a while to get here, but the name is actually fairly meaningful.

“Zetta” implies Big Data: it is the prefix for measuring data that comes after exa. Most people are familiar with giga, as in a gigabyte. After giga come tera, peta, exa, and then zetta. A zettabyte is approximately one trillion gigabytes (for a really fascinating description of a zettabyte, this infographic is worth a look). In other words, it is a ton of data. And no, we do not have a zettabyte of data; nobody does yet, but somebody most certainly will in our lifetime.
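The arithmetic is easy to verify; a few lines of Python confirm the trillion-gigabyte figure from the standard decimal (SI) prefixes.

```python
# prefixes.py -- decimal (SI) byte prefixes from giga up to zetta.
PREFIXES = {"giga": 10**9, "tera": 10**12, "peta": 10**15,
            "exa": 10**18, "zetta": 10**21}

gigabytes_per_zettabyte = PREFIXES["zetta"] // PREFIXES["giga"]
print(f"1 zettabyte = {gigabytes_per_zettabyte:,} gigabytes")
# -> 1 zettabyte = 1,000,000,000,000 gigabytes (one trillion)
```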

The age of the zettabyte is upon us, however, as evidenced by this study from Cisco, which estimates 2016 as the first year that a zettabyte of data will be transferred over the internet. Additionally, the same report points out that although the amount of data is hitting exceptional levels, its expected annual growth rate remains in the twenty-plus percent range. In other words, Big is getting bigger.

“Cap” implies finance and investing. Cap can be short for capital, capitalization, and/or capitalize, all terms used alone and in conjunction with other finance terms. Basically, it would be difficult to get through a single day at a fund or bank without hearing a form of cap used. It is also fairly common in names, including hedge funds (“Soros Capital Management”) and data providers (“CapitalIQ,” often shortened to CapIQ).

Zetta implies really Big Data, and Cap implies traditional finance.
Zetta+Cap = Big Data for Finance.

For a particularly hilarious commentary on naming hedge funds, refer to the marketfolly blog. At least we avoided the Greek mythology and predatory animal naming themes.


Contact Zettacap


If you have questions about Zettacap, please contact us. Visit our website if you'd like to enroll in our free beta program.
