While working on HBase bug fixes and feature development, it’s often quite convenient to test changes on a local-mode HBase. This is done by running HBase right out of your developer sandbox. Though a lot of HBase development happens on Macs these days, it’s a system designed first to run on Linux. That means there are a couple of minor annoyances for non-Linux users. Let me show you how I work around one of them.
Between HBaseCon and Hadoop Summit I took a short trip to Europe. I got to spend some more time working alongside Nicolas and meet some of the Scaled Risk crew. I also took a small holiday through the hillside in Romania! Along the way, I was invited to present for both the Paris HPC Meetup and the London HBase Meetup.
See you in June!
Edit: Unfortunately, Nicolas was unable to make it so I presented solo. I hope I did his section justice.
HBaseCon was another fantastic conference this year! It’s a great resource for information about and around HBase, no matter where you are along your path. This year I presented a talk along with a colleague of mine, Nicolas Liochon of Scaled Risk fame. Our topic: HBase as an online, low-latency system.
BlockCache is an important structure for enabling low
latency reads. As of HBase 0.96.0, there are no fewer than three different
BlockCache implementations to choose from. But how to know when to use one
over the other? There’s a little bit of guidance floating around out there, but
nothing concrete. It’s high time the HBase community changed that! I did some
benchmarking of these implementations, and I’d like to share the results with you.
Note that this is my second post on the
BlockCache. In my
previous post, I provide an overview of the
BlockCache in general as
well as brief details about each of the implementations. I’ll assume you’ve
read that one already.
Edit: The sequel post, BlockCache Showdown, is now available!
HBase is a distributed database built around the core concepts of an ordered
write log and a log-structured merge tree. As with any database, optimized I/O
is a critical concern to HBase. When possible, the priority is to not perform
any I/O at all. This means that memory utilization and caching structures are
of utmost importance. To this end, HBase maintains two cache structures: the
“memory store” and the “block cache”. Memory store, implemented as the
MemStore, accumulates data edits as they’re received, buffering
them in memory. The block cache, an implementation of the
BlockCache interface, keeps data blocks resident in memory
after they’re read.
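To make the two roles concrete, here is a minimal Java sketch. It is not HBase code: `CacheSketch`, `WriteBuffer`, and `TinyBlockCache` are hypothetical names invented for illustration, and the least-recently-used eviction policy is an assumption chosen for simplicity. The write side buffers edits in a sorted in-memory map; the read side keeps blocks in memory after the first read so that repeat reads skip the disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Illustrative only: hypothetical classes, not HBase's MemStore or BlockCache. */
public class CacheSketch {

    /** Write side: accumulate edits in memory, kept sorted by row key. */
    static class WriteBuffer {
        private final TreeMap<String, String> edits = new TreeMap<>();

        void put(String rowKey, String value) {
            edits.put(rowKey, value);
        }

        String get(String rowKey) {
            return edits.get(rowKey);
        }

        int size() {
            return edits.size();
        }
    }

    /** Read side: keep blocks resident after they are read, evicting the least recently used. */
    static class TinyBlockCache {
        private final Map<Long, byte[]> blocks;
        int diskReads = 0; // counts cache misses, i.e. simulated I/O

        TinyBlockCache(final int maxBlocks) {
            // accessOrder=true orders entries from least to most recently accessed.
            this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                    return size() > maxBlocks;
                }
            };
        }

        /** Return the cached block, or load it via readFromDisk and cache it. */
        byte[] getBlock(long offset, Function<Long, byte[]> readFromDisk) {
            byte[] block = blocks.get(offset);
            if (block == null) {
                diskReads++;
                block = readFromDisk.apply(offset);
                blocks.put(offset, block);
            }
            return block;
        }
    }

    public static void main(String[] args) {
        WriteBuffer buffer = new WriteBuffer();
        buffer.put("row-2", "b");
        buffer.put("row-1", "a");

        TinyBlockCache cache = new TinyBlockCache(2);
        Function<Long, byte[]> fakeDisk = offset -> new byte[] {offset.byteValue()};
        cache.getBlock(0L, fakeDisk); // miss: goes to "disk"
        cache.getBlock(0L, fakeDisk); // hit: served from memory

        System.out.println("buffered edits: " + buffer.size());
        System.out.println("disk reads: " + cache.diskReads);
    }
}
```

The point of the sketch is only the division of labor: writes are absorbed by one structure and reads are served by another, and both exist to avoid touching the disk.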
This is the second of two posts examining the use of Hive for interaction with HBase tables. This is a hands-on exploration so the first post isn’t required reading for consuming this one. Still, it might be good context.
“Nick!” you exclaim, “that first post had too many words and I don’t care about JIRA tickets. Show me how I use this thing!”
This post is exactly that: a concrete, end-to-end example of consuming HBase over Hive. The whole mess was tested to work on a tiny little 5-node cluster running HDP-1.3.2, which means Hive 0.11.0 and HBase 0.94.6.1.
This is the first of two posts examining the use of Hive for interaction with HBase tables. The second post is now available.
One of the things I’m frequently asked about is how to use HBase from Apache Hive. Not just how to do it, but what works, how well it works, and how to make good use of it. I’ve done a bit of research in this area, so hopefully this will be useful to someone besides myself. This is a topic that we did not get to cover in HBase in Action; perhaps these notes will become the basis for the 2nd edition ;) These notes are applicable to Hive 0.11.x used in conjunction with HBase 0.94.x. They should be largely applicable to 0.12.x + 0.96.x, though I haven’t tested everything yet.
I spent last week in NYC at this year’s Strata+Hadoop World, where I was invited to speak. The title of this talk is the same as the talk I gave at the Big Data Deep Dive in May, but the content received a thorough overhaul. Thanks to all the attendees and friends who gave me great advice on the first go-around. Hopefully the improvements were helpful.