While working on HBase bug fixes and feature development, it’s often quite convenient to test changes on a local-mode HBase. This is done by running HBase right out of your developer sandbox. Though a lot of HBase development happens on Macs these days, it’s a system designed first to run on Linux. That means there are a couple of minor annoyances for non-Linux users. Let me show you how I work around one of them.
Greetings From Europe
Between HBaseCon and Hadoop Summit I took a short trip to Europe. I got to spend some more time working alongside Nicolas and meet some of the Scaled Risk crew. I also took a small holiday through the hillside in Romania! Along the way, I was invited to present for both the Paris HPC Meetup and the London HBase Meetup.
BlockCache 101: Lightning Talk Edition
Every year at Hadoop Summit there’s a little un-conference, called the Birds of a Feather Sessions, or BoF for short. These are topical meetups that take place after the conference proceedings and are open to non-attendees. This year I helped organize the HBase BoF, along with Subash D’Souza.
Latency Talk at Hadoop Summit
The Latency Talk Nicolas and I gave at HBaseCon has been accepted for Hadoop Summit San Jose. If you missed us at HBaseCon, you get one more opportunity! We’re speaking on June 4th at 3:25p.
See you in June!
Edit: Unfortunately, Nicolas was unable to make it so I presented solo. I hope I did his section justice.
HBase: Where Online Meets Low Latency
HBaseCon was another fantastic conference this year! It’s a great resource for information about and around HBase, no matter where you are along your path. This year I presented a talk along with a colleague of mine, Nicolas Liochon of Scaled Risk fame. Our topic: HBase as an online, low-latency system.
BlockCache Showdown
The HBase BlockCache is an important structure for enabling low-latency reads.
As of HBase 0.96.0, there are no fewer than three different BlockCache
implementations to choose from. But how do you know when to use one over the
other? There’s a little bit of guidance floating around out there, but nothing
concrete. It’s high time the HBase community changed that! I did some
benchmarking of these implementations, and I’d like to share the results with
you here.
Note that this is my second post on the BlockCache. In my previous post, I
provide an overview of the BlockCache in general as well as brief details about
each of the implementations. I’ll assume you’ve read that one already.
BlockCache 101
Edit: The sequel post, BlockCache Showdown, is now available!
HBase is a distributed database built around the core concepts of an ordered
write log and a log-structured merge tree. As with any database, optimized I/O
is a critical concern to HBase. When possible, the priority is to not perform
any I/O at all. This means that memory utilization and caching structures are
of utmost importance. To this end, HBase maintains two cache structures: the
“memory store” and the “block cache”. Memory store, implemented as the
MemStore, accumulates data edits as they’re received, buffering them in memory.
The block cache, an implementation of the BlockCache interface, keeps data
blocks resident in memory after they’re read.
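To make the two roles concrete, here is a minimal sketch in Python. It is not HBase code: the class names ToyMemStore and ToyBlockCache, the load_from_disk callback, and the choice of LRU eviction are all assumptions made for illustration. It only shows the general idea that edits are buffered in memory on the write side, while blocks read from disk are kept resident so a repeated read skips the I/O.

```python
from collections import OrderedDict


class ToyMemStore:
    """Hypothetical write-side buffer: accumulates edits in memory."""

    def __init__(self):
        self.edits = {}

    def put(self, row, value):
        # Buffer the edit in memory as it is received.
        self.edits[row] = value

    def get(self, row):
        return self.edits.get(row)

    def flush(self):
        # Hand back the buffered edits in row order and start a new buffer.
        snapshot = sorted(self.edits.items())
        self.edits = {}
        return snapshot


class ToyBlockCache:
    """Hypothetical read-side cache: keeps blocks in memory after a read.

    LRU eviction is an assumption of this sketch.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_block(self, block_id, load_from_disk):
        if block_id in self.blocks:
            # Cache hit: no I/O, just mark the block as recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        # Cache miss: perform the I/O, then keep the block resident.
        self.misses += 1
        block = load_from_disk(block_id)
        self.blocks[block_id] = block
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block.
            self.blocks.popitem(last=False)
        return block
```

In this sketch, calling get_block twice with the same block_id invokes load_from_disk only once, which is the whole point of the structure: the second read is served from memory.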
HBase via Hive, Part 2
This is the second of two posts examining the use of Hive for interaction with HBase tables. This is a hands-on exploration so the first post isn’t required reading for consuming this one. Still, it might be good context.
“Nick!” you exclaim, “that first post had too many words and I don’t care about JIRA tickets. Show me how I use this thing!”
This post is exactly that: a concrete, end-to-end example of consuming HBase over Hive. The whole mess was tested to work on a tiny little 5-node cluster running HDP-1.3.2, which means Hive 0.11.0 and HBase 0.94.6.1.
HBase via Hive, Part 1
This is the first of two posts examining the use of Hive for interaction with HBase tables. The second post is now available.
One of the things I’m frequently asked about is how to use HBase from Apache Hive. Not just how to do it, but what works, how well it works, and how to make good use of it. I’ve done a bit of research in this area, so hopefully this will be useful to someone besides myself. This is a topic that we did not get to cover in HBase in Action; perhaps these notes will become the basis for the 2nd edition ;) These notes are applicable to Hive 0.11.x used in conjunction with HBase 0.94.x. They should be largely applicable to 0.12.x + 0.96.x, though I haven’t tested everything yet.
HBase for Architects, Redux
I spent last week in NYC at this year’s Strata+Hadoop World, where I was invited to speak. The title of this talk is the same as the talk I gave at the Big Data Deep Dive in May, but the content received a thorough overhaul. Thanks to all the attendees and friends who gave me great advice on the first go-around. Hopefully the improvements were helpful.