Several months ago I took up a monster of a project I had no clear idea how to tackle. What was abundantly clear was that I would need to gain a broad-reaching understanding of company policies, internal systems, and open source technologies to succeed. I had one quarter to deliver meaningfully. In an act of desperation, I reached for my coding agent as a research assistant.
n10k.com: blog et al.
I have previously released artifacts for Apache HBase using macOS and Windows 11 + WSL2. Now I am running a native Linux installation, and so I again have some minor details to work through. This install is built on systemd, which is of minor concern. More interestingly, I decided to drop Docker and instead use Podman and crun as my interface over Linux containers.
While working on HBase bug fixes and feature development, it’s often quite convenient to test changes on a local-mode HBase. This is done by running HBase right out of your developer sandbox. Though a lot of HBase development happens on Macs these days, it’s a system designed first to run on Linux. That means there are a couple minor annoyances for non-Linux users. Let me show you how I work around one of them.

Between HBaseCon and Hadoop Summit I took a short trip to Europe. I got to spend some more time working alongside Nicolas and meet some of the Scaled Risk crew. I also took a small holiday through the hillside in Romania! Along the way, I was invited to present for both the Paris HPC Meetup and the London HBase Meetup.
Every year at Hadoop Summit there’s a little un-conference, called the Birds of a Feather Sessions, or BoF for short. These are topical meetups that take place after the conference proceedings and are open to non-attendees. This year I helped organize the HBase BoF, along with Subash D’Souza.

The Latency Talk Nicolas and I gave at HBaseCon has been accepted for Hadoop Summit San Jose. If you missed us at HBaseCon, you get one more opportunity! We’re speaking on June 4th at 3:25p.
See you in June!
Edit: Unfortunately, Nicolas was unable to make it so I presented solo. I hope I did his section justice.

HBaseCon was another fantastic conference this year! It’s a great resource for information about and around HBase, no matter where you are along your path. This year I presented a talk along with a colleague of mine, Nicolas Liochon of Scaled Risk fame. Our topic: HBase as an online, low-latency system.
The HBase BlockCache is an important structure for enabling low
latency reads. As of HBase 0.96.0, there are no fewer than three different
BlockCache implementations to choose from. But how do you know when to use one
over the other? There’s a little bit of guidance floating around out there, but
nothing concrete. It’s high time the HBase community changed that! I did some
benchmarking of these implementations, and I’d like to share the results with
you here.
Note that this is my second post on the BlockCache. In my
previous post, I provide an overview of the BlockCache in general as
well as brief details about each of the implementations. I’ll assume you’ve
read that one already.
Edit: The sequel post, BlockCache Showdown, is now available!
HBase is a distributed database built around the core concepts of an ordered
write log and a log-structured merge tree. As with any database, optimized I/O
is a critical concern to HBase. When possible, the priority is to not perform
any I/O at all. This means that memory utilization and caching structures are
of utmost importance. To this end, HBase maintains two cache structures: the
“memory store” and the “block cache”. Memory store, implemented as the
MemStore, accumulates data edits as they’re received, buffering
them in memory. The block cache, an implementation of the
BlockCache interface, keeps data blocks resident in memory
after they’re read.
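To make the split between the two structures concrete, here is a minimal toy sketch in Python. It is not HBase code: the class `ToyStore`, its methods, the integer block ids, and the LRU eviction policy are all assumptions made for illustration. It only shows the idea described above: writes land in an in-memory buffer, and blocks read from "disk" stay resident in a bounded cache.

```python
from collections import OrderedDict


class ToyStore:
    """Toy model of a write buffer plus a read cache (hypothetical, not HBase's API)."""

    def __init__(self, block_cache_capacity=2):
        self.memstore = {}                 # recent edits, buffered in memory
        self.disk_blocks = {}              # stand-in for flushed files: block id -> {row: value}
        self.block_cache = OrderedDict()   # blocks kept in memory after being read
        self.capacity = block_cache_capacity
        self.disk_reads = 0                # counts simulated reads from "disk"

    def put(self, row, value):
        # Edits accumulate in memory; no "disk" I/O happens here.
        self.memstore[row] = value

    def flush(self, block_id):
        # Write the buffered edits out as one sorted block and clear the buffer.
        self.disk_blocks[block_id] = dict(sorted(self.memstore.items()))
        self.memstore = {}

    def _read_block(self, block_id):
        if block_id in self.block_cache:
            # Cache hit: mark the block as most recently used.
            self.block_cache.move_to_end(block_id)
            return self.block_cache[block_id]
        # Cache miss: pay for a simulated disk read, then keep the block resident.
        self.disk_reads += 1
        block = self.disk_blocks[block_id]
        self.block_cache[block_id] = block
        if len(self.block_cache) > self.capacity:
            # Evict the least recently used block (assumed policy).
            self.block_cache.popitem(last=False)
        return block

    def get(self, row):
        # Check the write buffer first, then flushed blocks from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for block_id in sorted(self.disk_blocks, reverse=True):
            block = self._read_block(block_id)
            if row in block:
                return block[row]
        return None
```

In this sketch, a `get` served from the buffer or from a cached block leaves `disk_reads` unchanged, which is the "avoid I/O when possible" priority in miniature. A real store would use an index to pick the right block rather than scanning every one.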
This is the second of two posts examining the use of Hive for interaction with HBase tables. This is a hands-on exploration so the first post isn’t required reading for consuming this one. Still, it might be good context.
“Nick!” you exclaim, “that first post had too many words and I don’t care about JIRA tickets. Show me how I use this thing!”
This post is exactly that: a concrete, end-to-end example of consuming HBase over Hive. The whole mess was tested to work on a tiny little 5-node cluster running HDP-1.3.2, which means Hive 0.11.0 and HBase 0.94.6.1.
