Apache Kafka is a powerful stream processing platform that can be found at the heart of the largest data warehouses around the world. Responsible for the heavy lifting from data sources to data sinks, Apache Kafka is capable of processing millions of records or messages per second while still maintaining sub-second end-to-end latency. However, this is only possible if we keep our Apache Kafka clusters, along with their consumers and producers, secure.
In this post we will discuss some standard Apache Kafka security best practices to help us do exactly that, including recommendations for authentication, encryption, updates, access control lists, and more.
Getting Started with Apache Kafka
Some of the topics discussed in this blog build on prerequisite concepts the reader will want to be familiar with. Understanding basic Kafka concepts such as topics, partitions, and consumer groups will be very handy. I recommend visiting our Enterprise Kafka Resources hub, downloading The Decision Maker’s Guide to Apache Kafka, or watching the video below, which covers some basic tips for configuring and testing your deployments.
8 Kafka Security Best Practices
“Out of the box” default configurations are great for prototyping and proof-of-concept designs, but to really unleash the performance and reliability of Kafka, it must be properly secured.
We all know the easiest way to stand up any enterprise resource is to just run with its default configuration, bypass encryption like TLS, and skip hardening processes like SELinux. But if we can’t guarantee the integrity of our services, then we cannot provide the reliability of service and accuracy of data required for enterprise operations.
With that in mind, here are a few elemental Apache Kafka security best practices your organization should be applying in your Kafka environments.
1. Authenticate Everything
An often-ignored security practice we encounter when performing Kafka environment analysis for customers is client authentication. Many organizations fail to authenticate their Kafka producers and consumers, which may be understandable in some contexts. But with strong support for multiple SASL mechanisms, including SASL/GSSAPI (Kerberos), SASL/OAUTHBEARER, SASL/SCRAM-SHA-256, SASL/SCRAM-SHA-512, and SASL/PLAIN, restricting your cluster to talk only with authenticated clients is fairly straightforward, and definitely worth your time.
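As an illustration, here is a minimal sketch of a producer authenticating with SASL/SCRAM over TLS. It assumes a broker listener already exposed on SASL_SSL and SCRAM credentials already created on the cluster; the host name, topic, and credentials below are hypothetical.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class AuthenticatedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Hypothetical broker; port 9093 assumed to be a SASL_SSL listener
            props.put("bootstrap.servers", "broker1.example.com:9093");
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "SCRAM-SHA-512");
            // SCRAM credentials for this principal must already exist on the cluster
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                    + "username=\"svc-orders\" password=\"change-me\";");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("orders", "order-1", "created"));
            }
        }
    }

With this in place, a client presenting no credentials (or the wrong ones) is rejected before it can produce or consume a single record.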
It’s unfortunately common that organizations will spend a lot of time securing external attack vectors while ignoring their internal attack surfaces. This creates a hardened outer shell, but leaves the inside “gooey” and easily compromised from internal threats. Adding client authentication is a major step in hardening that “gooey” center.
In addition to authenticating clients, we should be authenticating broker communications to ZooKeeper as well. Starting with ZooKeeper 3.5.6, which shipped with Kafka 2.4, support for mutual TLS (mTLS) was implemented, and authentication support was expanded to include SASL mechanisms starting with Kafka 2.5. In KRaft mode, which removes ZooKeeper entirely, the equivalent step is enabling TLS for inter-broker communication. With Strimzi, TLS for inter-broker communication is enabled by default and configured out of the box. This is not the case with vanilla Kafka: security.inter.broker.protocol must be set to SSL (or SASL_SSL), and the TLS keystore and truststore must be configured manually in the server.properties file as well.
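For vanilla Kafka, the broker-side configuration might look something like the server.properties excerpt below. This is a sketch only; the listener address, file paths, and passwords are placeholders.

    # Encrypt inter-broker traffic (use SASL_SSL instead to add authentication)
    security.inter.broker.protocol=SSL
    listeners=SSL://broker1.example.com:9093

    # Broker identity and trust material (paths and passwords are placeholders)
    ssl.keystore.location=/etc/kafka/ssl/broker1.keystore.jks
    ssl.keystore.password=change-me
    ssl.key.password=change-me
    ssl.truststore.location=/etc/kafka/ssl/broker1.truststore.jks
    ssl.truststore.password=change-me

    # Optionally require connecting clients to present certificates too (mTLS)
    ssl.client.auth=required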
2. Encrypt Everything
Free and easy-to-implement cryptography is ubiquitous in the modern enterprise. With freely available PKI solutions that either self-sign or utilize free services like Let's Encrypt, there is very little reason to have unencrypted traffic crossing your network infrastructure. While encryption is disabled by default, enabling it on your Kafka cluster is fairly easy and helps ensure the integrity of your cluster.
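On the client side, consuming over an encrypted connection requires little more than pointing at a truststore. Here is a minimal sketch, assuming the broker exposes an SSL listener and its CA certificate has been imported into a local truststore; the paths, group ID, and topic are hypothetical.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class EncryptedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Hypothetical broker; port 9093 assumed to be an SSL listener
            props.put("bootstrap.servers", "broker1.example.com:9093");
            props.put("security.protocol", "SSL");
            // Truststore containing the CA that signed the broker certificates
            props.put("ssl.truststore.location", "/etc/kafka/ssl/client.truststore.jks");
            props.put("ssl.truststore.password", "change-me");
            props.put("group.id", "orders-audit");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("orders"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
            }
        }
    }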
Keep in mind, there are CPU and JVM performance considerations when enabling cluster encryption, but the benefits of encryption will almost always outweigh the performance cost.
Also keep in mind that some older clients do not support encryption; TLS requires version 0.9.0 or higher of the consumer and producer APIs (which ties into the next security best practice).
3. Update Regularly
While I would consider this a “performance tuning” best practice as well (and it is definitely applicable to more than just Kafka), keeping your software updated with the most recent bug and security fixes is a must.
We are all too familiar with looking out over our enterprise services and feeling that sinking feeling in the pit of our stomachs when anyone even mentions the word “upgrade,” but it’s paramount that updates get done in a timely manner. To make that sinking “pit of doom” feeling a little less pronounced, have an upgrade plan. You should have both a long-term and a short-term upgrade plan within your organization.
What versions will you be running in 3-4 months?
How about in 6-12 months?
18-24 months?
These plans should include not only your infrastructure and DevOps folks, but your development teams as well. The responsibility for maintaining Kafka infrastructure (brokers, ZooKeeper, etc.) and the responsibility for maintaining consumer and producer code will likely fall across multiple teams or groups, and the people upgrading your cluster very likely won’t be the ones maintaining your producer or consumer code. Coordinating upgrade plans between these groups is crucial, as there can be breaking changes between Kafka versions that require changes to producer or consumer code.
One challenge with keeping up-to-date with Kafka versions is its release cadence. With three yearly planned releases and a short window of community support (12-16 months) for each, staying on top of your Kafka upgrades can be difficult, particularly for large enterprises that must maintain hundreds or thousands of clusters. At that scale, most enterprises can’t turn the ship, so to speak, that quickly. That’s where a solution like OpenLogic’s Kafka LTS can be helpful for patching sunsetted versions, giving teams additional time to plan and implement their upgrades on a schedule that works for them.
Getting these changes scheduled into your developers’ sprints ahead of time will take a lot of the heartburn out of upgrading your Kafka infrastructure.
4. Audit All the Things
A major pillar of any security posture is auditing, and to do any auditing, we must have logs to audit. Unfortunately, logging infrastructure is an all-too-often overlooked step in the rollout of far too many projects. Many times, it’s only after something goes wrong, and organizations need to view and correlate logs across an array of disparate, distributed systems, that the necessity of centralized logging and metrics collection becomes clear.
With Kafka being an inherently distributed system, a robust approach to logging and metrics is even more critical to long-term success. Throw the Strimzi operator into the mix, and the ephemeral nature of Kubernetes means there is a risk of losing logs altogether without centralized logging. Luckily, cloud-native solutions like Prometheus and Loki make the process a breeze, and solutions like Logstash and OpenSearch also present a low barrier to entry for a centralized logging solution. In many hosted Kafka on Kubernetes environments, like Google Cloud’s GKE, container logs written to stdout are collected automatically, so simply logging to standard output can be enough to create a searchable, centralized collection of logs.
Regardless of the tools, once a centralized, searchable logging infrastructure is in place, organizations can easily audit things like ACL changes, authentication events, topic creation and deletion, and consumer group activity.
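A concrete starting point is Kafka’s authorizer logger, which records ACL authorization decisions; shipping that log into your centralized stack makes authorization events searchable alongside everything else. Below is a sketch of a log4j.properties excerpt that closely mirrors the configuration Kafka ships with (the log path is illustrative).

    # Dedicated appender for authorizer (ACL) decisions
    log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.authorizerAppender.File=/var/log/kafka/kafka-authorizer.log
    log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    # INFO captures denied operations; DEBUG also records every allowed one
    log4j.logger.kafka.authorizer.logger=INFO, authorizerAppender
    log4j.additivity.kafka.authorizer.logger=false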