Apache Kafka is all about getting large amounts of data from one place to another, rapidly and reliably. It is a messaging system tailored for high-throughput use cases, where vast amounts of data need to be moved in a scalable, fault-tolerant way. I have been designing and building integrations for more than a decade, mostly with Microsoft technologies (BizTalk, WCF). Lately I have also been working with other integration technologies: Oracle (Fusion), IBM (ODM), Informatica and Kafka. A few aspects of Kafka seem interesting to me because they address issues that were points of pain for me in other integration platforms. Here are some of the points that crossed my mind:
Saving Data locally rather than in a Data Center or DB
Figure 1: How BizTalk works
BizTalk works by saving all messages into the MessageBox database, which resides on a SQL Server. The SQL Server is usually implemented as a SQL cluster with the data files saved on a SAN disk. This introduces some latency in processing, and if, as some organizations do, the SQL cluster is shared with other systems besides BizTalk, such as SharePoint, the stress on the cluster grows. Tuning the SQL cluster for multiple systems is not a trivial endeavor, as the requirements of these systems can conflict.
Figure 2: BizTalk Infrastructure Cluster
Of course, you can use a dedicated DB cluster for each system, but the cost of the implementation skyrockets. Kafka simply saves the message data locally on the drive, which makes processing messages faster and hence gives Kafka its high throughput.
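To make the idea of "saving locally on the drive" a bit more concrete, here is a minimal sketch of an append-only log kept in a single local file. It is only a toy illustration: the `LocalLog` class, the length-prefixed record layout and the file name `orders-0.log` are my own assumptions, not Kafka's actual on-disk format or API.

```python
import os
import struct
import tempfile


class LocalLog:
    """Toy append-only log in one local file (hypothetical, not Kafka's format)."""

    def __init__(self, path):
        self.path = path
        self.next_offset = 0  # assumes the file starts out empty
        # Create the file if it does not exist yet.
        open(self.path, "ab").close()

    def append(self, payload: bytes) -> int:
        """Write one record at the end of the file and return its offset."""
        with open(self.path, "ab") as f:
            # 4-byte big-endian length prefix, followed by the payload.
            f.write(struct.pack(">I", len(payload)))
            f.write(payload)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read_from(self, start_offset: int):
        """Scan the file sequentially; return records at or after start_offset."""
        records = []
        with open(self.path, "rb") as f:
            offset = 0
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break  # end of file
                (length,) = struct.unpack(">I", header)
                payload = f.read(length)
                if offset >= start_offset:
                    records.append((offset, payload))
                offset += 1
        return records


def demo():
    """Append three records to a temporary file and read back from offset 1."""
    with tempfile.TemporaryDirectory() as tmp:
        log = LocalLog(os.path.join(tmp, "orders-0.log"))
        for body in (b"order-1", b"order-2", b"order-3"):
            log.append(body)
        return log.read_from(1)
```

Every write goes to the end of a local file, with no database or SAN round trip in the path, which is the property the comparison with the MessageBox relies on.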
Keeping the messages available for reprocessing for a configurable Duration of time
BizTalk Server (BTS) wants to handle messages as quickly as possible and move them to the archive. If a message lingers in the MessageBox, BTS raises an error. If too many messages linger there, BizTalk performance degrades, and the whole cluster may even go down. So messages are cleared from the MessageBox as soon as possible. If a receiving system then has an issue with a message, it has no way of asking BTS for that message again, and resubmitting the message manually is a big hassle. Kafka, on the other hand, keeps messages for 168 hours (7 days) by default, and you can configure how long messages are kept. Receiving systems, called consumers in Kafka, ask for messages from specific topics and partitions, starting at a specific point (called the message offset in Kafka). This makes recovering from erroneous processing much less of a hassle.
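The retention and offset behavior described above can be sketched with a small in-memory model. This is not the Kafka client API: the `Partition` class and its methods are hypothetical names I made up for illustration, and time is simplified to a plain hour counter. The only value taken from Kafka itself is the 168-hour default retention.

```python
from dataclasses import dataclass, field

RETENTION_HOURS = 168  # Kafka's default retention, as mentioned above


@dataclass
class Partition:
    """Toy model of one topic partition (hypothetical, not the Kafka API)."""

    records: list = field(default_factory=list)  # (offset, written_at_hour, value)
    next_offset: int = 0

    def append(self, value, now_hours):
        """Store a message and return the offset assigned to it."""
        offset = self.next_offset
        self.records.append((offset, now_hours, value))
        self.next_offset += 1
        return offset

    def expire(self, now_hours, retention_hours=RETENTION_HOURS):
        """Drop messages older than the retention window; offsets are not reused."""
        self.records = [
            r for r in self.records if now_hours - r[1] <= retention_hours
        ]

    def read(self, from_offset):
        """Return every retained message at or after from_offset."""
        return [(off, val) for off, _, val in self.records if off >= from_offset]


partition = Partition()
partition.append("a", now_hours=0)
partition.append("b", now_hours=100)
partition.append("c", now_hours=150)

# A consumer that failed while processing offset 1 simply asks again from there.
replayed = partition.read(from_offset=1)

# At hour 200 the first message is older than 168 hours and ages out.
partition.expire(now_hours=200)
still_available = partition.read(from_offset=0)
```

The point of the sketch is that reading does not remove anything: a message stays available to any consumer until it ages out of the retention window, so reprocessing is just another read from an earlier offset.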
Keeping all the logic in the Client systems (Consumers, Producers)
Figure 3: Kafka Producers and Consumers
BizTalk has many capabilities that allow the designer/developer to put logic in it. You have orchestrations, maps that transform between different schemas, and pipeline components, and you can inject custom code into all of these. This kind of seduces designers and developers into putting integration logic in BTS. This has its benefits. Consider the alternative, the spaghetti integration shown in Figure 4, where application A keeps the integration logic for every other application it integrates with.
Figure 4: Spaghetti Integration
Now, when Application D is replaced, all the applications integrating with it have to be updated (code change, QA, UAT, deployment to production, etc.). This has a ripple effect, especially in large enterprises with many systems. Keeping the integration logic contained in BTS or an ESB (hub) alleviates this situation, as there is only one system that needs to be updated and changed.
Figure 5: Hub and Spoke Integration
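The ripple effect can be put into numbers with a small sketch. The two topologies below are assumptions for illustration only (four applications A to D, fully wired to D in the spaghetti case); they are not taken from the figures, and `impacted_by_replacement` is a hypothetical helper.

```python
def impacted_by_replacement(links, replaced):
    """Return the systems that hold integration logic for `replaced`."""
    return sorted(
        system
        for system, peers in links.items()
        if system != replaced and replaced in peers
    )


# Spaghetti: each application keeps logic for the applications it talks to.
spaghetti = {
    "A": {"B", "C", "D"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"A", "B", "C"},
}

# Hub and spoke: each application only knows the hub.
hub_and_spoke = {
    "A": {"HUB"},
    "B": {"HUB"},
    "C": {"HUB"},
    "D": {"HUB"},
    "HUB": {"A", "B", "C", "D"},
}

spaghetti_impact = impacted_by_replacement(spaghetti, "D")
hub_impact = impacted_by_replacement(hub_and_spoke, "D")
```

With these assumed topologies, replacing D touches A, B and C in the spaghetti case, while in the hub case only the hub has to change.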
In Kafka, all the integration logic is encapsulated in the Kafka producers and consumers, which are usually part of the applications themselves. This might lead to the spaghetti situation described in Figure 4. One might be tempted to create a hub that includes all the producers and consumers for Kafka, but you really need to think that one through. This is a big issue, and I am working on a solution for it.
While BizTalk can scale horizontally and vertically, no one can deny that Kafka scales much better than BizTalk.
BizTalk comes with many adapters and integration accelerators for various systems, which make it quicker and easier to integrate different systems. Kafka is lacking in that regard.