# RabbitMQ vs Apache Kafka Comparison
## Use Case: Integration of HC Portal, PMS, and Genba Apps
---
## 1. Overview of System Requirements
This comparison focuses on an objective analysis of RabbitMQ versus Apache Kafka for the following needs:
- Integration of 3 systems (HC Portal, PMS, Genba Apps)
- Synchronization of project progress and updates
- Audit trail for activities in PMS and Genba
- Notification service for multiple channels (email, SMS, push notification)
**Analysis Focus**: Cost savings and operational efficiency
---
## 2. Cost Analysis Comparison
### 2.1 Infrastructure Cost
| Component | RabbitMQ | Kafka |
|----------|----------|-------|
| **Minimum Production Setup** | 2-node cluster | 3 brokers + KRaft |
| **Server Requirements** | 2x (2 CPU, 8GB RAM, 100GB SSD) | 3x (3 CPU, 8GB RAM, 200GB SSD) |
| **Monthly Cloud Cost (AWS/GCP)** | $150-200/month | $250-300/month |
| **Storage Growth** | Linear (messages deleted after consumption) | Grows with retention window (e.g. 7-day retention) |
| **Network Transfer** | Lower (push model) | Higher (pull + replication) |
| **Monitoring Tools** | Built-in Management UI | Open-source Kafka UI for debugging the cluster and messages |
### 2.2 Operational Cost
| Aspect | RabbitMQ | Kafka |
|-------|----------|-------|
| **Learning Curve** | Easy | Moderately difficult |
| **Backup/Recovery** | Simple (export/import) | Complex (partition management) |
| **Upgrade Complexity** | Low (rolling upgrade) | High (brokers must be restarted one at a time with careful version coordination) |
| **Troubleshooting** | Straightforward | Requires understanding of partitions, brokers, and replication |
### 2.3 Development Cost
| Area | RabbitMQ | Kafka |
|------|----------|-------|
| **Initial Setup Time** | 1 day | 3-7 days |
| **Client Library** | Simple | More complex configuration |
| **Testing Environment** | Lightweight Docker setup | Swarm/Compose/Kubernetes needed |
| **Development Skills** | Common knowledge | Specialized expertise |
| **Documentation/Training** | 1-2 hours | 1-2 days |
**Total Cost Comparison (1st Year)**:
- RabbitMQ: ~$2,400 infra + $15,000 development = **$17,400**
- Kafka: ~$3,600 infra + $25,000 development = **$28,600**
---
## 3. Technical Capability Comparison
### 3.1 Core Architecture
| Aspect | RabbitMQ | Kafka |
|-------|----------|-------|
| **Message Model** | Queue-based (FIFO) | Log-based (append-only) |
| **Delivery Guarantee** | At-least-once (exactly-once requires consumer-side deduplication) | At-least-once, exactly-once (with transactions) |
| **Message Ordering** | Per queue | Per partition |
| **Message Retention** | Until consumed (deleted) | Time/Size-based (configurable) |
| **Consumer Model** | Push (broker distributes) | Pull (consumer fetches) |
| **Routing Flexibility** | Flexible (Exchange, Binding, Routing Key) | Basic (Topic, Partition) |
### 3.2 Performance Metrics
| Metric | RabbitMQ | Kafka |
|--------|----------|-------|
| **Throughput** | 20K-50K msg/sec per node | 100K-1M msg/sec per broker |
| **Latency** | Sub-millisecond (<1ms) | 2-5ms average |
| **Message Size** | Best for small messages (<1MB) | Handles large messages well |
| **RAM Usage** | Higher (stores in memory) | Lower (disk-based) |
| **CPU Usage** | Moderate | Lower per message |
| **Disk I/O** | Lower | Higher (sequential writes) |
### 3.3 Feature Support for the Use Cases
#### Audit Trail & Logging
| Feature | RabbitMQ | Kafka |
|---------|----------|-------|
| **Long-term Storage** | Requires external DB (ClickHouse, etc.) | Native support with retention |
| **Message Replay** | Not supported natively | Full replay from any offset |
| **Audit Query** | Must implement separately | Can read historical data |
| **Storage Cost** | Higher (need separate storage) | Lower (built-in retention) |
| **Compliance** | Manual implementation | Built-in immutable log |
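
The "Message Replay" row is worth illustrating: Kafka lets a consumer re-read the audit log from any retained offset. A minimal sketch, assuming the `kafkajs` client and the `audit-trail` topic used in section 4.1 (broker address, group ID, and offset handling are illustrative):

```javascript
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'audit-replayer', brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'audit-replay' });

async function replayAuditTrail(fromOffset) {
  await consumer.connect();
  await consumer.subscribe({ topic: 'audit-trail', fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      // Historical audit entries arrive here in offset order per partition
      console.log(`audit-trail[${partition}] @ ${message.offset}:`, message.value.toString());
    }
  });

  // seek() must be called after run(); offset is a string, e.g. '0' to replay everything
  consumer.seek({ topic: 'audit-trail', partition: 0, offset: fromOffset });
}
```

RabbitMQ cannot do this once a message is acked and removed, which is why the table pairs it with an external database.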
#### Notification Service
| Feature | RabbitMQ | Kafka |
|---------|----------|-------|
| **Priority Messages** | Built-in priority queues | Manual implementation |
| **Delayed Messages** | Plugin available | Requires custom solution |
| **Dead Letter Queue** | Native feature | Manual implementation |
| **TTL (Time To Live)** | Built-in | Manual cleanup needed |
| **Retry Mechanism** | Built-in with DLQ | Consumer-side implementation |
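
The RabbitMQ features in this table are mostly plain queue arguments. A minimal sketch with `amqplib` (queue name and values are illustrative) showing priority, TTL, and dead-lettering declared together:

```javascript
const amqp = require('amqplib');

async function setupNotificationQueue() {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();

  await channel.assertQueue('notifications-email', {
    durable: true,
    arguments: {
      'x-max-priority': 10,             // enables a priority queue (priorities 0-10)
      'x-message-ttl': 3600000,         // per-queue TTL: 1 hour
      'x-dead-letter-exchange': 'dlx'   // expired/rejected messages are re-routed to the DLX
    }
  });
  return channel;
}
```

On the Kafka side each of these rows has to be re-implemented in the consumer, as shown in section 4.2.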
#### System Integration & Synchronization
| Capability | RabbitMQ | Kafka |
|------------|----------|-------|
| **Request-Reply Pattern** | Native support (replyTo + correlationId) | Requires correlation IDs and reply topics |
| **Event Sourcing** | Manual implementation | Native pattern |
| **CQRS Support** | Possible but complex | Natural fit |
| **Transactional Messaging** | Support with limitations | Full transaction support |
| **Multi-system Fan-out** | Via Exchange | Via Consumer Groups |
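
For the fan-out row: in RabbitMQ the broker copies a message to every queue bound to a fanout exchange. A sketch with `amqplib` (exchange and queue names are illustrative); the Kafka equivalent, independent consumer groups, is shown in section 4.3:

```javascript
// One queue per downstream system, all bound to the same fanout exchange
await channel.assertExchange('project-updates', 'fanout', { durable: true });

for (const system of ['hc-portal', 'pms', 'genba']) {
  const queue = `project-updates.${system}`;
  await channel.assertQueue(queue, { durable: true });
  await channel.bindQueue(queue, 'project-updates', ''); // fanout ignores the routing key
}

// A single publish is delivered to all three queues
channel.publish('project-updates', '', Buffer.from(JSON.stringify({ event: 'updated' })));
```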
---
## 4. Implementation Comparison
### 4.1 Audit Trail Implementation
**RabbitMQ Approach:**
```javascript
// Requires additional database for persistence
async function auditWithRabbitMQ(action) {
  // Send to queue
  channel.sendToQueue('audit-queue', Buffer.from(JSON.stringify(action)));

  // Consumer must save to database
  channel.consume('audit-queue', async (msg) => {
    const audit = JSON.parse(msg.content);
    await database.save(audit); // Additional storage needed
    channel.ack(msg);
  });
}
// Cost: Message broker + Database storage
```
**Kafka Approach:**
```javascript
// Direct storage in Kafka with retention
async function auditWithKafka(action) {
  await producer.send({
    topic: 'audit-trail',
    messages: [{ value: JSON.stringify(action) }]
  });
  // Data retained in Kafka (e.g., 365 days)
  // No additional database required for audit storage
}
// Cost: Only Kafka storage
```
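
The retention mentioned in the comment is a per-topic setting. A sketch of creating the `audit-trail` topic with 365-day retention using the `kafkajs` admin client (partition and replication counts are illustrative):

```javascript
const admin = kafka.admin();
await admin.connect();

await admin.createTopics({
  topics: [{
    topic: 'audit-trail',
    numPartitions: 3,
    replicationFactor: 3,
    configEntries: [
      { name: 'retention.ms', value: '31536000000' }, // 365 days in milliseconds
      { name: 'cleanup.policy', value: 'delete' }     // delete segments after retention, do not compact
    ]
  }]
});

await admin.disconnect();
```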
### 4.2 Notification Service Implementation
**RabbitMQ Approach:**
```javascript
// Built-in routing and priority
await channel.publish('notifications', 'email.high', payload, {
  priority: 10,
  expiration: '3600000' // TTL 1 hour
});

// Automatic retry routing with a Dead Letter Exchange
await channel.assertQueue('email-queue', {
  arguments: {
    'x-dead-letter-exchange': 'dlx',
    'x-dead-letter-routing-key': 'email.retry'
    // Note: RabbitMQ has no built-in 'x-max-retries'; the retry count is
    // usually read from the x-death header and capped in the consumer
  }
});
```
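
Delayed notifications (e.g. reminders) are also handled broker-side in RabbitMQ, via the `rabbitmq_delayed_message_exchange` plugin mentioned in section 3.3. A sketch assuming the plugin is enabled (exchange name and delay are illustrative):

```javascript
// The plugin adds the 'x-delayed-message' exchange type
await channel.assertExchange('delayed-notifications', 'x-delayed-message', {
  durable: true,
  arguments: { 'x-delayed-type': 'direct' } // routing behaves like a direct exchange after the delay
});

// Deliver this reminder 15 minutes from now
channel.publish('delayed-notifications', 'email.reminder', payload, {
  headers: { 'x-delay': 15 * 60 * 1000 }
});
```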
**Kafka Approach:**
```javascript
// Manual routing implementation
await producer.send({
  topic: 'notifications',
  messages: [{
    key: 'email',
    value: JSON.stringify(payload),
    headers: { priority: '10', channel: 'email' }
  }]
});

// Consumer must implement retry logic
const consumer = kafka.consumer({ groupId: 'notification-group' });
await consumer.run({
  eachMessage: async ({ message }) => {
    try {
      await processNotification(message);
    } catch (error) {
      // Manual retry implementation
      await retryQueue.add(message);
    }
  }
});
```
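
`retryQueue.add` above is a placeholder for application code. One common consumer-side pattern is a retry topic plus a dead-letter topic, with the attempt count carried in a message header; a sketch (the topic names and `attempts` header are assumptions, not Kafka features):

```javascript
async function handleFailure(message) {
  // Header values arrive as Buffers in kafkajs
  const previous = message.headers?.attempts ? Number(message.headers.attempts.toString()) : 0;
  const attempts = previous + 1;

  // After 3 attempts, park the message in a dead-letter topic instead of retrying
  const target = attempts >= 3 ? 'notifications-dlq' : 'notifications-retry';

  await producer.send({
    topic: target,
    messages: [{
      key: message.key,
      value: message.value,
      headers: { ...message.headers, attempts: String(attempts) }
    }]
  });
}
```

A separate consumer on `notifications-retry` re-publishes to `notifications` after a delay, which approximates what RabbitMQ's DLX provides out of the box.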
### 4.3 Project Synchronization
**RabbitMQ Approach:**
```javascript
// Request-Reply pattern
const correlationId = uuid();
await channel.sendToQueue('rpc-queue', Buffer.from(data), {
  correlationId,
  replyTo: replyQueue
});
// Waits for a direct response on the reply queue
```
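
The reply side matches responses by correlation ID; a minimal sketch of the consuming half (`resolveReply` is a hypothetical callback that hands the result back to the caller):

```javascript
await channel.consume(replyQueue, (msg) => {
  // Only accept the reply that belongs to this request
  if (msg.properties.correlationId === correlationId) {
    const response = JSON.parse(msg.content.toString());
    resolveReply(response); // hypothetical: resolve the pending promise for this request
  }
}, { noAck: true });
```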
**Kafka Approach:**
```javascript
// Event-driven pattern
await producer.send({
  topic: 'project-events',
  messages: [{
    key: projectId,
    value: JSON.stringify({ event: 'updated', data })
  }]
});
// All systems consume independently
```
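
"All systems consume independently" means each system subscribes with its own consumer group and therefore receives every project event. A sketch for one of the systems, assuming `kafkajs` (the group ID and `applyProjectUpdate` handler are illustrative):

```javascript
const pmsConsumer = kafka.consumer({ groupId: 'pms-sync' });

await pmsConsumer.connect();
await pmsConsumer.subscribe({ topic: 'project-events', fromBeginning: false });

await pmsConsumer.run({
  eachMessage: async ({ message }) => {
    const event = JSON.parse(message.value.toString());
    await applyProjectUpdate(event); // hypothetical handler inside PMS
  }
});

// HC Portal and Genba run the same loop with groupId 'hc-portal-sync' and 'genba-sync'
```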
---
## 5. Scalability & Maintenance Comparison
### 5.1 Scaling Characteristics
| Aspect | RabbitMQ | Kafka |
|-------|----------|-------|
| **Vertical Scaling** | Effective up to certain limit | Limited benefit |
| **Horizontal Scaling** | Add nodes to cluster | Add brokers + partition |
| **Auto-scaling** | Complex (manual rebalancing) | Better (automatic rebalancing) |
| **Scaling Cost** | Linear (add nodes) | Higher initial, better at scale |
| **Performance at Scale** | Degrades with queue depth | Consistent performance |
### 5.2 Maintenance Requirements
| Task | RabbitMQ | Kafka |
|------|----------|-------|
| **Backup** | Export definitions + messages | Partition replica + MirrorMaker |
| **Recovery Time** | Minutes to hours | Hours to days (large data) |
| **Monitoring Complexity** | Simple metrics | Complex (lag, ISR, etc.) |
| **Troubleshooting** | Clear error messages | Requires deep knowledge |
| **Version Upgrade** | Usually smooth | Careful planning needed |
| **Data Cleanup** | Automatic (after consumption) | Automatic via retention policy (must be tuned) |
### 5.3 Resource Utilization Over Time
**Small Scale (< 10K msg/day)**:
- RabbitMQ: 2GB RAM, 2 CPU cores, 50GB storage
- Kafka: 8GB RAM, 4 CPU cores, 200GB storage
**Medium Scale (10K-100K msg/day)**:
- RabbitMQ: 8GB RAM, 4 CPU cores, 200GB storage
- Kafka: 16GB RAM, 8 CPU cores, 1TB storage
**Large Scale (> 1M msg/day)**:
- RabbitMQ: 32GB RAM, 16 CPU cores, 1TB storage
- Kafka: 32GB RAM, 16 CPU cores, 5TB+ storage
---
## 7. Monitoring & Observability
### 7.1 Metrics to Monitor
**Kafka Metrics:**
- Consumer lag per partition (see the lag-check sketch after these lists)
- Message throughput
- Disk usage & retention
- Replication status
**RabbitMQ Metrics:**
- Queue depth
- Consumer utilization
- Message rates (publish/deliver/ack)
- Connection count
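
Consumer lag, the first Kafka metric above, is the difference between the latest offset in a partition and the offset the consumer group has committed. A rough check with the `kafkajs` admin client, assuming kafkajs ≥ 2.x and the `notifications` topic / `notification-group` from section 4.2:

```javascript
const admin = kafka.admin();
await admin.connect();

// Latest (log-end) offsets per partition
const latest = await admin.fetchTopicOffsets('notifications');

// Offsets the consumer group has committed
const committed = await admin.fetchOffsets({
  groupId: 'notification-group',
  topics: ['notifications']
});
const committedPartitions = committed.find((t) => t.topic === 'notifications').partitions;

for (const { partition, offset: endOffset } of latest) {
  const group = committedPartitions.find((p) => p.partition === partition);
  const lag = Number(endOffset) - Number(group?.offset ?? '0');
  console.log(`notifications[${partition}] lag: ${lag}`);
}

await admin.disconnect();
```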
### 7.2 Recommended Tools
```yaml
Kafka Monitoring:
  - Prometheus + Grafana
  - Kafka UI

RabbitMQ Monitoring:
  - RabbitMQ Management Plugin
  - Prometheus RabbitMQ Exporter
  - Grafana Dashboard

Distributed Tracing:
  - Jaeger or Zipkin
  - Correlation ID across systems
```
---
## 8. Resource & Cost Estimates
### 8.1 Infrastructure Requirements
```yaml
Kafka Cluster (Production):
  - 3 Brokers (4 CPU, 16GB RAM, 500GB SSD each)
  - Estimated: $250-300/month (cloud)

RabbitMQ Cluster:
  - 2 Nodes (2 CPU, 8GB RAM, 100GB SSD each)
  - HAProxy Load Balancer
  - Estimated: $150-200/month (cloud)

Total Infrastructure (if both clusters are run): ~$400-500/month
```
---
*Document Version: 1.0*
*Last Updated: October 2024*
*Author: System Architecture Team*