# IoT System Design
Session 8.9 · ~5 min read
## Scale of the Problem
An industrial IoT deployment does not look like a web application. A factory floor might have 100,000 temperature, vibration, and pressure sensors, each reporting a reading every 5 seconds. A smart city might have millions of connected devices: traffic sensors, air quality monitors, water meters, street lights. The data is small per reading (a timestamp, a sensor ID, and a numeric value), but the volume is relentless.
At 100,000 sensors reporting every 5 seconds, the ingestion rate is 20,000 readings per second. That is 1.7 billion readings per day. Each reading might be 50-100 bytes, so daily raw data volume is roughly 85-170 GB. After a year, you are looking at 30-60 TB of time-series data, and that is one factory.
**Key insight:** IoT data pipelines are write-dominated systems. The architecture must sustain continuous, predictable write throughput measured in tens of thousands of inserts per second, unlike web applications where reads vastly outnumber writes.
At 1 reading per 5 seconds per sensor, the formula is straightforward: QPS = sensor_count / interval_seconds. For 100K sensors at 5-second intervals, that is 100,000 / 5 = 20,000 QPS. This is well within the capability of modern time-series databases, but it requires an architecture built for sustained write throughput, not the bursty read patterns of web applications.
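The back-of-envelope numbers above can be reproduced in a few lines. This is a sketch; the 75-byte figure is simply the midpoint of the 50-100 byte range quoted earlier:

```python
# Back-of-envelope capacity math for the factory scenario above.
SENSORS = 100_000
INTERVAL_S = 5              # one reading per sensor every 5 seconds
BYTES_PER_READING = 75      # midpoint of the 50-100 byte range (assumed)

qps = SENSORS / INTERVAL_S                      # sustained write rate
readings_per_day = int(qps * 86_400)            # 86,400 seconds per day
gb_per_day = readings_per_day * BYTES_PER_READING / 1e9
tb_per_year = gb_per_day * 365 / 1e3

print(f"{qps:,.0f} writes/s")                         # 20,000 writes/s
print(f"{readings_per_day / 1e9:.2f} B readings/day") # 1.73 B readings/day
print(f"{gb_per_day:.0f} GB/day, {tb_per_year:.0f} TB/year")  # 130 GB/day, 47 TB/year
```

Plugging in 50 or 100 bytes per reading instead reproduces the 85-170 GB/day and 30-60 TB/year bounds.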
## The IoT Data Pipeline
Data flows from physical sensors through several layers before it reaches a dashboard or triggers an alert. Each layer has a specific role.
```mermaid
flowchart LR
    S[Sensors] --> GW[Edge Gateway]
    GW --> BR[Message Broker<br/>EMQX / Mosquitto]
    BR --> SP[Stream Processor<br/>Kafka / Flink]
    SP --> TS[(Time-Series DB<br/>TimescaleDB / InfluxDB)]
    SP --> AL[Alert Engine<br/>Threshold Rules]
    TS --> DASH[Dashboard<br/>Grafana]
    AL --> NT[Notification<br/>SMS / Email / PagerDuty]
```
Sensors are constrained devices. They have limited CPU, memory, and battery. They cannot run HTTP servers or maintain persistent TCP connections to cloud endpoints. They speak lightweight protocols.
Edge gateways aggregate data from dozens or hundreds of local sensors. A gateway might be a Raspberry Pi, a ruggedized industrial PC, or a purpose-built IoT hub. It collects readings over local protocols (Bluetooth, Zigbee, Modbus, or local MQTT) and forwards them to the cloud over MQTT or HTTPS.
The message broker (EMQX, HiveMQ, or Mosquitto) receives data from thousands of gateways. MQTT is the dominant protocol because it was designed for unreliable networks and resource-constrained devices. The broker decouples producers (gateways) from consumers (processors).
The stream processor (Kafka Streams, Apache Flink, or a simple consumer) reads from the broker and does two things: writes raw data to the time-series database and evaluates alert rules in real time.
The time-series database stores readings indexed by time and sensor ID. It supports queries like "average temperature in zone 3 over the last hour" or "maximum vibration on machine #42 this week." It also compresses aggressively, because sensor data is highly regular: timestamps are evenly spaced, and consecutive values change slowly.
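The regularity is worth seeing concretely. Evenly spaced timestamps have a constant first delta, so the delta-of-delta stream is almost all zeros, which an entropy coder shrinks to a fraction of a bit per timestamp. This is the idea behind Gorilla-style timestamp encoding; the snippet below is a sketch of the principle, not a real codec:

```python
# Evenly spaced timestamps: one reading every 5 seconds for ~83 minutes.
timestamps = [1_700_000_000 + 5 * i for i in range(1000)]

deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
dod = [b - a for a, b in zip(deltas, deltas[1:])]   # delta-of-delta

print(deltas[:3])   # [5, 5, 5] -- constant first delta
print(set(dod))     # {0} -- every delta-of-delta is zero
```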
## Protocol Comparison: MQTT vs HTTP vs CoAP
| Property | MQTT | HTTP/REST | CoAP |
|---|---|---|---|
| Transport | TCP | TCP | UDP |
| Message overhead | 2 bytes minimum header | ~200-800 bytes (headers) | 4 bytes minimum header |
| Connection model | Persistent, bidirectional | Request-response, stateless | Request-response, stateless |
| QoS levels | 0 (at most once), 1 (at least once), 2 (exactly once) | None (application layer) | Confirmable / Non-confirmable |
| Pub/Sub support | Native (topics) | No (requires SSE or WebSocket) | Observe extension |
| Power consumption | Low (persistent connection, small packets) | High (connection setup per request) | Very low (UDP, small packets) |
| Best for | Reliable telemetry over TCP networks | Low-frequency data, cloud integrations | Battery-powered devices on lossy networks |
MQTT dominates industrial IoT because of its persistent connection model (no reconnection overhead per message), its built-in QoS levels (you can choose between speed and delivery guarantees), and its minimal header overhead. HTTP works fine for devices that report once per hour, but at 5-second intervals, the per-request overhead adds up quickly.
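The overhead gap is easy to quantify at the payload level. The fixed binary layout below is our own illustrative schema (not a standard): a 32-bit sensor ID, a 64-bit Unix timestamp, and a 32-bit float reading pack into 16 bytes, to which MQTT adds as little as a 2-byte fixed header, versus hundreds of bytes of HTTP headers per request:

```python
import json
import struct

# Illustrative payload layout (an assumption, not a standard):
# network byte order, 32-bit sensor ID, 64-bit timestamp, 32-bit float.
PACKER = struct.Struct("!Iqf")   # 4 + 8 + 4 = 16 bytes

def encode(sensor_id: int, ts: int, value: float) -> bytes:
    return PACKER.pack(sensor_id, ts, value)

binary = encode(42, 1_700_000_000, 73.4)
as_json = json.dumps({"sensor_id": 42, "ts": 1_700_000_000, "value": 73.4}).encode()

print(len(binary))   # 16 bytes + ~2-byte MQTT fixed header
print(len(as_json))  # 50 bytes -- before any HTTP headers are added
```

At 20,000 readings per second, a 3x difference in payload size alone is meaningful; the per-request header overhead of HTTP dwarfs it.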
## Edge vs. Cloud Processing
Not all data needs to travel to the cloud. Edge processing handles time-sensitive decisions locally, reducing latency, bandwidth costs, and cloud compute bills.
```mermaid
flowchart LR
    subgraph "Edge"
        S[Sensors] --> EF[Edge Filter<br/>Dedup, smoothing]
        EF --> EA[Edge Alerts<br/>Critical thresholds]
        EA -->|"ALERT: Temp > 95°C"| LOCAL[Local Actuator<br/>Shut down machine]
    end
    subgraph "Cloud"
        BR2[Message Broker] --> SP2[Stream Processor]
        SP2 --> TS2[(Time-Series DB)]
        SP2 --> ML[ML Anomaly Detection]
        TS2 --> DASH2[Dashboard]
    end
    EF -->|"Aggregated data<br/>1 reading/min"| BR2
    EA -->|Alert forwarded| BR2
```
The decision of what to process at the edge and what to send to the cloud depends on three factors.
Latency requirement. If a machine overheats, you need to shut it down in milliseconds, not wait for a round-trip to a cloud server 200ms away. Critical safety thresholds must be evaluated at the edge.
Bandwidth cost. Sending 20,000 raw readings per second to the cloud costs money (especially over cellular networks). The edge can aggregate readings (average over 1 minute instead of sending every 5-second sample), reducing bandwidth by 12x. Only anomalies and aggregates travel to the cloud.
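A one-minute rollup of the kind described above can be sketched in a few lines: per sensor, twelve 5-second samples collapse into a single (min, mean, max) aggregate, a 12x reduction in cloud-bound messages, while anomalous raw readings can still be forwarded unaggregated:

```python
from statistics import mean

# One-minute edge rollup: 12 samples at 5-second intervals become one
# aggregate message per sensor per minute (a sketch of the strategy above).
def rollup(samples: list[float]) -> dict:
    return {
        "min": min(samples),
        "mean": mean(samples),
        "max": max(samples),
        "n": len(samples),
    }

window = [72.0, 72.1, 72.0, 71.9, 72.2, 72.1,
          72.0, 72.3, 72.1, 72.0, 71.8, 72.2]
agg = rollup(window)
print(agg["n"], round(agg["mean"], 2))   # 12 72.06
```

Keeping min and max alongside the mean matters: a 1-minute average can smooth away a brief spike that the max would still reveal.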
Compute complexity. Simple threshold checks (temperature > 95°C) run fine on a gateway. ML-based anomaly detection (this vibration pattern looks like bearing failure) requires more compute and typically runs in the cloud, or on beefier edge servers for critical applications.
## Alert Thresholds as Balancing Loops
Alert systems in IoT follow balancing loop dynamics. When a sensor reading crosses a threshold, the system triggers a corrective action. That action brings the reading back below the threshold, which stops the alert. This is a negative feedback loop.
The challenge is tuning thresholds. Set them too low and you get alert fatigue: operators receive hundreds of alerts per shift, start ignoring them, and miss the real problems. Set them too high and you miss genuine anomalies until they become failures. The threshold acts as the goal in a balancing loop, and the gap between the current reading and the threshold determines the strength of the response.
Sophisticated systems use adaptive thresholds. Instead of a fixed number (alert if temperature > 80°C), they use statistical baselines (alert if temperature is more than 3 standard deviations above the rolling 24-hour average for this specific sensor). This accounts for normal variation: a sensor in a warm zone runs hotter than one in a cool zone, and a fixed threshold would either miss anomalies in the warm zone or cry wolf in the cool zone.
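The adaptive rule above reduces to a few lines. This sketch uses a short in-memory history in place of a real rolling 24-hour window; the sample values are made up for illustration:

```python
from statistics import mean, stdev

# Per-sensor adaptive threshold: alert when a reading sits more than
# k standard deviations above this sensor's own baseline, instead of
# applying one fixed number to every sensor (a sketch, not production code).
def is_anomalous(baseline: list[float], reading: float, k: float = 3.0) -> bool:
    mu, sigma = mean(baseline), stdev(baseline)
    return reading > mu + k * sigma

# A sensor in a warm zone: runs around 85° with a few tenths of noise.
warm_history = [85.0, 84.6, 85.4, 85.1, 84.8, 85.3, 84.9, 85.2, 84.7, 85.0]
print(is_anomalous(warm_history, 85.5))   # False -- normal for this sensor
print(is_anomalous(warm_history, 89.0))   # True  -- far above its baseline
```

A fixed 80°C threshold would fire constantly on this sensor; the per-sensor baseline stays quiet until the reading is genuinely unusual for that sensor.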
## OTA Firmware Updates
Deploying code to 100,000 devices in the field is not like deploying a web server update. You cannot SSH into each device. Many devices have limited storage (no room for two firmware images). Network connectivity is unreliable. And a failed update can brick a device, and recovering a bricked device means a physical truck roll.
Safe OTA (Over-The-Air) update strategies include: A/B partitions (device has two firmware slots, boots from the new one, rolls back to the old if health checks fail), staged rollouts (update 1% of devices, monitor for 24 hours, then 10%, then 100%), and delta updates (send only the binary diff, not the full firmware image, to reduce download size over slow networks).
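The staged-rollout gating logic can be sketched as below. The wave sizes (1%, 10%, 50%, 100%) and the 1% failure budget are illustrative assumptions, not a standard:

```python
# Staged-rollout sketch: each wave updates a slice of the fleet and only
# proceeds if the failure rate so far stays under a budget. Wave sizes
# and the 1% failure budget are illustrative assumptions.
WAVES = [0.01, 0.10, 0.50, 1.00]      # fraction of fleet after each wave

def next_wave(fleet_size: int, completed: int, failures: int,
              budget: float = 0.01) -> int:
    """Return how many devices to update next, or 0 to halt/finish."""
    if completed and failures / completed > budget:
        return 0                      # health check failed: halt, roll back
    done_frac = completed / fleet_size
    for target in WAVES:
        if target > done_frac:
            return int(target * fleet_size) - completed
    return 0                          # rollout complete

print(next_wave(100_000, 0, 0))        # 1000 -- first 1% canary wave
print(next_wave(100_000, 1_000, 2))    # 9000 -- 0.2% failures: go to 10%
print(next_wave(100_000, 1_000, 50))   # 0    -- 5% failures: halt
```

In practice the "failures" signal comes from the A/B health checks mentioned above: a device that boots the new slot, fails its checks, and rolls back reports itself as a failure before the next wave is sized.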
## Further Reading
- Time-Series Database for IoT: The Missing Piece (EMQX Blog)
- Patterns for IoT Time Series Data Ingestion with Amazon Timestream (AWS Database Blog)
- Building Industrial IoT Data Streaming Architecture with MQTT (HiveMQ Blog)
- MQTT with TimescaleDB for IoT Time-Series Data (EMQX Blog)
## Assignment
You are designing a monitoring system for a factory with 100,000 temperature sensors. Each sensor sends a reading every 5 seconds.
- Calculate the ingestion QPS. How many writes per second must the time-series database sustain?
- Design the pipeline from sensor to dashboard alert. Draw the architecture showing: sensors, edge gateways, MQTT broker, stream processor, time-series database, alert engine, and dashboard.
- A sensor reading exceeds 95°C. The alert must reach the operator within 500ms. Which parts of your pipeline are on the critical path? Can you meet the 500ms SLA if the data goes through the cloud, or must the alert fire at the edge?
- You want to reduce cloud bandwidth costs. Propose an edge aggregation strategy: what data do you aggregate locally, what do you send raw, and what is the resulting reduction in cloud-bound traffic?
- How do you handle a sensor that goes offline? What is the difference between "no reading" and "reading = 0"? Design the heartbeat mechanism.