An intelligent solution integrating existing Smart City systems, IoT networks and security systems of cities and municipalities — with up to 242 TOPS of AI-accelerated analytics, 462+ machine-learning models and a horizontally scalable parallel infrastructure.

Abstract

IOAS Security Cluster (IOASC) is a distributed platform for real-time analysis of imagery from municipal camera systems. It integrates the existing infrastructure of Smart City deployments, IoT sensor networks and security systems of cities and municipalities in Slovakia. Architecturally it combines a dedicated AI hardware accelerator delivering 240 TOPS, a library of more than 462 machine-learned models and a parallel/decentralised compute framework that allows a single physical unit to process up to 19 concurrent camera streams with end-to-end latency below 80 ms. This article expands on the three pillars that the original concept only touched upon: (1) the model-creation pipeline, (2) high-performance computing and (3) parallel and distributed compute strategies.

1. Cluster architecture

IOASC is an intelligent solution providing integration of existing Smart City systems, IoT networks and security systems of cities and municipalities in Slovakia. Using IOASC cluster components it is possible to implement state-of-the-art data analytics and processing methods even for generationally outdated systems, including legacy infrastructure. The components of the system can analyse — in parallel — data from camera systems with up to 242 Tera Operations Per Second (TOPS), creating a fast and intelligent solution for advanced image analysis on already installed cameras.

The system implements more than 462 models for data processing, allowing the following qualitative parameters and their interrelations to be identified in real time:

Based on the application of individual models it is possible to create any type of analytics function using machine learning, all in real time. The individual physical units are then connected into a cluster, which aggregates the resulting data and evaluates higher-order correlations:

These parameters are evaluated by the system across all connected cluster nodes; decentralisation of computational power guarantees real-time operation even with hundreds of parallel streams. Individual models are recursively tagged by the system itself and by operators (typically municipal police officers), enabling continuous classifier adaptation and a steady increase of output agreement with reality.

2. Model creation: the machine-learning pipeline

The library of 462+ models is not a static artefact — it is the product of a continuous MLOps cycle that IOAS operates for every client. The pipeline consists of seven stages, each independently containerised and horizontally scalable.

2.1 Data acquisition and curation

Input data come from three sources:

  1. RTSP/ONVIF streaming from the client’s existing camera systems — IOASC supports H.264, H.265 (HEVC) and AV1, with automatic detection and recovery from inconsistent keyframes or packet loss.
  2. Annotated corpora — for standard domains (vehicles, pedestrians, licence plates) IOAS maintains internal annotated datasets containing more than 2.4 million labelled frames (state of 2026 Q2), collected from real Slovak and Central-European municipal environments (varying lighting, weather, seasons).
  3. Operator history — municipal police forces and integrated emergency services provide feedback in the form of detection corrections, expanding the active-learning queue (section 2.4).

Before training, all data pass through a deidentification layer compatible with GDPR Art. 35 — faces of uninvolved persons are blurred, licence plates are not stored as raw characters but as one-hot embeddings for downstream tasks.

2.2 Model architectures

The IOASC library groups models into seven families by primary task:

Family Main architectures Typical performance (FPS @ INT8 on 240 TOPS NPU)
Object detection YOLOv8/v9, RT-DETR-L, EfficientDet-Lite 350 – 600 FPS at 1280×720
Multi-object tracking ByteTrack, BoT-SORT, StrongSORT 280 – 450 FPS
Automatic Number Plate Recognition (ANPR) LPRNet + CRNN, ParseQ for OCR 200 – 320 FPS
Action / behaviour recognition SlowFast R50, TimeSformer, MViTv2 60 – 110 FPS (short 32-frame clips)
Person re-identification OSNet-IBN, FastReID 700 – 900 FPS per embedding
Image enhancement (denoising) NAFNet, Restormer-S 90 – 150 FPS at 1080p
Anomaly detection PaDiM, PatchCore, AnomaLib stack 250 – 380 FPS

For each family IOAS maintains 3 – 5 versions at different accuracy ↔ latency trade-offs: *-tiny for edge nodes with limited budget, *-base for standard deployment, *-large for critical sites (central intersections, public-building entrances).

2.3 Training infrastructure

Training runs on a dedicated training cell of IOAS infrastructure with the following parameters:

Training a single production model takes between 6 hours (LPRNet fine-tune) and 5 days (RT-DETR-L on the proprietary 2.4 M-frame dataset).

2.4 Deployment optimisation (model serving)

Trained models pass through a three-stage compression before edge deployment:

  1. Structured pruning — removal of 30 – 60 % redundant weights via magnitude-based and Taylor-expansion criteria; reduces memory footprint while keeping > 99 % of the accuracy.
  2. Post-training INT8 calibration — calibration dataset of 1024 real frames, per-channel symmetric quantization; typical mAP loss < 0.8 %.
  3. Layer fusion + graph optimisation — Conv+BN+ReLU fusion, redundant reshape elimination, ONNX → TensorRT/Hailo-RT conversion.

The result is 2.2 – 4.7× faster inference at a 4× smaller memory footprint compared with the original FP32 model.

2.5 Active learning and continuous adaptation

In production, every detection with confidence below 0.75 automatically generates an active-learning candidate that is added to the annotation queue. The system operator (typically a municipal police officer) accesses an annotation console with the list of uncertain cases and labels them as true positive / false positive / false negative / re-classify. These corrections:

Drift detection runs 24/7 — it monitors confidence-score distribution, false-positive rate per category and per-camera. If F1 drops below threshold (typically 0.92 absolute or −3 σ from baseline), the system automatically escalates to the IOAS MLOps team.

2.6 Federated learning across clients

To maximise generalisation across different cities (Bratislava, Trenčín, Žilina, Košice — different architecture, cameras, traffic patterns), IOAS runs federated training rounds following FedAvg with differential privacy (DP-SGD, ε ≤ 3.0):

[Client 1: local update] ─┐
[Client 2: local update] ─┼─→  [Aggregator (IOAS HQ)]  ─→  [Global model v0.N+1]
[Client 3: local update] ─┤                                       │
                          ...                                     ▼
                                                       [Distribution back to clients]

Raw data never leave the client’s perimeter — only noised gradients and update-size metadata travel. This preserves regulatory compliance with GDPR and the Cybersecurity Act.

3. High-performance computing

3.1 Hardware acceleration (240 TOPS)

The heart of every IOASC node is an AI hardware accelerator with a declared 240 TOPS @ INT8 capability, deployable in two physical form factors:

Form Power envelope Memory Use case
In-box appliance 60 – 90 W 32 GB LPDDR5 (273 GB/s) Edge deployment next to the camera node, standalone
PCIe add-in card (HHHL or FHFL) 75 W (PCIe slot) 16 GB HBM3 (1.2 TB/s) Integration into the client’s existing server rack

The accelerator exposes a low-level C/C++ runtime API with primitives for tensor allocation, kernel launch, asynchronous memcpy and synchronization via CUDA-stream-like queues. The higher layer (ioasc-runtime) provides a zero-copy DMA link between the camera decoder (NVDEC or a dedicated hardware block) and the inference engine — the entire frame lifecycle stays in device memory until the analytical payload is emitted.

The accelerator supports:

3.2 Memory and network architecture

   ┌─────────────────────────────────────────────────────┐
   │ IOASC node                                          │
   │                                                     │
   │   [NPU 240 TOPS]  ←──→  [HBM3 1.2 TB/s]              │
   │         ↕                                           │
   │   [CPU host: 16-core ARMv9 / x86]                   │
   │         ↕                                           │
   │   [DDR5 dual-channel, 51 GB/s]                       │
   │         ↕                                           │
   │   [NVMe Gen4 SSD 4 TB]   (model cache + metadata)    │
   │         ↕                                           │
   │   [10 GbE management]   [100 GbE backbone (cluster)]  │
   └─────────────────────────────────────────────────────┘

The memory hierarchy is optimised for streaming workloads: the NPU works exclusively with tensors in HBM3 (low latency, high bandwidth), the CPU keeps metadata and coordination logic in DDR5, and NVMe serves as a local cache for model weights (cold start from NVMe takes ~ 800 ms per model, warm switch < 5 ms).

The inter-node fabric uses a 100 GbE backbone with RDMA (RoCE v2) — node-to-node tensor transfer runs over iWARP or RoCE in zero-copy mode, eliminating TCP/IP overhead during distributed inference (section 4.4).

3.3 Energy efficiency and thermal management

For 24/7 outdoor IP66-enclosed deployment IOAS optimised the thermal design:

3.4 Edge vs centralised compute — latency budget

For real-time applications the end-to-end latency budget from physical event to alarm dispatch is critical. IOASC keeps the bulk of inference at the edge:

Stage Edge deployment Centralised (cloud)
Sensor → encoder 8 – 16 ms 8 – 16 ms
Network transit < 1 ms (local network) 25 – 80 ms (WAN, over VPN)
Decode + preprocess 4 – 7 ms 4 – 7 ms
Inference (1 model) 2.5 – 6 ms (240 TOPS) 8 – 25 ms (shared GPU pool)
Postprocess + payload 1 – 2 ms 1 – 2 ms
Notification dispatch < 5 ms (local UDP) 20 – 60 ms
Σ end-to-end 20 – 38 ms 65 – 190 ms

Edge-first design delivers 3 – 5× lower latency and eliminates dependency on WAN connectivity — when the internet link is down, the cluster node continues local analysis and dispatches alerts via fall-back GSM/LTE channels.

4. Parallel computing

Parallelism in IOASC operates at five levels simultaneously, allowing the NPU to be saturated under heterogeneous workloads.

4.1 Stream-level parallelism

The base configuration of a single node serves 19 concurrent camera streams at 25 FPS, i.e. 475 frame/s in total. Streams are multiplexed via:

4.2 Pipeline parallelism

Single-frame processing is decomposed into five pipeline stages running in parallel on different frames (a classic producer-consumer chain):

   t=0 ms       t=5 ms       t=10 ms      t=15 ms      t=20 ms
   ┌──────┐
   │ FR-1 │ Decode
   └──┬───┘
      ↓     ┌──────┐
            │ FR-1 │ Preprocess     ← FR-2: Decode
            └──┬───┘
               ↓     ┌──────┐
                     │ FR-1 │ Inference  ← FR-2: Preprocess  ← FR-3: Decode
                     └──┬───┘
                        ↓     ┌──────┐
                              │ FR-1 │ Postprocess  ← FR-2: Inf  ← FR-3: Pre  ← FR-4: Dec
                              └──┬───┘
                                 ↓     ┌──────┐
                                       │ FR-1 │ Publish
                                       └──────┘

In steady state, 5 frames are simultaneously in flight at different stages. Throughput equals the slowest stage (typically inference) rather than the sum of all stages — yielding a 2.8 – 3.4× speed-up over synchronous processing.

4.3 Model and data parallelism

For large models (RT-DETR-L, MViTv2-L) that do not fit into a single HBM partition, IOASC supports:

The strategy is per-model, decided at graph compilation based on weight size, batch-size and available NPU partitions.

4.4 Inter-node sharding of the cluster

When deployed at scale — a city may have e.g. 12 IOASC nodes covering 228 streams (12 × 19) — the cluster realises functional sharding: rather than running all 462 models on every node, models are distributed:

   Node A: ANPR, vehicle classification               (models 1-128)
   Node B: pedestrian + cyclist behaviour              (models 129-238)
   Node C: incident + violence detection               (models 239-310)
   Node D: re-identification + cross-camera tracking   (models 311-396)
   Node E: anomaly + denoising postprocess             (models 397-462)

Each node publishes its analytical outputs as typed events onto a cluster event bus (NATS JetStream with persistent storage). Higher-order correlation functions (section 1) run on an aggregation node that subscribes to relevant topics and applies:

4.5 Asynchronous composition of analytical functions

Higher-order analytical functions are modelled as a DAG (directed acyclic graph) of inference operations that the cluster orchestrator schedules in parallel wherever there is no data dependency:

   Frame ──→ Detection ──┬─→ Tracking ──┬─→ ReID ────┐
                         │              │            ├─→ Multi-camera fusion
                         │              └─→ Behaviour┘
                         │
                         └─→ Classification ─→ ANPR ─→ Vehicle DB lookup

The orchestrator (an internal component built on Apache Arrow Flight + a custom DAG scheduler) starts each DAG node the moment its inputs are ready, achieving a near-optimal critical path. For a typical 19-stream workload the average NPU utilisation is 78 – 84 % (close to the theoretical maximum of batch-aware schedulers).

5. Model capacity and registry

The 462+ model library is versioned in an internal model registry with the following metadata per model:

The client deployment manifest (ioasc-deployment.yaml) declares which models are active on which node and at what priority — the runtime fetches them automatically from the model registry, applies policy validation (compliance check, signature verification) and activates them at runtime without interrupting running streams (canary rollout 5 → 50 → 100 % traffic).

6. Security architecture

The solution is fully compatible with EU legislation, the Cybersecurity Act (Slovak Act No. 69/2018 as amended by Act No. 366/2024 — NIS2 implementation) and decrees of the Slovak National Security Authority. The software system therefore includes:

The implementation also includes peer-to-peer blockchain verification of authority for individual components on the network — every node carries a cryptographic identity with a certificate in the cluster’s trust chain. Any configuration change, model deployment or new node join is recorded in a distributed ledger, fundamentally minimising the risk of information leakage, unauthorised reconfiguration or any other form of misuse.

7. Deployment scenarios

Scenario Performance Target group Reference capacity
Standalone in-box 1 node, 240 TOPS Small municipality, 1 – 19 cameras 1 intersection
PCIe add-in card 1 node, 240 TOPS City with an existing server 19 streams
Hybrid edge + central N edge + 1 aggregator City with 5 – 50 cameras 95 – 500 streams
Full cluster 5+ nodes, 1.2+ PetaOPS City with 100+ cameras 500 – 2 000 streams

The implementation is modular: a customer typically starts with a standalone or PCIe add-in (fast pilot phase, ROI < 9 months) and later expands into a full cluster without reconfiguring already deployed nodes.

8. Conclusion

IOAS Security Cluster represents the convergence of three trends: edge AI acceleration (240 TOPS @ < 90 W), vertically specialised ML models (462+ with an active retraining pipeline) and distributed/parallel computing (5 levels of parallelism, P2P coordination). The architecture is designed to integrate existing camera and IoT infrastructure without forcing hardware replacement, while remaining fully compliant with EU and Slovak legislation.

For municipalities that already own a camera system and are looking for a way to extract operational and analytical value from it, IOASC offers the path from passive recording to an active, predictive and auditable security platform.

Want to know more? Get in touch.