High system CPU usage on CC100 with PREEMPT_RT kernel – HRTIMER / NET_RX softirq storm

Hello everyone,

I’m looking for insights regarding a high system CPU usage issue on a CC100 controller.

System details

  • Device: WAGO CC100
  • Kernel:
    Linux 5.15.107-rt62-w04.02.03 #1 SMP PREEMPT_RT
    armv7l
  • Single-core CPU
  • CODESYS: disabled
  • User application: SCADA, written in Python
    • Read/communication cycle: every 10 seconds
    • No busy loops or high-frequency polling

Despite the low-frequency application logic, top shows ~100% CPU usage, with about 70–80% of it attributed to kernel/system time.

Clarification

  • The issue persists even when the SCADA application is stopped
  • The same behavior is observed with and without Docker
  • Disabling observability / monitoring tools (metrics, exporters, etc.) does not change CPU usage

This further suggests the problem is not related to containers, monitoring, or user-space tooling.

Observations

/proc/interrupts (excerpt)

  • Very high interrupt count on:
    • arch_timer (~1.5 billion)
    • eth0 (~57 million)
    • IRQ work interrupts (IPI5)

/proc/softirqs
HRTIMER: 1216929071
TIMER: 191172500
NET_RX: 221283404

This strongly suggests a timer + network softirq storm, not user-space load.
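
For anyone who wants to check this on their own device: the counters in /proc/softirqs are cumulative since boot, so a per-second rate says more than the raw totals. Below is a minimal sketch that samples the file twice and reports the rate per softirq type. The function names and the 5-second interval are my own choices for illustration, not anything from WAGO.

```python
import time
from pathlib import Path


def parse_softirqs(text):
    """Sum the per-CPU counters of /proc/softirqs-style text into {name: total}."""
    totals = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the header line ("CPU0 ...") has no colon
        fields = rest.split()
        if fields and all(f.isdigit() for f in fields):
            totals[name.strip()] = sum(int(f) for f in fields)
    return totals


def softirq_rates(before, after, seconds):
    """Events per second for every softirq type present in both samples."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def sample_rates(interval=5.0, path="/proc/softirqs"):
    """Read the file twice, `interval` seconds apart, and return the rates."""
    first = parse_softirqs(Path(path).read_text())
    time.sleep(interval)
    second = parse_softirqs(Path(path).read_text())
    return softirq_rates(first, second, interval)
```

Calling sample_rates() on the controller and looking at the HRTIMER, TIMER and NET_RX entries before and after a change makes it easy to see whether the change actually reduced the wakeup rate.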

CPU idle behavior
From repeated /proc/stat checks:

  • The CPU can enter idle
  • But the idle counter barely increases
  • CPU is constantly being woken up by timers and softirqs
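
The /proc/stat check can be made repeatable with a small helper. This is only a sketch: it takes two snapshots of the aggregate "cpu" line and turns the differences into percentages, so the idle share can be compared directly with the system and softirq shares. The field order follows the proc(5) man page; the function names are mine.

```python
# Field order of the aggregate "cpu" line, per proc(5); guest fields are ignored.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(text):
    """Return {field: jiffies} from the aggregate 'cpu' line of /proc/stat text."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS if k in before and k in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {k: 100.0 * d / total for k, d in deltas.items()}
```

To use it, read /proc/stat, wait a few seconds, read it again, and pass both parsed samples to cpu_shares(). A small "idle" share next to large "system" or "softirq" shares matches what I saw.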

Boot parameters
cat /proc/cmdline
No idle=poll, nohlt, or other parameters forcing busy idle.
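
If you want to script the same check, here is a minimal sketch. It only looks for the two parameters named above; the list is not exhaustive and the function name is my own.

```python
# Only the parameters mentioned in this post; other options may also affect idle.
BUSY_IDLE_PARAMS = ("idle=poll", "nohlt")


def busy_idle_params(cmdline):
    """Return the busy-idle parameters found in a kernel command line string."""
    return [token for token in cmdline.split() if token in BUSY_IDLE_PARAMS]
```

Pass it the contents of /proc/cmdline; an empty list means neither parameter is set.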

Important clarification
The SCADA application:

  • Runs every 10 seconds
  • Uses blocking I/O
  • Is not CPU-intensive
  • Reducing or stopping it does not significantly change system CPU usage

This points to a kernel / driver / RT configuration issue, not an application bug.

My understanding so far
It seems this is related to:

  • PREEMPT_RT kernel behavior on a single-core system
  • Very frequent HRTIMER activity
  • High NET_RX softirq rate (even with modest network usage)

While this may be expected behavior for an RT kernel, it significantly reduces idle time and margin for real-time workloads.

Questions

  1. Is this level of HRTIMER / NET_RX activity considered normal on CC100 with PREEMPT_RT?
  2. Has anyone successfully reduced CPU usage via:
    • Network driver tuning (interrupt coalescing, NAPI behavior)?
    • Kernel configuration (HZ, timer settings)?
    • Alternative firmware / kernel variants (PREEMPT instead of RT)?
  3. Are there known recommendations from WAGO for SCADA-style workloads on CC100 where hard real-time is not required?

Any shared experience, tuning advice, or confirmation would be greatly appreciated.

Thanks in advance!

** SOLVED **

Root cause identified.

The issue was caused by the ipwatchd service interacting badly with Docker-created network bridges.

ipwatchd was repeatedly trying to monitor Docker bridge interfaces (br-xxxx), generating continuous log spam like:
“Could not retrieve IP address of the device br-xxxx”

This resulted in:

  • Constant wakeups
  • High HRTIMER and NET_RX softirq activity
  • Near 100% system CPU usage on a single-core PREEMPT_RT system
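
To see quickly whether a device is hit by the same log spam, the messages can be counted per interface. The sketch below matches only the message text quoted above; where the log lives on your firmware is not something I can state here, so the function simply takes any iterable of log lines (an open file object works).

```python
import re
from collections import Counter

# Matches the message quoted above and captures the interface name.
PATTERN = re.compile(r"Could not retrieve IP address of the device (\S+)")


def count_ipwatchd_errors(log_lines):
    """Count 'Could not retrieve IP address' messages per interface name."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            # Drop trailing punctuation or quotes that may follow the name.
            counts[match.group(1).rstrip('."')] += 1
    return counts
```

A count that keeps climbing for br-... interfaces while the log is re-read is a good hint that ipwatchd is the culprit.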

Stopping ipwatchd immediately resolves the issue:
/etc/init.d/ipwatchd stop

No network disruption was observed:

  • VPN connection remained active
  • Main Ethernet (br0 / eth0) continued working normally
  • SCADA application unaffected

Inspection of /etc/ipwatchd.conf shows that Docker bridges are automatically added to the configuration, even if only as passive. The file is also auto-generated and overwritten, so manually excluding the bridges from it is unreliable.
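
To check which Docker bridges ended up in the configuration, something like the sketch below can help. It assumes the file uses "iface <name> <active|passive>" lines, as upstream IPwatchD does; I have not verified that the WAGO-generated file follows exactly this layout, so treat the parsing as an assumption. The function name and the "br-" prefix filter are mine.

```python
def docker_bridge_entries(conf_text, prefix="br-"):
    """Return (interface, mode) pairs for 'iface' lines whose name starts with prefix.

    Assumes 'iface <name> <mode>' lines; '#' starts a comment.
    """
    entries = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "iface" and parts[1].startswith(prefix):
            entries.append((parts[1], parts[2]))
    return entries
```

Feed it the contents of /etc/ipwatchd.conf; any entries returned are bridges that ipwatchd will try to watch after the file is regenerated.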

Hope this helps anyone facing similar high system CPU usage on CC100.
