auditd: Error receiving audit netlink packet (No buffer space available) #14735

@comps

Description of problem:

This message sometimes appears in syslog on (at least) RHEL-10.2 with the hipaa profile remediated:

auditd: Error receiving audit netlink packet (No buffer space available)

From a deep AI-powered analysis, this appears to be something the content could mitigate (or soften) with an additional sysctl setting, namely net.core.rmem_default=8388608 (8MB or more).

Note that the systemd (journald) project itself raises the receive buffer for its own audit-interfacing code (when it's active, i.e. on an OS without auditd), but it uses an even more extreme 128M instead of 8M.

This should presumably be set for all profiles where the content adds audit rules (or at least rules that could produce a LOT of reports).

SCAP Security Guide Version:

master @ 3bcae1b

Operating System Version:

RHEL-10.2, possibly others too

Steps to Reproduce:

  1. Run /scanning/boot-errors/hipaa (Contest) many times until you get lucky and hit a race that triggers this issue

Additional Information/Debugging Steps:

See detailed research results here

Analysis: auditd ENOBUFS and the Netlink Socket Buffer

The Error

auditd: Error receiving audit netlink packet (No buffer space available)

Emitted by lib/netlink.c:97 when recvfrom() returns ENOBUFS. This is a netlink socket buffer congestion notification, distinct from the kernel-side audit: backlog limit exceeded message. It does not necessarily indicate audit record loss.

Root Cause: Producer/Consumer Speed Mismatch

The kernel's kauditd_thread drains audit_queue in a tight batch loop (kernel/audit.c:798-833), calling netlink_unicast() for each message with no sleep between successful sends. netlink_unicast() returns as soon as the message is placed in auditd's socket receive buffer via __netlink_sendskb() -- it does not wait for auditd to call recvfrom(). Each successful send takes on the order of hundreds of nanoseconds.

auditd, by contrast, processes events synchronously in a single-threaded libev event loop: recvfrom() -> format_event() -> fprintf() to the log file -> dispatch_event() to plugins. With flush = DATA or SYNC, each fprintf() triggers a blocking write() syscall. A single write to disk can take 5-15 ms on spinning media and spike higher under journal pressure.

The default socket receive buffer (net.core.rmem_default) is 212,992 bytes (~208 KB). auditd calls no setsockopt() after socket() -- no SO_RCVBUF, no SO_RCVBUFFORCE, no NETLINK_NO_ENOBUFS (confirmed across the full git history). At ~500-byte typical messages, the buffer holds roughly 400 messages. kauditd can fill this in well under a millisecond; auditd needs tens of milliseconds to drain it if disk I/O is involved.
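
The capacity estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch only: the 500-byte message size is the figure quoted above, the 500 ns per-send cost is an assumed midpoint of "hundreds of nanoseconds", and per-skb kernel bookkeeping overhead is ignored, so real capacity is likely lower.

```python
# Back-of-the-envelope estimate of how many audit messages fit in the
# netlink receive buffer. Ignores per-skb bookkeeping overhead.

RMEM_DEFAULT = 212_992      # bytes, default net.core.rmem_default quoted above
RMEM_PROPOSED = 8_388_608   # bytes, value proposed in this issue
MSG_SIZE = 500              # bytes, typical message size quoted above
SEND_COST_NS = 500          # ASSUMPTION: midpoint of "hundreds of nanoseconds"


def messages_that_fit(buffer_bytes: int, msg_bytes: int = MSG_SIZE) -> int:
    """Whole messages that fit in a buffer of the given size."""
    return buffer_bytes // msg_bytes


def fill_time_us(buffer_bytes: int) -> float:
    """Microseconds for the sender to fill the buffer at SEND_COST_NS per send."""
    return messages_that_fit(buffer_bytes) * SEND_COST_NS / 1000


if __name__ == "__main__":
    for name, size in (("default", RMEM_DEFAULT), ("proposed", RMEM_PROPOSED)):
        print(f"{name}: {messages_that_fit(size)} messages, "
              f"filled in ~{fill_time_us(size):.0f} us")
```

Under these assumptions the default buffer fills in a fraction of a millisecond, while the 8 MB buffer gives auditd several milliseconds of slack.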

When the buffer is full and kauditd's 100 ms send timeout (sk_sndtimeo = HZ/10, set in audit_net_init()) expires, netlink_attachskb() calls netlink_overrun(), which sets ENOBUFS on auditd's socket and the NETLINK_S_CONGESTED flag. The congested flag rejects all subsequent sends -- even if the buffer has room -- until auditd drains the queue completely empty. This amplifies brief spikes into longer disruptions.
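
The amplification effect of the congested flag can be illustrated with a toy model. This is not kernel code: `ToyNetlinkSocket` is a hypothetical class, capacity is counted in messages rather than bytes, and the 100 ms send timeout is collapsed into an immediate failure. It only mirrors the behaviour described above, where sends stay rejected until the receiver has emptied the queue.

```python
from collections import deque
from typing import Optional


class ToyNetlinkSocket:
    """Toy model of the receive side described above; not kernel code."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # max queued messages (stand-in for the rcvbuf)
        self.queue = deque()
        self.congested = False     # stand-in for NETLINK_S_CONGESTED

    def send(self, msg: str) -> bool:
        """Sender side: refuse while congested or when the queue is full."""
        if self.congested or len(self.queue) >= self.capacity:
            self.congested = True
            return False
        self.queue.append(msg)
        return True

    def recv(self) -> Optional[str]:
        """Receiver side: the flag clears only once the queue is empty."""
        if not self.queue:
            return None
        msg = self.queue.popleft()
        if not self.queue:
            self.congested = False
        return msg


if __name__ == "__main__":
    sock = ToyNetlinkSocket(capacity=3)
    print([sock.send(m) for m in "abcd"])  # the fourth send overruns
    sock.recv()                            # one slot is free again...
    print(sock.send("e"))                  # ...but the send is still rejected
    sock.recv(); sock.recv()               # queue fully drained
    print(sock.send("f"))                  # accepted again
```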

Messages Are Preserved, Not Lost

Before each netlink_unicast(), kauditd_send_queue() calls skb_get(skb) to take an extra reference (line 815: "grab an extra skb reference in case of error"). When the send fails and kfree_skb() drops one reference, the skb survives at refcount 1. The error hook re-queues it: main queue -> retry queue (up to 5 retries) -> hold queue. audit_log_lost() -- the only function that increments the kernel's lost counter -- is called only when these queues overflow past audit_backlog_limit. ENOBUFS on the recv side is a congestion warning, not proof of data loss. Verify with auditctl -s | grep lost and dmesg | grep 'kauditd.*overflow'.
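
The `auditctl -s` check can be scripted, for example in a test harness. The sketch below assumes the status output is one `key value` pair per line, as recent audit userspace versions print it; older versions that print a single `key=value` line are not handled. The function names are hypothetical helpers, not part of any audit library.

```python
import subprocess


def parse_audit_status(text: str) -> dict:
    """Parse 'key value' lines into a dict, converting numeric values to int."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1]
        status[key] = int(value) if value.isdigit() else value
    return status


def records_lost(status: dict) -> bool:
    """True if the kernel's lost counter is present and non-zero."""
    lost = status.get("lost", 0)
    return isinstance(lost, int) and lost > 0


def read_live_status() -> dict:
    """Run 'auditctl -s' (normally requires root) and parse its output."""
    out = subprocess.run(["auditctl", "-s"], capture_output=True,
                         text=True, check=True).stdout
    return parse_audit_status(out)
```

A non-zero `lost` value would indicate real record loss, whereas the ENOBUFS message alone with `lost 0` would be consistent with the congestion-only reading above.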

Common Triggers (from 17 real-world reports spanning 2006-2026)

| Trigger | Mechanism |
|---|---|
| Log rotation | rotate_logs() runs synchronously: fclose() (fsync) + rename loop + open(). 30-200 ms of blocked recv. |
| Disk I/O stall | fprintf() with O_SYNC/O_DSYNC blocks the event loop. Even SSDs spike under journal commits. |
| Enriched log format + NSS | log_format = ENRICHED triggers getpwuid()/getgrgid() per event. LDAP/SSSD lookups: 1-100 ms each. |
| SELinux AVC floods | Container workloads with missing policy generate thousands of AVC denials/sec. |
| Slow dispatcher plugins | A backed-up SIEM agent on af_unix causes the plugin queue to fill; enqueue() retries block the event loop 6 ms per event. |
| Boot-time backlog flush | Fixed in kernel 6.7+ (022732e3d846): ACK is now sent before auditd_conn is set, preventing the backlog from flooding the socket before the registration reply is delivered. Note, however, that RHEL-10.2 has kernel 6.12+, which already includes this fix. |

journald Interaction

systemd-journald subscribes to AUDIT_NLGRP_READLOG multicast and never sends AUDIT_STATUS_PID (journald-audit.c). It cannot steal the unicast channel; the kernel returns -EEXIST on dual registration. However, journald's multicast subscription adds per-message overhead: kauditd calls skb_copy() (full deep copy, not clone) for every audit record before the unicast send, and netlink_broadcast_filtered() calls yield() when the multicast receiver's buffer exceeds 50% capacity. Notably, systemd-journald-audit.socket sets ReceiveBuffer=128M on the multicast socket -- roughly 630x the 208 KB default that auditd's socket gets.

A regression in systemd 258 caused journald to enable the kernel audit subsystem (AUDIT_STATUS_ENABLED) even when Audit= was explicitly empty (intended to mean "keep current state"). Fixed in v258.1 / PR #39069.

Mitigations

  1. sysctl -w net.core.rmem_default=8388608 -- increases the socket buffer from 208 KB to 8 MB
  2. flush = INCREMENTAL_ASYNC in auditd.conf -- avoids O_SYNC per-write blocking
  3. log_format = RAW -- eliminates NSS lookups during event formatting
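
Mitigations 2 and 3 can be checked against an existing configuration with a small script. This is a sketch under assumptions: it treats auditd.conf as plain `key = value` lines with `#` comments, and the helper names are hypothetical. It only flags the settings that the analysis above links to a blocked event loop.

```python
def parse_auditd_conf(text: str) -> dict:
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def blocking_settings(settings: dict) -> list:
    """Return notes on settings the analysis ties to a slow event loop."""
    notes = []
    if settings.get("flush", "").upper() in {"DATA", "SYNC"}:
        notes.append("flush forces a blocking write per event")
    if settings.get("log_format", "").upper() == "ENRICHED":
        notes.append("log_format = ENRICHED adds NSS lookups per event")
    return notes


if __name__ == "__main__":
    sample = "# sample config\nflush = SYNC\nlog_format = ENRICHED\n"
    for note in blocking_settings(parse_auditd_conf(sample)):
        print(note)
```

To check a real system, the contents of the configuration file (usually /etc/audit/auditd.conf) would be passed to `parse_auditd_conf()` instead of the sample string.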

Labels

RHEL (Red Hat Enterprise Linux product related), RHEL10 (Red Hat Enterprise Linux 10 product related), productization-issue (Issue found in upstream stabilization process), triaged
