Subpath Mounted File is Empty in Container, Bug?

Recently, I ran into a very strange phenomenon: in a newly created cluster, a file mounted with subPath showed up empty inside the container. I checked the syntax and configuration, and everything looked correct. I was not hitting the known bug where a non-existent ConfigMap subPath is mounted as an empty file, nor issue 54514. Moreover, when I mounted the ConfigMap directly (without subPath), the key’s contents showed up fine in the container. It really felt like I had run into a bug!
1 Symptom
In the example below, I mount the ConfigMap’s config.conf as a subPath, but inside the container the mounted file /etc/kubernetes/config.conf is empty:
# cat test.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: test
  labels:
    app: test
data:
  config.conf: |-
    test-data
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: test
  name: test
spec:
  selector:
    matchLabels:
      k8s-app: test
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: test
    spec:
      containers:
      - name: test
        image: busybox
        imagePullPolicy: IfNotPresent
        command:
        - tail
        - -f
        - /dev/null
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /etc/kubernetes/config.conf
          name: test-config
          subPath: config.conf
          readOnly: true
      volumes:
      - name: test-config
        configMap:
          name: test
# kubectl apply -f test.yaml
# kubectl exec -it -n kube-system test-26qvd -- sh
/ # ls -li /etc/kubernetes/
total 4
3283365 -rw-r--r-- 1 root root 0 Jun 4 13:29 config.conf
1.1 Troubleshooting Steps
Since this is a new cluster running Kubernetes 1.33.1, I initially suspected a bug in Kubernetes, the kernel, or containerd. I tried:
- Reading the subPath docs and searching Kubernetes issues: my usage was correct and no relevant issue existed (so perhaps a new, undiscovered bug?).
- Spinning up the same Kubernetes version in kind: subPath mounts worked fine there (which rules out a version bug; maybe a config issue?).
- Copying kind’s kubelet configuration: the problem persisted (the kernel version in the kind cluster was different; could it be a kernel bug?).
- Changing the kernel version: the problem persisted.
- Enabling debug logs around kubelet’s subPath logic (setting -v=5): kubelet did perform the bind mount, and the mount still existed when doBindSubPath returned.
- Aligning the runc version with kind’s: the problem persisted.
2 How It Works
When kubelet starts the container, it prepares all necessary mounts (ConfigMaps, Secrets, projected volumes, /etc/resolv.conf, /etc/hosts, subPaths, etc.). It then sends a CreateContainer request via CRI, passing both the host path for each subPath and the target container path. The container runtime then bind-mounts the host path into the container.
Two directories are involved:
- Kubelet’s store for Secrets/ConfigMaps on the host:
  {kubelet-root}/pods/{pod UID}/volumes/kubernetes.io~{configmap|secret}/{volume name}
- The per-subPath host file:
  {kubelet-root}/pods/{pod UID}/volume-subpaths/{volume name}/{container name}/{index}
  (index is the order the subPath appears in the container’s volumeMounts, starting from 0)
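To find these two paths on a node, the templates above can be filled in with a small shell helper. This is only a sketch: subpath_host_paths is a hypothetical function name and the argument order is my own choice, not something kubelet provides.

```shell
# Hypothetical helper: prints the two host paths described above.
# Args: kubelet root dir, pod UID, volume type (configmap|secret),
#       volume name, container name, volumeMounts index.
subpath_host_paths() {
  root=$1 pod_uid=$2 vol_type=$3 volume=$4 container=$5 index=$6
  # Directory where kubelet stores the ConfigMap/Secret content.
  printf '%s/pods/%s/volumes/kubernetes.io~%s/%s\n' \
    "$root" "$pod_uid" "$vol_type" "$volume"
  # Bind-mount target that is later handed to the container runtime.
  printf '%s/pods/%s/volume-subpaths/%s/%s/%s\n' \
    "$root" "$pod_uid" "$volume" "$container" "$index"
}
```

For the pod in this article, the call would be subpath_host_paths /data/kubernetes/kubelet 98bc1caf-a4e9-437a-a9b6-85e85ede4ccc configmap test-config test 0.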
Kubelet first saves the subPath content of the secret or configMap on the host (path includes a timestamp), then creates an empty file for the subPath, and finally bind-mounts the subPath content onto the empty file.
Bind mounting here is done in a special way: the file descriptor (fd) of the subPath content file is bind-mounted onto the empty subPath file.
For the details, see doBindSubPath in the Kubernetes 1.33 source code.
Steps involved:
- Open the subPath file in the secret/configMap directory to get the fd.
- Bind mount this fd onto the empty subPath file.
- Remount with options.
- Close the fd.
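The four steps can be approximated from a shell. This is a rough sketch, not kubelet’s implementation: bind_subpath and the MOUNT_CMD override are hypothetical names, and it assumes the mount child process inherits fd 9 from the shell, which is why it uses /proc/self/fd/9 where kubelet passes /proc/<kubelet pid>/fd/<n>. It needs root and should only be tried on scratch files.

```shell
# MOUNT_CMD is a hypothetical override: set MOUNT_CMD=echo to print the
# mount commands instead of executing them.
MOUNT_CMD="${MOUNT_CMD:-mount}"

bind_subpath() {
  src=$1      # file inside the ConfigMap/Secret volume directory
  target=$2   # empty file that is later handed to the container runtime
  [ -r "$src" ] || return 1
  exec 9<"$src"               # 1. open the source file and keep it as fd 9
  # 2. bind-mount the fd onto the empty file, then 3. remount with options.
  "$MOUNT_CMD" --no-canonicalize -o bind /proc/self/fd/9 "$target" &&
    "$MOUNT_CMD" --no-canonicalize -o bind,remount,noatime /proc/self/fd/9 "$target"
  rc=$?
  exec 9<&-                   # 4. close the fd; the bind mount stays in place
  return "$rc"
}
```

Mounting through the already opened descriptor, rather than through the path, means the mount source is the file that was opened even if the path is swapped afterwards.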
2.1 Steps in Our Case
- Kubelet saves the ConfigMap key config.conf to /data/kubernetes/kubelet/pods/<podUID>/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf
- It creates an empty file at /data/kubernetes/kubelet/pods/<podUID>/volume-subpaths/test-config/test/0
- It opens the timestamped config.conf and gets its fd
- It bind-mounts that fd onto the empty file
- It does a remount for noatime etc., then closes the fd
3 Case Analysis
Kubelet logs show the bind mount:
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.913922 20136 subpath_linux.go:232] bind mounting "/proc/20136/fd/19" at "/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0"
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.913966 20136 mount_linux.go:260] Mounting cmd (mount) with arguments (--no-canonicalize -o bind /proc/20136/fd/19 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0)
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.917275 20136 mount_linux.go:260] Mounting cmd (mount) with arguments (--no-canonicalize -o bind,remount,noatime /proc/20136/fd/19 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0)
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.920550 20136 subpath_linux.go:238] Bound SubPath /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf into /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.921777 20136 subpath_linux.go:203] subpath "/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf" is still mounted, mounts: [/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0]
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.921855 20136 kubelet_pods.go:382] "Mount has propagation" pod="kube-system/test" containerName="test" volumeMountName="test-config" propagation="PROPAGATION_PRIVATE"
Use audit to monitor mount and umount syscalls
Here, /proc/20136/fd/19 is the fd for the file /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf, whose inode is 3283288.
The inode of /data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0 was originally 3283365; after the bind mount it became 3283288.
However, when runc mounts /data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0 into the container, the inode is 3283365 again (the inode of the original empty file before the mount).
# Set up monitoring for mount and umount system calls
# auditctl -a always,exit -F arch=b64 -S mount,umount2 -F key=mount_operations
# Query mount records
# ausearch -k mount_operations | less
# This performs a bind mount
time->Wed Jun 4 13:00:29 2025
type=PROCTITLE msg=audit(1749042029.732:830): proctitle=6D6F756E74002D2D6E6F2D63616E6F6E6963616C697A65002D6F0062696E64002F70726F632F32303133362F66642F3139002F646174612F6B756265726E657465732F6B7562656C65742F706F64732F38386134663230322D333065382D343364312D383561622D3831346139333639653866392F766F6C756D652D73756270
type=PATH msg=audit(1749042029.732:830): item=1 name="/proc/20136/fd/19" inode=3283288 dev=08:11 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1749042029.732:830): item=0 name="/data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042029.732:830): cwd="/"
type=SYSCALL msg=audit(1749042029.732:830): arch=c000003e syscall=165 success=yes exit=0 a0=55b2728bb990 a1=55b2728bb9b0 a2=55b2728bba20 a3=1000 items=2 ppid=20136 pid=27630 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mount" exe="/usr/bin/mount" subj=unconfined key="mount_operations"
----
# This performs a bind remount
time->Wed Jun 4 13:00:29 2025
type=PROCTITLE msg=audit(1749042029.736:831): proctitle=6D6F756E74002D2D6E6F2D63616E6F6E6963616C697A65002D6F0062696E642C72656D6F756E742C6E6F6174696D65002F70726F632F32303133362F66642F3139002F646174612F6B756265726E657465732F6B7562656C65742F706F64732F38386134663230322D333065382D343364312D383561622D3831346139333639
type=PATH msg=audit(1749042029.736:831): item=0 name="/data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0" inode=3283288 dev=08:11 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042029.736:831): cwd="/"
type=SYSCALL msg=audit(1749042029.736:831): arch=c000003e syscall=165 success=yes exit=0 a0=59c7ea4eba50 a1=59c7ea4eba70 a2=59c7ea4ebae0 a3=1420 items=1 ppid=20136 pid=27631 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mount" exe="/usr/bin/mount" subj=unconfined key="mount_operations"
# During runc operation, inode becomes 3283365 (the inode of the original empty file before mounting)
time->Wed Jun 4 13:00:30 2025
type=PROCTITLE msg=audit(1749042030.001:873): proctitle=72756E6300696E6974
type=PATH msg=audit(1749042030.001:873): item=1 name="/data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1749042030.001:873): item=0 name="/proc/thread-self/fd/8" inode=3283395 dev=00:3c mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042030.001:873): cwd="/run/containerd/io.containerd.runtime.v2.task/k8s.io/121b99326ba68e3578e29f419e3f996518996faecf7eec227ba5211cafd97493/rootfs"
type=SYSCALL msg=audit(1749042030.001:873): arch=c000003e syscall=165 success=yes exit=0 a0=c00009a310 a1=c0000c95d8 a2=c000192214 a3=5001 items=2 ppid=27639 pid=27651 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc:[2:INIT]" exe="/runc" subj=unconfined key="mount_operations"
The ppid 20136 in the two mount records is the kubelet process:
# ps aux | grep 20136
root 20136 3.5 1.7 2140944 70756 ? Ssl 10:38 6:41 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.10 --root-dir=/data/kubernetes/kubelet --v=5
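The proctitle fields in the audit records above are the command line of the process, hex-encoded with NUL bytes between the arguments. A small decoder makes them readable. This is a minimal sketch: decode_proctitle is a hypothetical helper name, and it assumes plain ASCII arguments.

```shell
# Hypothetical helper: decode an audit proctitle hex string into a command line.
decode_proctitle() {
  printf '%s\n' "$1" | awk '{
    digits = "0123456789ABCDEF"
    hex = toupper($0); out = ""
    for (i = 1; i < length(hex); i += 2) {
      # Convert one pair of hex digits to its byte value.
      v = (index(digits, substr(hex, i, 1)) - 1) * 16 + index(digits, substr(hex, i + 1, 1)) - 1
      # NUL separates arguments; show it as a space.
      out = out (v == 0 ? " " : sprintf("%c", v))
    }
    print out
  }'
}
```

For example, decode_proctitle 72756E6300696E6974 prints "runc init", which identifies the third record as runc’s init process, and the first record decodes to the same mount --no-canonicalize -o bind command that appears in the kubelet log.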
Check the file inode
# ll -ih /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
3283365 -rw-r----- 1 root root 0 Jun 4 13:29 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
# ll -ih /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf
3283288 -rw-r--r-- 1 root root 1.4K Jun 4 13:29 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf
Check the file mount
No mount was found in the host mount namespace, but the mount is visible in the kubelet process’s mount namespace.
# findmnt /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
# findmnt in the kubelet process
# findmnt -N 20136 -o SOURCE,TARGET,PROPAGATION,OPTIONS,VFS-OPTIONS /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
SOURCE TARGET PROPAGATION OPTIONS VFS-OPTIONS
/dev/sdb1[/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf] /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0 shared,slave rw,noat rw,noatime
4 Root Cause: Mount Namespace Propagation
The analysis above shows that the subPath bind mount exists in the kubelet process’s mount namespace but is not visible from a shell on the host.
This is a matter of mount propagation between mount namespaces.
The root cause is that kubelet runs in a different mount namespace from the rest of the system, and the subPath mount point in kubelet’s namespace has the propagation setting shared,slave: mounts made in kubelet’s namespace do not propagate to the master (host) peer group, while mounts made on the host do propagate into kubelet’s namespace. Therefore containerd and the shell, which run in the host mount namespace, cannot see the subPath bind mount, and what gets mounted into the container is the original empty file.
# ll /proc/20136/ns/mnt # mount namespace of kubelet process
lrwxrwxrwx 1 root root 0 Jun 5 02:56 /proc/20136/ns/mnt -> 'mnt:[4026532279]'
# ll /proc/1/ns/mnt
lrwxrwxrwx 1 root root 0 Jun 5 02:55 /proc/1/ns/mnt -> 'mnt:[4026531841]'
# ll /proc/self/ns/mnt
lrwxrwxrwx 1 root root 0 Jun 5 02:57 /proc/self/ns/mnt -> 'mnt:[4026531841]'
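The comparison above can be scripted. A minimal sketch follows; same_mount_ns is a hypothetical helper name, and reading the ns link of another user’s process typically requires root.

```shell
# Hypothetical helper: succeeds (exit 0) when both PIDs share a mount namespace,
# returns 1 when they differ and 2 when a namespace link cannot be read.
same_mount_ns() {
  ns_a=$(readlink "/proc/$1/ns/mnt") || return 2
  ns_b=$(readlink "/proc/$2/ns/mnt") || return 2
  [ "$ns_a" = "$ns_b" ]
}
```

With the PIDs from this article, same_mount_ns 1 20136 || echo "kubelet is not in the host mount namespace" would print the warning, since the two links point to different namespace IDs.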
# cat /proc/1/mountinfo | grep "159 "
62 25 8:17 / /data rw,noatime shared:159 - ext4 /dev/sdb1 rw
# cat /proc/20136/mountinfo | grep "159 "
867 805 8:17 / /data rw,noatime shared:492 master:159 - ext4 /dev/sdb1 rw
478 867 8:17 /kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0 rw,noatime shared:492 master:159 - ext4 /dev/sdb1 rw
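In mountinfo, the propagation state is carried by the optional fields between the mount options and the "-" separator: shared:N is the peer group the mount belongs to, and master:N is the peer group it receives events from without sending any back. A minimal sketch for pulling those tags out for one mount point (mount_propagation is a hypothetical helper name; it reads mountinfo-formatted lines on stdin):

```shell
# Hypothetical helper: print the propagation tags of one mount point.
# Usage: mount_propagation MOUNTPOINT < /proc/PID/mountinfo
mount_propagation() {
  awk -v target="$1" '$5 == target {
    tags = ""
    # Optional fields start at column 7 and end at the "-" separator.
    for (i = 7; i <= NF && $i != "-"; i++)
      tags = tags (tags == "" ? "" : ",") $i
    # A mount without any optional field is private.
    print (tags == "" ? "private" : tags)
  }'
}
```

Fed the two /data lines shown above, it prints shared:159 for PID 1 and shared:492,master:159 for kubelet, which is the one-way relationship that hides kubelet’s bind mounts from the host.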
4.1 Why Is Kubelet in a Different Namespace?
In systemd, the logs output by kubelet are sent to the journal by default, and the journal forwards them to syslog by default, which duplicates every log entry and wastes disk space. To save disk space, I defined a journal log namespace that does not forward to syslog and configured the kubelet service to use it:
[Service]
LogNamespace=no-syslog-wall
Once LogNamespace is enabled in systemd, the service runs in a new mount namespace with propagation set to slave. This mount namespace mounts the journal-related log socket.
# cat /proc/20136/mountinfo
908 846 0:28 /systemd/journal.no-syslog-wall /run/systemd/journal ro,nosuid,nodev shared:426 master:12 - tmpfs tmpfs rw,size=801376k,nr_inodes=819200,mode=755,inode64
The systemd documentation provides an explanation:
Internally, journal namespaces are implemented through Linux mount namespacing and over-mounting the directory that contains the relevant AF_UNIX sockets used for logging in the unit’s mount namespace. Since mount namespaces are used this setting disconnects propagation of mounts from the unit’s processes to the host, similarly to how ReadOnlyPaths= and similar settings describe above work. Journal namespaces may hence not be used for services that need to establish mount points on the host.
https://www.freedesktop.org/software/systemd/man/255/systemd.exec.html#LogNamespace=
5 Solution
Stop using LogNamespace, so that kubelet and the container runtime both run in the host mount namespace and the subPath bind mounts are visible to the container runtime.
6 Summary
Container-related services managed by systemd perform mount operations that the container runtime needs to see. cAdvisor, which is embedded in kubelet, likewise depends on seeing the mounts made by the container runtime.
Therefore, kubelet and the container runtime must be able to see each other’s mount operations, either by running in the same mount namespace or by setting propagation to shared (in which case all parent mount points must be shared as well).
Similar issues have been reported in Docker with LogNamespace (moby#41879).
7 Reference
https://github.com/systemd/systemd/issues/16638
https://lwn.net/Articles/689856/