Subpath-mounted file is empty inside the container: a bug?

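I recently ran into a very strange problem: in a newly built cluster, a file mounted with subPath showed up empty inside the container. The manifest syntax was correct and the usage was correct, and this was not the known bug where a subPath that does not exist in the ConfigMap is mounted as empty (issues/54514). Mounting the ConfigMap directly, without subPath, also showed the content of that key inside the container. It looked like I had hit a bug!
1 Symptom
In the example below, the config.conf key of a ConfigMap is mounted with subPath, but the mounted file /etc/kubernetes/config.conf is empty inside the container.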
# test.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: test
  labels:
    app: test
data:
  config.conf: |-
    test-data
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: test
  name: test
spec:
  selector:
    matchLabels:
      k8s-app: test
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: test
    spec:
      containers:
      - name: test
        image: busybox
        imagePullPolicy: IfNotPresent
        command:
        - tail
        - -f
        - /dev/null
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /etc/kubernetes/config.conf
          name: test-config
          subPath: config.conf
          readOnly: true
      volumes:
      - name: test-config
        configMap:
          name: test
# kubectl apply -f test.yaml
# kubectl exec -it -n kube-system test-26qvd -- sh
/ # ls -li /etc/kubernetes/
total 4
3283365 -rw-r--r-- 1 root root 0 Jun 4 13:29 config.conf
1.1 Troubleshooting
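Because this was a new cluster running Kubernetes 1.33.1, I suspected in turn a Kubernetes bug, a kernel bug, and a containerd bug.
I tried the following:
- Read the documentation on subPath mounts and searched the Kubernetes issues. Result: the usage was correct and there was no related issue (had I hit an unreported bug?).
- Installed the same Kubernetes version with kind; subPath mounts worked there (so not a version bug; a configuration problem, then?).
- Switched to the kubelet configuration used in kind; the problem persisted (kind and the new cluster run different kernel versions; a kernel problem, then?).
- Changed the kernel version; the problem persisted.
- Read the subpath code in kubelet, added debug logs to the subpath mount path, and set the log level to -v=5. kubelet did perform the bind mount, and the mount still existed when doBindSubPath returned.
- Replaced runc with the same version as in kind; the problem persisted.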
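2 How it works
When kubelet starts a container of a pod, it first prepares the mounts the container needs (for example configmap, secret and projected volumes, /etc/resolv.conf, /etc/hosts, and subpath mounts), then sends a create-container request to the runtime over CRI. The request carries the host path and the container path of each subpath mount, and the runtime then mounts the host path into the container.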
Two host paths are involved here:
- The host directory where kubelet stores secret and configmap data: {kubelet root}/pods/{pod uid}/volumes/kubernetes.io~{configmap or secret}/{volume name}
- The separate host file for each subpath: {kubelet root}/pods/{pod uid}/volume-subpaths/{volume name}/{container name}/{index} (index is the position of the subpath in the container's volumeMounts, starting from 0)
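The two path templates above can be made concrete with a small Python sketch. The function names volume_dir and subpath_file are made up for this illustration and are not kubelet APIs; the sketch only fills in the templates with the values from this example.

```python
from pathlib import PurePosixPath


def volume_dir(kubelet_root, pod_uid, plugin, volume_name):
    """Host directory holding the configmap or secret data (hypothetical helper).

    plugin is "configmap" or "secret", matching the template above.
    """
    return str(
        PurePosixPath(kubelet_root)
        / "pods" / pod_uid
        / "volumes" / f"kubernetes.io~{plugin}" / volume_name
    )


def subpath_file(kubelet_root, pod_uid, volume_name, container_name, index):
    """Host file used as the bind-mount target for one subpath (hypothetical helper).

    index is the position in the container's volumeMounts, starting from 0.
    """
    return str(
        PurePosixPath(kubelet_root)
        / "pods" / pod_uid
        / "volume-subpaths" / volume_name / container_name / str(index)
    )


if __name__ == "__main__":
    root = "/data/kubernetes/kubelet"
    uid = "98bc1caf-a4e9-437a-a9b6-85e85ede4ccc"
    print(volume_dir(root, uid, "configmap", "test-config"))
    print(subpath_file(root, uid, "test-config", "test", 0))
```

With the kubelet root /data/kubernetes/kubelet used in this cluster, the two calls give the directory and the file that show up in the kubelet log later in this post.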
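kubelet first writes the secret or configmap content for the subpath into the host directory (under ..{time}/{subpath}), then creates an empty file for the subpath, and finally bind-mounts the subpath content onto that empty file.
The bind mount is done in a slightly unusual way: the fd (file descriptor) of the subpath content is bind-mounted onto the empty subpath file.
If you are interested, read doBindSubPath in the Kubernetes 1.33 source.
The steps are:
- Open the subpath file under the secret or configmap directory to get an fd
- Bind-mount with this fd as the source and the empty subpath file as the target
- Run a remount
- Close the fd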
In this example:
- The config.conf data of the configmap is saved to /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf
- The empty file /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0 is created
- /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf is opened to get the file's fd
- This fd is bind-mounted onto /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
- The remount is performed
- The fd is closed
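The bind and remount steps can be sketched as the two mount command lines that kubelet logs in this case. This is only an illustration in Python, not kubelet's Go code: bind_subpath_commands is a hypothetical helper, and the noatime remount option is taken from this cluster's log, so treat it as an assumption for other setups.

```python
def bind_subpath_commands(kubelet_pid, fd, target, remount_options=("noatime",)):
    """Build the bind and remount command lines for one subpath (hypothetical helper).

    The source is the already-open file descriptor, addressed through
    /proc/<pid>/fd/<fd>, rather than the original file path.
    remount_options defaults to what the log in this post shows.
    """
    source = f"/proc/{kubelet_pid}/fd/{fd}"
    bind = ["mount", "--no-canonicalize", "-o", "bind", source, target]
    options = ",".join(("bind", "remount") + tuple(remount_options))
    remount = ["mount", "--no-canonicalize", "-o", options, source, target]
    return bind, remount


if __name__ == "__main__":
    target = (
        "/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc"
        "/volume-subpaths/test-config/test/0"
    )
    for cmd in bind_subpath_commands(20136, 19, target):
        # Only print the command lines; actually running them needs root.
        print(" ".join(cmd))
```

The sketch only builds the argument lists. It deliberately does not run mount, because the point is which source and target each step uses.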
2.1 Case analysis
The kubelet log shows that the subpath was bind-mounted:
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.913922 20136 subpath_linux.go:232] bind mounting "/proc/20136/fd/19" at "/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0"
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.913966 20136 mount_linux.go:260] Mounting cmd (mount) with arguments (--no-canonicalize -o bind /proc/20136/fd/19 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0)
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.917275 20136 mount_linux.go:260] Mounting cmd (mount) with arguments (--no-canonicalize -o bind,remount,noatime /proc/20136/fd/19 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0)
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.920550 20136 subpath_linux.go:238] Bound SubPath /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf into /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.921777 20136 subpath_linux.go:203] subpath "/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf" is still mounted, mounts: [/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0]
Jun 04 13:29:02 kube-master-01 kubelet[20136]: I0604 13:29:02.921855 20136 kubelet_pods.go:382] "Mount has propagation" pod="kube-system/test" containerName="test" volumeMountName="test-config" propagation="PROPAGATION_PRIVATE"
Use audit to monitor the mount and umount system calls
Here /proc/20136/fd/19 is the fd of the file /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf, whose inode is 3283288.
The original inode of /data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0 is 3283365; after the mount it becomes 3283288.
But when runc mounts /data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0 into the container, the inode is 3283365 again (the inode of the empty file before the mount).
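The audit records in this section are easier to read once two fields are decoded: PROCTITLE holds the command line as a hex string with NUL-separated arguments, and a3 of a mount SYSCALL record holds the mount flags in hex. The Python sketch below decodes both. The helper names are made up for this post, and the flag table only covers the bits that appear in these records (values as defined in the Linux mount headers).

```python
# Subset of the Linux MS_* mount flags that appear in the records of this post.
MS_FLAGS = {
    0x1: "MS_RDONLY",
    0x20: "MS_REMOUNT",
    0x400: "MS_NOATIME",
    0x1000: "MS_BIND",
    0x4000: "MS_REC",
    0x40000: "MS_PRIVATE",
}


def decode_proctitle(hex_text):
    """Turn an audit PROCTITLE hex string into a list of arguments.

    audit truncates long command lines, so the last argument may be cut off.
    """
    raw = bytes.fromhex(hex_text)
    return [part.decode("utf-8", errors="replace") for part in raw.split(b"\x00")]


def decode_mount_flags(hex_text):
    """Name the flag bits in the a3 field of a mount SYSCALL record."""
    value = int(hex_text, 16)
    names = [name for bit, name in sorted(MS_FLAGS.items()) if value & bit]
    unknown = value & ~sum(MS_FLAGS)
    if unknown:
        # Bits outside the small table above are reported as raw hex.
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    print(decode_proctitle("72756E6300696E6974"))
    for a3 in ("1000", "1420", "5001", "44000", "5021"):
        print(a3, decode_mount_flags(a3))
```

Read this way, the records with a3=1000 and a3=1420 are the bind mount and the bind remount issued by the mount command, and the three runc records are a recursive read-only bind mount, a change to private propagation, and a read-only remount.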
# Audit the mount and umount system calls
# auditctl -a always,exit -F arch=b64 -S mount,umount2 -F key=mount_operations
# Query the mount records
# ausearch -k mount_operations | less
# This record is the bind mount
time->Wed Jun 4 13:00:29 2025
type=PROCTITLE msg=audit(1749042029.732:830): proctitle=6D6F756E74002D2D6E6F2D63616E6F6E6963616C697A65002D6F0062696E64002F70726F632F32303133362F66642F3139002F646174612F6B756265726E657465732F6B7562656C65742F706F64732F38386134663230322D333065382D343364312D383561622D3831346139333639653866392F766F6C756D652D73756270
type=PATH msg=audit(1749042029.732:830): item=1 name="/proc/20136/fd/19" inode=3283288 dev=08:11 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1749042029.732:830): item=0 name="/data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042029.732:830): cwd="/"
type=SYSCALL msg=audit(1749042029.732:830): arch=c000003e syscall=165 success=yes exit=0 a0=55b2728bb990 a1=55b2728bb9b0 a2=55b2728bba20 a3=1000 items=2 ppid=20136 pid=27630 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mount" exe="/usr/bin/mount" subj=unconfined key="mount_operations"
----
# This record is the bind remount
time->Wed Jun 4 13:00:29 2025
type=PROCTITLE msg=audit(1749042029.736:831): proctitle=6D6F756E74002D2D6E6F2D63616E6F6E6963616C697A65002D6F0062696E642C72656D6F756E742C6E6F6174696D65002F70726F632F32303133362F66642F3139002F646174612F6B756265726E657465732F6B7562656C65742F706F64732F38386134663230322D333065382D343364312D383561622D3831346139333639
type=PATH msg=audit(1749042029.736:831): item=0 name="/data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0" inode=3283288 dev=08:11 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042029.736:831): cwd="/"
type=SYSCALL msg=audit(1749042029.736:831): arch=c000003e syscall=165 success=yes exit=0 a0=59c7ea4eba50 a1=59c7ea4eba70 a2=59c7ea4ebae0 a3=1420 items=1 ppid=20136 pid=27631 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mount" exe="/usr/bin/mount" subj=unconfined key="mount_operations"
# When runc operates on the file, the inode is 3283365 again (the inode of the original empty file, as before the mount)
time->Wed Jun 4 13:00:30 2025
type=PROCTITLE msg=audit(1749042030.001:873): proctitle=72756E6300696E6974
type=PATH msg=audit(1749042030.001:873): item=1 name="/data/kubernetes/kubelet/pods/88a4f202-30e8-43d1-85ab-814a9369e8f9/volume-subpaths/test-config/test/0" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1749042030.001:873): item=0 name="/proc/thread-self/fd/8" inode=3283395 dev=00:3c mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042030.001:873): cwd="/run/containerd/io.containerd.runtime.v2.task/k8s.io/121b99326ba68e3578e29f419e3f996518996faecf7eec227ba5211cafd97493/rootfs"
type=SYSCALL msg=audit(1749042030.001:873): arch=c000003e syscall=165 success=yes exit=0 a0=c00009a310 a1=c0000c95d8 a2=c000192214 a3=5001 items=2 ppid=27639 pid=27651 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc:[2:INIT]" exe="/runc" subj=unconfined key="mount_operations"
----
time->Wed Jun 4 13:00:30 2025
type=PROCTITLE msg=audit(1749042030.001:875): proctitle=72756E6300696E6974
type=PATH msg=audit(1749042030.001:875): item=0 name="/proc/thread-self/fd/8" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042030.001:875): cwd="/run/containerd/io.containerd.runtime.v2.task/k8s.io/121b99326ba68e3578e29f419e3f996518996faecf7eec227ba5211cafd97493/rootfs"
type=SYSCALL msg=audit(1749042030.001:875): arch=c000003e syscall=165 success=yes exit=0 a0=c00019224c a1=c0000c96b0 a2=c00019224d a3=44000 items=1 ppid=27639 pid=27651 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc:[2:INIT]" exe="/runc" subj=unconfined key="mount_operations"
----
time->Wed Jun 4 13:00:30 2025
type=PROCTITLE msg=audit(1749042030.001:877): proctitle=72756E6300696E6974
type=PATH msg=audit(1749042030.001:877): item=0 name="/proc/thread-self/fd/8" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749042030.001:877): cwd="/run/containerd/io.containerd.runtime.v2.task/k8s.io/121b99326ba68e3578e29f419e3f996518996faecf7eec227ba5211cafd97493/rootfs"
type=SYSCALL msg=audit(1749042030.001:877): arch=c000003e syscall=165 success=yes exit=0 a0=c00019227c a1=c0000c9788 a2=c00019227d a3=5021 items=1 ppid=27639 pid=27651 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc:[2:INIT]" exe="/runc" subj=unconfined key="mount_operations"
----
time->Wed Jun 4 13:29:02 2025
type=PROCTITLE msg=audit(1749043742.915:1126): proctitle=6D6F756E74002D2D6E6F2D63616E6F6E6963616C697A65002D6F0062696E64002F70726F632F32303133362F66642F3139002F646174612F6B756265726E657465732F6B7562656C65742F706F64732F39386263316361662D613465392D343337612D613962362D3835653835656465346363632F766F6C756D652D73756270
type=PATH msg=audit(1749043742.915:1126): item=1 name="/proc/20136/fd/19" inode=3283288 dev=08:11 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1749043742.915:1126): item=0 name="/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749043742.915:1126): cwd="/"
type=SYSCALL msg=audit(1749043742.915:1126): arch=c000003e syscall=165 success=yes exit=0 a0=5de043404990 a1=5de0434049b0 a2=5de043404a20 a3=1000 items=2 ppid=20136 pid=28700 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mount" exe="/usr/bin/mount" subj=unconfined key="mount_operations"
time->Wed Jun 4 13:29:03 2025
type=PROCTITLE msg=audit(1749043743.038:1169): proctitle=72756E6300696E6974
type=PATH msg=audit(1749043743.038:1169): item=1 name="/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1749043743.038:1169): item=0 name="/proc/thread-self/fd/8" inode=3283395 dev=00:3c mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749043743.038:1169): cwd="/run/containerd/io.containerd.runtime.v2.task/k8s.io/fda9387959e560161bb71dc777fc9844ba9fece65bda3fe9ee5f97007b121bd0/rootfs"
type=SYSCALL msg=audit(1749043743.038:1169): arch=c000003e syscall=165 success=yes exit=0 a0=c0001022a0 a1=c0001373c8 a2=c0000f3ed4 a3=5001 items=2 ppid=28709 pid=28721 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc:[2:INIT]" exe="/runc" subj=unconfined key="mount_operations"
time->Wed Jun 4 13:29:03 2025
type=PROCTITLE msg=audit(1749043743.038:1171): proctitle=72756E6300696E6974
type=PATH msg=audit(1749043743.038:1171): item=0 name="/proc/thread-self/fd/8" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749043743.038:1171): cwd="/run/containerd/io.containerd.runtime.v2.task/k8s.io/fda9387959e560161bb71dc777fc9844ba9fece65bda3fe9ee5f97007b121bd0/rootfs"
type=SYSCALL msg=audit(1749043743.038:1171): arch=c000003e syscall=165 success=yes exit=0 a0=c0000f3f0c a1=c0001374a0 a2=c0000f3f0d a3=44000 items=1 ppid=28709 pid=28721 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc:[2:INIT]" exe="/runc" subj=unconfined key="mount_operations"
----
time->Wed Jun 4 13:29:03 2025
type=PROCTITLE msg=audit(1749043743.039:1173): proctitle=72756E6300696E6974
type=PATH msg=audit(1749043743.039:1173): item=0 name="/proc/thread-self/fd/8" inode=3283365 dev=08:11 mode=0100640 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1749043743.039:1173): cwd="/run/containerd/io.containerd.runtime.v2.task/k8s.io/fda9387959e560161bb71dc777fc9844ba9fece65bda3fe9ee5f97007b121bd0/rootfs"
type=SYSCALL msg=audit(1749043743.039:1173): arch=c000003e syscall=165 success=yes exit=0 a0=c0000f3f3c a1=c000137578 a2=c0000f3f3d a3=5021 items=1 ppid=28709 pid=28721 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="runc:[2:INIT]" exe="/runc" subj=unconfined key="mount_operations"
ppid 20136 is the kubelet process
ps aux |grep 20136
root 20136 3.5 1.7 2140944 70756 ? Ssl 10:38 6:41 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.10 --root-dir=/data/kubernetes/kubelet --v=5
Check the file inodes
# ll -ih /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
3283365 -rw-r----- 1 root root 0 Jun 4 13:29 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
# ll -ih /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf
3283288 -rw-r--r-- 1 root root 1.4K Jun 4 13:29 /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf
Check the mounts
No mount is visible in the host mount namespace, but the mount is visible from the kubelet process.
# findmnt /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
# Look up the mount in the kubelet process's mount namespace
# findmnt -N 20136 -o SOURCE,TARGET,PROPAGATION,OPTIONS,VFS-OPTIONS /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
SOURCE TARGET PROPAGATION OPTIONS VFS-OPTIONS
/dev/sdb1[/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf]
/data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0
shared,slave rw,noat rw,noatime
3 Why
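The analysis above shows that the subpath mount is visible from the kubelet process but not from a shell process.
This comes down to mount namespaces and mount propagation.
The root cause is that the processes are in different mount namespaces, and the propagation of the subpath mount point in the kubelet namespace is slave,shared (mounts made there do not propagate to the master peer group, while mounts in the master peer group do propagate into the kubelet namespace). containerd and the shell live in the host mount namespace, so they cannot see the subpath mount, and the container ends up with the empty file.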
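The direction of propagation can be read from the optional shared:N and master:N fields of /proc/<pid>/mountinfo. Below is a minimal Python sketch, assuming the standard mountinfo layout; the two functions are hypothetical helpers, and the rule in propagates_to is a simplification that only looks at direct peer and master relations.

```python
def parse_mountinfo_line(line):
    """Extract the fields needed to reason about propagation from one mountinfo line."""
    # Everything before " - " is: id, parent id, major:minor, root,
    # mount point, mount options, then zero or more optional fields.
    left, _, _right = line.partition(" - ")
    fields = left.split()
    tags = {}
    for optional in fields[6:]:
        key, _, value = optional.partition(":")
        tags[key] = value
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "root": fields[3],
        "mount_point": fields[4],
        "shared": tags.get("shared"),  # peer group this mount belongs to
        "master": tags.get("master"),  # peer group this mount receives events from
    }


def propagates_to(src, dst):
    """Simplified rule: do mounts made under src show up under dst?

    True when dst is in the same peer group as src, or is a slave of it.
    """
    if src["shared"] is None:
        return False
    return src["shared"] in (dst["shared"], dst["master"])


if __name__ == "__main__":
    host = parse_mountinfo_line(
        "62 25 8:17 / /data rw,noatime shared:159 - ext4 /dev/sdb1 rw"
    )
    kubelet = parse_mountinfo_line(
        "867 805 8:17 / /data rw,noatime shared:492 master:159 - ext4 /dev/sdb1 rw"
    )
    print("host -> kubelet:", propagates_to(host, kubelet))
    print("kubelet -> host:", propagates_to(kubelet, host))
```

The two sample lines are the /data entries of PID 1 and of kubelet from this cluster. Under this simplified rule, mounts flow from the host into kubelet's namespace but not the other way round, which matches what findmnt shows.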
# ll /proc/20136/ns/mnt # mount namespace of the kubelet process
lrwxrwxrwx 1 root root 0 Jun 5 02:56 /proc/20136/ns/mnt -> 'mnt:[4026532279]'
# ll /proc/1/ns/mnt
lrwxrwxrwx 1 root root 0 Jun 5 02:55 /proc/1/ns/mnt -> 'mnt:[4026531841]'
# ll /proc/self/ns/mnt
lrwxrwxrwx 1 root root 0 Jun 5 02:57 /proc/self/ns/mnt -> 'mnt:[4026531841]'
# cat /proc/1/mountinfo |grep "159 "
62 25 8:17 / /data rw,noatime shared:159 - ext4 /dev/sdb1 rw
# cat /proc/20136/mountinfo |grep "159 "
867 805 8:17 / /data rw,noatime shared:492 master:159 - ext4 /dev/sdb1 rw
478 867 8:17 /kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volumes/kubernetes.io~configmap/test-config/..2025_06_04_13_29_02.3910074484/config.conf /data/kubernetes/kubelet/pods/98bc1caf-a4e9-437a-a9b6-85e85ede4ccc/volume-subpaths/test-config/test/0 rw,noatime shared:492 master:159 - ext4 /dev/sdb1 rw
3.1 Why is kubelet's mount namespace different?
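Under systemd, kubelet's logs go to the journal by default, and the journal forwards them to syslog by default, so every log line is recorded twice and wastes disk space. To save disk space, I defined a journal log namespace that does not forward to syslog and configured kubelet to use it.
[Service]
LogNamespace=no-syslog-wall
Once LogNamespace is enabled, systemd runs the service in a new mount namespace with propagation set to slave, and mounts the journal's logging sockets inside that namespace.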
# cat /proc/20136/mountinfo
908 846 0:28 /systemd/journal.no-syslog-wall /run/systemd/journal ro,nosuid,nodev shared:426 master:12 - tmpfs tmpfs rw,size=801376k,nr_inodes=819200,mode=755,inode64
The systemd documentation explains this:
Internally, journal namespaces are implemented through Linux mount namespacing and over-mounting the directory that contains the relevant AF_UNIX sockets used for logging in the unit’s mount namespace. Since mount namespaces are used this setting disconnects propagation of mounts from the unit’s processes to the host, similarly to how ReadOnlyPaths= and similar settings describe above work. Journal namespaces may hence not be used for services that need to establish mount points on the host.
https://www.freedesktop.org/software/systemd/man/255/systemd.exec.html#LogNamespace=
3.2 Solution
Do not use LogNamespace. kubelet and containerd then both run in the host mount namespace, and containerd can see the mounts made by kubelet.
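One way to check the result after restarting kubelet is to compare the mount namespace links of kubelet and of PID 1, the same /proc/<pid>/ns/mnt links shown earlier. The Python sketch below is only an illustration with made-up helper names; reading another process's namespace link normally requires root.

```python
import os
import re


def mount_ns_id(link_target):
    """Extract the namespace inode number from a link target like 'mnt:[4026531841]'."""
    match = re.fullmatch(r"mnt:\[(\d+)\]", link_target)
    if match is None:
        raise ValueError(f"unexpected mount namespace link: {link_target!r}")
    return int(match.group(1))


def same_mount_namespace(pid_a, pid_b):
    """Return True when both processes share one mount namespace (Linux only)."""
    link_a = os.readlink(f"/proc/{pid_a}/ns/mnt")
    link_b = os.readlink(f"/proc/{pid_b}/ns/mnt")
    return mount_ns_id(link_a) == mount_ns_id(link_b)


if __name__ == "__main__":
    # Values observed in this post while LogNamespace was still enabled.
    kubelet_ns = mount_ns_id("mnt:[4026532279]")
    host_ns = mount_ns_id("mnt:[4026531841]")
    print("kubelet shares the host mount namespace:", kubelet_ns == host_ns)
```

On a node, same_mount_namespace(1, kubelet_pid) is expected to be True once LogNamespace is removed, assuming no other unit setting puts kubelet into a private mount namespace.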
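4 Summary
Container-related services running under systemd perform mount operations on host directories, and the container runtime needs to see those operations. The cadvisor inside kubelet likewise depends on the mounts made by the container runtime.
So kubelet and the container runtime must be able to see each other's mounts: either the propagation is shared (and every parent mount point must be shared too), or both run in the same mount namespace.
I also found a similar problem reported by someone using LogNamespace with docker: issues/41879.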
5 Reference
https://github.com/systemd/systemd/issues/16638
https://lwn.net/Articles/689856/
