pod一直往一个节点上调度

遇到奇怪的现象:spark生成的job pod大概率都调度到同一节点,即不同的job的pod都会调度到同一节点,pod过于集中在某个节点上,出现调度不均衡现象。节点没有任何污点,节点资源空闲程度差不多。job没有任何nodeSelectornodeAffinitynodeNamePodTopologySpread

环境:kubernetes 1.23,scheduler是默认的default-scheduler且为默认配置。pod是由batch job生成。

现象:pod一直调度到10.10.33.57节点上

scheduler的日志

text

I0314 18:10:20.837459  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-35003-2023031417-1-fgd78" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:20.868467  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-26818-2023031417-1-79l87" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:20.907352  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-35002-2023031417-1-6nzzb" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:21.296938  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-34857-2023031417-1-hgcnq" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:21.645678  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-25771-2023031417-1-jsq9b" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:22.029478  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-34856-2023031417-1-rxlld" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:33.312340  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-33264-2023031417-1-bxv5r" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:34.344112  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-34808-2023031418-1-s4ttx" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:34.736190  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-28084-2023031417-1-mp4qh" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:36.118176  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-35664-2023031417-1-kkkdn" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:42.858574  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-38872-2023031417-1-f7w7t" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:53.170750  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-24503-2023031417-1-xvj5z" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:10:53.222601  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-26243-2023031417-1-cm2ft" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:11:03.978335  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-25770-2023031417-1-g2w5w" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:11:05.319211  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-bi/ham-33521-2023031417-1-pkgnx" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:11:30.115938  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-44574-2023031417-1-4lfj7" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:11:48.207950  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-38967-2023031417-1-lr7n9" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:11:58.279178  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-38820-2023031417-1-q2k2n" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:12:09.313942  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-27890-2023031417-1-4f6w4" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:12:24.762822  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-29749-2023031417-1-ns8gl" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:12:49.411474  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-25769-2023031417-1-fzt5h" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:13:13.496645  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-32066-2023031417-1-wmspl" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:13:28.571884  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-25767-2023031417-1-qkwv7" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:13:31.570898  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-29671-2023031417-1-w4xl7" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8
I0314 18:13:31.633205  308219 scheduler.go:621] "Successfully bound pod to node" pod="bigdata-hanoi-default/ham-31807-202303141800-2-9d7qx" node="10.10.33.57" evaluatedNodes=14 feasibleNodes=8

job资源定义

text

apiVersion: v1
items:
- apiVersion: batch/v1
  kind: Job
  metadata:
    creationTimestamp: "2023-03-14T10:53:13Z"
    generation: 1
    labels:
      hanoi2_task_id: ham-29183-202303141852-1
    name: ham-29183-202303141852-1
    namespace: bigdata-hanoi-default
    resourceVersion: "259164839"
    uid: bd6a6e70-00c3-4433-9637-fba91b579f20
  spec:
    backoffLimit: 0
    completionMode: NonIndexed
    completions: 1
    parallelism: 1
    selector:
      matchLabels:
        controller-uid: bd6a6e70-00c3-4433-9637-fba91b579f20
    suspend: false
    template:
      metadata:
        creationTimestamp: null
        labels:
          controller-uid: bd6a6e70-00c3-4433-9637-fba91b579f20
          hanoi2_task_id: ham-29183-202303141852-1
          job-name: ham-29183-202303141852-1
      spec:
        containers:
        - args:
          - 'curl http://xxx.com/script > /data/workspace/script && python2 /data/workspace/script'
          command:
          - /bin/bash
          - -c
          env:
          - name: POD_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          - name: SPARK_LOCAL_HOSTNAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          image: hive:vtest3
          imagePullPolicy: IfNotPresent
          name: ham-29183-202303141852-1
          resources:
            limits:
              cpu: 200m
              memory: 400Mi
            requests:
              cpu: 10m
              memory: 100Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /data/nfs_txyun
            name: nfs
        dnsPolicy: ClusterFirst

原理:scheduler的进行pod调度的处理流程中决定调度到那个节点的阶段是PreScore、Score、NormalizeScore,即对节点打分阶段。节点的分数最高(由多个一样高,则从里面随机选择一个节点),pod就绑定到这个节点。而默认的配置里打分阶段开启的插件为:NodeAffinityNodeResourcesFitVolumeBindingPodTopologySpreadInterPodAffinityNodeResourcesBalancedAllocationImageLocalityTaintToleration。具体可以查看官方文档Extension points scheduling-plugins

scheduler framework extensions

将scheduler的日志调到10,可以看到scheduler的过程。下面抓取一个pod的调度过程的日志进行分析

text

I0314 22:33:02.424871  278829 scheduling_queue.go:933] "About to try and schedule pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc"
I0314 22:33:02.424880  278829 scheduler.go:443] "Attempting to schedule pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc"

# 每个节点对应的"NodeResourcesBalancedAllocation"打分
I0314 22:33:02.425309  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.58" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:1610 memory:3017801728] resourceScore=93
I0314 22:33:02.425332  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.58" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:1110 memory:1969225728] resourceScore=97
I0314 22:33:02.425344  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.53" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:7600 memory:30323785728] requestedResource=map[cpu:2860 memory:5632950272] resourceScore=71
I0314 22:33:02.425345  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.54" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:7600 memory:30323785728] requestedResource=map[cpu:4815 memory:10583277568] resourceScore=50
I0314 22:33:02.425354  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.59" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739955200] requestedResource=map[cpu:1560 memory:2912944128] resourceScore=93
I0314 22:33:02.425354  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.55" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:6010 memory:10527703040] resourceScore=76
I0314 22:33:02.425364  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.53" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:7600 memory:30323785728] requestedResource=map[cpu:1860 memory:3535798272] resourceScore=93
I0314 22:33:02.425365  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.54" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:7600 memory:30323785728] requestedResource=map[cpu:3815 memory:8486125568] resourceScore=88
I0314 22:33:02.425372  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.59" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739955200] requestedResource=map[cpu:1060 memory:1864368128] resourceScore=97
I0314 22:33:02.425381  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.56" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:7335 memory:13727956992] resourceScore=70
I0314 22:33:02.425383  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.57" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:1520 memory:4343201792] resourceScore=93
I0314 22:33:02.425389  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.60" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:1620 memory:3122659328] resourceScore=93
I0314 22:33:02.425393  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.55" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:5010 memory:8640266240] resourceScore=87
I0314 22:33:02.425400  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.56" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:6335 memory:11630804992] resourceScore=84
I0314 22:33:02.425401  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.57" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:1020 memory:3294625792] resourceScore=97
I0314 22:33:02.425405  278829 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.60" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:1120 memory:2074083328] resourceScore=97

# 每个节点对应的NodeAffinity打分,由于没有NodeAffinity所以所有的节点为0
I0314 22:33:02.425477  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.58" score=0
I0314 22:33:02.425487  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.53" score=0
I0314 22:33:02.425495  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.54" score=0
I0314 22:33:02.425502  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.55" score=0
I0314 22:33:02.425513  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.59" score=0
I0314 22:33:02.425518  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.56" score=0
I0314 22:33:02.425524  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.57" score=0
I0314 22:33:02.425529  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeAffinity" node="10.10.33.60" score=0

# 每个节点对应NodeResourcesFit打分
I0314 22:33:02.425534  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.58" score=93
I0314 22:33:02.425539  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.53" score=71
I0314 22:33:02.425544  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.54" score=50
I0314 22:33:02.425549  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.55" score=76
I0314 22:33:02.425554  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.59" score=93
I0314 22:33:02.425560  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.56" score=70
I0314 22:33:02.425567  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.57" score=93
I0314 22:33:02.425574  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesFit" node="10.10.33.60" score=93

# 每个节点对应"VolumeBinding"打分
I0314 22:33:02.425581  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.58" score=0
I0314 22:33:02.425588  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.53" score=0
I0314 22:33:02.425594  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.54" score=0
I0314 22:33:02.425599  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.55" score=0
I0314 22:33:02.425604  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.59" score=0
I0314 22:33:02.425609  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.56" score=0
I0314 22:33:02.425614  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.57" score=0
I0314 22:33:02.425619  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="VolumeBinding" node="10.10.33.60" score=0

# 每个节点对应的"PodTopologySpread"打分,由于没有PodTopologySpread,所以都是200
I0314 22:33:02.425625  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.58" score=200
I0314 22:33:02.425630  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.53" score=200
I0314 22:33:02.425635  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.54" score=200
I0314 22:33:02.425640  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.55" score=200
I0314 22:33:02.425645  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.59" score=200
I0314 22:33:02.425651  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.56" score=200
I0314 22:33:02.425658  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.57" score=200
I0314 22:33:02.425666  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="PodTopologySpread" node="10.10.33.60" score=200

# 每个节点对应的InterPodAffinity打分
I0314 22:33:02.425672  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0314 22:33:02.425677  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.53" score=0
I0314 22:33:02.425682  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.54" score=0
I0314 22:33:02.425687  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.55" score=0
I0314 22:33:02.425692  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.59" score=0
I0314 22:33:02.425697  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.56" score=0
I0314 22:33:02.425703  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.57" score=0
I0314 22:33:02.425708  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="InterPodAffinity" node="10.10.33.60" score=0

# 每个节点对应的NodeResourcesBalancedAllocation打分
I0314 22:33:02.425713  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.58" score=97
I0314 22:33:02.425719  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.53" score=93
I0314 22:33:02.425724  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.54" score=88
I0314 22:33:02.425729  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.55" score=87
I0314 22:33:02.425735  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.59" score=97
I0314 22:33:02.425742  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.56" score=84
I0314 22:33:02.425750  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.57" score=97
I0314 22:33:02.425756  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="NodeResourcesBalancedAllocation" node="10.10.33.60" score=97

# 每个节点对应的ImageLocality打分
I0314 22:33:02.425762  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.58" score=0
I0314 22:33:02.425767  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.53" score=26
I0314 22:33:02.425772  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.54" score=54
I0314 22:33:02.425777  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.55" score=100
I0314 22:33:02.425782  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.59" score=0
I0314 22:33:02.425787  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.56" score=83
I0314 22:33:02.425793  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.57" score=100
I0314 22:33:02.425799  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="ImageLocality" node="10.10.33.60" score=0

# 每个节点对应的"TaintToleration"打分
I0314 22:33:02.425805  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.58" score=300
I0314 22:33:02.425810  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.53" score=300
I0314 22:33:02.425815  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.54" score=300
I0314 22:33:02.425821  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.55" score=300
I0314 22:33:02.425828  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.59" score=300
I0314 22:33:02.425836  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.56" score=300
I0314 22:33:02.425856  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.57" score=300
I0314 22:33:02.425861  278829 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" plugin="TaintToleration" node="10.10.33.60" score=300

# 每个节点最后总的打分
I0314 22:33:02.425870  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.58" score=690
I0314 22:33:02.425876  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.53" score=690
I0314 22:33:02.425881  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.54" score=692
I0314 22:33:02.425886  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.55" score=763
I0314 22:33:02.425890  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.59" score=690
I0314 22:33:02.425895  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.56" score=737
I0314 22:33:02.425900  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.57" score=790
I0314 22:33:02.425907  278829 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.60" score=690
I0314 22:33:02.426010  278829 default_binder.go:52] "Attempting to bind pod to node" pod="bigdata-hanoi-default/ham-38943-2023031421-1-jhmzc" node="10.10.33.57"

对日志中打分数据进行整理

10.10.33.5310.10.33.5410.10.33.5510.10.33.5610.10.33.5710.10.33.5810.10.33.5910.10.33.60
NodeAffinity00000000
NodeResourcesFit7150767093939393
VolumeBinding00000000
PodTopologySpread200200200200200200200200
InterPodAffinity00000000
NodeResourcesBalancedAllocation9388878497979797
ImageLocality265410083100000
TaintToleration300300300300300300300300
Total Score690692763737790690690690

分析数据

  1. 打分第一(10.10.33.57)与第二的(10.10.33.55)差在节点资源相关的评分上
  2. 第二的(10.10.33.55)与第三的(10.10.33.56)差在ImageLocality评分上(这个可能是scheduler刚启动数据没有完全同步bug https://github.com/kubernetes/kubernetes/issues/116673
  3. 差距最大的打分项是ImageLocality,导致最后的10.10.33.53、10.10.33.58、10.10.33.59、10.10.33.60分数最低不可能被调度到

新发现

发现调度到同一节点的pod的镜像,发现pod镜像都是一样。且这个镜像在10.10.33.53、10.10.33.58、10.10.33.59、10.10.33.60上都没有。

  1. 不同任务的job都是使用同一个镜像,而这个镜像只有在部分节点上10.10.33.53、10.10.33.54、10.10.33.55、10.10.33.56、10.10.33.57
  2. ImageLocality的设计是越大的镜像越要考虑节点上是否有镜像,导致有镜像节点和没有镜像节点的打分差距巨大
  3. 节点10.10.33.57剩余的资源比其他节点多
  4. 由于没有镜像的节点ImageLocality打分为0,导致10.10.33.53、10.10.33.58、10.10.33.59、10.10.33.60分数最低不可能被调度到
  1. pod每个容器镜像打分

    • 如果节点存在这个镜像,分数= 镜像大小*拥有这个镜像节点数量/总的节点数量

    • 如果节点不存在这个镜像,分数为0

  2. pod所有容器镜像总分=pod所有容器镜像分数相加

  3. 对pod所有容器镜像总分进行规整,让分数在范围内[24117248, pod里的容器数量*1048576000],即[23MB, pod里的容器数量*1GB]

    • pod所有容器镜像总分小于24117248,则为24117248,即pod所有容器镜像总分=max(pod所有容器镜像总分,24117248)

    • pod所有容器镜像总分大于(pod里的容器数量*1048576000),则为(pod里的容器数量*1048576000),即pod所有容器镜像总分=min(pod所有容器镜像总分,pod里的容器数量*1048576000)

    • pod所有容器镜像总分在[24117248, pod里的容器数量*1048576000],保持不动

  4. 最后分数=100*(pod所有容器镜像总分-24117248)/(pod里的容器数量*1048576000-24117248)

总结:ImageLocality最高分数为100,如果镜像超过1G同时镜像分布比例(拥有这个镜像节点数量/总的节点数量)比较高,那么大概率分数是100。这个插件设计初衷是越大的镜像越要考虑节点上是否有镜像,降低pod的启动时间。

  1. 获取node当前的resource的allocable值和添加pod request资源后总的resource的request总量(这里的resource指的是cpu、memory等资源类型)

  2. 计算出每种资源的request占allocable的比值,比值=request[resource]/allocable[resource]

  3. 计算出所有资源的request占allocable的比值的标准差,比如只有cpu、memory资源,则计算{cpu的比值、memory的比值}标准差

4.最后分数=100*(1-标准差)

总结:各种request资源占节点可分配资源比值越接近,则这个节点分数越高。设计插件初衷,应该是让节点的各种资源的分配百分比接近,避免节点上某种资源分配完了,另外一种资源还剩很多。

这里以默认的策略"LeastAllocated"

1.获取node当前的resource的allocable值和添加pod request资源后总的resource的request总量(这里的resource指的是cpu、memory等资源类型)

2.计算出每种resource资源allocable分配出request后剩余resource资源量占allocable的比率,即剩余resource资源比率=(allocable-request)*100/allocable

3.计算每种资源的分数=剩余resource资源比率*resource的权重,默认cpu和memory的权重都是1

4.计算所有资源的分数=每种资源的分数进行相加,计算所有资源的权重=每种资源的权重进行相加

5.最后分数=所有资源的分数/所有资源的权重

总结:分配request资源之后剩余资源越多的节点分数越高,即资源越充足的节点分数越高。

它对应的pod的podAffinity、podAntiAffinity配置。算法总体思路是根据节点上的pod的podAffinity、podAntiAffinity的配置topologyKey,计算topologyKey名称、节点labels[topologyKey名称](topologyKey名称在节点label上的值)组成的二维map的分数。节点根据自身label从二维map中查找对应分数进行累加,得到节点的分数。在对节点分数进行规整化,输出最终的分数。

1.遍历的pod为待调度的pod有pod preferred affinity/pod Preferred AntiAffinity,则需要遍历节点上的所有pod,否则遍历节点上的带有affinity的pod.

2.计算待调度的pod的pod preferred affinity的相关打分–遍历的pod匹配待调度的pod的preferred affinity规则(节点上的pod的namespace匹配且labelSelector匹配),则二维map(topologyKey名称、节点labels[topologyKey名称])里的值=原来的值+weight*匹配的pod数

3.计算待调度的pod的pod preferred AntiAffinity的相关打分–遍历的pod匹配待调度的pod的pod preferred AntiAffinity规则(节点上的pod的namespace匹配且labelSelector匹配),则二维map(topologyKey名称、节点labels[topologyKey名称])里的值=原来的值-weight*匹配的pod数

4.计算遍历的pod的podAffinity的相关打分–如果待调度pod有label且scheduler配置文件里的InterPodAffinity.hardPodAffinityWeight大于0,且待调度的pod匹配遍历的pod的podAffinity,则二维map(topologyKey名称、节点labels[topologyKey名称])里的值=原来的值+InterPodAffinity.hardPodAffinityWeight

5.计算遍历的pod的pod preferred affinity的相关打分–如果待调度的pod匹配遍历的pod的pod preferred Affinity,则二维map(topologyKey名称、节点labels[topologyKey名称])里的值=原来的值+待调度pod的preferred affinity的weight

6.计算遍历的pod的pod preferred AntiAffinity的相关打分–如果待调度的pod匹配遍历的pod的pod preferred AntiAffinity,则二维map(topologyKey名称、节点labels[topologyKey名称])里的值=原来的值-待调度pod的preferred affinity的weight

7.节点分数计算:遍历node的所有label,从二维map(topologyKey名称、节点labels[topologyKey名称])里找到存在的值进行相加,这个值就是节点的分数。

8.对所有节点进行上面步骤处理,处理完后进行规整节点的分数。

  • 遍历所有节点的分数找出最大值和最小值
  • 如果节点分数都相同,则节点最终的分数都为0
  • 否则,节点最后分数=100*(节点分数-最小值)/(最大值-最小值)

总结:节点分数最大值,那么最后分数一定是100,节点分数是最小值,那么最后分数一定是0,其他节点的分数在(0,100)。

解决思路:让节点间的打分差距减少,可以关闭某个插件的打分或开启某个插件的打分、或者调整插件的权重。

1.禁用ImageLocality

在kube-scheduler的启动配置文件里配置成如下

text

apiVersion: kubescheduler.config.k8s.io/v1beta3
profiles:
  - plugins:
  schedulerName: default-scheduler
  score:
      disabled:
      - name: ImageLocality

2.设置podAntiAffinity

使用相同的镜像的pod打上相同的标签,并设置podAntiAffinity

text

      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: job-type
                  operator: In
                  values:
                  - spark
              topologyKey: kubernetes.io/hostname
              weight: 100

设置podAntiAffinity后,(这里节点由上线下线,节点跟上面有所不一样)经过一定时间后一个pod的调度日志

text

I0318 12:33:44.125838  241226 eventhandlers.go:118] "Add event for unscheduled pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4"
I0318 12:33:44.125881  241226 scheduling_queue.go:933] "About to try and schedule pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4"
I0318 12:33:44.125889  241226 scheduler.go:443] "Attempting to schedule pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4"
I0318 12:33:44.126192  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.61" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:6225 memory:11998855168] resourceScore=75
I0318 12:33:44.126198  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.62" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:4950 memory:13366198272] resourceScore=78
I0318 12:33:44.126203  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.58" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:4130 memory:11453595648] resourceScore=82
I0318 12:33:44.126210  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.59" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739955200] requestedResource=map[cpu:4155 memory:6479151104] resourceScore=84
I0318 12:33:44.126212  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.61" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:5325 memory:10111418368] resourceScore=86
I0318 12:33:44.126220  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.62" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:3850 memory:11269046272] resourceScore=91
I0318 12:33:44.126220  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.58" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:2930 memory:9146728448] resourceScore=94
I0318 12:33:44.126227  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.59" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739955200] requestedResource=map[cpu:3555 memory:5011144704] resourceScore=90
I0318 12:33:44.126232  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.60" resourceAllocationScorer="LeastAllocated" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:4800 memory:6340739072] resourceScore=82
I0318 12:33:44.126265  241226 resource_allocation.go:73] "Listing internal info for allocatable resources, requested resources and score" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.60" resourceAllocationScorer="NodeResourcesBalancedAllocation" allocatableResource=map[cpu:15600 memory:131739963392] requestedResource=map[cpu:4100 memory:4872732672] resourceScore=88
I0318 12:33:44.126351  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="VolumeBinding" node="10.10.33.61" score=0
I0318 12:33:44.126359  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="VolumeBinding" node="10.10.33.62" score=0
I0318 12:33:44.126365  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="VolumeBinding" node="10.10.33.58" score=0
I0318 12:33:44.126370  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="VolumeBinding" node="10.10.33.59" score=0
I0318 12:33:44.126375  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="VolumeBinding" node="10.10.33.60" score=0
I0318 12:33:44.126382  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="PodTopologySpread" node="10.10.33.61" score=200
I0318 12:33:44.126390  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="PodTopologySpread" node="10.10.33.62" score=200
I0318 12:33:44.126397  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="PodTopologySpread" node="10.10.33.58" score=200
I0318 12:33:44.126405  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="PodTopologySpread" node="10.10.33.59" score=200
I0318 12:33:44.126412  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="PodTopologySpread" node="10.10.33.60" score=200
I0318 12:33:44.126418  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="InterPodAffinity" node="10.10.33.61" score=200
I0318 12:33:44.126423  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0318 12:33:44.126428  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0318 12:33:44.126434  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="InterPodAffinity" node="10.10.33.59" score=0
I0318 12:33:44.126439  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="InterPodAffinity" node="10.10.33.60" score=0
I0318 12:33:44.126445  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesBalancedAllocation" node="10.10.33.61" score=86
I0318 12:33:44.126451  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesBalancedAllocation" node="10.10.33.62" score=91
I0318 12:33:44.126456  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesBalancedAllocation" node="10.10.33.58" score=94
I0318 12:33:44.126462  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesBalancedAllocation" node="10.10.33.59" score=90
I0318 12:33:44.126468  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesBalancedAllocation" node="10.10.33.60" score=88
I0318 12:33:44.126473  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="ImageLocality" node="10.10.33.61" score=100
I0318 12:33:44.126479  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="ImageLocality" node="10.10.33.62" score=100
I0318 12:33:44.126484  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="ImageLocality" node="10.10.33.58" score=100
I0318 12:33:44.126491  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="ImageLocality" node="10.10.33.59" score=100
I0318 12:33:44.126497  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="ImageLocality" node="10.10.33.60" score=100
I0318 12:33:44.126502  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="TaintToleration" node="10.10.33.61" score=300
I0318 12:33:44.126508  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="TaintToleration" node="10.10.33.62" score=300
I0318 12:33:44.126513  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="TaintToleration" node="10.10.33.58" score=300
I0318 12:33:44.126518  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="TaintToleration" node="10.10.33.59" score=300
I0318 12:33:44.126524  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="TaintToleration" node="10.10.33.60" score=300
I0318 12:33:44.126529  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeAffinity" node="10.10.33.61" score=0
I0318 12:33:44.126534  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeAffinity" node="10.10.33.62" score=0
I0318 12:33:44.126539  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeAffinity" node="10.10.33.58" score=0
I0318 12:33:44.126545  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeAffinity" node="10.10.33.59" score=0
I0318 12:33:44.126550  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeAffinity" node="10.10.33.60" score=0
I0318 12:33:44.126555  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesFit" node="10.10.33.61" score=75
I0318 12:33:44.126560  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesFit" node="10.10.33.62" score=78
I0318 12:33:44.126567  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesFit" node="10.10.33.58" score=82
I0318 12:33:44.126572  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesFit" node="10.10.33.59" score=84
I0318 12:33:44.126578  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" plugin="NodeResourcesFit" node="10.10.33.60" score=82
I0318 12:33:44.126588  241226 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.61" score=961
I0318 12:33:44.126594  241226 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.62" score=769
I0318 12:33:44.126599  241226 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.58" score=776
I0318 12:33:44.126605  241226 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.59" score=774
I0318 12:33:44.126612  241226 generic_scheduler.go:491] "Calculated node's final score for pod" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.60" score=770
I0318 12:33:44.126698  241226 default_binder.go:52] "Attempting to bind pod to node" pod="bigdata-hanoi-default/ham-29183-202303181232-1-w8wd4" node="10.10.33.61"

数据整理

10.10.33.5810.10.33.5910.10.33.6010.10.33.6110.10.33.62
NodeAffinity00000
NodeResourcesFit8284827578
VolumeBinding00000
PodTopologySpread200200200200200
InterPodAffinity0002000
NodeResourcesBalancedAllocation9490888691
ImageLocality100100100100100
TaintToleration300300300300300
Total Score776774770961769

分析数据

  1. ImageLocality的打分都是一样的,说明每个节点上都有这个镜像
  2. 10.10.33.58、10.10.33.59、10.10.33.60、10.10.33.62上都有相同类型的pod在运行,所以InterPodAffinity打分为0,只有10.10.33.61上面没有这个类型任务的pod,所以打分为200(这里乘以了每个插件的权重,InterPodAffinity的权重默认为2)

下面日志展现是InterPodAffinity各种打分情况,说明pod基本都是在节点间均衡调度

text

I0318 12:35:54.480311  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-24503-2023031811-1-hzvjd" plugin="InterPodAffinity" node="10.10.33.58" score=150
I0318 12:35:54.480316  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-24503-2023031811-1-hzvjd" plugin="InterPodAffinity" node="10.10.33.61" score=100
I0318 12:35:54.480321  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-24503-2023031811-1-hzvjd" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0318 12:35:54.480327  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-24503-2023031811-1-hzvjd" plugin="InterPodAffinity" node="10.10.33.60" score=200
I0318 12:35:54.480332  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-24503-2023031811-1-hzvjd" plugin="InterPodAffinity" node="10.10.33.59" score=50

I0318 12:34:14.726775  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-35680-2023031811-1-rx6sz" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0318 12:34:14.726784  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-35680-2023031811-1-rx6sz" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0318 12:34:14.726790  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-35680-2023031811-1-rx6sz" plugin="InterPodAffinity" node="10.10.33.60" score=200
I0318 12:34:14.726796  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-35680-2023031811-1-rx6sz" plugin="InterPodAffinity" node="10.10.33.59" score=0
I0318 12:34:14.726802  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-35680-2023031811-1-rx6sz" plugin="InterPodAffinity" node="10.10.33.61" score=0

I0318 12:36:18.573939  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-44003-202303181230-1-bb9kn" plugin="InterPodAffinity" node="10.10.33.58" score=200
I0318 12:36:18.573946  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-44003-202303181230-1-bb9kn" plugin="InterPodAffinity" node="10.10.33.61" score=132
I0318 12:36:18.573953  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-44003-202303181230-1-bb9kn" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0318 12:36:18.573959  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-44003-202303181230-1-bb9kn" plugin="InterPodAffinity" node="10.10.33.59" score=132
I0318 12:36:18.573965  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-44003-202303181230-1-bb9kn" plugin="InterPodAffinity" node="10.10.33.60" score=200

I0318 12:36:19.612833  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-37288-202303181215-1-8rbrl" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0318 12:36:19.612839  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-37288-202303181215-1-8rbrl" plugin="InterPodAffinity" node="10.10.33.61" score=200
I0318 12:36:19.612844  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-37288-202303181215-1-8rbrl" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0318 12:36:19.612849  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-37288-202303181215-1-8rbrl" plugin="InterPodAffinity" node="10.10.33.60" score=200
I0318 12:36:19.612854  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-37288-202303181215-1-8rbrl" plugin="InterPodAffinity" node="10.10.33.59" score=200

I0320 12:43:32.206742  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-39074-2023032011-1-fv6h6" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0320 12:43:32.206747  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-39074-2023032011-1-fv6h6" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0320 12:43:32.206754  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-39074-2023032011-1-fv6h6" plugin="InterPodAffinity" node="10.10.33.60" score=100
I0320 12:43:32.206759  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-39074-2023032011-1-fv6h6" plugin="InterPodAffinity" node="10.10.33.61" score=200
I0320 12:43:32.206764  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-39074-2023032011-1-fv6h6" plugin="InterPodAffinity" node="10.10.33.59" score=0

I0320 13:13:09.948985  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-25767-2023032012-1-nnj7k" plugin="InterPodAffinity" node="10.10.33.61" score=200
I0320 13:13:09.948990  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-25767-2023032012-1-nnj7k" plugin="InterPodAffinity" node="10.10.33.62" score=200
I0320 13:13:09.948995  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-25767-2023032012-1-nnj7k" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0320 13:13:09.949001  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-25767-2023032012-1-nnj7k" plugin="InterPodAffinity" node="10.10.33.60" score=200
I0320 13:13:09.949005  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-25767-2023032012-1-nnj7k" plugin="InterPodAffinity" node="10.10.33.59" score=0

I0320 13:12:09.243515  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-43308-2023032012-1-7ncz7" plugin="InterPodAffinity" node="10.10.33.61" score=200
I0320 13:12:09.243522  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-43308-2023032012-1-7ncz7" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0320 13:12:09.243529  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-43308-2023032012-1-7ncz7" plugin="InterPodAffinity" node="10.10.33.60" score=0
I0320 13:12:09.243535  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-43308-2023032012-1-7ncz7" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0320 13:12:09.243541  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-43308-2023032012-1-7ncz7" plugin="InterPodAffinity" node="10.10.33.59" score=0

I0320 13:10:51.044472  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29621-2023032012-1-gm5v7" plugin="InterPodAffinity" node="10.10.33.58" score=0
I0320 13:10:51.044478  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29621-2023032012-1-gm5v7" plugin="InterPodAffinity" node="10.10.33.61" score=0
I0320 13:10:51.044484  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29621-2023032012-1-gm5v7" plugin="InterPodAffinity" node="10.10.33.62" score=0
I0320 13:10:51.044489  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29621-2023032012-1-gm5v7" plugin="InterPodAffinity" node="10.10.33.60" score=0
I0320 13:10:51.044494  241226 generic_scheduler.go:434] "Plugin scored node for pod" pod="bigdata-hanoi-default/ham-29621-2023032012-1-gm5v7" plugin="InterPodAffinity" node="10.10.33.59" score=0

出现pod都往一个节点调度,触发条件是,节点间的资源剩余量差不多,且镜像大于1G,且pod都是使用同一个镜像。而job创建的pod执行时间比较短,容易造成节点资源余量差不多。而spark任务用的是python环境镜像都是4G左右,容易让ImageLocality打分接近100。

https://github.com/kubernetes/kubernetes/issues/91204

https://github.com/kubernetes/kubernetes/issues/42281

https://stackoverflow.com/questions/73278919/why-would-the-kubernetes-scheduler-always-place-my-pod-replicas-on-the-same-node

https://github.com/kubernetes/kubernetes/issues/110149

相关内容