Kubernetes pod distribution amongst nodes

Is there any way to make Kubernetes distribute pods across nodes as much as possible? I have resource "Requests" on all deployments, plus global requests, as well as HPA. All nodes are identical.

I just had a situation where my ASG scaled down a node, and one service became completely unavailable because all 4 of its pods were on the node that was scaled down.

I would like to ensure that each deployment spreads its pods across at least 2 nodes.


Sounds like what you want is Inter-Pod Affinity and Pod Anti-affinity.

Inter-pod affinity and anti-affinity were introduced in Kubernetes 1.4. They allow you to constrain which nodes your pod is eligible to be scheduled on based on labels on pods that are already running on the node, rather than based on labels on the nodes themselves. The rules are of the form "this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y."

Y is expressed as a LabelSelector with an associated list of namespaces (or "all" namespaces); unlike nodes, pods are namespaced (and therefore the labels on pods are implicitly namespaced), so a label selector over pod labels must specify which namespaces the selector should apply to. Conceptually, X is a topology domain like node, rack, cloud provider zone, cloud provider region, etc. You express it using a topologyKey, which is the key of the node label that the system uses to denote such a topology domain (e.g. the built-in node labels such as kubernetes.io/hostname).
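For reference, these are the kinds of well-known node labels commonly used as a topologyKey. The values below are illustrative placeholders, not from a real cluster, and the exact label keys vary across Kubernetes versions:

```yaml
# Common well-known node labels usable as a topologyKey
# (values are illustrative placeholders)
kubernetes.io/hostname: node-1            # one domain per node
topology.kubernetes.io/zone: us-east-1a   # one domain per availability zone
topology.kubernetes.io/region: us-east-1  # one domain per region
```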

Anti-affinity can be used to ensure that you are spreading your pods across failure domains. You can state these rules as preferences or as hard requirements. In the latter case, if the scheduler is unable to satisfy your constraint, the pod will fail to be scheduled.
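In manifest terms, the hard and soft variants differ only in which field you use under podAntiAffinity. This is a skeleton with the selector details elided; the full examples in the next answer fill them in:

```yaml
# Skeleton of the two anti-affinity flavors (selectors elided)
affinity:
  podAntiAffinity:
    # Hard rule: the pod stays Pending if it cannot be satisfied
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector: {}       # fill in: which pods to avoid co-locating with
      topologyKey: kubernetes.io/hostname
    # Soft rule: the scheduler tries to satisfy it, but places the pod anyway
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100             # 1-100; higher means a stronger preference
      podAffinityTerm:
        labelSelector: {}     # fill in: which pods to avoid co-locating with
        topologyKey: kubernetes.io/hostname
```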


Here I build on Anirudh's answer by adding example code.

My initial kubernetes yaml looked like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: say
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
    - protocol: TCP
      port: 8080
  type: LoadBalancer
  externalIPs:
    - 192.168.0.112

At this point, the Kubernetes scheduler decided that all 6 replicas should run on the same node.

Then I added a podAntiAffinity rule with requiredDuringSchedulingIgnoredDuringExecution to force the pods to be scheduled on different nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: say
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - say
            topologyKey: "kubernetes.io/hostname"
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
    - protocol: TCP
      port: 8080
  type: LoadBalancer
  externalIPs:
    - 192.168.0.112

Now the pods run on different nodes. And since I have 3 nodes and 6 pods, the other 3 pods (6 minus 3) cannot be scheduled and stay Pending. This is because I made it a hard requirement with requiredDuringSchedulingIgnoredDuringExecution.

kubectl get pods -o wide 

NAME                             READY     STATUS    RESTARTS   AGE       IP            NODE
say-deployment-8b46845d8-4zdw2   1/1       Running   0          24s       10.244.2.80   night
say-deployment-8b46845d8-699wg   0/1       Pending   0          24s       <none>        <none>
say-deployment-8b46845d8-7nvqp   1/1       Running   0          24s       10.244.1.72   gray
say-deployment-8b46845d8-bzw48   1/1       Running   0          24s       10.244.0.25   np3
say-deployment-8b46845d8-vwn8g   0/1       Pending   0          24s       <none>        <none>
say-deployment-8b46845d8-ws8lr   0/1       Pending   0          24s       <none>        <none>

Now, if I loosen this requirement by using preferredDuringSchedulingIgnoredDuringExecution instead:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: say
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: "app"
                  operator: In
                  values:
                  - say
              topologyKey: "kubernetes.io/hostname"
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
    - protocol: TCP
      port: 8080
  type: LoadBalancer
  externalIPs:
    - 192.168.0.112

The first 3 pods are spread across 3 different nodes, just as in the previous case. The remaining 3 pods (6 pods minus 3 nodes) are then placed on whichever nodes the scheduler scores best, since the anti-affinity is now only a preference.

NAME                              READY     STATUS    RESTARTS   AGE       IP            NODE
say-deployment-57cf5fb49b-26nvl   1/1       Running   0          59s       10.244.2.81   night
say-deployment-57cf5fb49b-2wnsc   1/1       Running   0          59s       10.244.0.27   np3
say-deployment-57cf5fb49b-6v24l   1/1       Running   0          59s       10.244.1.73   gray
say-deployment-57cf5fb49b-cxkbz   1/1       Running   0          59s       10.244.0.26   np3
say-deployment-57cf5fb49b-dxpcf   1/1       Running   0          59s       10.244.1.75   gray
say-deployment-57cf5fb49b-vv98p   1/1       Running   0          59s       10.244.1.74   gray
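As a footnote: on newer clusters (Kubernetes 1.19+), pod topology spread constraints express this "spread as evenly as possible" intent more directly than pairwise anti-affinity. A sketch for the same deployment's pod template would be:

```yaml
# Sketch: the same spreading goal via topologySpreadConstraints (K8s 1.19+)
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                         # per-node pod counts may differ by at most 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway  # soft; use DoNotSchedule for a hard rule
        labelSelector:
          matchLabels:
            app: say
```

With whenUnsatisfiable: ScheduleAnyway this behaves like the preferred variant above (no Pending pods), while DoNotSchedule reproduces the hard-requirement behavior.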