Kubernetes pod distribution amongst nodes
Is there any way to make Kubernetes spread pods across nodes as much as possible? I have resource requests on all deployments, global requests, and an HPA. All nodes are identical.
I just had a situation where my ASG scaled down a node, and one service became completely unavailable because all 4 of its pods were on the node that was removed.
I would like each deployment to spread its pods across at least 2 nodes.
Sounds like what you want is Inter-Pod Affinity and Pod Anti-affinity.
Inter-pod affinity and anti-affinity were introduced in Kubernetes 1.4. Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to schedule on based on labels on pods that are already running on the node rather than based on labels on nodes. The rules are of the form “this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y.” Y is expressed as a LabelSelector with an associated list of namespaces (or “all” namespaces); unlike nodes, because pods are namespaced (and therefore the labels on pods are implicitly namespaced), a label selector over pod labels must specify which namespaces the selector should apply to. Conceptually X is a topology domain like node, rack, cloud provider zone, cloud provider region, etc. You express it using a topologyKey, which is the key for the node label that the system uses to denote such a topology domain; e.g. see the label keys listed in the section “Interlude: built-in node labels” of the Kubernetes documentation.
Anti-affinity can be used to ensure that you are spreading your pods across failure domains. You can state these rules as preferences or as hard rules. In the latter case, if the scheduler is unable to satisfy your constraint, the pod fails to get scheduled and stays Pending.
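To make the X and Y of that rule concrete, here is a sketch of a single anti-affinity term as it would sit under a Deployment's pod template spec. The label app: say and the namespace default are illustrative assumptions; topology.kubernetes.io/zone is the standard zone label on most cloud providers, but your nodes may not carry it. This is the soft (preferred) form; under requiredDuringSchedulingIgnoredDuringExecution each list item is the term itself, without the weight/podAffinityTerm wrapper.

```yaml
# Hypothetical fragment: goes under spec.template.spec of a Deployment.
affinity:
  podAntiAffinity:
    # Soft rule: the scheduler tries to honor it but may place the pod anyway.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        # Y: pods carrying this label...
        labelSelector:
          matchLabels:
            app: say
        # ...in these namespaces (omit to mean the new pod's own namespace).
        namespaces:
        - default
        # X: the topology domain, named by a node label key.
        topologyKey: topology.kubernetes.io/zone
```

Read as: "this pod should not run in a zone (X) that already runs a pod labeled app: say in namespace default (Y)."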
Here I build on Anirudh's answer by adding example code.
My initial Kubernetes YAML looked like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: say
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
  - protocol: TCP
    port: 8080
  type: LoadBalancer
  externalIPs:
  - 192.168.0.112
At this point, the Kubernetes scheduler decided, for reasons of its own, to place all 6 replicas on the same node.
Then I added a requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule to force the pods onto different nodes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: say
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
      # Hard rule: never schedule two app=say pods on the same node (hostname)
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - say
            topologyKey: "kubernetes.io/hostname"
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
  - protocol: TCP
    port: 8080
  type: LoadBalancer
  externalIPs:
  - 192.168.0.112
Now every running pod is on a different node. Since I have 3 nodes and 6 pods, the other 3 pods (6 minus 3) cannot be scheduled and stay Pending. This is because the rule is required (requiredDuringSchedulingIgnoredDuringExecution): the scheduler will never put two app=say pods on the same node.
kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE
say-deployment-8b46845d8-4zdw2   1/1     Running   0          24s   10.244.2.80   night
say-deployment-8b46845d8-699wg   0/1     Pending   0          24s   <none>        <none>
say-deployment-8b46845d8-7nvqp   1/1     Running   0          24s   10.244.1.72   gray
say-deployment-8b46845d8-bzw48   1/1     Running   0          24s   10.244.0.25   np3
say-deployment-8b46845d8-vwn8g   0/1     Pending   0          24s   <none>        <none>
say-deployment-8b46845d8-ws8lr   0/1     Pending   0          24s   <none>        <none>
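For a single value, matchExpressions with operator In selects the same pods as the shorter matchLabels form, so the hard rule above could equally be written as the fragment below (placed under the pod template's spec, where affinity sits in the manifest). Either way, at most one app: say pod can run per node, so replicas beyond the number of eligible nodes stay Pending.

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: say   # equivalent to key "app", operator In, values [say]
      topologyKey: "kubernetes.io/hostname"
```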
Now, if I loosen this requirement to preferredDuringSchedulingIgnoredDuringExecution:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: say-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: say
  template:
    metadata:
      labels:
        app: say
    spec:
      containers:
      - name: say
        image: gcr.io/hazel-champion-200108/say
        ports:
        - containerPort: 8080
      # Soft rule: prefer not to co-locate app=say pods, but allow it if needed
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: "app"
                  operator: In
                  values:
                  - say
              topologyKey: "kubernetes.io/hostname"
---
kind: Service
apiVersion: v1
metadata:
  name: say-service
spec:
  selector:
    app: say
  ports:
  - protocol: TCP
    port: 8080
  type: LoadBalancer
  externalIPs:
  - 192.168.0.112
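The first 3 pods are deployed on 3 different nodes, just like in the previous case. The remaining 3 (6 pods minus 3 nodes) are now scheduled as well, because the rule is only a preference; where they land is decided by the scheduler's other scoring criteria (here gray ended up with 3 pods, np3 with 2 and night with 1).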
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE
say-deployment-57cf5fb49b-26nvl   1/1     Running   0          59s   10.244.2.81   night
say-deployment-57cf5fb49b-2wnsc   1/1     Running   0          59s   10.244.0.27   np3
say-deployment-57cf5fb49b-6v24l   1/1     Running   0          59s   10.244.1.73   gray
say-deployment-57cf5fb49b-cxkbz   1/1     Running   0          59s   10.244.0.26   np3
say-deployment-57cf5fb49b-dxpcf   1/1     Running   0          59s   10.244.1.75   gray
say-deployment-57cf5fb49b-vv98p   1/1     Running   0          59s   10.244.1.74   gray
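The weight field (an integer from 1 to 100) only matters relative to other preferred terms and the scheduler's other scoring. As a sketch of how several preferences can be combined, the fragment below, assuming your nodes carry the topology.kubernetes.io/zone label, asks the scheduler to favor spreading app: say pods across zones first and across nodes second. Because both terms are preferences, all replicas still get scheduled when they outnumber zones or nodes.

```yaml
# Hypothetical fragment: goes under spec.template.spec, in place of the
# single-term affinity block. Assumes nodes are labeled with
# topology.kubernetes.io/zone.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # Stronger preference: avoid zones that already run an app=say pod.
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: say
        topologyKey: topology.kubernetes.io/zone
    # Weaker preference: avoid nodes that already run an app=say pod.
    - weight: 50
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: say
        topologyKey: kubernetes.io/hostname
```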