Table of Contents
- 2.1 metrics-server Overview
- 2.2 metrics-server Architecture
- 2.3 metrics-server Deployment
- 2.4 metrics-server API Testing
- 3.1 HPA Overview
- 3.2 HPA in Practice
1. Monitoring Architecture Overview #
Kubernetes monitoring metrics broadly fall into two categories: core metrics and custom metrics. Core metrics are the stable, built-in Kubernetes metrics; they were originally collected by Heapster and are now provided by metrics-server. Custom metrics extend the core set with richer data such as application-level metrics; they are integrated with the Kubernetes API through the Aggregator, and today the mainstream implementation is Prometheus.
What the metrics are used for:
- kubectl top: view CPU and memory usage of nodes and pods
- kubernetes-dashboard: node and pod resource monitoring in the console
- Horizontal Pod Autoscaler: horizontal autoscaling of workloads
- Scheduler: input for scheduling decisions
2. metrics-server Architecture and Installation #
2.1 metrics-server Overview #
Metrics Server is a cluster-wide aggregator of resource usage data. Resource metrics are used by components like kubectl top and the Horizontal Pod Autoscaler to scale workloads. To autoscale based upon a custom metric, you need to use the Prometheus Adapter.
In other words, metrics-server is a cluster-level collector of resource metrics:
- provides query interfaces for basic resources such as CPU and memory;
- registers its API with kube-apiserver through the Kubernetes aggregator;
- exposes the data externally through the Metrics API;
- custom metrics additionally require Prometheus.
The Metrics API
- /nodes: metrics for all nodes, resource kind NodeMetrics
- /nodes/<node_name>: metrics for a specific node
- /namespaces/{namespace}/pods: metrics for all pods in a namespace
- /namespaces/{namespace}/pods/{pod}: metrics for a specific pod, resource kind PodMetrics
In the future the API is expected to support aggregations such as max, min and 95th percentile, as well as custom time windows such as 1h, 1d or 1w.
2.2 metrics-server Architecture #
(figure: Kubernetes monitoring architecture diagram)
The monitoring architecture has two parts: core monitoring (the white part of the figure) and custom monitoring (the blue part of the figure).
1. Core monitoring
- the kubelet on each node collects resource and usage estimates
- metrics-server collects the data but does not store it
- metrics-server exposes the data through the Metrics API
- core metrics are consumed by HPA, kubectl top, the scheduler and the dashboard
2. Custom monitoring
- custom metrics include both monitoring metrics and service metrics
- an agent on every node reports to a cluster monitoring agent such as Prometheus
- the cluster monitoring agent converts the collected monitoring and service metrics through an API adapter into a form the apiserver can serve
- HPA uses custom metrics for richer autoscaling, which requires an additional translation step through the HPA adapter API
2.3 metrics-server Deployment #
1. Fetch the metrics-server installation files. There are two flavors, 1.7 and 1.8+: install the 1.7 manifests on Kubernetes 1.7 and the 1.8+ manifests on Kubernetes 1.8 or later.
[root@node-1 ~]# git clone https://github.com/kubernetes-sigs/metrics-server.git
2. Deploy metrics-server using the 1.8+ manifests.
[root@node-1 metrics-server]# kubectl apply -f deploy/1.8+/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
The core manifest is metrics-server-deployment.yaml, which runs metrics-server in the cluster as a Deployment. The image k8s.gcr.io/metrics-server-amd64:v0.3.6 needs to be pulled in advance. The manifest looks like this:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      nodeSelector:
        beta.kubernetes.io/os: linux
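If the nodes cannot pull from k8s.gcr.io directly, a common workaround is to pull the image from a reachable mirror registry and retag it locally. A minimal sketch, where <mirror-registry> is only a placeholder and must be replaced with a registry that is actually reachable from your environment:

docker pull <mirror-registry>/metrics-server-amd64:v0.3.6        # placeholder mirror path
docker tag <mirror-registry>/metrics-server-amd64:v0.3.6 k8s.gcr.io/metrics-server-amd64:v0.3.6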
3. Check the metrics-server deployment; the metrics-server Pod should be running.
[root@node-1 1.8+]# kubectl get deployments metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           2m49s
[root@node-1 1.8+]# kubectl get pods -n kube-system metrics-server-67db467b7b-5xf8x
NAME                              READY   STATUS    RESTARTS   AGE
metrics-server-67db467b7b-5xf8x   1/1     Running   0          3m
At this point metrics-server is not actually usable yet: kubectl top node returns an error like Error from server (NotFound): nodemetrics.metrics.k8s.io "node-1" not found. The metrics-server pod log shows the following:
[root@node-1 1.8+]# kubectl logs metrics-server-67db467b7b-5xf8x -n kube-system -f
I1230 11:34:10.905500 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1230 11:34:11.527346 1 secure_serving.go:116] Serving securely on [::]:4443
E1230 11:35:11.552067 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node-1: unable to fetch metrics from Kubelet node-1 (node-1): Get https://node-1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup node-1 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:node-3: unable to fetch metrics from Kubelet node-3 (node-3): Get https://node-3:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup node-3 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:node-2: unable to fetch metrics from Kubelet node-2 (node-2): Get https://node-2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup node-2 on 10.96.0.10:53: no such host]
4. The error above says the pod cannot resolve the node hostnames through DNS. This can be fixed either by defining a hosts file inside the pod or by telling metrics-server to prefer node IP addresses. Modify the metrics-server Deployment accordingly and re-apply it (a sketch of the change follows).
(figure: modified metrics-server Deployment)
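One way to make metrics-server contact kubelets by IP instead of hostname is to add the --kubelet-preferred-address-types argument. A minimal sketch of the container args after the edit, assuming the rest of the Deployment stays as shown above:

        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-preferred-address-types=InternalIP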
5. After re-applying the manifest a new pod is created; its log now shows a different error:
[root@node-1 1.8+]# kubectl logs metrics-server-f54f5d6bf-s42rc -n kube-system -f
I1230 11:45:26.615547 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I1230 11:45:27.043723 1 secure_serving.go:116] Serving securely on [::]:4443

E1230 11:46:27.065274 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:node-2: unable to fetch metrics from Kubelet node-2 (10.254.100.102): Get https://10.254.100.102:10250/stats/summary?only_cpu_and_memory=true: x509: cannot validate certificate for 10.254.100.102 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:node-1: unable to fetch metrics from Kubelet node-1 (10.254.100.101): Get https://10.254.100.101:10250/stats/summary?only_cpu_and_memory=true: x509: cannot validate certificate for 10.254.100.101 because it doesn't contain any IP SANs, unable to fully scrape metrics from source kubelet_summary:node-3: unable to fetch metrics from Kubelet node-3 (10.254.100.103): Get https://10.254.100.103:10250/stats/summary?only_cpu_and_memory=true: x509: cannot validate certificate for 10.254.100.103 because it doesn't contain any IP SANs]
6. Modify the metrics-server Deployment again and add the --kubelet-insecure-tls argument (sketch below).
(figure: metrics-server Deployment with --kubelet-insecure-tls added)
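A minimal sketch of the resulting args block, keeping the address-type change from the previous step:

        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-preferred-address-types=InternalIP
          - --kubelet-insecure-tls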
After re-deploying once more there are no errors; within a few minutes data is reported to metrics-server and can be verified with kubectl top.
2.4 metrics-server API Testing #
1. Installing metrics-server adds a metrics.k8s.io/v1beta1 API group, which is registered with the apiserver through the Aggregator.
(figure: metrics.k8s.io/v1beta1 API group registered with the apiserver)
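The registration can also be confirmed from the command line; the exact output depends on your cluster:

[root@node-1 ~]# kubectl api-versions | grep metrics.k8s.io
[root@node-1 ~]# kubectl get apiservices v1beta1.metrics.k8s.io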
2. Use kubectl top node to see node CPU and memory utilization:
[root@node-1 1.8+]# kubectl top node
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   110m         5%     4127Mi          53%
node-2   53m          5%     1066Mi          61%
node-3   34m          3%     1002Mi          57%
3. Use kubectl top pods to see pod CPU and memory usage:
[root@node-1 1.8+]# kubectl top pods
NAME                                   CPU(cores)   MEMORY(bytes)
haproxy-1-686c67b997-kw8pp             0m           1Mi
haproxy-2-689b4f897-7cwmf              0m           1Mi
haproxy-ingress-demo-5d487d4fc-5pgjt   0m           1Mi
haproxy-ingress-demo-5d487d4fc-pst2q   0m           1Mi
haproxy-ingress-demo-5d487d4fc-sr8tm   0m           1Mi
ingress-demo-d77bdf4df-7kwbj           0m           1Mi
ingress-demo-d77bdf4df-7x6jn           0m           1Mi
ingress-demo-d77bdf4df-hr88b           0m           1Mi
ingress-demo-d77bdf4df-wc22k           0m           1Mi
service-1-7b66bf758f-xj9jh             0m           2Mi
service-2-7c7444684d-w9cv9             1m           3Mi
4. Besides the command line, the monitoring data can also be retrieved directly from metrics-server through the API. The available endpoints are:
- http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes
- http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes/<node_name>
- http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/pods
- http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/<pod_name>
The following steps exercise these endpoints:
a. Start a kubectl proxy to the apiserver; by default it listens on 127.0.0.1:8001.
[root@node-1 ~]# kubectl proxy
Starting to serve on 127.0.0.1:8001

b. Query the node list; the usage field of every node contains cpu and memory.
[root@node-1 ~]# curl http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1167  100  1167    0     0   393k      0 --:--:-- --:--:-- --:--:--  569k
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "node-3",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node-3",
        "creationTimestamp": "2019-12-30T14:23:00Z"
      },
      "timestamp": "2019-12-30T14:22:07Z",
      "window": "30s",
      "usage": {
        "cpu": "32868032n",
        "memory": "1027108Ki"
      }
    },
    {
      "metadata": {
        "name": "node-1",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node-1",
        "creationTimestamp": "2019-12-30T14:23:00Z"
      },
      "timestamp": "2019-12-30T14:22:07Z",
      "window": "30s",
      "usage": {
        "cpu": "108639556n",
        "memory": "4305356Ki"
      }
    },
    {
      "metadata": {
        "name": "node-2",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node-2",
        "creationTimestamp": "2019-12-30T14:23:00Z"
      },
      "timestamp": "2019-12-30T14:22:12Z",
      "window": "30s",
      "usage": {
        "cpu": "47607386n",
        "memory": "1119960Ki"
      }
    }
  ]
}

c. Query a specific node to get that node's metrics.
[root@node-1 ~]# curl http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes/node-2
{
  "kind": "NodeMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "node-2",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node-2",
    "creationTimestamp": "2019-12-30T14:24:39Z"
  },
  "timestamp": "2019-12-30T14:24:12Z",
  "window": "30s",
  "usage": {
    "cpu": "43027609n",
    "memory": "1120168Ki"
  }
}

d. List metrics for all pods.
curl http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/pods

e. Query a specific pod's metrics.
[root@node-1 ~]# curl http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/haproxy-ingress-demo-5d487d4fc-sr8tm
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "haproxy-ingress-demo-5d487d4fc-sr8tm",
    "namespace": "default",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/haproxy-ingress-demo-5d487d4fc-sr8tm",
    "creationTimestamp": "2019-12-30T14:36:30Z"
  },
  "timestamp": "2019-12-30T14:36:13Z",
  "window": "30s",
  "containers": [
    {
      "name": "haproxy-ingress-demo",
      "usage": {
        "cpu": "0",
        "memory": "1428Ki"
      }
    }
  ]
}
5. The same data can also be fetched with kubectl get --raw, for example for node-3:
[root@node-1 ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/node-3 | jq .
{
  "kind": "NodeMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "node-3",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node-3",
    "creationTimestamp": "2019-12-30T14:44:46Z"
  },
  "timestamp": "2019-12-30T14:44:09Z",
  "window": "30s",
  "usage": {
    "cpu": "35650151n",
    "memory": "1026820Ki"
  }
}
Other similar endpoints are:
- kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes (all nodes)
- kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/<node_name> (a specific node)
- kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods (all pods)
- kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/haproxy-ingress-demo-5d487d4fc-sr8tm (a specific pod)
3. HPA Horizontal Autoscaling #
3.1 HPA Overview #
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). Note that Horizontal Pod Autoscaling does not apply to objects that can’t be scaled, for example, DaemonSets.
(figure: HPA control loop)
HPA, the Horizontal Pod Autoscaler, scales Pods horizontally: based on observed resource usage it increases or decreases the number of Pod replicas. Its mechanism:
- HPA depends on a monitoring component and queries monitoring data, for example through the Metrics API
- HPA is a second-level controller built on top of replica controllers such as Deployments, ReplicaSets and StatefulSets
- HPA comes in two API versions, v1 and v2alpha1, depending on the metrics it consumes
- HPA v1 uses core resource metrics such as CPU and memory utilization, obtained from the metrics-server API
- HPA v2 uses custom metrics, typically obtained through Prometheus
- HPA periodically adjusts the replica count; the check interval is defined by horizontal-pod-autoscaler-sync-period, 15s by default
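The desired replica count follows the algorithm documented for the HPA controller:

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

For example, with 2 replicas, a CPU target of 80% and a measured utilization of 200% of requests, the controller asks for ceil(2 * 200 / 80) = 5 replicas, bounded by minReplicas and maxReplicas.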
3.2 HPA in Practice #
The following demonstrates HPA in action: first create a Deployment, then define an HPA policy that scales out when CPU utilization exceeds 80% of the requested CPU.
1. Create the Deployment.
[root@node-1 ~]# kubectl run hpa-demo --image=nginx:1.7.9 --port=80 --replicas=1 --expose=true --requests="cpu=200m,memory=64Mi"

[root@node-1 ~]# kubectl get deployments hpa-demo -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-12-31T01:43:24Z"
  generation: 1
  labels:
    run: hpa-demo
  name: hpa-demo
  namespace: default
  resourceVersion: "14451208"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/hpa-demo
  uid: 3b0f29e8-8606-4e52-8f5b-6c960d396136
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      run: hpa-demo
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: hpa-demo
    spec:
      containers:
      - image: nginx:1.7.9
        imagePullPolicy: IfNotPresent
        name: hpa-demo
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          requests:
            cpu: 200m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2019-12-31T01:43:25Z"
    lastUpdateTime: "2019-12-31T01:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-31T01:43:24Z"
    lastUpdateTime: "2019-12-31T01:43:25Z"
    message: ReplicaSet "hpa-demo-755bdd875c" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
2. Create the HPA object: scale on CPU with at least 2 and at most 5 Pods. targetCPUUtilizationPercentage is the actual CPU usage expressed as a percentage of the requested CPU.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
spec:
  maxReplicas: 5
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  targetCPUUtilizationPercentage: 80
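Roughly the same policy can also be created imperatively instead of from a manifest; the steps below, however, apply the manifest shown above:

[root@node-1 ~]# kubectl autoscale deployment hpa-demo --min=2 --max=5 --cpu-percent=80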
3. Apply the HPA rule and inspect it. Since the policy requires at least 2 replicas and the Deployment currently has only 1, a scale-up is needed; the events show the replica count being raised to 2.
[root@node-1 ~]# kubectl apply -f hpa-demo.yaml
horizontalpodautoscaler.autoscaling/hpa-demo created

#list the HPAs
[root@node-1 ~]# kubectl get horizontalpodautoscalers.autoscaling
NAME       REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/hpa-demo   <unknown>/80%   2         5         0          7s

#describe the HPA
[root@node-1 ~]# kubectl describe horizontalpodautoscalers.autoscaling hpa-demo
Name: hpa-demo
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
  {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa-demo","namespace":"default"},"spe...
CreationTimestamp: Tue, 31 Dec 2019 09:52:51 +0800
Reference: Deployment/hpa-demo
Metrics: ( current / target )
  resource cpu on pods (as a percentage of request): <unknown> / 80%
Min replicas: 2
Max replicas: 5
Deployment pods: 1 current / 2 desired
Conditions:
  Type Status Reason Message
  ---- ------ ------ -------
  AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 2
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal SuccessfulRescale 1s horizontal-pod-autoscaler New size: 2; reason: Current number of replicas below Spec.MinReplicas   #scaled to 2 replicas to satisfy MinReplicas
4. Check the Deployment to confirm that the scale-up has reached the HPA minimum.
[root@node-1 ~]# kubectl get deployments hpa-demo --show-labels
NAME       READY   UP-TO-DATE   AVAILABLE   AGE   LABELS
hpa-demo   2/2     2            2           94m   run=hpa-demo

[root@node-1 ~]# kubectl get pods -l run=hpa-demo
NAME                        READY   STATUS    RESTARTS   AGE
hpa-demo-5fcd9c757d-7q4td   1/1     Running   0          5m10s
hpa-demo-5fcd9c757d-cq6k6   1/1     Running   0          10m
5. When load grows and CPU utilization rises, HPA automatically adds Pods. The following CPU stress test demonstrates the scale-out.
[root@node-1 ~]# kubectl exec -it hpa-demo-5fcd9c757d-cq6k6 /bin/bash
root@hpa-demo-5fcd9c757d-cq6k6:/# dd if=/dev/zero of=/dev/null

#describe the HPA again: it reports a scale-out with the reason "cpu resource utilization (percentage of request) above target", i.e. CPU usage exceeded the configured percentage of requests
[root@node-1 ~]# kubectl describe horizontalpodautoscalers.autoscaling hpa-demo
Name: hpa-demo
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
  {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa-demo","namespace":"default"},"spe...
CreationTimestamp: Tue, 31 Dec 2019 09:52:51 +0800
Reference: Deployment/hpa-demo
Metrics: ( current / target )
  resource cpu on pods (as a percentage of request): 99% (199m) / 80%
Min replicas: 2
Max replicas: 5
Deployment pods: 5 current / 5 desired
Conditions:
  Type Status Reason Message
  ---- ------ ------ -------
  AbleToScale True ReadyForNewScale recommended size matches current size
  ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited True TooManyReplicas the desired replica count is more than the maximum replica count
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal SuccessfulRescale 8m2s horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target

#check the replica count to confirm the scale-out; it has grown to 5
[root@node-1 ~]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
hpa-demo-5fcd9c757d-7q4td   1/1     Running   0          16m
hpa-demo-5fcd9c757d-cq6k6   1/1     Running   0          21m
hpa-demo-5fcd9c757d-jmb6w   1/1     Running   0          16m
hpa-demo-5fcd9c757d-lpxk8   1/1     Running   0          16m
hpa-demo-5fcd9c757d-zs6cg   1/1     Running   0          21m
6. Stop the CPU stress test; HPA automatically scales the replica count back down until the policy is satisfied.
[root@node-1 ~]# kubectl describe horizontalpodautoscalers.autoscaling hpa-demo
Name: hpa-demo
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
  {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa-demo","namespace":"default"},"spe...
CreationTimestamp: Tue, 31 Dec 2019 09:52:51 +0800
Reference: Deployment/hpa-demo
Metrics: ( current / target )
  resource cpu on pods (as a percentage of request): 0% (0) / 80%
Min replicas: 2
Max replicas: 5
Deployment pods: 2 current / 2 desired
Conditions:
  Type Status Reason Message
  ---- ------ ------ -------
  AbleToScale True ReadyForNewScale recommended size matches current size
  ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited True TooFewReplicas the desired replica count is increasing faster than the maximum scale rate
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal SuccessfulRescale 18m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal SuccessfulRescale 113s horizontal-pod-autoscaler New size: 2; reason: All metrics below target   #scaled back down to 2 replicas

#confirm the replica count; it is back at the minimum of 2
[root@node-1 ~]# kubectl get pods -l run=hpa-demo
NAME                        READY   STATUS    RESTARTS   AGE
hpa-demo-5fcd9c757d-cq6k6   1/1     Running   0          24m
hpa-demo-5fcd9c757d-zs6cg   1/1     Running   0          24m
The example above shows that HPA can scale horizontally based on the monitoring data exposed through the metrics-server API, growing and shrinking the workload with CPU usage to keep the service available. HPA v1 can only scale on CPU utilization relative to requests, which is fairly limited; richer behavior comes with HPA v2, which builds on several APIs:
- metrics.k8s.io: resource metrics provided by metrics-server, i.e. CPU and memory for nodes and pods;
- custom.metrics.k8s.io: custom metrics, integrated with kube-apiserver through an adapter, e.g. for Prometheus;
- external.metrics.k8s.io: external metrics, similar to custom metrics and likewise integrated through an adapter.
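As an illustration only (not part of this deployment), an HPA consuming a custom per-pod metric through custom.metrics.k8s.io might look like the sketch below; it assumes a Prometheus adapter is already installed and exposes a hypothetical metric named http_requests_per_second:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical metric exposed by the adapter
      target:
        type: AverageValue
        averageValue: "100"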
References #
Resource metrics pipeline: https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Official deployment guide: https://github.com/kubernetes-sigs/metrics-server