An Operator, a pattern introduced by CoreOS, is an application-specific controller that extends the Kubernetes API and is used to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems.
Put simply, an Operator greatly simplifies deploying stateful services.
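For example, the Prometheus Operator registers custom resources such as Prometheus, Alertmanager, and ServiceMonitor. A minimal sketch of a Prometheus resource (the field values here are illustrative, not taken from this deployment):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2                        # the Operator reconciles this into a StatefulSet
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector: {}         # empty selector: pick up all ServiceMonitors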

1. Clone the repository

# Create the working directory
mkdir -p /kube/operator
cd /kube/operator

# Clone the repository
git clone https://github.com/coreos/kube-prometheus.git

# Enter the manifests directory
cd /kube/operator/kube-prometheus/manifests

2. Initialize

# Apply all the manifests
kubectl apply -f .

# List CRDs; the monitoring.coreos.com entries are the newly added ones
kubectl get crd
NAME                                             CREATED AT
alertmanagers.monitoring.coreos.com              2019-08-21T06:17:09Z
podmonitors.monitoring.coreos.com                2019-08-21T06:17:09Z
prometheuses.monitoring.coreos.com               2019-08-21T06:17:09Z
prometheusrules.monitoring.coreos.com            2019-08-21T06:17:09Z
servicemonitors.monitoring.coreos.com            2019-08-21T06:17:09Z
volumesnapshotclasses.snapshot.storage.k8s.io    2019-08-12T01:52:51Z
volumesnapshotcontents.snapshot.storage.k8s.io   2019-08-12T01:52:51Z
volumesnapshots.snapshot.storage.k8s.io          2019-08-12T01:52:51Z
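With the CRDs registered, the custom resources themselves can be queried; a quick sanity check (the resource names come from the CRDs listed above):

# List the custom resources created by the manifests
kubectl get prometheus,alertmanager,servicemonitor -n monitoring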


# Check the deployment status
kubectl get all -n monitoring
NAME                                       READY   STATUS              RESTARTS   AGE
pod/alertmanager-main-0                    0/2     ContainerCreating   0          4m17s
pod/grafana-7dc5f8f9f6-qqs4q               0/1     ContainerCreating   0          9m37s
pod/kube-state-metrics-77467ddf9b-w45h5    0/4     ContainerCreating   0          9m38s
pod/node-exporter-2q7wn                    0/2     ContainerCreating   0          9m37s
pod/node-exporter-88v4z                    0/2     ContainerCreating   0          9m37s
pod/node-exporter-jds2c                    0/2     ContainerCreating   0          9m37s
pod/node-exporter-sbpf9                    0/2     ContainerCreating   0          9m37s
pod/node-exporter-v222n                    0/2     ContainerCreating   0          9m37s
pod/node-exporter-z8svt                    2/2     Running             0          9m37s
pod/node-exporter-zzjmj                    2/2     Running             0          9m37s
pod/prometheus-adapter-668748ddbd-tr7w4    1/1     Running             0          9m38s
pod/prometheus-k8s-0                       0/3     ContainerCreating   0          4m7s
pod/prometheus-k8s-1                       0/3     ContainerCreating   0          4m7s
pod/prometheus-operator-7447bf4dcb-62t4m   1/1     Running             0          9m39s


NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-main       ClusterIP   10.97.133.11     <none>        9093/TCP            9m39s
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   4m17s
service/grafana                 ClusterIP   10.111.136.200   <none>        3000/TCP            9m38s
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   9m38s
service/node-exporter           ClusterIP   None             <none>        9100/TCP            9m38s
service/prometheus-adapter      ClusterIP   10.100.178.209   <none>        443/TCP             9m38s
service/prometheus-k8s          ClusterIP   10.110.75.229    <none>        9090/TCP            9m37s
service/prometheus-operated     ClusterIP   None             <none>        9090/TCP            4m7s
service/prometheus-operator     ClusterIP   None             <none>        8080/TCP            9m40s

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter   7         7         2       7            2           kubernetes.io/os=linux   9m38s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               0/1     1            0           9m38s
deployment.apps/kube-state-metrics    0/1     1            0           9m38s
deployment.apps/prometheus-adapter    1/1     1            1           9m38s
deployment.apps/prometheus-operator   1/1     1            1           9m40s

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-7dc5f8f9f6               1         1         0       9m38s
replicaset.apps/kube-state-metrics-77467ddf9b    1         1         0       9m38s
replicaset.apps/prometheus-adapter-668748ddbd    1         1         1       9m38s
replicaset.apps/prometheus-operator-7447bf4dcb   1         1         1       9m40s

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   0/3     4m17s
statefulset.apps/prometheus-k8s      0/2     4m7s
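Most pods sit in ContainerCreating while their images are pulled; you can follow progress until everything is Running:

# Watch pod status (Ctrl+C to stop)
kubectl get pods -n monitoring -w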

Depending on your network speed, the fully deployed state eventually looks like this:

NAME                                       READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running   0          25m
pod/alertmanager-main-1                    2/2     Running   0          9m4s
pod/alertmanager-main-2                    2/2     Running   0          6m10s
pod/grafana-7dc5f8f9f6-qqs4q               1/1     Running   0          30m
pod/kube-state-metrics-7c9f45f76b-6n44g    4/4     Running   0          6m36s
pod/node-exporter-2q7wn                    2/2     Running   0          30m
pod/node-exporter-88v4z                    2/2     Running   0          30m
pod/node-exporter-jds2c                    2/2     Running   0          30m
pod/node-exporter-sbpf9                    2/2     Running   0          30m
pod/node-exporter-v222n                    2/2     Running   0          30m
pod/node-exporter-z8svt                    2/2     Running   0          30m
pod/node-exporter-zzjmj                    2/2     Running   0          30m
pod/prometheus-adapter-668748ddbd-tr7w4    1/1     Running   0          30m
pod/prometheus-k8s-0                       3/3     Running   1          25m
pod/prometheus-k8s-1                       3/3     Running   1          25m
pod/prometheus-operator-7447bf4dcb-62t4m   1/1     Running   0          30m


NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-main       ClusterIP   10.97.133.11     <none>        9093/TCP            30m
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   25m
service/grafana                 ClusterIP   10.111.136.200   <none>        3000/TCP            30m
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   30m
service/node-exporter           ClusterIP   None             <none>        9100/TCP            30m
service/prometheus-adapter      ClusterIP   10.100.178.209   <none>        443/TCP             30m
service/prometheus-k8s          ClusterIP   10.110.75.229    <none>        9090/TCP            30m
service/prometheus-operated     ClusterIP   None             <none>        9090/TCP            25m
service/prometheus-operator     ClusterIP   None             <none>        8080/TCP            30m

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter   7         7         7       7            7           kubernetes.io/os=linux   30m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           30m
deployment.apps/kube-state-metrics    1/1     1            1           30m
deployment.apps/prometheus-adapter    1/1     1            1           30m
deployment.apps/prometheus-operator   1/1     1            1           30m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-7dc5f8f9f6               1         1         1       30m
replicaset.apps/kube-state-metrics-77467ddf9b    0         0         0       30m
replicaset.apps/kube-state-metrics-7c9f45f76b    1         1         1       6m36s
replicaset.apps/prometheus-adapter-668748ddbd    1         1         1       30m
replicaset.apps/prometheus-operator-7447bf4dcb   1         1         1       30m

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     25m
statefulset.apps/prometheus-k8s      2/2     25m

3. External access

Expose prometheus-k8s, grafana, and alertmanager-main through Ingress.

3.1 Configure Ingress

# prometheus ingress
vi prometheus-ingress.yaml 
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
spec:
  rules:
  - host: prometheus.s.com
    http:
      paths:
      - backend:
          serviceName: prometheus-k8s
          servicePort: 9090

# grafana ingress
vi grafana-ingress.yaml 
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
spec:
  rules:
  - host: grafana.s.com
    http:
      paths:
      - backend:
          serviceName: grafana
          servicePort: 3000
          
# alertmanager ingress
vi alertmanager-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alertmanager-ingress
  namespace: monitoring
spec:
  rules:
  - host: alertmanager.s.com
    http:
      paths:
      - backend:
          serviceName: alertmanager-main
          servicePort: 9093
          
# deploy
kubectl apply -f prometheus-ingress.yaml -f grafana-ingress.yaml -f alertmanager-ingress.yaml

Prometheus and Alertmanager require no authentication; Grafana's default credentials are admin/admin.
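To verify that the three Ingress resources were created (the ADDRESS column depends on your Ingress controller):

# All three hosts should be listed
kubectl get ingress -n monitoring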

3.2 Configure DNS

In your DNS configuration, point prometheus.s.com, alertmanager.s.com, and grafana.s.com at the Ingress VIP address.
Reference: Setting up a DNS server on CentOS
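Without a DNS server, a quick workaround for testing is an /etc/hosts entry on your workstation; 10.0.1.100 below is a hypothetical Ingress VIP, substitute your own:

# /etc/hosts (hypothetical VIP address)
10.0.1.100 prometheus.s.com alertmanager.s.com grafana.s.com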


4. Fixing targets that report no data

Opening the targets page on prometheus.s.com shows two targets without data.
[Image: prometheus-monitoring.png]

  • monitoring/kube-controller-manager/0 (0/0 up)
  • monitoring/kube-scheduler/0 (0/0 up)

4.1 kube-scheduler

Inspect the ServiceMonitor resource definition for the kube-scheduler component.

cat /kube/operator/kube-prometheus/manifests/prometheus-serviceMonitorKubeScheduler.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system # match Services in the kube-system namespace
  selector:
    matchLabels:
      k8s-app: kube-scheduler # match Services carrying this label

No Service satisfies this selector, so naturally nothing is scraped; creating one fixes it.
The Service has to match the pod's labels, so inspect the pod:

kubectl describe pod kube-scheduler-kube-master -n kube-system

Name:                 kube-scheduler-kube-master
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 kube-master/10.0.1.50
Start Time:           Sun, 18 Aug 2019 21:43:46 -0400
Labels:               component=kube-scheduler
                      tier=control-plane

The labels include component=kube-scheduler, so the Service selector can be built on that.
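To double-check that the selector will match before writing the Service:

# Should return the kube-scheduler static pod(s)
kubectl get pods -n kube-system -l component=kube-scheduler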

vi kube-scheduler-svc.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
    
# apply
kubectl apply -f kube-scheduler-svc.yaml

Port 10251 serves the kube-scheduler metrics; port 10252 serves the kube-controller-manager metrics.
Shortly after the Service is created, the kube-scheduler target becomes visible in Prometheus.
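You can also confirm that the Service picked up the pod by checking its Endpoints, which should list the master's IP on port 10251:

# Expect <master-ip>:10251 in the ENDPOINTS column
kubectl get endpoints kube-scheduler -n kube-system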

4.2 kube-controller-manager

Same procedure as before.

cat /kube/operator/kube-prometheus/manifests/prometheus-serviceMonitorKubeControllerManager.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    metricRelabelings:
    - action: drop
      regex: etcd_(debugging|disk|request|server).*
      sourceLabels:
      - __name__
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system # match Services in the kube-system namespace
  selector:
    matchLabels:
      k8s-app: kube-controller-manager # match Services carrying this label

Check the pod's labels:

kubectl describe pod kube-controller-manager-kube-master -n kube-system
Name:                 kube-controller-manager-kube-master
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 kube-master/10.0.1.50
Start Time:           Sun, 18 Aug 2019 21:43:46 -0400
Labels:               component=kube-controller-manager
                      tier=control-plane

Create the Service:

vi kube-controller-manager-svc.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
    
# apply
kubectl apply -f kube-controller-manager-svc.yaml

4.3 A possible kubeadm-related error

If the targets still report errors after a refresh, the cause may be a kubeadm deployment: these components bind to 127.0.0.1 by default, so scrape requests are refused. Changing the bind address to 0.0.0.0 fixes it.
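You can confirm the symptom from another machine by curling the metrics port on the master (10.0.1.50 is the master's address from the pod output above); while the component binds to 127.0.0.1 the connection is refused:

# Connection refused until the bind address is changed
curl http://10.0.1.50:10251/metrics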

# Enter the static pod manifests directory
cd /etc/kubernetes/manifests/
# vi kube-scheduler.yaml

spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=127.0.0.1 # change to 0.0.0.0
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true

# vi kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1 # change to 0.0.0.0

After editing, move the files out of the directory and back in after a moment; the kubelet then recreates the static pods with the new configuration automatically.
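A minimal sketch of that move-out/move-in trick (the kubelet tears down and recreates the static pods as the manifests disappear and reappear):

# Trigger recreation of the static pods
cd /etc/kubernetes/manifests
mv kube-scheduler.yaml kube-controller-manager.yaml /tmp/
sleep 10
mv /tmp/kube-scheduler.yaml /tmp/kube-controller-manager.yaml .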

5. Viewing in Grafana

Visit grafana.s.com to view the dashboards; the data source and dashboards are imported by default, so they are viewable immediately.
[Image: grafana.png]
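If Ingress is not yet configured, kubectl port-forward is a quick alternative for local access:

# Forward local port 3000 to the grafana Service, then open http://localhost:3000
kubectl -n monitoring port-forward svc/grafana 3000:3000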


6. Persistent storage

This step can be done either before initialization or after a successful deployment.
Create a StorageClass; a persistent storage backend must already be in place. We have NFS set up, see: Using NFS for persistent storage in Kubernetes - SPEX
In a real production environment, an SSD-backed cluster is preferable to NFS.
Then create the StorageClass:

# Create the StorageClass manifest
vi prometheus-data-sc.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-data-db
provisioner: fuseim.pri/ifs

# apply
kubectl apply -f prometheus-data-sc.yaml
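Verify that the StorageClass was registered:

# prometheus-data-db should appear in the list
kubectl get storageclass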

Then add a volume claim template to the Prometheus custom resource; once applied, the Prometheus Operator automatically attaches PVCs to the StatefulSet it manages.

# Edit the Prometheus resource, adding the storage section
vi /kube/operator/kube-prometheus/manifests/prometheus-prometheus.yaml 

...
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-data-db
        resources:
          requests:
            storage: 10Gi
            
# apply
kubectl apply -f prometheus-prometheus.yaml
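After the Operator rolls the StatefulSet, each prometheus-k8s replica should have a Bound PVC using the new StorageClass:

# Expect one Bound PVC per prometheus-k8s replica
kubectl get pvc -n monitoring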

Thanks

  1. Storage
  2. Prometheus Operator 高级配置 (qikqiak.com, 阳明的博客)