There are many choices for a logging system; they broadly fall into the ES family and the Grafana family.

The ES family offers several combinations:

  • Filebeat → ES → Kibana
  • Filebeat → Kafka → Logstash → ES → Kibana
  • Fluentd → Kafka → Logstash → ES → Kibana

The Grafana family is still at a fairly early stage; the main option is:

  • Loki → Grafana

For the ES family, adding Kafka decouples log shipping from ES: uploads become asynchronous, so an ES outage no longer causes log delivery to fail, and Logstash can re-publish records into different indices, which helps with log collection, display, and search. Here we choose Fluentd → Kafka → Logstash → ES → Kibana as our solution.

0.Terminology

  • ECK

    • Elastic Cloud on Kubernetes
    • Latest version: 1.2
    • Purpose: an Operator for the Elastic stack, built on Kubernetes CRDs
    • Deployable components:

      • ES
      • Kibana
      • APM
      • Enterprise Search
      • Beats
  • Strimzi

    • An Operator for deploying Apache Kafka clusters, built on Kubernetes CRDs
    • CNCF Sandbox Project
  • Fluentd

    • A lightweight log shipping tool
    • CNCF Graduated Project

1.Apply ECK CRD

The latest version at the time of writing is 1.2; compared with the previous release it adds Beats support, among other things.
Applying it automatically creates the elastic-system namespace and starts the elastic-operator service.

# apply crd
% kubectl apply -f https://download.elastic.co/downloads/eck/1.2.0/all-in-one.yaml
customresourcedefinition.apiextensions.k8s.io/apmservers.apm.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/beats.beat.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/elasticsearches.elasticsearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/enterprisesearches.enterprisesearch.k8s.elastic.co created
customresourcedefinition.apiextensions.k8s.io/kibanas.kibana.k8s.elastic.co created
namespace/elastic-system created
serviceaccount/elastic-operator created
secret/elastic-webhook-server-cert created
clusterrole.rbac.authorization.k8s.io/elastic-operator created
clusterrole.rbac.authorization.k8s.io/elastic-operator-view created
clusterrole.rbac.authorization.k8s.io/elastic-operator-edit created
clusterrolebinding.rbac.authorization.k8s.io/elastic-operator created
rolebinding.rbac.authorization.k8s.io/elastic-operator created
service/elastic-webhook-server created
statefulset.apps/elastic-operator created
validatingwebhookconfiguration.admissionregistration.k8s.io/elastic-webhook.k8s.elastic.co created
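
Before moving on you can confirm the operator is up; a quick sanity check (exact output omitted here) is:

# verify the operator pod is Running
% kubectl get pods -n elastic-system
# follow the operator logs if anything looks off
% kubectl logs -n elastic-system statefulset/elastic-operator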

2.Deploy ES

Create a namespace; all of the services will be deployed into it.

# create namespace
% kubectl create ns eck

Prepare the deployment file: log-es-no-auth.yaml

  • 3 master nodes, 50G each
  • 3 data nodes, 500G each
  • storageClassName: gp2 is the default EBS storage class on AWS EKS; if you are not on EKS, change it to the StorageClass name used by your cluster
  • The secret elastic-es-elastic-user is generated automatically; we do not set a custom username/password, so ES generates them and stores them in this secret
  • The secret elastic-es-http-certs-internal is also generated automatically; any client not created through ECK needs a certificate to connect to ES. ES creates the HTTP certificates here, and we will need them later when Logstash consumes from Kafka and pushes into ES
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic
spec:
  version: 7.8.0
  http:
    service:
      spec:
        type: ClusterIP
  nodeSets:
  - name: masters
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
      node.ml: false
      xpack.ml.enabled: true
      cluster.remote.connect: false
    podTemplate:
      spec:
        initContainers:
          - name: set-max-map-count
            command:
              - sh
              - -c
              - sysctl -w vm.max_map_count=262144
            securityContext:
              privileged: true
    volumeClaimTemplates:
      - metadata:
          name: elasticsearch-master-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
          storageClassName: gp2
  - name: data
    count: 3
    config:
      node.master: false
      node.data: true
      node.ingest: true
      node.ml: true
      cluster.remote.connect: false
    podTemplate:
      spec:
        initContainers:
          - name: set-max-map-count
            command:
              - sh
              - -c
              - sysctl -w vm.max_map_count=262144
            securityContext:
              privileged: true
    volumeClaimTemplates:
      - metadata:
          name: elasticsearch-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Gi
          storageClassName: gp2

Deploy to the cluster:

% kubectl apply -f log-es-no-auth.yaml -n eck

Confirm the created resources:

% kubectl get secret -n eck
NAME                                TYPE                                  DATA   AGE
default-token-bvwfj                 kubernetes.io/service-account-token   3      18d
elastic-es-data-es-config           Opaque                                1      6m20s
elastic-es-elastic-user             Opaque                                1      6m21s
elastic-es-http-ca-internal         Opaque                                2      6m21s
elastic-es-http-certs-internal      Opaque                                3      6m21s
elastic-es-http-certs-public        Opaque                                2      6m21s
elastic-es-internal-users           Opaque                                2      6m21s
elastic-es-masters-es-config        Opaque                                1      6m20s
elastic-es-remote-ca                Opaque                                1      6m22s
elastic-es-transport-ca-internal    Opaque                                2      6m21s
elastic-es-transport-certificates   Opaque                                13     6m21s
elastic-es-transport-certs-public   Opaque                                1      6m21s
elastic-es-xpack-file-realm         Opaque                                3      6m20s
% kubectl get pvc -n eck
NAME                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
elasticsearch-data-elastic-es-data-0             Bound    pvc-7eb061e8-55fb-44c1-ab02-5f9cdc4bae94   500Gi      RWO            gp2            6m43s
elasticsearch-data-elastic-es-data-1             Bound    pvc-0e08dd11-428f-447c-988d-feada4c849ed   500Gi      RWO            gp2            6m43s
elasticsearch-data-elastic-es-data-2             Bound    pvc-7424dd15-817c-4f73-83d3-1447c09c828d   500Gi      RWO            gp2            6m43s
elasticsearch-data-elastic-es-masters-0          Bound    pvc-502dcafe-1903-44eb-9935-50e9a527bcc9   1Gi        RWO            gp2            6m43s
elasticsearch-data-elastic-es-masters-1          Bound    pvc-49439211-a1f8-474e-b7f8-07f61b8e1a7d   1Gi        RWO            gp2            6m43s
elasticsearch-data-elastic-es-masters-2          Bound    pvc-b01cc6ea-c599-4baa-91f6-e48a753a7c2a   1Gi        RWO            gp2            6m43s
elasticsearch-master-data-elastic-es-masters-0   Bound    pvc-4f2bb0bd-ba64-44b6-b4aa-3c37fd9b7f74   50Gi       RWO            gp2            6m43s
elasticsearch-master-data-elastic-es-masters-1   Bound    pvc-78fe0fa0-0f63-423b-88f6-cd8e4760c76c   50Gi       RWO            gp2            6m43s
elasticsearch-master-data-elastic-es-masters-2   Bound    pvc-d1eac150-d0ba-44c0-af20-696eb68d4499   50Gi       RWO            gp2            6m43s
% kubectl get cm -n eck
NAME                       DATA   AGE
elastic-es-scripts         3      7m7s
elastic-es-unicast-hosts   1      7m5s
istio-ca-root-cert         1      18d
% kubectl get po -n eck
NAME                   READY   STATUS    RESTARTS   AGE
elastic-es-data-0      1/1     Running   0          7m25s
elastic-es-data-1      1/1     Running   0          7m25s
elastic-es-data-2      1/1     Running   0          7m25s
elastic-es-masters-0   1/1     Running   0          7m25s
elastic-es-masters-1   1/1     Running   0          7m25s
elastic-es-masters-2   1/1     Running   0          7m25s
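
You can also check the health of the cluster itself. ECK reports health on the Elasticsearch resource, and you can query ES directly through a port-forward; the go-template below decodes the auto-generated password for the elastic user:

# ECK reports HEALTH and PHASE on the Elasticsearch resource
% kubectl get elasticsearch -n eck
# or query the cluster health API directly (run the port-forward in another terminal)
% PASSWORD=$(kubectl get secret elastic-es-elastic-user -n eck -o go-template='{{.data.elastic | base64decode}}')
% kubectl port-forward service/elastic-es-http 9200 -n eck
% curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cluster/health?pretty"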

3.Deploy Kibana

Next we create Kibana.

Prepare the deployment file: kibana.yaml

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.8.0
  count: 1
  elasticsearchRef:
    name: elastic
  http:
    tls:
      selfSignedCertificate:
        disabled: true

Deploy to the cluster:

% kubectl apply -f kibana.yaml -n eck
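
Like Elasticsearch, the Kibana resource reports its own health once the association with ES is established:

# wait for HEALTH to become green
% kubectl get kibana -n eck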

Next, create an Ingress for access; exposing the service via NodePort also works.

File: es-kibana.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: es-kibana
  namespace: eck
spec:
  rules:
  - host: log-dev.spex.top
    http:
      paths:
      - backend:
          serviceName: kibana-kb-http
          servicePort: 5601

Deploy it to the cluster and Kibana becomes accessible. The username and password come from the automatically generated secret elastic-es-elastic-user, which can be read as shown below.

% kubectl apply -f es-kibana.yaml
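
The password can be decoded directly from that secret:

# read the auto-generated password for the elastic user (the login username is elastic)
% kubectl get secret elastic-es-elastic-user -n eck -o go-template='{{.data.elastic | base64decode}}'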

4.Deploy APM

APM is deployed in much the same way as Kibana.

File: apm.yaml

apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server
spec:
  version: 7.8.0
  count: 1
  elasticsearchRef:
    name: elastic
  http:
    tls:
      selfSignedCertificate:
        disabled: true

Deploy:

% kubectl apply -f apm.yaml -n eck

After APM is deployed, the secret apm-server-apm-token is generated automatically; it contains the token required to connect to APM. Any service that reports data to APM must be given this token.
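
The token can be decoded the same way as the ES password; the key inside the ECK-generated secret is secret-token:

# read the APM secret token to hand to your agents
% kubectl get secret apm-server-apm-token -n eck -o go-template='{{index .data "secret-token" | base64decode}}'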

Once APM is deployed, the following ES indices are created automatically:

(Screenshot: APM indices)

5.Deploy Kafka

We deploy Kafka with Strimzi; see https://spex.top/archives/strimzi-deploy-kafka-cluster-in-kubernetes.html for the full walkthrough.
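
For reference, a minimal persistent Kafka cluster under Strimzi looks roughly like the sketch below. This assumes the Strimzi operator is installed and watching a kafka namespace, and uses the v1beta1 API that was current at the time; the cluster name log-kafka is what yields the log-kafka-kafka-bootstrap.kafka:9092 bootstrap address consumed by Logstash later.

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: log-kafka
  namespace: kafka
spec:
  kafka:
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}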

6.Deploy Fluentd

For Fluentd we use the version already integrated into Rancher; you only need to set the Kafka address and topic. Reference: https://spex.top/archives/strimzi-deploy-kafka-cluster-in-kubernetes.html
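
If you are not on Rancher, the same settings in a hand-written Fluentd config would live in an output block roughly like the sketch below, using the fluent-plugin-kafka kafka2 output; the broker address and the eks topic match what Logstash consumes in the next step.

<match kubernetes.**>
  @type kafka2
  # Strimzi bootstrap service, in svc.namespace:port form
  brokers log-kafka-kafka-bootstrap.kafka:9092
  default_topic eks
  <format>
    @type json
  </format>
  <buffer topic>
    flush_interval 10s
  </buffer>
</match>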

From here on we assume steps 5 and 6 are complete and the Kafka topic already contains our log data.

7.Deploy Logstash

Logstash is not part of ECK, so we deploy it with Helm.

Helm chart: https://github.com/elastic/helm-charts/tree/master/logstash

We need to customize the configuration. The customized values.yaml is saved as log-logstash-value.yaml; it is fairly long, so let's go through it block by block:

  • logstashConfig.logstash.yml, the Logstash settings file

    • https://elastic-es-http:9200 is the Service address of the ES cluster we just created
    • /usr/share/logstash/config/certs/tls.crt is the path where the certificate is mounted inside the Pod; the mount directory is defined in secretMounts below.
  • logstashPipeline.logstash.conf, the main pipeline config

    • log-kafka-kafka-bootstrap.kafka is the address of the Kafka cluster we deployed earlier, in svc.namespace form
    • remove_field drops the fields/tags we do not need
    • output.hosts, same as above
    • cacert, same as above
    • user & password are read from environment variables, which are set in extraEnvs below
    • index reads the Kubernetes pod_name and automatically generates a separate index per pod
  • extraEnvs

    • elastic-es-elastic-user is the secret ES created automatically; its content is a key:value pair where the key is the user elastic, so we hard-code the user as elastic and take the password from the value of the elastic key.
  • secretMounts

    • elastic-es-http-certs-internal is the secret ES created automatically; it contains three files. To keep things simple we mount all of them into the Pod:

      • ca.crt
      • tls.crt
      • tls.key

      Only tls.crt is actually needed here.

  • Everything else is left at the chart's default values.yaml settings.
---
replicas: 1

# Allows you to add any config files in /usr/share/logstash/config/
# such as logstash.yml and log4j2.properties
#
# Note that when overriding logstash.yml, `http.host: 0.0.0.0` should always be included
# to make default probes work.
logstashConfig:
#  logstash.yml: |
#    key:
#      nestedkey: value
#  log4j2.properties: |
#    key = value
  logstash.yml: |
    http.host: 0.0.0.0
    xpack.monitoring.enabled: true
    xpack.monitoring.elasticsearch.username: '${ELASTICSEARCH_USERNAME}'
    xpack.monitoring.elasticsearch.password: '${ELASTICSEARCH_PASSWORD}'
    xpack.monitoring.elasticsearch.hosts: ["https://elastic-es-http:9200"]
    xpack.monitoring.elasticsearch.ssl.certificate_authority: /usr/share/logstash/config/certs/tls.crt

# Allows you to add any pipeline files in /usr/share/logstash/pipeline/
### ***warn*** there is a hardcoded logstash.conf in the image, override it first
logstashPipeline:
#  logstash.conf: |
#    input {
#      exec {
#        command => "uptime"
#        interval => 30
#      }
#    }
#    output { stdout { } }
  logstash.conf: |
    input {
      kafka {
        bootstrap_servers => "log-kafka-kafka-bootstrap.kafka:9092"
        client_id => "logstash"
        topics => ["eks"]
        group_id => "logstash"
        decorate_events => true
        codec => "json"
      }
    }

    filter {
      mutate {
        remove_field => ["docker", "log_type", "tag", "time", "[kubernetes][namespace_id]","[kubernetes][pod_id]","[kubernetes][master_url]", "[kubernetes][labels]","[kubernetes][container_image]","[kubernetes][host]","[kubernetes][container_image_id]","[kubernetes][namespace_labels]"]
      }
    }
    output {
      elasticsearch {
        hosts => ["https://elastic-es-http:9200"]
        cacert => "/usr/share/logstash/config/certs/tls.crt"
        user => '${ELASTICSEARCH_USERNAME}'
        password => '${ELASTICSEARCH_PASSWORD}'
        index => "%{[kubernetes][pod_name]}-%{+YYYY.MM.dd}"
      }
    }
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs:
#  - name: MY_ENVIRONMENT_VAR
#    value: the_value_goes_here
  - name: 'ELASTICSEARCH_USERNAME'
    value: elastic
  - name: 'ELASTICSEARCH_PASSWORD'
    valueFrom:
      secretKeyRef:
        name: elastic-es-elastic-user
        key: elastic

# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
#     name: env-secret
# - configMapRef:
#     name: config-map

# Add sensitive data to k8s secrets
secrets: []
#  - name: "env"
#    value:
#      ELASTICSEARCH_PASSWORD: "LS1CRUdJTiBgUFJJVkFURSB"
#      api_key: ui2CsdUadTiBasRJRkl9tvNnw
#  - name: "tls"
#    value:
#      ca.crt: |
#        LS0tLS1CRUdJT0K
#        LS0tLS1CRUdJT0K
#        LS0tLS1CRUdJT0K
#        LS0tLS1CRUdJT0K
#      cert.crt: "LS0tLS1CRUdJTiBlRJRklDQVRFLS0tLS0K"
#      cert.key.filepath: "secrets.crt" # The path to file should be relative to the `values.yaml` file.

# A list of secrets and their paths to mount inside the pod
secretMounts:
  - name: elastic-certificate-crt
    secretName: elastic-es-http-certs-internal
    path: /usr/share/logstash/config/certs

image: "docker.elastic.co/logstash/logstash"
imageTag: "8.0.0-SNAPSHOT"
imagePullPolicy: "IfNotPresent"
imagePullSecrets: []

podAnnotations: {}

# additionals labels
labels: {}

logstashJavaOpts: "-Xmx1g -Xms1g"

resources:
  requests:
    cpu: "100m"
    memory: "1536Mi"
  limits:
    cpu: "1000m"
    memory: "1536Mi"

volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi

rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim

persistence:
  enabled: false
  annotations: {}

extraVolumes: ""
  # - name: extras
  #   emptyDir: {}

extraVolumeMounts: ""
  # - name: extras
  #   mountPath: /usr/share/extras
  #   readOnly: true

extraContainers: ""
  # - name: do-something
  #   image: busybox
  #   command: ['do', 'something']

extraInitContainers: ""
  # - name: do-something
  #   image: busybox
  #   command: ['do', 'something']

# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""

# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"

# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "hard"

# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}

# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"

httpPort: 9600

# Custom ports to add to logstash
extraPorts: []
  # - name: beats
  #   containerPort: 5001

updateStrategy: RollingUpdate

# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1

podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
    - ALL
  # readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

# How long to wait for logstash to stop gracefully
terminationGracePeriod: 120

# Probes
# Default probes are using `httpGet` which requires that `http.host: 0.0.0.0` is part of
# `logstash.yml`. If needed probes can be disabled or overrided using the following syntaxes:
#
# disable livenessProbe
# livenessProbe: null
#
# replace httpGet default readinessProbe by some exec probe
# readinessProbe:
#   httpGet: null
#   exec:
#     command:
#       - curl
#      - localhost:9600

livenessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 300
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 3

## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""

nodeSelector: {}
tolerations: []

nameOverride: ""
fullnameOverride: ""

lifecycle: {}
  # preStop:
  #   exec:
  #     command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
  # postStart:
  #   exec:
  #     command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]

service: {}
#  annotations: {}
#  type: ClusterIP
#  ports:
#    - name: beats
#      port: 5044
#      protocol: TCP
#      targetPort: 5044
#    - name: http
#      port: 8080
#      protocol: TCP
#      targetPort: 8080

Deploy to the cluster:

# use the online repo; a local git clone works too
% helm repo add elastic https://helm.elastic.co

# deploy into the eck namespace
% helm install log-logstash elastic/logstash -f log-logstash-value.yaml -n eck
NAME: log-logstash
LAST DEPLOYED: Wed Aug  5 12:12:13 2020
NAMESPACE: eck
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=eck -l app=log-logstash-logstash -w

After deployment, a batch of indices is generated, following the index pattern we just defined.

(Screenshot: Logstash indices)
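
They can also be listed from the command line, reusing the port-forward and the decoded password from the Elasticsearch section:

# list the per-pod indices created by the Logstash pipeline
% curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/indices?v"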

That wraps up the deployment; what remains is day-to-day use of the ELK stack as a whole.
