  • Monitoring k8s with Prometheus


    k8s monitoring options

    cadvisor + heapster + influxdb + grafana

    Drawback: only supports monitoring of container resources, cannot cover business-level monitoring, and does not scale well.

    cadvisor/exporter + prometheus + grafana

    Overall pipeline: data collection -> aggregation -> processing -> storage -> presentation

    • Monitoring targets
      • Containers: Prometheus scrapes container metrics from cAdvisor, which is built into the k8s kubelet; Prometheus stores the data and Grafana visualizes it.
      • Nodes: node_exporter collects the host's resource metrics; Prometheus stores the data and Grafana visualizes it.
      • Master: the kube-state-metrics add-on pulls apiserver-related data from k8s; Prometheus stores the data and Grafana visualizes it.

    Kubernetes monitoring metrics

    Monitoring Kubernetes itself

    1. Node resource utilization - CPU, memory, disk and network connections on each node
    2. Node count - the ratio of node count to resource utilization and business load, for cost and capacity-expansion planning
    3. Pod count - as load grows, the node and pod counts show which stage the load has reached, roughly how many servers are needed, and how much each pod consumes, for an overall assessment
    4. Resource object state - while running, k8s creates many pods, controllers and jobs, all maintained as k8s resource objects; these objects need to be monitored to obtain their state

    Pod monitoring

    1. Number of pods per project - how many pods are healthy and how many are not
    2. Container resource utilization - utilization of each pod and of the containers inside it, evaluated for CPU, network and memory
    3. Application metrics - the application's own state, such as concurrency, request latency, number of users, number of orders, etc.

    Implementation approach

    Metric                 Implemented by       Example
    Pod performance        cAdvisor             container CPU and memory utilization
    Node performance       node-exporter        node CPU and memory utilization
    k8s resource objects   kube-state-metrics   pod / deployment / service
    
    Service discovery
    Scrape targets are discovered from the Kubernetes API and are always kept in sync with the cluster state;
    targets are obtained dynamically, and their current existence is checked against the API in real time.
    
    Official documentation:
    https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
    

    Roles supported by auto-discovery (a minimal scrape-config sketch follows this list):

    • node - automatically discovers the nodes in the cluster
    • pod - automatically discovers running containers and their ports
    • service - automatically discovers created service IPs and ports
    • endpoints - automatically discovers the containers behind a service
    • ingress - automatically discovers created access entry points and rules
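
    Below is a minimal sketch of what such a scrape job looks like, using the node role; the job name and the labelmap rule are illustrative and mirror the full configurations shown later on this page:

    scrape_configs:
    - job_name: 'kubernetes-nodes'          # illustrative job name
      kubernetes_sd_configs:
      - role: node                          # discover every node via the Kubernetes API
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)   # copy node labels onto the scraped series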

    Monitoring k8s with Prometheus

    Deploying Prometheus in k8s

    Official deployment manifests: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/prometheus

    Creating the Prometheus PV/PVC

    # Install dependencies
    yum -y install nfs-utils rpcbind
    
    # Enable at boot
    systemctl enable rpcbind.service 
    systemctl enable nfs-server.service
    systemctl start rpcbind.service     # listens on port 111
    systemctl start nfs-server.service  # listens on port 2049
    
    # Create the shared directory /data/pvdata
    # mkdir /data/pvdata
    # chown nfsnobody:nfsnobody /data/pvdata
    # cat /etc/exports
    /data/pvdata 172.22.22.0/24(rw,async,all_squash)
    # exportfs -rv
    exporting 172.22.22.0/24:/data/pvdata
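
    Before creating the PV it is worth confirming that the export is visible from the k8s nodes; a quick check (192.168.1.155 is the NFS server used in the PV definitions below):

    # List the exports offered by the NFS server
    showmount -e 192.168.1.155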
    

    Download the Prometheus YAML manifests

    mkdir /data/k8s/yaml/kube-system/prometheus
    cd /data/k8s/yaml/kube-system/prometheus/
    
    # Download the YAML manifests from GitHub
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/prometheus-rbac.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/prometheus-configmap.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/prometheus-service.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/prometheus-statefulset.yaml
    

    Modify prometheus-statefulset.yaml

    # Delete the 10 lines at the bottom of the file (the volumeClaimTemplates block):
      volumeClaimTemplates:
      - metadata:
          name: prometheus-data
        spec:
          storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: "16Gi"
    
    # Add the following 3 lines instead (a volume that references the PVC created below):
            - name: prometheus-data
              persistentVolumeClaim:
                claimName: prometheus-data
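
    For orientation, the three added lines live under spec.template.spec.volumes of the StatefulSet, next to the existing config volume; a rough sketch of the resulting section (other fields of the upstream manifest are omitted, and the existing volume name may differ in your copy):

    spec:
      template:
        spec:
          volumes:
          - name: config-volume            # existing entry from the addon manifest
            configMap:
              name: prometheus-config
          - name: prometheus-data          # new entry pointing at the PVC created below
            persistentVolumeClaim:
              claimName: prometheus-data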
    

    Create the PV/PVC YAML file

    mkdir /data/pvdata/prometheus
    chown nfsnobody. /data/pvdata/prometheus
    
    cat > prometheus-pvc-data.yaml << EFO
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-data
    spec:
      storageClassName: prometheus-data
      capacity: 
        storage: 10Gi  
      accessModes: 
        - ReadWriteOnce  
      persistentVolumeReclaimPolicy: Recycle 
      nfs:
        path: /data/pvdata/prometheus
        server: 192.168.1.155
    
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-data
      namespace: kube-system  
    spec:
      accessModes:
        - ReadWriteOnce 
      resources:
        requests:
          storage: 10Gi 
      storageClassName: prometheus-data
    EFO
    

    Create the prometheus-ingress.yaml file

    This mainly makes it easy for an external Grafana (or a browser) to reach Prometheus.

    cat > prometheus-ingress.yaml << EFO
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: prometheus-ingress
      namespace: kube-system
    spec:
      rules:
      - host: prometheus.baiyongjie.com
        http:
          paths:
          - backend:
              serviceName: prometheus
              servicePort: 9090
    EFO
    

    Apply the YAML files

    # Deployment order
    1. prometheus-rbac.yaml        - grants Prometheus access to the kube-apiserver
    2. prometheus-configmap.yaml   - manages the main Prometheus configuration
    3. prometheus-service.yaml     - exposes Prometheus so it can be reached
    4. prometheus-pvc-data.yaml    - provides persistent storage for the pod
    5. prometheus-statefulset.yaml - deploys Prometheus as a StatefulSet
    6. prometheus-ingress.yaml     - exposes the service externally
    
    # Apply the YAML files
    kubectl apply -f prometheus-rbac.yaml 
    kubectl apply -f prometheus-configmap.yaml 
    kubectl apply  -f prometheus-ingress.yaml
    kubectl apply -f prometheus-pvc-data.yaml
    kubectl apply -f prometheus-service.yaml
    kubectl apply -f prometheus-statefulset.yaml
    
    # Check the deployment
    [root@master prometheus]# kubectl get pv
    NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS      REASON   AGE
    prometheus-data   10Gi       RWO            Recycle          Bound    kube-system/prometheus-data   prometheus-data            32m
    
    [root@master prometheus]# kubectl get pvc -n kube-system 
    NAME              STATUS   VOLUME            CAPACITY   ACCESS MODES   STORAGECLASS      AGE
    prometheus-data   Bound    prometheus-data   10Gi       RWO            prometheus-data   33m
    
    [root@master prometheus]# kubectl get service -n kube-system 
    NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    kube-dns     ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP    12d
    prometheus   NodePort    10.107.69.131   <none>        9090/TCP   57m
    
    [root@master prometheus]# kubectl get statefulsets.apps  -n kube-system 
    NAME         READY   AGE
    prometheus   1/1     15m
    
    [root@master prometheus]# kubectl get ingresses.extensions -n kube-system 
    NAME                 HOSTS                       ADDRESS   PORTS   AGE
    prometheus-ingress   prometheus.baiyongjie.com             80      7m3s
    
    [root@master prometheus]# kubectl get pods -n kube-system  -o wide |grep prometheus
    NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
    prometheus-0                     2/2     Running   0          42s   10.244.1.6      node01   <none>           <none>
    

    Access the Ingress

    # Edit the local hosts file and add an entry for the ingress host
    192.168.1.156 prometheus.baiyongjie.com
    
    Then open http://prometheus.baiyongjie.com/graph
    


    Deploy node-exporter

    Download the YAML files

    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/node-exporter-ds.yml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/node-exporter-service.yaml
    

    Since the data we want are host-level metrics while node-exporter itself runs in a container, the Pod needs a few extra security settings: hostPID: true, hostIPC: true and hostNetwork: true, which give it the host's PID namespace, IPC namespace and network. These namespaces are the key isolation mechanism for containers; note that this kind of namespace is a completely different concept from a Kubernetes namespace.

    We also mount the host's /dev, /proc and /sys directories into the container, because most node metrics are read from files under them. For example, top reads CPU usage from /proc/stat, and free reads memory usage from /proc/meminfo.

    In addition, because this cluster was built with kubeadm, the master node will only be monitored if the corresponding toleration is added.

    # Modify node-exporter-ds.yml
    # Add the following under the pod template spec:
        spec:
          hostPID: true 
          hostIPC: true
          hostNetwork: true
          
          tolerations:
          - key: "node-role.kubernetes.io/master"
            operator: "Exists"
            effect: "NoSchedule"
            
          volumes:
            - name: proc
              hostPath:
                path: /proc
            - name: dev
              hostPath:
                path: /dev
            - name: sys
              hostPath:
                path: /sys
            - name: rootfs
              hostPath:
                path: /
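
    Because the DaemonSet uses the host network, each node should now serve metrics on port 9100; a quick check from any machine that can reach a node (the IP is illustrative, use one of your node IPs):

    # node-exporter serves metrics on the host network, port 9100
    curl -s http://192.168.1.156:9100/metrics | head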
    

    Apply the YAML files

    kubectl apply -f node-exporter-service.yaml
    kubectl apply -f node-exporter-ds.yml 
    
    # Check the deployment
    [root@master prometheus]# kubectl get pods -n kube-system |grep node-export 
    node-exporter-lb7gb              1/1     Running   0          4m59s
    node-exporter-q22zn              1/1     Running   0          4m59s
    
    [root@master prometheus]# kubectl get service -n kube-system |grep node-export      
    node-exporter   ClusterIP   None            <none>        9100/TCP        5m49s
    

    Check whether Prometheus is receiving the data

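    In the Prometheus UI (Status -> Targets, or the expression browser) you can confirm that node metrics are arriving; a couple of illustrative queries (exact metric names depend on the node-exporter version):

    up
    node_load1
    node_memory_MemTotal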

    Deploy kube-state-metrics

    Download the YAML files

    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/kube-state-metrics-service.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/kube-state-metrics-rbac.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/kube-state-metrics-deployment.yaml
    

    Apply the YAML files

    kubectl apply -f kube-state-metrics-service.yaml
    kubectl apply -f kube-state-metrics-rbac.yaml
    kubectl apply -f kube-state-metrics-deployment.yaml
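    
    A quick way to confirm kube-state-metrics is running and being scraped (the pod name will differ; the kube_* series are standard kube-state-metrics metrics):
    
    kubectl get pods -n kube-system | grep kube-state-metrics
    # then, in the Prometheus expression browser, query e.g.:
    #   kube_pod_status_phase
    #   kube_deployment_status_replicas_available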
    

    Deploy Grafana

    Create the YAML files

    grafana-pvc.yaml

    mkdir /data/pvdata/prometheus-grafana
    chown nfsnobody. /data/pvdata/prometheus-grafana
    
    cat > grafana-pvc.yaml << EFO
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-grafana
    spec:
      storageClassName: prometheus-grafana
      capacity: 
        storage: 2Gi  
      accessModes: 
        - ReadWriteOnce  
      persistentVolumeReclaimPolicy: Recycle 
      nfs:
        path: /data/pvdata/prometheus-grafana
        server: 192.168.1.155
    
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-grafana
      namespace: kube-system  
    spec:
      accessModes:
        - ReadWriteOnce 
      resources:
        requests:
          storage: 2Gi 
      storageClassName: prometheus-grafana
    EFO
    

    grafana-ingress.yaml

    cat > grafana-ingress.yaml << EFO
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
       name: grafana
       namespace: kube-system
    spec:
       rules:
       - host: grafana.baiyongjie.com
         http:
           paths:
           - path: /
             backend:
              serviceName: grafana
              servicePort: 3000
    EFO          
    

    grafana-deployment.yaml

    # cat > grafana-deployment.yaml << EFO
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: grafana
      namespace: kube-system
      labels:
        app: grafana
    spec:
      revisionHistoryLimit: 10
      template:
        metadata:
          labels:
            app: grafana
            component: prometheus
        spec:
          containers:
          - name: grafana
            env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: admin
            image: grafana/grafana:5.3.0
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 3000
              name: grafana
            readinessProbe:
              failureThreshold: 10
              httpGet:
                path: /api/health
                port: 3000
                scheme: HTTP
              initialDelaySeconds: 30
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 30
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /api/health
                port: 3000
                scheme: HTTP
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 1
            resources:
              limits:
                cpu: 100m
                memory: 256Mi
              requests:
                cpu: 100m
                memory: 256Mi
            volumeMounts:
            - mountPath: /var/lib/grafana
              subPath: grafana
              name: grafana-volumes
          volumes:
          - name: grafana-volumes
            persistentVolumeClaim:
              claimName: prometheus-grafana
    EFO          
    

    Apply the YAML files

    kubectl apply -f grafana-pvc.yaml
    kubectl apply -f grafana-ingress.yaml
    kubectl apply -f grafana-deployment.yaml
    
    # Check the deployment
    [root@master prometheus]# kubectl get service -n kube-system |grep grafana
    grafana              ClusterIP   10.105.159.132   <none>        3000/TCP            150m
    
    [root@master prometheus]# kubectl get ingresses.extensions -n kube-system |grep grafana       
    grafana              grafana.baiyongjie.com                80      150m
    
    [root@master prometheus]# kubectl get pods -n kube-system |grep grafana                     
    grafana-6f6d77d98d-wwmbd              1/1     Running   0          53m
    

    Configure Grafana

    Add the ingress hostname to your local hosts file, then open http://grafana.baiyongjie.com

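    After logging in (admin/admin, as set in the Deployment above), add a Prometheus data source. Inside the cluster the Prometheus service DNS name can be used; these values assume the service and namespace from this article (the ingress host works as well):

    # Grafana UI: Configuration -> Data Sources -> Add data source -> Prometheus
    Name:   prometheus
    URL:    http://prometheus.kube-system.svc.cluster.local:9090
    Access: Server (default)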

    • Recommended dashboards to import:
      • 3131 Kubernetes All Nodes
      • 3146 Kubernetes Pods
      • 8685 K8s Cluster Summary
      • 10000 Cluster Monitoring for Kubernetes


    Deploy Alertmanager

    Download the YAML files

    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/alertmanager-pvc.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/alertmanager-service.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/alertmanager-deployment.yaml
    curl -O https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/prometheus/alertmanager-configmap.yaml
    

    Modify the YAML files

    alertmanager-pvc.yaml

    mkdir /data/pvdata/prometheus-alertmanager
    chown nfsnobody. /data/pvdata/prometheus-alertmanager
    
    cat > alertmanager-pvc.yaml  << EFO
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus-alertmanager
    spec:
      storageClassName: prometheus-alertmanager
      capacity: 
        storage: 2Gi  
      accessModes: 
        - ReadWriteOnce  
      persistentVolumeReclaimPolicy: Recycle 
      nfs:
        path: /data/pvdata/prometheus-alertmanager
        server: 192.168.1.155
    
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus-alertmanager
      namespace: kube-system  
    spec:
      accessModes:
        - ReadWriteOnce 
      resources:
        requests:
          storage: 2Gi 
      storageClassName: prometheus-alertmanager
    EFO
    

    alertmanager-deployment.yaml

    # Change the claimName on the last line
            - name: storage-volume
              persistentVolumeClaim:
                claimName: prometheus-alertmanager
    

    Apply the YAML files

    kubectl apply -f alertmanager-pvc.yaml
    kubectl apply -f alertmanager-configmap.yaml
    kubectl apply -f alertmanager-service.yaml
    kubectl apply -f alertmanager-deployment.yaml
    
    # Check the deployment
    [root@master prometheus-ink8s]# kubectl get all -n kube-system  |grep alertmanager
    pod/alertmanager-c564cb9fc-bfrvb          2/2     Running   0          71s
    service/alertmanager         ClusterIP   10.102.208.66   <none>        80/TCP              5m44s
    deployment.apps/alertmanager         1/1     1            1           71s
    replicaset.apps/alertmanager-c564cb9fc          1         1         1       71s
    

    Create alerting rules

    # Edit the prometheus-config ConfigMap
    kubectl edit configmaps prometheus-config -n kube-system 
    
    # Add the following under "prometheus.yml: |"
        alerting:
          alertmanagers:
          - static_configs:
            - targets:
              - alertmanager:80
        rule_files:
        - "/etc/config/rules.yml"
    
    
    # Create the alert rules by appending the following at the bottom of the ConfigMap
      rules.yml: |
        groups:
        - name: example
          rules:
          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              severity: page
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minutes."
          - alert: NodeMemoryUsage
            expr: (sum(node_memory_MemTotal) - sum(node_memory_MemFree+node_memory_Buffers+node_memory_Cached) ) / sum(node_memory_MemTotal) * 100 > 20
            for: 2m
            labels:
              team: node
            annotations:
              summary: "{{$labels.instance}}: High Memory usage detected"
              description: "{{$labels.instance}}: Memory usage is above 20% (current value is: {{ $value }}"
       
    # Reload the configuration
    
    
    # kubectl apply -f prometheus-configmap.yaml
    # kubectl get service -n kube-system |grep prometheus
    prometheus           ClusterIP   10.111.97.89    <none>        9090/TCP            4h42m
    # curl -X POST http://10.111.97.89:9090/-/reload
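    
    After the reload you can check that the rule group was loaded (the ClusterIP is the one shown above; /api/v1/rules is part of the Prometheus HTTP API):
    
    # List the alerting rules Prometheus has loaded
    curl -s http://10.111.97.89:9090/api/v1/rules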
    

    Configure e-mail alerts

    # Modify alertmanager-configmap.yaml
    cat > alertmanager-configmap.yaml  << EFO
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: alertmanager-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      alertmanager.yml: |
        global:
          resolve_timeout: 3m   # timeout after which an alert is marked resolved
          smtp_smarthost: 'smtp.163.com:25'
          smtp_from: 'USERNAME@163.com'
          smtp_auth_username: 'USERNAME@163.com'
          smtp_auth_password: 'PASSWORD'
          smtp_require_tls: false
        
        route:
          group_by: ['example']
          group_wait: 60s
          group_interval: 60s
          repeat_interval: 12h
          receiver: 'mail'
        
        receivers:
        - name: 'mail'
          email_configs:
          - to: 'misterbyj@163.com'
            send_resolved: true
    EFO
    
    kubectl delete configmaps -n kube-system alertmanager-config 
    kubectl apply  -f alertmanager-configmap.yaml 
    

    View the alerts

    ** Open Prometheus and check that the alert rules appear on the Alerts page **

  • Monitoring a k8s cluster with Prometheus

    I. Environment preparation

    1. Environment information

    Node name       IP address
    k8s-master1     192.168.227.131
    k8s-node1       192.168.227.132
    k8s-node2       192.168.227.133

    2. Hardware environment

    Item            Description
    Office PC       winxp10
    Virtualization  VMware® Workstation 15 Pro 15.5.1 build-15018445
    OS              CentOS Linux 7 (Core)
    Linux kernel    CentOS Linux (5.4.123-1.el7.elrepo.x86_64) 7 (Core)
    CPU             at least 2 cores (this version of k8s requires at least 2 cores, otherwise kubeadm init fails)
    Memory          2 GB or more

    II. Installing the cloud-native k8s cluster

    III. Deploying Prometheus

    1. Prepare GlusterFS shared storage

    Note: Prometheus needs shared storage to persist the monitoring data it collects.
    GlusterFS is used here.
    For installing and using GlusterFS on CentOS 7, see: https://www.cnblogs.com/lingfenglian/p/11731849.html
    

    1) Create the brick directories

    • On each of the three GlusterFS nodes, create the directory for the volume brick
    mkdir -p /data/k8s/volprome01
    

    2) Create the volume

    gluster volume create volprome01 replica 3 k8s-master1:/data/k8s/volprome01 k8s-node1:/data/k8s/volprome01 k8s-node2:/data/k8s/volprome01 force
    

    3) Start the volume

    gluster volume start volprome01
    

    4) Check the volume status

    gluster volume status
    

    2. Create the namespace for Prometheus

    kubectl create ns prome-system
    # Note: all Prometheus-related resources will be installed in the prome-system namespace.
    

    3. Install node-exporter

    node-exporter is deployed as a DaemonSet so that it runs on every server in the k8s cluster and exposes node-level monitoring metrics.
    

    The YAML for deploying node-exporter is shown below; adjust the namespace to your environment.
    File name: node-exporter-deploy.yaml

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: prome-system
      labels:
        name: node-exporter
    spec:
      selector:
        matchLabels:
          name: node-exporter
      template:
        metadata:
          labels:
            name: node-exporter
        spec:
          hostPID: true
          hostIPC: true
          hostNetwork: true
          containers:
          - name: node-exporter
            image: prom/node-exporter:v0.16.0
            ports:
            - containerPort: 9100
            resources:
              requests:
                cpu: 0.15
            securityContext:
              privileged: true
            args:
            - --path.procfs
            - /host/proc
            - --path.sysfs
            - /host/sys
            - --collector.filesystem.ignored-mount-points
            - '"^/(sys|proc|dev|host|etc)($|/)"'
            volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
          tolerations:
          - key: "node-role.kubernetes.io/master"
            operator: "Exists"
            effect: "NoSchedule"
          volumes:
            - name: proc
              hostPath:
                path: /proc
            - name: dev
              hostPath:
                path: /dev
            - name: sys
              hostPath:
                path: /sys
            - name: rootfs
              hostPath:
                path: /
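
    This manifest is applied together with the others later (kubectl create -f . in step 7); to apply and check it on its own, something like the following works:

    kubectl apply -f node-exporter-deploy.yaml
    kubectl get ds,pods -n prome-system -o wide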
    
    

    4. Install kube-state-metrics

    kube-state-metrics is an exporter that generates Prometheus metrics about k8s resources. It covers most built-in resource types, such as pods, deployments and services, and it also exposes metrics about itself, mainly the number of resources collected and the number of collection errors.
    

    kube-state-metrics can be installed by following the official documentation,
    link: Kubernetes Deployment.

    [root@k8s-master1 kube-state-metricsinstall]# git clone https://github.com/kubernetes/kube-state-metrics.git
    [root@k8s-master1 kube-state-metricsinstall]# cd kube-state-metrics/examples/standard/
    [root@k8s-master1 standard]# kubectl create -f .
    clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
    clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
    deployment.apps/kube-state-metrics created
    serviceaccount/kube-state-metrics created
    service/kube-state-metrics created
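
    The standard example manifests install kube-state-metrics into the kube-system namespace; a quick check (the pod name will differ):

    kubectl get deploy,svc -n kube-system | grep kube-state-metrics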
    

    5. Install Prometheus

    1) Create the GlusterFS Endpoints

    File name: glusterfs-endpoints-prome.yaml

    Note: adjust the sizes to your own needs; the Prometheus retention time used here is short, so the storage sizes are also small.

    apiVersion: v1
    kind: Endpoints
    metadata:
      name: glusterfs-cluster
      namespace: prome-system
    subsets:
    - addresses:
      - ip: 192.168.227.131
      - ip: 192.168.227.132
      - ip: 192.168.227.133
      ports:
      - port: 49153
        protocol: TCP
    
    

    2) Create the PV/PVC

    File name: prometheus-volume.yaml

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: prometheus
    spec:
      capacity:
        storage: 2Gi
      accessModes:
      - ReadWriteMany
      glusterfs:
        endpoints: glusterfs-cluster
        path: volprome01
        readOnly: false
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prometheus
      namespace: prome-system
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 2Gi
    

    3) Create the RBAC permissions

    File name: prometheus-rbac.yaml
    Note: Prometheus needs the RBAC permissions below in order to access and scrape the k8s resources.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: prome-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - configmaps
      - nodes/metrics
      verbs:
      - get
    - nonResourceURLs:
      - /metrics
      verbs:
      - get
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: prome-system
    

    4) Create the Prometheus ConfigMap

    File name: prometheus-configmap.yaml
    Note: the prometheus.yml configuration file is stored in a ConfigMap and mounted into the pod.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: prome-system
    data:
      prometheus.yml: | 
        global:
          scrape_interval: 15s
          scrape_timeout: 15s
        scrape_configs:
        - job_name: 'prometheus'
          static_configs:
          - targets: ['localhost:9090']
        - job_name: 'kubernetes-nodes'
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
        - job_name: 'kubernetes-kubelet'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
        - job_name: 'container-pod-cadvisor'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        - job_name: 'kubernetes-apiservers'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
    

    5) Create the Prometheus Deployment

    File name: prometheus-deploy.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus
      namespace: prome-system
      labels:
        app: prometheus
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          serviceAccountName: prometheus
          containers:
          - image: prom/prometheus:v2.4.3
            imagePullPolicy: IfNotPresent
            name: prometheus
            command:
            - "/bin/prometheus"
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.retention=2h"
            - "--web.enable-admin-api"  # 控制对admin HTTP API的访问,其中包括删除时间序列等功能
            - "--web.enable-lifecycle"  # 支持热更新,直接执行localhost:9090/-/reload立即生效
            ports:
            - containerPort: 9090
              protocol: TCP
              name: http
            volumeMounts:
            - mountPath: "/prometheus"
              subPath: prometheus
              name: data
            - mountPath: "/etc/prometheus"
              name: config-volume
            resources:
              requests:
                cpu: 100m
                memory: 512Mi
              limits:
                cpu: 100m
                memory: 512Mi
          securityContext:
            runAsUser: 0
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: prometheus
          - configMap:
              name: prometheus-config
            name: config-volume
    

    6) Create the Service for accessing Prometheus

    File name: prometheus-svc.yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      namespace: prome-system
      labels:
        app: prometheus
    spec:
      selector:
        app: prometheus
      type: NodePort
      ports:
        - nodePort: 30001
          name: web
          port: 9090
          targetPort: http
    

    7) Deploy Prometheus

    • Run the following commands to start the deployment.
      Note: put the six files above into the same directory; here they are in prometheus-install.
    cd /root/prometheus-install
    kubectl create -f .
    

    8) Check the deployment result

    • Check that Prometheus and node-exporter are running
    [root@k8s-master1 prometheus-install]# kubectl get pods -n prome-system
    NAME                          READY   STATUS    RESTARTS   AGE
    node-exporter-765w8           1/1     Running   0          127m
    node-exporter-kq9k6           1/1     Running   0          127m
    node-exporter-tpg4t           1/1     Running   0          127m
    prometheus-8446c7bbc7-qtbv2   1/1     Running   0          122m
    [root@k8s-master1 prometheus-install]#
    
    • Open the Prometheus UI
    http://192.168.227.131:30001
    # Note: the IP can be any k8s node's IP; the port is the nodePort set in prometheus-svc.yaml
    

    If this page loads, Prometheus has been deployed successfully.

    IV. How Prometheus scrape jobs map to k8s cluster resources

    In Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five service-discovery roles: Node, Service, Pod, Endpoints and Ingress.

    1) Monitoring the cluster nodes

    Prometheus collects node metrics through node-exporter. As the name suggests, node_exporter scrapes the various runtime metrics of a server node and currently covers almost every common metric source, such as conntrack, cpu, diskstats, filesystem, loadavg, meminfo and netstat. It is discovered via the node role of kubernetes_sd_configs.

        - job_name: 'kubernetes-nodes'
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
    

    2) Container monitoring

    Prometheus collects container metrics through cAdvisor. cAdvisor is already built into the kubelet, so there is nothing extra to install; its metrics are served at /api/v1/nodes/<node_name>/proxy/metrics/cadvisor. We again use the node service-discovery role of kubernetes_sd_configs, because every node runs a kubelet and therefore exposes cAdvisor metrics.

        - job_name: 'kubernetes-cadvisor'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    

    3) Collecting pod metrics via endpoints-based service discovery

    This uses the endpoints role of kubernetes_sd_configs.
    kube-state-metrics is the exporter that produces k8s resource metrics for Prometheus.
    Note: the kube-state-metrics deployment creates a Service named kube-state-metrics with the annotation prometheus.io/scrape: "true", so the 'kubernetes-service-endpoints' job below discovers that Service automatically and scrapes the metrics kube-state-metrics exposes.
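
    For reference, any Service can opt in to this job by carrying the same annotations; a minimal sketch (the name, port and path are illustrative):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app                        # illustrative name
      annotations:
        prometheus.io/scrape: "true"      # matched by the keep rule in the job below
        prometheus.io/port: "8080"        # rewritten into __address__
        prometheus.io/path: "/metrics"    # rewritten into __metrics_path__
    spec:
      selector:
        app: my-app
      ports:
      - port: 8080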

        - job_name: 'kubernetes-service-endpoints'
          kubernetes_sd_configs:
          - role: endpoints
          relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
    

    4) Monitoring k8s control-plane components

    This uses the endpoints role of kubernetes_sd_configs.

    • kube-apiserver monitoring
        - job_name: kubernetes-apiservers
          scrape_interval: 30s
          scrape_timeout: 20s
          metrics_path: /metrics
          scheme: https
          kubernetes_sd_configs:
          - role: endpoints
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: false
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            separator: ;
            regex: default;kubernetes;https
            replacement: $1
            action: keep
    
    • etcd monitoring
      This also uses the endpoints role of kubernetes_sd_configs.
      etcd is served over HTTPS with client certificates, so the etcd certificates must be created as a Secret and mounted into the Prometheus pod; Prometheus then presents them when scraping etcd.
      On a cluster installed with kubeadm, the default path of the etcd certificates is /etc/kubernetes/pki/etcd
    kubectl -n prome-system create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
    

    Mount the Secret into the Prometheus pod:

    kubectl edit deploy -n prome-system prometheus
            volumeMounts:
            - mountPath: "/prometheus"
              subPath: prometheus
              name: data
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: /etc/kubernetes/pki/etcd
              name: etcd-certs
              readOnly: true
    
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: prometheus
          - configMap:
              name: prometheus-config
            name: config-volume
          - name: etcd-certs
            secret:
              defaultMode: 420
              secretName: etcd-certs
    

    In addition, an etcd Service is needed (etcd-svc.yaml):

    apiVersion: v1
    kind: Service
    metadata:
      name: etcd
      namespace: kube-system
    spec:
      ports:
      - name: https
        port: 2379
        protocol: TCP
        targetPort: 2379
      type: ClusterIP
      selector:
        component: etcd
    

    The Prometheus scrape job looks like this:

        - job_name: 'etcd'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            cert_file: /etc/kubernetes/pki/etcd/healthcheck-client.crt
            key_file: /etc/kubernetes/pki/etcd/healthcheck-client.key
            insecure_skip_verify: true
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: kube-system;etcd;https
    
    • kube-proxy monitoring
      Every node in the k8s cluster runs kube-proxy, so the node role of kubernetes_sd_configs is used here.
      After installation, kube-proxy's default metrics bind address is:
      metricsBindAddress: 127.0.0.1:10249
      It needs to be changed to 0.0.0.0:10249 (after editing, restart kube-proxy as sketched below):
    kubectl edit cm -n kube-system kube-proxy
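    
    Editing the ConfigMap alone does not update the running pods; on a kubeadm cluster kube-proxy runs as a DaemonSet in kube-system, so one way to pick up the change is (a sketch):
    
    # Restart the kube-proxy DaemonSet so the new metricsBindAddress takes effect
    kubectl rollout restart daemonset kube-proxy -n kube-system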
    

    The Prometheus scrape job looks like this:

        - job_name: 'kube-proxy'
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10249'
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
    
  • Deploying Prometheus on k8s with kube-prometheus

    This article sets up Prometheus monitoring on a k8s cluster, including monitoring of the k8s nodes and some other details.
    It is written as quick notes rather than a detailed walkthrough.
    For the basics, see my earlier post:
    Blog: https://blog.csdn.net/zeorg/article/details/112075071
    Prometheus is best deployed inside the k8s cluster itself (that is, in Docker).

    Some ready-made Prometheus manifests can be found on GitHub:

    https://github.com/coreos/kube-prometheus

    # Apply everything under /root/kube-prometheus/manifests
    kubectl apply -f /root/kube-prometheus/manifests/
    # Apply everything under /root/kube-prometheus/manifests/setup
    kubectl apply -f /root/kube-prometheus/manifests/setup/
    # Note that you need to create the namespace yourself
    # If the above reports errors (missing CRDs), run the following:
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
    kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/release-0.43/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml
    
    
    # After the deployment, check the status with:
    kubectl get pod -n monitoring 
    
    kubectl get svc -n monitoring 
    
    kubectl top node   # (fails if the manifests under setup/ have not been applied)

    The environment is now deployed.
    Grafana's default username and password are both admin.
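
    The kube-prometheus manifests create Grafana as a ClusterIP Service named grafana in the monitoring namespace; one simple way to reach the UI without changing the Service (a sketch):

    # Forward local port 3000 to the Grafana service, then open http://localhost:3000
    kubectl port-forward -n monitoring svc/grafana 3000:3000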


  • Deploying Prometheus with the CoreOS kube-prometheus project

    Theory

    Note: the Prometheus deployed here uses the kube-prometheus project provided by CoreOS.

    MetricsServer: the aggregator of k8s cluster resource usage; it collects data for in-cluster consumers such as kubectl, HPA and the scheduler.

    Prometheus Operator: a system monitoring and alerting toolkit, used to store monitoring data.

    Prometheus node-exporter: collects resource data from the k8s cluster; alert rules are defined against it.

    Prometheus: collects data from the apiserver, scheduler, controller-manager and kubelet components, transferred over HTTP.

    Grafana: a platform for visualizing statistics and monitoring data.


    Walkthrough

    1. Clone the kube-prometheus project from git to the local machine (note: this fork has since been deleted):
    git clone https://github.com/imirsh/kube-prometheus.git
    Because of network issues the clone sometimes fails, so it is best to download it in advance.

    2. Modify grafana-service.yaml to expose Grafana as a NodePort, either on port 31001 or without specifying a port and then looking up the randomly assigned port with a command later.
    In the files below (except the one marked with *), add the image pull policy imagePullPolicy: IfNotPresent; in some files you only need to uncomment it.

    [root@master manifests]# pwd
    /root/prometheus/kube-prometheus/manifests
    sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' prometheus-prometheus.yaml 
    sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' alertmanager-alertmanager.yaml  
    sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' prometheus-adapter-deployment.yaml  
    sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' node-exporter-daemonset.yaml 
    sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' kube-state-metrics-deployment.yaml 
    * sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' prometheus-operator-serviceMonitor.yaml 
    

    Note: this path is special.
    One file is shown as an example; all of them must be changed, otherwise the images will fail to download.

    3. Upload the required images to every node in advance and import them. Because the pull policy keeps locally present images from being re-downloaded, the deployment starts quickly.
    Go to the image directory and load them all in one batch with a shell loop:
    for i in ./*.tar ; do docker load -i $i ; done

    4. Apply all the YAML files in the two directories. Because a directory may contain too many YAML files to apply in a single pass, run the apply a couple of times; then apply the other directory.

    5. Verify: all pods should be in the Running state.

    Check the randomly mapped NodePort of the Grafana service.
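
    One way to look it up (assuming the kube-prometheus default namespace monitoring):

    kubectl get svc -n monitoring grafana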
    6. In a browser, open the master node's IP plus that port; username: admin, password: admin.
    7. Download a dashboard template and import it.
    Official dashboard:
    https://grafana.com/grafana/dashboards/8588

