  • Adding and removing Kubernetes master and etcd nodes

    Business scenario:

    Test environment. Resources were limited at the start, so all nodes were deployed on virtual machines as single instances. As usage and business volume grew, new servers were purchased; the old single-node master and etcd were migrated onto them and the control plane was expanded to 3 nodes for high availability.

    Environment:

    etcd old node IP: 192.168.30.31
    etcd new node IPs: 192.168.30.17, 192.168.30.18, 192.168.30.19
    kube-apiserver old node IP: 192.168.30.32
    kube-apiserver new node IPs: 192.168.30.17, 192.168.30.18, 192.168.30.19
    kube-apiserver VIP: 192.168.30.254
    Services on the kube-apiserver nodes: kube-apiserver, kube-controller-manager, kube-scheduler
    New node hostnames: node03, node4, node5

    Adding etcd nodes

    # Node to operate on: 192.168.30.31
    # Set up the etcdctl environment (API v3)
    # Edit /etc/profile and add:
    export ETCDCTL_API=3
    export ENDPOINTS=https://192.168.30.31:2379
    source /etc/profile
    # Edit ~/.bashrc and add:
    alias etcdctl='/apps/etcd/bin/etcdctl --endpoints=${ENDPOINTS} --cacert=/apps/etcd/ssl/etcd-ca.pem --cert=/apps/etcd/ssl/etcd_client.pem --key=/apps/etcd/ssl/etcd_client-key.pem'
    source ~/.bashrc
    # Test that the configuration is correct
    etcdctl endpoint health
    [root@etcd ~]# etcdctl endpoint health
    https://192.168.30.31:2379 is healthy: successfully committed proposal: took = 20.258113ms
    # Healthy output means the configuration is correct
    # Back up the etcd data. Always take a backup; without one, a mistake means redeploying from scratch
    etcdctl snapshot save snapshot.db
    # If something goes wrong, restore the data:
    etcdctl snapshot restore ./snapshot.db   --name=etcd \
    --initial-advertise-peer-urls=https://192.168.30.31:2380  \
    --initial-cluster-token=etcd-cluster-0 \
    --initial-cluster=etcd=https://192.168.30.31:2380  \
    --data-dir=/apps/etcd/data/default.etcd
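    # Optional sanity check on the backup: etcdctl v3's "snapshot status" prints the hash,
    # revision, key count and size of the snapshot file (-w table is a standard output flag):
    etcdctl snapshot status snapshot.db -w table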
    # Generate new certificates that cover the new nodes
    ## Create the etcd server CSR config
    export ETCD_SERVER_IPS=" \
        \"192.168.30.31\", \
        \"192.168.30.17\", \
        \"192.168.30.18\", \
        \"192.168.30.19\" \
    " && \
    export ETCD_SERVER_HOSTNAMES=" \
        \"etcd\", \
        \"node03\", \
        \"node4\", \
        \"node5\" \
    " && \
    cat << EOF | tee /opt/k8s/cfssl/etcd/etcd_server.json
    {
      "CN": "etcd",
      "hosts": [
        "127.0.0.1",
        ${ETCD_SERVER_IPS},
        ${ETCD_SERVER_HOSTNAMES}
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "GuangDong",
          "L": "GuangZhou",
          "O": "niuke",
          "OU": "niuke"
        }
      ]
    }
    EOF
    ## Generate the etcd server certificate and private key
    cfssl gencert \
        -ca=/opt/k8s/cfssl/pki/etcd/etcd-ca.pem \
        -ca-key=/opt/k8s/cfssl/pki/etcd/etcd-ca-key.pem \
        -config=/opt/k8s/cfssl/ca-config.json \
        -profile=kubernetes \
        /opt/k8s/cfssl/etcd/etcd_server.json | \
        cfssljson -bare /opt/k8s/cfssl/pki/etcd/etcd_server
    
    ## Create the etcd member 2 (node03) CSR config
    export ETCD_MEMBER_2_IP=" \
        \"192.168.30.17\" \
    " && \
    export ETCD_MEMBER_2_HOSTNAMES="node03" && \
    cat << EOF | tee /opt/k8s/cfssl/etcd/${ETCD_MEMBER_2_HOSTNAMES}.json
    {
      "CN": "etcd",
      "hosts": [
        "127.0.0.1",
        ${ETCD_MEMBER_2_IP},
        "${ETCD_MEMBER_2_HOSTNAMES}"
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "GuangDong",
          "L": "GuangZhou",
          "O": "niuke",
          "OU": "niuke"
        }
      ]
    }
    EOF
    ## Generate the etcd member 2 certificate and private key
    cfssl gencert \
        -ca=/opt/k8s/cfssl/pki/etcd/etcd-ca.pem \
        -ca-key=/opt/k8s/cfssl/pki/etcd/etcd-ca-key.pem \
        -config=/opt/k8s/cfssl/ca-config.json \
        -profile=kubernetes \
        /opt/k8s/cfssl/etcd/${ETCD_MEMBER_2_HOSTNAMES}.json | \
        cfssljson -bare /opt/k8s/cfssl/pki/etcd/etcd_member_${ETCD_MEMBER_2_HOSTNAMES}
    
    ## Create the etcd member 3 (node4) CSR config
    export ETCD_MEMBER_3_IP=" \
        \"192.168.30.18\" \
    " && \
    export ETCD_MEMBER_3_HOSTNAMES="node4" && \
    cat << EOF | tee /opt/k8s/cfssl/etcd/${ETCD_MEMBER_3_HOSTNAMES}.json
    {
      "CN": "etcd",
      "hosts": [
        "127.0.0.1",
        ${ETCD_MEMBER_3_IP},
        "${ETCD_MEMBER_3_HOSTNAMES}"
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "GuangDong",
          "L": "GuangZhou",
          "O": "niuke",
          "OU": "niuke"
        }
      ]
    }
    EOF
    ## Generate the etcd member 3 certificate and private key
    cfssl gencert \
        -ca=/opt/k8s/cfssl/pki/etcd/etcd-ca.pem \
        -ca-key=/opt/k8s/cfssl/pki/etcd/etcd-ca-key.pem \
        -config=/opt/k8s/cfssl/ca-config.json \
        -profile=kubernetes \
        /opt/k8s/cfssl/etcd/${ETCD_MEMBER_3_HOSTNAMES}.json | \
        cfssljson -bare /opt/k8s/cfssl/pki/etcd/etcd_member_${ETCD_MEMBER_3_HOSTNAMES}
    
    ## Create the etcd member 4 (node5) CSR config
    export ETCD_MEMBER_4_IP=" \
        \"192.168.30.19\" \
    " && \
    export ETCD_MEMBER_4_HOSTNAMES="node5" && \
    cat << EOF | tee /opt/k8s/cfssl/etcd/${ETCD_MEMBER_4_HOSTNAMES}.json
    {
      "CN": "etcd",
      "hosts": [
        "127.0.0.1",
        ${ETCD_MEMBER_4_IP},
        "${ETCD_MEMBER_4_HOSTNAMES}"
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "GuangDong",
          "L": "GuangZhou",
          "O": "niuke",
          "OU": "niuke"
        }
      ]
    }
    EOF
    ## Generate the etcd member 4 certificate and private key
    cfssl gencert \
        -ca=/opt/k8s/cfssl/pki/etcd/etcd-ca.pem \
        -ca-key=/opt/k8s/cfssl/pki/etcd/etcd-ca-key.pem \
        -config=/opt/k8s/cfssl/ca-config.json \
        -profile=kubernetes \
        /opt/k8s/cfssl/etcd/${ETCD_MEMBER_4_HOSTNAMES}.json | \
        cfssljson -bare /opt/k8s/cfssl/pki/etcd/etcd_member_${ETCD_MEMBER_4_HOSTNAMES}
    
    # Distribute the certificates to every node
    scp -r /opt/k8s/cfssl/pki/etcd/etcd* root@192.168.30.17:/apps/etcd/ssl/
    scp -r /opt/k8s/cfssl/pki/etcd/etcd* root@192.168.30.18:/apps/etcd/ssl/
    scp -r /opt/k8s/cfssl/pki/etcd/etcd* root@192.168.30.19:/apps/etcd/ssl/
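    # Optional: spot-check the regenerated server certificate on a node to confirm the new IPs
    # and hostnames are really in its SANs (path follows the layout used above):
    openssl x509 -in /apps/etcd/ssl/etcd_server.pem -noout -text | grep -A1 "Subject Alternative Name"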
    # With the data backed up, add the first new member
    etcdctl member add node03 --peer-urls=https://192.168.30.17:2380
    ############################ sample output ##############################
    # Added member named node03 with ID 92bf7d7f20e298fc to cluster
    # ETCD_NAME="node03"
    # ETCD_INITIAL_CLUSTER="node03=https://192.168.30.17:2380,etcd=https://192.168.30.31:2380"
    # ETCD_INITIAL_CLUSTER_STATE="existing"
    ##########################################################################
    # Edit the etcd startup options file on node03
    ETCD_OPTS="--name=node03 \
               --data-dir=/apps/etcd/data/default.etcd \
               --listen-peer-urls=https://192.168.30.17:2380 \
               --listen-client-urls=https://192.168.30.17:2379,https://127.0.0.1:2379 \
               --advertise-client-urls=https://192.168.30.17:2379 \
               --initial-advertise-peer-urls=https://192.168.30.17:2380 \
               --initial-cluster=etcd=https://192.168.30.31:2380,node03=https://192.168.30.17:2380 \
               --initial-cluster-token=etcd-cluster-0 \
               --initial-cluster-state=existing \
               --heartbeat-interval=6000 \
               --election-timeout=30000 \
               --snapshot-count=5000 \
               --auto-compaction-retention=1 \
               --max-request-bytes=33554432 \
               --quota-backend-bytes=17179869184 \
               --trusted-ca-file=/apps/etcd/ssl/etcd-ca.pem \
               --cert-file=/apps/etcd/ssl/etcd_server.pem \
               --key-file=/apps/etcd/ssl/etcd_server-key.pem \
               --peer-cert-file=/apps/etcd/ssl/etcd_member_node03.pem \
               --peer-key-file=/apps/etcd/ssl/etcd_member_node03-key.pem \
               --peer-client-cert-auth \
               --peer-trusted-ca-file=/apps/etcd/ssl/etcd-ca.pem"
    
    # Start etcd on node03
    service etcd start
    # Edit /etc/profile and add the new endpoint
    export ENDPOINTS=https://192.168.30.17:2379,https://192.168.30.31:2379
    source /etc/profile
    etcdctl endpoint status
    # Check that the DB size matches across endpoints; once it does, add the next member
     etcdctl member add node4  --peer-urls=https://192.168.30.18:2380
     ETCD_OPTS="--name=node4 \
               --data-dir=/apps/etcd/data/default.etcd \
               --listen-peer-urls=https://192.168.30.18:2380 \
               --listen-client-urls=https://192.168.30.18:2379,https://127.0.0.1:2379 \
               --advertise-client-urls=https://192.168.30.18:2379 \
               --initial-advertise-peer-urls=https://192.168.30.18:2380 \
               --initial-cluster=node4=https://192.168.30.18:2380,etcd=https://192.168.30.31:2380,node03=https://192.168.30.17:2380 \
               --initial-cluster-token=etcd-cluster-0 \
               --initial-cluster-state=existing \
               --heartbeat-interval=6000 \
               --election-timeout=30000 \
               --snapshot-count=5000 \
               --auto-compaction-retention=1 \
               --max-request-bytes=33554432 \
               --quota-backend-bytes=17179869184 \
               --trusted-ca-file=/apps/etcd/ssl/etcd-ca.pem \
               --cert-file=/apps/etcd/ssl/etcd_server.pem \
               --key-file=/apps/etcd/ssl/etcd_server-key.pem \
               --peer-cert-file=/apps/etcd/ssl/etcd_member_node4.pem \
               --peer-key-file=/apps/etcd/ssl/etcd_member_node4-key.pem \
               --peer-client-cert-auth \
               --peer-trusted-ca-file=/apps/etcd/ssl/etcd-ca.pem"
    
     etcdctl member add node5  --peer-urls=https://192.168.30.19:2380
     ETCD_OPTS="--name=node5 \
               --data-dir=/apps/etcd/data/default.etcd \
               --listen-peer-urls=https://192.168.30.19:2380 \
               --listen-client-urls=https://192.168.30.19:2379,https://127.0.0.1:2379 \
               --advertise-client-urls=https://192.168.30.19:2379 \
               --initial-advertise-peer-urls=https://192.168.30.19:2380 \
               --initial-cluster=node4=https://192.168.30.18:2380,etcd=https://192.168.30.31:2380,node5=https://192.168.30.19:2380,node03=https://192.168.30.17:2380 \
               --initial-cluster-token=etcd-cluster-0 \
               --initial-cluster-state=existing \
               --heartbeat-interval=6000 \
               --election-timeout=30000 \
               --snapshot-count=5000 \
               --auto-compaction-retention=1 \
               --max-request-bytes=33554432 \
               --quota-backend-bytes=17179869184 \
               --trusted-ca-file=/apps/etcd/ssl/etcd-ca.pem \
               --cert-file=/apps/etcd/ssl/etcd_server.pem \
               --key-file=/apps/etcd/ssl/etcd_server-key.pem \
               --peer-cert-file=/apps/etcd/ssl/etcd_member_node5.pem \
               --peer-key-file=/apps/etcd/ssl/etcd_member_node5-key.pem \
               --peer-client-cert-auth \
               --peer-trusted-ca-file=/apps/etcd/ssl/etcd-ca.pem"
    ####
    # Edit /etc/profile
    export ENDPOINTS=https://192.168.30.17:2379,https://192.168.30.18:2379,https://192.168.30.19:2379
    # Verify that the etcd cluster is healthy
    [root@node03 ~]# etcdctl endpoint status
    https://192.168.30.17:2379, 92bf7d7f20e298fc, 3.3.13, 30 MB, false, 16, 3963193
    https://192.168.30.18:2379, 127f6360c5080113, 3.3.13, 30 MB, true, 16, 3963193
    https://192.168.30.19:2379, 5a0a05654c847f54, 3.3.13, 30 MB, false, 16, 3963193
    # All new members are healthy
    # Finally, on every new node, update --initial-cluster so that it lists only the new members
    # (the old node will be removed later):
    --initial-cluster=node4=https://192.168.30.18:2380,node5=https://192.168.30.19:2380,node03=https://192.168.30.17:2380 \
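    # The three new members can also be compared side by side (DB size, leader, revision, KV hash)
    # before and after the old node is removed; both are standard etcdctl v3 subcommands:
    etcdctl endpoint status -w table
    etcdctl endpoint hashkv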

    Adding kube-apiserver nodes

    # Create certificates for the new nodes
    ## Create the Kubernetes API Server CSR config
    export K8S_APISERVER_VIP=" \
        \"192.168.30.32\", \
        \"192.168.30.17\", \
        \"192.168.30.18\", \
        \"192.168.30.19\", \
        \"192.168.30.254\" \
    " && \
    export K8S_APISERVER_SERVICE_CLUSTER_IP="10.64.0.1" && \
    export K8S_APISERVER_HOSTNAME="api.k8s.niuke.local" && \
    export K8S_CLUSTER_DOMAIN_SHORTNAME="niuke" && \
    export K8S_CLUSTER_DOMAIN_FULLNAME="niuke.local" && \
    cat << EOF | tee /opt/k8s/cfssl/k8s/k8s_apiserver.json
    {
      "CN": "kubernetes",
      "hosts": [
        "127.0.0.1",
        ${K8S_APISERVER_VIP},
        "${K8S_APISERVER_SERVICE_CLUSTER_IP}",
        "${K8S_APISERVER_HOSTNAME}",
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        "kubernetes.default.svc.${K8S_CLUSTER_DOMAIN_SHORTNAME}",
        "kubernetes.default.svc.${K8S_CLUSTER_DOMAIN_FULLNAME}"    
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "GuangDong",
          "L": "GuangZhou",
          "O": "niuke",
          "OU": "niuke"
        }
      ]
    }
    EOF
    
    ## Generate the Kubernetes API Server certificate and private key
    cfssl gencert \
        -ca=/opt/k8s/cfssl/pki/k8s/k8s-ca.pem \
        -ca-key=/opt/k8s/cfssl/pki/k8s/k8s-ca-key.pem \
        -config=/opt/k8s/cfssl/ca-config.json \
        -profile=kubernetes \
        /opt/k8s/cfssl/k8s/k8s_apiserver.json | \
        cfssljson -bare /opt/k8s/cfssl/pki/k8s/k8s_server
    
    # Distribute the SSL certificates to the new nodes
    scp  -r /opt/k8s/cfssl/pki/k8s/ root@192.168.30.17:/apps/kubernetes/ssl/k8s
    scp  -r /opt/k8s/cfssl/pki/k8s/ root@192.168.30.18:/apps/kubernetes/ssl/k8s
    scp  -r /opt/k8s/cfssl/pki/k8s/ root@192.168.30.19:/apps/kubernetes/ssl/k8s
    # Edit the configuration files (the 192.168.30.17 node is shown)
    ###  kube-apiserver
    KUBE_APISERVER_OPTS="--logtostderr=false \
            --bind-address=192.168.30.17 \
            --advertise-address=192.168.30.17 \
            --secure-port=5443 \
            --insecure-port=0 \
            --service-cluster-ip-range=10.64.0.0/16 \
            --service-node-port-range=30000-65000 \
            --etcd-cafile=/apps/kubernetes/ssl/etcd/etcd-ca.pem \
            --etcd-certfile=/apps/kubernetes/ssl/etcd/etcd_client.pem \
            --etcd-keyfile=/apps/kubernetes/ssl/etcd/etcd_client-key.pem \
            --etcd-prefix=/registry \
            --etcd-servers=https://192.168.30.17:2379,https://192.168.30.18:2379,https://192.168.30.19:2379 \
            --client-ca-file=/apps/kubernetes/ssl/k8s/k8s-ca.pem \
            --tls-cert-file=/apps/kubernetes/ssl/k8s/k8s_server.pem \
            --tls-private-key-file=/apps/kubernetes/ssl/k8s/k8s_server-key.pem \
            --kubelet-client-certificate=/apps/kubernetes/ssl/k8s/k8s_server.pem \
            --kubelet-client-key=/apps/kubernetes/ssl/k8s/k8s_server-key.pem \
            --service-account-key-file=/apps/kubernetes/ssl/k8s/k8s-ca.pem \
            --requestheader-client-ca-file=/apps/kubernetes/ssl/k8s/k8s-ca.pem \
            --proxy-client-cert-file=/apps/kubernetes/ssl/k8s/aggregator.pem \
            --proxy-client-key-file=/apps/kubernetes/ssl/k8s/aggregator-key.pem \
            --requestheader-allowed-names=aggregator \
            --requestheader-group-headers=X-Remote-Group \
            --requestheader-extra-headers-prefix=X-Remote-Extra- \
            --requestheader-username-headers=X-Remote-User \
            --enable-aggregator-routing=true \
            --anonymous-auth=false \
            --experimental-encryption-provider-config=/apps/kubernetes/config/encryption-config.yaml \
            --enable-admission-plugins=AlwaysPullImages,DefaultStorageClass,DefaultTolerationSeconds,LimitRanger,NamespaceExists,NamespaceLifecycle,NodeRestriction,OwnerReferencesPermissionEnforcement,PodNodeSelector,PersistentVolumeClaimResize,PodPreset,PodTolerationRestriction,ResourceQuota,ServiceAccount,StorageObjectInUseProtection,MutatingAdmissionWebhook,ValidatingAdmissionWebhook \
            --disable-admission-plugins=DenyEscalatingExec,ExtendedResourceToleration,ImagePolicyWebhook,LimitPodHardAntiAffinityTopology,NamespaceAutoProvision,Priority,EventRateLimit,PodSecurityPolicy \
            --cors-allowed-origins=.* \
            --enable-swagger-ui \
            --runtime-config=api/all=true \
            --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \
            --authorization-mode=Node,RBAC \
            --allow-privileged=true \
            --apiserver-count=1 \
            --audit-log-maxage=30 \
            --audit-log-maxbackup=3 \
            --audit-log-maxsize=100 \
            --kubelet-https \
            --event-ttl=1h \
            --feature-gates=RotateKubeletServerCertificate=true,RotateKubeletClientCertificate=true \
            --enable-bootstrap-token-auth=true \
            --audit-log-path=/apps/kubernetes/log/api-server-audit.log \
            --alsologtostderr=true \
            --log-dir=/apps/kubernetes/log \
            --v=2 \
            --endpoint-reconciler-type=lease \
            --max-mutating-requests-inflight=100 \
            --max-requests-inflight=500 \
            --target-ram-mb=6000"
    
    ###  kube-controller-manager                
    
    KUBE_CONTROLLER_MANAGER_OPTS="--logtostderr=false \
    --leader-elect=true \
    --address=0.0.0.0 \
    --service-cluster-ip-range=10.64.0.0/16 \
    --cluster-cidr=10.65.0.0/16 \
    --node-cidr-mask-size=24 \
    --cluster-name=kubernetes \
    --allocate-node-cidrs=true \
    --kubeconfig=/apps/kubernetes/config/kube_controller_manager.kubeconfig \
    --authentication-kubeconfig=/apps/kubernetes/config/kube_controller_manager.kubeconfig \
    --authorization-kubeconfig=/apps/kubernetes/config/kube_controller_manager.kubeconfig \
    --use-service-account-credentials=true \
    --client-ca-file=/apps/kubernetes/ssl/k8s/k8s-ca.pem \
    --requestheader-client-ca-file=/apps/kubernetes/ssl/k8s/k8s-ca.pem \
    --node-monitor-grace-period=40s \
    --node-monitor-period=5s \
    --pod-eviction-timeout=5m0s \
    --terminated-pod-gc-threshold=50 \
    --alsologtostderr=true \
    --cluster-signing-cert-file=/apps/kubernetes/ssl/k8s/k8s-ca.pem \
    --cluster-signing-key-file=/apps/kubernetes/ssl/k8s/k8s-ca-key.pem  \
    --deployment-controller-sync-period=10s \
    --experimental-cluster-signing-duration=86700h0m0s \
    --enable-garbage-collector=true \
    --root-ca-file=/apps/kubernetes/ssl/k8s/k8s-ca.pem \
    --service-account-private-key-file=/apps/kubernetes/ssl/k8s/k8s-ca-key.pem \
    --feature-gates=RotateKubeletServerCertificate=true,RotateKubeletClientCertificate=true \
    --controllers=*,bootstrapsigner,tokencleaner \
    --horizontal-pod-autoscaler-use-rest-clients=true \
    --horizontal-pod-autoscaler-sync-period=10s \
    --flex-volume-plugin-dir=/apps/kubernetes/kubelet-plugins/volume \
    --tls-cert-file=/apps/kubernetes/ssl/k8s/k8s_controller_manager.pem \
    --tls-private-key-file=/apps/kubernetes/ssl/k8s/k8s_controller_manager-key.pem \
    --kube-api-qps=100 \
    --kube-api-burst=100 \
    --log-dir=/apps/kubernetes/log \
    --v=2"
    
    ### kube-scheduler
    
    KUBE_SCHEDULER_OPTS=" \
                       --logtostderr=false \
                       --address=0.0.0.0 \
                       --leader-elect=true \
                       --kubeconfig=/apps/kubernetes/config/kube_scheduler.kubeconfig \
                       --authentication-kubeconfig=/apps/kubernetes/config/kube_scheduler.kubeconfig \
                       --authorization-kubeconfig=/apps/kubernetes/config/kube_scheduler.kubeconfig \
                       --alsologtostderr=true \
                       --kube-api-qps=100 \
                       --kube-api-burst=100 \
                       --log-dir=/apps/kubernetes/log \
                       --v=2"
    # Configure the other two nodes the same way, using the .17 node as the reference
    service kube-apiserver start
    service kube-controller-manager  start
    service kube-scheduler start

    Verify that the new nodes work

    https://192.168.30.17:5443/apis
    https://192.168.30.18:5443/apis
    https://192.168.30.19:5443/apis
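
    Instead of a browser, each endpoint can also be probed from a machine that already has a
    working admin kubeconfig; this is only a sketch (-s simply overrides the server address per call):

    for ip in 192.168.30.17 192.168.30.18 192.168.30.19; do
      kubectl -s https://$ip:5443 get --raw /healthz && echo " <- $ip healthy"
    done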


    Install haproxy and keepalived

    yum install -y haproxy keepalived
    
    # Edit the haproxy configuration
    # /etc/haproxy/haproxy.cfg
    frontend kube-apiserver-https
      mode tcp
      bind :6443
      default_backend kube-apiserver-backend
    backend kube-apiserver-backend
      mode tcp
      server 192.168.30.17-api 192.168.30.17:5443 check
      server 192.168.30.18-api 192.168.30.18:5443 check
      server 192.168.30.19-api 192.168.30.19:5443 check
    # Start haproxy
    service haproxy start
    # The haproxy configuration is identical on all three nodes
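    # Optional: haproxy can syntax-check the file before it is started (-c validates and exits):
    haproxy -c -f /etc/haproxy/haproxy.cfg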
    # Edit the keepalived configuration
    # 192.168.30.19 configuration (highest priority, becomes MASTER)
    cat /etc/keepalived/keepalived.conf
    ! Configuration File for keepalived
    
    global_defs {
     router_id LVS_DEVEL
    }
    
    vrrp_script check_haproxy {
      script "killall -0 haproxy"
      interval 3
      weight -2
      fall 10
      rise 2
    }
    
    vrrp_instance VI_1 {
      state MASTER
      interface br0
      virtual_router_id 51
      priority 250
      advert_int 2
      authentication {
        auth_type PASS
        auth_pass  99ce6e3381dc326633737ddaf5d904d2
      }
      virtual_ipaddress {
        192.168.30.254/24
      }
      track_script {
        check_haproxy
      }
    }
    ### 192.168.30.18 configuration
    cat /etc/keepalived/keepalived.conf
     ! Configuration File for keepalived
    
    global_defs {
     router_id LVS_DEVEL
    }
    
    vrrp_script check_haproxy {
      script "killall -0 haproxy"
      interval 3
      weight -2
      fall 10
      rise 2
    }
    
    vrrp_instance VI_1 {
      state BACKUP
      interface br0
      virtual_router_id 51
      priority 249
      advert_int 2
      authentication {
        auth_type PASS
        auth_pass 99ce6e3381dc326633737ddaf5d904d2
      }
      virtual_ipaddress {
        192.168.30.254/24
      }
      track_script {
        check_haproxy
      }
    }
    ## 192.168.30.17 configuration
    cat /etc/keepalived/keepalived.conf
     ! Configuration File for keepalived
    
    global_defs {
     router_id LVS_DEVEL
    }
    
    vrrp_script check_haproxy {
      script "killall -0 haproxy"
      interval 3
      weight -2
      fall 10
      rise 2
    }
    
    vrrp_instance VI_1 {
      state BACKUP
      interface br0
      virtual_router_id 51
      priority 248
      advert_int 2
      authentication {
        auth_type PASS
        auth_pass 99ce6e3381dc326633737ddaf5d904d2
      }
      virtual_ipaddress {
        192.168.30.254/24
      }
      track_script {
        check_haproxy
      }
    }
    
    ### Start keepalived on all three nodes
    service keepalived start
    # 192.168.30.19 holds the VIP as MASTER
    
    [root@node5 ~]# ip a | grep br0
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    6: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        inet 192.168.30.19/24 brd 192.168.30.255 scope global br0
        inet 192.168.30.254/24 scope global secondary br0
    # Test that 192.168.30.254 can be reached
    https://192.168.30.254:6443

    The URL opens normally, so the VIP is working.

    On each worker node, update the API server address in two files:
    bootstrap.kubeconfig
    kubelet.kubeconfig
    as well as in the local ~/.kube/config file.
    They can be edited with vim (a sed one-liner is sketched below); the server line becomes:
     server: https://192.168.30.254:6443
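    # A quicker alternative when there are many nodes; the config directory below is an
    # assumption, adjust it to wherever the node's kubeconfig files actually live:
    cd /apps/kubernetes/config
    sed -i 's#server: https://.*#server: https://192.168.30.254:6443#' bootstrap.kubeconfig kubelet.kubeconfig
    sed -i 's#server: https://.*#server: https://192.168.30.254:6443#' ~/.kube/config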
    
     # After the change, restart kubelet on the node
     service kubelet restart
     # Verify that the nodes are healthy
     kubectl get node
     [root@]~]#kubectl get node
    NAME         STATUS   ROLES         AGE   VERSION
    ingress      Ready    k8s-ingress   60d   v1.14.6
    ingress-01   Ready    k8s-ingress   29d   v1.14.6
    node01       Ready    k8s-node      60d   v1.14.6
    node02       Ready    k8s-node      60d   v1.14.6
    node03       Ready    k8s-node      12d   v1.14.6
    node4        Ready    k8s-node      12d   v1.14.6
    node5        Ready    k8s-node      12d   v1.14.6
    All nodes are Ready.

    Removing the old etcd node

    # Stop etcd on the old node
    service etcd stop

    etcdctl member list
    # Find the member ID of the old node
    etcdctl endpoint status
    # Verify that the Kubernetes cluster is still healthy
    ### Remove the old member from the cluster
    etcdctl member remove 7994ca589d94dceb
    # Verify the cluster again
    [root@node03 ~]# etcdctl member list
    127f6360c5080113, started, node4, https://192.168.30.18:2380, https://192.168.30.18:2379
    5a0a05654c847f54, started, node5, https://192.168.30.19:2380, https://192.168.30.19:2379
    92bf7d7f20e298fc, started, node03, https://192.168.30.17:2380, https://192.168.30.17:2379
    [root@node03 ~]# etcdctl endpoint status
    https://192.168.30.17:2379, 92bf7d7f20e298fc, 3.3.13, 30 MB, false, 16, 3976114
    https://192.168.30.18:2379, 127f6360c5080113, 3.3.13, 30 MB, true, 16, 3976114
    https://192.168.30.19:2379, 5a0a05654c847f54, 3.3.13, 30 MB, false, 16, 3976114
    [root@node03 ~]# etcdctl endpoint hashkv
    https://192.168.30.17:2379, 189505982
    https://192.168.30.18:2379, 189505982
    https://192.168.30.19:2379, 189505982
    [root@node03 ~]# etcdctl endpoint health
    https://192.168.30.17:2379 is healthy: successfully committed proposal: took = 2.671314ms
    https://192.168.30.18:2379 is healthy: successfully committed proposal: took = 2.2904ms
    https://192.168.30.19:2379 is healthy: successfully committed proposal: took = 3.555137ms
    [root@]~]#kubectl get node
    NAME         STATUS   ROLES         AGE   VERSION
    ingress      Ready    k8s-ingress   60d   v1.14.6
    ingress-01   Ready    k8s-ingress   29d   v1.14.6
    node01       Ready    k8s-node      60d   v1.14.6
    node02       Ready    k8s-node      60d   v1.14.6
    node03       Ready    k8s-node      12d   v1.14.6
    node4        Ready    k8s-node      12d   v1.14.6
    node5        Ready    k8s-node      12d   v1.14.6
    # Everything is healthy
    # Disable etcd from starting at boot on the old node
    chkconfig etcd off

    Removing the old kube-apiserver node

    service kube-controller-manager stop
    service kube-scheduler stop
    service kube-apiserver   stop 
    # Disable auto-start at boot
    chkconfig  kube-controller-manager off
    chkconfig  kube-scheduler   off
    chkconfig   kube-apiserver  off
    # Verify again
    kubectl get node
    [root@]~]#kubectl get node
    NAME         STATUS   ROLES         AGE   VERSION
    ingress      Ready    k8s-ingress   60d   v1.14.6
    ingress-01   Ready    k8s-ingress   29d   v1.14.6
    node01       Ready    k8s-node      60d   v1.14.6
    node02       Ready    k8s-node      60d   v1.14.6
    node03       Ready    k8s-node      12d   v1.14.6
    node4        Ready    k8s-node      12d   v1.14.6
    node5        Ready    k8s-node      12d   v1.14.6
    [root@]~]#kubectl get cs
    NAME                 STATUS    MESSAGE             ERROR
    scheduler            Healthy   ok
    controller-manager   Healthy   ok
    etcd-0               Healthy   {"health":"true"}
    etcd-1               Healthy   {"health":"true"}
    etcd-2               Healthy   {"health":"true"}
    If the workloads running in the cluster are all still reachable, the add/remove operations were performed correctly.
  • Recovering a failed etcd node

    • Root cause:
      A power outage left one node of the etcd cluster unable to start; comparing the data directories suggests the failure was caused by data inconsistency.
    • Symptom:
      The etcd service fails to start, and the log contains the following error:
      recovering backend from snapshot error: database snapshot file path error

    Troubleshooting steps

    • Remove the failed member from the cluster
    etcdctl --endpoints="https://192.168.171.200:2379,https://192.168.171.201:2379,https://192.168.171.202:2379" member list   // list members to get the member ID
    36b6dcf065a1b19f: name=etcd03 peerURLs=https://192.168.171.202:2380 clientURLs=https://192.168.171.202:2379 isLeader=false
    7a3d1a92c3588a59: name=etcd02 peerURLs=https://192.168.171.201:2380 clientURLs=https://192.168.171.201:2379 isLeader=false
    a0ac85faa030bb7e: name=etcd01 peerURLs=https://192.168.171.200:2380 clientURLs=https://192.168.171.200:2379 isLeader=true
    etcdctl --endpoints="https://192.168.171.200:2379,https://192.168.171.201:2379,https://192.168.171.202:2379" member remove 36b6dcf065a1b19f  // remove the failed member
    
    • Delete the data directory on the failed node to make sure its member data is wiped
    cd /var/lib/etcd   // default data directory
    rm -rf *
    
    • Re-add the failed node to the cluster
    etcdctl --endpoints="https://192.168.171.200:2379,https://192.168.171.201:2379,https://192.168.171.202:2379" member add etcd03 https://192.168.171.202:2380
    The command prints the startup parameters the node needs in order to join the existing cluster:
    ETCD_NAME="etcd03"
    ETCD_INITIAL_CLUSTER="etcd03=https://192.168.171.202:2380,etcd02=https://192.168.171.201:2380,etcd01=https://192.168.171.200:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"
    
    • Start the etcd service on the failed node
    Since this is an old node rejoining the cluster, only ETCD_INITIAL_CLUSTER_STATE needs to be changed to existing:
    sed -i 's/new/existing/g' /opt/etcd/cfg/etcd
    systemctl restart etcd  // start the service
    systemctl status etcd   // check the service status
    
    • Check the etcd cluster health
    etcdctl --endpoints="https://192.168.171.200:2379,https://192.168.171.201:2379,https://192.168.171.202:2379" cluster-health
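    On clusters where the v3 API is used instead, a roughly equivalent check is the sketch below; the certificate paths are assumptions, since this article does not show where its client certificates live:
    ETCDCTL_API=3 etcdctl \
      --endpoints="https://192.168.171.200:2379,https://192.168.171.201:2379,https://192.168.171.202:2379" \
      --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem \
      endpoint health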
    

    Summary:

    This procedure applies when a single node in the cluster fails, for example because its data is lost or the service cannot start.

  • Scaling out etcd nodes and instances in a kubeadm cluster

    A Kubernetes cluster installed with kubeadm runs only one etcd instance by default, which is a single point of failure. Ways to improve cluster availability include: 1) backups (see "Exploring Kubernetes: etcd state data and its backup"); 2) scaling out etcd nodes and instances; 3) running the apiserver on multiple nodes behind a load balancer. This article experiments with scaling out etcd nodes and instances.

    Part 1: etcd scale-out, the overall approach

    etcd is an independent service. When it is used in Kubernetes, its configuration and data directory are mapped to host directories, and it runs with hostNetwork (the host's own network). Specifically, /etc/kubernetes/manifests/etcd.yaml is the startup manifest, /etc/kubernetes/pki/etcd holds the certificates used for HTTPS, and /var/lib/etcd holds this node's etcd data.

    For a single-master Kubernetes cluster installed with kubeadm, only one etcd instance is running. We want to expand it to several instances to reduce the risk of a single point of failure. The approach to scaling etcd in Kubernetes is:

    • Install kubeadm/kubectl/kubelet on all nodes, following the standalone master installation.
    • Create the certificates for the etcd cluster and copy them to every node.
    • Edit the etcd startup configuration on each node and start the etcd instances. There are several ways to do this (same result, different management style):
      • Deploy through kubectl and let Kubernetes control the startup, pinning each instance to a node with a nodeSelector.
      • Start through the kubelet service, with the operating system starting kubelet via systemd. This is the standard Kubernetes approach.
      • Let the container restart itself via Docker's --restart policy, managed by the container runtime.
      • Run etcd directly as a host service, without Docker or Kubernetes managing it.
    • Point the etcd settings in kube-apiserver.yaml on every node to the local etcd instance.
      • etcd is a distributed store; data is synchronized across all nodes automatically, so access from any node sees the same data.

    Part 2: etcd scale-out, experimental steps

    Step 1: Install multiple nodes

    Prepare the nodes that will run etcd. I use Ubuntu 18.04 LTS with Docker CE 18.06 and Kubernetes 1.12.3.

    My three nodes are:

    • podc01, 10.1.1.201
    • podc02, 10.1.1.202
    • podc03, 10.1.1.203

    The container images needed by Kubernetes must be pulled to every node in advance. Reference:

    Step 2: Create the etcd certificates

    I first tried copying the master node's /etc/kubernetes/pki and /etc/kubernetes/manifest directories to all the secondary nodes, but after starting them up there were all kinds of access failures, with errors pointing to CA certificate problems. In the end I decided to create my own certificates and deployment YAML files from scratch.

    The certificates are created with cfssl. You need to download the template files and edit the definition files, including the CA, the ca-config profiles, the CA private key, the CSR, and the config templates for the server/peer/client certificates. The information inside must be adapted to your own environment.

    • The output is a certificate (cert-file), a private key (key-file) and a CA certificate (trusted-ca-file; since this is self-signed, we create our own CA).
    • These three files are passed to each etcd instance at startup (note: the API v2 and v3 flag names differ slightly), so they must be placed in the corresponding directory on every node and mounted into the etcd container.
    • etcdctl needs the same parameters when accessing the cluster as a client, and the peer etcd instances use them to talk to each other, form the cluster, and synchronize data.

    The detailed procedure follows (for more information see https://segmentfault.com/a/1190000016010980).

    1. Prepare the cfssl certificate tools

    mkdir ~/cfssl && cd ~/cfssl
    mkdir bin && cd bin
    
    wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -O cfssl
    wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -O cfssljson 
    
    chmod +x {cfssl,cfssljson}
    export PATH=$PATH:~/cfssl/bin
    
    • Optional: for convenience, add the path to ~/.profile, or copy the binaries to /usr/local/bin. (A quick check that the tools work is shown below.)
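    A quick way to confirm the binaries are usable from the PATH just set up:

    cfssl version        # prints the cfssl release and runtime information
    which cfssljson      # confirms cfssljson is found on the PATH as well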

    2. Create the certificate config files

    Create a directory for the certificate configs:

    mkdir -p ~/cfssl/etcd-certs && cd ~/cfssl/etcd-certs

    Put the certificate config files into the ~/cfssl/etcd-certs directory; the templates are as follows:

    # ==============================================
    # ca-config.json
    {
        "signing": {
            "default": {
                "expiry": "43800h"
            },
            "profiles": {
                "server": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth"
                    ]
                },
                "client": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "client auth"
                    ]
                },
                "peer": {
                    "expiry": "43800h",
                    "usages": [
                        "signing",
                        "key encipherment",
                        "server auth",
                        "client auth"
                    ]
                }
            }
        }
    }
    
    # ==============================================
    # ca-csr.json
    {
        "CN": "My own CA",
        "key": {
            "algo": "rsa",
            "size": 2048
        },
        "names": [
            {
                "C": "US",
                "L": "CA",
                "O": "My Company Name",
                "ST": "San Francisco",
                "OU": "Org Unit 1",
                "OU": "Org Unit 2"
            }
        ]
    }
    
    # ==============================================
    # server.json
    {
        "CN": "etcd0",
        "hosts": [
            "127.0.0.1",
            "0.0.0.0",
            "10.1.1.201",
            "10.1.1.202",
            "10.1.1.203"
        ],
        "key": {
            "algo": "ecdsa",
            "size": 256
        },
        "names": [
            {
                "C": "US",
                "L": "CA",
                "ST": "San Francisco"
            }
        ]
    }
    
    # ==============================================
    # peer1.json  # use this node's own IP
    {
        "CN": "etcd0",
        "hosts": [
            "10.1.1.201"
        ],
        "key": {
            "algo": "ecdsa",
            "size": 256
        },
        "names": [
            {
                "C": "US",
                "L": "CA",
                "ST": "San Francisco"
            }
        ]
    }
    
    # ==============================================
    # client.json
    {
        "CN": "client",
        "hosts": [
           ""
        ],
        "key": {
            "algo": "ecdsa",
            "size": 256
        },
        "names": [
            {
                "C": "US",
                "L": "CA",
                "ST": "San Francisco"
            }
        ]
    }

    3. Create the etcd cluster certificates

    Run:

    cd ~/cfssl/etcd-certs
     
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer peer1.json | cfssljson -bare peer1
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client

    Check the generated certificate files:

    ls -l ~/cfssl/etcd-certs

    The files include:

    ...
    

    Step 3: Start multiple etcd instances

    • Note:
      • Because the scale-out requires deleting the original etcd database, the Kubernetes master's state stored in it would be lost. Before scaling out, take a backup with the etcdctl snapshot command, or build a separate etcd node first and copy the data over.

    Before starting an etcd instance, make sure the /var/lib/etcd directory has been emptied; otherwise some of the parameters will not take effect and the old state is kept. (A backup sketch is shown below.)
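    A minimal backup sketch, assuming the cluster is still running the stock kubeadm etcd (a static pod named etcd-<hostname>, certificates under /etc/kubernetes/pki/etcd, and a TLS listener on localhost); the snapshot ends up in /var/lib/etcd on the host thanks to the hostPath mount:

    kubectl -n kube-system exec etcd-podc01 -- sh -c \
      "ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
         --cacert=/etc/kubernetes/pki/etcd/ca.crt \
         --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
         --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
         snapshot save /var/lib/etcd/snapshot-before-scaleout.db"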

    Note that the following etcd parameters only take effect on the first start (initialization):

    •    - --initial-advertise-peer-urls=http://10.1.1.202:2380
    •    - --initial-cluster=podc02=http://10.1.1.202:2380,podc03=http://10.1.1.203:2380
    •    - --initial-cluster-token=etcd-cluster
    •    - --initial-cluster-state=new
      • When adding a new node, first run member add xxx on an existing member, then set --initial-cluster-state=existing and start the service.

    1. Upload the certificate files

    Copy the cfssl/etcd-certs directory to /etc/kubernetes/pki/etcd-certs on each node, using scp or sftp.

    2. Edit the startup manifest

    Edit /etc/kubernetes/manifests/etcd.yaml, the manifest that kubelet uses to start the etcd instance.

    # /etc/kubernetes/manifests/etcd.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        component: etcd
        tier: control-plane
      name: etcd
      namespace: kube-system
    spec:
      containers:
      - command:
        - etcd
        - --advertise-client-urls=https://10.1.1.201:2379
        - --cert-file=/etc/kubernetes/pki/etcd-certs/server.pem
        - --client-cert-auth=true
        - --data-dir=/var/lib/etcd
        - --initial-advertise-peer-urls=https://10.1.1.201:2380
        - --initial-cluster=etcd01=https://10.1.1.201:2380
        - --key-file=/etc/kubernetes/pki/etcd-certs/server-key.pem
        - --listen-client-urls=https://10.1.1.201:2379
        - --listen-peer-urls=https://10.1.1.201:2380
        - --name=etcd01
        - --peer-cert-file=/etc/kubernetes/pki/etcd-certs/peer1.pem
        - --peer-client-cert-auth=true
        - --peer-key-file=/etc/kubernetes/pki/etcd-certs/peer1-key.pem
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
        - --snapshot-count=10000
        - --trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
        image: k8s.gcr.io/etcd-amd64:3.2.18
        imagePullPolicy: IfNotPresent
       #livenessProbe:
       #  exec:
       #    command:
       #    - /bin/sh
       #    - -ec
       #    - ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem
       #      --cert=/etc/kubernetes/pki/etcd-certs/client.pem --key=/etc/kubernetes/pki/etcd-certs/client-key.pem
       #      get foo
       #  failureThreshold: 8
       #  initialDelaySeconds: 15
       #  timeoutSeconds: 15
        name: etcd
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/etcd
          name: etcd-data
        - mountPath: /etc/kubernetes/pki/etcd-certs
          name: etcd-certs
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
      - hostPath:
          path: /var/lib/etcd
          type: DirectoryOrCreate
        name: etcd-data
      - hostPath:
          path: /etc/kubernetes/pki/etcd-certs
          type: DirectoryOrCreate
        name: etcd-certs
    status: {}

    Following the pattern above, edit the etcd startup parameters in /etc/kubernetes/manifests/etcd.yaml on each secondary node.

    • Note: the IP address has to be changed in several places; do not miss any or get one wrong.
    • Restart the kubelet service.
      • sudo systemctl restart kubelet
    • Check the etcd service.
      • Connect to an instance with etcdctl and run etcdctl member list.
      • Eventually the etcd instances on the nodes link up into one cluster.

    3. Verify the running state

    Exec into the etcd container and run:

    alias etcdv3="ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem --cert=/etc/kubernetes/pki/etcd-certs/client.pem --key=/etc/kubernetes/pki/etcd-certs/client-key.pem"
    etcdv3 member add etcd02 --peer-urls="https://10.1.1.202:2380"

    4. Add etcd nodes

    Copy the certificates from the first node (10.1.1.201) to the second node (10.1.1.202), copy peer1.json to peer2.json, and edit peer2.json:

    # peer2.json
    {
        "CN": "etcd1",
        "hosts": [
            "10.1.86.202"
        ],
        "key": {
            "algo": "ecdsa",
            "size": 256
        },
        "names": [
            {
                "C": "US",
                "L": "CA",
                "ST": "San Francisco"
            }
        ]
    }

    Then regenerate the peer certificate for the second node (on the node that holds the CA files):

    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer peer2.json | cfssljson -bare peer2

    Start etcd on the second node (etcd02); its manifest is as follows:

    # etcd02 etcd.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        component: etcd
        tier: control-plane
      name: etcd
      namespace: kube-system
    spec:
      containers:
      - command:
        - etcd
        - --advertise-client-urls=https://10.1.1.202:2379
        - --cert-file=/etc/kubernetes/pki/etcd-certs/server.pem
        - --data-dir=/var/lib/etcd
        - --initial-advertise-peer-urls=https://10.1.1.202:2380
        - --initial-cluster=etcd01=https://10.1.1.201:2380,etcd02=https://10.1.1.202:2380
        - --key-file=/etc/kubernetes/pki/etcd-certs/server-key.pem
        - --listen-client-urls=https://10.1.1.202:2379
        - --listen-peer-urls=https://10.1.1.202:2380
        - --name=etcd02
        - --peer-cert-file=/etc/kubernetes/pki/etcd-certs/peer2.pem
        - --peer-client-cert-auth=true
        - --peer-key-file=/etc/kubernetes/pki/etcd-certs/peer2-key.pem
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
        - --snapshot-count=10000
        - --trusted-ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem
        - --initial-cluster-state=existing  # do not quote this value; quoting it cost me a lot of debugging
        image: k8s.gcr.io/etcd-amd64:3.2.18
        imagePullPolicy: IfNotPresent
      # livenessProbe:
      #   exec:
      #     command:
      #     - /bin/sh
      #     - -ec
      #     - ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.202]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.crt
      #       --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd-certs/healthcheck-client.key
      #       get foo
      #   failureThreshold: 8
      #   initialDelaySeconds: 15
      #   timeoutSeconds: 15
        name: etcd
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/etcd
          name: etcd-data
        - mountPath: /etc/kubernetes/pki/etcd-certs
          name: etcd-certs
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
      - hostPath:
          path: /var/lib/etcd
          type: DirectoryOrCreate
        name: etcd-data
      - hostPath:
          path: /etc/kubernetes/pki/etcd-certs
          type: DirectoryOrCreate
        name: etcd-certs
    status: {}

    Exec into the etcd container again and announce the third member:

    alias etcdv3="ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem --cert=/etc/kubernetes/pki/etcd-certs/client.pem --key=/etc/kubernetes/pki/etcd-certs/client-key.pem"
    etcdv3 member add etcd03 --peer-urls="https://10.1.1.203:2380"

    Following the same steps, add etcd03 on the 10.1.1.203 node.

    5. etcd cluster health check

    # etcdctl --endpoints=https://[10.1.1.201]:2379 --ca-file=/etc/kubernetes/pki/etcd-certs/ca.pem --cert-file=/etc/kubernetes/pki/etcd-certs/client.pem --key-file=/etc/kubernetes/pki/etcd-certs/client-key.pem cluster-health
    
    member 5856099674401300 is healthy: got healthy result from https://10.1.1.201:2379
    member df99f445ac908d15 is healthy: got healthy result from https://10.1.1.202:2379
    cluster is healthy

    Step 4: Point the apiserver at the new etcd cluster

    On every node, edit /etc/kubernetes/manifests/kube-apiserver.yaml so that the apiserver uses the new etcd certificates and talks to the local etcd instance:
    - --etcd-cafile=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --etcd-certfile=/etc/kubernetes/pki/etcd-certs/client.pem
    - --etcd-keyfile=/etc/kubernetes/pki/etcd-certs/client-key.pem
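
    A slightly fuller sketch of the relevant lines in /etc/kubernetes/manifests/kube-apiserver.yaml for the 10.1.1.201 node; the --etcd-servers value is an assumption for this layout (each node pointing at its local instance, as described above):

    - --etcd-cafile=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --etcd-certfile=/etc/kubernetes/pki/etcd-certs/client.pem
    - --etcd-keyfile=/etc/kubernetes/pki/etcd-certs/client-key.pem
    - --etcd-servers=https://10.1.1.201:2379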

    At this point etcd has been expanded into a multi-node distributed cluster, and Kubernetes remains accessible from every node.

    Notes:

    • The procedure above suits a freshly created Kubernetes cluster.
    • If you already have a multi-node kubeadm cluster, you can first build an etcd cluster on node2/node3, sync node1's data over to it, and then add node1 to that cluster; this preserves the original data.

    The worker nodes deployed above can still only connect to a single apiserver; the apiservers on the other secondary nodes work, but the workers cannot reach them.

    The next step is fault tolerance across multiple master nodes, so that when the primary master fails, access fails over to the other nodes.

    Further reading

    Reposted from: https://my.oschina.net/u/2306127/blog/2980950

  • An etcd node in Kubernetes went down and fails to restart after repair

    Symptom:
    An etcd node in the Kubernetes cluster went down; after the machine was repaired, the etcd service fails to restart.

    [root@node2 ~]# systemctl start etcd
    Job for etcd.service failed because the control process exited with error code. See "systemctl status etcd.service" and "journalctl -xe" for details.
    
    [root@node2 ~]#systemctl status etcd
    ● etcd.service - Etcd Server
       Loaded: loaded (/usr/lib/systemd/system/etcd.service; disabled; vendor preset: disabled)
       Active: failed (Result: start-limit) since 四 2021-04-15 11:09:53 CST; 8s ago
      Process: 74317 ExecStart=/opt/etcd/bin/etcd --name=${ETCD_NAME} --data-dir=${ETCD_DATA_DIR} --listen-peer-urls=${ETCD_LISTEN_PEER_URLS} --listen-client-urls=${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 --advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} --initial-advertise-peer-urls=${ETCD_INITIAL_ADVERTISE_PEER_URLS} --initial-cluster=${ETCD_INITIAL_CLUSTER} --initial-cluster-token=${ETCD_INITIAL_CLUSTER_TOKEN} --initial-cluster-state=new --cert-file=/opt/etcd/ssl/server.pem --key-file=/opt/etcd/ssl/server-key.pem --peer-cert-file=/opt/etcd/ssl/server.pem --peer-key-file=/opt/etcd/ssl/server-key.pem --trusted-ca-file=/opt/etcd/ssl/ca.pem --peer-trusted-ca-file=/opt/etcd/ssl/ca.pem (code=exited, status=1/FAILURE)
     Main PID: 74317 (code=exited, status=1/FAILURE)
    
    4月 15 11:09:53 node2 systemd[1]: Failed to start Etcd Server.
    4月 15 11:09:53 node2 systemd[1]: Unit etcd.service entered failed state.
    4月 15 11:09:53 node2 systemd[1]: etcd.service failed.
    4月 15 11:09:53 node2 systemd[1]: etcd.service holdoff time over, scheduling restart.
    4月 15 11:09:53 node2 systemd[1]: Stopped Etcd Server.
    4月 15 11:09:53 node2 systemd[1]: start request repeated too quickly for etcd.service
    4月 15 11:09:53 node2 systemd[1]: Failed to start Etcd Server.
    4月 15 11:09:53 node2 systemd[1]: Unit etcd.service entered failed state.
    4月 15 11:09:53 node2 systemd[1]: etcd.service failed.
    

    The error message says the start request was repeated too quickly.

    Approach:
    1. Firewall problems
    2. The /usr/lib/systemd/system/etcd.service unit
    3. The /opt/etcd/cfg/etcd configuration file

    Resolution:
    The node had been working before it went down, so firewall problems and mistakes in the /opt/etcd/cfg/etcd configuration could basically be ruled out; they were still checked in passing.
    1. Firewall: checked, not the cause.
    2. Check the /usr/lib/systemd/system/etcd.service unit
    Could the startup unit itself be the problem?

    vim /usr/lib/systemd/system/etcd.service
    
    [Unit]
    Description=Etcd Server
    After=network.target
    After=network-online.target
    Wants=network-online.target
    
    [Service]
    Type=notify
    EnvironmentFile=/opt/etcd/cfg/etcd
    ExecStart=/opt/etcd/bin/etcd --name=${ETCD_NAME} \
    --data-dir=${ETCD_DATA_DIR} \
    --listen-peer-urls=${ETCD_LISTEN_PEER_URLS} \
    --listen-client-urls=${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
    --advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
    --initial-advertise-peer-urls=${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
    --initial-cluster=${ETCD_INITIAL_CLUSTER} \
    --initial-cluster-token=${ETCD_INITIAL_CLUSTER_TOKEN} \
    --initial-cluster-state=existing \
    --cert-file=/opt/etcd/ssl/server.pem \
    --key-file=/opt/etcd/ssl/server-key.pem \
    --peer-cert-file=/opt/etcd/ssl/server.pem \
    --peer-key-file=/opt/etcd/ssl/server-key.pem \
    --trusted-ca-file=/opt/etcd/ssl/ca.pem \
    --peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
    Restart=on-failure
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    
    

    After changing --initial-cluster-state to existing, start the service.

    3. Restart the services
    Restart the etcd service on every node that runs etcd:

    systemctl restart etcd.service
    

    Then start the etcd service on the node that had failed:

    systemctl start etcd.service
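    # Once it is up, membership can be verified over the plain-HTTP localhost listener that the
    # unit file above configures (etcdctl is assumed to sit next to the etcd binary):
    /opt/etcd/bin/etcdctl --endpoints=http://127.0.0.1:2379 cluster-health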
    

