Hadoop
Hadoop is a distributed-system infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without knowing the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant, is designed to run on low-cost hardware, and provides high-throughput access to application data, which makes it suitable for applications with very large data sets. HDFS relaxes some POSIX requirements to allow streaming access to the data in the file system. The two core pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over it [1].
At a glance
Core design: HDFS and MapReduce
Name: Hadoop
Category: Computer program
Chinese name: 海杜普
Discipline: Information science
Full name (of HDFS): Hadoop Distributed File System
Origins of Hadoop
Hadoop grew out of the Apache Nutch project, which started in 2002 as a subproject of Apache Lucene [2]. In 2004, after Google published the paper "MapReduce: Simplified Data Processing on Large Clusters" at the OSDI (Operating System Design and Implementation) conference, Doug Cutting and others were inspired to implement a MapReduce computation framework and to combine it with NDFS (Nutch Distributed File System) to support the core algorithms of the Nutch engine [2]. Because NDFS and MapReduce worked so well inside Nutch, they were split out in February 2006 into a complete, independent piece of software named Hadoop. By early 2008 Hadoop had become a top-level Apache project with many subprojects and was in use at many Internet companies, including Yahoo [2].
  • 2022-03-18 15:20:54

    Installation Environment

    Virtualization software: VMware® Workstation 16 Pro

    Guest OS: CentOS 7.9 (Minimal)

    VM IP addresses: 192.168.153.11, 192.168.153.12, 192.168.153.13

    Planning

    A Hadoop cluster is really two clusters, an HDFS cluster and a YARN cluster. They are logically separate but usually share the same hosts.

    Both clusters follow a standard master/worker architecture.

    Roles (daemons) in the HDFS cluster:

    • Master role: NameNode
    • Worker role: DataNode
    • Auxiliary master role: SecondaryNameNode

    Roles (daemons) in the YARN cluster:

    • Master role: ResourceManager
    • Worker role: NodeManager

    Cluster plan

    Server              IP address        Roles (daemons)
    node1.hadoop.com    192.168.153.11    NameNode, DataNode, ResourceManager, NodeManager
    node2.hadoop.com    192.168.153.12    SecondaryNameNode, DataNode, NodeManager
    node3.hadoop.com    192.168.153.13    DataNode, NodeManager

    Environment Configuration

    Perform the following on every VM, as the root user.

    1. Disable the firewall

    systemctl stop firewalld
    systemctl disable firewalld
    

    2. Synchronize the clock

    yum -y install ntpdate
    ntpdate ntp5.aliyun.com
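
    ntpdate performs a one-shot sync. To keep the clocks aligned afterwards, one option (a minimal sketch, not part of the original steps) is a root cron entry that re-syncs every hour against the same Aliyun NTP server:

    # note: this overwrites root's existing crontab, if any
    echo "0 * * * * /usr/sbin/ntpdate ntp5.aliyun.com" | crontab -
    crontab -l    # confirm the entry was installed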
    

    3. Set the hostname

    vi /etc/hostname
    

    Following the plan, set the hostnames of the three VMs to node1.hadoop.com, node2.hadoop.com and node3.hadoop.com respectively.

    4. Configure the hosts file

    vi /etc/hosts
    

    Add the following entries:

    192.168.153.11 node1 node1.hadoop.com
    192.168.153.12 node2 node2.hadoop.com
    192.168.153.13 node3 node3.hadoop.com
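
    A quick sanity check (not in the original write-up): confirm on every node that each name in the plan resolves before moving on. getent reads /etc/hosts the same way the system resolver does.

    for h in node1 node2 node3 node1.hadoop.com node2.hadoop.com node3.hadoop.com; do
        getent hosts "$h" > /dev/null || echo "unresolved: $h"
    done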
    

    5. Install the JDK

    yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
    

    Set JAVA_HOME:

    cat <<EOF | tee /etc/profile.d/hadoop_java.sh
    export JAVA_HOME=\$(dirname \$(dirname \$(readlink \$(readlink \$(which javac)))))
    export PATH=\$PATH:\$JAVA_HOME/bin
    EOF
    source /etc/profile.d/hadoop_java.sh
    

    Verify:

    echo $JAVA_HOME
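
    If echo $JAVA_HOME prints an empty line, the readlink chain above did not resolve the javac alternative as expected. Two quick checks (extra, not from the article) help narrow that down:

    java -version                    # should report an OpenJDK 1.8.0 build
    readlink -f "$(which javac)"     # JAVA_HOME is two directories above this path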
    

    6. Create the hadoop user and set its password

    adduser hadoop
    usermod -aG wheel hadoop
    passwd hadoop
    

    Create the local directory where HDFS will store its data:

    mkdir /home/hadoop/data
    chown hadoop: /home/hadoop/data
    

    7. Configure environment variables

    echo 'export HADOOP_HOME=/home/hadoop/hadoop-3.3.2' >> /etc/profile
    echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
    source /etc/profile
    

    8. Configure SSH

    yum install openssh
    

    Switch to the hadoop user and run the following commands.

    ssh-keygen
    ssh-copy-id node1
    ssh-copy-id node2
    ssh-copy-id node3
    

    Run this on every VM. A sample session looks like this:

    [hadoop@node1 ~]$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    Created directory '/home/hadoop/.ssh'.
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:gFs4NEpc6MIVv7/r5f2rUFdOi7ht11GceM3fd/Uq/nU hadoop@node1.hadoop.com
    The key's randomart image is:
    +---[RSA 2048]----+
    | ..+=            |
    | .o+.+        .oo|
    |..o +.o      . =*|
    |...  +..    . * B|
    | .  ..  S  o o +*|
    |      .   . +  .=|
    |       . o ..o..E|
    |        + o......|
    |      .+.. o++o  |
    +----[SHA256]-----+
    [hadoop@node1 ~]$ ssh-copy-id node1
    /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
    The authenticity of host 'node1 (192.168.153.11)' can't be established.
    ECDSA key fingerprint is SHA256:BxdxJ5ONWI6xkPrFWxy9MIFs/B3IpEgjhFxiwI6KOLU.
    ECDSA key fingerprint is MD5:78:ea:2d:36:7e:eb:83:47:8f:61:c6:70:b6:0f:20:d6.
    Are you sure you want to continue connecting (yes/no)? yes
    /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    hadoop@node1's password:
    
    Number of key(s) added: 1
    
    Now try logging into the machine, with:   "ssh 'node1'"
    and check to make sure that only the key(s) you wanted were added.
    
    [hadoop@node1 ~]$ ssh-copy-id node2
    /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
    The authenticity of host 'node2 (192.168.153.12)' can't be established.
    ECDSA key fingerprint is SHA256:BxdxJ5ONWI6xkPrFWxy9MIFs/B3IpEgjhFxiwI6KOLU.
    ECDSA key fingerprint is MD5:78:ea:2d:36:7e:eb:83:47:8f:61:c6:70:b6:0f:20:d6.
    Are you sure you want to continue connecting (yes/no)? yes
    /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    hadoop@node2's password:
    
    Number of key(s) added: 1
    
    Now try logging into the machine, with:   "ssh 'node2'"
    and check to make sure that only the key(s) you wanted were added.
    
    [hadoop@node1 ~]$ ssh-copy-id node3
    /bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
    The authenticity of host 'node3 (192.168.153.13)' can't be established.
    ECDSA key fingerprint is SHA256:BxdxJ5ONWI6xkPrFWxy9MIFs/B3IpEgjhFxiwI6KOLU.
    ECDSA key fingerprint is MD5:78:ea:2d:36:7e:eb:83:47:8f:61:c6:70:b6:0f:20:d6.
    Are you sure you want to continue connecting (yes/no)? yes
    /bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    hadoop@node3's password:
    
    Number of key(s) added: 1
    
    Now try logging into the machine, with:   "ssh 'node3'"
    and check to make sure that only the key(s) you wanted were added.
    
    [hadoop@node1 ~]$
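
    Before relying on the one-click scripts later on, it is worth confirming (an extra check, not part of the original steps) that the hadoop user can reach every node without a password prompt; BatchMode=yes makes ssh fail instead of asking interactively.

    for h in node1 node2 node3; do
        ssh -o BatchMode=yes "$h" hostname || echo "passwordless login to $h is NOT working"
    done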
    

    Download and Install

    Install and configure Hadoop on node1 first, then copy the finished directory to the other two VMs. (Use the hadoop user.)

    1. Download and extract

    Connect to node1 as the hadoop user and download the tarball into /home/hadoop with the following command.

    cd /home/hadoop
    curl -Ok https://dlcdn.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz
    

    Extract it:

    tar zxf hadoop-3.3.2.tar.gz
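
    It is also worth verifying the downloaded tarball. Apache publishes a .sha512 file next to each release artifact; the URL below follows that convention and is assumed rather than taken from the article. Compare the computed digest with the published value.

    curl -O https://dlcdn.apache.org/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz.sha512
    sha512sum hadoop-3.3.2.tar.gz    # must match the digest inside the .sha512 file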
    

    Next, configure Hadoop through its configuration files.

    Hadoop's configuration files fall into three groups:

    • Default configuration files – core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml. These are read-only and hold the default values of all parameters.
    • Site-specific configuration files – etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml. Settings placed here override the defaults.
    • Environment configuration files – etc/hadoop/hadoop-env.sh, etc/hadoop/mapred-env.sh and etc/hadoop/yarn-env.sh, which configure the Java runtime environment of the individual daemons.

    2. Configure hadoop-env.sh

    cd hadoop-3.3.2
    vi etc/hadoop/hadoop-env.sh
    

    Add the following lines:

    export JAVA_HOME=$JAVA_HOME
    export HDFS_NAMENODE_USER=hadoop
    export HDFS_DATANODE_USER=hadoop
    export HDFS_SECONDARYNAMENODE_USER=hadoop
    export YARN_RESOURCEMANAGER_USER=hadoop
    export YARN_NODEMANAGER_USER=hadoop
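
    Note that export JAVA_HOME=$JAVA_HOME only works if JAVA_HOME is already exported in the (possibly non-interactive) shell that launches the daemons; otherwise they fail with a "JAVA_HOME is not set" error. A more robust variant (a sketch, not from the original article) expands the value once and writes the literal path into the file:

    # append the resolved path, not the variable reference, to hadoop-env.sh
    echo "export JAVA_HOME=$JAVA_HOME" >> etc/hadoop/hadoop-env.sh
    grep '^export JAVA_HOME=' etc/hadoop/hadoop-env.sh    # should now show a concrete path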
    

    At minimum, JAVA_HOME must be set. In addition, the following variables let you configure each daemon individually:

    Daemon                           Environment variable
    NameNode                         HDFS_NAMENODE_OPTS
    DataNode                         HDFS_DATANODE_OPTS
    Secondary NameNode               HDFS_SECONDARYNAMENODE_OPTS
    ResourceManager                  YARN_RESOURCEMANAGER_OPTS
    NodeManager                      YARN_NODEMANAGER_OPTS
    WebAppProxy                      YARN_PROXYSERVER_OPTS
    Map Reduce Job History Server    MAPRED_HISTORYSERVER_OPTS

    For example, to run the NameNode with the parallel GC and a 4 GB heap:

    export HDFS_NAMENODE_OPTS="-XX:+UseParallelGC -Xmx4g"
    

    3. Configure core-site.xml

    Settings in this file override core-default.xml.

    vi etc/hadoop/core-site.xml
    

    Add the following properties (inside the file's existing <configuration> element):

    <!-- Default file system. Hadoop supports file, HDFS, GFS, Ali Cloud, Amazon Cloud and other file systems -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:8020</value>
    </property>
    
    <!-- Local path where Hadoop stores its data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/data</value>
    </property>
    
    
    <!-- User identity used by the Hadoop web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
    
    <!-- Proxy-user setting for Hive integration -->
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    
    <!-- How long the trash keeps deleted files (minutes) -->
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
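
    With HADOOP_HOME on the PATH (step 7 of the environment configuration), the effective values can be checked without starting any daemon; this is an optional verification, not part of the original steps.

    hdfs getconf -confKey fs.defaultFS      # should print hdfs://node1:8020
    hdfs getconf -confKey hadoop.tmp.dir    # should print /home/hadoop/data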
    

    4. Configure hdfs-site.xml

    Settings in this file override hdfs-default.xml.

    vi etc/hadoop/hdfs-site.xml
    

    Add the following property:

    <!-- Host and port where the SecondaryNameNode runs -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node2:9868</value>
    </property>
    

    5. Configure mapred-site.xml

    Settings in this file override mapred-default.xml.

    vi etc/hadoop/mapred-site.xml
    

    Add the following properties:

    <!-- Default execution mode for MapReduce jobs: yarn (cluster mode) or local -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    
    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node1:10020</value>
    </property>
    
    <!-- MapReduce JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node1:19888</value>
    </property>
    
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    

    6. Configure yarn-site.xml

    Settings in this file override yarn-default.xml.

    vi etc/hadoop/yarn-site.xml
    

    Add the following properties:

    <!-- Host where the YARN master role (ResourceManager) runs -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
    
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    
    <!-- Whether to enforce physical memory limits on containers -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    
    <!-- Whether to enforce virtual memory limits on containers -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    
    <!-- URL of the YARN log/history server -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://node1:19888/jobhistory/logs</value>
    </property>
    

    7. Configure the workers file

    vi etc/hadoop/workers
    

    Delete the existing content and add the following:

    node1.hadoop.com
    node2.hadoop.com
    node3.hadoop.com
    

    8. Copy the configured installation to node2 and node3.

    scp -r /home/hadoop/hadoop-3.3.2 hadoop@node2:/home/hadoop/
    scp -r /home/hadoop/hadoop-3.3.2 hadoop@node3:/home/hadoop/
    

    Starting the Cluster

    Hadoop offers two ways to start the daemons:

    • Start each daemon individually with a command – you run the command by hand on every machine, which gives precise control over each process (see the per-node sketch right after the command summary below).
    • Start everything with the bundled scripts – this requires passwordless SSH between the machines and a populated etc/hadoop/workers file.

    Commands to start daemons individually

    # HDFS cluster
    $HADOOP_HOME/bin/hdfs --daemon start namenode | datanode | secondarynamenode
    
    # YARN cluster
    $HADOOP_HOME/bin/yarn --daemon start resourcemanager | nodemanager | proxyserver
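
    The namenode | datanode | secondarynamenode notation above means "pick one per invocation". Applied to this cluster plan, a fully manual start would look roughly like this (a sketch run as the hadoop user on each listed node, not an extra script from the article):

    # on node1
    $HADOOP_HOME/bin/hdfs --daemon start namenode
    $HADOOP_HOME/bin/hdfs --daemon start datanode
    $HADOOP_HOME/bin/yarn --daemon start resourcemanager
    $HADOOP_HOME/bin/yarn --daemon start nodemanager

    # on node2
    $HADOOP_HOME/bin/hdfs --daemon start secondarynamenode
    $HADOOP_HOME/bin/hdfs --daemon start datanode
    $HADOOP_HOME/bin/yarn --daemon start nodemanager

    # on node3
    $HADOOP_HOME/bin/hdfs --daemon start datanode
    $HADOOP_HOME/bin/yarn --daemon start nodemanager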
    

    Start-up scripts

    • HDFS cluster – $HADOOP_HOME/sbin/start-dfs.sh starts all HDFS daemons with one command.
    • YARN cluster – $HADOOP_HOME/sbin/start-yarn.sh starts all YARN daemons with one command.
    • Whole cluster – $HADOOP_HOME/sbin/start-all.sh starts both the HDFS and the YARN daemons.

    1. Format the file system

    Before starting the cluster for the first time, format HDFS (run this on node1 only).

    [hadoop@node1 ~]$ hdfs namenode -format
    WARNING: /home/hadoop/hadoop-3.3.2/logs does not exist. Creating.
    2022-03-17 23:22:55,296 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = node1/192.168.153.11
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 3.3.2
    STARTUP_MSG:   classpath = /home/hadoop/hadoop-3.3.2/etc/hadoop:/home/hadoop/hadoop-3.3.2/share/hadoop/common/lib/... (full classpath omitted for brevity)
    STARTUP_MSG:   build = git@github.com:apache/hadoop.git -r 0bcb014209e219273cb6fd4152df7df713cbac61; compiled by 'chao' on 2022-02-21T18:39Z
    STARTUP_MSG:   java = 1.8.0_322
    ************************************************************/
    2022-03-17 23:22:55,312 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
    2022-03-17 23:22:55,408 INFO namenode.NameNode: createNameNode [-format]
    2022-03-17 23:22:55,800 INFO namenode.NameNode: Formatting using clusterid: CID-4271710c-605c-44fe-be87-6cbbcbb60338
    2022-03-17 23:22:55,834 INFO namenode.FSEditLog: Edit logging is async:true
    2022-03-17 23:22:55,870 INFO namenode.FSNamesystem: KeyProvider: null
    2022-03-17 23:22:55,872 INFO namenode.FSNamesystem: fsLock is fair: true
    2022-03-17 23:22:55,873 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
    2022-03-17 23:22:55,886 INFO namenode.FSNamesystem: fsOwner                = hadoop (auth:SIMPLE)
    2022-03-17 23:22:55,886 INFO namenode.FSNamesystem: supergroup             = supergroup
    2022-03-17 23:22:55,886 INFO namenode.FSNamesystem: isPermissionEnabled    = true
    2022-03-17 23:22:55,886 INFO namenode.FSNamesystem: isStoragePolicyEnabled = true
    2022-03-17 23:22:55,886 INFO namenode.FSNamesystem: HA Enabled: false
    2022-03-17 23:22:55,930 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
    2022-03-17 23:22:55,940 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
    2022-03-17 23:22:55,941 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
    2022-03-17 23:22:55,944 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
    2022-03-17 23:22:55,944 INFO blockmanagement.BlockManager: The block deletion will start around 2022 Mar 17 23:22:55
    2022-03-17 23:22:55,947 INFO util.GSet: Computing capacity for map BlocksMap
    2022-03-17 23:22:55,947 INFO util.GSet: VM type       = 64-bit
    2022-03-17 23:22:55,950 INFO util.GSet: 2.0% max memory 839.5 MB = 16.8 MB
    2022-03-17 23:22:55,950 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    2022-03-17 23:22:55,959 INFO blockmanagement.BlockManager: Storage policy satisfier is disabled
    2022-03-17 23:22:55,959 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
    2022-03-17 23:22:55,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.999
    2022-03-17 23:22:55,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
    2022-03-17 23:22:55,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
    2022-03-17 23:22:55,969 INFO blockmanagement.BlockManager: defaultReplication         = 3
    2022-03-17 23:22:55,969 INFO blockmanagement.BlockManager: maxReplication             = 512
    2022-03-17 23:22:55,969 INFO blockmanagement.BlockManager: minReplication             = 1
    2022-03-17 23:22:55,969 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
    2022-03-17 23:22:55,969 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
    2022-03-17 23:22:55,969 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
    2022-03-17 23:22:55,969 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
    2022-03-17 23:22:55,996 INFO namenode.FSDirectory: GLOBAL serial map: bits=29 maxEntries=536870911
    2022-03-17 23:22:55,996 INFO namenode.FSDirectory: USER serial map: bits=24 maxEntries=16777215
    2022-03-17 23:22:55,996 INFO namenode.FSDirectory: GROUP serial map: bits=24 maxEntries=16777215
    2022-03-17 23:22:55,996 INFO namenode.FSDirectory: XATTR serial map: bits=24 maxEntries=16777215
    2022-03-17 23:22:56,023 INFO util.GSet: Computing capacity for map INodeMap
    2022-03-17 23:22:56,023 INFO util.GSet: VM type       = 64-bit
    2022-03-17 23:22:56,023 INFO util.GSet: 1.0% max memory 839.5 MB = 8.4 MB
    2022-03-17 23:22:56,023 INFO util.GSet: capacity      = 2^20 = 1048576 entries
    2022-03-17 23:22:56,024 INFO namenode.FSDirectory: ACLs enabled? true
    2022-03-17 23:22:56,024 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
    2022-03-17 23:22:56,024 INFO namenode.FSDirectory: XAttrs enabled? true
    2022-03-17 23:22:56,025 INFO namenode.NameNode: Caching file names occurring more than 10 times
    2022-03-17 23:22:56,030 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
    2022-03-17 23:22:56,033 INFO snapshot.SnapshotManager: SkipList is disabled
    2022-03-17 23:22:56,037 INFO util.GSet: Computing capacity for map cachedBlocks
    2022-03-17 23:22:56,037 INFO util.GSet: VM type       = 64-bit
    2022-03-17 23:22:56,037 INFO util.GSet: 0.25% max memory 839.5 MB = 2.1 MB
    2022-03-17 23:22:56,037 INFO util.GSet: capacity      = 2^18 = 262144 entries
    2022-03-17 23:22:56,047 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
    2022-03-17 23:22:56,047 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
    2022-03-17 23:22:56,047 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
    2022-03-17 23:22:56,051 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    2022-03-17 23:22:56,051 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    2022-03-17 23:22:56,053 INFO util.GSet: Computing capacity for map NameNodeRetryCache
    2022-03-17 23:22:56,053 INFO util.GSet: VM type       = 64-bit
    2022-03-17 23:22:56,053 INFO util.GSet: 0.029999999329447746% max memory 839.5 MB = 257.9 KB
    2022-03-17 23:22:56,053 INFO util.GSet: capacity      = 2^15 = 32768 entries
    2022-03-17 23:22:56,080 INFO namenode.FSImage: Allocated new BlockPoolId: BP-571583129-192.168.153.11-1647530576071
    2022-03-17 23:22:56,101 INFO common.Storage: Storage directory /home/hadoop/data/dfs/name has been successfully formatted.
    2022-03-17 23:22:56,128 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/data/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    2022-03-17 23:22:56,226 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/data/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
    2022-03-17 23:22:56,241 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    2022-03-17 23:22:56,259 INFO namenode.FSNamesystem: Stopping services started for active state
    2022-03-17 23:22:56,260 INFO namenode.FSNamesystem: Stopping services started for standby state
    2022-03-17 23:22:56,264 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
    2022-03-17 23:22:56,264 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.153.11
    ************************************************************/
    [hadoop@node1 ~]$
    

    2. Start the HDFS cluster

    start-dfs.sh
    

    This script starts the NameNode and DataNode daemons:

    [hadoop@node1 hadoop-3.3.2]$ start-dfs.sh
    Starting namenodes on [node1]
    Starting datanodes
    node1.hadoop.com: Warning: Permanently added 'node1.hadoop.com' (ECDSA) to the list of known hosts.
    node3.hadoop.com: ssh: Could not resolve hostname node3.hadoop.com: Name or service not known
    node2.hadoop.com: ssh: Could not resolve hostname node2.hadoop.com: Name or service not known
    Starting secondary namenodes [node2]
    node2: WARNING: /home/hadoop/hadoop-3.3.2/logs does not exist. Creating.
    [hadoop@node1 hadoop-3.3.2]$
    [hadoop@node1 hadoop-3.3.2]$ jps
    5001 DataNode
    5274 Jps
    4863 NameNode
    [hadoop@node1 hadoop-3.3.2]$
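
    The "Could not resolve hostname" lines above mean the worker daemons on node2 and node3 were never contacted: the names node2.hadoop.com and node3.hadoop.com from the workers file only resolve if the /etc/hosts entries from the environment-configuration step map them correctly on every node. After fixing that, a quick way to confirm the daemons on the other nodes (an extra check, not in the article) is:

    for h in node2 node3; do
        echo "== $h =="
        ssh "$h" jps    # assumes jps from the OpenJDK devel package is on the PATH of non-interactive shells
    done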
    

    Once start-up succeeds, the NameNode web UI can be opened in a browser (default port: 9870):

    (screenshot: NameNode web UI)

    3. Start the YARN cluster

    start-yarn.sh
    

    This script starts the ResourceManager and NodeManager daemons:

    [hadoop@node1 hadoop-3.3.2]$ start-yarn.sh
    Starting resourcemanager
    Starting nodemanagers
    node3.hadoop.com: ssh: Could not resolve hostname node3.hadoop.com: Name or service not known
    node2.hadoop.com: ssh: Could not resolve hostname node2.hadoop.com: Name or service not known
    [hadoop@node1 hadoop-3.3.2]$
    [hadoop@node1 hadoop-3.3.2]$ jps
    5536 NodeManager
    5395 ResourceManager
    5001 DataNode
    5867 Jps
    4863 NameNode
    [hadoop@node1 hadoop-3.3.2]$
    

    Once start-up succeeds, the ResourceManager web UI can be opened in a browser (default port: 8088):

    (screenshot: ResourceManager web UI; original image: https://b3logfile.com/file/2022/03/image-20220317174702-hyary7k-fa3546d4.png)

    Besides the start-dfs.sh and start-yarn.sh scripts, you can also use start-all.sh to start every Hadoop daemon in one go.

    Stopping the Cluster

    As with start-up, Hadoop offers two ways to stop the cluster.

    Commands to stop daemons individually

    # HDFS cluster
    $HADOOP_HOME/bin/hdfs --daemon stop namenode | datanode | secondarynamenode
    
    # YARN cluster
    $HADOOP_HOME/bin/yarn --daemon stop resourcemanager | nodemanager | proxyserver
    

    Shutdown scripts

    • HDFS cluster – $HADOOP_HOME/sbin/stop-dfs.sh stops all HDFS daemons with one command.
    • YARN cluster – $HADOOP_HOME/sbin/stop-yarn.sh stops all YARN daemons with one command.
    • Whole cluster – $HADOOP_HOME/sbin/stop-all.sh stops both the HDFS and the YARN daemons.

    Using stop-all.sh to stop every Hadoop daemon at once:

    [hadoop@node1 hadoop-3.3.2]$ stop-all.sh
    WARNING: Stopping all Apache Hadoop daemons as hadoop in 10 seconds.
    WARNING: Use CTRL-C to abort.
    Stopping namenodes on [node1]
    Stopping datanodes
    node2.hadoop.com: ssh: Could not resolve hostname node2.hadoop.com: Name or service not known
    node3.hadoop.com: ssh: Could not resolve hostname node3.hadoop.com: Name or service not known
    Stopping secondary namenodes [node2]
    Stopping nodemanagers
    node3.hadoop.com: ssh: Could not resolve hostname node3.hadoop.com: Name or service not known
    node2.hadoop.com: ssh: Could not resolve hostname node2.hadoop.com: Name or service not known
    Stopping resourcemanager
    [hadoop@node1 hadoop-3.3.2]$
    

    Further Reading

    Hadoop: Setting up a Single Node Cluster

    Hadoop Cluster Setup

    How To Install Apache Hadoop / HBase on CentOS 7

    2022 黑马程序员 (Itheima) Big Data Hadoop introductory video course – bilibili

  • Hadoop Overview


    Contents

    • Big Data and Hadoop
    • Hadoop modules: Hadoop Common, Hadoop Distributed File System (HDFS), Hadoop YARN, Hadoop MapReduce
    • Hadoop ecosystem components: Spark, Flink, Zookeeper, Sqoop, Hive/Impala, HBase, Kafka, Tez
    • Characteristics of Hadoop


    Big Data and Hadoop

    When people mention Hadoop they immediately think of big data; the two are now inseparable. So what exactly is big data? It is commonly summarized by five properties whose English names all start with V, hence the "5 Vs":

    1. Volume: the amounts of data collected, computed and stored are enormous.

    2. Variety: data comes in many types and from many sources. The types include structured, semi-structured and unstructured data; common sources are web logs, audio, video, images and so on.

    3. Value: the value density of the data is relatively low; like panning for gold, only a small fraction of a huge amount of information is actually valuable.

    4. Velocity: data grows quickly, must be processed quickly, and must be ingested quickly as well.

    5. Veracity: the accuracy and trustworthiness of the data, that is, data quality.

    Hadoop is a framework that allows distributed processing of large data sets across clusters of computers using simple programming models. It scales from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware for high availability, Hadoop is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may fail.

    Official site: Apache Hadoop

    Hadoop modules:

    Hadoop Common

    Common utilities that support the other Hadoop modules.

    Hadoop Distributed File System (HDFS)

    A distributed file system that provides high-throughput access to application data. It derives from Google's GFS paper, published in October 2003; HDFS is essentially a GFS clone. HDFS is the storage foundation of the Hadoop stack: a highly fault-tolerant system that detects and responds to hardware failures and is designed to run on inexpensive commodity hardware. It simplifies the file consistency model and, through streaming data access, provides high-throughput access for applications with large data sets. It offers a write-once, read-many model, and data is stored as blocks distributed across different physical machines in the cluster.

    Hadoop YARN

    A framework for job scheduling and cluster resource management. YARN is the next-generation MapReduce (MRv2), evolved from the first generation mainly to address the original Hadoop's poor scalability and its lack of support for multiple computation frameworks. YARN is a general-purpose runtime: users can write their own computation frameworks and run them on it, shipping the framework as a client-side library packaged with the job at submission time.

    Hadoop MapReduce

    A YARN-based system for parallel processing of large data sets. It derives from Google's MapReduce paper, published in December 2004; Hadoop MapReduce is essentially a clone of Google MapReduce. MapReduce is a distributed computation model for processing large volumes of data. It hides the details of the distributed framework and abstracts computation into two phases, map and reduce: Map applies a given operation to independent elements of the data set and emits intermediate key-value pairs, while Reduce aggregates all values that share the same key to produce the final result. MapReduce is well suited to data processing in distributed, parallel environments made up of large numbers of machines (a small shell analogy follows).
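
    The map/reduce split can be illustrated with ordinary shell tools (a rough analogy, not Hadoop code; word.txt stands for any local text file):

    # word count in map/reduce style:
    #   tr      -> "map": emit one word (key) per line
    #   sort    -> group identical keys together (the shuffle)
    #   uniq -c -> "reduce": count the occurrences of each key
    cat word.txt | tr -s ' ' '\n' | sort | uniq -c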

    Hadoop ecosystem components:

    Hadoop itself is a computation framework: with a simple programming model it processes large data sets across a cluster of many machines. It is scalable, growing easily from a single server to thousands, each providing local computation and storage. Instead of depending on hardware for high availability, the software libraries themselves protect the data and handle failures at the application layer, providing a highly available service on top of the cluster. The core ecosystem components are shown in the figure below.

    As an aside: yes, it does look like a zoo. This is what happens when programmers pick the names, so naming is probably best left to specialists; the names say nothing about what the components do!

    The following architecture diagram shows a typical enterprise big-data deployment:

    Spark (distributed computing framework)

    Spark is an in-memory distributed parallel computing framework. Unlike MapReduce, intermediate job results can be kept in memory instead of being written to and re-read from HDFS, so Spark is better suited to algorithms that iterate over the data, such as data mining and machine learning.

    Cluster Manager: in standalone mode this is the Master node, which controls the whole cluster and monitors the workers; in YARN mode it is the resource manager.

    Spark abstracts data as RDDs (resilient distributed datasets) and ships a rich set of libraries, including Spark Core, Spark SQL, Spark Streaming, MLlib and GraphX, which developers can combine seamlessly in a single application.

    Spark Core: the basic functionality of Spark, in particular the RDD API, its transformations and the actions on them. All other Spark libraries are built on top of RDDs and Spark Core.

    Spark SQL: an API for interacting with Spark through HiveQL, Apache Hive's SQL dialect. Each database table is treated as an RDD, and Spark SQL queries are translated into Spark operations.

    Spark Streaming: processing and control of real-time data streams. It lets programs handle real-time data much like ordinary RDDs, implementing pseudo-streaming through short micro-batches.

    GraphX: a collection of algorithms and tools for manipulating graphs and performing parallel graph operations and computations. GraphX extends the RDD API with operations for manipulating graphs, creating subgraphs and visiting all vertices along a path.

    Flink (distributed computing framework)

    Flink is an in-memory distributed parallel processing framework similar to Spark, but with notable differences in design philosophy. For Flink the primary workload is streaming data; batch data is just a bounded special case of a stream.

    Flink vs. Spark

    In Spark, an RDD is represented at runtime as Java objects, whereas Flink works mainly with a logical plan, so Flink's DataFrame-like APIs are optimized as a first-class concern, an optimization that Spark RDDs do not get.

    Spark uses RDDs for batch processing and DStreams for streaming (internally still RDDs). Flink uses DataSets for batch and DataStreams for streaming, two independent abstractions on top of one shared engine. Spark's streaming is micro-batch ("pseudo" streaming), while Flink offers true stream processing.

    Zookeeper (distributed coordination service)

    Zookeeper solves data-management problems in distributed environments: unified naming, state synchronization, cluster management, configuration synchronization and so on.

    Many Hadoop components depend on Zookeeper. It runs on the cluster and is used to coordinate Hadoop operations.

    Sqoop (data transfer tool)

    Sqoop is short for SQL-to-Hadoop and is used mainly to move data between traditional databases and Hadoop. Imports and exports are essentially MapReduce jobs, taking full advantage of MapReduce's parallelism and fault tolerance.

    Sqoop uses database metadata to describe the data schema and transfers data between relational databases, data warehouses and Hadoop.

    Hive/Impala (data warehouses on Hadoop)

    Hive defines an SQL-like query language (HQL) and translates SQL into MapReduce jobs that run on Hadoop. It is typically used for offline analysis.

    HQL queries data stored on Hadoop, letting developers who are not familiar with MapReduce write data queries, which are then translated into MapReduce jobs on Hadoop.

    Impala is an MPP (massively parallel processing) SQL query engine for large volumes of data stored in a Hadoop cluster. It is open-source software written in C++ and Java. Unlike Apache Hive, Impala is not based on MapReduce: it implements a daemon-based distributed architecture that handles every aspect of query execution on the same machine, so it executes queries more efficiently than Apache Hive.

    HBase (distributed column-oriented database)

    HBase is a scalable, highly reliable, high-performance, distributed, column-oriented, dynamic-schema database for structured data, built on top of HDFS.

    HBase adopts BigTable's data model: an enhanced sparse sorted map (key/value) in which a key consists of a row key, a column key and a timestamp.

    HBase provides random, real-time read/write access to large-scale data, and the data it stores can be processed with MapReduce, combining data storage and parallel computation.

    Kafka (distributed message queue)

    Kafka is a high-throughput distributed publish/subscribe messaging system that can handle all the activity-stream data of a consumer-scale website. It implements topics, partitions with queue semantics, and the producer/consumer architecture pattern.

    Producers and consumers both connect to the Kafka cluster, and Kafka acts as the messaging middleware between components. Internally Kafka is divided into many topics (a high-level data abstraction), each topic into many partitions, and the records in each partition are stored and numbered in queue order. The number assigned to a record is its offset in the partition; the larger the offset, the newer the record. A common production architecture is Flume + Kafka + Spark Streaming.

    Tez (dataflow programming framework)

    Built on YARN, Tez provides a powerful and flexible engine that executes arbitrary DAG (directed acyclic graph) data-processing tasks, supporting both batch and interactive workloads. Tez has been adopted by Hive, Pig and other Hadoop ecosystem components as a replacement for MapReduce as the underlying execution engine.

    Characteristics of Hadoop

    In general, big-data platforms built on Hadoop share the following characteristics.

    1) Scalability: they can reliably store and process petabytes of data. The Hadoop ecosystem generally uses HDFS as its storage layer, which offers high throughput and good stability.

    2) Low cost: data can be distributed and processed on clusters of cheap, commodity machines, which can total thousands of nodes.

    3) Efficiency: by distributing the data, Hadoop processes it in parallel on the nodes where it resides, which is very fast.

    4) Reliability: Hadoop automatically maintains multiple replicas of the data and automatically redeploys computation tasks after failures.

    The Hadoop ecosystem also has notable drawbacks.

    1) Because Hadoop uses a file-based storage system, read/write latency is poor; to date no component supports both fast updates and efficient queries.

    2) The ecosystem keeps growing more complex, compatibility between components is poor, and installation and maintenance are difficult.

    3) Each component does one fairly narrow job; the strengths are obvious, but so are the weaknesses.

    4) The cloud has hit Hadoop hard: vendor-customized components widen version fragmentation and prevent a unified ecosystem.

    5) The stack is written largely in Java; in the author's view its fault tolerance is weak, availability is not high, and components crash easily.

  • Getting Started with Big Data and Hadoop


    Contents

    1. Setting up the Hadoop runtime environment
       1.1 IP and hostname configuration
       1.2 Configuration needed after a minimal CentOS install
       1.3 Changing the hostname and hosts file
       1.4 Installing the JDK on hadoop102
       1.5 Installing Hadoop on hadoop102
    2.5 Hadoop directory layout
    3.1 Local (standalone) mode: the official WordCount example
    3.2 Fully distributed mode (the main focus)
       3.2.1 Preparing the virtual machines
       3.2.2 Writing the xsync cluster-distribution script
       3.2.3 Passwordless SSH configuration
       3.2.4 Cluster configuration
       3.2.5 Starting the whole cluster
       Configuring the history server
       3.2.7 Configuring log aggregation
       3.2.8 Summary of cluster start/stop methods
       3.2.9 Common Hadoop cluster scripts

    1. Setting up the Hadoop runtime environment

    First install VMware 15 and create one virtual machine; CentOS-7.5-x86-1804 is used here.

    1.1 IP and hostname configuration

    1) In VMware, open "Edit" => "Virtual Network Editor...".

    2) Select "VMnet8" and click "Change Settings".

    3) Select VMnet8 again and change the subnet IP to 192.168.10.0 (any value works as long as it is not 192.168.1.0).

    4) Click "NAT Settings" and change the gateway address so it is on the same subnet as the subnet IP; here the gateway is set to 192.168.10.2. Then click OK => OK.

    5) Configure the host machine's IP: in the network settings find "VMnet8" => Properties => double-click "Internet Protocol Version 4 (TCP/IPv4)" and adjust the default gateway and DNS servers.


    1.2 Configuration needed after a minimal CentOS install:

    1) Requirements for the hadoop100 VM (all Linux examples in this article use CentOS-7.5-x86-1804):

    (1) Installing with yum requires the VM to have working Internet access; test connectivity first:
    [root@hadoop100 ~]# ping www.baidu.com
    PING www.baidu.com (14.215.177.39) 56(84) bytes of data.
    64 bytes from 14.215.177.39 (14.215.177.39): icmp_seq=1 ttl=128 time=8.60 ms
    64 bytes from 14.215.177.39 (14.215.177.39): icmp_seq=2 ttl=128 time=7.72 ms
    (2) Install epel-release
    Note: Extra Packages for Enterprise Linux provides additional packages for Red Hat-family systems (RHEL, CentOS, Scientific Linux). It is essentially a software repository for rpm packages not found in the official repository.
    [root@hadoop100 ~]# yum install -y epel-release


                    Install additional tools:
                            net-tools: a set of utilities that includes the ifconfig command
                            [root@hadoop100 ~]# yum install -y net-tools
                    vim: text editor
                            [root@hadoop100 ~]# yum install -y vim

    2) Disable the firewall and its auto-start on boot
    [root@hadoop100 ~]# systemctl stop firewalld
    [root@hadoop100 ~]# systemctl disable firewalld.service
    3) Create the jason user and set its password
    [root@hadoop100 ~]# useradd jason
    [root@hadoop100 ~]# passwd jason
    4) Give the jason user root privileges, so that sudo can be used later for commands that need root
    [root@hadoop100 ~]# vim /etc/sudoers
    Edit /etc/sudoers and add a line below the %wheel line, as shown:
    ## Allow root to run any commands anywhere
    root    ALL=(ALL)     ALL

    ## Allows people in group wheel to run all commands
    %wheel  ALL=(ALL)       ALL
    jason   ALL=(ALL)     NOPASSWD:ALL
    Note: do not put the jason line directly under the root line. All users belong to the wheel group, so if you grant jason passwordless sudo first, the %wheel line is processed afterwards and overrides it, requiring a password again. The jason line therefore has to go below the %wheel line.
    5) Create directories under /opt and change their owner and group
    (1) Create the module and software directories under /opt
    [root@hadoop100 ~]# mkdir /opt/module
    [root@hadoop100 ~]# mkdir /opt/software
    (2) Change the owner and group of module and software to the jason user
    [root@hadoop100 ~]# chown jason:jason /opt/module 
    [root@hadoop100 ~]# chown jason:jason /opt/software
    (3) Check the owner and group of module and software
    [root@hadoop100 ~]# cd /opt/
    [root@hadoop100 opt]# ll
    total 12
    drwxr-xr-x. 2 jason jason 4096 May 28 17:18 module
    drwxr-xr-x. 2 root  root  4096 Sep  7  2017 rh
    drwxr-xr-x. 2 jason jason 4096 May 28 17:18 software
    6) Remove the JDK that ships with the virtual machine
        Note: skip this step if your VM was a minimal install.
    [root@hadoop100 ~]# rpm -qa | grep -i java | xargs -n1 rpm -e --nodeps 
    rpm -qa: list all installed rpm packages
    grep -i: ignore case
    xargs -n1: pass one argument at a time
    rpm -e --nodeps: force-remove the package
    7) Reboot the virtual machine
    [root@hadoop100 

     vim /etc/sysconfig/network-scripts/ifcfg-ens33 

    Note the parts that were highlighted in bold (original screenshot not included):

    1.3 Changing the hostname and hosts file

    1) Change the hostname
    [root@hadoop100 ~]# vim /etc/hostname
    hadoop100
    2) Configure the hostname mapping for the cloned machines in the hosts file; open /etc/hosts
    [root@hadoop100 ~]# vim /etc/hosts
    Add the following entries:
    192.168.10.100 hadoop100
    192.168.10.101 hadoop101
    192.168.10.102 hadoop102
    192.168.10.103 hadoop103
    192.168.10.104 hadoop104
    192.168.10.105 hadoop105
    192.168.10.106 hadoop106
    192.168.10.107 hadoop107
    192.168.10.108 hadoop108
    3) Reboot the cloned machine hadoop100
    [root@hadoop100 ~]# reboot

    4) Update the Windows host-mapping file (hosts file)

    (1) On Windows 7 the file can be edited directly:
    (a) Go to C:\Windows\System32\drivers\etc
    (b) Open the hosts file, add the following entries and save:
    192.168.10.100 hadoop100
    192.168.10.101 hadoop101
    192.168.10.102 hadoop102
    192.168.10.103 hadoop103
    192.168.10.104 hadoop104
    192.168.10.105 hadoop105
    192.168.10.106 hadoop106
    192.168.10.107 hadoop107
    192.168.10.108 hadoop108
    (2) On Windows 10, copy the file out first, edit and save it, then copy it back:
            (a) Go to C:\Windows\System32\drivers\etc
            (b) Copy the hosts file to the desktop
            (c) Open the desktop copy and add the following entries:
    192.168.10.100 hadoop100
    192.168.10.101 hadoop101
    192.168.10.102 hadoop102
    192.168.10.103 hadoop103
    192.168.10.104 hadoop104
    192.168.10.105 hadoop105
    192.168.10.106 hadoop106
    192.168.10.107 hadoop107
    192.168.10.108 hadoop108

            (d) Copy the desktop hosts file back over C:\Windows\System32\drivers\etc\hosts

    Hostnames can then be used instead of IP addresses.

    Choose "Create a full clone".

    Clone three more virtual machines this way.

    After cloning, use the following commands on each clone to change the hostname to the planned name and IPADDR to the corresponding IP address:

    [root@hadoop100 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
    [root@hadoop100 ~]# vim /etc/hostname

    Once all three clones have been updated, check ifconfig and confirm network connectivity with ping.

    1.4 Installing the JDK on hadoop102

    Upload the JDK package to /opt/software.

    Then extract it into /opt/module: tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/

    Configure the JDK environment variables. Traditionally the path settings go into /etc/profile; here we create our own profile snippet instead:

    [jason@hadoop102 jdk1.8.0_212]$ cd /etc/profile.d

    [jason@hadoop102 profile.d]$ sudo vim my_env.sh        and add the following:

    #JAVA_HOME
    export JAVA_HOME=/opt/module/jdk1.8.0_212
    export PATH=$PATH:$JAVA_HOME/bin

    Reload the profile: [jason@hadoop102 profile.d]$ source /etc/profile

    Verify it works: [jason@hadoop102 profile.d]$ java

    1.5 Installing Hadoop on hadoop102

    Install Hadoop on hadoop102.

    Hadoop download: https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/

    1) Use a file-transfer tool such as XShell to upload hadoop-3.1.3.tar.gz into the software directory under /opt.

    2) Change into the directory containing the package:

    [jason@hadoop102 ~]$ cd /opt/software/

    3) Extract it into /opt/module:

    [jason@hadoop102 software]$ tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/

    4) Check that the extraction succeeded:

    [jason@hadoop102 software]$ ls /opt/module/

    hadoop-3.1.3

    5)将Hadoop添加到环境变量

    (1)获取Hadoop安装路径

    [jason@hadoop102 hadoop-3.1.3]$ pwd

    /opt/module/hadoop-3.1.3

    (2)打开/etc/profile.d/my_env.sh文件

    [jason@hadoop102 hadoop-3.1.3]$ sudo vim /etc/profile.d/my_env.sh

    • 在my_env.sh文件末尾添加如下内容:(shift+g)

    #HADOOP_HOME

    export HADOOP_HOME=/opt/module/hadoop-3.1.3

    export PATH=$PATH:$HADOOP_HOME/bin

    export PATH=$PATH:$HADOOP_HOME/sbin

    • 保存并退出: :wq

    (3)让修改后的文件生效

    [jason@hadoop102 hadoop-3.1.3]$ source /etc/profile

    6)测试是否安装成功

    [jason@hadoop102 hadoop-3.1.3]$ hadoop version

    Hadoop 3.1.3

    7) Reboot (only if the hadoop command is not recognized, reboot the virtual machine)

    [jason@hadoop102 hadoop-3.1.3]$ sudo reboot

    2.5 Hadoop目录结构

    1)查看Hadoop目录结构

    2)重要目录

    (1)bin目录:存放对Hadoop相关服务(hdfs,yarn,mapred)进行操作的脚本

    (2)etc目录:Hadoop的配置文件目录,存放Hadoop的配置文件

    (3)lib目录:存放Hadoop的本地库(对数据进行压缩解压缩功能)

    (4)sbin目录:存放启动或停止Hadoop相关服务的脚本

    (5)share目录:存放Hadoop的依赖jar包、文档、和官方案例
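
    For reference, listing the unpacked directory should show something like the following (a sketch for hadoop-3.1.3; the exact listing can differ slightly between releases):

    [jason@hadoop102 hadoop-3.1.3]$ ls /opt/module/hadoop-3.1.3
    bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share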

    三、 Hadoop运行模式

    1) Hadoop official website: Apache Hadoop

    2) Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and fully distributed mode.

    • Local mode: runs on a single machine; only used to walk through the official examples. Not used in production.
    • Pseudo-distributed mode: also a single machine, but with every component of a Hadoop cluster; one server simulates a distributed environment. Occasionally used for testing by companies short on budget; not used in production.

    • Fully distributed mode: multiple servers form a distributed environment. This is what production uses.

    3.1 本地运行模式(官方WordCount)

    1) Create a wcinput folder under the hadoop-3.1.3 directory

    [jason@hadoop102 hadoop-3.1.3]$ mkdir wcinput

    2) Create a word.txt file under wcinput

    [jason@hadoop102 hadoop-3.1.3]$ cd wcinput

    3) Edit word.txt

    [jason@hadoop102 wcinput]$ vim word.txt

    • 在文件中输入如下内容

    hadoop yarn
    hadoop mapreduce
    jason
    jason

    • 保存退出::wq

    4) Go back to the Hadoop directory /opt/module/hadoop-3.1.3

    5) Run the example program

    [jason@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput

    6) View the result

    [jason@hadoop102 hadoop-3.1.3]$ cat wcoutput/part-r-00000

    You should see output like the following:

    hadoop  2
    jason   2
    mapreduce       1
    yarn    1

    3.2 Fully Distributed Mode (the focus for development)

    Overview of the steps:

    1) Prepare 3 machines (firewall off, static IP, hostname set)

    2) Install the JDK

    3) Configure environment variables

    4) Install Hadoop

    5) Configure environment variables

    6) Configure the cluster

    7) Start nodes individually

    8) Configure SSH

    9) Start the whole cluster and test it

    3.2.1 虚拟机准备

    详见2.1、2.2两节

    3.2.2 编写集群分发脚本xsync

    1) scp (secure copy)

    (1) What scp is

    scp copies data between servers (from server1 to server2).

    (2) Basic syntax

    scp    -r        $pdir/$fname             $user@$host:$pdir/$fname
    command recursive  source path/filename     destination user@host:destination path/filename

    (3) Hands-on examples

    • Prerequisite: the /opt/module and /opt/software directories already exist on hadoop102, hadoop103 and hadoop104, and their owner and group have been changed to jason:jason

    [jason@hadoop102 ~]$ sudo chown jason:jason -R /opt/module

    (a)在hadoop102上,将hadoop102中/opt/module/jdk1.8.0_212目录拷贝到hadoop103上。

    [jason@hadoop102 ~]$ scp -r /opt/module/jdk1.8.0_212  jason@hadoop103:/opt/module

    (b)在hadoop103上,将hadoop102中/opt/module/hadoop-3.1.3目录拷贝到hadoop103上。

    [jason@hadoop103 ~]$ scp -r jason@hadoop102:/opt/module/hadoop-3.1.3 /opt/module/

    (c)在hadoop103上操作,将hadoop102中/opt/module目录下所有目录拷贝到hadoop104上。

    [jason@hadoop103 opt]$ scp -r jason@hadoop102:/opt/module/* jason@hadoop104:/opt/module

    2) rsync, a remote synchronization tool

    rsync is mainly used for backups and mirroring. It is fast, avoids copying identical content, and supports symbolic links.

    Difference between rsync and scp: rsync is faster because it only transfers files that differ, while scp copies everything.

    (1) Basic syntax

    rsync    -av       $pdir/$fname             $user@$host:$pdir/$fname
    command  options   source path/filename     destination user@host:destination path/filename

    Option description:

    Option    Function
    -a        archive mode
    -v        show the copy progress

    The xsync distribution script that this section's title refers to is sketched right below.
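
    The script body itself did not survive in this copy of the tutorial (a full version reappears in the Docker write-up near the end of this page). A minimal sketch, assuming rsync is installed on all three hosts and passwordless SSH is configured, saved as /home/jason/bin/xsync and made executable with chmod +x:

    #!/bin/bash
    # xsync: distribute files or directories to the cluster nodes with rsync
    if [ $# -lt 1 ]; then
        echo "Not Enough Arguments!"
        exit
    fi
    for host in hadoop102 hadoop103 hadoop104
    do
        echo "==================== $host ===================="
        for file in "$@"
        do
            if [ -e "$file" ]; then
                pdir=$(cd -P "$(dirname "$file")"; pwd)    # absolute parent directory
                fname=$(basename "$file")                  # file or directory name
                ssh "$host" "mkdir -p $pdir"
                rsync -av "$pdir/$fname" "$host:$pdir"
            else
                echo "$file does not exist!"
            fi
        done
    done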

    3.2.3 Passwordless SSH login

    1) Using ssh

    (1) Basic syntax

    ssh <IP address or hostname of the other machine>

    (2) If "Host key verification failed" appears when connecting

    [jason@hadoop102 ~]$ ssh hadoop103

    • If you see the following prompt

    Are you sure you want to continue connecting (yes/no)?

    • Type yes and press Enter

    (3) Return to hadoop102

    [jason@hadoop103 ~]$ exit

    2) Passwordless key setup

    (1) How passwordless login works

    (2) Generate the public and private keys

    [jason@hadoop102 .ssh]$ pwd

    /home/jason/.ssh

    [jason@hadoop102 .ssh]$ ssh-keygen -t rsa

    Then press Enter three times; two files are generated: id_rsa (the private key) and id_rsa.pub (the public key)

    (3)将公钥拷贝到要免密登录的目标机器上

    [jason@hadoop102 .ssh]$ ssh-copy-id hadoop102

    [jason@hadoop102 .ssh]$ ssh-copy-id hadoop103

    [jason@hadoop102 .ssh]$ ssh-copy-id hadoop104

    Note:

    You also need to log in to hadoop103 as user jason and configure passwordless login to hadoop102, hadoop103 and hadoop104.

    You also need to log in to hadoop104 as user jason and configure passwordless login to hadoop102, hadoop103 and hadoop104.

    You also need to configure passwordless login from hadoop102 as root to hadoop102, hadoop103 and hadoop104.

    3) Files under the ~/.ssh directory and what they do

    File               Purpose
    known_hosts        public keys of the machines this host has accessed over ssh
    id_rsa             the generated private key
    id_rsa.pub         the generated public key
    authorized_keys    public keys authorized to log in to this server without a password
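
    A quick way to confirm the passwordless setup is to run a command on every host over ssh from any node; if no password prompt appears, the configuration works (a small one-line check):

    for host in hadoop102 hadoop103 hadoop104; do ssh jason@$host hostname; done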


    3.2.4 Cluster configuration

    1) Cluster deployment plan

    Note:

    • Do not put the NameNode and the SecondaryNameNode on the same server.
    • The ResourceManager is also memory-hungry; do not put it on the same machine as the NameNode or the SecondaryNameNode.

              hadoop102              hadoop103                      hadoop104
    HDFS      NameNode, DataNode     DataNode                       SecondaryNameNode, DataNode
    YARN      NodeManager            ResourceManager, NodeManager   NodeManager

    2) About the configuration files

    Hadoop has two kinds of configuration files: default configuration files and site-specific (custom) configuration files. Only when you want to override a default value do you need to edit the custom files and change the corresponding property.

    (1) Default configuration files:

    Default file         Location inside the Hadoop jars
    core-default.xml     hadoop-common-3.1.3.jar/core-default.xml
    hdfs-default.xml     hadoop-hdfs-3.1.3.jar/hdfs-default.xml
    yarn-default.xml     hadoop-yarn-common-3.1.3.jar/yarn-default.xml
    mapred-default.xml   hadoop-mapreduce-client-core-3.1.3.jar/mapred-default.xml

    (2) Custom configuration files:

    The four files core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; modify them there as the project requires.
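
    If you later want to check which value is actually in effect for a property (default or overridden), hdfs getconf can print it from the configuration on the current node; for example, for the NameNode address set in core-site.xml below:

    [jason@hadoop102 hadoop-3.1.3]$ hdfs getconf -confKey fs.defaultFS
    hdfs://hadoop102:8020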

    3) Configure the cluster

    (1)核心配置文件

    配置core-site.xml

    [jason@hadoop102 ~]$ cd $HADOOP_HOME/etc/hadoop

    [jason@hadoop102 hadoop]$ vim core-site.xml

    文件内容如下:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <!-- 指定NameNode的地址 -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop102:8020</value>
        </property>

        <!-- 指定hadoop数据的存储目录 -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/module/hadoop-3.1.3/data</value>
        </property>

        <!-- 配置HDFS网页登录使用的静态用户为jason -->
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>jason</value>
        </property>
    </configuration>

    (2)HDFS配置文件

    配置hdfs-site.xml

    [jason@hadoop102 hadoop]$ vim hdfs-site.xml

    文件内容如下:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <!-- nn web端访问地址-->
        <property>
            <name>dfs.namenode.http-address</name>
            <value>hadoop102:9870</value>
        </property>

        <!-- 2nn web端访问地址-->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop104:9868</value>
        </property>
    </configuration>

    (3)YARN配置文件

    配置yarn-site.xml

    [jason@hadoop102 hadoop]$ vim yarn-site.xml

    文件内容如下:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <!-- 指定MR走shuffle -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>

        <!-- 指定ResourceManager的地址-->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop103</value>
        </property>

        <!-- 环境变量的继承 -->
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
    </configuration>

    (4)MapReduce配置文件

    配置mapred-site.xml

    [jason@hadoop102 hadoop]$ vim mapred-site.xml

    文件内容如下:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <!-- 指定MapReduce程序运行在Yarn上 -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

    4)在集群上分发配置好的Hadoop配置文件

    [jason@hadoop102 hadoop]$ xsync /opt/module/hadoop-3.1.3/etc/hadoop/

    5)去103和104上查看文件分发情况

    [jason@hadoop103 ~]$ cat /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml

    [jason@hadoop104 ~]$ cat /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml

    3.2.5 群起集群

    1)配置workers

    [jason@hadoop102 hadoop]$ vim /opt/module/hadoop-3.1.3/etc/hadoop/workers

    在该文件中增加如下内容:

    hadoop102

    hadoop103

    hadoop104

    注意:该文件中添加的内容结尾不允许有空格,文件中不允许有空行。

    同步所有节点配置文件

    [jason@hadoop102 hadoop]$ xsync /opt/module/hadoop-3.1.3/etc

    2) Start the cluster

    (1) If this is the first time the cluster is started, format the NameNode on hadoop102. (Note: formatting the NameNode creates a new cluster id; if it no longer matches the DataNodes' cluster id, the cluster cannot find its previous data. If the cluster fails while running and you have to reformat the NameNode, be sure to stop the namenode and datanode processes first and delete the data and logs directories on every machine before reformatting.)

    [jason@hadoop102 hadoop-3.1.3]$ hdfs namenode -format

    (2) Start HDFS

    [jason@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh

    You can run jps to check that the expected HDFS daemons are up.

    (3) On the node where the ResourceManager is configured (hadoop103), start YARN

    [jason@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh

    (4)Web端查看HDFS的NameNode

    (a)浏览器中输入:http://hadoop102:9870

    (b)查看HDFS上存储的数据信息

    (5)Web端查看YARN的ResourceManager

    (a)浏览器中输入:http://hadoop103:8088

    (b)查看YARN上运行的Job信息

    3)集群基本测试

    (1)上传文件到集群

    • 上传小文件

    [jason@hadoop102 ~]$ hadoop fs -mkdir /input

    [jason@hadoop102 ~]$ hadoop fs -put $HADOOP_HOME/wcinput/word.txt /input

    • 上传大文件

    [jason@hadoop102 ~]$ hadoop fs -put  /opt/software/jdk-8u212-linux-x64.tar.gz  /

    (2)上传文件后查看文件存放在什么位置

    • 查看HDFS文件存储路径

    [jason@hadoop102 subdir0]$ pwd

    /opt/module/hadoop-3.1.3/data/dfs/data/current/BP-1436128598-192.168.10.102-1610603650062/current/finalized/subdir0/subdir0

    • 查看HDFS在磁盘存储文件内容

    [jason@hadoop102 subdir0]$ cat blk_1073741825

    hadoop yarn

    hadoop mapreduce

    jason

    jason
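
    Rather than digging through the DataNode directories by hand, you can also ask HDFS where a file's blocks live; a quick check on the small file uploaded in step (1):

    [jason@hadoop102 ~]$ hdfs fsck /input/word.txt -files -blocks -locations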

    (3)拼接

    -rw-rw-r--. 1 jason jason 134217728 5月  23 16:01 blk_1073741836

    -rw-rw-r--. 1 jason jason   1048583 5月  23 16:01 blk_1073741836_1012.meta

    -rw-rw-r--. 1 jason jason  63439959 5月  23 16:01 blk_1073741837

    -rw-rw-r--. 1 jason jason    495635 5月  23 16:01 blk_1073741837_1013.meta

    [jason@hadoop102 subdir0]$ cat blk_1073741836>>tmp.tar.gz

    [jason@hadoop102 subdir0]$ cat blk_1073741837>>tmp.tar.gz

    [jason@hadoop102 subdir0]$ tar -zxvf tmp.tar.gz

    (4)下载

    [jason@hadoop104 software]$ hadoop fs -get /jdk-8u212-linux-x64.tar.gz ./

    (5)执行wordcount程序

    [jason@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
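
    This time the result goes to HDFS rather than the local file system; assuming the job finished successfully, it can be read back with hadoop fs:

    [jason@hadoop102 hadoop-3.1.3]$ hadoop fs -cat /output/part-r-00000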

    Beginner's guide: if the cluster is completely broken, recover it in this order: 1 kill the processes, 2 delete every data and logs directory, 3 format, 4 restart.

    1. Stop all processes

    [jason@hadoop102 hadoop-3.1.3]$ sbin/stop-dfs.sh

    2. Delete the data and logs folders on every node of the cluster

    [jason@hadoop102 hadoop-3.1.3]$ rm -rf data/ logs/

    [jason@hadoop103 hadoop-3.1.3]$ rm -rf data/ logs/

    [jason@hadoop104 hadoop-3.1.3]$ rm -rf data/ logs/

    3. Format the NameNode

    [jason@hadoop102 hadoop-3.1.3]$ hdfs namenode -format

    4. Start the cluster

    [jason@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh

    3.2.6 Configure the history server

    To review how jobs ran in the past, configure the JobHistory server as follows:

    1) Configure mapred-site.xml

    [jason@hadoop102 hadoop]$ vim mapred-site.xml

    在该文件里面增加如下配置。

    <!-- 历史服务器端地址 -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop102:10020</value>
    </property>

    <!-- 历史服务器web端地址 -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop102:19888</value>
    </property>

    2)分发配置

    [jason@hadoop102 hadoop]$ xsync $HADOOP_HOME/etc/hadoop/mapred-site.xml

    3)在hadoop102启动历史服务器

    [jason@hadoop102 hadoop]$ mapred --daemon start historyserver

    4) Check that the history server started

    [jason@hadoop102 hadoop]$ jps

    5) View the JobHistory UI

    http://hadoop102:19888/jobhistory

    3.2.7 配置日志的聚集

    日志聚集概念:应用运行完成以后,将程序运行日志信息上传到HDFS系统上。

    日志聚集功能好处:可以方便的查看到程序运行详情,方便开发调试。

    注意:开启日志聚集功能,需要重新启动NodeManager 、ResourceManager和HistoryServer

    开启日志聚集功能具体步骤如下:

    1)配置yarn-site.xml

    [jason@hadoop102 hadoop]$ vim yarn-site.xml

    在该文件里面增加如下配置。

    <!-- 开启日志聚集功能 -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- 设置日志聚集服务器地址 -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop102:19888/jobhistory/logs</value>
    </property>

    <!-- 设置日志保留时间为7天 -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

    2)分发配置

    [jason@hadoop102 hadoop]$ xsync $HADOOP_HOME/etc/hadoop/yarn-site.xml

    3) Stop the NodeManagers, ResourceManager and HistoryServer

    [jason@hadoop103 hadoop-3.1.3]$ sbin/stop-yarn.sh

    [jason@hadoop102 hadoop-3.1.3]$ mapred --daemon stop historyserver

    4) Start the NodeManagers, ResourceManager and HistoryServer

    [jason@hadoop103 ~]$ start-yarn.sh

    [jason@hadoop102 ~]$ mapred --daemon start historyserver

    5)删除HDFS上已经存在的输出文件

    [jason@hadoop102 ~]$ hadoop fs -rm -r /output

    6)执行WordCount程序

    [jason@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output

    7)查看日志

    (1)历史服务器地址

    http://hadoop102:19888/jobhistory

    (2)历史任务列表

    (3)查看任务运行日志

    (4)运行日志详情
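
    Aggregated logs can also be fetched from the command line instead of the web pages; the application id below is only a placeholder, copy the real one from the ResourceManager page or from yarn application -list:

    # application_... is a placeholder; substitute the id of your own job
    [jason@hadoop102 hadoop-3.1.3]$ yarn logs -applicationId application_1610603650062_0001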

     

    3.2.8 Summary of ways to start/stop the cluster

    1) Start/stop module by module (requires passwordless SSH) - the common approach

    (1) Start/stop HDFS as a whole

    start-dfs.sh/stop-dfs.sh

    (2) Start/stop YARN as a whole

    start-yarn.sh/stop-yarn.sh

    2) Start/stop individual service components one by one

    (1) Start/stop an HDFS component

    hdfs --daemon start/stop namenode/datanode/secondarynamenode

    (2) Start/stop a YARN component

    yarn --daemon start/stop  resourcemanager/nodemanager
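
    For example, to bounce only the DataNode on one worker without touching the rest of the cluster (run on the node in question; just an illustration of the per-daemon form above):

    [jason@hadoop103 hadoop-3.1.3]$ hdfs --daemon stop datanode
    [jason@hadoop103 hadoop-3.1.3]$ hdfs --daemon start datanode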

    3.2.9 编写Hadoop集群常用脚本

    1)Hadoop集群启停脚本(包含HDFS,Yarn,Historyserver):myhadoop.sh

    [jason@hadoop102 ~]$ cd /home/jason/bin

    [jason@hadoop102 bin]$ vim myhadoop.sh

    • Enter the following content

    #!/bin/bash

    if [ $# -lt 1 ]
    then
        echo "No Args Input..."
        exit ;
    fi

    case $1 in
    "start")
            echo " =================== 启动 hadoop集群 ==================="

            echo " --------------- 启动 hdfs ---------------"
            ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
            echo " --------------- 启动 yarn ---------------"
            ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
            echo " --------------- 启动 historyserver ---------------"
            ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
    ;;
    "stop")
            echo " =================== 关闭 hadoop集群 ==================="

            echo " --------------- 关闭 historyserver ---------------"
            ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
            echo " --------------- 关闭 yarn ---------------"
            ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
            echo " --------------- 关闭 hdfs ---------------"
            ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
    ;;
    *)
        echo "Input Args Error..."
    ;;
    esac

    • Save and exit, then give the script execute permission

    [jason@hadoop102 bin]$ chmod +x myhadoop.sh

    2)查看三台服务器Java进程脚本:jpsall

    [jason@hadoop102 ~]$ cd /home/jason/bin

    [jason@hadoop102 bin]$ vim jpsall

    • Enter the following content

    #!/bin/bash

    for host in hadoop102 hadoop103 hadoop104
    do
            echo =============== $host ===============
            ssh $host jps
    done

    • Save and exit, then give the script execute permission

    [jason@hadoop102 bin]$ chmod +x jpsall

    3)分发/home/jason/bin目录,保证自定义脚本在三台机器上都可以使用

    [jason@hadoop102 ~]$ xsync /home/jason/bin/
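
    Once distributed, both scripts can be run from any of the three nodes (assuming /home/jason/bin is on the PATH, which CentOS adds by default for the login user); a typical round trip to verify everything works:

    [jason@hadoop102 ~]$ myhadoop.sh start
    [jason@hadoop102 ~]$ jpsall
    [jason@hadoop102 ~]$ myhadoop.sh stop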

    Deploying a Hadoop 3.1.3 cluster with Docker (docker-compose)

    [Note]

    • This environment is for learning only: it uses the weak password 000000 and opens many ports, so it is not secure.
    • A machine with at least 12 GB of RAM is recommended; on a 4-core / 4 GB test machine the cluster would not start, and the workaround was to create 8 GB of virtual memory (swap).

    1 Cluster layout

    The layout follows the same three-node plan as before: hadoop102 runs the NameNode, hadoop103 the ResourceManager, hadoop104 the SecondaryNameNode, and every node runs a DataNode and a NodeManager (the original post shows this as a diagram).

    2 Environment

    • Install docker and docker-compose in advance.
    • Install nginx, so the web UIs can be reached from a local browser; since the cluster has three nodes, a reverse proxy is needed (the nginx.conf is given below).

    3 Open the following ports in the host firewall

    Container port   hadoop102   hadoop103   hadoop104
    22 (ssh)         20022       30022       40022
    8042             28042       38042       48042
    8088             28088       38088       48088
    9864             29864       39864       49864
    9868             29868       39868       49868
    9870             29870       39870       49870
    19888            19888       -           -

    All of the mapped host ports above need to be open, plus port 80, which is used to check that nginx started successfully.
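
    On CentOS 7 with firewalld, the ports above can be opened like this (a sketch; adjust the list if you change the port mappings in docker-compose.yaml):

    for p in 80 19888 20022 28042 28088 29864 29868 29870 \
             30022 38042 38088 39864 39868 39870 \
             40022 48042 48088 49864 49868 49870
    do
        firewall-cmd --permanent --add-port=${p}/tcp   # persist the rule
    done
    firewall-cmd --reload                              # apply the permanent rules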

    4 部署

    4.1 Deployment script directory

    Upload the hadoop_docker directory to the Linux server (CentOS 7 is used here, and some install commands in the scripts are CentOS-specific). It is best to use the same hadoop-3.1.3 version, because one configuration item differs in later versions; using another version may cause problems later on.

    4.2 Files under the hadoop_docker directory

    [1] hadoop_docker/config-default
    The hadoop_docker/config-default directory contains exactly the contents of hadoop-3.1.3/etc/hadoop from the unpacked hadoop-3.1.3.tar.gz.
    [2] hadoop_docker/config-site
    core-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    	<!-- 指定 Namenode 的地址 -->
    	<property>
    	  <name>fs.defaultFS</name>
    	  <value>hdfs://hadoop102:8020</value>
    	  <description>The name of the default file system.  A URI whose
    	  scheme and authority determine the FileSystem implementation.  The
    	  uri's scheme determines the config property (fs.SCHEME.impl) naming
    	  the FileSystem implementation class.  The uri's authority is used to
    	  determine the host, port, etc. for a filesystem.</description>
    	</property>
    	<!-- 指定 hadoop 数据的存储目录 -->
    	<property>
    	  <name>hadoop.tmp.dir</name>
    	  <value>/opt/hadoop-3.1.3/data</value>
    	  <description>A base for other temporary directories.</description>
    	</property>
    	<!-- 配置 HDFS 网页登录使用的静态用户为 root -->
    	<property>
    		<name>hadoop.http.staticuser.user</name>
    		<value>root</value>
    	</property>
    </configuration>
    

    hadoop-env.sh修改如下几个环境变量

    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    

    hdfs-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    	<property>
    	  <name>dfs.namenode.http-address</name>
    	  <value>hadoop102:9870</value>
    	  <description>
    		The address and the base port where the dfs namenode web ui will listen on.
    	  </description>
    	</property>
    	<property>
    	  <name>dfs.namenode.secondary.http-address</name>
    	  <value>hadoop104:9868</value>
    	  <description>
    		The secondary namenode http server address and port.
    	  </description>
    	</property>
    </configuration>
    

    mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      <description>The runtime framework for executing MapReduce jobs.
      Can be one of local, classic or yarn.
      </description>
    </property>
    <!-- 历史服务器端地址 -->
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hadoop102:10020</value>
    </property>
    <!-- 历史服务器web端地址 -->
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop102:19888</value>
    </property>
    </configuration>
    

    workers

    hadoop102
    hadoop103
    hadoop104
    

    yarn-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <description>A comma separated list of services where service name should only
          contain a-zA-Z0-9_ and can not start with numbers</description>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
      </property> 
      <property>
        <description>Environment variables that containers may override rather than use NodeManager's default.</description>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
      </property>
      <!-- 开启日志聚集功能 -->
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <!-- 设置日志聚集服务器地址 -->
      <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop102:19888/jobhistory/logs</value>
      </property>
      <!-- 设置日志保留时间为7天 -->
      <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
      </property>
    </configuration>
    

    [3] docker-compose.yaml

    version: '3'
    services:
      hadoop102:
        image: hadoop:v1
        ports:
          - "19888:19888"
          - "20022:22"
          - "28042:8042"
          - "28088:8088"
          - "29864:9864"
          - "29868:9868"
          - "29870:9870"
        expose:
          - 8020
          - 10020
        privileged: true
        volumes:
          - /opt/hadoop/hadoop102/data:/opt/hadoop-3.1.3/data:rw
          - /opt/hadoop/hadoop102/etc/hadoop:/opt/hadoop-3.1.3/etc/hadoop:rw
        container_name: hadoop102
        hostname: hadoop102
        networks:
          mynet:
            ipv4_address: 172.16.21.102
        command: /usr/sbin/init
        restart: always
    
      hadoop103:
        image: hadoop:v1
        ports:
          - "30022:22"
          - "38042:8042"
          - "38088:8088"
          - "39864:9864"
          - "39868:9868"
          - "39870:9870"
        expose:
          - 8020
          - 10020
        privileged: true
        volumes:
          - /opt/hadoop/hadoop103/data:/opt/hadoop-3.1.3/data:rw
          - /opt/hadoop/hadoop103/etc/hadoop:/opt/hadoop-3.1.3/etc/hadoop:rw
        container_name: hadoop103
        hostname: hadoop103
        networks:
          mynet:
            ipv4_address: 172.16.21.103
        command: /usr/sbin/init
        restart: always
    
      hadoop104:
        image: hadoop:v1
        ports:
          - "40022:22"
          - "48042:8042"
          - "48088:8088"
          - "49864:9864"
          - "49868:9868"
          - "49870:9870"
        expose:
          - 8020
          - 10020
        privileged: true
        volumes:
          - /opt/hadoop/hadoop104/data:/opt/hadoop-3.1.3/data:rw
          - /opt/hadoop/hadoop104/etc/hadoop:/opt/hadoop-3.1.3/etc/hadoop:rw
        container_name: hadoop104
        hostname: hadoop104
        networks:
          mynet:
            ipv4_address: 172.16.21.104
        command: /usr/sbin/init
        restart: always
    
    networks:
      mynet:
        driver: bridge
        ipam:
          driver: default
          config:
            -
              subnet: 172.16.21.0/24
              gateway: 172.16.21.1
    

    [4] Dockerfile

    FROM centos:7
    MAINTAINER ChasingDreams
    WORKDIR /opt
    USER root
    COPY xsync myhadoop jpsall ./
    COPY jdk-8u212-linux-x64.tar.gz hadoop-3.1.3.tar.gz hadoop_image.sh ./
    RUN chmod +x hadoop_image.sh && ./hadoop_image.sh && rm hadoop_image.sh -rf
    CMD /bin/bash
    

    [5] hadoop_image.sh

    #! /bin/bash
    
    # 1 解压jdk
    tar -zxf jdk-8u212-linux-x64.tar.gz
    
    # 2 解压hadoop
    tar -zxf hadoop-3.1.3.tar.gz
    
    # 3 配置jdk|hadoop环境变量
    cat >> /etc/profile.d/my_env.sh << EOF
    # JAVA_HOME
    export JAVA_HOME=/opt/jdk1.8.0_212
    export PATH=\$PATH:\$JAVA_HOME/bin
    
    # HADOOP_HOME
    export HADOOP_HOME=/opt/hadoop-3.1.3
    export PATH=\$PATH:\$HADOOP_HOME/bin
    export PATH=\$PATH:\$HADOOP_HOME/sbin
    
    EOF
    
    # 4 删除jdk|hadoop压缩包
    rm jdk-8u212-linux-x64.tar.gz -rf
    rm hadoop-3.1.3.tar.gz -rf
    rm config-site.tar.gz -rf
    rm config-site -rf
    
    # 5 安装rsync
    yum -y install rsync
    systemctl enable rsyncd.service
    
    # 6 修改 xsync|myhadoop|jpsall 权限并放到bin目录下
    chmod +x /opt/xsync /opt/myhadoop /opt/jpsall
    mkdir /root/bin
    mv /opt/xsync /root/bin/
    mv /opt/myhadoop /root/bin/
    mv /opt/jpsall /root/bin/
    
    # 7 安装openssh-server
    yum install -y openssl openssh-server openssh-clients
    systemctl enable sshd.service
    sed -i '/^#PermitRootLogin yes$/cPermitRootLogin yes' /etc/ssh/sshd_config
    sed -i '/^UsePAM yes$/cUsePAM no' /etc/ssh/sshd_config
    sed -i '/^#PubkeyAuthentication yes$/cPubkeyAuthentication yes' /etc/ssh/sshd_config
    
    # 8 修改密码
    echo 000000 | passwd --stdin root
    
    # 9 修改系统语言环境
    echo "export LC_ALL=en_US.utf8" >> /etc/profile
    echo "export LANG=en_US.utf8" >> /etc/profile
    
    # 10 修改系统时区
    ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
    

    [6] image_container.sh

    #! /bin/bash
    # 0 删除上次启动的集群相关的数据及配置
    if [ -d /opt/hadoop ]; then
    	rm /opt/hadoop/* -rf
    fi
    
    # 1 build hadoop image
    docker build -t hadoop:v1 .
    echo "========= Building image successfully!!! ========="
    
    # 2 宿主机上的映射目录,配置文件目录
    sed -i 's/\r$//g' config-site/workers  # 修改为unix格式文件,避免是dos格式文件,导致DataNode启动不起来
    hadoop_cluster_dir=/opt/hadoop
    for hadoop in hadoop102 hadoop103 hadoop104
    do
    	  mkdir -p ${hadoop_cluster_dir}/${hadoop}/etc/hadoop
    	  \cp -rf config-default/* ${hadoop_cluster_dir}/${hadoop}/etc/hadoop
    	  \cp -f config-site/* ${hadoop_cluster_dir}/${hadoop}/etc/hadoop
    done
    echo "========= Configuration copy complete!!! ========="
    
    # 3 deploy hadoop cluster
    docker-compose -f ./docker-compose.yaml up -d
    echo "========= Starting cluster successfully!!! ========="
    
    # 4 hadoop102 | hadoop103 | hadoop104 之间免密登录
    expect_pkg_name=$(rpm -qa | grep expect)
    if [ ! ${expect_pkg_name} ]; then
      	yum install -y expect
    fi
    
    hadoop102=172.16.21.102
    hadoop103=172.16.21.103
    hadoop104=172.16.21.104
    for hadoop in ${hadoop102} ${hadoop103} ${hadoop104}
    do
    	sed -i "/${hadoop}/d" /root/.ssh/known_hosts
    done
    
    password=000000
    for hadoop in ${hadoop102} ${hadoop103} ${hadoop104}
    do
    expect <<-EOF
    send_user "=============== ${hadoop} generate pri-pub key: start ===============\n"
    spawn ssh root@${hadoop} ssh-keygen -t rsa
    expect {
    	"(yes/no)?" {send "yes\n";exp_continue}
    	"password:" {send "${password}\n"}
    	}
    expect "(/root/.ssh/id_rsa):"
    send "\n"
    expect "passphrase):"
    send "\n"
    expect "again:"
    send "\n"
    expect eof
    send_user "=============== ${hadoop} generate pri-pub key: end ===============\n"
    EOF
    done
    
    for hadoop in hadoop102 hadoop103 hadoop104
    do
    	echo "=============== Copying ${hadoop} pri-pub key: start ==============="
    	docker cp ${hadoop}:/root/.ssh/id_rsa.pub ./
    	cat id_rsa.pub >> authorized_keys
    	rm id_rsa.pub -f
    	echo "=============== Copying ${hadoop} pri-pub key: end ==============="
    done
    
    for hadoop in hadoop102 hadoop103 hadoop104
    do
    	  echo "=============== Copying authorized_keys to ${hadoop}: start ==============="
    	  docker cp authorized_keys ${hadoop}:/root/.ssh/
    	  echo "=============== Copying authorized_keys to ${hadoop}: end ==============="
    done
    rm authorized_keys -f
    
    echo "=============== Interconnection between containers: start ==============="
    for hadoop1 in hadoop102 hadoop103 hadoop104
    do
    	  for hadoop2 in hadoop102 hadoop103 hadoop104
    	  do
    		    if [ ${hadoop1} != ${hadoop2} ]; then
    				expect <<-EOF
    				spawn docker exec -it ${hadoop1} ssh root@${hadoop2}
    				expect "(yes/no)?"
    				send "yes\n"
    				set timeout 1
    				expect eof
    				EOF
    		    fi
    	  done
    done
    echo "=============== Interconnection between containers: end ==============="
    

    [7] jpsall

    #! /bin/bash
    
    for host in hadoop102 hadoop103 hadoop104
    do
    	echo "======================== ${host} ========================"
    	ssh root@${host} jps
    done
    

    [8] myhadoop

    #! /bin/bash
    
    if [ $# -lt 1 ]; then
    	echo "No Args Input..."
    	exit;
    fi
    
    case ${1} in
    "start")
    	echo "================ 启动 hadoop 集群 ================"
    
    	echo "---------------- 启动 hdfs ----------------"
    	ssh root@hadoop102 "/opt/hadoop-3.1.3/sbin/start-dfs.sh"
    
    	echo "---------------- 启动 yarn ----------------"
    	ssh root@hadoop103 "/opt/hadoop-3.1.3/sbin/start-yarn.sh"
    
    	echo "---------------- 启动 historyserver ----------------"
    	ssh root@hadoop102 "/opt/hadoop-3.1.3/bin/mapred --daemon start historyserver"
    ;;
    "stop")
    	echo "================ 关闭 hadoop 集群 ================"
    
    	echo "---------------- 关闭 historyserver ----------------"
    	ssh root@hadoop102 "/opt/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
    
    	echo "---------------- 关闭 yarn ----------------"
    	ssh root@hadoop103 "/opt/hadoop-3.1.3/sbin/stop-yarn.sh"
    
    	echo "---------------- 关闭 hdfs ----------------"
    	ssh root@hadoop102 "/opt/hadoop-3.1.3/sbin/stop-dfs.sh"
    ;;
    *)
    	echo "Input Args Error..."
    ;;
    esac
    
    
    

    [9] xsync

    #! /bin/bash
    
    # 1 判断参数个数
    if [ $# -lt 1 ];then
      echo Not Enough Argument!
      exit
    fi
    
    # 2 遍历集群所有机器
    for host in hadoop102 hadoop103 hadoop104
    do
    	# 3 遍历所有目录,挨个发送
    	cur_hostname=$(cat /etc/hostname)
    	if [ ${cur_hostname} != ${host} ]; then
    		echo ================= $host =================
    		for file in $@
    		do
    			# 4 判断文件是否存在
    			if [ -e $file ];then
    				# 5 获取父目录
    				pdir=$(cd -P $(dirname $file); pwd)
    
    				# 6 获取当前文件的名称
    				fname=$(basename $file)
    				ssh $host "mkdir -p $pdir"
    				rsync -av $pdir/$fname $host:$pdir
    			else
    				echo $file does not exist!
    			fi
    		done
    	fi
    done
    

    [10] Download the matching versions of the JDK and Hadoop (jdk-8u212 and hadoop-3.1.3).

    5 部署

    cd hadoop_docker
    ./image_container.sh
    

    6 启动集群

    [1] hadoop102

    [root@hadoop102 ~]# hdfs namenode -format
    [root@hadoop102 ~]# cd /opt/hadoop-3.1.3/
    [root@hadoop102 hadoop-3.1.3]# sbin/start-dfs.sh
    

    [2] hadoop103

    [root@hadoop103 ~]# cd /opt/hadoop-3.1.3/
    [root@hadoop103 hadoop-3.1.3]# sbin/start-yarn.sh
    

    7 测试集群是否启动成功

    7.1 使用jps命令查看

    Check the processes on every node with jps; if the daemons match the planned layout (NameNode, DataNode and NodeManager on hadoop102; ResourceManager, DataNode and NodeManager on hadoop103; SecondaryNameNode, DataNode and NodeManager on hadoop104), the cluster came up successfully.
    [1]hadoop102

    [root@hadoop102 hadoop-3.1.3]# jps
    1618 Jps
    842 NameNode
    1436 NodeManager
    1021 DataNode
    

    [2]hadoop103

    [root@hadoop103 hadoop-3.1.3]# jps
    1184 Jps
    275 DataNode
    826 NodeManager
    669 ResourceManager
    

    [3]hadoop104

    [root@hadoop104 ~]# jps
    368 SecondaryNameNode
    262 DataNode
    509 NodeManager
    4607 Jps
    

    7.2 上传文件测试

    (The original post shows a screenshot of the file-upload test.)

    8 网页访问配置

    8.1 配置nginx

    nginx.conf

    
    #user  nobody;
    worker_processes  1;
    
    #error_log  logs/error.log;
    #error_log  logs/error.log  notice;
    #error_log  logs/error.log  info;
    
    #pid        logs/nginx.pid;
    
    
    events {
        worker_connections  1024;
    }
    
    
    http {
        include       mime.types;
        default_type  application/octet-stream;
    
        #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
        #                  '$status $body_bytes_sent "$http_referer" '
        #                  '"$http_user_agent" "$http_x_forwarded_for"';
    
        #access_log  logs/access.log  main;
        sendfile        on;
        #tcp_nopush     on;
    
        #keepalive_timeout  0;
        keepalive_timeout  65;
    
        #gzip  on;
    
        server {
            listen       80;
            server_name  localhost;
    
            #charset koi8-r;
    
            #access_log  logs/host.access.log  main;
    
            location / {
                root   html;
                index  index.html index.htm;
            }
    
            #error_page  404              /404.html;
    
            # redirect server error pages to the static page /50x.html
            #
            error_page   500 502 503 504  /50x.html;
            location = /50x.html {
                root   html;
            }
        }
        server {
            listen 8042;
            server_name hadoop102 hadoop103 hadoop104;
            location / {
                if ($host = hadoop102) {
                    proxy_pass http://127.0.0.1:28042;
                }
                if ($host = hadoop103) {
                    proxy_pass http://127.0.0.1:38042;
                }
                if ($host = hadoop104) {
                    proxy_pass http://127.0.0.1:48042;
                }
            }
        }
        server {
            listen 8088;
            server_name hadoop102 hadoop103 hadoop104;
            location / {
                if ($host = hadoop102) {
                    proxy_pass http://127.0.0.1:28088;
                }
                if ($host = hadoop103) {
                    proxy_pass http://127.0.0.1:38088;
                }
                if ($host = hadoop104) {
                    proxy_pass http://127.0.0.1:48088;
                }
            }
        }
        server {
            listen 9864;
            server_name hadoop102 hadoop103 hadoop104;
            location / {
                if ($host = hadoop102) {
                    proxy_pass http://127.0.0.1:29864;
                }
                if ($host = hadoop103) {
                    proxy_pass http://127.0.0.1:39864;
                }
                if ($host = hadoop104) {
                    proxy_pass http://127.0.0.1:49864;
                }
            }
        }
        server {
            listen 9868;
            server_name hadoop102 hadoop103 hadoop104;
            location / {
                if ($host = hadoop102) {
                    proxy_pass http://127.0.0.1:29868;
                }
                if ($host = hadoop103) {
                    proxy_pass http://127.0.0.1:39868;
                }
                if ($host = hadoop104) {
                    proxy_pass http://127.0.0.1:49868;
                }
            }
        }
        server {
            listen 9870;
            server_name hadoop102 hadoop103 hadoop104;
            location / {
                if ($host = hadoop102) {
                    proxy_pass http://127.0.0.1:29870;
                }
                if ($host = hadoop103) {
                    proxy_pass http://127.0.0.1:39870;
                }
                if ($host = hadoop104) {
                    proxy_pass http://127.0.0.1:49870;
                }
            }
        }
    }
    

    将该文件直接替换掉服务器中的nginx.conf的配置文件

    8.2 启动nginx

    进入到/usr/local/nginx/sbin目录,执行:

    ./nginx
    

    8.3 测试nginx是否启动成功

    From a local browser, visit http://[ip of server]:80 to check whether nginx started; if it did, the nginx welcome page is returned.

    8.4 配置本地hosts文件

    Append the following to the end of C:\Windows\System32\drivers\etc\hosts on the local machine:

    ip_of_server hadoop102
    ip_of_server hadoop103
    ip_of_server hadoop104
    

    ip_of_server is the remote server's IP address; all three entries use the same IP, because all three nodes run on that one server.

    8.5 测试是否可以访问hadoop集群

    From the local browser, visit http://hadoop102:9870; the HDFS NameNode web UI should load.
    Then visit http://hadoop103:8088 to reach the YARN ResourceManager UI.

    9 历史服务器进程

    9.1 启动历史服务器进程

    [root@hadoop102 ~]# cd /opt/hadoop-3.1.3/
    [root@hadoop102 hadoop-3.1.3]# bin/mapred --daemon start historyserver
    

    9.2 jps查看是否启动成功

    [root@hadoop102 hadoop-3.1.3]# jps
    2048 JobHistoryServer
    1618 Jps
    842 NameNode
    1436 NodeManager
    1021 DataNode
    

    9.3 网页查看测试

    [1] 分词测试

    [root@hadoop102 ~]# cd /opt/hadoop-3.1.3/
    [root@hadoop102 hadoop-3.1.3]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input/word.txt /output
    

    [2] Click through to the finished job on the JobHistory page to view its details.

    9.4 Check the log aggregation feature

    From the job's history page, open the logs link; the aggregated container logs (served via http://hadoop102:19888/jobhistory/logs) should be displayed.

    10 测试脚本 myhadoop| jpsall

    10.1 使用 myhadoop 启动停止集群

    不论 hadoop102|hadoop103|hadoop104 中的哪个节点都可以执行以下命令:

    myhadoop start
    myhadoop stop
    

    10.2 使用 jpsall 查看集群各个服务的启动状态

    不论 hadoop102|hadoop103|hadoop104 中的哪个节点都可以执行以下命令:

    jpsall
    