
    The version demonstrated here is 2.7.2. See the official documentation.

    Hadoop Run Modes

    Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and fully distributed mode.

    Hadoop official website: http://hadoop.apache.org/

    1. Local Mode

    a)  The official Grep example

    In short, it finds the words that match a given pattern.

      $ mkdir input      # 1. Create an input folder under the hadoop-2.7.2 directory
      $ cp etc/hadoop/*.xml input     # 2. Copy Hadoop's XML configuration files into input
      $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'       # 3. Run the MapReduce example program under the share directory
      $ cat output/*           # 4. View the output
    

     The official example copies every file ending in .xml from etc/hadoop/ into the input directory, then finds which words in those files match the given pattern. The results are written to the output directory, which must not already exist; otherwise the job fails with an error.
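     For intuition, the counting this Grep job performs can be roughed out with plain shell tools. This is only an illustrative local equivalent (assuming GNU grep's -E/-o options), not how Hadoop implements it; sample.xml below is a made-up stand-in for the copied configuration files:

```shell
# Stand-in input file (hypothetical; in the tutorial, input/ holds real Hadoop config files).
printf '<name>dfs.metrics.session-id</name>\n<value>dfsadmin</value>\n' > sample.xml

# Map-phase analogue: -o prints each regex match on its own line.
# Reduce-phase analogue: sort | uniq -c counts each distinct match.
grep -Eo 'dfs[a-z.]+' sample.xml | sort | uniq -c | sort -rn
```

The MapReduce job writes the same kind of count-per-word pairs into output/part-r-00000, just with the count and word columns reversed.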

    Execution transcript:

    [atguigu@hadoop100 hadoop-2.7.2]$ mkdir input
    [atguigu@hadoop100 hadoop-2.7.2]$ cp etc/hadoop/*.xml input
    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
    19/01/27 05:15:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    19/01/27 05:15:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    19/01/27 05:15:36 INFO input.FileInputFormat: Total input paths to process : 8
    19/01/27 05:15:36 INFO mapreduce.JobSubmitter: number of splits:8
    19/01/27 05:15:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local857720284_0001
    19/01/27 05:15:36 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    19/01/27 05:15:36 INFO mapreduce.Job: Running job: job_local857720284_0001
    19/01/27 05:15:36 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    19/01/27 05:15:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:36 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    19/01/27 05:15:36 INFO mapred.LocalJobRunner: Waiting for map tasks
    19/01/27 05:15:36 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000000_0
    19/01/27 05:15:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:36 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:36 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/hadoop-policy.xml:0+9683
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.MapTask: Spilling map output
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufend = 17; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214396(104857584); length = 1/6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Finished spill 0
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000000_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000000_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000000_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000001_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/kms-site.xml:0+5511
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000001_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000001_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000001_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000002_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/capacity-scheduler.xml:0+4436
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000002_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000002_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000002_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000003_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/kms-acls.xml:0+3518
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000003_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000003_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000003_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000004_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/hdfs-site.xml:0+775
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000004_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000004_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000004_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000005_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/core-site.xml:0+774
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000005_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000005_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000005_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000006_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/yarn-site.xml:0+690
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000006_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000006_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000006_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_m_000007_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/input/httpfs-site.xml:0+620
    19/01/27 05:15:37 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:37 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:37 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:37 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:37 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:37 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:37 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_m_000007_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_m_000007_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_m_000007_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: map task executor complete.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Starting task: attempt_local857720284_0001_r_000000_0
    19/01/27 05:15:37 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:37 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:37 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6eedaff1
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    19/01/27 05:15:37 INFO reduce.EventFetcher: attempt_local857720284_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000005_0 decomp: 2 len: 6 to MEMORY
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local857720284_0001_m_000005_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->2
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000001_0 decomp: 2 len: 6 to MEMORY
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local857720284_0001_m_000001_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 2, commitMemory -> 2, usedMemory ->4
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000004_0 decomp: 2 len: 6 to MEMORY
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local857720284_0001_m_000004_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 3, commitMemory -> 4, usedMemory ->6
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000007_0 decomp: 2 len: 6 to MEMORY
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local857720284_0001_m_000007_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 4, commitMemory -> 6, usedMemory ->8
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000000_0 decomp: 21 len: 25 to MEMORY
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 21 bytes from map-output for attempt_local857720284_0001_m_000000_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 21, inMemoryMapOutputs.size() -> 5, commitMemory -> 8, usedMemory ->29
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000003_0 decomp: 2 len: 6 to MEMORY
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local857720284_0001_m_000003_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 6, commitMemory -> 29, usedMemory ->31
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000006_0 decomp: 2 len: 6 to MEMORY
    19/01/27 05:15:37 WARN io.ReadaheadPool: Failed readahead on ifile
    EBADF: Bad file descriptor
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
    	at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
    	at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local857720284_0001_m_000006_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 7, commitMemory -> 31, usedMemory ->33
    19/01/27 05:15:37 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local857720284_0001_m_000002_0 decomp: 2 len: 6 to MEMORY
    19/01/27 05:15:37 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local857720284_0001_m_000002_0
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 8, commitMemory -> 33, usedMemory ->35
    19/01/27 05:15:37 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 8 / 8 copied.
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: finalMerge called with 8 in-memory map-outputs and 0 on-disk map-outputs
    19/01/27 05:15:37 INFO mapred.Merger: Merging 8 sorted segments
    19/01/27 05:15:37 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10 bytes
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: Merged 8 segments, 35 bytes to disk to satisfy reduce memory limit
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: Merging 1 files, 25 bytes from disk
    19/01/27 05:15:37 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    19/01/27 05:15:37 INFO mapred.Merger: Merging 1 sorted segments
    19/01/27 05:15:37 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 10 bytes
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 8 / 8 copied.
    19/01/27 05:15:37 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    19/01/27 05:15:37 INFO mapred.Task: Task:attempt_local857720284_0001_r_000000_0 is done. And is in the process of committing
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: 8 / 8 copied.
    19/01/27 05:15:37 INFO mapred.Task: Task attempt_local857720284_0001_r_000000_0 is allowed to commit now
    19/01/27 05:15:37 INFO output.FileOutputCommitter: Saved output of task 'attempt_local857720284_0001_r_000000_0' to file:/opt/module/hadoop-2.7.2/grep-temp-476836355/_temporary/0/task_local857720284_0001_r_000000
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: reduce > reduce
    19/01/27 05:15:37 INFO mapred.Task: Task 'attempt_local857720284_0001_r_000000_0' done.
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local857720284_0001_r_000000_0
    19/01/27 05:15:37 INFO mapred.LocalJobRunner: reduce task executor complete.
    19/01/27 05:15:37 INFO mapreduce.Job: Job job_local857720284_0001 running in uber mode : false
    19/01/27 05:15:37 INFO mapreduce.Job:  map 100% reduce 100%
    19/01/27 05:15:37 INFO mapreduce.Job: Job job_local857720284_0001 completed successfully
    19/01/27 05:15:37 INFO mapreduce.Job: Counters: 30
    	File System Counters
    		FILE: Number of bytes read=2693510
    		FILE: Number of bytes written=5030435
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    	Map-Reduce Framework
    		Map input records=745
    		Map output records=1
    		Map output bytes=17
    		Map output materialized bytes=67
    		Input split bytes=925
    		Combine input records=1
    		Combine output records=1
    		Reduce input groups=1
    		Reduce shuffle bytes=67
    		Reduce input records=1
    		Reduce output records=1
    		Spilled Records=2
    		Shuffled Maps =8
    		Failed Shuffles=0
    		Merged Map outputs=8
    		GC time elapsed (ms)=252
    		Total committed heap usage (bytes)=2667053056
    	Shuffle Errors
    		BAD_ID=0
    		CONNECTION=0
    		IO_ERROR=0
    		WRONG_LENGTH=0
    		WRONG_MAP=0
    		WRONG_REDUCE=0
    	File Input Format Counters 
    		Bytes Read=26007
    	File Output Format Counters 
    		Bytes Written=123
    19/01/27 05:15:37 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    19/01/27 05:15:38 INFO input.FileInputFormat: Total input paths to process : 1
    19/01/27 05:15:38 INFO mapreduce.JobSubmitter: number of splits:1
    19/01/27 05:15:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local326049581_0002
    19/01/27 05:15:38 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    19/01/27 05:15:38 INFO mapreduce.Job: Running job: job_local326049581_0002
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    19/01/27 05:15:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: Waiting for map tasks
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: Starting task: attempt_local326049581_0002_m_000000_0
    19/01/27 05:15:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:38 INFO mapred.MapTask: Processing split: file:/opt/module/hadoop-2.7.2/grep-temp-476836355/part-r-00000:0+111
    19/01/27 05:15:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    19/01/27 05:15:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
    19/01/27 05:15:38 INFO mapred.MapTask: soft limit at 83886080
    19/01/27 05:15:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
    19/01/27 05:15:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
    19/01/27 05:15:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: 
    19/01/27 05:15:38 INFO mapred.MapTask: Starting flush of map output
    19/01/27 05:15:38 INFO mapred.MapTask: Spilling map output
    19/01/27 05:15:38 INFO mapred.MapTask: bufstart = 0; bufend = 17; bufvoid = 104857600
    19/01/27 05:15:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214396(104857584); length = 1/6553600
    19/01/27 05:15:38 INFO mapred.MapTask: Finished spill 0
    19/01/27 05:15:38 INFO mapred.Task: Task:attempt_local326049581_0002_m_000000_0 is done. And is in the process of committing
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: map
    19/01/27 05:15:38 INFO mapred.Task: Task 'attempt_local326049581_0002_m_000000_0' done.
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local326049581_0002_m_000000_0
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: map task executor complete.
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: Starting task: attempt_local326049581_0002_r_000000_0
    19/01/27 05:15:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    19/01/27 05:15:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    19/01/27 05:15:38 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@67af1c61
    19/01/27 05:15:38 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    19/01/27 05:15:38 INFO reduce.EventFetcher: attempt_local326049581_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    19/01/27 05:15:38 INFO reduce.LocalFetcher: localfetcher#2 about to shuffle output of map attempt_local326049581_0002_m_000000_0 decomp: 21 len: 25 to MEMORY
    19/01/27 05:15:38 INFO reduce.InMemoryMapOutput: Read 21 bytes from map-output for attempt_local326049581_0002_m_000000_0
    19/01/27 05:15:38 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 21, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->21
    19/01/27 05:15:38 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: 1 / 1 copied.
    19/01/27 05:15:38 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
    19/01/27 05:15:38 INFO mapred.Merger: Merging 1 sorted segments
    19/01/27 05:15:38 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 11 bytes
    19/01/27 05:15:38 INFO reduce.MergeManagerImpl: Merged 1 segments, 21 bytes to disk to satisfy reduce memory limit
    19/01/27 05:15:38 INFO reduce.MergeManagerImpl: Merging 1 files, 25 bytes from disk
    19/01/27 05:15:38 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    19/01/27 05:15:38 INFO mapred.Merger: Merging 1 sorted segments
    19/01/27 05:15:38 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 11 bytes
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: 1 / 1 copied.
    19/01/27 05:15:38 INFO mapred.Task: Task:attempt_local326049581_0002_r_000000_0 is done. And is in the process of committing
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: 1 / 1 copied.
    19/01/27 05:15:38 INFO mapred.Task: Task attempt_local326049581_0002_r_000000_0 is allowed to commit now
    19/01/27 05:15:38 INFO output.FileOutputCommitter: Saved output of task 'attempt_local326049581_0002_r_000000_0' to file:/opt/module/hadoop-2.7.2/output/_temporary/0/task_local326049581_0002_r_000000
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: reduce > reduce
    19/01/27 05:15:38 INFO mapred.Task: Task 'attempt_local326049581_0002_r_000000_0' done.
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local326049581_0002_r_000000_0
    19/01/27 05:15:38 INFO mapred.LocalJobRunner: reduce task executor complete.
    19/01/27 05:15:39 INFO mapreduce.Job: Job job_local326049581_0002 running in uber mode : false
    19/01/27 05:15:39 INFO mapreduce.Job:  map 100% reduce 100%
    19/01/27 05:15:39 INFO mapreduce.Job: Job job_local326049581_0002 completed successfully
    19/01/27 05:15:39 INFO mapreduce.Job: Counters: 30
    	File System Counters
    		FILE: Number of bytes read=1159582
    		FILE: Number of bytes written=2231696
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    	Map-Reduce Framework
    		Map input records=1
    		Map output records=1
    		Map output bytes=17
    		Map output materialized bytes=25
    		Input split bytes=127
    		Combine input records=0
    		Combine output records=0
    		Reduce input groups=1
    		Reduce shuffle bytes=25
    		Reduce input records=1
    		Reduce output records=1
    		Spilled Records=2
    		Shuffled Maps =1
    		Failed Shuffles=0
    		Merged Map outputs=1
    		GC time elapsed (ms)=0
    		Total committed heap usage (bytes)=658505728
    	Shuffle Errors
    		BAD_ID=0
    		CONNECTION=0
    		IO_ERROR=0
    		WRONG_LENGTH=0
    		WRONG_MAP=0
    		WRONG_REDUCE=0
    	File Input Format Counters 
    		Bytes Read=123
    	File Output Format Counters 
    		Bytes Written=23
    [atguigu@hadoop100 hadoop-2.7.2]$ cat output/*
    1	dfsadmin
    [atguigu@hadoop100 hadoop-2.7.2]$ 
    

     The result shows that only one word in the input directory's files matches the pattern: dfsadmin, with a count of 1.

    Change the dfs in the regex to kms:

    [atguigu@hadoop100 hadoop-2.7.2]$ rm -rf output/
    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'kms[a-z.]+'
    19/01/27 05:20:07 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
                 ...
                   ...
                    ...
    		Bytes Read=1057
    	File Output Format Counters 
    		Bytes Written=715
    [atguigu@hadoop100 hadoop-2.7.2]$ cat output/*
    9	kms.acl.
    2	kms.keytab
    1	kms.key.provider.uri
    1	kms.current.key.cache.timeout.ms
    1	kms.cache.timeout.ms
    1	kms.cache.enable
    1	kms.authentication.type
    1	kms.authentication.signer.secret.provider.zookeeper.path
    1	kms.authentication.signer.secret.provider.zookeeper.kerberos.principal
    1	kms.keystore
    1	kms.authentication.signer.secret.provider.zookeeper.connection.string
    1	kms.authentication.signer.secret.provider.zookeeper.auth.type
    1	kms.authentication.signer.secret.provider
    1	kms.authentication.kerberos.principal
    1	kms.authentication.kerberos.name.rules
    1	kms.authentication.kerberos.keytab
    1	kms.audit.aggregation.window.ms
    1	kms.authentication.signer.secret.provider.zookeeper.kerberos.keytab
    [atguigu@hadoop100 hadoop-2.7.2]$ 
    


    b)  The official WordCount example

    As the name suggests, this example counts how many times each word appears. It is a classic, practical example and a frequent interview topic.

    Preparation:

    1.    Create a wcinput folder under the hadoop-2.7.2 directory

    [atguigu@hadoop100 hadoop-2.7.2]$ mkdir wcinput

    2.    Create a wc.input file inside wcinput

     [atguigu@hadoop100 hadoop-2.7.2]$ cd wcinput
    [atguigu@hadoop100 wcinput]$ touch wc.input

    3.    Edit the wc.input file

     [atguigu@hadoop100 wcinput]$ vi wc.input 

    Enter the following content into the file:
    hadoop yarn
    hadoop mapreduce
    atguigu
    atguigu
    Save and quit with :wq

    4.    Return to the Hadoop directory /opt/module/hadoop-2.7.2

    Now we can run the WordCount example that ships with Hadoop:

    5.    Run the program

    [atguigu@hadoop100 hadoop-2.7.2]$ hadoop jar  share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput 

    6.    View the result

    [atguigu@hadoop100 hadoop-2.7.2]$ cat wcoutput/part-r-00000
    atguigu 2
    hadoop  2
    mapreduce       1
    yarn    1

    [atguigu@hadoop100 hadoop-2.7.2]$ mkdir wcinput
    [atguigu@hadoop100 hadoop-2.7.2]$ cd wcinput
    [atguigu@hadoop100 wcinput]$ touch wc.input
    [atguigu@hadoop100 wcinput]$ vi wc.input
    [atguigu@hadoop100 wcinput]$ cd ..
    [atguigu@hadoop100 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput wcoutput
    19/01/27 05:48:06 INFO Configuration.deprecation: session.id is deprecated. Instead, use 
    ..........................
    		Bytes Written=50
    [atguigu@hadoop100 hadoop-2.7.2]$ cat wcoutput/*
    atguigu	2
    hadoop	2
    mapreduce	1
    yarn	1
    [atguigu@hadoop100 hadoop-2.7.2]$ 
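As a sanity check, the same counts can be reproduced with plain shell tools on the wc.input content (a local sketch, not how Hadoop actually computes it):

```shell
# Recreate wc.input and count each word, mimicking the wordcount result.
printf 'hadoop yarn\nhadoop mapreduce\natguigu\natguigu\n' > wc.input
tr -s ' ' '\n' < wc.input | sort | uniq -c | sort -k2
# prints counts of: 2 atguigu, 2 hadoop, 1 mapreduce, 1 yarn
```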
2. Pseudo-distributed mode

 

2.1 Start HDFS and run a MapReduce program

Steps:

a) Analysis

       (1) Configure the cluster

       (2) Start the cluster and test create, delete, and query operations

       (3) Run the WordCount example

b) Execution steps

(1) Configure the cluster

       (a) Configure hadoop-env.sh:   sudo vi etc/hadoop/hadoop-env.sh  (note: this etc is the one inside the Hadoop directory, not the Linux /etc)

Get the JDK installation path on Linux:

[atguigu@ hadoop100 ~]# echo $JAVA_HOME

/opt/module/jdk1.8.0_144

Set the JAVA_HOME path in hadoop-env.sh:

    export JAVA_HOME=/opt/module/jdk1.8.0_144

     

     

       (b) Configure core-site.xml:  vi etc/hadoop/core-site.xml 

<!-- Address of the NameNode in HDFS -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop100:9000</value>
</property>

<!-- Directory for the files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

Important: once this is configured, the local mode used above no longer works. The default file system was the local one, and this setting changes it.

     

 If no storage directory is specified for the files Hadoop generates at runtime, the default is /tmp/hadoop-<username> (created automatically) under the system root. A configured directory does not need to be created in advance; the system creates it automatically.

    vi etc/hadoop/core-site.xml 
    [atguigu@hadoop100 hadoop-2.7.2]$ sudo vi etc/hadoop/core-site.xml 
    [sudo] password for atguigu: 
    [atguigu@hadoop100 hadoop-2.7.2]$ 

     

       (c) Configure hdfs-site.xml:  sudo vi etc/hadoop/hdfs-site.xml  (this one is optional)

<!-- Number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property> 

 Note: the default replica count is 3, i.e. the same data is stored on 3 machines. If any node holding the data goes down, two copies remain, and HDFS adds another replica on some other server so the cluster always keeps 3 replicas; how many replicas to use depends on the quality of the cluster's machines.

With only one machine, even though the default is 3, there is only one copy; once you add machines later, HDFS will create the missing replicas.

     

(2) Start the cluster

       (a) Format the NameNode (format on the very first start; do not keep re-formatting afterwards)

[atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs namenode  -format

Note: formatting wipes the data stored in Hadoop (so the first format is safe, but later formats can cause problems).

       (b) Start the NameNode

[atguigu@hadoop100 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode

       (c) Start the DataNode

[atguigu@hadoop100 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode

(3) Inspect the cluster

       (a) Check whether the daemons started successfully

[atguigu@hadoop100 hadoop-2.7.2]$ jps
5302 DataNode
5495 Jps
5449 NameNode

Note: jps is a JDK command, not a Linux command. It is unavailable unless the JDK is installed. 

       (b) View the HDFS file system in a browser:   http://hadoop100:50070/dfshealth.html#tab-overview

 

Note: the hostname hadoop100 must first be mapped in the hosts file of Windows or Linux (C:\Windows\System32\drivers\etc\hosts on Windows), depending on which system your browser runs in.

If the page does not load, see this post for troubleshooting:     https://blog.csdn.net/qq_40794973/article/details/86663969
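A minimal hosts-file sketch; the IP is assumed from the VM used in this walkthrough (192.168.19.100, visible in the VERSION dump later in this section), so substitute your own machine's address:

```
192.168.19.100   hadoop100
```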

     

Create a multi-level directory under the HDFS root

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -mkdir -p /usr/atguigu/input
    [atguigu@hadoop100 hadoop-2.7.2]$ 
    

dfs is the file-system shell for operating on HDFS paths.

 bin/hdfs dfs is followed by the command to execute.

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -ls /
    Found 1 items
    drwxr-xr-x   - atguigu supergroup          0 2019-01-27 19:00 /usr
    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -lsr /
    lsr: DEPRECATED: Please use 'ls -R' instead.
    drwxr-xr-x   - atguigu supergroup          0 2019-01-27 19:00 /usr
    drwxr-xr-x   - atguigu supergroup          0 2019-01-27 19:00 /usr/atguigu
    drwxr-xr-x   - atguigu supergroup          0 2019-01-27 19:00 /usr/atguigu/input
    [atguigu@hadoop100 hadoop-2.7.2]$ 
    

     

     

Upload a local file to HDFS, into the multi-level input directory just created:     bin/hdfs dfs -put wcinput/wc.input /usr/atguigu/input

     

     

 Run a WordCount job on HDFS (the input file was uploaded to /usr/atguigu/input above)

    [atguigu@hadoop100 hadoop-2.7.2]$ jps
    3269 DataNode
    3205 NameNode
    4622 Jps
    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /usr/atguigu/input/wc.input  /usr/atguigu/output
    

Success.

     

       (c) View the generated log files

Note: in production, when a bug appears, you usually analyze and fix it based on the hints in the logs.

Current directory: /opt/module/hadoop-2.7.2/logs

    [atguigu@hadoop100 logs]$ ls
    hadoop-atguigu-datanode-hadoop100.log  hadoop-atguigu-namenode-hadoop100.out
    hadoop-atguigu-datanode-hadoop100.out  SecurityAuth-atguigu.audit
    hadoop-atguigu-namenode-hadoop100.log
    [atguigu@hadoop100 logs]$ cat hadoop-atguigu-datanode-hadoop100.log

d) Question: why shouldn't the NameNode be formatted over and over, and what must you watch out for when formatting it?

    [atguigu@hadoop100 hadoop-2.7.2]$ cat data/tmp/dfs/name/current/VERSION 
    #Sun Jan 27 18:18:10 CST 2019
    namespaceID=64968429
    clusterID=CID-a4ad884d-998c-47df-b315-ae4e0a8e874d
    cTime=0
    storageType=NAME_NODE
    blockpoolID=BP-244670385-192.168.19.100-1548584290557
    layoutVersion=-63
    [atguigu@hadoop100 hadoop-2.7.2]$ cat  data/tmp/dfs/data/current/VERSION 
    #Sun Jan 27 18:18:34 CST 2019
    storageID=DS-fdc0a442-60e5-42ea-98b8-a0b90a5954ac
    clusterID=CID-a4ad884d-998c-47df-b315-ae4e0a8e874d
    cTime=0
    datanodeUuid=022f01d6-9a59-4fe5-8e9a-a86251f0afcd
    storageType=DATA_NODE
    layoutVersion=-56
    [atguigu@hadoop100 hadoop-2.7.2]$ 

     

    [atguigu@hadoop100 hadoop-2.7.2]$ cd data/tmp/dfs/
    [atguigu@hadoop100 dfs]$ tree
    .
    ├── data
    │   ├── current
    │   │   ├── BP-244670385-192.168.19.100-1548584290557
    │   │   │   ├── current
    │   │   │   │   ├── finalized
    │   │   │   │   │   └── subdir0
    │   │   │   │   │       └── subdir0
    │   │   │   │   │           ├── blk_1073741825
    │   │   │   │   │           ├── blk_1073741825_1001.meta
    │   │   │   │   │           ├── blk_1073741826
    │   │   │   │   │           └── blk_1073741826_1002.meta
    │   │   │   │   ├── rbw
    │   │   │   │   └── VERSION
    │   │   │   ├── scanner.cursor
    │   │   │   └── tmp
    │   │   └── VERSION
    │   └── in_use.lock
    └── name
        ├── current
        │   ├── edits_inprogress_0000000000000000001
        │   ├── fsimage_0000000000000000000
        │   ├── fsimage_0000000000000000000.md5
        │   ├── seen_txid
        │   └── VERSION
        └── in_use.lock
    
    11 directories, 14 files
    

Note: formatting the NameNode generates a new cluster id, so the NameNode and the DataNode end up with inconsistent cluster ids and the cluster can no longer find its previous data. Therefore, when formatting the NameNode, always delete the data directory and the log files first, and only then format. 
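The consistency requirement can be illustrated with a small sketch: pull the clusterID line out of both VERSION files and compare them. The files below are hypothetical stand-ins for data/tmp/dfs/name/current/VERSION and data/tmp/dfs/data/current/VERSION; on a real node, point grep at the actual paths.

```shell
# Create sample stand-ins for the NameNode and DataNode VERSION files.
mkdir -p demo/name demo/data
printf 'namespaceID=64968429\nclusterID=CID-a4ad884d\n' > demo/name/VERSION
printf 'storageID=DS-fdc0a442\nclusterID=CID-a4ad884d\n' > demo/data/VERSION
# A DataNode only serves a NameNode whose clusterID matches its own.
nn=$(grep '^clusterID=' demo/name/VERSION | cut -d= -f2)
dn=$(grep '^clusterID=' demo/data/VERSION | cut -d= -f2)
[ "$nn" = "$dn" ] && echo "clusterID match: $nn" || echo "clusterID mismatch"
```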

       (4) Operate the cluster

(a) Create an input folder on the HDFS file system

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -mkdir -p /user/atguigu/input

(b) Upload the test file to the file system

[atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -put wcinput/wc.input  /user/atguigu/input/

(c) Check that the file was uploaded correctly

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -ls  /user/atguigu/input/

[atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -cat  /user/atguigu/input/wc.input

(d) Run the MapReduce program

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hadoop  jar  share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount  /user/atguigu/input/   /user/atguigu/output

(e) View the output

From the command line:
[atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -cat /user/atguigu/output/*

View the output file in the browser, as shown in Figure 2-34


     
(f) Download the output file to the local file system

[atguigu@hadoop100 hadoop-2.7.2]$ hdfs dfs -get /user/atguigu/output/part-r-00000  ./wcoutput/

(g) Delete the output directory

[atguigu@hadoop100 hadoop-2.7.2]$ hdfs dfs -rm -r  /user/atguigu/output
     

     

2.2 Start YARN and run a MapReduce program

a) Analysis

       (1) Configure the cluster to run MR on YARN

       (2) Start the cluster and test create, delete, and query operations

       (3) Run the WordCount example on YARN

b) Execution steps       

       (1) Configure the cluster

(a) Configure yarn-env.sh:   vi etc/hadoop/yarn-env.sh   and set JAVA_HOME

export JAVA_HOME=/opt/module/jdk1.8.0_144

(b) Configure yarn-site.xml:     vi etc/hadoop/yarn-site.xml

 <!-- How the Reducer fetches data --> 
    <property>
             <name>yarn.nodemanager.aux-services</name>
             <value>mapreduce_shuffle</value>
    </property>

<!-- Address of the YARN ResourceManager -->
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop100</value>
    </property>

(c) Configure mapred-env.sh:  vi etc/hadoop/mapred-env.sh  and set JAVA_HOME

export JAVA_HOME=/opt/module/jdk1.8.0_144

(d) Configure mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml)

    [atguigu@hadoop100 hadoop-2.7.2]$ cd etc/hadoop
    [atguigu@hadoop100 hadoop]$ mv mapred-site.xml.template mapred-site.xml
    [atguigu@hadoop100 hadoop]$ vi mapred-site.xml

 <!-- Run MR on YARN -->
    <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
    </property>

Note: by default MR runs locally.

     

        (2) Start the cluster

(a) Before starting, make sure the NameNode and DataNode are already running

    [atguigu@hadoop100 hadoop-2.7.2]$ jps
    3269 DataNode
    3205 NameNode
    5948 Jps

(b) Start the ResourceManager

    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager

(c) Start the NodeManager

    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager

     [atguigu@hadoop100 hadoop-2.7.2]$ jps
    3269 DataNode
    3205 NameNode
    5991 ResourceManager
    6348 Jps
    6271 NodeManager

        (3) Operate the cluster

(a) View YARN's browser page, as shown below:   http://hadoop100:8088/cluster

(b) Delete the output directory on the file system

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -rm -r  /user/atguigu/output

(c) Run the MapReduce program 

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hadoop jar  share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/atguigu/input  /user/atguigu/output

(d) View the result

    [atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -cat /user/atguigu/output/*  

     

     

 2.3 Configure the history server

To review how past jobs ran, configure the history server. The concrete steps are as follows:

(1) Configure mapred-site.xml

    [atguigu@hadoop100 hadoop-2.7.2]$ cd etc/hadoop/
    [atguigu@hadoop100 hadoop]$ vi mapred-site.xml

Add the following to the file:

<!-- History server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop100:10020</value>
</property>
<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop100:19888</value>
</property>

(2) Start the history server

    [atguigu@hadoop100 hadoop]$ cd ../..
    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver

(3) Check that the history server started

    [atguigu@hadoop100 hadoop-2.7.2]$ jps
    7076 Jps
    3269 DataNode
    3205 NameNode
    5991 ResourceManager
    7033 JobHistoryServer
    6271 NodeManager

(4) View the JobHistory UI:    http://hadoop100:19888/jobhistory

     

2.4 Configure log aggregation

Log aggregation: after an application finishes, its run logs are uploaded to HDFS.
Benefit: you can conveniently inspect the details of a job run, which helps development and debugging.
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and JobHistoryServer.
The concrete steps to enable it are as follows:

(1) Configure yarn-site.xml

    [atguigu@hadoop100 hadoop-2.7.2]$ cd etc/hadoop/
    [atguigu@hadoop100 hadoop]$ vi yarn-site.xml

Add the following to the file:

<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>

<!-- Keep logs for 7 days (value in seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property> 
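The 604800 value is simply 7 days expressed in seconds:

```shell
# 7 days * 24 hours * 3600 seconds = 604800
echo $((7 * 24 * 3600))
# prints: 604800
```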

(2) Stop the NodeManager, ResourceManager, and JobHistoryServer

    [atguigu@hadoop100 hadoop]$ cd ../..
    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop resourcemanager
    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop nodemanager
    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh stop historyserver

(3) Start the NodeManager, ResourceManager, and JobHistoryServer

    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager

    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager

    [atguigu@hadoop100 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver

    [atguigu@hadoop100 hadoop-2.7.2]$ jps
    8389 Jps
    3269 DataNode
    3205 NameNode
    7401 ResourceManager
    7801 JobHistoryServer
    7674 NodeManager 

(4) Delete the existing output file on HDFS

[atguigu@hadoop100 hadoop-2.7.2]$ bin/hdfs dfs -rm -r /user/atguigu/output

(5) Run the WordCount program

[atguigu@hadoop100 hadoop-2.7.2]$ hadoop jar  share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount  /user/atguigu/input  /user/atguigu/output

(6) View the logs:    http://hadoop100:19888/jobhistory

     

     

2.5 Configuration files explained 

Hadoop configuration files come in two kinds: default configuration files and custom (site-specific) configuration files. Only when you want to override a default value do you modify the corresponding custom configuration file.

(1) Default configuration files, and where each one sits inside Hadoop's jars:

[core-default.xml]       hadoop-common-2.7.2.jar/core-default.xml

[hdfs-default.xml]       hadoop-hdfs-2.7.2.jar/hdfs-default.xml

[yarn-default.xml]       hadoop-yarn-common-2.7.2.jar/yarn-default.xml

[mapred-default.xml]     hadoop-mapreduce-client-core-2.7.2.jar/mapred-default.xml

(2) Custom configuration files:

       core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under the $HADOOP_HOME/etc/hadoop path; modify them as your project requires.

     
