  • Overall execution flow 1. The Driver runs the main method (lazy execution); an action operator triggers a job 2. Stages are divided by wide/narrow dependencies 3. Each stage is organized into a TaskSet (containing multiple tasks) 4. Each task is dispatched to a specific Executor for execution. Full scheduling flow 1....
    Overall execution flow
    1. The Driver runs the main method (transformations are lazy); an action operator triggers a job
    2. Stages are divided according to wide/narrow dependencies
    3. Each stage is organized into a TaskSet (containing multiple tasks)
    4. Each task is dispatched to a specific Executor for execution (see the sketch right after this list)
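    A minimal sketch of steps 1-4, assuming a SparkContext named sc is already available (for example in spark-shell); the input path is only an illustration. Transformations merely build the lineage, and a job is submitted only when an action such as collect() runs:

    // Transformations are lazy: nothing runs on the cluster yet.
    val words = sc.textFile("hdfs:///tmp/input.txt")   // example path
      .flatMap(_.split("\\s+"))
      .map((_, 1))
    val counts = words.reduceByKey(_ + _)              // wide dependency (shuffle)

    // The action triggers the job; stages and tasks are created and shipped to Executors.
    val result = counts.collect()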
     
    Full scheduling flow
     
     

    1. When the Driver starts, it creates the DAGScheduler and the TaskScheduler during initialization.

    2. When the TaskScheduler is initialized, it creates a SchedulerBackend (mainly responsible for communication with the cluster).

    3. The SchedulerBackend communicates with the ApplicationMaster and tells it how many Executors to start.

    4. The ApplicationMaster then requests resources from the ResourceManager.

    5. The corresponding Executors are then launched.

    6. Each Executor registers with the SchedulerBackend inside the Driver through its ExecutorBackend; once all Executors have registered,

    7. the Driver starts running the main function. When an action operator is reached, a Job is triggered; the DAGScheduler splits the Job into Stages at wide dependencies and creates a TaskSet for each Stage. Once created, the DAGScheduler hands each TaskSet to the TaskScheduler, which wraps it in a TaskSetManager (one per TaskSet).

    8. The TaskScheduler applies a scheduling policy to decide which Task runs next and on which Executor it runs.

    9. The SchedulerBackend then communicates with the ExecutorBackend and tells the chosen Executor what to execute.

    10. The Executor then begins executing.

    11. While running, the Executor keeps sending heartbeats to the Driver's HeartbeatReceiver, so the Driver can monitor in real time whether each Executor is still active.

    12. At the same time, while executing Tasks, the Executor reports the status of every Task to the SchedulerBackend through its ExecutorBackend.

    13. The SchedulerBackend forwards these status updates to the TaskScheduler, so the TaskScheduler always knows the state of every Task running on every Executor. If a Task fails, the TaskScheduler reschedules it and picks another Executor to run it (a small retry-configuration sketch follows this list).
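    As a sketch of the retry behaviour in step 13: how many times the TaskScheduler relaunches a failed task before aborting the whole job is governed by spark.task.maxFailures (default 4); the value below is only an example.

    import org.apache.spark.{SparkConf, SparkContext}

    // Allow a few extra task attempts before the job is aborted (example value).
    val conf = new SparkConf()
      .setAppName("retry-demo")
      .set("spark.task.maxFailures", "8")
    val sc = new SparkContext(conf)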

    In short: an action operator triggers a Job; the DAGScheduler splits it into Stages at wide dependencies and, based on the data each Stage processes, decides how many Tasks there are and creates the corresponding TaskSet; the TaskSet is handed to the TaskScheduler for scheduling and then to the SchedulerBackend (which handles the Driver's communication with other components), which sends the Tasks to the matching Executors for execution. How many Tasks an Executor runs is decided by the TaskScheduler.
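    To make this concrete, a small sketch with an assumed live sc: the ShuffledRDD in the lineage printed by toDebugString marks where the DAG is cut into two stages, and the action then turns each stage into a TaskSet whose size equals the partition count.

    val pairs   = sc.parallelize(1 to 100, 4).map(i => (i % 10, i))  // narrow: stays in the first stage
    val reduced = pairs.reduceByKey(_ + _)                           // shuffle: starts a second stage

    println(reduced.toDebugString)   // the ShuffledRDD in the lineage marks the stage boundary
    reduced.collect()                // one job, two stages, tasks per stage = number of partitions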

     

    The components in detail
    Driver
    The Spark driver node runs the main method of a Spark application and carries out the actual user code. While a Spark job runs, the Driver is mainly responsible for:
    1. converting the user program into jobs
    2. scheduling tasks across the Executors
    3. tracking how the Executors are executing
    4. showing the running status of the application through the UI
     
    Executor
    An Executor is a JVM process that runs the concrete tasks of a Spark job; the tasks are independent of one another. Executors are launched when the Spark application starts and live for the application's entire lifetime. If an Executor node fails, the application can still continue: the tasks on the failed node are rescheduled onto other Executors.
    Core responsibilities of the Executor:
    1. running the tasks that make up the Spark application and returning the results to the driver process;
    2. providing in-memory storage, through its block manager, for RDDs that the user program asks to cache. Because the RDD is cached directly inside the Executor process, tasks can make full use of the cached data at runtime to speed up computation (see the sketch right after this list).
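    A sketch of point 2, assuming a live sc and an example input path: persist()/cache() asks each Executor's BlockManager to keep the RDD's partitions in memory, so later actions reuse them instead of recomputing the lineage.

    import org.apache.spark.storage.StorageLevel

    val errors = sc.textFile("hdfs:///tmp/app.log")      // example path
      .filter(_.contains("ERROR"))
      .persist(StorageLevel.MEMORY_ONLY)                 // kept in Executor memory by the BlockManager

    val total  = errors.count()                          // first action computes and caches the partitions
    val byWord = errors.flatMap(_.split("\\s+")).countByValue()   // reuses the cached data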
     
    Master: a process mainly responsible for scheduling and allocating resources and for monitoring the cluster.
     
    Worker: a process that runs on one server in the cluster. It has two main duties: storing one or more partitions of an RDD in its own memory, and launching other processes and threads (Executors) that process and compute those RDD partitions in parallel.
     
    YARN Cluster mode
    1. After the job is submitted, it contacts the ResourceManager to request the launch of an ApplicationMaster. The ResourceManager allocates a container and starts the ApplicationMaster on a suitable NodeManager; in this mode the ApplicationMaster is the Driver.
    2. Once started, the Driver asks the ResourceManager for Executor memory. On receiving the ApplicationMaster's request, the ResourceManager allocates containers and starts Executor processes on suitable NodeManagers; each Executor process then registers back with the Driver.
    3. Once all Executors have registered, the Driver runs the main function. When an action operator is executed, a job is triggered, stages are divided by wide/narrow dependencies, each stage produces a TaskSet, and the tasks are then dispatched to the Executors for execution (a submission example follows this list).
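    A hedged example of submitting in this mode; the class name, jar path and resource numbers are placeholders, not values taken from the article:

    spark-submit --master yarn --deploy-mode cluster --class com.example.WordCount --num-executors 4 --executor-memory 2g --executor-cores 2 /path/to/wordcount.jar hdfs:///tmp/input.txt hdfs:///tmp/output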
     
    Supplement
    Spark task scheduling is mainly about two things: requesting resources and dispatching tasks.
    A Job is bounded by action methods: each action method triggers one Job.
    A Stage is a subset of a Job, bounded by RDD wide dependencies (i.e. Shuffles): a new Stage is cut at every Shuffle.
    A Task is a subset of a Stage, measured by parallelism (the number of partitions): there are as many tasks as partitions, and one task handles one RDD partition. If the data comes from HDFS, one partition corresponds to one input split (a short sketch follows this list).
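    A small sketch, with an assumed sc and an example path, showing that the task count of a stage simply follows the partition count:

    val rdd = sc.textFile("hdfs:///tmp/input.txt", minPartitions = 6)   // example path
    println(rdd.getNumPartitions)             // >= 6; each partition becomes one task in the stage

    val repartitioned = rdd.repartition(12)
    println(repartitioned.getNumPartitions)   // 12, so the stage that consumes it runs 12 tasks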
     
  • Spark execution flow

    2020-08-26 10:35:25
  • Spark execution flow and principles

    1,000+ views 2018-11-06 16:36:12
    Spark execution plan analysis: https://blog.csdn.net/zyzzxycj/article/details/82704713 ----------- First, an overall flow chart of SQL parsing: the chart may be hard to grasp at first, so start with a simple SQL query: select * from heguozi....

     

    Spark execution plan analysis:

    https://blog.csdn.net/zyzzxycj/article/details/82704713

    -----------

    First, here is the overall flow chart of SQL parsing:

    The chart may not be easy to follow at first, so start with a simple SQL query:

    select * from heguozi.payinfo where pay = 0 limit 10

    What execution plans does this sqlText go through between submission and the final result?

    explain extended select * from heguozi.payinfo where pay = 0 limit 10

     You will see four execution plans:

    == Parsed Logical Plan ==
    'GlobalLimit 10
    +- 'LocalLimit 10
       +- 'Project [*]
          +- 'Filter ('pay = 0)
             +- 'UnresolvedRelation `heguozi`.`payinfo`

     The Parsed Logical Plan corresponds to the Unresolved LogicalPlan in the chart.

    == Analyzed Logical Plan ==
    pay_id: string, totalpay_id: string, kindpay_id: string, kindpayname: string, fee: double, operator: string, operator_name: string, pay_time: bigint, pay: double, charge: double, is_valid: int, entity_id: string, create_time: bigint, op_time: bigint, last_ver: bigint, opuser_id: string, card_id: string, card_entity_id: string, online_bill_id: string, type: int, code: string, waitingpay_id: string, load_time: int, modify_time: int, ... 8 more fields
    GlobalLimit 10
    +- LocalLimit 10
       +- Project [pay_id#10079, totalpay_id#10080, kindpay_id#10081, kindpayname#10082, fee#10083, operator#10084, operator_name#10085, pay_time#10086L, pay#10087, charge#10088, is_valid#10089, entity_id#10090, create_time#10091L, op_time#10092L, last_ver#10093L, opuser_id#10094, card_id#10095, card_entity_id#10096, online_bill_id#10097, type#10098, code#10099, waitingpay_id#10100, load_time#10101, modify_time#10102, ... 8 more fields]
          +- Filter (pay#10087 = cast(0 as double))
             +- SubqueryAlias payinfo
                +- Relation[pay_id#10079,totalpay_id#10080,kindpay_id#10081,kindpayname#10082,fee#10083,operator#10084,operator_name#10085,pay_time#10086L,pay#10087,charge#10088,is_valid#10089,entity_id#10090,create_time#10091L,op_time#10092L,last_ver#10093L,opuser_id#10094,card_id#10095,card_entity_id#10096,online_bill_id#10097,type#10098,code#10099,waitingpay_id#10100,load_time#10101,modify_time#10102,... 8 more fields] parquet

     The Analyzed Logical Plan corresponds to the Resolved LogicalPlan in the chart.

    == Optimized Logical Plan ==
    GlobalLimit 10
    +- LocalLimit 10
       +- Filter (isnotnull(pay#10087) && (pay#10087 = 0.0))
          +- Relation[pay_id#10079,totalpay_id#10080,kindpay_id#10081,kindpayname#10082,fee#10083,operator#10084,operator_name#10085,pay_time#10086L,pay#10087,charge#10088,is_valid#10089,entity_id#10090,create_time#10091L,op_time#10092L,last_ver#10093L,opuser_id#10094,card_id#10095,card_entity_id#10096,online_bill_id#10097,type#10098,code#10099,waitingpay_id#10100,load_time#10101,modify_time#10102,... 8 more fields] parquet

    The Optimized Logical Plan corresponds to the Optimized LogicalPlan in the chart.

    == Physical Plan ==
    CollectLimit 10
    +- *(1) LocalLimit 10
       +- *(1) Project [pay_id#10079, totalpay_id#10080, kindpay_id#10081, kindpayname#10082, fee#10083, operator#10084, operator_name#10085, pay_time#10086L, pay#10087, charge#10088, is_valid#10089, entity_id#10090, create_time#10091L, op_time#10092L, last_ver#10093L, opuser_id#10094, card_id#10095, card_entity_id#10096, online_bill_id#10097, type#10098, code#10099, waitingpay_id#10100, load_time#10101, modify_time#10102, ... 8 more fields]
          +- *(1) Filter (isnotnull(pay#10087) && (pay#10087 = 0.0))
             +- *(1) FileScan parquet heguozi.payinfo[pay_id#10079,totalpay_id#10080,kindpay_id#10081,kindpayname#10082,fee#10083,operator#10084,operator_name#10085,pay_time#10086L,pay#10087,charge#10088,is_valid#10089,entity_id#10090,create_time#10091L,op_time#10092L,last_ver#10093L,opuser_id#10094,card_id#10095,card_entity_id#10096,online_bill_id#10097,type#10098,code#10099,waitingpay_id#10100,load_time#10101,modify_time#10102,... 8 more fields] Batched: true, Format: Parquet, Location: CatalogFileIndex[hdfs://cluster-cdh/user/flume/heguozi/payinfo], PartitionCount: 0, PartitionFilters: [], PushedFilters: [IsNotNull(pay), EqualTo(pay,0.0)], ReadSchema: struct<pay_id:string,totalpay_id:string,kindpay_id:string,kindpayname:string,fee:double,operator:...

    The Physical Plan is the final, executable PhysicalPlan.
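    The same four plans can also be printed from the DataFrame/Dataset API instead of the SQL EXPLAIN statement; heguozi.payinfo is the article's example table, and the calls below are a sketch:

    // explain(extended = true) prints the Parsed, Analyzed, Optimized and Physical plans.
    spark.sql("select * from heguozi.payinfo where pay = 0 limit 10").explain(true)

    // Equivalent DataFrame form of the same query.
    spark.table("heguozi.payinfo").where("pay = 0").limit(10).explain(true)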

     

    As already explained in "Spark execution plan analysis" (https://blog.csdn.net/zyzzxycj/article/details/82704713), the *(n) markers in the Physical Plan are WholeStageCodegen IDs. So what exactly is WholeStageCodegen?

    (whole-stage code generation -- the author notes there is no settled Chinese translation for the term)

    It is a technique introduced in Spark 2.x. It fuses the operators of a stage into a single piece of code that is generated and compiled at runtime, so per-row processing no longer goes through virtual function calls, which makes it faster than the Volcano Iterator Model of Spark 1.x. Of course, whole-stage code generation only improves the CPU-bound side of performance; it cannot speed up IO-bound work, for example the disk reads and writes produced by a Shuffle (a sketch of how to inspect the generated code follows).
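    A sketch of how generated code like the listing below can be dumped for a query; the debug helpers live in org.apache.spark.sql.execution.debug, and exact availability depends on the Spark 2.x version:

    import org.apache.spark.sql.execution.debug._   // brings debugCodegen() into scope

    // Prints "Found N WholeStageCodegen subtrees" followed by the generated Java source.
    spark.sql("select * from heguozi.payinfo where pay = 0 limit 10").debugCodegen()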

     

    So, taking the SQL above as an example, let's look at the code generated for *(1):

    The code is fairly long, but if you look closely it is almost entirely repeated operations.

    Found 1 WholeStageCodegen subtrees.
    == Subtree 1 / 1 ==
    *(1) LocalLimit 10
    +- *(1) Project [pay_id#10263, totalpay_id#10264, kindpay_id#10265, kindpayname#10266, fee#10267, operator#10268, operator_name#10269, pay_time#10270L, pay#10271, charge#10272, is_valid#10273, entity_id#10274, create_time#10275L, op_time#10276L, last_ver#10277L, opuser_id#10278, card_id#10279, card_entity_id#10280, online_bill_id#10281, type#10282, code#10283, waitingpay_id#10284, load_time#10285, modify_time#10286, ... 8 more fields]
       +- *(1) Filter (isnotnull(pay#10271) && (pay#10271 = 0.0))
          +- *(1) FileScan parquet heguozi.payinfo[pay_id#10263,totalpay_id#10264,kindpay_id#10265,kindpayname#10266,fee#10267,operator#10268,operator_name#10269,pay_time#10270L,pay#10271,charge#10272,is_valid#10273,entity_id#10274,create_time#10275L,op_time#10276L,last_ver#10277L,opuser_id#10278,card_id#10279,card_entity_id#10280,online_bill_id#10281,type#10282,code#10283,waitingpay_id#10284,load_time#10285,modify_time#10286,... 8 more fields] Batched: true, Format: Parquet, Location: CatalogFileIndex[hdfs://cluster-cdh/user/flume/heguozi/payinfo], PartitionCount: 0, PartitionFilters: [], PushedFilters: [IsNotNull(pay), EqualTo(pay,0.0)], ReadSchema: struct<pay_id:string,totalpay_id:string,kindpay_id:string,kindpayname:string,fee:double,operator:...
    
    Generated code:
    /* 001 */ public Object generate(Object[] references) {
    /* 002 */   return new GeneratedIteratorForCodegenStage1(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 006 */   private Object[] references;
    /* 007 */   private scala.collection.Iterator[] inputs;
    /* 008 */   private long scan_scanTime_0;
    /* 009 */   private int scan_batchIdx_0;
    /* 010 */   private boolean locallimit_stopEarly_0;
    /* 011 */   private int locallimit_count_0;
    /* 012 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[] scan_mutableStateArray_4 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[4];
    /* 013 */   private org.apache.spark.sql.execution.vectorized.OnHeapColumnVector[] scan_mutableStateArray_2 = new org.apache.spark.sql.execution.vectorized.OnHeapColumnVector[32];
    /* 014 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] scan_mutableStateArray_5 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[4];
    /* 015 */   private UnsafeRow[] scan_mutableStateArray_3 = new UnsafeRow[4];
    /* 016 */   private org.apache.spark.sql.vectorized.ColumnarBatch[] scan_mutableStateArray_1 = new org.apache.spark.sql.vectorized.ColumnarBatch[1];
    /* 017 */   private scala.collection.Iterator[] scan_mutableStateArray_0 = new scala.collection.Iterator[1];
    /* 018 */
    /* 019 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
    /* 020 */     this.references = references;
    /* 021 */   }
    /* 022 */
    /* 023 */   public void init(int index, scala.collection.Iterator[] inputs) {
    /* 024 */     partitionIndex = index;
    /* 025 */     this.inputs = inputs;
    /* 026 */     wholestagecodegen_init_0_0();
    /* 027 */     wholestagecodegen_init_0_1();
    /* 028 */
    /* 029 */   }
    /* 030 */
    /* 031 */   private void wholestagecodegen_init_0_1() {
    /* 032 */     scan_mutableStateArray_4[3] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(scan_mutableStateArray_3[3], 512);
    /* 033 */     scan_mutableStateArray_5[3] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_mutableStateArray_4[3], 32);
    /* 034 */
    /* 035 */   }
    /* 036 */
    /* 037 */   private void scan_nextBatch_0() throws java.io.IOException {
    /* 038 */     long getBatchStart = System.nanoTime();
    /* 039 */     if (scan_mutableStateArray_0[0].hasNext()) {
    /* 040 */       scan_mutableStateArray_1[0] = (org.apache.spark.sql.vectorized.ColumnarBatch)scan_mutableStateArray_0[0].next();
    /* 041 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(scan_mutableStateArray_1[0].numRows());
    /* 042 */       scan_batchIdx_0 = 0;
    /* 043 */       scan_mutableStateArray_2[0] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(0);
    /* 044 */       scan_mutableStateArray_2[1] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(1);
    /* 045 */       scan_mutableStateArray_2[2] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(2);
    /* 046 */       scan_mutableStateArray_2[3] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(3);
    /* 047 */       scan_mutableStateArray_2[4] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(4);
    /* 048 */       scan_mutableStateArray_2[5] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(5);
    /* 049 */       scan_mutableStateArray_2[6] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(6);
    /* 050 */       scan_mutableStateArray_2[7] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(7);
    /* 051 */       scan_mutableStateArray_2[8] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(8);
    /* 052 */       scan_mutableStateArray_2[9] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(9);
    /* 053 */       scan_mutableStateArray_2[10] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(10);
    /* 054 */       scan_mutableStateArray_2[11] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(11);
    /* 055 */       scan_mutableStateArray_2[12] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(12);
    /* 056 */       scan_mutableStateArray_2[13] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(13);
    /* 057 */       scan_mutableStateArray_2[14] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(14);
    /* 058 */       scan_mutableStateArray_2[15] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(15);
    /* 059 */       scan_mutableStateArray_2[16] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(16);
    /* 060 */       scan_mutableStateArray_2[17] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(17);
    /* 061 */       scan_mutableStateArray_2[18] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(18);
    /* 062 */       scan_mutableStateArray_2[19] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(19);
    /* 063 */       scan_mutableStateArray_2[20] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(20);
    /* 064 */       scan_mutableStateArray_2[21] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(21);
    /* 065 */       scan_mutableStateArray_2[22] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(22);
    /* 066 */       scan_mutableStateArray_2[23] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(23);
    /* 067 */       scan_mutableStateArray_2[24] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(24);
    /* 068 */       scan_mutableStateArray_2[25] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(25);
    /* 069 */       scan_mutableStateArray_2[26] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(26);
    /* 070 */       scan_mutableStateArray_2[27] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(27);
    /* 071 */       scan_mutableStateArray_2[28] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(28);
    /* 072 */       scan_mutableStateArray_2[29] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(29);
    /* 073 */       scan_mutableStateArray_2[30] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(30);
    /* 074 */       scan_mutableStateArray_2[31] = (org.apache.spark.sql.execution.vectorized.OnHeapColumnVector) scan_mutableStateArray_1[0].column(31);
    /* 075 */
    /* 076 */     }
    /* 077 */     scan_scanTime_0 += System.nanoTime() - getBatchStart;
    /* 078 */   }
    /* 079 */
    /* 080 */   protected void processNext() throws java.io.IOException {
    /* 081 */     if (scan_mutableStateArray_1[0] == null) {
    /* 082 */       scan_nextBatch_0();
    /* 083 */     }
    /* 084 */     while (scan_mutableStateArray_1[0] != null) {
    /* 085 */       int scan_numRows_0 = scan_mutableStateArray_1[0].numRows();
    /* 086 */       int scan_localEnd_0 = scan_numRows_0 - scan_batchIdx_0;
    /* 087 */       for (int scan_localIdx_0 = 0; scan_localIdx_0 < scan_localEnd_0; scan_localIdx_0++) {
    /* 088 */         int scan_rowIdx_0 = scan_batchIdx_0 + scan_localIdx_0;
    /* 089 */         do {
    /* 090 */           boolean scan_isNull_8 = scan_mutableStateArray_2[8].isNullAt(scan_rowIdx_0);
    /* 091 */           double scan_value_8 = scan_isNull_8 ? -1.0 : (scan_mutableStateArray_2[8].getDouble(scan_rowIdx_0));
    /* 092 */
    /* 093 */           if (!(!(scan_isNull_8))) continue;
    /* 094 */
    /* 095 */           boolean filter_isNull_2 = false;
    /* 096 */
    /* 097 */           boolean filter_value_2 = false;
    /* 098 */           filter_value_2 = ((java.lang.Double.isNaN(scan_value_8) && java.lang.Double.isNaN(0.0D)) || scan_value_8 == 0.0D);
    /* 099 */           if (!filter_value_2) continue;
    /* 100 */
    /* 101 */           ((org.apache.spark.sql.execution.metric.SQLMetric) references[2] /* numOutputRows */).add(1);
    /* 102 */
    /* 103 */           if (locallimit_count_0 < 10) {
    /* 104 */             locallimit_count_0 += 1;
    /* 105 */
    /* 106 */             boolean scan_isNull_0 = scan_mutableStateArray_2[0].isNullAt(scan_rowIdx_0);
    /* 107 */             UTF8String scan_value_0 = scan_isNull_0 ? null : (scan_mutableStateArray_2[0].getUTF8String(scan_rowIdx_0));
    /* 108 */             boolean scan_isNull_1 = scan_mutableStateArray_2[1].isNullAt(scan_rowIdx_0);
    /* 109 */             UTF8String scan_value_1 = scan_isNull_1 ? null : (scan_mutableStateArray_2[1].getUTF8String(scan_rowIdx_0));
    /* 110 */             boolean scan_isNull_2 = scan_mutableStateArray_2[2].isNullAt(scan_rowIdx_0);
    /* 111 */             UTF8String scan_value_2 = scan_isNull_2 ? null : (scan_mutableStateArray_2[2].getUTF8String(scan_rowIdx_0));
    /* 112 */             boolean scan_isNull_3 = scan_mutableStateArray_2[3].isNullAt(scan_rowIdx_0);
    /* 113 */             UTF8String scan_value_3 = scan_isNull_3 ? null : (scan_mutableStateArray_2[3].getUTF8String(scan_rowIdx_0));
    /* 114 */             boolean scan_isNull_4 = scan_mutableStateArray_2[4].isNullAt(scan_rowIdx_0);
    /* 115 */             double scan_value_4 = scan_isNull_4 ? -1.0 : (scan_mutableStateArray_2[4].getDouble(scan_rowIdx_0));
    /* 116 */             boolean scan_isNull_5 = scan_mutableStateArray_2[5].isNullAt(scan_rowIdx_0);
    /* 117 */             UTF8String scan_value_5 = scan_isNull_5 ? null : (scan_mutableStateArray_2[5].getUTF8String(scan_rowIdx_0));
    /* 118 */             boolean scan_isNull_6 = scan_mutableStateArray_2[6].isNullAt(scan_rowIdx_0);
    /* 119 */             UTF8String scan_value_6 = scan_isNull_6 ? null : (scan_mutableStateArray_2[6].getUTF8String(scan_rowIdx_0));
    /* 120 */             boolean scan_isNull_7 = scan_mutableStateArray_2[7].isNullAt(scan_rowIdx_0);
    /* 121 */             long scan_value_7 = scan_isNull_7 ? -1L : (scan_mutableStateArray_2[7].getLong(scan_rowIdx_0));
    /* 122 */             boolean scan_isNull_9 = scan_mutableStateArray_2[9].isNullAt(scan_rowIdx_0);
    /* 123 */             double scan_value_9 = scan_isNull_9 ? -1.0 : (scan_mutableStateArray_2[9].getDouble(scan_rowIdx_0));
    /* 124 */             boolean scan_isNull_10 = scan_mutableStateArray_2[10].isNullAt(scan_rowIdx_0);
    /* 125 */             int scan_value_10 = scan_isNull_10 ? -1 : (scan_mutableStateArray_2[10].getInt(scan_rowIdx_0));
    /* 126 */             boolean scan_isNull_11 = scan_mutableStateArray_2[11].isNullAt(scan_rowIdx_0);
    /* 127 */             UTF8String scan_value_11 = scan_isNull_11 ? null : (scan_mutableStateArray_2[11].getUTF8String(scan_rowIdx_0));
    /* 128 */             boolean scan_isNull_12 = scan_mutableStateArray_2[12].isNullAt(scan_rowIdx_0);
    /* 129 */             long scan_value_12 = scan_isNull_12 ? -1L : (scan_mutableStateArray_2[12].getLong(scan_rowIdx_0));
    /* 130 */             boolean scan_isNull_13 = scan_mutableStateArray_2[13].isNullAt(scan_rowIdx_0);
    /* 131 */             long scan_value_13 = scan_isNull_13 ? -1L : (scan_mutableStateArray_2[13].getLong(scan_rowIdx_0));
    /* 132 */             boolean scan_isNull_14 = scan_mutableStateArray_2[14].isNullAt(scan_rowIdx_0);
    /* 133 */             long scan_value_14 = scan_isNull_14 ? -1L : (scan_mutableStateArray_2[14].getLong(scan_rowIdx_0));
    /* 134 */             boolean scan_isNull_15 = scan_mutableStateArray_2[15].isNullAt(scan_rowIdx_0);
    /* 135 */             UTF8String scan_value_15 = scan_isNull_15 ? null : (scan_mutableStateArray_2[15].getUTF8String(scan_rowIdx_0));
    /* 136 */             boolean scan_isNull_16 = scan_mutableStateArray_2[16].isNullAt(scan_rowIdx_0);
    /* 137 */             UTF8String scan_value_16 = scan_isNull_16 ? null : (scan_mutableStateArray_2[16].getUTF8String(scan_rowIdx_0));
    /* 138 */             boolean scan_isNull_17 = scan_mutableStateArray_2[17].isNullAt(scan_rowIdx_0);
    /* 139 */             UTF8String scan_value_17 = scan_isNull_17 ? null : (scan_mutableStateArray_2[17].getUTF8String(scan_rowIdx_0));
    /* 140 */             boolean scan_isNull_18 = scan_mutableStateArray_2[18].isNullAt(scan_rowIdx_0);
    /* 141 */             UTF8String scan_value_18 = scan_isNull_18 ? null : (scan_mutableStateArray_2[18].getUTF8String(scan_rowIdx_0));
    /* 142 */             boolean scan_isNull_19 = scan_mutableStateArray_2[19].isNullAt(scan_rowIdx_0);
    /* 143 */             int scan_value_19 = scan_isNull_19 ? -1 : (scan_mutableStateArray_2[19].getInt(scan_rowIdx_0));
    /* 144 */             boolean scan_isNull_20 = scan_mutableStateArray_2[20].isNullAt(scan_rowIdx_0);
    /* 145 */             UTF8String scan_value_20 = scan_isNull_20 ? null : (scan_mutableStateArray_2[20].getUTF8String(scan_rowIdx_0));
    /* 146 */             boolean scan_isNull_21 = scan_mutableStateArray_2[21].isNullAt(scan_rowIdx_0);
    /* 147 */             UTF8String scan_value_21 = scan_isNull_21 ? null : (scan_mutableStateArray_2[21].getUTF8String(scan_rowIdx_0));
    /* 148 */             boolean scan_isNull_22 = scan_mutableStateArray_2[22].isNullAt(scan_rowIdx_0);
    /* 149 */             int scan_value_22 = scan_isNull_22 ? -1 : (scan_mutableStateArray_2[22].getInt(scan_rowIdx_0));
    /* 150 */             boolean scan_isNull_23 = scan_mutableStateArray_2[23].isNullAt(scan_rowIdx_0);
    /* 151 */             int scan_value_23 = scan_isNull_23 ? -1 : (scan_mutableStateArray_2[23].getInt(scan_rowIdx_0));
    /* 152 */             boolean scan_isNull_24 = scan_mutableStateArray_2[24].isNullAt(scan_rowIdx_0);
    /* 153 */             byte scan_value_24 = scan_isNull_24 ? (byte)-1 : (scan_mutableStateArray_2[24].getByte(scan_rowIdx_0));
    /* 154 */             boolean scan_isNull_25 = scan_mutableStateArray_2[25].isNullAt(scan_rowIdx_0);
    /* 155 */             UTF8String scan_value_25 = scan_isNull_25 ? null : (scan_mutableStateArray_2[25].getUTF8String(scan_rowIdx_0));
    /* 156 */             boolean scan_isNull_26 = scan_mutableStateArray_2[26].isNullAt(scan_rowIdx_0);
    /* 157 */             double scan_value_26 = scan_isNull_26 ? -1.0 : (scan_mutableStateArray_2[26].getDouble(scan_rowIdx_0));
    /* 158 */             boolean scan_isNull_27 = scan_mutableStateArray_2[27].isNullAt(scan_rowIdx_0);
    /* 159 */             double scan_value_27 = scan_isNull_27 ? -1.0 : (scan_mutableStateArray_2[27].getDouble(scan_rowIdx_0));
    /* 160 */             boolean scan_isNull_28 = scan_mutableStateArray_2[28].isNullAt(scan_rowIdx_0);
    /* 161 */             int scan_value_28 = scan_isNull_28 ? -1 : (scan_mutableStateArray_2[28].getInt(scan_rowIdx_0));
    /* 162 */             boolean scan_isNull_29 = scan_mutableStateArray_2[29].isNullAt(scan_rowIdx_0);
    /* 163 */             UTF8String scan_value_29 = scan_isNull_29 ? null : (scan_mutableStateArray_2[29].getUTF8String(scan_rowIdx_0));
    /* 164 */             boolean scan_isNull_30 = scan_mutableStateArray_2[30].isNullAt(scan_rowIdx_0);
    /* 165 */             int scan_value_30 = scan_isNull_30 ? -1 : (scan_mutableStateArray_2[30].getInt(scan_rowIdx_0));
    /* 166 */             boolean scan_isNull_31 = scan_mutableStateArray_2[31].isNullAt(scan_rowIdx_0);
    /* 167 */             UTF8String scan_value_31 = scan_isNull_31 ? null : (scan_mutableStateArray_2[31].getUTF8String(scan_rowIdx_0));
    /* 168 */             scan_mutableStateArray_4[3].reset();
    /* 169 */
    /* 170 */             scan_mutableStateArray_5[3].zeroOutNullBytes();
    /* 171 */
    /* 172 */             if (scan_isNull_0) {
    /* 173 */               scan_mutableStateArray_5[3].setNullAt(0);
    /* 174 */             } else {
    /* 175 */               scan_mutableStateArray_5[3].write(0, scan_value_0);
    /* 176 */             }
    /* 177 */
    /* 178 */             if (scan_isNull_1) {
    /* 179 */               scan_mutableStateArray_5[3].setNullAt(1);
    /* 180 */             } else {
    /* 181 */               scan_mutableStateArray_5[3].write(1, scan_value_1);
    /* 182 */             }
    /* 183 */
    /* 184 */             if (scan_isNull_2) {
    /* 185 */               scan_mutableStateArray_5[3].setNullAt(2);
    /* 186 */             } else {
    /* 187 */               scan_mutableStateArray_5[3].write(2, scan_value_2);
    /* 188 */             }
    /* 189 */
    /* 190 */             if (scan_isNull_3) {
    /* 191 */               scan_mutableStateArray_5[3].setNullAt(3);
    /* 192 */             } else {
    /* 193 */               scan_mutableStateArray_5[3].write(3, scan_value_3);
    /* 194 */             }
    /* 195 */
    /* 196 */             if (scan_isNull_4) {
    /* 197 */               scan_mutableStateArray_5[3].setNullAt(4);
    /* 198 */             } else {
    /* 199 */               scan_mutableStateArray_5[3].write(4, scan_value_4);
    /* 200 */             }
    /* 201 */
    /* 202 */             if (scan_isNull_5) {
    /* 203 */               scan_mutableStateArray_5[3].setNullAt(5);
    /* 204 */             } else {
    /* 205 */               scan_mutableStateArray_5[3].write(5, scan_value_5);
    /* 206 */             }
    /* 207 */
    /* 208 */             if (scan_isNull_6) {
    /* 209 */               scan_mutableStateArray_5[3].setNullAt(6);
    /* 210 */             } else {
    /* 211 */               scan_mutableStateArray_5[3].write(6, scan_value_6);
    /* 212 */             }
    /* 213 */
    /* 214 */             if (scan_isNull_7) {
    /* 215 */               scan_mutableStateArray_5[3].setNullAt(7);
    /* 216 */             } else {
    /* 217 */               scan_mutableStateArray_5[3].write(7, scan_value_7);
    /* 218 */             }
    /* 219 */
    /* 220 */             scan_mutableStateArray_5[3].write(8, scan_value_8);
    /* 221 */
    /* 222 */             if (scan_isNull_9) {
    /* 223 */               scan_mutableStateArray_5[3].setNullAt(9);
    /* 224 */             } else {
    /* 225 */               scan_mutableStateArray_5[3].write(9, scan_value_9);
    /* 226 */             }
    /* 227 */
    /* 228 */             if (scan_isNull_10) {
    /* 229 */               scan_mutableStateArray_5[3].setNullAt(10);
    /* 230 */             } else {
    /* 231 */               scan_mutableStateArray_5[3].write(10, scan_value_10);
    /* 232 */             }
    /* 233 */
    /* 234 */             if (scan_isNull_11) {
    /* 235 */               scan_mutableStateArray_5[3].setNullAt(11);
    /* 236 */             } else {
    /* 237 */               scan_mutableStateArray_5[3].write(11, scan_value_11);
    /* 238 */             }
    /* 239 */
    /* 240 */             if (scan_isNull_12) {
    /* 241 */               scan_mutableStateArray_5[3].setNullAt(12);
    /* 242 */             } else {
    /* 243 */               scan_mutableStateArray_5[3].write(12, scan_value_12);
    /* 244 */             }
    /* 245 */
    /* 246 */             if (scan_isNull_13) {
    /* 247 */               scan_mutableStateArray_5[3].setNullAt(13);
    /* 248 */             } else {
    /* 249 */               scan_mutableStateArray_5[3].write(13, scan_value_13);
    /* 250 */             }
    /* 251 */
    /* 252 */             if (scan_isNull_14) {
    /* 253 */               scan_mutableStateArray_5[3].setNullAt(14);
    /* 254 */             } else {
    /* 255 */               scan_mutableStateArray_5[3].write(14, scan_value_14);
    /* 256 */             }
    /* 257 */
    /* 258 */             if (scan_isNull_15) {
    /* 259 */               scan_mutableStateArray_5[3].setNullAt(15);
    /* 260 */             } else {
    /* 261 */               scan_mutableStateArray_5[3].write(15, scan_value_15);
    /* 262 */             }
    /* 263 */
    /* 264 */             if (scan_isNull_16) {
    /* 265 */               scan_mutableStateArray_5[3].setNullAt(16);
    /* 266 */             } else {
    /* 267 */               scan_mutableStateArray_5[3].write(16, scan_value_16);
    /* 268 */             }
    /* 269 */
    /* 270 */             if (scan_isNull_17) {
    /* 271 */               scan_mutableStateArray_5[3].setNullAt(17);
    /* 272 */             } else {
    /* 273 */               scan_mutableStateArray_5[3].write(17, scan_value_17);
    /* 274 */             }
    /* 275 */
    /* 276 */             if (scan_isNull_18) {
    /* 277 */               scan_mutableStateArray_5[3].setNullAt(18);
    /* 278 */             } else {
    /* 279 */               scan_mutableStateArray_5[3].write(18, scan_value_18);
    /* 280 */             }
    /* 281 */
    /* 282 */             if (scan_isNull_19) {
    /* 283 */               scan_mutableStateArray_5[3].setNullAt(19);
    /* 284 */             } else {
    /* 285 */               scan_mutableStateArray_5[3].write(19, scan_value_19);
    /* 286 */             }
    /* 287 */
    /* 288 */             if (scan_isNull_20) {
    /* 289 */               scan_mutableStateArray_5[3].setNullAt(20);
    /* 290 */             } else {
    /* 291 */               scan_mutableStateArray_5[3].write(20, scan_value_20);
    /* 292 */             }
    /* 293 */
    /* 294 */             if (scan_isNull_21) {
    /* 295 */               scan_mutableStateArray_5[3].setNullAt(21);
    /* 296 */             } else {
    /* 297 */               scan_mutableStateArray_5[3].write(21, scan_value_21);
    /* 298 */             }
    /* 299 */
    /* 300 */             if (scan_isNull_22) {
    /* 301 */               scan_mutableStateArray_5[3].setNullAt(22);
    /* 302 */             } else {
    /* 303 */               scan_mutableStateArray_5[3].write(22, scan_value_22);
    /* 304 */             }
    /* 305 */
    /* 306 */             if (scan_isNull_23) {
    /* 307 */               scan_mutableStateArray_5[3].setNullAt(23);
    /* 308 */             } else {
    /* 309 */               scan_mutableStateArray_5[3].write(23, scan_value_23);
    /* 310 */             }
    /* 311 */
    /* 312 */             if (scan_isNull_24) {
    /* 313 */               scan_mutableStateArray_5[3].setNullAt(24);
    /* 314 */             } else {
    /* 315 */               scan_mutableStateArray_5[3].write(24, scan_value_24);
    /* 316 */             }
    /* 317 */
    /* 318 */             if (scan_isNull_25) {
    /* 319 */               scan_mutableStateArray_5[3].setNullAt(25);
    /* 320 */             } else {
    /* 321 */               scan_mutableStateArray_5[3].write(25, scan_value_25);
    /* 322 */             }
    /* 323 */
    /* 324 */             if (scan_isNull_26) {
    /* 325 */               scan_mutableStateArray_5[3].setNullAt(26);
    /* 326 */             } else {
    /* 327 */               scan_mutableStateArray_5[3].write(26, scan_value_26);
    /* 328 */             }
    /* 329 */
    /* 330 */             if (scan_isNull_27) {
    /* 331 */               scan_mutableStateArray_5[3].setNullAt(27);
    /* 332 */             } else {
    /* 333 */               scan_mutableStateArray_5[3].write(27, scan_value_27);
    /* 334 */             }
    /* 335 */
    /* 336 */             if (scan_isNull_28) {
    /* 337 */               scan_mutableStateArray_5[3].setNullAt(28);
    /* 338 */             } else {
    /* 339 */               scan_mutableStateArray_5[3].write(28, scan_value_28);
    /* 340 */             }
    /* 341 */
    /* 342 */             if (scan_isNull_29) {
    /* 343 */               scan_mutableStateArray_5[3].setNullAt(29);
    /* 344 */             } else {
    /* 345 */               scan_mutableStateArray_5[3].write(29, scan_value_29);
    /* 346 */             }
    /* 347 */
    /* 348 */             if (scan_isNull_30) {
    /* 349 */               scan_mutableStateArray_5[3].setNullAt(30);
    /* 350 */             } else {
    /* 351 */               scan_mutableStateArray_5[3].write(30, scan_value_30);
    /* 352 */             }
    /* 353 */
    /* 354 */             if (scan_isNull_31) {
    /* 355 */               scan_mutableStateArray_5[3].setNullAt(31);
    /* 356 */             } else {
    /* 357 */               scan_mutableStateArray_5[3].write(31, scan_value_31);
    /* 358 */             }
    /* 359 */             scan_mutableStateArray_3[3].setTotalSize(scan_mutableStateArray_4[3].totalSize());
    /* 360 */             append(scan_mutableStateArray_3[3]);
    /* 361 */
    /* 362 */           } else {
    /* 363 */             locallimit_stopEarly_0 = true;
    /* 364 */           }
    /* 365 */
    /* 366 */         } while(false);
    /* 367 */         if (shouldStop()) { scan_batchIdx_0 = scan_rowIdx_0 + 1; return; }
    /* 368 */       }
    /* 369 */       scan_batchIdx_0 = scan_numRows_0;
    /* 370 */       scan_mutableStateArray_1[0] = null;
    /* 371 */       scan_nextBatch_0();
    /* 372 */     }
    /* 373 */     ((org.apache.spark.sql.execution.metric.SQLMetric) references[1] /* scanTime */).add(scan_scanTime_0 / (1000 * 1000));
    /* 374 */     scan_scanTime_0 = 0;
    /* 375 */   }
    /* 376 */
    /* 377 */   @Override
    /* 378 */   protected boolean stopEarly() {
    /* 379 */     return locallimit_stopEarly_0;
    /* 380 */   }
    /* 381 */
    /* 382 */   private void wholestagecodegen_init_0_0() {
    /* 383 */     scan_mutableStateArray_0[0] = inputs[0];
    /* 384 */
    /* 385 */     scan_mutableStateArray_3[0] = new UnsafeRow(32);
    /* 386 */     scan_mutableStateArray_4[0] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(scan_mutableStateArray_3[0], 512);
    /* 387 */     scan_mutableStateArray_5[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_mutableStateArray_4[0], 32);
    /* 388 */     scan_mutableStateArray_3[1] = new UnsafeRow(32);
    /* 389 */     scan_mutableStateArray_4[1] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(scan_mutableStateArray_3[1], 512);
    /* 390 */     scan_mutableStateArray_5[1] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_mutableStateArray_4[1], 32);
    /* 391 */     scan_mutableStateArray_3[2] = new UnsafeRow(32);
    /* 392 */     scan_mutableStateArray_4[2] = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(scan_mutableStateArray_3[2], 512);
    /* 393 */     scan_mutableStateArray_5[2] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(scan_mutableStateArray_4[2], 32);
    /* 394 */     scan_mutableStateArray_3[3] = new UnsafeRow(32);
    /* 395 */
    /* 396 */   }
    /* 397 */
    /* 398 */ }

    Why virtual calls are expensive:

    Take the SQL statement select a+b from table as a concrete example; evaluating it goes through the following steps:
        1. call the virtual function Add.eval(), which must check the data types on both sides of Add
        2. call the virtual function a.eval(), which must check a's data type
        3. confirm that a's type is int, and box it
        4. call the virtual function b.eval(), which must check b's data type
        5. confirm that b's type is int, and box it
        6. call the int version of add
        7. return the boxed result
        As the steps show, evaluating a single SQL expression involves several virtual function calls, and virtual calls are known to hurt performance badly. But why exactly do they?
        A common answer is: a virtual call adds one level of indirect addressing. Does that one extra indirection really slow things down noticeably? Clearly not.
        The real cost is that the CPU pipeline gets broken.
        A virtual call is runtime polymorphism: at compile time the concrete target is unknown. If the call were not virtual, its relative address would be fixed at compile time and the compiler could emit a direct jmp/call instruction. With a virtual call, the extra vtable lookup is the minor part; the key point is that the target address is dynamic. Say the address ends up in eax: after the call eax, all the instructions already prefetched into the pipeline have to be discarded. The longer the pipeline, the higher the cost of one failed branch prediction, as illustrated below (the Scala sketch after this listing shows the same idea at the operator level):
       pf->test
        001E146D mov eax,dword ptr[pf]
        011E1470 mov edx,dword ptr[eax]
        011E1472 mov esi,esp
        011E1474 mov ecx,dword ptr[pf]
        011E1477 mov eax,dword ptr[edx]
        011E1479 call eax <-----------------------branch prediction fails here
        011E147B cmp esi,esp
        011E147D call @ILT+355(__RTC_CheckEsp)(11E1168h)
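    To make the contrast concrete, here is a toy Scala sketch (not Spark's real expression classes): the interpreted version pays a virtual eval() call and boxing per operator per row, while the "generated" version is the kind of flat, monomorphic function that whole-stage code generation emits.

    // Toy sketch of the two execution styles (not Spark's real classes).
    sealed trait Expr { def eval(row: Array[Any]): Any }        // eval() is a virtual call
    case class Col(i: Int) extends Expr {
      def eval(row: Array[Any]): Any = row(i)                   // the value comes back boxed
    }
    case class Add(l: Expr, r: Expr) extends Expr {
      def eval(row: Array[Any]): Any =
        l.eval(row).asInstanceOf[Int] + r.eval(row).asInstanceOf[Int]
    }

    val rows = Array(Array[Any](1, 2), Array[Any](3, 4))
    val expr = Add(Col(0), Col(1))

    // Volcano style: three virtual eval() calls plus boxing for every row.
    val interpreted = rows.map(r => expr.eval(r))

    // Roughly what whole-stage codegen emits: one flat function, no virtual calls, no boxing.
    val generated = rows.map(r => r(0).asInstanceOf[Int] + r(1).asInstanceOf[Int])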

     

  • Spark run flow

    2018-03-27 13:02:58
    Run flow: 1. After the code is developed, package it into a jar and submit it to the cluster with the spark-submit script. The first thing that runs is new SparkContext() 2. During SparkContext initialization two important things happen: the DAGScheduler and...

    Run flow:

    1. After the code is developed, package it into a jar and submit it to the cluster using the spark-submit script. The first thing that runs is new SparkContext().
    2. While the SparkContext initializes, it does two important things: it creates the DAGScheduler and the TaskScheduler.
    3. The TaskScheduler registers with the Master and requests resources.
    4. Once the Executors have started, they register with the Driver.
    5. Only when an action is reached does the code actually execute. The DAGScheduler divides the DAG into stages according to its stage-splitting algorithm.
    6. The DAGScheduler sends each TaskSet to the TaskScheduler.
    7. The TaskScheduler sends the tasks to the Executors.
    8. The code actually runs (a minimal driver sketch follows this list).
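    A minimal driver program matching these steps; the object name, paths and word-count logic are placeholders. It is packaged into a jar, launched with spark-submit, and nothing runs on the cluster until the action in the last line:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WordCount")   // the driver starts here
        val sc   = new SparkContext(conf)                    // DAGScheduler and TaskScheduler are created

        val counts = sc.textFile(args(0))                    // transformations only build the lineage
          .flatMap(_.split("\\s+"))
          .map((_, 1))
          .reduceByKey(_ + _)                                // wide dependency -> stage boundary

        counts.saveAsTextFile(args(1))                       // the action triggers the job
        sc.stop()
      }
    }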

  • Spark basic run flow

    1,000+ views 2019-03-10 15:58:51
    1. Build the runtime environment for the Spark Application (start the SparkContext); the SparkContext registers with the resource manager (Standalone, Mesos or YARN) and requests Executor resources; 2. The resource manager allocates Executor resources and starts StandaloneExecutorBackend...
  • Spark run flow

    10,000+ views 2018-08-19 11:18:19
    Spark run flow. With anything, first know what happens, then why. I will briefly introduce the flow so beginners get a rough picture; the many names that appear are explained below. The Spark flow starts the moment the jar is submitted from the client with spark-submit; first...
  • Spark run flow, step by step

    2018-11-26 11:11:34
    For readers new to Spark: below is my consolidated Spark summary and a flow chart I put together; I hope it helps, and suggestions are welcome so we can improve together. 1. Understanding Spark: Apache Spark is a big-data processing framework built around speed, ease of use and sophisticated analytics, originally at...
  • Spark overall architecture and run flow

    1,000+ views 2019-06-29 17:45:05
    This section first introduces the Spark runtime architecture and basic terminology, then the basic run flow, and finally the core ideas and working principles of RDDs. Spark overall architecture: as shown in Figure 1, the Spark runtime architecture includes the cluster resource manager (Cluster Manager), multiple...
  • Spark execution flow

    2019-07-22 17:00:11
    Spark execution flow explained. 1. We submit the Spark program to the Spark cluster with spark-submit (shell); after submission a Driver process is created. 2. The Driver builds the SparkConf and initializes the SparkContext, and the SparkContext constructs the DAGScheduler and Task...
  • Spark task execution flow

    1,000+ views 2018-09-27 11:34:34
    This is the official Spark diagram; roughly it means four steps: 1. Build the DAG (directed acyclic graph) by calling methods on RDDs 2. The DAGScheduler splits the DAG into Stages (the split criterion is the Shuffle) and hands the Tasks generated for each Stage to the TaskScheduler as a TaskSet 3. The Task...
  • Spark execution flow

    1,000+ views 2019-05-16 23:29:00
    Four steps: 1. Build the DAG (by calling operators on RDDs) 2. The DAGScheduler splits the DAG into Stages... 3. The TaskScheduler schedules the Tasks (dispatching each Task to an appropriate Executor according to available resources) 4. The Executor receives the Task and drops it into its thread pool for execution ...
  • Spark execution flow

    2019-07-16 18:46:00
    Spark run flow. With anything, first know what happens, then why. I will briefly introduce the flow so beginners get a rough picture; among them...
  • Spark execution flow

    2018-08-15 17:11:45
    The most complete Spark execution flow; without further ado, straight to the diagram! If there are mistakes or omissions, please point them out, thank you!
  • 1. Brief Spark run flow: 1. Build the runtime environment of the Spark Application and start the SparkContext 2. The SparkContext requests Executor resources from the resource manager (Standalone, Mesos or YARN) and starts StandaloneExecutorBackend 3. ...
  • Differences between Spark and MapReduce, and the Spark run flow

    1,000+ views 2018-12-15 00:01:55
    1. Differences between Spark and MapReduce. MapReduce in brief: MapReduce is a computation framework in Hadoop whose core idea is to abstract programming into two methods, map and reduce; a programmer only needs to implement these two methods to complete a distributed computation, ...
  • Spark task execution flow

    2019-10-31 21:11:53
  • After a job is submitted with spark-submit, a corresponding driver process is started for it. Depending on the deploy-mode you use, the driver process may start locally or on one of the worker nodes in the cluster. ...
  • This series mainly describes the run flow of Spark Streaming and then analyses the source code of each step. I had always heard colleagues say how great the Spark source code is and just nodded along; today I finally dig into it myself. My use of Spark has mostly been through Spark Streaming...
  • The application execution flow is: the finished application is packaged into a jar file and uploaded to the cluster through the client. Depending on the Driver configuration mode, it either runs on the client or the master designates a worker to start the driver process, which monitors and manages the whole application. Then, ...
  • Learning Spark performance tuning from the official site and two high-quality blog posts, with a few summarized points to deepen understanding. Sources: "Spark performance optimization guide -- basics", "Spark performance optimization guide -- advanced" and the official docs. To understand resource tuning you first need to know the Spark run flow, and to understand the run flow you need to know some...
  • Spark execution flow, with annotated Spark source. For the whole Spark source-analysis series I have put the annotated Spark source and analysis files on my GitHub (Spark source dissection); forks and stars are welcome. Process description: 1. The Master is started via a shell script; the Master class extends Actor...
