  • java Tuple2 maptopair lambda

    2020-08-02 13:45:03
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    public class Test1 {
        public static void main(String[] args) {
            // Prepend a random numeric prefix to the first element of each (x, y) tuple.

            SparkConf conf = new SparkConf().setAppName("appName").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<Tuple2<Long, Long>> data = Arrays.asList(
                    new Tuple2<>(111L, 222L), new Tuple2<>(100L, 200L));
            JavaRDD<Tuple2<Long, Long>> rdd = sc.parallelize(data);

    //        // Equivalent anonymous-class version:
    //        JavaPairRDD<String, Long> randomPrefixRdd = rdd.mapToPair(
    //                new PairFunction<Tuple2<Long, Long>, String, Long>() {
    //                    // input: Tuple2<Long, Long>; output: Tuple2<String, Long>
    //                    private static final long serialVersionUID = 1L;
    //                    @Override
    //                    public Tuple2<String, Long> call(Tuple2<Long, Long> tuple) throws Exception {
    //                        Random random = new Random();
    //                        int prefix = random.nextInt(10);
    //                        return new Tuple2<>(prefix + "_" + tuple._1(), tuple._2());
    //                    }
    //                });

            JavaPairRDD<String, Long> randomPrefixRdd = rdd.mapToPair(
                    x -> new Tuple2<>(new Random().nextInt(10) + "_" + x._1(), x._2()));

            System.out.println(randomPrefixRdd.collect());

            sc.close();
        }
    }
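    The salting step above does not depend on Spark itself. A minimal plain-Java sketch of the same transformation (the class name `PrefixSketch`, the `long[]` pair encoding, and the `Map.Entry` result type are illustrative assumptions, not part of the original post):

    ```java
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;
    import java.util.stream.Collectors;

    public class PrefixSketch {
        // Key salting: prepend a random 0-9 prefix to the first element of each
        // (long, long) pair, turning it into a (String, Long) pair.
        public static List<Map.Entry<String, Long>> addRandomPrefix(List<long[]> pairs, Random random) {
            return pairs.stream()
                    .map(p -> Map.entry(random.nextInt(10) + "_" + p[0], p[1]))
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<long[]> data = Arrays.asList(new long[]{111L, 222L}, new long[]{100L, 200L});
            // Each key becomes e.g. "7_111"; the exact prefix is random.
            System.out.println(addRandomPrefix(data, new Random()));
        }
    }
    ```

    Salting keys like this is commonly used to spread one skewed key across partitions before an aggregation; the prefix is stripped again after the partial aggregation.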
  • Tuple2 error: Incompatible equality constraint: T1 and String

    Tuple2 error: "Incompatible equality constraint: T1 and String"

    Problem 1

    **Solution:** most likely a version problem with the spark-core dependency; switching to a different version resolves it.
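    A minimal sketch of that fix in a Maven pom, assuming the project uses Maven; the Scala suffix and version below are examples only, not taken from the post — pick the pair that matches the rest of your Spark dependencies:

    ```xml
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.8</version>
    </dependency>
    ```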
  • 19/10/14 16:33:00 ERROR Executor: java.lang.RuntimeException: scala.Tuple2 is not a valid external type for schema of string
    19/10/14 16:33:00 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.Tuple2 is not a valid external type for schema of string
    if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, domain), StringType), true, false) AS domain#0
    if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, ip), StringType), true, false) AS ip#1
    	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:292)
    	at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:593)
    	at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:593)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:256)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    	at org.apache.spark.scheduler.Task.run(Task.scala:123)
    	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: scala.Tuple2 is not a valid external type for schema of string
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.StaticInvoke_0$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)
    	... 22 more
    

    Solution:

    A field was missing, so the rows did not match the schema.
  • java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2

    While pulling data with Spark today, I hit the following exception:

    Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2
      at $anonfun$1$$anonfun$apply$1.apply(<console>:27)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      at $anonfun$1.apply(<console>:27)
      at $anonfun$1.apply(<console>:27)
      at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
      at scala.collection.Iterator$class.foreach(Iterator.scala:893)
      at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
      at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      at org.apache.spark.scheduler.Task.run(Task.scala:99)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

    The DataFrame's schema is:

    root
     |-- cityId: long (nullable = true)
     |-- countryId: long (nullable = true)
     |-- outline: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- _1: double (nullable = true)
     |    |    |-- _2: double (nullable = true)
     |-- provinceId: long (nullable = true)
     |-- townId: long (nullable = true)
    

    Extracting the outline field as follows raises the error:

    val outline = df.select("outline").collect().map(row => row.getAs[Seq[(Double,Double)]]("outline"))

    I know of two fixes.
    The first: define a case class when generating the data, so that no tuples are stored. That was not an option for me: the data had already been generated, and regenerating it would take two days.
    The second:

     val lines = json.filter(row => !row.isNullAt(2)).select("outline").rdd.map { r =>
       val row: Seq[(Double, Double)] = r.getAs[Seq[Row]](0).map(x => (x.getDouble(0), x.getDouble(1)))
       row
     }.collect()(0)

    As for the cause, after some googling, one explanation seems credible:
    columns must be read out of a Row according to their type. Simple types such as String or Seq[Double] can be extracted directly, but for a type like Seq[(Double, Double)] a direct extraction loses the schema information: the values come through, but the schema is gone, so subsequent DataFrame operations throw.

  • Exception in thread "main" org.apache.flink.api.common.typeutils.CompositeType$InvalidFieldReferenceException: Cannot reference field by position on GenericType<scala.Tuple2>. Referencing a ...
  • Master's current message type: Tuple2. On the second case match the message surprisingly turned out to be a tuple type; looking closely at the debug output, no wonder `case content: PageContent` never matched. It turns out the message type is not simply the type of the message itself; it may also ...
  • tuple (tuples)

    2020-02-04 17:32:04
    tuple, the immutable list: tup=(‘沈阳’,‘大连’,‘盘锦’) ...tuple2=(4,5,6) tuple3=tuple1+tuple2 print(tuple3) (1, 2, 3, 4, 5, 6) tuple=('love','python') tuple1=tuple*2 print(tuple1) ('love', 'python', 'love',...
  • tuple

    2020-11-06 14:21:10
    A tuple is a heterogeneous collection of data, extending pair to support any number of elements. tuple was introduced in TR1; since the language did not yet support variadic templates, the original tuple was limited to a fixed maximum number of elements. After C++11 introduced variadic templates, tuple was reimplemented. tuple supports the following operations...
  • Tuples (tuple)

    2018-11-12 17:53:22
    tuple1 = (1,2) tuple2 = 1,2  Properties: immutable. The tuple itself is immutable, but if it contains mutable elements, those elements can still change.  Define an empty tuple: tuple1 = () tuple1 = tuple() Define a tuple with values: tuple2 = 1,2,3 ...
  • Tuples (tuple)

    2018-07-30 21:01:44
    tuple=("apple","banana"...tuple2=("apple") tuple3=("apple",) print(tuple) #print(tuple1[0]) print(tuple2[0]) print(tuple3[0]) """ print
  • Tuple

    2020-02-02 17:33:06
    Tuple. What is a Tuple? A Tuple is a sequence of Python objects whose contents are immutable, i.e. its elements cannot be edited, unlike a List;...a Tuple is defined by separating objects with commas, or...tup2 = (1, 2, 3, 4, 5, 6, 7) tup3 = "a", "b", "c", "d"# ...
  • Python tuple

    2017-01-19 16:10:00
    Python tuples are similar to lists, except that a tuple's elements cannot be modified ...tuple2=1,2,3,"a","b","c" tuple2 # a tuple containing a single element needs a trailing comma tuple3=(1) type(tuple3) int tuple4=(1,) type(...
  • tuple2 = ("hello",) print(tuple1[0]) print (tuple1[1:3]) print (tuple1[1:]) print (tuple2 * 2) print (tuple1+tuple2) Note: tuple2 has a trailing comma. The output is as follows: str (1, ['a', 'b', 'c']) (1, ['a'...
  • tuple, an array of arbitrary arity

    2021-01-20 13:39:11
    1. tuple 2. Functions 3. Test. 1. tuple: tuple extends pair, which treats two elements as a single unit, so that any number of elements can be treated as one unit. 2. Functions: tuple t; builds a tuple whose n elements have the given types; tuple t(v1,v2...vn); builds a tuple...
  • 1. Definition: an immutable sequence of data elements; a tuple's elements... When passing function arguments, (*arg) accepts any length and number of arguments and stores them in a tuple. #----------tuple syntax--------# tuple1 = (1,2,3,'a','v','g') tuple2 = 1,2,3,4,'s...
  • Tuple. Boost::tuple is a class similar to std::pair. A pair has exactly two members (first and second), while a tuple may have 0-10 elements. Using tuple requires including boost/tuple/tuple.hpp. For example: #include...tuple t1(1,2.0,"Hello");...
  • Tuples (tuple)

    2020-07-29 20:28:12
    tuple2=('程潇','刘亦菲','张丽') for index , item in enumerate(tuple2): print(index,item) Slicing: tuple2=('程潇','刘亦菲','张丽') print(tuple2[::-1]) Other tuple operations: print(('a','b')+
  • # Tuples (tuple) # Main operations: 1. in and not in; 2. comparison, concatenation, slicing and indexing; 3. min() and max(); 4. elements of different types are allowed # Create a tuple (the two forms below are equivalent) tuple1 = (1, 2, 'a') ...tuple2 = 1, 2, 'a' # output: (1, ...
  • 3. tuple (tuples)

    2019-09-18 14:59:12
    1. Tuples: tuple1=(1,2,3,4,5,6) uses parentheses (), unlike the square brackets [ ] of lists. Note: a tuple's elements cannot be modified; doing so raises an error. 2. Creating tuples and inserting: (1) creating a single-element tuple ...print(type(tuple2)) # returns the tuple type, cre...
