-
java Tuple2 mapToPair lambda
2020-08-02 13:45:03
Prepend a random numeric prefix to the first element of each (x, y) tuple:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class Test1 {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("appName").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        List<Tuple2<Long, Long>> data = Arrays.asList(new Tuple2<>(111L, 222L), new Tuple2<>(100L, 200L));
        JavaRDD<Tuple2<Long, Long>> rdd = sc.parallelize(data);
        // Anonymous-class form: input Tuple2<Long, Long>, output Tuple2<String, Long>
        // JavaPairRDD<String, Long> randomPrefixRdd = rdd.mapToPair(
        //     new PairFunction<Tuple2<Long, Long>, String, Long>() {
        //         private static final long serialVersionUID = 1L;
        //         @Override
        //         public Tuple2<String, Long> call(Tuple2<Long, Long> tuple) throws Exception {
        //             int prefix = new Random().nextInt(10);
        //             return new Tuple2<String, Long>(prefix + "_" + tuple._1, tuple._2);
        //         }
        //     });
        // Equivalent lambda form:
        JavaPairRDD<String, Long> randomPrefixRdd = rdd.mapToPair(
            x -> new Tuple2<>(new Random().nextInt(10) + "_" + x._1, x._2));
        System.out.println(randomPrefixRdd.collect());
    }
}
-
Tuple2 error: "Incompatible equality constraint: T1 and String"
2020-05-10 19:43:47
Tuple2 raises "Incompatible equality constraint: T1 and String". Solution: this is most likely a version problem in the spark-core dependency; switching to a different version fixed it.
-
scala.Tuple2 is not a valid external type for schema of string: cause of the error
2019-10-14 16:42:21
19/10/14 16:33:00 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.Tuple2 is not a valid external type for schema of string
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, domain), StringType), true, false) AS domain#0
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, ip), StringType), true, false) AS ip#1
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:292)
    at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:593)
    at org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:593)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:256)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: scala.Tuple2 is not a valid external type for schema of string
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.StaticInvoke_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)
    ... 22 more
Solution:
A field was missing, so the data did not match the declared schema.
-
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2
2017-09-07 17:11:26
While extracting data with Spark today I hit the following exception:
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2
    at $anonfun$1$$anonfun$apply$1.apply(<console>:27)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at $anonfun$1.apply(<console>:27)
    at $anonfun$1.apply(<console>:27)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:917)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
The df schema is:
root
 |-- cityId: long (nullable = true)
 |-- countryId: long (nullable = true)
 |-- outline: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: double (nullable = true)
 |    |    |-- _2: double (nullable = true)
 |-- provinceId: long (nullable = true)
 |-- townId: long (nullable = true)
Extracting the outline field with the following code raises the error:
val outline = df.select("outline").collect().map(row => row.getAs[Seq[(Double,Double)]]("outline"))
I know of two fixes:
First: when generating the data, define a case class instead of storing tuples. That is not an option for me: the data is already generated, and regenerating it would take two days.
Second:

val lines = json.filter(row => !row.isNullAt(2))
  .select("outline")
  .rdd
  .map(r => {
    val row: Seq[(Double, Double)] = r.getAs[Seq[Row]](0).map(x => (x.getDouble(0), x.getDouble(1)))
    row
  })
  .collect()(0)
As for the cause, after some googling, one explanation seems plausible: columns must be read out of a Row according to their type. Simple types such as String or Seq[Double] can be read directly, but a type like Seq[(Double, Double)] loses its schema information on a direct read; the values come out, but with the schema gone, subsequent DataFrame operations throw.
-
Flink: importing the scala Tuple2 package in Java code causes a failure
2020-11-20 15:01:17
Exception in thread "main" org.apache.flink.api.common.typeutils.CompositeType$InvalidFieldReferenceException: Cannot reference field by position on GenericType<scala.Tuple2>Referencing a ...
-
Scala "(of class scala.Tuple2)" exception: match case fails to match
2019-02-24 15:25:28
The Master's current message type: Tuple2. On the second match, the case turned out to be a tuple type; a careful look at the debugger explained why `case content: PageContent` never matched. The message type is not simply the type of the message content itself; it may also ...
-
tuple (tuples)
2020-02-04 17:32:04
A tuple is an immutable list.
tup = ('沈阳', '大连', '盘锦')
...
tuple2 = (4, 5, 6)
tuple3 = tuple1 + tuple2
print(tuple3)
(1, 2, 3, 4, 5, 6)
tuple = ('love', 'python')
tuple1 = tuple * 2
print(tuple1)
('love', 'python', 'love', ...
-
tuple
2020-11-06 14:21:10
A tuple is a heterogeneous collection of data; it extends pair to support any number of elements. tuple was introduced in TR1; since the language did not yet support variadic templates, the original tuple had a fixed upper bound on its element count. After C++11 introduced variadic templates, tuple was reimplemented. tuple supports the following operations ...
-
Tuples (tuple)
2018-11-12 17:53:22
tuple1 = (1, 2)
tuple2 = 1, 2
Property: immutable. The tuple itself cannot be changed, but if it contains mutable elements, those elements can still be modified.
Define an empty tuple:
tuple1 = ()
tuple1 = tuple()
Define a tuple with values:
tuple2 = 1, 2, 3 ...
-
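A minimal sketch (editor's addition; variable names are illustrative) of the tuple-immutability rule in the entry above: item assignment on the tuple itself fails, but a mutable element stored inside it can still change in place.

```python
t = (1, [2, 3])

# The tuple itself is immutable: item assignment raises TypeError.
try:
    t[0] = 99
except TypeError as e:
    print(e)

# But the list inside it is mutable and can be modified in place.
t[1].append(4)
print(t)  # (1, [2, 3, 4])
```

Note that t still refers to the same tuple object throughout; only the list it contains was mutated.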
Tuples (tuple)
2018-07-30 21:01:44
tuple = ("apple", "banana"...
tuple2 = ("apple")
tuple3 = ("apple",)
print(tuple)
#print(tuple1[0])
print(tuple2[0])
print(tuple3[0])
...
-
Tuple
2020-02-02 17:33:06
What is a Tuple? A Tuple is a sequence of Python objects whose contents are immutable, meaning its elements cannot be edited; this is the difference from a List. ... Define a Tuple by separating the objects with commas, or ...
tup2 = (1, 2, 3, 4, 5, 6, 7)
tup3 = "a", "b", "c", "d"  # ...
-
Python tuple
2017-01-19 16:10:00
Python tuples are similar to lists, except that a tuple's elements cannot be modified ...
tuple2 = 1, 2, 3, "a", "b", "c"
tuple2
# A tuple containing a single element needs a comma after that element
tuple3 = (1)
type(tuple3)
int
tuple4 = (1,)
type(...
-
python TypeError: can only concatenate tuple (not "str") to tuple
2020-06-06 20:35:28
tuple2 = ("hello",)
print(tuple1[0])
print(tuple1[1:3])
print(tuple1[1:])
print(tuple2 * 2)
print(tuple1 + tuple2)
Note: tuple2 has a trailing comma. The output is:
str
(1, ['a', 'b', 'c'])
(1, ['a'...
-
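A minimal reproduction of the error in the entry above (editor's sketch; values are illustrative): without the trailing comma, ("hello") is just the string "hello", so the concatenation is tuple + str and raises the TypeError.

```python
tuple1 = (1, 2, 3)

# Wrong: ("hello") is a plain string, so this is tuple + str.
try:
    print(tuple1 + ("hello"))
except TypeError as e:
    print(e)  # can only concatenate tuple (not "str") to tuple

# Right: the trailing comma makes ("hello",) a one-element tuple.
print(tuple1 + ("hello",))  # (1, 2, 3, 'hello')
```

The parentheses alone never make a tuple; the comma does.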
tuple: a unit of arbitrarily many values
2021-01-20 13:39:11
1. tuple  2. Functions  3. Tests
1. tuple: tuple extends pair's idea of treating two elements as one unit, allowing any number of elements to be treated as one unit.
2. Functions:
tuple<T1,...,Tn> t;  builds a tuple of n elements of the given types
tuple<T1,...,Tn> t(v1, v2, ..., vn);  builds a tuple ...
-
Python3 tuples (tuple)
2020-12-04 09:12:40
Definition: an immutable sequence of data elements; a tuple's elements ... When passing function arguments, (*args) accepts any number of positional arguments and stores them in a tuple.
# ---------- tuple syntax ----------
tuple1 = (1, 2, 3, 'a', 'v', 'g')
tuple2 = 1, 2, 3, 4, 's...
-
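The (*args) behavior described in the entry above, as a short sketch (editor's addition; the function name is made up): the positional arguments arrive collected into a tuple, whatever their number.

```python
def collect(*args):
    # args is a tuple holding all positional arguments passed in.
    print(type(args).__name__, args)
    return args

collect(1, 2, 3)   # prints: tuple (1, 2, 3)
collect('a')       # prints: tuple ('a',)
collect()          # prints: tuple ()
```

Because args is a tuple, the usual tuple operations (indexing, slicing, len, in) all apply to it inside the function.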
Learning boost 2: Tuple and ref
2007-08-17 11:11:00
boost::tuple is a class similar to std::pair. A pair has exactly two members (first and second), while a tuple may hold 0 to 10 elements. To use tuple, include the boost/tuple/tuple.hpp header. For example:
#include...
tuple<int, double, std::string> t1(1, 2.0, "Hello"); ...
-
Tuples (tuple)
2020-07-29 20:28:12
tuple2 = ('程潇', '刘亦菲', '张丽')
for index, item in enumerate(tuple2):
    print(index, item)
Slicing:
tuple2 = ('程潇', '刘亦菲', '张丽')
print(tuple2[::-1])
Other tuple operations:
print(('a', 'b') +
-
2. Python data structures: tuples (tuple)
2018-03-22 20:25:00
# Tuples (tuple)
# Main operations: 1. in and not in; 2. comparison, concatenation, slicing, and indexing; 3. min() and max(); 4. elements of different types may be mixed
# Creating a tuple (the two forms below are equivalent)
tuple1 = (1, 2, 'a')
tuple2 = 1, 2, 'a'
# output: (1, ...
-
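The operation groups listed in the entry above, sketched with the same tuple1/tuple2 values (editor's addition):

```python
tuple1 = (1, 2, 'a')
tuple2 = 1, 2, 'a'

print(tuple1 == tuple2)        # True: both creation forms build the same tuple
print('a' in tuple1)           # True
print('b' not in tuple1)       # True
print(tuple1 + (3,))           # (1, 2, 'a', 3): concatenation
print(tuple1[1:])              # (2, 'a'): slicing
print(tuple1[0])               # 1: indexing
print(min((3, 1, 2)), max((3, 1, 2)))  # 1 3: min()/max() need comparable elements
```

Note that min()/max() and ordering comparisons require mutually comparable elements; they would raise TypeError on a mixed tuple like (1, 2, 'a').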
3. tuple (tuples)
2019-09-18 14:59:12
1. Tuples
tuple1 = (1, 2, 3, 4, 5, 6)
Wrapped in parentheses (), unlike a list's square brackets [].
Note: the elements of a tuple cannot be modified; attempting to do so raises an error.
2. Creating tuples and single-element tuples
(1) Creating a single-element tuple
...
print(type(tuple2))  # returns the tuple type ...