# Handling map-typed columns in PySpark

In Spark you will sometimes encounter columns of `array` or `map` type; these need to be converted into multiple rows.

## Explode array and map columns to rows

```python
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()

arrayData = [
    ('James', ['Java', 'Scala'], {'hair': 'black', 'eye': 'brown'}),
    ('Michael', ['Spark', 'Java', None], {'hair': 'brown', 'eye': None}),
    ('Robert', ['CSharp', ''], {'hair': 'red', 'eye': ''}),
    ('Washington', None, None),
    ('Jefferson', ['1', '2'], {})
]

df = spark.createDataFrame(data=arrayData, schema=['name', 'knownLanguages', 'properties'])
df.printSchema()
df.show()
```

Output:

```
root
 |-- name: string (nullable = true)
 |-- knownLanguages: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+----------+--------------+--------------------+
|      name|knownLanguages|          properties|
+----------+--------------+--------------------+
|     James| [Java, Scala]|[eye -> brown, ha...|
|   Michael|[Spark, Java,]|[eye ->, hair -> ...|
|    Robert|    [CSharp, ]|[eye -> , hair ->...|
|Washington|          null|                null|
| Jefferson|        [1, 2]|                  []|
+----------+--------------+--------------------+
```

### explode – ……