What Is the Scala Type Mapping for All Spark SQL DataTypes

Directly from the Spark SQL and DataFrame Guide:

Data type     | Value type in Scala
--------------|-------------------------
ByteType      | Byte
ShortType     | Short
IntegerType   | Int
LongType      | Long
FloatType     | Float
DoubleType    | Double
DecimalType   | java.math.BigDecimal
StringType    | String
BinaryType    | Array[Byte]
BooleanType   | Boolean
TimestampType | java.sql.Timestamp
DateType      | java.sql.Date
ArrayType     | scala.collection.Seq
MapType       | scala.collection.Map
StructType    | org.apache.spark.sql.Row
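To see the mapping in practice, here is a minimal sketch that builds a DataFrame from Rows holding the Scala types in the table and reads them back with the typed Row getters. It assumes Spark 2.x or later on the classpath and a local SparkSession; the column names and values are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Local session, for illustration only
val spark = SparkSession.builder().master("local[1]").appName("type-mapping").getOrCreate()

// A schema using several of the Spark SQL types listed in the table
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType),
  StructField("score", DoubleType),
  StructField("tags", ArrayType(StringType)),
  StructField("attrs", MapType(StringType, IntegerType))
))

// Each value is supplied as the matching Scala type from the table
val rows = Seq(Row(1L, "alice", 9.5, Seq("a", "b"), Map("k" -> 1)))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

// Reading the values back yields the same Scala types
val r: Row = df.head()
val id: Long = r.getLong(0)
val name: String = r.getString(1)
val score: Double = r.getDouble(2)
val tags = r.getSeq[String](3)                                   // ArrayType -> Seq
val attrs: scala.collection.Map[String, Int] = r.getMap[String, Int](4) // MapType -> Map
```

Supplying a value of the wrong Scala type (for example an Int where LongType is declared) typically fails at runtime when the rows are converted, not at compile time.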

JDBC data type to Spark SQL DataType

If you have access to a JDBC source with a table that already has the desired schema, you can simply copy the schema from there:

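// Connection options for the source table, e.g. "url" and "dbtable"
val jdbcOptions: Map[String, String] = ???
// Replaces the deprecated sqlContext.load("jdbc", jdbcOptions)
val jdbcSchema = sqlContext.read.format("jdbc").options(jdbcOptions).load().schema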

The JSON representation of a schema is quite simple. Each StructField is represented as a document with the fields metadata, name, nullable and type:

{"metadata":{},"name":"f","nullable":true,"type":"string"}
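You do not have to write this JSON by hand to inspect it: every Spark DataType can render itself with `.json`, and `DataType.fromJson` parses it back. A minimal sketch, assuming only the Spark SQL types are on the classpath (no SparkSession is needed); the single field `f` mirrors the example above.

```scala
import org.apache.spark.sql.types._

// A one-field schema matching the JSON example above
val schema = StructType(Seq(StructField("f", StringType, nullable = true)))

// Serialize to the compact JSON form: a struct wrapping a "fields" array
val json: String = schema.json

// Parse it back into a StructType
val parsed = DataType.fromJson(json).asInstanceOf[StructType]
```

This round trip is also a convenient way to persist a schema copied from another source, such as the JDBC table above.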

For most applications you can ignore metadata and focus on the remaining three. The tricky part is mapping a Java class name to a Spark type name, but a naive solution can look like this:

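import net.liftweb.json.JsonDSL._
import net.liftweb.json.{compact, render}
import org.apache.spark.sql.types.{DataType, StructType}

// (column name, Java class name) pairs; keep only the lower-cased simple
// class name, e.g. "java.lang.Long" -> "long", which matches Spark's JSON type name
val columns = Seq(
  ("UserName", "java.lang.String"),
  ("Age", "java.lang.Long"),
  ("Salary", "java.lang.Double")
).map { case (n, t) => (n, t.split("\\.").last.toLowerCase) }

// One JSON document per StructField
val fields = columns.map { case (n, t) => (
  ("metadata" -> Map.empty[String, String]) ~
  ("name" -> n) ~
  ("nullable" -> false) ~
  ("type" -> t)
)}

// Wrap the fields in a struct and let Spark parse the result
val schemaJSON = compact(render(("fields" -> fields) ~ ("type" -> "struct")))
val schema = DataType.fromJson(schemaJSON).asInstanceOf[StructType]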
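If you would rather avoid lift-json and the JSON round trip, another option is to map class names straight to DataType objects. This is a sketch of a different technique, not part of the answer above: `javaClassToSparkType` and `toSchema` are hypothetical names, only the listed classes are supported, and the DecimalType(38, 18) precision and scale are an arbitrary assumption.

```scala
import org.apache.spark.sql.types._

// Hypothetical lookup table: Java class name -> Spark SQL DataType.
// Only these classes are supported; extend as needed.
val javaClassToSparkType: Map[String, DataType] = Map(
  "java.lang.String"     -> StringType,
  "java.lang.Integer"    -> IntegerType,
  "java.lang.Long"       -> LongType,
  "java.lang.Double"     -> DoubleType,
  "java.lang.Float"      -> FloatType,
  "java.lang.Boolean"    -> BooleanType,
  "java.lang.Short"      -> ShortType,
  "java.lang.Byte"       -> ByteType,
  "java.math.BigDecimal" -> DecimalType(38, 18), // assumed precision/scale
  "java.sql.Date"        -> DateType,
  "java.sql.Timestamp"   -> TimestampType
)

// Hypothetical helper: build a StructType, failing loudly on unknown classes
def toSchema(columns: Seq[(String, String)], nullable: Boolean = false): StructType =
  StructType(columns.map { case (name, cls) =>
    val dt = javaClassToSparkType.getOrElse(cls,
      throw new IllegalArgumentException(s"Unsupported class: $cls"))
    StructField(name, dt, nullable)
  })

val schema = toSchema(Seq(
  ("UserName", "java.lang.String"),
  ("Age", "java.lang.Long"),
  ("Salary", "java.lang.Double")
))
```

Unlike the string-splitting approach, an unsupported class such as java.util.UUID raises an error here instead of producing an unknown type name.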

Mapping List items to org.apache.spark.sql.Column type

You can get a Column object from a column name by using the col function from org.apache.spark.sql.functions (you are actually already using it in your first snippet).

So this should work:

import org.apache.spark.sql.functions.col
columnsToSum.map(col).reduce(_ + _)

or the more verbose version:

columnsToSum.map(c => col(c)).reduce(_ + _)
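A minimal end-to-end sketch of the same idea, assuming a local SparkSession and a made-up DataFrame; `columnsToSum` here is a hypothetical list of column names standing in for the one in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").appName("sum-columns").getOrCreate()
import spark.implicits._

// Made-up data with three integer columns
val df = Seq((1, 2, 3), (4, 5, 6)).toDF("a", "b", "c")
val columnsToSum = List("a", "b", "c") // hypothetical column list

// Turn each name into a Column, then add them pairwise into one expression
val withTotal = df.withColumn("total", columnsToSum.map(col).reduce(_ + _))
val totals = withTotal.select("total").as[Int].collect().toSeq
```

Note that `reduce` throws on an empty list; `columnsToSum.map(col).foldLeft(lit(0))(_ + _)` (with `org.apache.spark.sql.functions.lit` imported) yields a literal 0 column in that case instead.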

