What Is the Scala Type Mapping for All Spark SQL DataTypes

Directly from the Spark SQL and DataFrame Guide:

Data type     | Value type in Scala
ByteType      | Byte
ShortType     | Short
IntegerType   | Int
LongType      | Long
FloatType     | Float
DoubleType    | Double
DecimalType   | java.math.BigDecimal
StringType    | String
BinaryType    | Array[Byte]
BooleanType   | Boolean
TimestampType | java.sql.Timestamp
DateType      | java.sql.Date
ArrayType     | scala.collection.Seq
MapType       | scala.collection.Map
StructType    | org.apache.spark.sql.Row
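To see the table in action, here is a minimal sketch, assuming Spark 2.x or 3.x on the classpath and a local SparkSession. The object name TypeMappingDemo, the field names and the sample values are hypothetical. It builds a Row from values of the Scala types listed above, round-trips it through a DataFrame with a matching schema, and reads each value back with getAs:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object TypeMappingDemo {
  // Spark SQL schema: each field declares one of the DataTypes from the table
  val schema: StructType = StructType(Seq(
    StructField("id", LongType),
    StructField("price", DecimalType(10, 2)),
    StructField("created", TimestampType),
    StructField("tags", ArrayType(StringType)),
    StructField("attrs", MapType(StringType, IntegerType)),
    StructField("point", StructType(Seq(
      StructField("x", DoubleType),
      StructField("y", DoubleType))))
  ))

  // A Row populated with the matching Scala/Java value types
  val input: Row = Row(
    1L,
    new java.math.BigDecimal("9.99"),
    Timestamp.valueOf("2020-01-01 00:00:00"),
    Seq("a", "b"),
    Map("qty" -> 3),
    Row(1.0, 2.0))

  // Round-trip the Row through a DataFrame and return the Row Spark hands back
  def roundTrip(spark: SparkSession): Row = {
    val rdd = spark.sparkContext.parallelize(Seq(input))
    spark.createDataFrame(rdd, schema).head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("type-mapping").getOrCreate()
    try {
      val r = roundTrip(spark)
      // getAs casts each value to the Scala type listed in the table
      val id: Long = r.getAs[Long]("id")
      val price: java.math.BigDecimal = r.getAs[java.math.BigDecimal]("price")
      val created: Timestamp = r.getAs[Timestamp]("created")
      val tags: scala.collection.Seq[String] = r.getAs[scala.collection.Seq[String]]("tags")
      val attrs: scala.collection.Map[String, Int] = r.getAs[scala.collection.Map[String, Int]]("attrs")
      val point: Row = r.getAs[Row]("point")
      println(s"$id $price $created ${tags.mkString(",")} ${attrs("qty")} ${point.getDouble(0)}")
    } finally spark.stop()
  }
}
```

Asking for scala.collection.Seq and scala.collection.Map, as the table does, rather than the immutable defaults can avoid ClassCastExceptions on Scala 2.13 builds, where arrays may come back wrapped in a mutable sequence.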

JDBC data type to Spark SQL datatype

If you have access to a JDBC source with a table that already has the desired schema, you can simply copy the schema from there:

val jdbcOptions: Map[String, String] = ???  // e.g. url, dbtable, user, password
val jdbcSchema = sqlContext.read.format("jdbc").options(jdbcOptions).load().schema

Otherwise, you can build the schema from its JSON representation, which is quite simple: each StructField is represented as a document with the fields metadata, name, nullable and type.
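Spark itself can produce this representation: StructType.json (or prettyJson) serializes a schema, and DataType.fromJson parses it back. A small sketch, assuming a Spark 2.x/3.x dependency; SchemaJsonDemo is a hypothetical name, and the columns mirror the ones used in the lift-json snippet below:

```scala
import org.apache.spark.sql.types._

object SchemaJsonDemo {
  // Same three columns as the lift-json example, all non-nullable
  val schema: StructType = StructType(Seq(
    StructField("UserName", StringType, nullable = false),
    StructField("Age", LongType, nullable = false),
    StructField("Salary", DoubleType, nullable = false)
  ))

  def main(args: Array[String]): Unit = {
    // Each field is rendered as {"name": ..., "type": ..., "nullable": ..., "metadata": {}}
    println(schema.prettyJson)

    // Parsing the compact JSON back yields an equal StructType
    val parsed = DataType.fromJson(schema.json).asInstanceOf[StructType]
    println(parsed == schema)
  }
}
```

Printing the JSON of an existing schema this way is an easy way to check which type names ("string", "long", "double", and so on) fromJson expects.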


For most applications you can ignore metadata and focus on the remaining three. The tricky part is mapping from a Java class to a Spark SQL type, but a naive solution can look like this:

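import net.liftweb.json.JsonDSL._
import net.liftweb.json.{compact, render}
import org.apache.spark.sql.types.{DataType, StructType}

// (column name, Java class name) pairs; the lowercased simple class name
// ("string", "long", "double") doubles as the Spark JSON type name
val columns = Seq(
  ("UserName", "java.lang.String"),
  ("Age", "java.lang.Long"),
  ("Salary", "java.lang.Double")
).map { case (n, t) => (n, t.split("\\.").last.toLowerCase) }

// One JSON document per StructField
val fields = columns.map { case (n, t) =>
  ("metadata" -> Map.empty[String, String]) ~
  ("name" -> n) ~
  ("nullable" -> false) ~
  ("type" -> t)
}

// Wrap the fields in a struct document and let Spark parse it
val schemaJSON = compact(render(("fields" -> fields) ~ ("type" -> "struct")))
val schema = DataType.fromJson(schemaJSON).asInstanceOf[StructType]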
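If you would rather not generate JSON at all, the same schema can be built directly from StructField values. The sketch below swaps the lowercasing trick for an explicit lookup table; javaClassToDataType and toSchema are hypothetical helpers, the class list is an assumption covering only common cases, and mapping java.math.BigDecimal to DecimalType.SYSTEM_DEFAULT (precision 38, scale 18) is a choice you may want to change:

```scala
import org.apache.spark.sql.types._

object JavaClassSchema {
  // Hypothetical helper: explicit lookup instead of lowercasing the class name
  def javaClassToDataType(className: String): DataType = className match {
    case "java.lang.String"     => StringType
    case "java.lang.Integer"    => IntegerType
    case "java.lang.Long"       => LongType
    case "java.lang.Double"     => DoubleType
    case "java.lang.Float"      => FloatType
    case "java.lang.Short"      => ShortType
    case "java.lang.Byte"       => ByteType
    case "java.lang.Boolean"    => BooleanType
    case "java.math.BigDecimal" => DecimalType.SYSTEM_DEFAULT // assumed precision/scale
    case "java.sql.Timestamp"   => TimestampType
    case "java.sql.Date"        => DateType
    case other => throw new IllegalArgumentException(s"Unsupported class: $other")
  }

  // Build the StructType directly; every column is non-nullable, as in the JSON version
  def toSchema(columns: Seq[(String, String)]): StructType =
    StructType(columns.map { case (name, cls) =>
      StructField(name, javaClassToDataType(cls), nullable = false)
    })

  def main(args: Array[String]): Unit = {
    val schema = toSchema(Seq(
      ("UserName", "java.lang.String"),
      ("Age", "java.lang.Long"),
      ("Salary", "java.lang.Double")))
    schema.printTreeString()
  }
}
```

The explicit match fails loudly on an unknown class. The lowercasing approach works for classes such as String, Long, Double, Integer and Boolean only because their lowercased simple names happen to match Spark's JSON type names; java.math.BigDecimal, for example, would produce "bigdecimal", which is not one of them.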

Mapping List items to org.apache.spark.sql.Column type

You can get a Column object from a string by using the function col from org.apache.spark.sql.functions (you are actually already using it in your first snippet).

So this should work:

columnsToSum.map(col).reduce(_ + _)

or the more verbose version:

columnsToSum.map(c => col(c)).reduce(_ + _)
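Putting it together, here is a small sketch assuming a local SparkSession; SumColumnsDemo, sumColumns, withTotal and the column names a, b, c are hypothetical. It maps each name in columnsToSum to a Column with col, folds them together with +, and adds the result as a new column:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SumColumnsDemo {
  // Turn each name into a Column with col, then combine them with +
  def sumColumns(columnsToSum: List[String]): Column =
    columnsToSum.map(col).reduce(_ + _)

  // Append the row-wise sum as a new "total" column
  def withTotal(df: DataFrame, columnsToSum: List[String]): DataFrame =
    df.withColumn("total", sumColumns(columnsToSum))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sum-columns").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq((1, 2, 3), (4, 5, 6)).toDF("a", "b", "c")
      withTotal(df, List("a", "b", "c")).show()
    } finally spark.stop()
  }
}
```

Two caveats: reduce throws an UnsupportedOperationException on an empty list (foldLeft(lit(0))(_ + _) avoids that), and + yields null for a row whenever any of the summed columns is null; wrapping each column as coalesce(col(c), lit(0)) is one way to treat nulls as zero.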

