What Is the Scala Type Mapping for All Spark SQL DataTypes

Directly from the Spark SQL and DataFrame Guide:

Data type       |    Value type in Scala
------------------------------------------------
ByteType        |    Byte   
ShortType       |    Short  
IntegerType     |    Int    
LongType        |    Long   
FloatType       |    Float  
DoubleType      |    Double     
DecimalType     |    java.math.BigDecimal
StringType      |    String
BinaryType      |    Array[Byte]
BooleanType     |    Boolean 
TimestampType   |    java.sql.Timestamp
DateType        |    java.sql.Date
ArrayType       |    scala.collection.Seq   
MapType         |    scala.collection.Map   
StructType      |    org.apache.spark.sql.Row
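To make the table concrete, here is a minimal sketch of reading values out of a Row with the Scala types listed above. It assumes Spark SQL is on the classpath; the schema, field names, and values are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

object RowMappingSketch {
  // Hypothetical schema: each field's DataType determines the Scala type of its value.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("visits", LongType, nullable = false),
    StructField("scores", ArrayType(DoubleType), nullable = true),
    StructField("attrs", MapType(StringType, StringType), nullable = true)
  ))

  // A Row whose values line up with the schema above, position by position.
  // Row.apply does not attach the schema; it is shown here for reference only.
  val row: Row = Row("alice", 42L, Seq(1.5, 2.5), Map("tier" -> "gold"))

  val name: String = row.getString(0)                             // StringType -> String
  val visits: Long = row.getLong(1)                               // LongType   -> Long
  val scores: scala.collection.Seq[Double] = row.getSeq[Double](2) // ArrayType -> Seq
  val attrs: scala.collection.Map[String, String] =
    row.getMap[String, String](3)                                 // MapType    -> Map
}
```

The typed getters cast the stored value, so asking for the wrong type fails at runtime rather than at compile time.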

JDBC data type to Spark SQL DataType

If you have access to a JDBC source with a table that has the schema you need, you can simply copy it from there:

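// jdbcOptions must hold the connection settings (for example "url" and "dbtable").
val jdbcOptions: Map[String, String] = ???
// sqlContext.load("jdbc", ...) was deprecated in Spark 1.4; the reader API replaces it.
val jdbcSchema = sqlContext.read.format("jdbc").options(jdbcOptions).load().schema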

The JSON representation is quite simple. Each StructField is represented as a document with the fields metadata, name, nullable and type.

{"metadata":{},"name":"f","nullable":true,"type":"string"}

For most applications you can ignore metadata and focus on the remaining three. The tricky part is mapping from a Java class name to a Spark SQL type name, but a naive solution can look like this:

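import net.liftweb.json.JsonDSL._
import net.liftweb.json.{compact, render}
import org.apache.spark.sql.types.{DataType, StructType}

// Column name paired with its Java class; the type name is derived naively
// by lowercasing the simple class name (java.lang.String -> string).
val columns = Seq(
    ("UserName", "java.lang.String"),
    ("Age", "java.lang.Long"),
    ("Salary", "java.lang.Double")
).map { case (n, t) => (n, t.split("\\.").last.toLowerCase) }

// One JSON document per StructField.
val fields = columns.map { case (n, t) => (
    ("metadata" -> Map.empty[String, String]) ~
    ("name" -> n) ~
    ("nullable" -> false) ~
    ("type" -> t)
)}

// Wrap the fields in a struct document and parse it back into a StructType.
val schemaJSON = compact(render(("fields" -> fields) ~ ("type" -> "struct")))
val schema = DataType.fromJson(schemaJSON).asInstanceOf[StructType]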
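The lowercasing trick in the naive solution only works when the simple class name happens to match a Spark SQL type name; java.math.BigDecimal, for instance, would become "bigdecimal", which is not one. A slightly more defensive sketch uses an explicit lookup table instead. The object and method names below are hypothetical, the table is deliberately incomplete, and the type names are the ones I understand DataType.fromJson to accept, so check them against your Spark version.

```scala
// Explicit lookup from JVM class names to Spark SQL JSON type names.
// Illustrative only: extend the table with whatever classes your source uses.
object SparkTypeNames {
  val byClassName: Map[String, String] = Map(
    "java.lang.Byte"       -> "byte",
    "java.lang.Short"      -> "short",
    "java.lang.Integer"    -> "integer",
    "java.lang.Long"       -> "long",
    "java.lang.Float"      -> "float",
    "java.lang.Double"     -> "double",
    "java.lang.String"     -> "string",
    "java.lang.Boolean"    -> "boolean",
    "java.math.BigDecimal" -> "decimal",
    "java.sql.Timestamp"   -> "timestamp",
    "java.sql.Date"        -> "date"
  )

  // Returns Left with a message for unknown classes instead of guessing a name.
  def typeName(className: String): Either[String, String] =
    byClassName.get(className).toRight(s"No Spark SQL type known for $className")
}
```

Replacing t.split("\\.").last.toLowerCase with a call to SparkTypeNames.typeName surfaces unsupported classes before the JSON is handed to DataType.fromJson.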

Mapping List items to org.apache.spark.sql.Column type

You can get a Column object from a string by using the col function (you are actually already using it in your first snippet).

So this should work:

columnsToSum.map(col).reduce(_ + _)

or, as a more verbose version:

columnsToSum.map(c => col(c)).reduce(_ + _)
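For context, a minimal sketch of how that expression might be used, assuming Spark SQL is on the classpath. The object name, the helper names, and the "total" output column are hypothetical; only col, reduce, and withColumn come from Spark and the Scala standard library.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

object SumColumnsSketch {
  // Builds a single Column expression that adds up the named columns.
  def sumOf(columnsToSum: Seq[String]): Column = {
    // reduce throws on an empty collection, so fail early with a clearer message.
    require(columnsToSum.nonEmpty, "columnsToSum must contain at least one column name")
    columnsToSum.map(col).reduce(_ + _)
  }

  // Appends the sum as a new column named "total" (a hypothetical name).
  def withTotal(df: DataFrame, columnsToSum: Seq[String]): DataFrame =
    df.withColumn("total", sumOf(columnsToSum))
}
```

Building the Column does not touch any data; the names are only resolved when the expression is applied to a DataFrame.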
