What is the Scala type mapping for all Spark SQL DataTypes?
Directly from the Spark SQL and DataFrame Guide:
Data type | Value type in Scala
------------------------------------------------
ByteType | Byte
ShortType | Short
IntegerType | Int
LongType | Long
FloatType | Float
DoubleType | Double
DecimalType | java.math.BigDecimal
StringType | String
BinaryType | Array[Byte]
BooleanType | Boolean
TimestampType | java.sql.Timestamp
DateType | java.sql.Date
ArrayType | scala.collection.Seq
MapType | scala.collection.Map
StructType | org.apache.spark.sql.Row
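Here is a minimal sketch of what the table means in practice. It assumes the spark-sql (catalyst) jar is on the classpath; no SparkSession is needed just to build a Row. A value stored in a Row has the Scala type from the right-hand column, and Row's typed getters return exactly those types. The object name RowTypeMappingDemo and the sample values are made up for illustration.

```scala
import org.apache.spark.sql.Row

// Hypothetical demo object: a sample value for most rows of the table above.
object RowTypeMappingDemo {
  val row: Row = Row(
    1.toByte,                            // ByteType    -> Byte
    42,                                  // IntegerType -> Int
    42L,                                 // LongType    -> Long
    2.5,                                 // DoubleType  -> Double
    new java.math.BigDecimal("9.99"),    // DecimalType -> java.math.BigDecimal
    "text",                              // StringType  -> String
    Array[Byte](1, 2, 3),                // BinaryType  -> Array[Byte]
    true,                                // BooleanType -> Boolean
    java.sql.Date.valueOf("2020-01-01"), // DateType    -> java.sql.Date
    Seq(1, 2, 3),                        // ArrayType   -> scala.collection.Seq
    Map("a" -> 1),                       // MapType     -> scala.collection.Map
    Row("nested", 7)                     // StructType  -> org.apache.spark.sql.Row
  )

  def main(args: Array[String]): Unit = {
    // Each typed getter returns the Scala type listed in the table.
    val b: Byte = row.getByte(0)
    val i: Int = row.getInt(1)
    val l: Long = row.getLong(2)
    val d: Double = row.getDouble(3)
    val dec: java.math.BigDecimal = row.getDecimal(4)
    val s: String = row.getString(5)
    val bin: Array[Byte] = row.getAs[Array[Byte]](6)
    val flag: Boolean = row.getBoolean(7)
    val date: java.sql.Date = row.getDate(8)
    val arr: Seq[Int] = row.getSeq[Int](9)
    val m: scala.collection.Map[String, Int] = row.getMap[String, Int](10)
    val nested: Row = row.getStruct(11)
    println(s"$b $i $l $d $dec $s ${bin.length} $flag $date $arr $m ${nested.getString(0)}")
  }
}
```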
JDBC data type to Spark SQL DataType
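If you have access to a JDBC source with a table that already has the schema you need, you can simply copy it from there. In the snippet, spark is an existing SparkSession; on Spark 1.x the equivalent call was sqlContext.load("jdbc", jdbcOptions), which was removed in Spark 2.0:
val jdbcOptions: Map[String, String] = ???  // e.g. "url", "dbtable", "user", "password"
val jdbcSchema = spark.read.format("jdbc").options(jdbcOptions).load().schema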
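Once a schema has been copied this way, you can save it as a JSON string with StructType.json and restore it later with DataType.fromJson, so the JDBC connection is only needed once. The sketch below uses a hand-built StructType as a stand-in for jdbcSchema. The column names and the object name are assumptions for illustration.

```scala
import org.apache.spark.sql.types._

object SchemaRoundTrip {
  // Hypothetical stand-in for the schema obtained from the JDBC source.
  val jdbcSchema: StructType = StructType(Seq(
    StructField("UserName", StringType, nullable = true),
    StructField("Age", LongType, nullable = true),
    StructField("Salary", DoubleType, nullable = true)
  ))

  // Serialize the schema, e.g. to store it in a file or a config value.
  val schemaJson: String = jdbcSchema.json

  // Later: rebuild the StructType from the stored JSON without touching the database.
  val restored: StructType = DataType.fromJson(schemaJson).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    println(schemaJson)
    println(restored == jdbcSchema)
  }
}
```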
The JSON representation is quite simple. Each StructField is represented as a document with the fields metadata, name, nullable and type, for example:
{"metadata":{},"name":"f","nullable":true,"type":"string"}
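To confirm that Spark accepts this shape, you can wrap the field document above in a schema-level {"type":"struct","fields":[...]} document and parse it with DataType.fromJson. This is a minimal sketch, and the object name is hypothetical.

```scala
import org.apache.spark.sql.types._

object FieldJsonDemo {
  // The single-field document from the text, wrapped in a struct-level document.
  val schemaJson: String =
    """{"type":"struct","fields":[{"metadata":{},"name":"f","nullable":true,"type":"string"}]}"""

  val schema: StructType = DataType.fromJson(schemaJson).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    val field = schema("f") // look up the StructField by name
    println(s"${field.name}: ${field.dataType}, nullable = ${field.nullable}")
  }
}
```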
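For most applications you can ignore metadata and focus on the remaining three. A complete schema then wraps a list of such documents as {"fields":[...],"type":"struct"}. The tricky part is mapping a Java class name to a Spark type name, but a naive solution (using lift-json to build the JSON) can look like this: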
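import net.liftweb.json.JsonDSL._
import net.liftweb.json.{compact, render}
import org.apache.spark.sql.types.{DataType, StructType}

// (column name, Java class name) pairs. The naive mapping keeps the lowercased
// simple class name ("java.lang.Long" -> "long"), which breaks for classes whose
// simple name is not a Spark type name, e.g. java.math.BigDecimal.
val columns = Seq(
  ("UserName", "java.lang.String"),
  ("Age", "java.lang.Long"),
  ("Salary", "java.lang.Double")
).map{case (n, t) => (n, t.split("\\.").last.toLowerCase)}

// One JSON document per StructField
val fields = columns.map {case (n, t) => (
  ("metadata" -> Map.empty[String, String]) ~
  ("name" -> n) ~
  ("nullable" -> false) ~
  ("type" -> t)
)}

// Wrap the fields in a struct-level document and let Spark parse it
val schemaJSON = compact(render(("fields" -> fields) ~ ("type" -> "struct")))
val schema = DataType.fromJson(schemaJSON).asInstanceOf[StructType]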
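The naive approach above only lowercases the last segment of the class name, so it fails for classes whose simple name is not a Spark type name (java.math.BigDecimal becomes "bigdecimal", for example). An alternative is to map class names to DataType objects explicitly and build the StructType directly, skipping JSON altogether. The sketch below assumes the same (name, class name) input. The lookup table is hand-written for illustration, and DecimalType(38, 18) is an assumed precision and scale.

```scala
import org.apache.spark.sql.types._

object ExplicitSchemaMapping {
  // Hand-written (hypothetical) lookup from Java class name to Spark SQL type.
  val javaClassToType: Map[String, DataType] = Map(
    "java.lang.String"     -> StringType,
    "java.lang.Byte"       -> ByteType,
    "java.lang.Short"      -> ShortType,
    "java.lang.Integer"    -> IntegerType,
    "java.lang.Long"       -> LongType,
    "java.lang.Float"      -> FloatType,
    "java.lang.Double"     -> DoubleType,
    "java.lang.Boolean"    -> BooleanType,
    "java.math.BigDecimal" -> DecimalType(38, 18), // assumed precision and scale
    "java.sql.Timestamp"   -> TimestampType,
    "java.sql.Date"        -> DateType
  )

  // Unknown classes fail fast with a clear message instead of producing invalid JSON.
  def toSchema(columns: Seq[(String, String)], nullable: Boolean = false): StructType =
    StructType(columns.map { case (name, cls) =>
      val dataType = javaClassToType.getOrElse(
        cls, throw new IllegalArgumentException(s"No Spark SQL type mapping for $cls"))
      StructField(name, dataType, nullable)
    })

  def main(args: Array[String]): Unit = {
    val schema = toSchema(Seq(
      ("UserName", "java.lang.String"),
      ("Age", "java.lang.Long"),
      ("Salary", "java.math.BigDecimal")
    ))
    println(schema.treeString)
  }
}
```

An explicit table is more verbose than string manipulation, but it makes every supported class visible and turns an unsupported one into an immediate, readable error.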
Mapping List items to org.apache.spark.sql.Column type
You can get a Column object from a string with the col function from org.apache.spark.sql.functions (you are actually already using it in your first snippet).
So this should work:
import org.apache.spark.sql.functions.col
columnsToSum.map(col).reduce(_ + _)
or the more verbose version:
columnsToSum.map(c => col(c)).reduce(_ + _)
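Here is a self-contained sketch of the approach. It assumes spark-sql is on the classpath and that a local SparkSession is acceptable. The column names a, b, c, the sample rows and the object name are made up. It swaps reduce for reduceOption with lit(0) as a fallback, because reduce throws on an empty list.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object SumColumnsDemo {
  // Turn column names into Column objects and add them up;
  // an empty list yields the literal 0 instead of throwing.
  def sumColumns(columnsToSum: Seq[String]): Column =
    columnsToSum.map(col).reduceOption(_ + _).getOrElse(lit(0))

  // Adds a "total" column to a small sample DataFrame and returns its values.
  def rowTotals(spark: SparkSession): Seq[Int] = {
    import spark.implicits._
    val df = Seq((1, 2, 3), (10, 20, 30)).toDF("a", "b", "c")
    df.withColumn("total", sumColumns(List("a", "b", "c")))
      .select("total").as[Int].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sum-columns").getOrCreate()
    try println(rowTotals(spark))
    finally spark.stop()
  }
}
```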