PySpark SQL: Converting a string to a timestamp with a custom format
Try:
select date_format(to_timestamp('072022','MMyyyy'),'yyyy-MM-dd HH:mm:ss.SSS')
Replace the literal '072022' with your column name.
How to convert a date string to a timestamp in Spark?
You haven't specified which values the new column should be built from. Use withColumn to add the new date column and point it at the values of the Date column.
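import org.apache.spark.sql.functions.{col, to_date}
import org.apache.spark.sql.types._
// Assumes a SparkSession named spark; spark-shell imports these implicits automatically.
import spark.implicits._
val df = Seq((20110813),(20090724)).toDF("Date")
// Cast to string and parse as yyyyMMdd; a cast to TimestampType would read the number as epoch seconds.
val newDf = df.withColumn("to_date", to_date(col("Date").cast(StringType), "yyyyMMdd"))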
newDf.show()
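The values in the Date column are digits laid out as year, month, day. As a quick check outside Spark, the expected dates can be reproduced with Python's standard datetime module. This is only a sketch of the intended result, not Spark code: the helper name parse_yyyymmdd is hypothetical, and Python's %Y%m%d stands in for Spark's yyyyMMdd pattern.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Parse an integer such as 20110813 as a yyyyMMdd calendar date."""
    # Convert to text first: the digits encode year, month and day,
    # not a count of seconds since the epoch.
    return datetime.strptime(str(value), "%Y%m%d").date()


for raw in (20110813, 20090724):
    print(raw, "->", parse_yyyymmdd(raw).isoformat())
```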
Spark fails to convert String to TIMESTAMP
Trim the six extra trailing zeros so that only millisecond precision remains, then parse:
df.withColumn("new", to_timestamp($"date".substr(lit(1),length($"date") - 6), "yyyy-MM-dd HH:mm:ss.SSS")).show(false)
The result is:
+-----------------------------+-------------------+
|date |new |
+-----------------------------+-------------------+
|2019-05-07 00:03:53.837000000|2019-05-07 00:03:53|
+-----------------------------+-------------------+
The schema:
root
|-- date: string (nullable = true)
|-- new: timestamp (nullable = true)
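The trimming step can be checked in plain Python, where the same constraint shows up: %f accepts at most six fractional digits, so a nine-digit fraction has to be shortened before parsing. The sketch below is an illustration only, not Spark code, and the helper name parse_trimmed and its extra_digits parameter are hypothetical.

```python
from datetime import datetime


def parse_trimmed(value: str, extra_digits: int = 6) -> datetime:
    """Drop trailing fraction digits, then parse what is left."""
    # For the sample value this leaves "2019-05-07 00:03:53.837".
    trimmed = value[:-extra_digits]
    return datetime.strptime(trimmed, "%Y-%m-%d %H:%M:%S.%f")


ts = parse_trimmed("2019-05-07 00:03:53.837000000")
print(ts.isoformat(sep=" ", timespec="milliseconds"))
```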
How to convert string date into timestamp in pyspark?
The value 06/21/2021 9:27 AM doesn't contain a second-of-minute field, so you should remove the :ss from the parser format. See this example:
spark.sql("select from_unixtime(unix_timestamp('06/21/2021 9:27 AM', 'MM/dd/yyyy hh:mm a')) ts").show()
+-------------------+
| ts|
+-------------------+
|2021-06-21 09:27:00|
+-------------------+
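The effect of a stray seconds field can also be seen in plain Python, shown here as a rough analogue rather than Spark code. The strptime codes %I and %p play the role of hh and a in the Spark pattern, and the sketch assumes an English/C locale so that "AM" is recognised.

```python
from datetime import datetime

raw = "06/21/2021 9:27 AM"

# %I is the 12-hour clock hour and %p the AM/PM marker; there is no %S
# because the input carries no seconds.
parsed = datetime.strptime(raw, "%m/%d/%Y %I:%M %p")
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))

# Asking for seconds that are not in the input makes the parse fail.
try:
    datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")
    seconds_pattern_failed = False
except ValueError as err:
    seconds_pattern_failed = True
    print("pattern with seconds fails:", err)
```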
Spark Scala: convert string to timestamp (1147880044 to MM/dd/yyyy HH:mm:ss format)
Use the from_unixtime and date_format functions.
scala> val df = Seq(("1","296","5.0","1147880044","null"),("1","306","3.5","1147868817","null")).toDF("userId","movieId","rating","ts","ratingtimestamp")
df: org.apache.spark.sql.DataFrame = [userId: string, movieId: string ... 3 more fields]
scala> df.show(false)
+------+-------+------+----------+---------------+
|userId|movieId|rating|ts |ratingtimestamp|
+------+-------+------+----------+---------------+
|1 |296 |5.0 |1147880044|null |
|1 |306 |3.5 |1147868817|null |
+------+-------+------+----------+---------------+
scala> df.withColumn("ratingtimestamp",date_format(from_unixtime($"ts"),"MM/dd/yyyy HH:mm:ss")).show(false)
+------+-------+------+----------+-------------------+
|userId|movieId|rating|ts |ratingtimestamp |
+------+-------+------+----------+-------------------+
|1 |296 |5.0 |1147880044|05/17/2006 21:04:04|
|1 |306 |3.5 |1147868817|05/17/2006 17:56:57|
+------+-------+------+----------+-------------------+
scala> df.withColumn("ratingtimestamp",from_unixtime($"ts","MM/dd/yyyy HH:mm:ss")).show(false)
+------+-------+------+----------+-------------------+
|userId|movieId|rating|ts |ratingtimestamp |
+------+-------+------+----------+-------------------+
|1 |296 |5.0 |1147880044|05/17/2006 21:04:04|
|1 |306 |3.5 |1147868817|05/17/2006 17:56:57|
+------+-------+------+----------+-------------------+
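Note that from_unixtime renders the result in the session time zone, so the same ts value can print differently on another cluster. The output above is consistent with a UTC+05:30 session; that offset is inferred from the numbers and is not stated in the original answer. A small plain-Python sketch (not Spark code, with the hypothetical helper format_epoch) makes the difference visible:

```python
from datetime import datetime, timedelta, timezone


def format_epoch(seconds: int, tz: timezone) -> str:
    """Render epoch seconds as MM/dd/yyyy HH:mm:ss in the given zone."""
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%m/%d/%Y %H:%M:%S")


# Assumed offset: chosen because it reproduces the sample output.
ist = timezone(timedelta(hours=5, minutes=30))

for ts in (1147880044, 1147868817):
    print(ts, "UTC:", format_epoch(ts, timezone.utc), "UTC+05:30:", format_epoch(ts, ist))
```

If the displayed values need to be stable across environments, it is worth fixing the session time zone explicitly rather than relying on the default.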