Convert Pyspark String to Date Format

Convert pyspark string to date format

Update (1/10/2018):

For Spark 2.2+ the best way to do this is probably using the to_date or to_timestamp functions, which both support the format argument. From the docs:

>>> from pyspark.sql.functions import to_timestamp
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>> df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()
[Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]

Original Answer (for Spark < 2.2)

It is possible (preferrable?) to do this without a udf:

from pyspark.sql.functions import unix_timestamp, from_unixtime

df = spark.createDataFrame(
[("11/25/1991",), ("11/24/1991",), ("11/30/1991",)],
['date_str']
)

df2 = df.select(
'date_str',
from_unixtime(unix_timestamp('date_str', 'MM/dd/yyy')).alias('date')
)

print(df2)
#DataFrame[date_str: string, date: timestamp]

df2.show(truncate=False)
#+----------+-------------------+
#|date_str |date |
#+----------+-------------------+
#|11/25/1991|1991-11-25 00:00:00|
#|11/24/1991|1991-11-24 00:00:00|
#|11/30/1991|1991-11-30 00:00:00|
#+----------+-------------------+

Convert date from String to Date format in Dataframes

Use to_date with Java SimpleDateFormat.

TO_DATE(CAST(UNIX_TIMESTAMP(date, 'MM/dd/yyyy') AS TIMESTAMP))

Example:

spark.sql("""
SELECT TO_DATE(CAST(UNIX_TIMESTAMP('08/26/2016', 'MM/dd/yyyy') AS TIMESTAMP)) AS newdate"""
).show()

+----------+
| dt|
+----------+
|2016-08-26|
+----------+

convert a string type(MM/dd/YYYY hh:mm:ss AM/PM) to date format in PySpark?

# Create data frame like below
df = spark.createDataFrame(
[("Test", "05/26/2021 11:31:56 AM")],
("user_name", "login_date"))

# Import functions
from pyspark.sql import functions as f

# Create data framew with new column new_date with data in desired format
df1 = df.withColumn("new_date", f.from_unixtime(f.unix_timestamp("login_date",'MM/dd/yyyy hh:mm:ss a'),'yyyy-MM-dd').cast('date'))

Convert string date to date format in pyspark SQL

I simplified your code now it will work fine (just tested it) :-)

%sql
SELECT
Col1,
Col2,
Col3,
Col4,
TO_DATE(TRIM(date), 'yyyy_MM') AS DATE,
sales
FROM
db.table

Change column type from string to date in Pyspark

from pyspark.sql.functions import col, unix_timestamp, to_date

#sample data
df = sc.parallelize([['12-21-2006'],
['05-30-2007'],
['01-01-1984'],
['12-24-2017']]).toDF(["date_in_strFormat"])
df.printSchema()

df = df.withColumn('date_in_dateFormat',
to_date(unix_timestamp(col('date_in_strFormat'), 'MM-dd-yyyy').cast("timestamp")))
df.show()
df.printSchema()

Output is:

root
|-- date_in_strFormat: string (nullable = true)

+-----------------+------------------+
|date_in_strFormat|date_in_dateFormat|
+-----------------+------------------+
| 12-21-2006| 2006-12-21|
| 05-30-2007| 2007-05-30|
| 01-01-1984| 1984-01-01|
| 12-24-2017| 2017-12-24|
+-----------------+------------------+

root
|-- date_in_strFormat: string (nullable = true)
|-- date_in_dateFormat: date (nullable = true)


Related Topics



Leave a reply



Submit