Split Date into Different Columns for Year, Month and Day

How to split a date column into separate day, month and year columns in pandas

The problem is that datetime_utc is your index rather than a column, so you have to go through the index to create the new columns:

df['day'] = df.index.day
df['month'] = df.index.month
df['year'] = df.index.year

print(df)
                 Dewptm  Fog   Humidity    Pressurem      Tempm     Wspdm  \
datetime_utc                                                                
1996-11-01    11.666667  0.0  52.916667 -2659.666667  22.333333  2.466667   
1996-11-02    10.458333  0.0  48.625000  1009.833333  22.916667  8.028571   
1996-11-03    12.041667  0.0  55.958333  1010.500000  21.791667  4.804545   
1996-11-04    10.222222  0.0  48.055556  1011.333333  22.722222  1.964706   

              Rainfall  day  month  year  
datetime_utc                              
1996-11-01           0    1     11  1996  
1996-11-02           0    2     11  1996  
1996-11-03           0    3     11  1996  
1996-11-04           0    4     11  1996

If you want datetime_utc as a column, reset your index first. You can then reach the datetime attributes through the .dt accessor (dt.day, dt.month and dt.year), as follows:

# Reset our index so datetime_utc becomes a column
df.reset_index(inplace=True)

# Create new columns
df['day'] = df['datetime_utc'].dt.day
df['month'] = df['datetime_utc'].dt.month
df['year'] = df['datetime_utc'].dt.year

print(df)
  datetime_utc     Dewptm  Fog   Humidity    Pressurem      Tempm     Wspdm  \
0   1996-11-01  11.666667  0.0  52.916667 -2659.666667  22.333333  2.466667   
1   1996-11-02  10.458333  0.0  48.625000  1009.833333  22.916667  8.028571   
2   1996-11-03  12.041667  0.0  55.958333  1010.500000  21.791667  4.804545   
3   1996-11-04  10.222222  0.0  48.055556  1011.333333  22.722222  1.964706   

   Rainfall  day  month  year  
0         0    1     11  1996  
1         0    2     11  1996  
2         0    3     11  1996  
3         0    4     11  1996

Note: if your index is not of datetime type yet, convert it before you try to extract the year, month and day:

df.index = pd.to_datetime(df.index)
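
To make the conversion step concrete, here is a minimal self-contained sketch. The sample values and the single Tempm column are made up for illustration; only the datetime_utc index name comes from the question.

```python
import pandas as pd

# Hypothetical sample data: the index holds date strings, not datetimes.
df = pd.DataFrame(
    {"Tempm": [22.3, 22.9, 21.8]},
    index=["1996-11-01", "1996-11-02", "1996-11-03"],
)

# Convert the string index to a DatetimeIndex, then name it.
df.index = pd.to_datetime(df.index)
df.index.name = "datetime_utc"

# The date parts are now available as attributes of the index.
df["day"] = df.index.day
df["month"] = df.index.month
df["year"] = df.index.year

print(df)
```

Without the pd.to_datetime call, the index would be a plain Index of strings, which has no day, month or year attributes.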

How can I separate this date time column into different columns?

You can convert it to POSIXct and then use format to extract the day, month, year and time.

x <- c("17/09/2019 9:15:27 a.m.", "17/09/2019 9:15:27 p.m.")
x <- gsub("\\.", "", x) #Remove the . in a.m.
x <- as.POSIXct(x, format="%d/%m/%Y %I:%M:%S %p") #convert to POSIX
data.frame(day   = format(x, "%d"), 
           month = format(x, "%m"),
           year  = format(x, "%Y"),
           time  = format(x, "%T"))
#  day month year     time
#1  17    09 2019 09:15:27
#2  17    09 2019 21:15:27

If simply splitting the string into columns is enough, I would use strsplit and split on / or a space.

x <- c("17/09/2019 9:15:27 a.m.", "17/09/2019 9:15:27 p.m.")
do.call(rbind, strsplit(x, "[/ ]"))
#     [,1] [,2] [,3]   [,4]      [,5]  
#[1,] "17" "09" "2019" "9:15:27" "a.m."
#[2,] "17" "09" "2019" "9:15:27" "p.m."
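
For readers working in pandas rather than R, the same parse-then-format idea might look roughly like the sketch below. It is only an assumed counterpart to the POSIXct approach, not part of the original answer. The %p directive is locale-dependent, so the AM/PM matching may behave differently outside an English or C locale.

```python
import pandas as pd

x = pd.Series(["17/09/2019 9:15:27 a.m.", "17/09/2019 9:15:27 p.m."])

# Drop the dots so "a.m." / "p.m." become "am" / "pm", which %p can match.
cleaned = x.str.replace(".", "", regex=False)

# %I is the hour on a 12-hour clock; %p is the AM/PM marker.
parsed = pd.to_datetime(cleaned, format="%d/%m/%Y %I:%M:%S %p")

# Pull the parts out through the .dt accessor.
parts = pd.DataFrame(
    {
        "day": parsed.dt.day,
        "month": parsed.dt.month,
        "year": parsed.dt.year,
        "time": parsed.dt.strftime("%H:%M:%S"),
    }
)
print(parts)
```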

How to split a date index into separate day, month and year columns in pandas

If your dates are the index, then your code should have worked. However, if the dates are in a column named date, try:

df['day'] = df.date.dt.day
df['month'] = df.date.dt.month
df['year'] = df.date.dt.year
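
The .dt accessor only works on datetime-like columns, so a column that was read in as strings needs converting first. A minimal sketch, assuming a hypothetical frame with a string date column and a made-up value column:

```python
import pandas as pd

# Hypothetical frame whose "date" column was read in as plain strings.
df = pd.DataFrame({"date": ["2019-09-17", "2020-01-05"], "value": [1, 2]})

# Convert to datetime64 so that the .dt accessor becomes available.
df["date"] = pd.to_datetime(df["date"])

# assign() adds all three columns in one step and returns a new frame.
df = df.assign(
    day=df["date"].dt.day,
    month=df["date"].dt.month,
    year=df["date"].dt.year,
)
print(df)
```

Using df["date"] instead of df.date avoids clashes if a column name ever shadows a DataFrame attribute.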

Split date into different columns for year, month and day

1) columns. We can use lubridate's year/month/day or chron's month.day.year:

1a) columns via lubridate

library(zoo)
z <- zoo(1:1000, as.Date("1932-01-01") + 0:999)

library(lubridate)
tt <- time(z)
zz <- cbind(z, year = year(tt), month = month(tt), day = day(tt))

1b) columns via chron

library(zoo)
z <- zoo(1:1000, as.Date("1932-01-01") + 0:999)

library(chron)
zz <- with(month.day.year(time(z)), zoo(cbind(z, day, month, year)))

2) aggregate. However, we do not really need to create columns in the first place. We can just use aggregate.zoo directly with the original zoo object, z, using lubridate or chron or just using yearmon from zoo depending on what it is that you want to do:

2a) aggregate using lubridate

library(zoo)
z <- zoo(1:1000, as.Date("1932-01-01") + 0:999)

library(lubridate)
aggregate(z, day, mean)
aggregate(z, month, mean)
aggregate(z, year, mean)

2b) aggregate using chron

library(zoo)
z <- zoo(1:1000, as.Date("1932-01-01") + 0:999)

library(chron)
mdy <- month.day.year(time(z))

aggregate(z, mdy$day, mean)
aggregate(z, mdy$month, mean)
aggregate(z, mdy$year, mean)

# or
ct <- as.chron(time(z))

aggregate(z, days(ct), mean)
aggregate(z, months(ct), mean)
aggregate(z, years(ct), mean)

# days(ct) and years(ct) can actually
# be shortened to just days and years within the above context
# (and that would work for months too except they would be out of order)
aggregate(z, days, mean)
aggregate(z, years, mean)

2c) aggregate using yearmon

If we wish to summarize each year/month rather than lumping all January months together, all February months together, etc. then we need neither chron nor lubridate but rather can use zoo's yearmon:

library(zoo)
z <- zoo(1:1000, as.Date("1932-01-01") + 0:999)

aggregate(z, yearmon, mean)
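
The distinction between lumping all Januaries together and summarizing each year/month separately can also be sketched in pandas. This is an assumed analogue of the zoo calls above, not a translation the original answer provides; it rebuilds the same 1000-day series and groups it both ways.

```python
import pandas as pd

# Same shape as the zoo example: values 1..1000 on consecutive days.
idx = pd.date_range("1932-01-01", periods=1000, freq="D")
s = pd.Series(range(1, 1001), index=idx)

# Lump by calendar month: all Januaries together, all Februaries together, etc.
by_month = s.groupby(s.index.month).mean()

# Summarize each year/month separately, similar in spirit to zoo's yearmon.
by_yearmon = s.groupby(s.index.to_period("M")).mean()

print(by_month.head())
print(by_yearmon.head())
```

Here by_month has 12 groups, while by_yearmon has one group per distinct year/month in the data.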

Split date into day of the week, month, year using PySpark

Spark 1.5 and higher has many date processing functions (note that dayofweek was only added in Spark 2.3). Here are some that may be useful for you:

from pyspark.sql.functions import col, dayofweek, month, year

# dayofweek returns 1 for Sunday through 7 for Saturday
df = df.withColumn('dayOfWeek', dayofweek(col('your_date_column')))
df = df.withColumn('month', month(col('your_date_column')))
df = df.withColumn('year', year(col('your_date_column')))