Combine Separate Year and Month Columns into Single Date Column

I'd use zoo::as.yearmon as follows:

library(zoo)
df$Date <- as.yearmon(paste(df$year, df$month), "%Y %m")

It will not print like the desired output (e.g. 2011-01); a yearmon prints as Jan 2011 by default. If you need the text form, format(df$Date, "%Y-%m") gives it.

However, IMO this approach is better than m0h3n's because df$Date is stored as a yearmon object rather than a string, so it can be handled like a date. For example, if you store df$Date as a string, you will have a hard time plotting your data over time and with similar date-based operations.
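For comparison, a rough pandas analogue of a yearmon is a monthly Period. A minimal sketch, assuming a hypothetical frame with integer year and month columns; it builds the first day of each month and then converts it to a monthly period, which still prints as 2011-01 but behaves as a time value:

```python
import pandas as pd

# Hypothetical frame with integer year and month columns
df = pd.DataFrame({"year": [2011, 2011, 2012], "month": [1, 2, 12]})

# Assemble the first day of each month, then convert it to a monthly Period;
# like a yearmon, this is a time type rather than a plain string
first_of_month = pd.to_datetime(df[["year", "month"]].assign(day=1))
df["Date"] = first_of_month.dt.to_period("M")

print(df["Date"].astype(str).tolist())  # ['2011-01', '2011-02', '2012-12']
```

Because the column has a period dtype, comparisons and sorting work chronologically, as they would with a yearmon in R.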

Problems with date format. How to combine separate month and year columns into a single date?

I think you just need to have the string in one of the default formats that can be read by as.yearmon. From the docs, one of these is "%b %Y", so you can do:

library(dplyr)
library(zoo)

df %>% mutate(date = as.yearmon(paste(periodName, year)))
#> # A tibble: 6 x 8
#> year period periodName latest value footnotes seriesID date
#> <int> <chr> <chr> <chr> <int> <chr> <chr> <yearmon>
#> 1 2020 M07 July true 139582 "P preliminary" CES0000000001 Jul 2020
#> 2 2020 M06 June <NA> 137819 "P preliminary" CES0000000001 Jun 2020
#> 3 2020 M05 May <NA> 133028 "" CES0000000001 May 2020
#> 4 2020 M04 April <NA> 130303 "" CES0000000001 Apr 2020
#> 5 2020 M03 March <NA> 151090 "" CES0000000001 Mar 2020
#> 6 2020 M02 February <NA> 152463 "C corrected" CES0000000001 Feb 2020

Data

df <- structure(list(year = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L
), period = c("M07", "M06", "M05", "M04", "M03", "M02"), periodName = c("July",
"June", "May", "April", "March", "February"), latest = c("true",
NA, NA, NA, NA, NA), value = c(139582L, 137819L, 133028L, 130303L,
151090L, 152463L), footnotes = c("P preliminary", "P preliminary",
"", "", "", "C corrected"), seriesID = c("CES0000000001", "CES0000000001",
"CES0000000001", "CES0000000001", "CES0000000001", "CES0000000001"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

df
#> year period periodName latest value footnotes seriesID
#> 1 2020 M07 July true 139582 P preliminary CES0000000001
#> 2 2020 M06 June <NA> 137819 P preliminary CES0000000001
#> 3 2020 M05 May <NA> 133028 CES0000000001
#> 4 2020 M04 April <NA> 130303 CES0000000001
#> 5 2020 M03 March <NA> 151090 CES0000000001
#> 6 2020 M02 February <NA> 152463 C corrected CES0000000001

Created on 2020-08-07 by the reprex package (v0.3.0)

Combine month and year columns to create date column

You get the "Invalid argument, not a string or column" error because one of the arguments in your concat_ws('/', df.month, 1, df.year) call, the literal 1, is neither a Column nor a string (a string is interpreted as the name of a column). You can fix it by wrapping the literal in the built-in lit function, as follows:

from pyspark.sql import functions as F

df = df.select(F.concat_ws('/', df.month, F.lit(1), df.year).alias('Month'), df["*"])

Cleanly combine year and month columns to single date column with pandas

Option 1
Select YEAR and MONTH, add a constant DAY column with assign, and pass the resulting three-column frame (YEAR, MONTH, DAY) to pd.to_datetime.

import pandas as pd

df['DATE'] = pd.to_datetime(df[['YEAR', 'MONTH']].assign(DAY=1))
df

ID MONTH YEAR DATE
0 A 1 2017 2017-01-01
1 B 2 2017 2017-02-01
2 C 3 2017 2017-03-01
3 D 4 2017 2017-04-01
4 E 5 2017 2017-05-01
5 F 6 2017 2017-06-01

Option 2
String concatenation, with pd.to_datetime.

pd.to_datetime(df.YEAR.astype(str) + '/' + df.MONTH.astype(str) + '/01')

0 2017-01-01
1 2017-02-01
2 2017-03-01
3 2017-04-01
4 2017-05-01
5 2017-06-01
dtype: datetime64[ns]
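A self-contained version of both options, using a sample frame reconstructed from the printed output above (ID, MONTH and YEAR for the first six months of 2017). Option 2 is given an explicit format string here, which is an addition to the original one-liner; it makes the parse unambiguous:

```python
import pandas as pd

# Sample frame reconstructed from the printed output above
df = pd.DataFrame({
    "ID": list("ABCDEF"),
    "MONTH": [1, 2, 3, 4, 5, 6],
    "YEAR": [2017] * 6,
})

# Option 1: assemble from columns (upper-case YEAR/MONTH/DAY names are accepted)
opt1 = pd.to_datetime(df[["YEAR", "MONTH"]].assign(DAY=1))

# Option 2: build "YYYY/M/01" strings and parse them with an explicit format
opt2 = pd.to_datetime(
    df["YEAR"].astype(str) + "/" + df["MONTH"].astype(str) + "/01",
    format="%Y/%m/%d",
)

df["DATE"] = opt1
print(df)
print((opt1 == opt2).all())  # both options give the same dates
```

Option 1 avoids the intermediate strings entirely, so it is usually the cleaner choice when the columns are already numeric.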

Combine Year and Month columns into a Date column in R

You could use sprintf to format the month as two digits and combine it with the year:

sprintf("%04d-%02d",df$Year,df$Month)
[1] "1983-05" "1983-06" "1983-07" "1983-08" "1983-09" "1983-10"
df$Date <- sprintf("%04d-%02d",df$Year,df$Month)
## dplyr alternative (refer to the columns directly inside mutate, not via df$)
library(dplyr)
df %>% mutate(Date = sprintf("%04d-%02d", Year, Month))
  Year Month   MEI    CO2     CH4     N2O  CFC.11  CFC.12      TSI Aerosols  Temp    Date
1 1983     5 2.556 345.96 1638.59 303.677 191.324 350.113 1366.102   0.0863 0.109 1983-05
2 1983     6 2.167 345.52 1633.71 303.746 192.057 351.848 1366.121   0.0794 0.118 1983-06
3 1983     7 1.741 344.15 1633.22 303.795 192.818 353.725 1366.285   0.0731 0.137 1983-07
4 1983     8 1.130 342.25 1631.35 303.839 193.602 355.633 1366.420   0.0673 0.176 1983-08
5 1983     9 0.428 340.17 1648.40 303.901 194.392 357.465 1366.234   0.0619 0.149 1983-09
6 1983    10 0.002 340.30 1663.79 303.970 195.171 359.174 1366.059   0.0569 0.093 1983-10

For completeness (improving on @KarthikS's comment), you could also use stringr: str_pad pads the month with zeros and str_c concatenates the year and month:

library(stringr)
str_c(df$Year, str_pad(df$Month, 2, pad = "0"), sep = "-")
[1] "1983-05" "1983-06" "1983-07" "1983-08" "1983-09" "1983-10"
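The same zero-padded YYYY-MM string can be built in pandas. A minimal sketch, assuming a hypothetical frame with the Year and Month columns above and years that always have four digits (only the month is padded here). As with the R answers, the result is a string, not a date:

```python
import pandas as pd

# Hypothetical frame holding the Year and Month columns from the example above
df = pd.DataFrame({"Year": [1983] * 6, "Month": [5, 6, 7, 8, 9, 10]})

# Equivalent of sprintf("%04d-%02d", Year, Month) for four-digit years:
# zfill(2) left-pads the month with zeros to two characters
df["Date"] = df["Year"].astype(str) + "-" + df["Month"].astype(str).str.zfill(2)
print(df["Date"].tolist())
```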

Combine Year, Month, and Day columns into a Date column in BigQuery

Use the DATE(year, month, day) constructor:

SELECT date(year, month, day) as date_ymd, *
FROM `bigquery-public-data.samples.gsod`

How to combine year, month, and day columns to single datetime column?

There is an easier way:

In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])

In [251]: df
Out[251]:
id lat lon year month day Date
0 381 53.3066 -0.54649 2004 1 2 2004-01-02
1 381 53.3066 -0.54649 2004 1 3 2004-01-03
2 381 53.3066 -0.54649 2004 1 4 2004-01-04

From the pandas.to_datetime documentation:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.
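To illustrate the quoted rule, a minimal sketch with hypothetical data: plural key names are used for the date parts, and time-of-day columns are included as well. The "hour" key is not in the quoted list, but pandas accepts it alongside "minute":

```python
import pandas as pd

# Hypothetical columns: plural date keys plus singular time-of-day keys
parts = pd.DataFrame({
    "years": [2004, 2004],
    "months": [1, 1],
    "days": [2, 3],
    "hour": [13, 7],
    "minute": [30, 5],
})

# Each row is assembled into a single Timestamp
stamps = pd.to_datetime(parts)
print(stamps)
```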

Combine year, month and day in Python to create a date

Solution

You could use datetime.datetime to build the value (and .date() to drop the time part); for a DataFrame, call it row by row with .apply().

import datetime

d = datetime.datetime(2020, 5, 17)
date = d.date()
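The snippet above builds a single date. Applied row by row to a DataFrame, a minimal sketch could look like this, assuming a hypothetical frame with integer year, month and day columns. Row-wise .apply() is slower than the vectorized pd.to_datetime shown elsewhere on this page, so it mainly makes sense when you specifically want datetime.date objects:

```python
import datetime

import pandas as pd

# Hypothetical frame with integer year, month and day columns
df = pd.DataFrame({"year": [2020, 2021], "month": [5, 12], "day": [17, 31]})

# Row-wise apply: build one datetime.date per row
# (int() converts the NumPy integers pandas hands to the lambda)
df["date"] = df.apply(
    lambda row: datetime.date(int(row["year"]), int(row["month"]), int(row["day"])),
    axis=1,
)
print(df)
```

The resulting column has object dtype and holds plain datetime.date values rather than pandas Timestamps.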

For pandas.to_datetime(df)

It looks like your code is fine. See the pandas.to_datetime documentation and the question "How to convert columns into one datetime column in pandas?".

import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
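If some year/month/day combinations might not be valid dates, pd.to_datetime raises an error by default. A minimal sketch, assuming hypothetical input that contains an impossible date (30 February), shows the errors="coerce" option turning such rows into NaT instead:

```python
import pandas as pd

# Hypothetical input with one impossible date (30 February 2016)
df = pd.DataFrame({"year": [2015, 2016], "month": [2, 2], "day": [4, 30]})

# errors="coerce" turns combinations that are not valid dates into NaT
# instead of raising an exception
dates = pd.to_datetime(df[["year", "month", "day"]], errors="coerce")
print(dates)
```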

What if your YEAR, MONTH and DAY columns have different headers?

Let's say your YEAR, MONTH and DAY columns are labeled yy, mm and dd respectively, and you prefer to keep your column names unchanged. In that case you could do it as follows.

import pandas as pd

df = pd.DataFrame({'yy': [2015, 2016],
                   'mm': [2, 3],
                   'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
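An alternative to copying the frame and overwriting its columns is to pass a mapping to rename. A minimal sketch using the same yy/mm/dd data; rename returns a renamed copy, so the original frame keeps its column names:

```python
import pandas as pd

df = pd.DataFrame({"yy": [2015, 2016], "mm": [2, 3], "dd": [4, 5]})

# rename() returns a renamed copy, so df keeps its original column names
dates = pd.to_datetime(
    df[["yy", "mm", "dd"]].rename(columns={"yy": "year", "mm": "month", "dd": "day"})
)
df["date"] = dates
print(df)
```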

