Combine Separate Year and Month Columns into Single Date Column


I'd use zoo::as.yearmon as follows:

library(zoo)
df$Date <- as.yearmon(paste(df$year, df$month), "%Y %m")

It will not look like the desired output (i.e. 2011-01); a yearmon prints as, for example, "Jan 2011".

However, IMO this approach is better than m0h3n's because df$Date is stored as a yearmon object rather than a string, so it can be handled like a date. For example, if you store df$Date as a string, you're going to have a hard time plotting your data over time.

Problems with date format. How to combine separate month and year columns into a single date?

I think you just need to have the string in one of the default formats that can be read by as.yearmon. From the docs, one of these is "%b %Y", so you can do:

library(dplyr)
library(zoo)

df %>% mutate(date = as.yearmon(paste(periodName, year)))
#> # A tibble: 6 x 8
#>    year period periodName latest  value footnotes       seriesID      date     
#>   <int> <chr>  <chr>      <chr>   <int> <chr>           <chr>         <yearmon>
#> 1  2020 M07    July       true   139582 "P preliminary" CES0000000001 Jul 2020 
#> 2  2020 M06    June       <NA>   137819 "P preliminary" CES0000000001 Jun 2020 
#> 3  2020 M05    May        <NA>   133028 ""              CES0000000001 May 2020 
#> 4  2020 M04    April      <NA>   130303 ""              CES0000000001 Apr 2020 
#> 5  2020 M03    March      <NA>   151090 ""              CES0000000001 Mar 2020 
#> 6  2020 M02    February   <NA>   152463 "C corrected"   CES0000000001 Feb 2020

Data

df <- structure(list(year = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L
), period = c("M07", "M06", "M05", "M04", "M03", "M02"), periodName = c("July", 
"June", "May", "April", "March", "February"), latest = c("true", 
NA, NA, NA, NA, NA), value = c(139582L, 137819L, 133028L, 130303L, 
151090L, 152463L), footnotes = c("P preliminary", "P preliminary", 
"", "", "", "C corrected"), seriesID = c("CES0000000001", "CES0000000001", 
"CES0000000001", "CES0000000001", "CES0000000001", "CES0000000001"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

df
#>   year period periodName latest  value     footnotes      seriesID
#> 1 2020    M07       July   true 139582 P preliminary CES0000000001
#> 2 2020    M06       June   <NA> 137819 P preliminary CES0000000001
#> 3 2020    M05        May   <NA> 133028               CES0000000001
#> 4 2020    M04      April   <NA> 130303               CES0000000001
#> 5 2020    M03      March   <NA> 151090               CES0000000001
#> 6 2020    M02   February   <NA> 152463   C corrected CES0000000001

Created on 2020-08-07 by the reprex package (v0.3.0)

Combine month and year columns to create date column

You get the "Invalid argument, not a string or column" error because the literal 1 in your concat_ws('/', df.month, 1, df.year) is neither a column nor a string (a string would be taken as the name of a column). You can fix it by wrapping the literal in the built-in lit function, as follows:

from pyspark.sql import functions as F

df = df.select(F.concat_ws('/', df.month, F.lit(1), df.year).alias('Month'), df["*"])

Cleanly combine year and month columns to single date column with pandas

Option 1
Pass a dataframe slice with 3 columns - YEAR, MONTH, and DAY, to pd.to_datetime.

df['DATE'] = pd.to_datetime(df[['YEAR', 'MONTH']].assign(DAY=1))
df

  ID  MONTH  YEAR       DATE
0  A      1  2017 2017-01-01
1  B      2  2017 2017-02-01
2  C      3  2017 2017-03-01
3  D      4  2017 2017-04-01
4  E      5  2017 2017-05-01
5  F      6  2017 2017-06-01

Option 2
String concatenation, with pd.to_datetime.

pd.to_datetime(df.YEAR.astype(str) + '/' + df.MONTH.astype(str) + '/01')

0   2017-01-01
1   2017-02-01
2   2017-03-01
3   2017-04-01
4   2017-05-01
5   2017-06-01
dtype: datetime64[ns]
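
As a quick sanity check, the two options can be run side by side on a frame rebuilt from the output shown above. This is only a sketch: the frame construction and the names opt1 and opt2 are assumptions for illustration, and Option 2 additionally passes an explicit format so the string parsing does not rely on inference.

```python
import pandas as pd

# Sample frame rebuilt from the output shown above (an assumption)
df = pd.DataFrame({
    "ID": list("ABCDEF"),
    "MONTH": [1, 2, 3, 4, 5, 6],
    "YEAR": [2017] * 6,
})

# Option 1: assemble from YEAR and MONTH plus a constant DAY column
opt1 = pd.to_datetime(df[["YEAR", "MONTH"]].assign(DAY=1))

# Option 2: build "YYYY/M/01" strings and parse them with an explicit format
opt2 = pd.to_datetime(
    df["YEAR"].astype(str) + "/" + df["MONTH"].astype(str) + "/01",
    format="%Y/%m/%d",
)

# Both approaches should produce the same first-of-month timestamps
print((opt1 == opt2).all())
```

Option 1 avoids the round trip through strings, which is usually the cleaner choice when the year and month are already numeric.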

Year column and Month column into Date column in R

You could use sprintf to format the month into two digits and paste with the year:

sprintf("%04d-%02d",df$Year,df$Month)
[1] "1983-05" "1983-06" "1983-07" "1983-08" "1983-09" "1983-10"
df$Date <- sprintf("%04d-%02d",df$Year,df$Month)
## dplyr alternative
df %>% mutate(Date = sprintf("%04d-%02d", Year, Month))

  Year Month   MEI    CO2     CH4     N2O  CFC.11  CFC.12      TSI Aerosols  Temp    Date
1 1983     5 2.556 345.96 1638.59 303.677 191.324 350.113 1366.102   0.0863 0.109 1983-05
2 1983     6 2.167 345.52 1633.71 303.746 192.057 351.848 1366.121   0.0794 0.118 1983-06
3 1983     7 1.741 344.15 1633.22 303.795 192.818 353.725 1366.285   0.0731 0.137 1983-07
4 1983     8 1.130 342.25 1631.35 303.839 193.602 355.633 1366.420   0.0673 0.176 1983-08
5 1983     9 0.428 340.17 1648.40 303.901 194.392 357.465 1366.234   0.0619 0.149 1983-09
6 1983    10 0.002 340.30 1663.79 303.970 195.171 359.174 1366.059   0.0569 0.093 1983-10

For completeness (improving on @KarthikS's comment), you could use stringr::str_pad to pad the month with zeros and stringr::str_c to concatenate the Year and Month:

str_c(df$Year, str_pad(df$Month, 2, pad = "0"), sep = '-')
[1] "1983-05" "1983-06" "1983-07" "1983-08" "1983-09" "1983-10"

Combine Year, Month, and Day columns in BigQuery to a Date column

Use the following:

SELECT date(year, month, day) as date_ymd, *
FROM `bigquery-public-data.samples.gsod`

How to combine year, month, and day columns to single datetime column?

There is an easier way:

In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])

In [251]: df
Out[251]:
    id      lat      lon  year  month  day       Date
0  381  53.3066 -0.54649  2004      1    2 2004-01-02
1  381  53.3066 -0.54649  2004      1    3 2004-01-03
2  381  53.3066 -0.54649  2004      1    4 2004-01-04

From the docs:

Assembling a datetime from multiple columns of a DataFrame. The keys
can be common abbreviations like [year, month, day, minute,
second, ms, us, ns] or plurals of the same
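
To illustrate the "plurals of the same" part of that quote, here is a minimal sketch. The frame and its column names (years, months, days, minutes) are made up for the example and are not from the question.

```python
import pandas as pd

# Hypothetical frame whose columns use the plural unit names
df = pd.DataFrame({
    "years": [2004, 2004],
    "months": [1, 1],
    "days": [2, 3],
    "minutes": [30, 45],
})

# to_datetime maps the plural keys onto year/month/day/minute
stamps = pd.to_datetime(df[["years", "months", "days", "minutes"]])
print(stamps)
```

At minimum the year, month, and day components must be present; the finer-grained units are optional.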

Combine year, month and day in Python to create a date

Solution

You could use datetime.datetime along with .apply().

import datetime

d = datetime.datetime(2020, 5, 17)
date = d.date()
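
Applied to a whole DataFrame, the same idea might look like the sketch below. The sample frame and the helper name row_to_datetime are assumptions for illustration; they are not from the question.

```python
import datetime

import pandas as pd

# Hypothetical frame with separate year, month and day columns
df = pd.DataFrame({"year": [2020, 2021], "month": [5, 6], "day": [17, 18]})


def row_to_datetime(row):
    # Cast to plain int so datetime.datetime receives built-in integers
    return datetime.datetime(int(row["year"]), int(row["month"]), int(row["day"]))


# axis=1 calls the helper once per row
df["date"] = df.apply(row_to_datetime, axis=1)
print(df)
```

A row-wise apply is easy to read but slow on large frames, so the vectorized pandas.to_datetime call is usually preferable when the columns are already clean.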

For `pandas.to_datetime(df)`

It looks like your code is fine. See the pandas.to_datetime documentation and the question "How to convert columns into one datetime column in pandas?".

import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

What if your YEAR, MONTH and DAY columns have different headers?

Let's say your YEAR, MONTH and DAY columns are labeled as yy, mm and dd respectively. And you prefer to keep your column names unchanged. In that case you could do it as follows.

import pandas as pd

df = pd.DataFrame({'yy': [2015, 2016],
                   'mm': [2, 3],
                   'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
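
A possible alternative to copying and relabelling is DataFrame.rename, which returns a renamed copy and leaves the original column names untouched. This is a sketch on the same sample data; the names renamed and the new "date" column are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"yy": [2015, 2016],
                   "mm": [2, 3],
                   "dd": [4, 5]})

# rename returns a new frame, so df keeps its yy/mm/dd headers
renamed = df.rename(columns={"yy": "year", "mm": "month", "dd": "day"})

# Assemble the datetime from the renamed copy and store it on the original
df["date"] = pd.to_datetime(renamed[["year", "month", "day"]])
print(df)
```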
