Combine separate Year and Month columns into single Date Column
I'd use zoo::as.yearmon as follows:
library(zoo)
df$Date <- as.yearmon(paste(df$year, df$month), "%Y %m")
It will not look like the desired output (i.e. 2011-01).
However, IMO this approach is better than m0h3n's because df$Date will be stored as a yearmon object rather than a string, so it can be handled like a date. For example, if you store df$Date as a string, you will have a hard time plotting your data over time.
Problems with date format. How to combine separate month and year columns into a single date?
I think you just need to have the string in one of the default formats that can be read by as.yearmon. From the docs, one of these is "%b %Y", so you can do:
library(dplyr)
library(zoo)
df %>% mutate(date = as.yearmon(paste(periodName, year)))
#> # A tibble: 6 x 8
#> year period periodName latest value footnotes seriesID date
#> <int> <chr> <chr> <chr> <int> <chr> <chr> <yearmon>
#> 1 2020 M07 July true 139582 "P preliminary" CES0000000001 Jul 2020
#> 2 2020 M06 June <NA> 137819 "P preliminary" CES0000000001 Jun 2020
#> 3 2020 M05 May <NA> 133028 "" CES0000000001 May 2020
#> 4 2020 M04 April <NA> 130303 "" CES0000000001 Apr 2020
#> 5 2020 M03 March <NA> 151090 "" CES0000000001 Mar 2020
#> 6 2020 M02 February <NA> 152463 "C corrected" CES0000000001 Feb 2020
Data
df <- structure(list(year = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L
), period = c("M07", "M06", "M05", "M04", "M03", "M02"), periodName = c("July",
"June", "May", "April", "March", "February"), latest = c("true",
NA, NA, NA, NA, NA), value = c(139582L, 137819L, 133028L, 130303L,
151090L, 152463L), footnotes = c("P preliminary", "P preliminary",
"", "", "", "C corrected"), seriesID = c("CES0000000001", "CES0000000001",
"CES0000000001", "CES0000000001", "CES0000000001", "CES0000000001"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
df
#> year period periodName latest value footnotes seriesID
#> 1 2020 M07 July true 139582 P preliminary CES0000000001
#> 2 2020 M06 June <NA> 137819 P preliminary CES0000000001
#> 3 2020 M05 May <NA> 133028 CES0000000001
#> 4 2020 M04 April <NA> 130303 CES0000000001
#> 5 2020 M03 March <NA> 151090 CES0000000001
#> 6 2020 M02 February <NA> 152463 C corrected CES0000000001
Created on 2020-08-07 by the reprex package (v0.3.0)
Combine month and year columns to create date column
You get the "Invalid argument, not a string or column" error because the literal 1 in your concat_ws('/', df.month, 1, df.year) is neither a column nor a string (a string would be treated as the name of a column). You can correct it by wrapping the literal in the built-in lit function, as follows:
from pyspark.sql import functions as F
df = df.select(F.concat_ws('/', df.month, F.lit(1), df.year).alias('Month'), df["*"])
Cleanly combine year and month columns to single date column with pandas
Option 1
Pass a DataFrame slice with three columns (YEAR, MONTH, and DAY) to pd.to_datetime. DAY does not exist in the data, so assign adds it with a constant value of 1.
df['DATE'] = pd.to_datetime(df[['YEAR', 'MONTH']].assign(DAY=1))
df
ID MONTH YEAR DATE
0 A 1 2017 2017-01-01
1 B 2 2017 2017-02-01
2 C 3 2017 2017-03-01
3 D 4 2017 2017-04-01
4 E 5 2017 2017-05-01
5 F 6 2017 2017-06-01
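If what you actually want is a year-month value such as 2017-01 rather than a full date, one possible follow-up is to convert the assembled column to a monthly period. This is only a sketch: the small frame below is made-up sample data mirroring the columns above, and the PERIOD column name is an assumption for illustration.

```python
import pandas as pd

# Made-up sample data with the same column names as in the answer.
df = pd.DataFrame({"ID": ["A", "B", "C"],
                   "MONTH": [1, 2, 3],
                   "YEAR": [2017, 2017, 2017]})

# Assemble a full date first (day fixed to 1), as in Option 1.
df["DATE"] = pd.to_datetime(df[["YEAR", "MONTH"]].assign(DAY=1))

# Convert to a monthly period; it still behaves like a date for sorting
# and grouping, but displays as YYYY-MM.
df["PERIOD"] = df["DATE"].dt.to_period("M")

period_labels = df["PERIOD"].astype(str).tolist()
print(period_labels)
```

The period column keeps date semantics, so it avoids the drawbacks of storing the year-month as a plain string.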
Option 2
String concatenation, with pd.to_datetime.
pd.to_datetime(df.YEAR.astype(str) + '/' + df.MONTH.astype(str) + '/01')
0 2017-01-01
1 2017-02-01
2 2017-03-01
3 2017-04-01
4 2017-05-01
5 2017-06-01
dtype: datetime64[ns]
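A variation on Option 2, offered as a sketch rather than a required change: passing an explicit format to pd.to_datetime removes any guessing about the string layout. The sample frame and the year_month variable are assumptions made up for illustration.

```python
import pandas as pd

# Made-up sample data with the same column names as in the answer.
df = pd.DataFrame({"MONTH": [1, 2, 3], "YEAR": [2017, 2017, 2017]})

# Build "YYYY-MM" strings, zero-padding the month to two digits.
year_month = df["YEAR"].astype(str) + "-" + df["MONTH"].astype(str).str.zfill(2)

# With an explicit format there is no ambiguity; the day defaults to 1.
dates = pd.to_datetime(year_month, format="%Y-%m")
print(dates)
```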
Year column and Month column into Date column R
You could use sprintf to format the month into two digits and paste it with the year:
sprintf("%04d-%02d",df$Year,df$Month)
[1] "1983-05" "1983-06" "1983-07" "1983-08" "1983-09" "1983-10"
df$Date <- sprintf("%04d-%02d",df$Year,df$Month)
## dplyr alternative
df %>% mutate(Date = sprintf("%04d-%02d", Year, Month))
Year Month MEI CO2 CH4 N2O CFC.11 CFC.12 TSI Aerosols Temp Date
1 1983 5 2.556 345.96 1638.59 303.677 191.324 350.113 1366.102 0.0863 0.109 1983-05
2 1983 6 2.167 345.52 1633.71 303.746 192.057 351.848 1366.121 0.0794 0.118 1983-06
3 1983 7 1.741 344.15 1633.22 303.795 192.818 353.725 1366.285 0.0731 0.137 1983-07
4 1983 8 1.130 342.25 1631.35 303.839 193.602 355.633 1366.420 0.0673 0.176 1983-08
5 1983 9 0.428 340.17 1648.40 303.901 194.392 357.465 1366.234 0.0619 0.149 1983-09
6 1983 10 0.002 340.30 1663.79 303.970 195.171 359.174 1366.059 0.0569 0.093 1983-10
For completeness (improving on @KarthikS's comment), you could use stringr::str_c and stringr::str_pad: the latter pads the month with 0s and the former concatenates the Year and Month.
library(stringr)
str_c(df$Year, str_pad(df$Month, 2, pad = "0"), sep = "-")
[1] "1983-05" "1983-06" "1983-07" "1983-08" "1983-09" "1983-10"
Combine Year, Month, and Day columns in BigQuery to a Date column
Use the following:
SELECT date(year, month, day) as date_ymd, *
FROM `bigquery-public-data.samples.gsod`
How to combine year, month, and day columns to single datetime column?
There is an easier way:
In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])
In [251]: df
Out[251]:
id lat lon year month day Date
0 381 53.3066 -0.54649 2004 1 2 2004-01-02
1 381 53.3066 -0.54649 2004 1 3 2004-01-03
2 381 53.3066 -0.54649 2004 1 4 2004-01-04
From the docs:
Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.
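To illustrate the "plurals" part of that passage, here is a small sketch with made-up data. The column names years, months, days, hours, and minutes are chosen only to exercise the plural keys; they are not from the question.

```python
import pandas as pd

# Made-up data using plural column names, including time-of-day parts.
df = pd.DataFrame({"years": [2004, 2004],
                   "months": [1, 1],
                   "days": [2, 3],
                   "hours": [13, 8],
                   "minutes": [30, 5]})

# Every column must be a recognised unit key; year, month and day are required.
stamps = pd.to_datetime(df)
print(stamps)
```

Note that unrelated columns (such as id or lat) have to be left out of the frame passed to pd.to_datetime, which is why the answer above selects only the year, month, and day columns.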
Combine year, month and day in Python to create a date
Solution
You could use datetime.datetime along with .apply().
import datetime
d = datetime.datetime(2020, 5, 17)
date = d.date()
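The snippet above builds a single date; the .apply() part could look roughly like the following. This is a sketch under assumptions: the sample frame and the helper name row_to_date are hypothetical, and the columns are assumed to be called year, month, and day.

```python
import datetime

import pandas as pd

# Made-up sample data; column names are assumed.
df = pd.DataFrame({"year": [2015, 2016],
                   "month": [2, 3],
                   "day": [4, 5]})


def row_to_date(row):
    """Build a datetime.date from one row (hypothetical helper)."""
    # int() guards against NumPy integer or float values in the row.
    return datetime.date(int(row["year"]), int(row["month"]), int(row["day"]))


# axis=1 calls the helper once per row.
df["date"] = df.apply(row_to_date, axis=1)
print(df)
```

Row-wise apply is usually slower than the vectorised pd.to_datetime approach, so it is mainly useful when each row needs custom handling.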
For pandas.to_datetime(df)
It looks like your code is fine. See the pandas.to_datetime documentation and the question "How to convert columns into one datetime column in pandas?".
import pandas as pd
df = pd.DataFrame({'year': [2015, 2016],
'month': [2, 3],
'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])
Output:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
What if your YEAR, MONTH and DAY columns have different headers?
Let's say your YEAR, MONTH and DAY columns are labeled yy, mm and dd respectively, and you prefer to keep your column names unchanged. In that case you could do it as follows.
import pandas as pd
df = pd.DataFrame({'yy': [2015, 2016],
'mm': [2, 3],
'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)
Output:
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
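A slightly shorter variant of the same idea, shown only as a sketch: DataFrame.rename returns a renamed copy, so the original column names stay untouched without an explicit .copy(). The name_map dictionary is an assumption introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({"yy": [2015, 2016],
                   "mm": [2, 3],
                   "dd": [4, 5]})

# Map the existing headers to the names pd.to_datetime expects.
name_map = {"yy": "year", "mm": "month", "dd": "day"}

# rename() returns a new frame; df itself keeps yy, mm and dd.
dates = pd.to_datetime(df.rename(columns=name_map)[["year", "month", "day"]])
print(dates)
```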