Read Column Names as Date Format

I am looking to format column names as dates after reading in a CSV.

Using zoo::as.yearmon, you can try:

names(df) <- zoo::as.yearmon(names(df), 'X%Y.%m')

Or, in base R, by pasting on an arbitrary day of the month:

names(df) <- format(as.Date(paste0(names(df), '.01'), 'X%Y.%m.%d'), '%b-%Y')

As an example:

x <- c('X2017.04', 'X2017.05', 'X2017.06')
format(as.Date(paste0(x, '.01'), 'X%Y.%m.%d'), '%b-%Y')
#[1] "Apr-2017" "May-2017" "Jun-2017"
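
For comparison, the same renaming can be sketched in Python with only the standard library. The function name `yearmon_label` is hypothetical, and the `X2017.04` pattern is assumed to be what R produced when it read the CSV. As in R, `%b` follows the active locale, so English month abbreviations are an assumption here.

```python
from datetime import datetime

def yearmon_label(name: str) -> str:
    """Turn a header like 'X2017.04' into a label like 'Apr-2017'."""
    # Parse the literal 'X' prefix, the four-digit year, and the month;
    # the day is not in the header, so strptime defaults it to 1.
    parsed = datetime.strptime(name, "X%Y.%m")
    # %b is the locale's abbreviated month name.
    return parsed.strftime("%b-%Y")

headers = ["X2017.04", "X2017.05", "X2017.06"]
labels = [yearmon_label(h) for h in headers]
print(labels)
```

A header that does not match the pattern raises a ValueError, which is usually preferable to silently producing a wrong label.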

How to convert data frame headers from excel values to date format

Here is a literal answer to your question, but I don't recommend it.

Table1 <- data.frame("43220"=c(1,7), "43251"=c(2,8), check.names = FALSE)
Table1
#   43220 43251
# 1     1     2
# 2     7     8
as.Date(as.integer(names(Table1)), origin = "1899-12-30")
# [1] "2018-04-30" "2018-05-31"
format(as.Date(as.integer(names(Table1)), origin = "1899-12-30"), format="%b %y")
# [1] "Apr 18" "May 18"
names(Table1) <- format(as.Date(as.integer(names(Table1)), origin = "1899-12-30"), format="%b %y")
Table1
#   Apr 18 May 18
# 1      1      2
# 2      7      8
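
The arithmetic behind the origin argument can be checked with a short Python sketch using only the standard library. It assumes the same 1899-12-30 origin as the R code above; the names `EXCEL_ORIGIN` and `excel_serial_to_date` are made up for this illustration.

```python
from datetime import date, timedelta

# Assumed origin: the same "1899-12-30" passed to as.Date() above.
EXCEL_ORIGIN = date(1899, 12, 30)

def excel_serial_to_date(serial: str) -> date:
    """Convert an Excel day serial stored as a header string to a date."""
    return EXCEL_ORIGIN + timedelta(days=int(serial))

headers = ["43220", "43251"]
dates = [excel_serial_to_date(h) for h in headers]
# %b is locale-dependent; English abbreviations are assumed.
labels = [d.strftime("%b %y") for d in dates]
print(dates)
print(labels)
```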

In general I discourage the presence of "data-like things" as column names, unless being shown in a report or rendering of the data. It can often be easier (but not always) dealing with data in a "long" format, where the date is a value of a column, not a column name:

Table1 <- data.frame("43220"=c(1,7), "43251"=c(2,8), check.names = FALSE)
Table1long <- tidyr::pivot_longer(Table1, everything(), names_to = "date")
Table1long
# # A tibble: 4 x 2
#   date  value
#   <chr> <dbl>
# 1 43220     1
# 2 43251     2
# 3 43220     7
# 4 43251     8
Table1long$date <- as.Date(as.integer(Table1long$date), origin = "1899-12-30")
Table1long
# # A tibble: 4 x 2
#   date       value
#   <date>     <dbl>
# 1 2018-04-30     1
# 2 2018-05-31     2
# 3 2018-04-30     7
# 4 2018-05-31     8

Here the date column holds actual Date values, which behave like numbers, so operations such as addition and differencing work naturally. Once you are done munging and calculating and are ready for reports, you can format the dates with format(., format="%b %y") or similar.
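
To illustrate why real dates in a long layout are convenient, here is a rough standard-library Python sketch of the same reshaping. The `wide` dictionary stands in for Table1, and all names are hypothetical. It is not a replacement for tidyr, only a demonstration that date arithmetic becomes trivial once the headers are values.

```python
from datetime import date, timedelta

EXCEL_ORIGIN = date(1899, 12, 30)  # assumed Excel origin, as above

# Wide table: each key is an Excel serial header, each list is that column.
wide = {"43220": [1, 7], "43251": [2, 8]}

# Long format: one (date, value) pair per cell, walking the table row by row.
n_rows = len(next(iter(wide.values())))
long_rows = [
    (EXCEL_ORIGIN + timedelta(days=int(header)), column[i])
    for i in range(n_rows)
    for header, column in wide.items()
]

# Because the dates are real date objects, differencing works directly.
unique_dates = sorted({d for d, _ in long_rows})
gap_days = (unique_dates[1] - unique_dates[0]).days
print(long_rows)
print(gap_days)
```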

An alternative to tidyr is reshape2:

reshape2::melt(Table1, -seq_len(ncol(Table1)), variable.name = "date")
#    date value
# 1 43220     1
# 2 43220     7
# 3 43251     2
# 4 43251     8

Replace column headers date, into year and month, without changing remaining column headers

A clean way to do this is to check the type of each element of the columns Index and, if it is a timestamp, format it to the desired string.

Ex:

import pandas as pd

# simplified example for dataframe columns
cols = [ 'Key', 'COUNTRY', 'MARKET',
'A1', 'CAB', 'type',
pd.Timestamp("2021-06-01 00:00:00"), pd.Timestamp("2021-07-01 00:00:00")]

cols_new = [k.strftime("%Y_%B_%d") if isinstance(k, pd.Timestamp) else k for k in cols]

print(cols_new)
# ['Key', 'COUNTRY', 'MARKET', 'A1', 'CAB', 'type', '2021_June_01', '2021_July_01']

# update in one line:
# df.columns = [k.strftime("%Y_%B_%d") if isinstance(k, pd.Timestamp) else k for k in df.columns]
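
Since the question asks for year and month only, the same idea can be wrapped in a small helper with a configurable format. This is a sketch: the name `format_date_labels` and the `"%Y-%m"` format are assumptions, not part of the original answer. It checks against `datetime.datetime`, which `pd.Timestamp` subclasses, so it should also accept pandas column labels while running here without pandas.

```python
from datetime import datetime

def format_date_labels(labels, fmt="%Y_%B_%d"):
    """Format datetime-like labels; leave every other label untouched."""
    return [k.strftime(fmt) if isinstance(k, datetime) else k for k in labels]

# Mixed headers: plain strings plus two date-like labels.
cols = ["Key", "COUNTRY", datetime(2021, 6, 1), datetime(2021, 7, 1)]

# Year and month only, as the question title asks for.
year_month = format_date_labels(cols, fmt="%Y-%m")
print(year_month)
```

With a DataFrame, the assumed usage would be `df.columns = format_date_labels(df.columns, fmt="%Y-%m")`.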

Change dataframe column names from string format to datetime

Selecting columns with loc does not change the column labels themselves, which is why you get a KeyError.

So you need to assign the converted output back to columns:

df.columns = pd.to_datetime(df.columns)

Sample:

import numpy as np
import pandas as pd

cols = ['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01']
vals = np.arange(5)
df = pd.DataFrame(columns = cols, data=[vals])
print (df)
   2000-01-01  2000-02-01  2000-03-01  2000-04-01  2000-05-01
0           0           1           2           3           4

print (df.columns)
Index(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01'], dtype='object')

df.columns = pd.to_datetime(df.columns)

print (df.columns)
DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01',
'2000-05-01'],
dtype='datetime64[ns]', freq=None)
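
The KeyError can be reproduced with a small sketch, assuming pandas is installed. The frame and the variable names (`target`, `found_before`, `value_after`) are made up for the illustration. A Timestamp key finds nothing while the labels are still strings, and works once the converted labels are assigned back.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2]],
                  columns=["2000-01-01", "2000-02-01", "2000-03-01"])
target = pd.Timestamp("2000-02-01")

# While the labels are still strings, a Timestamp key matches nothing.
try:
    df[target]
    found_before = True
except KeyError:
    found_before = False

# Assign the converted labels back, then the same lookup succeeds.
df.columns = pd.to_datetime(df.columns)
value_after = df[target].iloc[0]
print(found_before, value_after)
```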

It is also possible to convert the columns to periods:

print (df.columns)
Index(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01'], dtype='object')

df.columns = pd.to_datetime(df.columns).to_period('M')

print (df.columns)
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05'],
dtype='period[M]', freq='M')

