R Create Function to Add Water Year Column

You're assigning the function itself (a closure) to NN_Loads_Calculation$wtr_yr, rather than the vectorised output of calling it.

Because the function has a length of one (i.e. there's only one function), R tries to repeat it to build a vector of functions the same length as a column in your data frame. You can't run rep() on a closure, so that's where the code crashes.

You need to define your function then use it to calculate the new value:

## Define the function
get_water_year <- function(date_x = NULL, start_month = 9) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(date_x)
  # Year offset: $mon is 0-based, hence start_month - 1
  offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # Water year: $year counts from 1900
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}

## Write the function output to the df, rather than the function itself.
NN_Loads_Calculation$wtr_yr <- get_water_year(
  date_x = NN_Loads_Calculation$Date.x,
  start_month = 9)

R Create function to add water year column

We can use POSIXlt to come up with an answer.

wtr_yr <- function(dates, start_month=9) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(dates)
  # Year offset
  offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # Water year
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}

Let's now use this function in an example.

# Sample input vector
dates = c("2008-01-01 00:00:00",
          "2008-02-01 00:00:00",
          "2008-03-01 00:00:00",
          "2008-04-01 00:00:00",
          "2009-01-01 00:00:00",
          "2009-02-01 00:00:00",
          "2009-03-01 00:00:00",
          "2009-04-01 00:00:00")

# Display the function output
wtr_yr(dates, 2)

# Combine the input and output vectors in a dataframe
df = data.frame(dates, wtr_yr=wtr_yr(dates, 2))

Is there a R function to calculate Day of Water Year?

This should do it:

library(lubridate)
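hydro.day.new = function(x, start.month = 10L){
  # Calendar year in which the current water year started
  start.yr = year(x) - (month(x) < start.month)
  # First day of that water year
  start.date = make_date(start.yr, start.month, 1L)
  # Day count, with the start date as day 1
  as.integer(x - start.date + 1L)
}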

Testing it out:

set.seed(123)
x = as.Date(as.POSIXct(sample(5000,10)*60*60*24, origin = "2000-01-01", tz = "GMT"))

hydro.day.new(x)
# [1] 70 16 311 123 43 321 174 166 289 180

# Compare against the original hydro.day() from the question (not reproduced here)
hydro.day(x, "Fed")
# [1] 70 16 311 123 43 321 174 166 289 180

Reorder data frame from calendar year to water year using R

One option is to add a helper column and sort on it. Subtract one year from the date when the month is October to December, so that those rows sort with the previous year's period.

library(dplyr)
library(lubridate)

df %>% mutate(DATE = ydm(DATE)) %>%
  mutate(WaterPeriod =
    as.Date(ifelse(month(DATE)>=10, DATE-years(1), DATE), origin = "1970-01-01")) %>%
  arrange(STATION, WaterPeriod) %>%
  select(-WaterPeriod)

How to aggregate using water years (Oct 1 2008 - Sept 30 2009)

Assuming your data are in long format, I'd do something like this:

library(tidyverse)
library(lubridate)  # ymd(), month(), year(); not attached by older tidyverse versions

#make sure R knows your dates are dates - you mention they're 'yyyy-mm-dd', so
yourdataframe <- yourdataframe %>%
  mutate(yourcolumnforprecipdate = ymd(yourcolumnforprecipdate))

#in this script or another, define a water year function
water_year <- function(date) {
  ifelse(month(date) < 10, year(date), year(date)+1)}

#new wateryear column for your data, using your new function
yourdataframe <- yourdataframe %>%
  mutate(wateryear = water_year(yourcolumnforprecipdate))

#now group by water year (and location if there's more than one)
#and sum and create new data.frame

wy_sums <- yourdataframe %>% group_by(locationcolumn, wateryear) %>%
  summarize(wy_totalprecip = sum(dailyprecip))

For more info, read up on lubridate, the tidyverse package that provides ymd(). It has related parsers such as ymd_hms(). mutate() comes from the tidyverse's dplyr package. Both packages are extremely useful!

Create new variable using year first treated

We can build the group key with cumsum, then use transform to take the first value in each group and assign it back.

s = df['Treated'].eq(0)
# Each untreated row starts a new group; treated rows inherit the running count
df['new'] = df[~s].groupby(s.cumsum())['Year'].transform('first')
# Untreated rows were left as NaN; set them to 0
df['new'] = df['new'].fillna(0)
#df['new'] = df['new'].astype(int)
df
  Group_ID  Year  Treated     new
0       CA  2014        0     0.0
1       CA  2015        0     0.0
2       CA  2016        1  2016.0
3       CA  2017        1  2016.0
4       WA  2011        0     0.0
5       WA  2012        1  2012.0
6       WA  2013        1  2012.0
7       TX  2010        0     0.0
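To make the grouping logic easier to follow, here is a plain-Python restatement of what the cumsum/transform combination computes. It is only a sketch: the `rows` list is copied from the table above, and the names `rows`, `current` and `first_treated` are made up for illustration. Like the pandas version, it never looks at Group_ID; a run of treated rows is delimited only by the untreated rows around it.

```python
# Rows copied from the example table: (Group_ID, Year, Treated)
rows = [
    ("CA", 2014, 0),
    ("CA", 2015, 0),
    ("CA", 2016, 1),
    ("CA", 2017, 1),
    ("WA", 2011, 0),
    ("WA", 2012, 1),
    ("WA", 2013, 1),
    ("TX", 2010, 0),
]

first_treated = []
current = None  # first year of the current run of treated rows, if any

for _group, year, treated in rows:
    if treated == 0:
        # An untreated row ends any run and gets 0 (the fillna(0) step)
        current = None
        first_treated.append(0)
    else:
        # The first treated row of a run fixes the year for the whole run
        if current is None:
            current = year
        first_treated.append(current)

print(first_treated)  # [0, 0, 2016, 2016, 0, 2012, 2012, 0]
```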

r - Add variable in column according to specific date in year

df <- data.frame(date=as.Date(c("2013-04-01",
                                "2013-04-02",
                                "2014-04-01")),
                 re=1:3)
x <- -100
i <- which(format(df$date, "%d.%m.") == "01.04.")
df$re[i] <- df$re[i] + x
#         date  re
# 1 2013-04-01 -99
# 2 2013-04-02   2
# 3 2014-04-01 -97

Python, adding a Water-Year time variable in an X-array

I'll create a sample dataset with a single variable for this example:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: import xarray as xr

In [2]: scratch = xr.Dataset(
   ...:     {'Baseflow': (('time', ), np.random.random(4018))},
   ...:     coords={'time': pd.date_range('2002-10-01', freq='D', periods=4018)},
   ...: )

In [3]: scratch
Out[3]:
<xarray.Dataset>
Dimensions: (time: 4018)
Coordinates:
* time (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
Data variables:
Baseflow (time) float64 0.7588 0.05129 0.9914 ... 0.7744 0.6581 0.8686

We can build a water_year array using the Datetime Components accessor .dt:

In [4]: water_year = (scratch.time.dt.month >= 10) + scratch.time.dt.year
...: water_year
Out[4]:
<xarray.DataArray (time: 4018)>
array([2003, 2003, 2003, ..., 2013, 2013, 2013])
Coordinates:
* time (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
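The rule used in In [4] (add one to the calendar year when the month is October or later) can be checked on individual dates with nothing but the standard library. This is a minimal sketch. The function name `water_year` and its `start_month` parameter are chosen here for illustration and are not part of xarray.

```python
from datetime import date


def water_year(d, start_month=10):
    """Return the calendar year of d, plus one if d falls in or after start_month."""
    return d.year + (1 if d.month >= start_month else 0)


# First and last days of the sample dataset
print(water_year(date(2002, 10, 1)))   # 2003
print(water_year(date(2013, 9, 30)))   # 2013
# Late December still belongs to the following water year
print(water_year(date(2009, 12, 31)))  # 2010
```

The first two results match the first and last entries of the water_year array shown above.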

Because water_year is a DataArray indexed by an existing dimension, we can just add it as a coordinate and xarray will understand that it's a non-dimension coordinate. This is important to make sure we don't create a new dimension in our data.

In [7]: scratch.coords['water_year'] = water_year

In [8]: scratch
Out[8]:
<xarray.Dataset>
Dimensions: (time: 4018)
Coordinates:
* time (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
water_year (time) int64 2003 2003 2003 2003 2003 ... 2013 2013 2013 2013
Data variables:
Baseflow (time) float64 0.7588 0.05129 0.9914 ... 0.7744 0.6581 0.8686

Because water_year is indexed by time, we still need to select from the arrays using the time dimension, but we can subset the arrays to specific water years:

In [9]: scratch.sel(time=(scratch.water_year == 2010))
Out[9]:
<xarray.Dataset>
Dimensions: (time: 365)
Coordinates:
* time (time) datetime64[ns] 2009-10-01 2009-10-02 ... 2010-09-30
water_year (time) int64 2010 2010 2010 2010 2010 ... 2010 2010 2010 2010
Data variables:
Baseflow (time) float64 0.441 0.7586 0.01377 ... 0.2656 0.1054 0.6964

Aggregation operations can use non-dimension coordinates directly, so the following works:

In [10]: scratch.groupby('water_year').sum()
Out[10]:
<xarray.Dataset>
Dimensions: (water_year: 11)
Coordinates:
* water_year (water_year) int64 2003 2004 2005 2006 ... 2010 2011 2012 2013
Data variables:
Baseflow (water_year) float64 187.6 186.4 184.7 ... 185.2 189.6 192.7
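As a sanity check on the grouping itself, the same labels can be rebuilt without xarray and counted. The sketch below assumes the same 4018 daily steps starting on 2002-10-01 and uses a hypothetical helper, `water_year`, that applies the October rule. Counting the days under each label is equivalent to a grouped sum over a column of ones.

```python
from collections import Counter
from datetime import date, timedelta


def water_year(d, start_month=10):
    """Return the calendar year of d, plus one if d falls in or after start_month."""
    return d.year + (1 if d.month >= start_month else 0)


# Rebuild the daily time axis of the sample dataset
start = date(2002, 10, 1)
days = [start + timedelta(days=i) for i in range(4018)]

# Group by water-year label and count the members of each group
days_per_wy = Counter(water_year(d) for d in days)

print(days[-1])            # 2013-09-30
print(len(days_per_wy))    # 11
print(days_per_wy[2010])   # 365
print(days_per_wy[2004])   # 366 (includes 29 Feb 2004)
```

The 11 groups and the 365 days in water year 2010 agree with the dimensions reported in Out[10] and Out[9].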

