R Create Function to Add Water Year Column

You're assigning the function itself (a closure) to NN_Loads_Calculation$wtr_yr, rather than the vectorised output of calling it.

Because a single function has a length of one, R tries to recycle it into a vector of functions as long as a column in your data frame. You can't run rep() on a closure, so that's where the code crashes.

Define the function first, then call it to calculate the new column:

## Define the function
get_water_year <- function(date_x = NULL, start_month = 9) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(date_x)
  # Year offset
  offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # Water year
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}

## Write the function's output to the data frame, rather than the function itself
NN_Loads_Calculation$wtr_yr <- get_water_year(
  date_x = NN_Loads_Calculation$Date.x,
  start_month = 9)

R Create function to add water year column

We can use POSIXlt to come up with an answer.

wtr_yr <- function(dates, start_month=9) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(dates)
  # Year offset
  offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # Water year
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}

Let's now use this function in an example.

# Sample input vector
dates = c("2008-01-01 00:00:00",
          "2008-02-01 00:00:00",
          "2008-03-01 00:00:00",
          "2008-04-01 00:00:00",
          "2009-01-01 00:00:00",
          "2009-02-01 00:00:00",
          "2009-03-01 00:00:00",
          "2009-04-01 00:00:00")

# Display the function output
wtr_yr(dates, 2)

# Combine the input and output vectors in a dataframe
df = data.frame(dates, wtr_yr=wtr_yr(dates, 2))

Is there a R function to calculate Day of Water Year?

This should do it:

library(lubridate)
hydro.day.new = function(x, start.month = 10L){
  start.yr = year(x) - (month(x) < start.month)
  start.date = make_date(start.yr, start.month, 1L)
  as.integer(x - start.date + 1L)
}

Testing it out:

set.seed(123)
x = as.Date(as.POSIXct(sample(5000,10)*60*60*24, origin = "2000-01-01", tz = "GMT"))

hydro.day.new(x)
# [1]  70  16 311 123  43 321 174 166 289 180

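# hydro.day() is not defined in this answer; presumably the asker's original function, run for comparison
hydro.day(x, "Fed")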
# [1]  70  16 311 123  43 321 174 166 289 180
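As a cross-check on the arithmetic, here is a minimal sketch of the same day-of-water-year calculation in Python, the language used in the later answers on this page. The name day_of_water_year and its start_month default are assumptions made for illustration; they are not part of lubridate or of the answer above.

```python
from datetime import date


def day_of_water_year(d, start_month=10):
    """Return the 1-based day number of d within its water year.

    Hypothetical helper: the water year is assumed to start on the
    first day of start_month.
    """
    # Dates before the start month belong to a water year that began
    # in the previous calendar year (a bool subtracts as 0 or 1).
    start_year = d.year - (d.month < start_month)
    start = date(start_year, start_month, 1)
    return (d - start).days + 1


print(day_of_water_year(date(2000, 10, 1)))  # 1
print(day_of_water_year(date(2001, 9, 30)))  # 365
```

The first day of the start month is day 1, and the last day of the preceding month is day 365 or 366 depending on whether the water year contains a 29 February.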

Reorder data frame from calendar year to water year using R

One option is to add a helper column to sort on. Subtract one year from the date when the month falls in October to December, so that those rows sort before January to September of the same calendar year.

library(dplyr)
library(lubridate)

df %>% mutate(DATE = ydm(DATE)) %>%
  mutate(WaterPeriod = 
      as.Date(ifelse(month(DATE)>=10, DATE-years(1), DATE),origin = "1970-01-01")) %>%
  arrange(STATION, WaterPeriod) %>%
  select(-WaterPeriod)

How to aggregate using water years (oct 1 2008- sept 31 2009)

Assuming your data are in long format, I'd do something like this:

library(tidyverse)
library(lubridate)  # ymd(), month(), year(); older tidyverse versions do not attach it

# make sure R knows your dates are dates - you mention they're 'yyyy-mm-dd', so
yourdataframe <- yourdataframe %>%
  mutate(yourcolumnforprecipdate = ymd(yourcolumnforprecipdate))

# in this script or another, define a water year function
water_year <- function(date) {
  ifelse(month(date) < 10, year(date), year(date) + 1)
}

# new wateryear column for your data, using your new function
yourdataframe <- yourdataframe %>%
  mutate(wateryear = water_year(yourcolumnforprecipdate))

# now group by water year (and location if there's more than one)
# and sum and create new data.frame
wy_sums <- yourdataframe %>%
  group_by(locationcolumn, wateryear) %>%
  summarize(wy_totalprecip = sum(dailyprecip))

For more info, read up on lubridate, the tidyverse's date-handling package, which is where the ymd() function is from. There are others like ymd_hms(). mutate() is from the tidyverse's dplyr library. Both libraries are extremely useful!

Create new variable using year first treated

We can create the group key with cumsum, then use transform to take the first Year in each group and assign it back:

s = df['Treated'].eq(0)
df['new'] = df[~s].groupby(s.cumsum())['Year'].transform('first')
df['new'] = df['new'].fillna(0)
#df['new'] = df['new'].astype(int)
df
  Group_ID  Year  Treated     new
0       CA  2014        0     0.0
1       CA  2015        0     0.0
2       CA  2016        1  2016.0
3       CA  2017        1  2016.0
4       WA  2011        0     0.0
5       WA  2012        1  2012.0
6       WA  2013        1  2012.0
7       TX  2010        0     0.0
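
To make the cumsum group key easier to see, here is a minimal plain-Python sketch of the same logic without pandas. The lists are copied from the table above; the variable names (group_key, first_year) are made up for illustration, and the loop is only a stand-in for what groupby and transform do, not how pandas implements them.

```python
from itertools import accumulate

# Columns copied from the sample table above.
years = [2014, 2015, 2016, 2017, 2011, 2012, 2013, 2010]
treated = [0, 0, 1, 1, 0, 1, 1, 0]

# Running count of untreated rows: the analogue of df['Treated'].eq(0).cumsum().
# Every run of treated rows shares the key of the untreated row before it.
group_key = list(accumulate(1 if t == 0 else 0 for t in treated))

first_year = {}  # group key -> Year of the first treated row in that run
new = []
for key, year, t in zip(group_key, years, treated):
    if t == 0:
        new.append(0)  # untreated rows get the fill value 0
    else:
        first_year.setdefault(key, year)  # remember only the first Year seen
        new.append(first_year[key])

print(group_key)  # [1, 2, 2, 2, 3, 3, 3, 4]
print(new)        # [0, 0, 2016, 2016, 0, 2012, 2012, 0]
```

Note that the key is built from Treated alone, so this relies on each Group_ID starting with an untreated row, as in the sample. If a group could begin with a treated row, its rows would share a key with the previous group's treated run.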

r - Add variable in column according to specific date in year

df <- data.frame(date=as.Date(c("2013-04-01", 
                                "2013-04-02", 
                                "2014-04-01")), 
                 re=1:3)
x <- -100
# rows whose date is 1 April, whatever the year
i <- which(format(df$date, "%d.%m.") == "01.04.")
# add x to re on those rows only
df$re[i] <- df$re[i] + x
#         date  re
# 1 2013-04-01 -99
# 2 2013-04-02   2
# 3 2014-04-01 -97

Python, adding a Water-Year time variable in an X-array

I'll create a sample dataset with a single variable for this example:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: import xarray as xr

In [2]: scratch = xr.Dataset(
   ...:     {'Baseflow': (('time', ), np.random.random(4018))},
   ...:     coords={'time': pd.date_range('2002-10-01', freq='D', periods=4018)},
   ...: )

In [3]: scratch
Out[3]:
<xarray.Dataset>
Dimensions:   (time: 4018)
Coordinates:
  * time      (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
Data variables:
    Baseflow  (time) float64 0.7588 0.05129 0.9914 ... 0.7744 0.6581 0.8686

We can build a water_year array using the Datetime Components accessor .dt:

In [4]: water_year = (scratch.time.dt.month >= 10) + scratch.time.dt.year
   ...: water_year
Out[4]:
<xarray.DataArray (time: 4018)>
array([2003, 2003, 2003, ..., 2013, 2013, 2013])
Coordinates:
  * time     (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
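
The expression works because the boolean comparison is added to the year as 0 or 1. A minimal standard-library sketch of the same rule on single dates, assuming a water year that starts on 1 October and is named after the calendar year in which it ends (water_year_of is a hypothetical helper, not an xarray function):

```python
from datetime import date


def water_year_of(d, start_month=10):
    """Return the water year label for a single date (hypothetical helper)."""
    # bool is a subclass of int in Python, so True adds 1 and False adds 0.
    return d.year + (d.month >= start_month)


print(water_year_of(date(2002, 10, 1)))  # 2003
print(water_year_of(date(2013, 9, 30)))  # 2013
```

These two dates are the first and last timestamps of the sample dataset, and the labels match the first and last entries of the water_year array above.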

Because water_year is a DataArray indexed by an existing dimension, we can just add it as a coordinate and xarray will understand that it's a non-dimension coordinate. This is important to make sure we don't create a new dimension in our data.

In [7]: scratch.coords['water_year'] = water_year

In [8]: scratch
Out[8]:
<xarray.Dataset>
Dimensions:     (time: 4018)
Coordinates:
  * time        (time) datetime64[ns] 2002-10-01 2002-10-02 ... 2013-09-30
    water_year  (time) int64 2003 2003 2003 2003 2003 ... 2013 2013 2013 2013
Data variables:
    Baseflow    (time) float64 0.7588 0.05129 0.9914 ... 0.7744 0.6581 0.8686

Because water_year is indexed by time, we still need to select from the arrays using the time dimension, but we can subset the arrays to specific water years:

In [9]: scratch.sel(time=(scratch.water_year == 2010))
Out[9]:
<xarray.Dataset>
Dimensions:     (time: 365)
Coordinates:
  * time        (time) datetime64[ns] 2009-10-01 2009-10-02 ... 2010-09-30
    water_year  (time) int64 2010 2010 2010 2010 2010 ... 2010 2010 2010 2010
Data variables:
    Baseflow    (time) float64 0.441 0.7586 0.01377 ... 0.2656 0.1054 0.6964

Aggregation operations can use non-dimension coordinates directly, so the following works:

In [10]: scratch.groupby('water_year').sum()
Out[10]:
<xarray.Dataset>
Dimensions:     (water_year: 11)
Coordinates:
  * water_year  (water_year) int64 2003 2004 2005 2006 ... 2010 2011 2012 2013
Data variables:
    Baseflow    (water_year) float64 187.6 186.4 184.7 ... 185.2 189.6 192.7
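
For intuition about what the grouped sum does, here is a minimal sketch of the same aggregation in plain Python, without xarray. It assumes the same daily date range as the sample dataset but uses a constant value of 1.0 per day instead of random numbers, so each total is simply the number of days in that water year. The names water_year_of and totals are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def water_year_of(d, start_month=10):
    """Return the water year label for a single date (hypothetical helper)."""
    return d.year + (d.month >= start_month)


# Same span as the sample dataset: 4018 daily steps starting 2002-10-01.
start = date(2002, 10, 1)
days = [start + timedelta(days=i) for i in range(4018)]
values = [1.0] * len(days)  # placeholder data, not the random Baseflow values

# Accumulate one running total per water year label.
totals = defaultdict(float)
for d, v in zip(days, values):
    totals[water_year_of(d)] += v

print(len(totals))   # 11
print(totals[2010])  # 365.0
```

The 11 groups and the 365 days in water year 2010 agree with the Dimensions lines in the xarray output above.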