Merge Netcdf Files in R

Merge netCDF files in R

The ncdf4 package can do this. The code below is an example for one variable only.

#install.packages('ncdf4')
library(ncdf4)

file1 <- nc_open('England_aggr_GPW4_2000_0001.nc')
file2 <- nc_open('England_aggr_GPW4_2000_0002.nc')

# Just for one variable for now
dat_new <- cbind(
ncvar_get(file1, 'Susceptible'),
ncvar_get(file2, 'Susceptible'))
dim(dat_new)
# Keep the variable's metadata (e.g. units) for the new file
var <- file1$var[['Susceptible']]

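# Create a new file
# ncvar_def() expects dimension objects, not a vector of sizes, so define
# plain index dimensions matching dat_new ('row' and 'col' are placeholder names)
dims <- list(
ncdim_def(name = 'row', units = '', vals = seq_len(nrow(dat_new)), create_dimvar = FALSE),
ncdim_def(name = 'col', units = '', vals = seq_len(ncol(dat_new)), create_dimvar = FALSE))
file_new <- nc_create(
filename = 'England_aggr_GPW4_2000_new.nc',
# We need to define the variables here
vars = ncvar_def(
name = 'Susceptible',
units = var$units,
dim = dims))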

# And write to it
ncvar_put(
nc = file_new,
varid = 'Susceptible',
vals = dat_new)

# Finally, close the new file and the two input files
nc_close(file_new)
nc_close(file1)
nc_close(file2)

Update:
An alternative approach is to use the raster package, as shown below. I couldn't figure out how to make 4D raster stacks, so I am splitting your data into one NetCDF file per variable. Would that work for you?

#install.packages(c('ncdf4', 'raster'))
library(ncdf4)
library(raster)

var_names <- c('Susceptible', 'Infected', 'Recovered', 'Inhabitable')

for (var_name in var_names) {

# Create raster stack
x <- stack(
raster('England_aggr_GPW4_2000_0001.nc', varname = var_name),
raster('England_aggr_GPW4_2000_0002.nc', varname = var_name))

# Name each layer
names(x) <- c('01', '02')

writeRaster(x = x,
filename = paste0(var_name, '_out.nc'),
overwrite = TRUE,
format = 'CDF')
}

How to combine 'variables' from multiple NetCDF files into a single NetCDF file?

xarray has great documentation on combining data, and I highly recommend giving it a close read! But when you're just getting started, it can be confusing to know which operation to use. Also, if you have specific feedback on which parts of the documentation you found confusing, I'm sure the xarray devs would love to hear it (especially if you're willing to contribute to the docs yourself)!

There are generally four ways to combine data. Directly from the docs:

  • For combining datasets or data arrays along a single dimension, see concatenate.
  • For combining datasets with different variables, see merge.
  • For combining datasets or data arrays with different indexes or missing values, see combine.
  • For combining datasets or data arrays along multiple dimensions see combining along multiple dimensions.

From your question, it looks like you have two datasets which are distinct only in the month of data represented. Other than the time component, it sounds like the two datasets are the same, each with u, v, and w variables, and with the dimensions of these variables consistent between the two Datasets with the exception of the time dimension. Because of this, this seems like a perfect use case for concatenate. Concatenation just means joining two arrays together by placing them next to each other along a single axis to form a single, larger array. When you concatenate datasets, xarray automatically concatenates each array within the dataset.

Merge is more appropriate if you have two datasets that are similar in all of their dimensions, but differ in which variables are present. For example, if you had three datasets, all of which had the same dims, but one had the u variable, the second had v, and the third had w, then you would combine these variables into one larger dataset with three variables (and the same dims) using merge.
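To make the difference concrete, here is a minimal sketch using small in-memory datasets rather than your files. The make_ds helper, the grid size, and the dates are all made up for illustration; only xr.concat and xr.merge are the actual xarray calls being compared.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: one time step per dataset on a 2 x 3 grid.
def make_ds(date, names):
    time = pd.to_datetime([date])
    return xr.Dataset(
        {name: (("time", "y", "x"), np.random.rand(1, 2, 3)) for name in names},
        coords={"time": time},
    )

# Same variables, different times -> concatenate along "time".
nov = make_ds("2013-11-01", ["u", "v", "w"])
dec = make_ds("2013-12-01", ["u", "v", "w"])
concatenated = xr.concat([nov, dec], dim="time")
print(dict(concatenated.sizes))  # the time dimension now has length 2

# Same dimensions, different variables -> merge.
only_u = make_ds("2013-11-01", ["u"])
only_v = make_ds("2013-11-01", ["v"])
only_w = make_ds("2013-11-01", ["w"])
merged = xr.merge([only_u, only_v, only_w])
print(sorted(merged.data_vars))  # all three variables in one dataset
```

Concatenation grows an existing dimension, while merge leaves the dimensions alone and grows the set of variables.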

Now that we know which approach to take, we're ready to start concatenating. The actual implementation will depend a bit on whether the data has a time dimension, with each file having only one value along this dimension, or if there's no time dimension at all.

If the concatenation dim is already present in the data

If the time dimension is already present, this is very easy - all we need to do is tell xarray to concatenate along time.

Using the data you've already read in, we can use xr.concat to combine along any single dim:

# I'm using the more standard variable names "ds" to avoid confusion 
# with pandas DataFrames, but these refer to df1 and df2 in your question
ds_merged = xr.concat([ds1, ds2], dim="time")

Alternately, you could concatenate the arrays as you read them in, by using xr.open_mfdataset. The syntax is similar:

fps = ["ocean_avg_November.nc4", "ocean_avg_December.nc4"]
# Current xarray versions require combine="nested" when concat_dim is given
ds = xr.open_mfdataset(fps, combine="nested", concat_dim="time")

If the concatenation dim is not present

If your data does not yet have a time dimension, we'll need to tell xarray how to differentiate between the two arrays in time. We can do this in a couple of ways. You could expand the dimensionality of the arrays first, using xr.Dataset.expand_dims, e.g. ds1.expand_dims(time=['2013-11-01']), and the same for ds2, and then concatenate the datasets as above. This makes it very clear what's going on, but it has a slight disadvantage of being slower, since you'll need to resize your arrays twice.
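As a rough sketch of the expand_dims route, the example below uses two small in-memory datasets as stand-ins for ds1 and ds2. The variable name u, the grid size, and the dates are assumptions for illustration, and the timestamps are passed as datetimes rather than plain strings so the resulting time coordinate is a proper datetime axis.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-ins for ds1 and ds2: no time dimension yet.
ds1 = xr.Dataset({"u": (("y", "x"), np.zeros((2, 3)))})
ds2 = xr.Dataset({"u": (("y", "x"), np.ones((2, 3)))})

# Give each dataset a length-1 time dimension carrying its own timestamp.
ds1_t = ds1.expand_dims(time=pd.to_datetime(["2013-11-01"]))
ds2_t = ds2.expand_dims(time=pd.to_datetime(["2013-12-01"]))

# Now the datasets can be concatenated along the new dimension.
ds_merged = xr.concat([ds1_t, ds2_t], dim="time")
print(ds_merged["u"].dims)
print(ds_merged["time"].values)
```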

A faster option is to define your dimension as you concatenate. To do this, we'll create a pandas DatetimeIndex object manually with pd.DatetimeIndex, which will form the new dimension. (pd.to_datetime would also build the index, but it does not accept a name argument.)

new_dimension = pd.DatetimeIndex(["2013-11-01", "2013-12-01"], name="time")
ds = xr.concat([ds1, ds2], dim=new_dimension)

Similarly, we can use the DatetimeIndex as we read in the data:

# Current xarray versions require combine="nested" when concat_dim is given
ds = xr.open_mfdataset(fps, combine="nested", concat_dim=new_dimension)

When doing this, we do need to be careful to make sure that the order of the datasets (or filepaths) is consistent with the order of the dates in the new dimension, because we're manually pairing them.
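One way to reduce the risk of a mismatch is to derive the dates from the file names and sort the paths by them. This is only a sketch: it assumes the month name is the last underscore-separated part of the file name (as in the example paths), and the year is a made-up constant because it is not encoded in the names. The month_start helper is hypothetical.

```python
import pandas as pd

# Assumption: the year is not in the file names, so it is hard-coded here.
YEAR = 2013
fps = ["ocean_avg_December.nc4", "ocean_avg_November.nc4"]

def month_start(path, year=YEAR):
    # "ocean_avg_November.nc4" -> "November" -> Timestamp for 1 November
    month_name = path.rsplit(".", 1)[0].rsplit("_", 1)[1]
    return pd.to_datetime(f"{month_name} {year}", format="%B %Y")

# Sort the paths by parsed date so files and dates cannot get out of step.
fps_sorted = sorted(fps, key=month_start)
new_dimension = pd.DatetimeIndex(
    [month_start(p) for p in fps_sorted], name="time"
)
print(fps_sorted)
print(new_dimension)
```

The sorted list and the index are built from the same paths, so they can be passed together to xr.concat or xr.open_mfdataset.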

If you want only a subset of the variables in each dataset

The above methods will work for either a single variable (or DataArray), or for all of the arrays in a dataset (xarray will apply the combine rules to all variables and coordinates automatically).

If you're trying to concatenate only some of the available variables (let's say each file had variables u, v, w, x, y, and z), you could select the ones you want before concatenating, or as the files are read.

Using xr.concat:

ds = xr.concat([ds1[["u", "v", "w"]], ds2[["u", "v", "w"]]], dim="time")

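or using the preprocess argument to xr.open_mfdataset, which is applied to each file before combining (the data_vars argument only controls which variables are concatenated and does not drop the others):

ds = xr.open_mfdataset(fps, preprocess=lambda ds: ds[["u", "v", "w"]], combine="nested", concat_dim="time")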
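To check that subsetting behaves as expected, here is a small in-memory sketch. The datasets, the lat dimension, and the make_ds helper are invented for illustration; indexing a Dataset with a list of names is standard xarray and returns a new Dataset holding only those variables.

```python
import numpy as np
import xarray as xr

names = ["u", "v", "w", "x", "y", "z"]

# Hypothetical toy datasets with six variables and one time step each.
def make_ds(t):
    return xr.Dataset(
        {n: (("time", "lat"), np.full((1, 4), float(t))) for n in names},
        coords={"time": [t]},
    )

ds1, ds2 = make_ds(0), make_ds(1)

# Keep only u, v and w from each dataset, then concatenate along time.
keep = ["u", "v", "w"]
ds = xr.concat([ds1[keep], ds2[keep]], dim="time")
print(sorted(ds.data_vars))
print(dict(ds.sizes))
```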

How to merge/ combine time variable in multiple netcdf

You can use the mergetime operator with CDO:

cdo mergetime sstdas_*.nc merged_file.nc

Note that CDO has all files open at once for this operation, and some systems limit the number of files that can be open concurrently, in which case CDO will throw an error. If that happens you will need to merge in batches, which you can do with CDO in a bash loop. I would suggest merging one year at a time and then merging the yearly files:

#!/bin/bash
for year in {1981..2007} ; do
cdo mergetime sstdas_${year}??.nc sstdas_${year}.nc
done
cdo mergetime sstdas_????.nc sstdas_wholeseries.nc

Note I use the single letter wildcard "?" to be stricter and avoid any mixing of daily and yearly files with the two merges.
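If you would rather drive the same two-stage merge from Python, the batching logic could be sketched as follows. This only builds the command lines and does not run CDO. The plan_merges function is a hypothetical helper, and it assumes the input files are named sstdas_YYYYMM.nc, which is a guess based on the wildcard pattern in the loop.

```python
import re
from collections import defaultdict

def plan_merges(filenames, prefix="sstdas_"):
    """Return cdo command lines: one merge per year, then a final merge."""
    # Assumed naming scheme: <prefix>YYYYMM.nc
    pattern = re.compile(rf"^{re.escape(prefix)}(\d{{4}})\d{{2}}\.nc$")
    by_year = defaultdict(list)
    for name in filenames:
        match = pattern.match(name)
        if match:  # ignore anything that does not look like a YYYYMM file
            by_year[match.group(1)].append(name)

    commands, yearly = [], []
    for year in sorted(by_year):
        out = f"{prefix}{year}.nc"
        commands.append(["cdo", "mergetime", *sorted(by_year[year]), out])
        yearly.append(out)
    # Final step: merge the yearly files into the whole series.
    commands.append(["cdo", "mergetime", *yearly, f"{prefix}wholeseries.nc"])
    return commands

example = ["sstdas_198102.nc", "sstdas_198101.nc", "sstdas_198201.nc"]
for cmd in plan_merges(example):
    print(" ".join(cmd))
```

Each command list can then be passed to subprocess.run(cmd, check=True) once you are happy with the plan.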


