How to Convert Fixed Size Dimension to Unlimited in a NetCDF File

How to convert fixed size dimension to unlimited in a NetCDF file

Your answers are very insightful. I'm not really looking for a way to improve this ncdump-sed-ncgen method. I know that dumping a 600MB netCDF file to text (the CDL representation) takes almost 5 times more space. Doing that just to modify some header text and then generate the netCDF file again doesn't feel very efficient.

I read the latest NCO documentation and found an option specific to ncks, "--mk_rec_dmn". ncks mainly extracts data and writes or appends it to a new netCDF file, so this seems the better approach: extract all the data from myfile.nc, write it with a new record (unlimited) dimension, which is what "--mk_rec_dmn" does, then replace the old file.


ncks --mk_rec_dmn time_counter myfile.nc -o myfileunlimited.nc && mv myfileunlimited.nc myfile.nc

To do the opposite operation (record dimension to fixed-size), use:


ncks --fix_rec_dmn time_counter myfile.nc -o myfilefixedsize.nc && mv myfilefixedsize.nc myfile.nc
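
The two shell commands above can also be wrapped so that the original file is replaced only when ncks succeeds. Below is a minimal Python sketch of that idea, not a definitive implementation. The helper names ncks_command and convert_in_place are hypothetical, it assumes ncks is on the PATH, and it uses only the flags already shown above.

```python
import os
import subprocess
import tempfile


def ncks_command(dim, src, dst, unlimited=True):
    """Build the ncks argument list shown above (hypothetical helper)."""
    flag = "--mk_rec_dmn" if unlimited else "--fix_rec_dmn"
    return ["ncks", flag, dim, src, "-o", dst]


def convert_in_place(dim, path, unlimited=True):
    """Run ncks into a temporary file, then replace `path` only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(suffix=".nc", dir=directory)
    os.close(fd)
    os.remove(tmp)  # reserve a unique name, but let ncks create the file itself
    try:
        # check=True raises CalledProcessError if ncks exits with an error
        subprocess.run(ncks_command(dim, path, tmp, unlimited), check=True)
        os.replace(tmp, path)
    finally:
        # Clean up a leftover temporary file if ncks failed part-way
        if os.path.exists(tmp):
            os.remove(tmp)
```

Calling convert_in_place("time_counter", "myfile.nc") would correspond to the first command, and passing unlimited=False to the second.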

NetCDF file - why is file 1/3 size after fixing record dimension?

Please try to provide code that works without modification if possible. I had to edit yours to get it working, though it wasn't too difficult.

import netCDF4 as nc
import numpy as np

dout = nc.Dataset('testdset.nc4', mode='w', clobber=True, format="NETCDF4")
dout.createDimension('point', size=None)       # unlimited (record) dimension
dout.createDimension('realization', size=24)   # fixed-size dimension
for varname in ['mod_hs', 'mod_ws']:
    # Passing chunksizes explicitly is the key change here
    v = dout.createVariable(varname, np.short,
            dimensions=('point', 'realization'), zlib=False, chunksizes=[1000, 24])
    v.scale_factor = 0.01
date = 1
end_date = 5000
n = 0
sz = 100   # rows appended per iteration
while date < end_date:
    # Append sz rows along the unlimited 'point' dimension
    dout.variables['mod_hs'][n:n+sz, :] = np.ones((sz, 24))
    dout.variables['mod_ws'][n:n+sz, :] = np.ones((sz, 24))
    n += sz
    date += 1
dout.close()

The main difference is in the createVariable call. Without "chunksizes", the file I got was twice as large as with it, so specifying it should fix the file size.
For reading variables from the file, I did not notice any difference; maybe more variables are needed to see one.
Anyway, it should now be clear how to set the chunk size. You will probably need to experiment a bit to find a good configuration for your problem. Feel free to ask if it still does not work for you, and if you want to understand more about chunking, read the HDF5 docs.
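
When experimenting with chunk sizes, it can help to work out how much data one chunk holds and how many chunks a variable needs. The following is a small arithmetic sketch using the numbers from the script above. It does not touch any file and does not by itself explain the size difference; the helper names chunk_bytes and chunk_count are hypothetical.

```python
import math


def chunk_bytes(chunksizes, itemsize):
    """Bytes of uncompressed data held by one chunk."""
    return math.prod(chunksizes) * itemsize


def chunk_count(shape, chunksizes):
    """Number of chunks needed to cover an array of the given shape."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunksizes))


# Values taken from the script above: 16-bit integers (np.short, 2 bytes),
# chunks of 1000 x 24, and 4999 loop iterations writing 100 rows each.
rows = 4999 * 100
print(chunk_bytes((1000, 24), 2))           # bytes per chunk
print(chunk_count((rows, 24), (1000, 24)))  # chunks per variable
```

Trying a few candidate chunksizes this way gives a rough feel for the trade-off before writing real test files.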

Extend dimensions in netCDF file using R

Yes, I think you are confusing the 'dimension definition' and the actual data within the dimension variable.

If you run your first snippet of code and then dump the NetCDF file using ncdump, you'll see:

netcdf tmp {
dimensions:
        latitude = UNLIMITED ; // (1 currently)
        longitude = UNLIMITED ; // (1 currently)
        time = 1000 ;
variables:
        double latitude(latitude) ;
                latitude:units = "degrees_east" ;
                latitude:long_name = "latitude" ;
        double longitude(longitude) ;
                longitude:units = "degrees_north" ;
                longitude:long_name = "longitude" ;
        int time(time) ;
                time:units = "days since 0000-01-01" ;
                time:long_name = "time" ;
        float myvar(time, longitude, latitude) ;
                myvar:units = "m2" ;
data:

 latitude = 44 ;

 longitude = -88.5 ;

 time = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
    ...
    990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000 ;

 myvar =
  {{1}},
  {{2}},
  {{3}},
  ...
  {{1000}} ;
}

The dimensions are saying latitude and longitude are unlimited while the time dimension is fixed at 1000 points/days since 0000-01-01. This is exactly what you specified, which is good.

So to add another latitude and longitude, I would open the file again, read in the current data, append to it and then write it back.

library(ncdf4)
nc <- nc_open("tmp.nc", write = TRUE)
lat <- ncvar_get(nc, varid='latitude')
lat <- append(lat, 44.5)
ncvar_put(nc, varid='latitude', vals=lat, start=c(1), count=2)
nc_close(nc)

Now ncdump will show you two latitudes:

data:

 latitude = 44, 44.5 ;

 longitude = -88.5 ;

Of course, for large datasets you do not need or want to read in all the data and then append to it; you can just tell NetCDF where you want the new values written.

library(ncdf4)
nc <- nc_open("tmp.nc", write = TRUE)
lon = -89.0
ncvar_put(nc, varid='longitude', vals=lon, start=c(2), count=1)
nc_close(nc)

Now ncdump will show you two latitudes and two longitudes:

data:

 latitude = 44, 44.5 ;

 longitude = -88.5, -89 ;

The data of myvar is a 3D array, so I would have done the initial write differently. I would have specified its dimensions both when creating the data and when writing it to the file, like this:

data <- array(1:1000, c(1,1,1000))
ncvar_put(nc = nc, varid='myvar', vals=data, start=c(1,1,1), count=c(1,1,1000))

Then to append to the second latitude and longitude:

data <- array(11:1010, c(1,1,1000))
ncvar_put(nc = nc, varid='myvar', vals=data, start=c(2,2,1), count=c(1,1,1000))
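
Note that the start and count vectors above list the dimensions as (latitude, longitude, time), the reverse of the myvar(time, longitude, latitude) order printed by ncdump, and that R indices start at 1. For readers moving between R and zero-based tools such as Python, the conversion can be sketched as follows. The helper name r_to_python_slices is hypothetical.

```python
def r_to_python_slices(start, count):
    """Convert R ncdf4 start/count (1-based, fastest-varying dimension first)
    into a tuple of 0-based Python slices in ncdump (C) dimension order."""
    slices = [slice(s - 1, s - 1 + c) for s, c in zip(start, count)]
    return tuple(reversed(slices))


# start=c(2,2,1), count=c(1,1,1000) from the second ncvar_put call above
print(r_to_python_slices((2, 2, 1), (1, 1, 1000)))
```

The resulting tuple is ordered (time, longitude, latitude), matching the ncdump header.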

NOTE

I feel the R package hides too much from you. When you create a dimension with ncdim_def, you can give it values. In my mind this is more of a three-step process:

Create the dimension.
Create a variable associated with that dimension.
Add data to this variable.

Hope this helps.

NetCDF re-ordering dimension

I found a similar problem and its answer at https://stackoverflow.com/a/55883675/10874805

My mistake: to make the time dimension UNLIMITED, I must use --mk_rec_dmn instead of --fix_rec_dmn.

So the code should be: ncks --mk_rec_dmn time out1.nc out2.nc

How to split NetCDF file into multiple NetCDF files based on indices of dimension?

You can use NCO to split the data by index, according to this answer:

ncks -d row,1,300 in.nc -O row1_300.nc
ncks -d row,301,700 in.nc -O row301_700.nc

etc...

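ps: be careful with ncks selection. If you write the bounds as float values (with a decimal point), ncks selects by the value of the dimension's coordinate entry, not by index. If the row entry simply counts 1,2,3,4 etc. then this doesn't matter, as the index and value are identical. Also note that ncks indices are zero-based by default (add -F for one-based indexing), so row,1,300 starts at the second entry.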
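
To avoid accidentally passing float bounds when generating many such commands, the hyperslab arguments can be built from integers only. The following is a small sketch under that assumption. The helper names index_hyperslab and split_commands are hypothetical, and the output file naming simply mirrors the two commands above.

```python
def index_hyperslab(dim, first, last):
    """Format an ncks -d argument that selects by index (hypothetical helper).

    Rejects non-integers so the bounds are never written with a decimal
    point, which ncks would treat as coordinate values.
    """
    for bound in (first, last):
        if isinstance(bound, bool) or not isinstance(bound, int):
            raise TypeError("index bounds must be integers")
    return f"{dim},{first},{last}"


def split_commands(src, dim, ranges):
    """Build one ncks command (as an argument list) per index range."""
    return [
        ["ncks", "-O", "-d", index_hyperslab(dim, lo, hi), src, f"{dim}{lo}_{hi}.nc"]
        for lo, hi in ranges
    ]


for cmd in split_commands("in.nc", "row", [(1, 300), (301, 700)]):
    print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run, or printed and pasted into a shell.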

Using NCML to reduce dimensions in netCDF file

Update:
The solution below appears to work, but it doesn't: extracting data from it fails, as John M. found out (see other answers). We thought we had figured out that maintaining a singleton dimension was the solution, but going from four dimensions to one dimension ultimately leads to errors. As Sean A. pointed out, you cannot change the shape of variables using NcML.

original "solution" (doesn't actually work):

If your goal was to make your data CF-1.6 compliant, you could replace those dimensions with a single station dimension of length one. So you could do this:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/usgs/data/file1.nc">
  <remove type="dimension" name="lon"/>
  <remove type="dimension" name="lat"/>
  <remove type="dimension" name="z"/>
  <dimension name="station" length="1"/>
  <dimension name="name_strlen" length="20" />
  <variable name="lat" shape="station"/>
  <variable name="lon" shape="station"/>
  <variable name="z" shape="station"/>
  <variable name="temp" shape="time station"/>
  <variable name="site" shape="station name_strlen" type="char">
    <attribute name="standard_name" value="station_id" />
    <attribute name="cf_role" value="timeseries_id" />
    <values> my_station_001 </values>
  </variable>
  <attribute name="Conventions" value="CF-1.6" />
  <attribute name="featureType" value="timeSeries" />
</netcdf>

Fortran NetCDF - added new dimension need to fill it with zeroes

As I understand it, a dimension is distinct from a variable: dimensions can't have values, but variables can. I think a fairly common practice is to create the dimension and also create a variable with the same name. You can then give the variable whatever values you want.

Your code may look like

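! open the existing file for writing and re-enter define mode
retval = nf_open(cfn,NF_WRITE,ncid)
if (retval .ne. nf_noerr) call handle_err(retval)
retval = nf_redef(ncid)
if (retval .ne. nf_noerr) call handle_err(retval)
! define the new dimension
retval = nf_def_dim(ncid,"xyz",len,dimid_xyz)
if (retval .ne. nf_noerr) call handle_err(retval)

! define a variable with the same name on that dimension
retval = nf_def_var(ncid,"xyz",nf_real,1,[dimid_xyz], varid_xyz)
if (retval .ne. nf_noerr) call handle_err(retval)

! leave define mode before writing data
retval = nf_enddef(ncid)
if (retval .ne. nf_noerr) call handle_err(retval)

! fill the variable with zeroes (arrayOfZero is a real array of length len)
retval = nf_put_vara_real(ncid,varid_xyz,[1],[len],arrayOfZero)
if (retval .ne. nf_noerr) call handle_err(retval)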

Note: I'd recommend against using a variable named len inside your Fortran code, as it will clash with the intrinsic of the same name.

nco, reduce dim, netcdf

Your original file has the structure of a curvilinear grid, where both lat and lon arrays are 2-D. Your desired grid is rectangular, where both lat and lon are 1-D. The only way to do this, in general, is to regrid. The NCO operator ncremap does regridding...
