Does Caffe need data to be shuffled?
Should you shuffle the samples? Think about the learning process if you don't shuffle: caffe sees only 0 samples - what do you expect the algorithm to deduce? Simply predict 0 all the time and everything is cool. If there are plenty of 0s before it hits the first 1, caffe will become very confident in always predicting 0, and it will be very difficult to move the model away from this point. On the other hand, if it constantly sees a mix of 0s and 1s, it learns meaningful features for separating the examples right from the beginning.
Bottom line: it is very advantageous to shuffle the training samples, especially when using SGD-based approaches.
AFAIK, caffe does not randomly sample batch_size samples; rather, it goes sequentially over the input DB, batch_size samples at a time.
TL;DR: shuffle.
Caffe's way of doing data shuffling
Using the convert_imageset tool creates a copy of your training/validation data in a binary database file (in either lmdb or leveldb format). The data encoded in the dataset includes pairs of an example and its corresponding label.
Therefore, when shuffling the dataset, the labels are shuffled together with the data, maintaining the correspondence between each example and its ground-truth label.
There is no need to shuffle the data again during training.
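If you build the list file yourself, a minimal Python sketch (with hypothetical file names) is to shuffle the list before running convert_imageset; shuffling whole lines keeps each image paired with its label:
import random

# Each line of a convert_imageset list file is "relative/path.jpg LABEL";
# shuffling whole lines keeps every image paired with its ground-truth label.
with open("train_list.txt") as f:            # hypothetical input list
    lines = f.readlines()
random.shuffle(lines)
with open("train_list_shuffled.txt", "w") as f:
    f.writelines(lines)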
Shuffle in caffe with multiple lmdbs
If you use layer type "Data" you can't use shuffle, as there is no shuffle parameter in data_param. As for layer type "ImageData", you can't use lmdb as the data source, since its source must be a text file with image addresses and labels; but it does have a shuffle parameter. If you look inside image_data_layer.cpp you'll find that if shuffle is true, the image sources are shuffled at each epoch using the Fisher–Yates algorithm. If you use two different ImageData layers, ShuffleImages() will be called for each of them, and it is unlikely that the two shuffles will generate the same sequence. So you can't use shuffle in either of these two ImageData layers.
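If you need the two inputs shuffled but kept in sync, a minimal sketch (with hypothetical file names) is to draw a single permutation yourself and apply it to both lists before handing them to the two ImageData layers:
import random

# Hypothetical paired lists: images_b[i] must stay matched with images_a[i].
images_a = ["a0.jpg", "a1.jpg", "a2.jpg"]
images_b = ["b0.jpg", "b1.jpg", "b2.jpg"]

perm = list(range(len(images_a)))
random.shuffle(perm)                    # one shared permutation
images_a = [images_a[i] for i in perm]
images_b = [images_b[i] for i in perm]  # same order, so the pairing survives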
Does machine learning framework caffe support different data type precisions?
The mean file and the trained parameters you are using in the tutorial are stored as single-precision values. Changing float to double in the program does not change the stored values; trying to read stored single-precision values as double precision results in you reading "garbage". You'll have to manually convert the files to double-precision values.
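A minimal numpy sketch of why this happens (the array values are made up): reinterpreting float32 bytes as float64 is not the same as converting them:
import numpy as np

stored = np.array([0.5, 1.25, -3.0, 2.0], dtype=np.float32)  # on-disk precision
raw = stored.tobytes()

# Reading float32 bytes as if they were float64 misinterprets the bit
# pattern and yields "garbage":
wrong = np.frombuffer(raw, dtype=np.float64)
# An explicit cast is the correct way to obtain double-precision values:
right = np.frombuffer(raw, dtype=np.float32).astype(np.float64)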
BatchNorm and Reshuffle train images after each epoch
If you use the ImageData layer as your input, set "shuffle" to true.
For example, if you have:
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "examples/_temp/file_list.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}
Just add:
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "examples/_temp/file_list.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
    shuffle: true
  }
}
For documentation, see:
- http://caffe.berkeleyvision.org/tutorial/layers.html#images
- https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L770
You can also find the source code here:
- https://github.com/BVLC/caffe/blob/master/src/caffe/layers/image_data_layer.cpp
Of particular interest is the code within the function load_batch, which re-shuffles the data at the end of each epoch:
lines_id_++;
if (lines_id_ >= lines_size) {
  // We have reached the end. Restart from the first.
  DLOG(INFO) << "Restarting data prefetching from start.";
  lines_id_ = 0;
  if (this->layer_param_.image_data_param().shuffle()) {
    ShuffleImages();
  }
}
How to input multiple N-D arrays to a net in caffe?
You want caffe to use several N-D signals for each training sample, and you are concerned that the default "Data" layer can only handle one image per training sample.
There are several solutions for this concern:
1. Using several "Data" layers (as was done in the model you linked to). In order to sync between the three "Data" layers, you need to know that caffe reads the samples from the underlying LMDB sequentially. So, if you prepare your three LMDBs in the same order, caffe will read one sample at a time from each of the LMDBs in the order in which the samples were put there, and the three inputs will be in sync during training/validation.
Note that convert_imageset has a 'shuffle' flag; do NOT use it, as it would shuffle your samples differently in each of the three LMDBs and you would lose the sync. You are strongly advised to shuffle the samples yourself before preparing the LMDBs, but in a way that applies the same "shuffle" to all three inputs, leaving them in sync with each other.
2. Using a 5-channel input. caffe can store N-D data in LMDB, not only color/gray images. You can use python to create an LMDB in which each "image" is a 5-channel array: the first three channels are the image's RGB and the last two are the ground-truth labels and the weight for the per-pixel loss.
In your model you only need to add a "Slice" layer on top of your "Data":
layer {
  name: "slice_input"
  type: "Slice"
  bottom: "raw_input" # 5-channel "image" stored in LMDB
  top: "rgb"
  top: "gt"
  top: "weight"
  slice_param {
    axis: 1
    slice_point: 3
    slice_point: 4
  }
}
3. Using a "HDF5Data" layer (my personal favorite). You can store your inputs in the binary hdf5 format and have caffe read from these files. Using "HDF5Data" is much more flexible in caffe and allows you to shape the inputs as much as you like. In your case you need to prepare a binary hdf5 file with three "datasets": 'rgb', 'gt' and 'weight'. You need to make sure the samples are synced when you create the hdf5 file(s). Once you have them ready, you can use a "HDF5Data" layer with three "top"s (see the sketch after this list).
4. Write your own "Python" input layer. I will not go into the details here, but you can implement your own input layer in python. See this thread for more details.
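A minimal sketch of the hdf5 option with made-up shapes and file names; caffe's "HDF5Data" layer expects float data and a source text file listing the .h5 files:
import h5py
import numpy as np

# Hypothetical shapes: N samples of 3xHxW rgb, 1xHxW labels and weights.
N, H, W = 100, 224, 224
rgb = np.random.rand(N, 3, H, W).astype(np.float32)
gt = np.random.randint(0, 2, size=(N, 1, H, W)).astype(np.float32)
weight = np.ones((N, 1, H, W), dtype=np.float32)

# The dataset names must match the "top"s of the "HDF5Data" layer;
# writing all three into one file keeps the samples synced by index.
with h5py.File("train.h5", "w") as f:
    f.create_dataset("rgb", data=rgb)
    f.create_dataset("gt", data=gt)
    f.create_dataset("weight", data=weight)

# The layer's source parameter points to a text file listing .h5 paths:
with open("train_h5_list.txt", "w") as f:
    f.write("train.h5\n")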
Impact of data shuffling on results reproducibility in Pytorch
The main principle deep learning is based on is weight optimization using stochastic gradient descent (and its variants). Being a stochastic algorithm, you cannot expect to get exactly the same results if you run your algorithm multiple times.
In fact, you should see some variation, but the results should be "roughly the same".
If you need exactly the same results when running your algorithm multiple times, you should look into reproducibility of results, which is a very delicate subject.
In summary:
1. If you do not shuffle at all, you will have perfect reproducibility, but the resulting accuracy is expected to be very low.
2. If you randomly shuffle (what most of the world does), you should expect a slightly different accuracy value for each run, but they should all be significantly larger than the value of (1) "no shuffle".
3. If you follow the guidelines for reproducible results, you should get the exact same accuracy value for each run, and it should be close to the values of (2) "shuffle".
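A minimal PyTorch sketch of such reproducibility guidelines (the seed value is arbitrary; full determinism can also depend on DataLoader workers and on which ops your model uses):
import random
import numpy as np
import torch

def set_seed(seed=0):
    # Seed every RNG the training loop may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism in cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)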