TensorFlow Only running on 1/32 of the Training data provided
This is a common misconception: recent versions of Keras show batches, not samples, in the progress bar. That is perfectly consistent with what you observe: you say 1/32 of the data provided, and 32 is the default batch size in Keras.
Keras not training on entire dataset
The number 1875 shown while fitting the model is not the number of training samples; it is the number of batches.

model.fit includes an optional argument batch_size, which, according to the documentation:

If unspecified, batch_size will default to 32.

So what happens here is: you fit with the default batch size of 32 (since you have not specified anything different), so the total number of batches for your data is

60000/32 = 1875
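The arithmetic above can be checked directly; a minimal sketch (no Keras needed, assuming the standard MNIST training-set size of 60,000):

```python
import math

num_samples = 60000     # MNIST training images
batch_size = 32         # Keras default when batch_size is unspecified

# Number of batches (steps) shown per epoch in the progress bar
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 1875 -- divides evenly, so there is no partial batch
```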
tensorflow CNN does not use all the images for training
57 is the number of iterations (steps) in each epoch.
You have 1809 images, and the default batch size is 32 (also in images), as you have not specified it in the code (see the documentation), so ceil(1809/batch_size) = 57.
As a result, all images are taken into account in each epoch; it just takes 57 steps (iterations) to complete one.
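The same calculation, sketched in plain Python; here 1809 does not divide evenly by 32, so the last batch is partial rather than dropped:

```python
import math

num_images = 1809
batch_size = 32         # Keras default

steps = math.ceil(num_images / batch_size)
print(steps)            # 57

# The final step processes the remainder of the dataset
last_batch = num_images - (steps - 1) * batch_size
print(last_batch)       # 17 images in the 57th step
```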
Model fitting doesn't use all of the provided data
The model is being trained with a batch size of 32, hence there are 60,000/32 = 1875 batches.
Although the TensorFlow documentation shows batch_size=None in the fit function signature, the description of that argument says:

batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of datasets, generators, or keras.utils.Sequence instances (since they generate batches).
model using only the first sample of the dataset to train
This line indicates it's training on one batch, not one sample:
1/1 [==============================] - 19s 19s/step - loss: 0.2291 - mae: 0.4116
The default batch_size in Keras is 32, I believe. A Keras embedding layer expects integers, not floats, and you're using too many dimensions, so you should change this line in your converter:
return np.array(normal_list, ndmin=2).astype(np.float32)
To this:
return np.array(normal_list)
You want each training sample to have a shape of (?), where ? is 50-70 in your case. You want each target to have a shape of (10), because your model outputs 10 values from its last dense layer. Combined with the number of samples, you want x_set to have a shape of (950, ?) and y_set to have a shape of (950, 10). To avoid issues, you should probably pad all of your samples to the same length instead of varying between 50 and 70.
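The padding step can be sketched with plain NumPy; the sizes (950 samples, lengths between 50 and 70, a maximum length of 70) follow the question, and pad_to is an assumed helper name, not a library function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the variable-length integer samples from the question:
# 950 sequences, each 50-70 token ids long
samples = [rng.integers(0, 30, size=rng.integers(50, 71)) for _ in range(950)]

def pad_to(seqs, length, pad_value=0):
    """Right-pad each integer sequence to a fixed length (assumed helper)."""
    out = np.full((len(seqs), length), pad_value, dtype=np.int64)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out

x_set = pad_to(samples, length=70)
print(x_set.shape)   # (950, 70) -- one integer row per sample, as the embedding layer expects
```

Keras ships an equivalent utility (pad_sequences), but the point is simply that every row must have the same length and an integer dtype.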
Your model expects this input:
>>> model.input_shape
(None, None)
Your model.summary() is as follows (the first None dimension is the batch_size, which in your case is 950):
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, None, 64) 1920
_________________________________________________________________
bidirectional (Bidirectional (None, 2048) 8921088
_________________________________________________________________
dense (Dense) (None, 128) 262272
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 9,186,570
Trainable params: 9,186,570
Non-trainable params: 0
_________________________________________________________________
In short, you're embedding your entire training set into a single sample, I believe.
Tensorflow model.train() not looping through all data
You are using the whole data, no worries!
According to the Keras documentation, https://github.com/keras-team/keras/blob/master/keras/engine/training.py
when you use model.fit and do not specify the batch size, it defaults to 32.
batch_size Integer or NULL. Number of samples per gradient update. If
unspecified, batch_size will default to 32
This means that for each epoch you have 1875 steps, and in each step your model takes 32 data examples into account. And guess what, 1875 * 32 equals 60,000.
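You can mimic what fit does per epoch with a simple batching loop; a sketch using a stand-in array for the 60,000 training examples:

```python
import numpy as np

data = np.zeros(60000)   # stand-in for the 60,000 MNIST training images
batch_size = 32          # Keras default

steps = 0
seen = 0
for start in range(0, len(data), batch_size):
    batch = data[start:start + batch_size]   # one gradient update per batch
    steps += 1
    seen += len(batch)

print(steps, seen)       # 1875 steps covering all 60000 samples
```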
Why the model is training on only 1875 training set images if there are 60000 images in the MNIST dataset?
There's no problem with the training. The model is being trained on 1875 batches of 32 images each, not on 1875 images.
1875*32 = 60000 images