Tensorflow Read Images with Labels

Tensorflow read images with labels

Using slice_input_producer provides a solution which is much cleaner. Slice Input Producer allows us to create an Input Queue containing arbitrarily many separable values. This snippet of the question would look like this:

def read_labeled_image_list(image_list_file):
    """Reads a .txt file containing pathes and labeles
    Args:
       image_list_file: a .txt file with one /path/to/image per line
       label: optionally, if set label will be pasted after each line
    Returns:
       List with all filenames in file image_list_file
    """
    f = open(image_list_file, 'r')
    filenames = []
    labels = []
    for line in f:
        filename, label = line[:-1].split(' ')
        filenames.append(filename)
        labels.append(int(label))
    return filenames, labels

def read_images_from_disk(input_queue):
    """Consumes a single filename and label as a ' '-delimited string.
    Args:
      filename_and_label_tensor: A scalar string tensor.
    Returns:
      Two tensors: the decoded image, and the string label.
    """
    label = input_queue[1]
    file_contents = tf.read_file(input_queue[0])
    example = tf.image.decode_png(file_contents, channels=3)
    return example, label

# Reads pfathes of images together with their labels
image_list, label_list = read_labeled_image_list(filename)

images = ops.convert_to_tensor(image_list, dtype=dtypes.string)
labels = ops.convert_to_tensor(label_list, dtype=dtypes.int32)

# Makes an input queue
input_queue = tf.train.slice_input_producer([images, labels],
                                            num_epochs=num_epochs,
                                            shuffle=True)

image, label = read_images_from_disk(input_queue)

# Optional Preprocessing or Data Augmentation
# tf.image implements most of the standard image augmentation
image = preprocess_image(image)
label = preprocess_label(label)

# Optional Image and Label Batching
image_batch, label_batch = tf.train.batch([image, label],
                                          batch_size=batch_size)

See also the generic_input_producer from the TensorVision examples for full input-pipeline.

Convert folder of images with labels in CSV file into a tensorflow Dataset

Based on the answers:

https://stackoverflow.com/a/72343548/
https://stackoverflow.com/a/54752691/

I have DIY created the following. I am sure there is a simpler way, but this at least is something functional. I was hoping for more built-in support though:

import os.path
from typing import Dict, Tuple

import pandas as pd
import tensorflow as tf

def get_full_dataset(
    batch_size: int = 32, image_size: Tuple[int, int] = (256, 256)
) -> tf.data.Dataset:
    data = pd.read_csv(os.path.join(DATA_ABS_PATH, "images.csv"))
    images_path = os.path.join(DATA_ABS_PATH, "images")
    data["image"] = data["image"].map(lambda x: os.path.join(images_path, f"{x}.jpg"))
    filenames: tf.Tensor = tf.constant(data["image"], dtype=tf.string)
    data["label"] = data["label"].str.lower()
    class_name_to_label: Dict[str, int] = {
        label: i for i, label in enumerate(set(data["label"]))
    }
    labels: tf.Tensor = tf.constant(
        data["label"].map(class_name_to_label.__getitem__), dtype=tf.uint8
    )
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

    def _parse_function(filename, label):
        jpg_image: tf.Tensor = tf.io.decode_jpeg(tf.io.read_file(filename))
        return tf.image.resize(jpg_image, size=image_size), label

    dataset = dataset.map(_parse_function)
    return dataset.batch(batch_size)

Create Tensorflow Dataset with dataframe of images and labels

You can actually pass a dataframe directly to tf.data.Dataset.from_tensor_slices:

import tensorflow as tf
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'images': [np.random.random((64, 64, 3)) for _ in range(100)],
                        'labels': ['ok', 'not ok']*50})

dataset = tf.data.Dataset.from_tensor_slices((list(df['images'].values), df['labels'].values)).batch(2)

for x, y in dataset.take(1):
  print(x.shape, y)
# (2, 64, 64, 3) tf.Tensor([b'ok' b'not ok'], shape=(2,), dtype=string)

Get labels from dataset when using tensorflow image_dataset_from_directory

If I were you, I'll iterate over the entire testData, I'll save the predictions and labels along the way and I'll build the confusion matrix at the end.

testData = tf.keras.preprocessing.image_dataset_from_directory(
    dataDirectory,
    labels='inferred',
    label_mode='categorical',
    seed=324893,
    image_size=(height,width),
    batch_size=32)

predictions = np.array([])
labels =  np.array([])
for x, y in testData:
  predictions = np.concatenate([predictions, model.predict_classes(x)])
  labels = np.concatenate([labels, np.argmax(y.numpy(), axis=-1)])

tf.math.confusion_matrix(labels=labels, predictions=predictions).numpy()

and the result is

Found 4 files belonging to 2 classes.
array([[2, 0],
       [2, 0]], dtype=int32)

Using queues in TensorFlow to load images and labels from text file

It might be caused by num_epochs=1 here tf.train.slice_input_producer([filenames, labels], num_epochs=1, shuffle=True). You can check api of slice_input_producer, where it explains: num_epochs: An integer (optional). If specified, slice_input_producer produces each slice num_epochs times before generating an OutOfRange error.

Tensorflow Read Images with Labels