# Exercises


# Exercise: Create a Custom Dataset with Transformations

Objective: Create a custom dataset class for a set of images stored in a local directory, apply transformations to the images, and visualize a few samples with the transformations applied.

Steps:

  1. Create a Custom Dataset Class:

    • Write a PyTorch Dataset class to load images from a directory.
    • Each image should have a corresponding label from a CSV file (format: filename,label).
  2. Apply Transformations:

    • Resize the images to 128x128 pixels.
    • Convert the images to PyTorch tensors.
    • Normalize the images with a mean of 0.5 and standard deviation of 0.5.
  3. Load the Dataset with DataLoader:

    • Use DataLoader to load the dataset and prepare it for training.
  4. Visualize Transformed Images:

    • Display a few images from the dataset with the transformations applied.
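
To try the exercise without an existing image collection, the directory layout from step 1 can be generated synthetically. The sketch below is only an illustration: the helper name `make_sample_data`, the `data/` root, the file names, and the choice of solid-colour binary PPM images (a simple format that Pillow can open) are all assumptions, not part of the exercise. It uses only the standard library.

```python
import csv
import os
import random


def make_sample_data(root="data", n_images=16, size=32, n_classes=2, seed=0):
    """Hypothetical helper: write solid-colour PPM images and a labels.csv under root."""
    rng = random.Random(seed)
    img_dir = os.path.join(root, "images")
    os.makedirs(img_dir, exist_ok=True)

    rows = []
    for i in range(n_images):
        filename = f"img_{i:03d}.ppm"
        color = bytes(rng.randrange(256) for _ in range(3))  # one random RGB colour
        with open(os.path.join(img_dir, filename), "wb") as f:
            # Binary PPM (P6): a short text header followed by raw RGB bytes.
            f.write(f"P6\n{size} {size}\n255\n".encode("ascii"))
            f.write(color * (size * size))
        rows.append((filename, i % n_classes))  # labels cycle through the classes

    csv_path = os.path.join(root, "labels.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])  # header row
        writer.writerows(rows)
    return csv_path, img_dir
```

Calling `make_sample_data()` once creates `data/images` and `data/labels.csv`. The header row matters if the CSV is read with `pd.read_csv` and default arguments, because pandas then treats the first line as column names rather than as a sample.
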
import os
import pandas as pd
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import matplotlib.pyplot as plt

# Step 1: Create a custom dataset class
class CustomImageDataset(Dataset):
    def __init__(self, csv_file, img_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)  # Read CSV with columns: filename, label
        self.img_dir = img_dir                   # Directory with images
        self.transform = transform               # Transformations to apply

    def __len__(self):
        return len(self.annotations)             # Return the number of samples in the dataset

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.annotations.iloc[idx, 0])  # Get image path
        image = Image.open(img_path).convert('RGB')  # Load image as 3-channel RGB to match Normalize
        label = torch.tensor(int(self.annotations.iloc[idx, 1]))  # Get label
        if self.transform:
            image = self.transform(image)        # Apply transformations
        return image, label

# Step 2: Define transformations
transform = transforms.Compose([
    transforms.Resize((128, 128)),               # Resize images to 128x128 pixels
    transforms.ToTensor(),                       # Convert images to PyTorch tensors
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # Normalize images
])

# Step 3: Create an instance of the custom dataset with transformations
dataset = CustomImageDataset(csv_file='data/labels.csv', img_dir='data/images', transform=transform)

# Step 4: Load the dataset with DataLoader
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

# Function to display images
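def show_images(images, labels, max_images=8):
    images = images / 2 + 0.5                    # Unnormalize: map [-1, 1] back to [0, 1]
    n = min(len(images), max_images)             # A batch may contain fewer than max_images samples
    fig, axes = plt.subplots(1, n, figsize=(15, 5), squeeze=False)
    for idx, ax in enumerate(axes[0]):
        ax.imshow(images[idx].permute(1, 2, 0).numpy())  # CHW -> HWC, as matplotlib expects
        ax.axis('off')
        ax.set_title(f'Label: {labels[idx].item()}')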

# Step 5: Visualize a few transformed images
data_iter = iter(dataloader)
images, labels = next(data_iter)  # Use the built-in next(); iterators have no .next() method in Python 3
show_images(images, labels)
plt.show()
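
With `batch_size=8`, the last batch of an epoch is smaller whenever the dataset size is not a multiple of 8, so code that consumes a batch should not assume it always holds exactly 8 samples. The following is a minimal sketch of that behaviour. It assumes a hypothetical stand-in `TensorDataset` of 20 zero-filled tensors shaped like the transformed images, rather than the image files themselves.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 20 zero-filled "images" shaped like the
# transformed samples (3 x 128 x 128), with alternating 0/1 labels.
fake_images = torch.zeros(20, 3, 128, 128)
fake_labels = torch.arange(20) % 2
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=8, shuffle=False)

# Record how many samples each batch actually contains.
batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(len(loader))   # number of batches: 3
print(batch_sizes)   # [8, 8, 4]
```

If a fixed batch size is required, `DataLoader` accepts `drop_last=True`, which discards the final incomplete batch.
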

# Explanation of the Solution

  1. Custom Dataset Class:

    • A custom dataset class CustomImageDataset is created to load images and labels from a specified directory and CSV file.
    • The __getitem__ method loads an image and its label and applies transformations if provided.
  2. Define Transformations:

    • Images are resized to 128x128 pixels.
    • Images are converted to PyTorch tensors.
    • Images are normalized using a mean of 0.5 and a standard deviation of 0.5 for all channels.
  3. DataLoader:

    • DataLoader is used to load the dataset in batches and shuffle the data for training.
  4. Visualization:

    • A helper function show_images reverses the normalization (x / 2 + 0.5) so that pixel values return to the [0, 1] range, then displays up to 8 images from a batch with their labels, showing the effect of resizing and tensor conversion.
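
The arithmetic behind the normalization and its reversal can be checked by hand. ToTensor scales 8-bit pixel values to [0, 1], and Normalize then computes (x - mean) / std per channel, so a mean and standard deviation of 0.5 map [0, 1] onto [-1, 1]. The sketch below works on plain Python floats rather than tensors, and the function names `normalize` and `unnormalize` are illustrative only.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel formula applied by transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std


def unnormalize(y):
    """Inverse for mean = std = 0.5, as used before plotting: y / 2 + 0.5."""
    return y / 2 + 0.5


for x in (0.0, 0.25, 0.5, 1.0):
    y = normalize(x)
    print(x, "->", y, "->", unnormalize(y))
# 0.0 -> -1.0 -> 0.0
# 0.25 -> -0.5 -> 0.25
# 0.5 -> 0.0 -> 0.5
# 1.0 -> 1.0 -> 1.0
```
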