Building a Simple Convolutional Neural Network for Image Classification

This challenge requires you to design and implement a basic Convolutional Neural Network (CNN) architecture in Python using a popular deep learning framework. This is a fundamental skill for anyone looking to work with image recognition tasks, as CNNs are the de facto standard for processing visual data.

Problem Description

Your task is to create a functional CNN model capable of performing image classification. You will define the layers, their configurations, and how they connect to form a coherent network. The model should be ready to be trained on a suitable image dataset.

Key Requirements:

Architecture Definition: Define a CNN architecture that includes at least:
- One or more Convolutional layers (e.g., Conv2D).
- One or more Pooling layers (e.g., MaxPooling2D).
- A Flatten layer to transition from convolutional to dense layers.
- One or more Dense (fully connected) layers.
- An output Dense layer with the appropriate number of units for classification.
Activation Functions: Use appropriate activation functions (e.g., ReLU for hidden layers, softmax for a multi-class output layer, sigmoid for a single-unit binary output).
Model Compilation: Compile the model with a suitable optimizer (e.g., Adam), a loss function that matches the output layer and the label encoding (e.g., categorical_crossentropy for one-hot multi-class labels, binary_crossentropy for binary labels), and metrics (e.g., accuracy).
Framework Usage: Implement the CNN using a widely adopted Python deep learning library such as TensorFlow/Keras or PyTorch.
Output: The expected output is a compiled Keras model object, or in PyTorch (which has no Keras-style compile step) an nn.Module instance, that can subsequently be used for training.

Expected Behavior:

The Python code should successfully define and compile a CNN model. When executed, it should not raise any errors related to model definition or compilation. The model object should be ready to accept input data and undergo the training process.

Edge Cases to Consider:

Input Shape Mismatch: Ensure the initial convolutional layer correctly handles the expected input image dimensions (height, width, channels).
Number of Output Classes: The final Dense layer must have the correct number of units corresponding to the number of classes in the target dataset.
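
Both edge cases above can be caught before any layer is built. The following is a minimal sketch in plain Python; validate_cnn_args is a hypothetical helper name, not part of Keras or PyTorch, and the specific checks (three dimensions, at least two classes) are assumptions about what a softmax classifier needs rather than requirements stated by the challenge.

```python
def validate_cnn_args(input_shape, num_classes):
    """Hypothetical pre-build check for the two edge cases listed above."""
    # A common mistake is passing (height, width) without the channel axis,
    # or including the batch dimension as a fourth entry.
    if len(input_shape) != 3:
        raise ValueError(
            f"input_shape must be (height, width, channels), got {input_shape!r}"
        )
    if any(not isinstance(dim, int) or dim < 1 for dim in input_shape):
        raise ValueError(f"all dimensions must be positive integers, got {input_shape!r}")
    # A softmax output layer needs one unit per class, so at least two classes.
    if not isinstance(num_classes, int) or num_classes < 2:
        raise ValueError(f"num_classes must be an integer >= 2, got {num_classes!r}")
    return tuple(input_shape), num_classes


print(validate_cnn_args((32, 32, 3), 10))
```

Calling such a helper at the top of the model-building function turns a confusing shape error deep inside the framework into an immediate, readable message.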

Examples

Example 1: A Basic CNN Architecture

# Assuming TensorFlow/Keras is used

from tensorflow import keras
from tensorflow.keras import layers

def build_simple_cnn(input_shape=(32, 32, 3), num_classes=10):
    model = keras.Sequential([
        # Input layer
        keras.Input(shape=input_shape),

        # Convolutional Block 1
        layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2)),

        # Convolutional Block 2
        layers.Conv2D(filters=64, kernel_size=(3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2)),

        # Flattening and Dense Layers
        layers.Flatten(),
        layers.Dense(units=128, activation="relu"),
        layers.Dense(units=num_classes, activation="softmax")
    ])

    # Compile the model; categorical_crossentropy expects one-hot encoded labels
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    return model

# --- How to use it ---
# model = build_simple_cnn(input_shape=(28, 28, 1), num_classes=10)
# model.summary()

Input: The defaults are input_shape=(32, 32, 3), representing 32x32-pixel color images, and num_classes=10, indicating a 10-class classification problem. The commented usage line passes (28, 28, 1) instead, showing the same function applied to 28x28 grayscale images.

Output: A compiled keras.Sequential model (a subclass of keras.Model), bound to the name model in the usage snippet. Calling model.summary() prints each layer with its output shape and parameter count.

Explanation: The function defines a sequential model. It starts with an input layer, followed by two convolutional blocks, each containing a Conv2D and a MaxPooling2D layer, which extract features from the image. Because both convolutions use "same" padding, only the pooling layers reduce the spatial size, halving it each time. A Flatten layer then converts the stack of feature maps (8x8x64 for the default input) into a 1D vector of 4,096 values. This vector is fed into a dense layer for further processing, and finally a dense output layer with softmax activation predicts the probability distribution over the 10 classes. The model is compiled with the Adam optimizer, categorical crossentropy loss, and the accuracy metric.
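
The parameter counts that model.summary() reports for Example 1 can be checked by hand. The sketch below uses plain Python arithmetic and assumes the default arguments (input_shape=(32, 32, 3), num_classes=10); the helper names conv2d_params and dense_params are illustrative, not framework functions.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_features + 1) * units


# Example 1 with its defaults: 32x32x3 input, "same" padding, two 2x2 pools.
conv1 = conv2d_params(3, 3, 3, 32)    # 896
conv2 = conv2d_params(3, 3, 32, 64)   # 18496
flat = 8 * 8 * 64                     # 32 -> 16 -> 8 per side, 64 channels
dense1 = dense_params(flat, 128)      # 524416
output = dense_params(128, 10)        # 1290
total = conv1 + conv2 + dense1 + output

print(f"flattened features: {flat}")
print(f"total trainable parameters: {total}")  # 545098
```

Pooling and Flatten layers have no trainable parameters, so they do not appear in the sum. Note that the first dense layer accounts for the large majority of the total.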

Example 2: A Slightly Deeper CNN

# Assuming TensorFlow/Keras is used

from tensorflow import keras
from tensorflow.keras import layers

def build_deeper_cnn(input_shape=(64, 64, 1), num_classes=5):
    model = keras.Sequential([
        keras.Input(shape=input_shape),

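        # Block 1: 5x5 kernels with the default "valid" padding (each side shrinks by 4)
        layers.Conv2D(filters=16, kernel_size=(5, 5), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),

        # Block 2: "same" padding keeps the spatial size unchanged before pooling
        layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2)),

        # Block 3: default "valid" padding again (each side shrinks by 2)
        layers.Conv2D(filters=64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),

        # Classifier head
        layers.Flatten(),
        layers.Dense(units=256, activation="relu"),
        layers.Dropout(0.5),  # Drops 50% of activations during training for regularization
        layers.Dense(units=num_classes, activation="softmax")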
    ])

    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    return model

# --- How to use it ---
# model = build_deeper_cnn(input_shape=(32, 32, 3), num_classes=5)
# model.summary()

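Input: The defaults are input_shape=(64, 64, 1), representing 64x64-pixel grayscale images, and num_classes=5, indicating a 5-class classification problem. The commented usage line passes (32, 32, 3) instead, showing that the same architecture also accepts 32x32 color images.

Output: A compiled keras.Sequential model (a subclass of keras.Model), bound to the name model in the usage snippet.

Explanation: This example shows a slightly deeper network with three convolutional blocks. The first and third convolutions use the default "valid" padding, so they shrink the feature maps slightly, while the second uses "same" padding and preserves their size. A Dropout layer after the hidden dense layer helps prevent overfitting. The default input shape is set for grayscale images, and the number of output classes is adjusted accordingly.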
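
Because Example 2 mixes "valid" and "same" padding, it helps to trace the shape after every layer. The following sketch does this in plain Python, assuming stride-1 convolutions and 2x2 pooling with a stride of 2 that rounds down, as in the code above; trace_deeper_cnn and its helpers are hypothetical names introduced for illustration.

```python
def conv_out(size, kernel, padding="valid"):
    """Output length of a stride-1 convolution along one spatial axis."""
    if padding == "same":
        return size
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output length of non-overlapping pooling (stride equal to pool size)."""
    return size // pool


def trace_deeper_cnn(height, width):
    """Return (layer name, shape) pairs for the Example 2 layer stack."""
    spec = [
        ("conv 5x5 valid", 5, "valid", 16),
        ("conv 3x3 same", 3, "same", 32),
        ("conv 3x3 valid", 3, "valid", 64),
    ]
    shapes = []
    h, w = height, width
    for name, kernel, padding, filters in spec:
        h, w = conv_out(h, kernel, padding), conv_out(w, kernel, padding)
        shapes.append((name, (h, w, filters)))
        h, w = pool_out(h), pool_out(w)
        shapes.append(("max pool 2x2", (h, w, filters)))
    shapes.append(("flatten", (h * w * spec[-1][3],)))
    return shapes


for name, shape in trace_deeper_cnn(64, 64):
    print(f"{name:<16} {shape}")
```

For the default 64x64 input the trace ends with 2,304 flattened features; for the 32x32 input in the usage comment it ends with 256. An input that is too small would drive a dimension to zero or below, which is the point at which the framework would raise a shape error.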

Constraints

Framework: You must use either TensorFlow/Keras or PyTorch.
Layer Types: Your architecture must include Conv2D (or equivalent) and MaxPooling2D (or equivalent) layers.
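Output Layer: The final layer must be a Dense layer (or equivalent) with num_classes units and a softmax activation function for multi-class classification. In PyTorch, nn.CrossEntropyLoss expects raw logits, so there the softmax is usually left to the loss function rather than applied inside the model.
Compilation: A Keras model must be successfully compiled with an optimizer, a loss function, and at least one metric. PyTorch has no equivalent of model.compile(), so the optimizer and loss function are created separately from the model.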
Function Signature: Encapsulate your solution in a function that accepts input_shape and num_classes as arguments and returns the compiled model, for example def build_cnn_model(input_shape: tuple, num_classes: int) -> Model (the exact return type hint depends on the framework).
No Pre-trained Models: You are expected to build the architecture from scratch, not load pre-trained weights.

Notes

Consider the trade-off between network depth and generalization. Deeper networks can learn more complex features but are more prone to overfitting, especially on small datasets, and require more computational resources.
Experiment with different kernel sizes, filter counts, and pooling strategies to see how they affect the output shapes and parameter counts reported by model.summary().
The choice of optimizer and loss function is crucial for effective training. For this challenge, "adam" and "categorical_crossentropy" are good starting points for multi-class problems with one-hot encoded labels; for integer class labels, Keras provides "sparse_categorical_crossentropy" instead.
Understanding the shape transformations that occur after each layer (convolution, pooling, flattening) is key to designing a valid CNN.
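The input_shape will typically be (height, width, channels), which is the default channels-last layout in Keras; PyTorch convolution layers expect the channel axis first. For grayscale images, channels is 1. For RGB images, channels is 3.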
The num_classes parameter dictates the number of neurons in the final output layer.
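
To make the pairing of a softmax output with categorical crossentropy concrete, here is a minimal sketch of both computations for a single sample in plain Python. It is an illustration of the math only, not the Keras implementation; the function names and the eps clipping value are assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def categorical_crossentropy(target, probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # eps is an illustrative clipping value that guards against log(0).
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


probs = softmax([2.0, 1.0, 0.1])  # one score per class, so num_classes = 3
loss = categorical_crossentropy(one_hot(0, 3), probs)
print(probs, loss)
```

With a one-hot target, the sum collapses to the negative log of the probability assigned to the true class, which is why the output layer needs exactly one unit per class.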