Decoder-only Shakespeare text generator
We build, train, and evaluate a minimal decoder-only Transformer from scratch using PyTorch, trained on the Tiny Shakespeare dataset to generate Shakespeare-like text.
Experiments
We systematically varied key architectural hyperparameters across multiple runs, each trained for 5,000 iterations:
- n_embd: embedding dimension
- n_layer: number of transformer layers
- n_head: number of attention heads
- dropout: regularization rate
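As a rough illustration of how such a sweep could be organized, the sketch below enumerates combinations of the four hyperparameters and estimates each configuration's size. The `ModelConfig` class, the `build_grid` helper, and all numeric values are hypothetical and are not the settings used in the runs. The parameter estimate assumes a standard block with a 4x MLP expansion and ignores embeddings, biases, and layer norms.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ModelConfig:
    n_embd: int      # embedding dimension
    n_layer: int     # number of transformer layers
    n_head: int      # number of attention heads
    dropout: float   # regularization rate

    def approx_block_params(self) -> int:
        # Rough weight count for the transformer blocks only, assuming a
        # 4x MLP expansion: 4*d^2 (attention) + 8*d^2 (MLP) per layer.
        # Embeddings, biases, and layer norms are ignored.
        return 12 * self.n_layer * self.n_embd ** 2


def build_grid(n_embds, n_layers, n_heads, dropouts):
    """Return every valid combination of the four hyperparameters."""
    configs = []
    for e, l, h, d in product(n_embds, n_layers, n_heads, dropouts):
        # Multi-head attention splits n_embd evenly across heads,
        # so combinations that do not divide evenly are skipped.
        if e % h != 0:
            continue
        configs.append(ModelConfig(e, l, h, d))
    return configs


# Placeholder values for illustration only; not the settings used in the runs.
grid = build_grid(
    n_embds=[64, 128],
    n_layers=[2, 4],
    n_heads=[4, 8],
    dropouts=[0.0, 0.2],
)
for cfg in sorted(grid, key=ModelConfig.approx_block_params):
    print(cfg, f"~{cfg.approx_block_params():,} block parameters")
```

Sorting by estimated size makes it easier to line up each run's losses against model capacity when comparing results.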
For each configuration, we tracked both training loss and validation loss to evaluate generalization.
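One common way to track both losses is to average the cross-entropy over a handful of random batches from each split, with dropout disabled during evaluation. The sketch below shows that pattern under stated assumptions: `get_batch`, `estimate_loss`, the stand-in model, and the random token ids are all hypothetical and exist only to make the example runnable. It is not the project's actual training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def get_batch(data, block_size, batch_size):
    """Sample random (input, target) windows; targets are inputs shifted by one."""
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y


@torch.no_grad()
def estimate_loss(model, splits, block_size=8, batch_size=4, eval_iters=10):
    """Average cross-entropy over a few random batches for each split."""
    model.eval()  # disable dropout while evaluating
    out = {}
    for name, data in splits.items():
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            x, y = get_batch(data, block_size, batch_size)
            logits = model(x)  # shape: (batch, time, vocab)
            losses[k] = F.cross_entropy(
                logits.view(-1, logits.size(-1)), y.view(-1)
            )
        out[name] = losses.mean().item()
    model.train()  # restore training mode
    return out


# Stand-in model and random token ids (placeholders, not the real setup).
vocab_size = 50
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))
tokens = torch.randint(vocab_size, (1000,))
splits = {"train": tokens[:900], "val": tokens[900:]}

losses = estimate_loss(model, splits)
print(f"train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")
```

Calling a helper like this at fixed intervals during training gives two curves per configuration. A validation loss that stalls or rises while training loss keeps falling is the usual sign of overfitting.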
Finding
Increasing model size (capacity) significantly improves language understanding and text coherence, producing more fluent sentences, at the cost of increased training time and computation.
Timeline: Spring 2026