Dani da23742671
Added the Discord Sync for mirroring GITEA -> GitHub
2025-06-03 23:50:19 -04:00

Nora: Train a Transformer LM from Scratch

A minimal, from-scratch language model. No pretrained weights—just public-domain books and your GPU (or CPU).

Overview

Nora is a character-level Transformer language model written entirely in PyTorch. It learns from whatever plaintext .txt files you place in data/books/. Over time, you can extend Nora's codebase (e.g., with reinforcement-learning loops or self-improvement modules) toward more advanced AI, if you wish.
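
To make "character-level" concrete, here is a minimal sketch of how text from data/books/ could be turned into integer ids. It is not taken from Nora's tokenizer.py or data_loader.py; the names CharTokenizer and load_corpus are hypothetical and only illustrate the idea of one id per distinct character.

```python
from pathlib import Path


class CharTokenizer:
    """Hypothetical character-level tokenizer: one integer id per distinct character."""

    def __init__(self, text: str) -> None:
        chars = sorted(set(text))  # the vocabulary is every character seen in the corpus
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self) -> int:
        return len(self.stoi)

    def encode(self, text: str) -> list[int]:
        # Raises KeyError for characters that were not in the corpus.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


def load_corpus(books_dir: str = "data/books") -> str:
    """Concatenate every .txt file in books_dir (empty string if no files are found)."""
    paths = sorted(Path(books_dir).glob("*.txt"))
    return "\n".join(p.read_text(encoding="utf-8") for p in paths)


if __name__ == "__main__":
    corpus = load_corpus() or "hello nora"  # fall back to a toy string if no books are present
    tok = CharTokenizer(corpus)
    ids = tok.encode(corpus[:10])
    print(tok.vocab_size, ids, tok.decode(ids))
```

A character vocabulary stays small (typically a few dozen to a few hundred symbols), which keeps the embedding table tiny at the cost of longer sequences than a subword tokenizer would produce.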

Why “Nora”?

  • A simple, humanlike female name.
  • Short, easy to pronounce.
  • As the project scales, “Nora” could theoretically be extended with modules to approach more general intelligence.

Requirements

  • Python 3.10.6 (Windows 11 or any OS)
  • CUDA-capable GPU (if you want to train faster; otherwise CPU)
  • PyTorch (install with pip install torch torchvision)
  • tqdm (pip install tqdm)
  • Other Python packages: numpy (pip install numpy). The typing module ships with Python's standard library and needs no separate install.
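
As a quick check of the GPU requirement, the sketch below reports whether training would land on a CUDA GPU or fall back to the CPU. The helper name pick_device is hypothetical and not part of Nora's code; the only PyTorch call it relies on is torch.cuda.is_available().

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed yet, so there is nothing to query.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training would run on: {pick_device()}")
```

If this prints cpu on a machine with an NVIDIA GPU, the usual cause is a CPU-only PyTorch build; reinstalling PyTorch with a CUDA-enabled wheel generally resolves it.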

Folder Structure

nora/
├── config.py
├── tokenizer.py
├── data_loader.py
├── model.py
├── train.py
├── utils.py
├── main.py
└── README.md
Readme MIT 36 KiB
Languages
Python 100%