You know how ChatGPT and Claude keep getting better over time? A big part of that is a technique called RLHF (Reinforcement Learning from Human Feedback). The short version: real people compare two AI responses to the same prompt and pick the better one. Those choices train a reward model that learns what "good" looks like, and the main model is then tuned toward responses that the reward model scores highly. That human preference data is basically dopamine for the AI, hence this project's name.
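One common way to formalize "learning from those choices" is the Bradley–Terry pairwise model used throughout the RLHF literature. This project's exact objective may differ. A reward model $r_\theta$ assigns a score to response $y$ for prompt $x$. When a person prefers $y_w$ over $y_l$, training raises the modeled probability of that pick by minimizing the loss on the right:

$$
P(y_w \succ y_l \mid x) = \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big),
\qquad
\mathcal{L}(\theta) = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

Here $\sigma$ is the logistic sigmoid. Only the score difference matters, so the reward model learns a ranking, not an absolute scale.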
AI Dopamine is a full-stack simulation of that exact pipeline, built from scratch. You can submit any prompt alongside two different AI-generated responses, then send them into the Arena — a side-by-side voting interface where anyone can pick their favorite. Every vote gets stored in a database and feeds a reward model that learns to predict which responses humans actually prefer.
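The vote-storage and reward-model step can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the project's code. The `votes` table, the helper names (`open_db`, `record_vote`, `train_reward_model`, `prob_prefers`) and the bag-of-words features are all hypothetical. SQLite stands in for whatever database the project actually uses. The reward model is a linear scorer fit with the Bradley–Terry pairwise loss, so it learns to score the voted-for response above the rejected one.

```python
import math
import sqlite3
from collections import defaultdict


def open_db(path=":memory:"):
    """Open SQLite with a hypothetical `votes` table (one row per Arena vote)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS votes (
               id INTEGER PRIMARY KEY,
               prompt TEXT NOT NULL,
               chosen TEXT NOT NULL,
               rejected TEXT NOT NULL
           )"""
    )
    return conn


def record_vote(conn, prompt, response_a, response_b, winner):
    """Store one vote as a (chosen, rejected) pair; `winner` is 'a' or 'b'."""
    if winner not in ("a", "b"):
        raise ValueError("winner must be 'a' or 'b'")
    if winner == "a":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    conn.execute(
        "INSERT INTO votes (prompt, chosen, rejected) VALUES (?, ?, ?)",
        (prompt, chosen, rejected),
    )
    conn.commit()


def featurize(text):
    """Toy features: lowercase bag-of-words counts (stand-in for a real encoder)."""
    feats = defaultdict(float)
    for word in text.lower().split():
        feats[word] += 1.0
    return feats


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(weights, text):
    """Linear reward: dot product of the weights with the text's features."""
    return sum(weights.get(w, 0.0) * v for w, v in featurize(text).items())


def train_reward_model(conn, epochs=50, lr=0.1):
    """Fit weights by SGD on the Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected))."""
    weights = defaultdict(float)
    pairs = conn.execute("SELECT chosen, rejected FROM votes").fetchall()
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # The loss gradient w.r.t. the margin is -(1 - sigmoid(margin)),
            # so each step pushes the chosen score up and the rejected score down.
            step = lr * (1.0 - sigmoid(margin))
            for w, v in featurize(chosen).items():
                weights[w] += step * v
            for w, v in featurize(rejected).items():
                weights[w] -= step * v
    return dict(weights)


def prob_prefers(weights, a, b):
    """Model's estimated probability that a voter picks response `a` over `b`."""
    return sigmoid(reward(weights, a) - reward(weights, b))


# Usage: record two Arena votes, then train and query the reward model.
conn = open_db()
record_vote(conn, "Explain RLHF",
            "Humans compare responses and a reward model learns from their picks.",
            "idk", winner="a")
record_vote(conn, "What is 2 + 2?", "idk", "2 + 2 equals 4.", winner="b")
weights = train_reward_model(conn)
print(prob_prefers(weights, "a clear answer", "idk"))
```

A production reward model would replace the bag-of-words features with a language model that outputs a scalar score. The training signal stays the same: push the winner's score above the loser's.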
It's a working, end-to-end RLHF loop: data collection, preference labeling, and reward-model training, just at a scale you can actually poke around in.