About the project
What even is AI Dopamine?

You know how ChatGPT and Claude keep getting better over time? A big part of that is something called RLHF: Reinforcement Learning from Human Feedback. The short version: real humans compare two AI responses and pick the better one. A reward model trains on those choices and learns what "good" looks like, and the AI is then tuned to score well against it. That human preference data is basically dopamine for the AI, hence the name.

AI Dopamine is a full-stack simulation of that exact pipeline, built from scratch. You can submit any prompt alongside two different AI-generated responses, then send them into the Arena — a side-by-side voting interface where anyone can pick their favorite. Every vote gets stored in a database and feeds a reward model that learns to predict which responses humans actually prefer.

It's a working, end-to-end RLHF loop: data collection, preference labeling, and model training — just at a scale you can actually poke around in.
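
To make the stored preference data concrete, here is a minimal sketch of what a vote record and a per-battle A vs. B tally might look like. The `Vote` class, its field names, and `vote_split` are hypothetical and only for illustration; the real project keeps votes in Postgres and its schema is not shown here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    battle_id: int  # hypothetical ID of the prompt + response pair that was shown
    choice: str     # "A" or "B"


def vote_split(votes, battle_id):
    """Return the percentage of votes for A and for B in one battle."""
    counts = Counter(v.choice for v in votes if v.battle_id == battle_id)
    total = counts["A"] + counts["B"]
    if total == 0:
        # No votes yet: report an empty split instead of dividing by zero.
        return {"A": 0.0, "B": 0.0}
    return {side: 100.0 * counts[side] / total for side in ("A", "B")}


votes = [Vote(1, "A"), Vote(1, "A"), Vote(1, "B"), Vote(2, "B")]
print(vote_split(votes, 1))
```

Each vote is also one preference label: the chosen response is the "preferred" example and the other is the "rejected" one, which is the form a reward model trains on.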

How it works
The pipeline
1. Submit a battle. Enter a prompt and paste in two AI responses from any model you like. Label the source so the data stays organized.
2. Head to the Arena. Response pairs are served up in random order. You see the prompt and two anonymized responses side by side, then click whichever you think is better.
3. Results come in live. After voting, you instantly see how everyone else has voted: animated bar charts show the A vs. B split in real time.
4. The reward model learns. Preference data gets fed into a sentence-transformer + reward model training loop that learns to score responses the way humans do. The more votes, the smarter it gets.
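
The last step can be sketched in a few lines. This is not the project's actual training code: it assumes a linear reward model and a Bradley-Terry-style pairwise loss (a common choice for reward models), and it uses small hand-made feature vectors in place of real sentence-transformers embeddings so that it runs in plain Python without PyTorch. All function names and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, features):
    """Linear reward: dot product of the weights and a response's features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, dim, epochs=200, lr=0.1, seed=0):
    """Fit weights so chosen responses score higher than rejected ones.

    pairs: list of (chosen_features, rejected_features) tuples.
    Minimizes the pairwise loss -log(sigmoid(r_chosen - r_rejected))
    with plain stochastic gradient descent.
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for chosen, rejected in pairs:
            margin = score(weights, chosen) - score(weights, rejected)
            # Derivative of -log(sigmoid(margin)) with respect to margin.
            grad_scale = -(1.0 - sigmoid(margin))
            for i in range(dim):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights


# Toy stand-ins for embeddings: (features of the winner, features of the loser).
pairs = [
    ([0.9, 0.1, 0.3], [0.2, 0.4, 0.3]),
    ([0.8, 0.5, 0.1], [0.1, 0.5, 0.6]),
    ([0.7, 0.2, 0.9], [0.3, 0.1, 0.8]),
]
weights = train_reward_model(pairs, dim=3)
for chosen, rejected in pairs:
    print(score(weights, chosen) > score(weights, rejected))
```

The loss only looks at the difference between the two scores, so the model learns a relative ranking rather than an absolute quality number. Swapping the toy vectors for real embeddings and the linear scorer for a small neural network would keep the same training signal.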
Tech stack
What's under the hood
Frontend: HTML / JS (Vercel)
Backend: Python / Flask
Database: Neon Postgres
Embeddings: sentence-transformers
Reward model: PyTorch
Baseline: scikit-learn