AI Dopamine

About the project

What even is AI Dopamine?

You know how ChatGPT and Claude keep getting better over time? A big part of that is something called RLHF (Reinforcement Learning from Human Feedback). The short version: real humans compare two AI responses and pick the better one. A reward model is trained on those choices to learn what "good" looks like, and the AI is then tuned to score well with it. That human preference data is basically dopamine for the AI, hence the name.

AI Dopamine is a full-stack simulation of that exact pipeline, built from scratch. You can submit any prompt alongside two different AI-generated responses, then send them into the Arena — a side-by-side voting interface where anyone can pick their favorite. Every vote gets stored in a database and feeds a reward model that learns to predict which responses humans actually prefer.
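To make the "every vote gets stored" part concrete, here is a minimal sketch of how battles and votes could be stored and then turned into training pairs. It is an illustration under assumptions, not the project's actual schema: the table names (`battles`, `votes`), column names, and helper functions are all hypothetical, and an in-memory SQLite database stands in for Neon Postgres so the example runs anywhere.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Neon Postgres so the sketch is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE battles (
        id INTEGER PRIMARY KEY,
        prompt TEXT NOT NULL,
        response_a TEXT NOT NULL,
        response_b TEXT NOT NULL
    );
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY,
        battle_id INTEGER NOT NULL REFERENCES battles(id),
        winner TEXT NOT NULL CHECK (winner IN ('a', 'b'))
    );
""")


def submit_battle(prompt, response_a, response_b):
    """Store a prompt with its two candidate responses; return the battle id."""
    cur = conn.execute(
        "INSERT INTO battles (prompt, response_a, response_b) VALUES (?, ?, ?)",
        (prompt, response_a, response_b),
    )
    conn.commit()
    return cur.lastrowid


def record_vote(battle_id, winner):
    """Record one vote; winner must be 'a' or 'b'."""
    conn.execute(
        "INSERT INTO votes (battle_id, winner) VALUES (?, ?)", (battle_id, winner)
    )
    conn.commit()


def vote_split(battle_id):
    """Return the share of votes for each response in a battle."""
    rows = conn.execute(
        "SELECT winner, COUNT(*) FROM votes WHERE battle_id = ? GROUP BY winner",
        (battle_id,),
    ).fetchall()
    counts = {"a": 0, "b": 0}
    counts.update(dict(rows))
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def preference_pairs():
    """Yield (prompt, chosen, rejected) tuples, one per vote, in vote order."""
    query = """
        SELECT b.prompt, b.response_a, b.response_b, v.winner
        FROM votes v JOIN battles b ON b.id = v.battle_id
        ORDER BY v.id
    """
    for prompt, resp_a, resp_b, winner in conn.execute(query):
        yield (prompt, resp_a, resp_b) if winner == "a" else (prompt, resp_b, resp_a)


# Example: one battle, four votes (three for B, one for A).
battle = submit_battle("Explain recursion.", "Short answer.", "Detailed answer.")
for choice in ["b", "b", "a", "b"]:
    record_vote(battle, choice)

split = vote_split(battle)
pairs = list(preference_pairs())
print(split)
```

The `vote_split` helper is the kind of query that could back the live A vs. B bar chart, while `preference_pairs` produces the (chosen, rejected) format that reward model training typically consumes.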

It's a working, end-to-end version of the preference-learning side of RLHF: data collection, preference labeling, and reward model training, just at a scale you can actually poke around in.

How it works

The pipeline

1. Submit a battle. Enter a prompt and paste in two AI responses from any model you like. Label the source so the data stays organized.

2. Head to the Arena. Response pairs are served up in random order. You see the prompt and two anonymized responses side by side, then click whichever you think is better.

3. Results come in live. After voting, you instantly see how everyone else has voted: animated bar charts show the A vs. B split in real time.

4. The reward model learns. Preference data gets fed into a training loop that embeds responses with a sentence-transformer and trains a reward model to score them the way humans do. More votes mean more preference pairs for it to learn from.
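The last step can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the project's actual training code: the `RewardModel` architecture, the hidden size, and the hyperparameters are hypothetical, and random tensors stand in for sentence-transformer embeddings so the example runs without downloading a model. The loss is the standard Bradley-Terry pairwise preference loss commonly used for reward models, which pushes the score of the chosen response above the score of the rejected one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed embedding size; 384 matches small sentence-transformers models
# such as all-MiniLM-L6-v2.
EMBED_DIM = 384


class RewardModel(nn.Module):
    """Hypothetical reward head: maps a response embedding to a scalar score."""

    def __init__(self, embed_dim=EMBED_DIM, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected), batch mean."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()


# Synthetic stand-ins for embeddings (assumption). In a real pipeline these
# would be sentence-transformer encodings of the chosen and rejected responses.
n_pairs = 256
direction = torch.randn(EMBED_DIM)
direction = direction / direction.norm()  # hidden "quality" direction
chosen = torch.randn(n_pairs, EMBED_DIM) + direction
rejected = torch.randn(n_pairs, EMBED_DIM) - direction

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

initial_loss = preference_loss(model(chosen), model(rejected)).item()
for step in range(200):
    optimizer.zero_grad()
    loss = preference_loss(model(chosen), model(rejected))
    loss.backward()
    optimizer.step()
final_loss = loss.item()

# Fraction of training pairs where the chosen response outscores the rejected one.
with torch.no_grad():
    accuracy = (model(chosen) > model(rejected)).float().mean().item()

print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, train pair accuracy {accuracy:.2f}")
```

Scoring each response independently and comparing the two scores means the trained model can later rank any single response, not just pairs it has seen. The accuracy here is measured on the training pairs only, so a held-out set of votes would be needed to judge how well it generalizes.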

Tech stack

What's under the hood

Frontend: HTML / JS (Vercel)

Backend: Python / Flask

Database: Neon Postgres

Embeddings: sentence-transformers

Reward model: PyTorch

Baseline: scikit-learn