I Tested All the Major Vibe-Coding Models

I decided to test all the top models for vibe coding.
So I took Cursor and ran the same prompts through 7 models: Gemini 3 Flash, Gemini 3 Pro, GPT 5.2, GPT 5.2 Thinking Extra High, Sonnet, Opus 4.5 and Opus 4.6.
I simply turned on auto-accept mode and waited for each model to finish the task.

Tests

  1. The first prompt was to exactly replicate a website from a provided link
    GPT 5.2 was the only model that matched the style; the others implemented their own versions (completely different colors, fonts and style).
    Gemini did a very light job and replicated only the main page, while the others also tried to replicate the linked pages.

  2. Reddit scraper to find business ideas
    I asked for a website that uses the Reddit API to scrape specified subreddits for business ideas, with the OpenAI API doing the analysis of the ideas.
    Every model actually delivered something workable. GPT and both Opus models were the best imo; they produced an interesting clustering graph visualization.

  3. Desktop app for video dubbing (only local LLMs allowed)
    Gemini failed completely; nothing worked. The others delivered half-working results, but GPT and Opus at least produced something that looked like a solid desktop app.

Final observations:

Gemini: Surprisingly, I didn't notice any difference between Gemini 3 Flash and 3 Pro. Both delivered simple, low-quality results, but fast and cheap.
GPT: Took 30-60 min to finish each task, was consistently among the highest quality, and was moderately expensive. I didn't notice any difference between GPT 5.2 and 5.2 Thinking Extra High.
Opus: 4.6 tends to make fewer mistakes than 4.5, but overall the results are very similar. The two Opus models are the most expensive on the list. For some tasks that was worth it, for others it wasn't.
Sonnet: Tends to produce something simple but workable.

The conclusions I drew for myself:
Gemini models are OK if you know exactly what you want to build and can give the model good, precise instructions.
Sonnet is very good for tasks of average complexity.
For complex tasks, research, or unclear problems, I would prefer Opus or GPT.

I spent a week testing the models. If you feel like saying thanks, you can buy me a coffee 🙌