PQS
WILD CORPUS · reddit

PQS 31 (F) - prompt from www.reddit.com

Source: www.reddit.com · Scraped 2026-05-04 · Scored 2026-05-04

Score

F
31 / 80
gemma4:latest · local · pqs-v2.0 · canonical
Clarity 8 / 10
Specificity 6 / 10
Context 9 / 10
Constraints 4 / 10
Output format 1 / 10
Role definition 1 / 10
Examples 1 / 10
CoT structure 1 / 10

The prompt

Got tired of arguing with my team about which "Claude pro tip" tweets were real and which were vibes, so I built a rig and ran them.

**Setup:**

* 24 fixed tasks across writing, coding, analysis
* Fresh contexts per trial, no carryover
* Same models (Sonnet 4.6 and Opus 4.7) and same temperature across all trials
* 3 blind reviewers rating outputs on decisiveness, accuracy, token efficiency
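The post doesn't share its harness, but the "no statistically significant lift" claim implies a per-task paired comparison. A minimal sketch of how that check could work, using a paired sign-flip permutation test on per-task reviewer scores (all names and numbers here are hypothetical, not from the actual rig):

```python
import random

def permutation_test(baseline, variant, n_perm=10_000, seed=0):
    """Paired permutation test on per-task score deltas.

    baseline/variant are per-task scores for the same fixed task set.
    Returns a two-sided p-value for the null hypothesis that the
    variant's mean lift over the plain prompt is zero.
    """
    rng = random.Random(seed)
    deltas = [v - b for b, v in zip(baseline, variant)]
    observed = sum(deltas) / len(deltas)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each paired delta's sign is arbitrary,
        # so randomly flip signs and recompute the mean lift.
        perm = sum(d if rng.random() < 0.5 else -d for d in deltas) / len(deltas)
        if abs(perm) >= abs(observed):
            hits += 1
    return hits / n_perm
```

A "placebo" tip would produce deltas centered on zero and a large p-value; a real winner like the scope anchor would show a consistent positive delta and a small one.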

**Headline finding: 47% of the codes showed no statistically significant lift over a plain prompt.** Some of the most-upvoted ones on Twitter and this sub were dead weight.

**Three patterns that consistently won (out of the 53% that worked):**

1. **Front-loaded scope anchors.** Putting "Review only the database connection logic in src/db/" at the START of a prompt held scope better than the same wording at the end. Token output ~30% tighter on review tasks.
2. **Explicit OUT OF SCOPE rejection clauses.** Telling the model "if a finding is outside the scope above, mark it OUT OF SCOPE rather than including it" cut cross-file noise measurably. Works as the model's escape valve, not a positive constraint.
3. **The L99 prefix.** Switches Claude into a less-hedged, more decisive mode. Best for hard architectural decisions, terrible for simple lookups (waste of tokens).
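Patterns 1 and 2 compose naturally into one template: anchor the scope up front, then give the model its escape valve. A small sketch of that assembly (the wording mirrors the examples above but the function and its parameters are illustrative, not the exact prompts tested):

```python
def build_review_prompt(scope_path: str, question: str) -> str:
    """Assemble a code-review prompt using two of the winning patterns:
    a front-loaded scope anchor (pattern 1) followed by an explicit
    OUT OF SCOPE rejection clause (pattern 2).
    """
    return (
        # Pattern 1: scope anchor goes FIRST, not at the end.
        f"Review only the database connection logic in {scope_path}.\n"
        # Pattern 2: negative escape valve for out-of-scope findings.
        "If a finding is outside the scope above, mark it OUT OF SCOPE "
        "rather than including it.\n\n"
        f"{question}"
    )

prompt = build_review_prompt("src/db/", "Check the pooling behavior.")
```

The order matters per the findings: moving the anchor to the end of the string reportedly cost the ~30% token tightening.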

**Three that turned out to be placebos:**

1. "Take a deep breath." Real finding on older models, doesn't replicate on Sonnet 4.6 or Opus 4.7. Tested both with and without it on the same task battery, no measurable delta.
2. "You are a Stanford-trained expert." Slight lift on pure factual recall, flips negative on reasoning tasks. The model gets defensive about its expertise instead of admitting uncertainty.
3. Most "step by step" variants. Already the default behavior on current Claude. Adding the phrase didn't change structure or quality in our tests.

Happy to walk through methodology, specific test cases, or the codes that flipped between models if anyone wants to dig in.

This prompt was scraped from a public source. The score reflects the input as written, not the quality of any output it produced. The AI input quality problem is the gap between what people type and what the model can act on.