Lab Notes · 4

How Well Do Image AIs Understand Abstract Instructions?

Jun 12, 2026 · AI Note Lab

Same abstract prompts, three interpretations each, yet 12 of the 15 images converged on "person + mood"

Anyone can get good results from a concrete prompt like "sunset over the ocean." I was curious about the opposite: what does an AI draw when you hand it a concept that was never meant to be drawn, such as an emotion, a time, or a relationship? I gave the same five abstract prompts, written in Korean, to Midjourney, DALL·E, and Stable Diffusion.

The 5 prompts I tested

"A lonely Tuesday afternoon"
"What it means to become an adult"
"An apology never spoken" (Korean: 사과 — which means both "apology" and "apple")
"The feeling of a Monday morning"
"Quiet satisfaction"

Observations

The common move: everything escapes into "a person + a mood"

Of the 15 images (3 services × 5 prompts), 12 (80%) featured a person. Given an abstract concept, all three services tended to draw a figure feeling that emotion: someone sitting by a window, someone walking alone, someone with their head bowed. Presumably this is because, in the training data, those concepts came attached to exactly those kinds of photos.
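
For anyone repeating the experiment, the arithmetic behind "12 of 15" can be sketched in a few lines of Python. This is only an illustration: `build_grid` and `share` are hypothetical helper names, and the prompts appear here in their English translations rather than the Korean originals that were actually submitted.

```python
from itertools import product

SERVICES = ["Midjourney", "DALL·E", "Stable Diffusion"]

# English translations of the five prompts (the originals were in Korean).
PROMPTS = [
    "A lonely Tuesday afternoon",
    "What it means to become an adult",
    "An apology never spoken",
    "The feeling of a Monday morning",
    "Quiet satisfaction",
]


def build_grid(services, prompts):
    """Return every (service, prompt) pair, one per generated image."""
    return list(product(services, prompts))


def share(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


grid = build_grid(SERVICES, PROMPTS)
print(len(grid))             # 15
print(share(12, len(grid)))  # 80.0
```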

"A lonely Tuesday afternoon" — unanimous verdict: an overcast window

All three services chose the same visual grammar: dim light, an empty room, a window. The "Tuesday" part appeared in none of the images (obvious in hindsight, but fun to confirm). It got flattened into "a lonely afternoon."

"An apology never spoken" — this is where they split

This was the hardest prompt. Two services drew two people standing back to back, but the third, out of nowhere, drew an actual apple. It tripped over the Korean ambiguity: 사과 (sagwa) means both "apology" and "apple." When I re-ran the prompt in English as "an apology never spoken," all three services produced figure compositions.

The interpretation gap mattered more than the quality gap

Midjourney came out ahead on technical quality, as expected, but the more interesting finding was that each service interpreted the prompts in a consistently different style: one always rendered a film still, one always an illustration, and one always a photograph.

What I learned

Abstract prompts work better than you'd expect. But the AI isn't "understanding" the concept — it's summoning the visual clichés that most often travel with those words.
Ambiguous Korean words (사과 apple/apology, 배 pear/boat/belly, 밤 night/chestnut…) are a trap for image AIs. When in doubt, prompt in English or pin the meaning down with a modifier.
Practical tip: a two-step flow was most efficient — set the mood with abstract words first, then layer concrete conditions (composition, color, style) onto the result you like.
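
The last two points can be sketched as a small pre-flight check plus a prompt builder. Everything here is an assumption made for illustration: `AMBIGUOUS`, `find_ambiguous`, and `layer_prompt` are hypothetical names, not part of any service's API, the Korean sample sentence is an assumed rendering of the prompt, and the substring match is deliberately naive, so it will also flag longer words that merely contain these syllables.

```python
# Hypothetical lookup table holding only the three homonyms mentioned above.
AMBIGUOUS = {
    "사과": ("apology", "apple"),
    "배": ("pear", "boat", "belly"),
    "밤": ("night", "chestnut"),
}


def find_ambiguous(prompt: str) -> dict[str, tuple[str, ...]]:
    """Return each listed homonym found in the prompt (naive substring match)."""
    return {word: senses for word, senses in AMBIGUOUS.items() if word in prompt}


def layer_prompt(mood: str, *conditions: str) -> str:
    """Step two of the flow: append concrete conditions to the mood prompt."""
    return ", ".join([mood, *conditions])


if __name__ == "__main__":
    # Assumed Korean phrasing of "an apology never spoken".
    print(find_ambiguous("말하지 못한 사과"))  # {'사과': ('apology', 'apple')}

    # Illustrative conditions for composition, color, and style.
    print(layer_prompt("quiet satisfaction", "close-up composition",
                       "muted colors", "film still"))
```

If the check returns anything, that is the cue to switch to English or add a modifier that pins the meaning down before spending a generation on it.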

One of the results for "quiet satisfaction" was a hand closing a notebook at a desk before dawn, a single lamp on. It was accurate enough that calling it a cliché felt unfair — I was briefly embarrassed.