Welcome to ORDER BY jungle

SQL is fun and not at all boring. The latest article by Markus Winand on Order by Has Come a Long Way sent me on quite a journey.

First, set up a table called nums with one integer column and four rows:

CREATE TABLE nums (a int);
INSERT INTO nums VALUES (0), (1), (2), (3);

Try to guess what these two queries return.

SELECT -a AS a FROM nums ORDER BY a;
SELECT -a AS a FROM nums ORDER BY -a;

Most of us would guess the same rows in a different order. The actual answer is that they produce exactly the same rows in exactly the same order. By the same logic you might expect

SELECT a AS c FROM nums ORDER BY -c;

to do exactly the same. Except it does not. It errors with column "c" does not exist despite the alias being right there in the statement. Welcome to ORDER BY jungle.

Names and expressions are not the same

If you ask most developers how ORDER BY works, they will say "you put a column name there and it sorts the rows". In 99% of queries that is exactly what happens. People sort by created_at or id and move on.

Strictly speaking, three, if you count ORDER BY 1. Positional references are their own can of worms and out of scope for this post.

But `ORDER BY` accepts two different kinds of things:

SELECT created_at, user_id FROM events ORDER BY created_at;
SELECT created_at, user_id FROM events ORDER BY date(created_at);

Both feel natural. And the thing nobody tells you is that they go down completely different code paths in the parser. Different scope rules, different lookups, different error messages. The first looks at your SELECT list. The second looks at your FROM clause. They never look at the same place.

Same answer, two different sorts

Look at the first query again.

SELECT -a AS a FROM nums ORDER BY a;

You wrote ORDER BY a. A bare identifier, no decoration. Postgres goes down the name path. It scans the SELECT list for something called a, finds the aliased column -a AS a, and sorts by its output values. The negated values are -3, -2, -1, 0, ascending is -3, -2, -1, 0. That is what comes out.

Now the twin.

SELECT -a AS a FROM nums ORDER BY -a;

You wrote ORDER BY -a. This is no longer an identifier. It's an expression: unary minus around a column reference. The parser does not even try the same logic.

Instead it switches to the expression path, where the only a it knows is the column in nums, and sorts the input values negated. And by arithmetic luck, the two queries land on the same row order. Same output, completely different logic. If you don't believe it is just luck, drop the negation from the SELECT list and keep it in ORDER BY:

SELECT a AS c FROM nums ORDER BY -a;

 c
---
 3
 2
 1
 0
(4 rows)

ORDER BY -a is an expression, so it sorts by -input_a ascending, which is input_a descending. The alias c was never consulted. The result has nothing to do with whatever c happens to be.

And ORDER BY -c is now obvious. -c is an expression, so the parser looks for column c in FROM, doesn't find it, and errors. The alias exists, but in a scope this code path cannot see.

Above the identifier, or around it

Once the rule is clear (bare identifier hits the SELECT list, anything else hits the table) the rest of the surprises fall out.

SELECT 'hello' AS x FROM nums ORDER BY x::text;
-- ERROR: column "x" does not exist

It is probably not surprising that casts count as expressions and push the lookup to the table.

The surprise might come with

SELECT a AS c FROM nums ORDER BY c DESC NULLS FIRST;

Which will work as expected. Both DESC and NULLS FIRST are part of the sort clause itself, not of the sort expression. They sit above the identifier in the parse tree, so they never touch it. The parser still sees a bare c, takes the fast path, finds the alias, sorts by it, and then applies "descending, nulls first" on top of the resolved key.

The same cannot be said about collation.

SELECT 'A'::text AS x FROM nums ORDER BY x COLLATE "C";
-- ERROR: column "x" does not exist

This is a really bad one. COLLATE might look the same as a sort modifier, but it is not. It wraps the expression in the parse tree.

Parentheses are a special case.

SELECT -a AS a FROM nums ORDER BY (a);
-- works, sorts by alias

Postgres collapses redundant parens before the bare-identifier check, so (a) is still bare a. The seam is asymmetric in the way that maximises confusion: COLLATE is "still a name to a human, an expression to the parser", and (a) is "an expression to a human, still a name to the parser". You get both flavours of wrong intuition mixed here.

Unary plus. +a and a evaluate to the same value, but they do not parse to the same node.

SELECT -a AS a FROM nums ORDER BY a;
SELECT -a AS a FROM nums ORDER BY +a;

A plus sign you would not even think about changes which rows come out in which order.

The parser stores a column reference as a list of name parts: one part when it is unqualified, two or more once you add a table or schema. The fast path only fires on lists of length one.

Finally, schema- and table-qualified references. ORDER BY nums.a looks like an identifier, but it is not.

SELECT -a AS a FROM nums ORDER BY a;
SELECT -a AS a FROM nums ORDER BY nums.a;

 a
----
 -3
 -2
 -1
  0
(4 rows)

 a
----
  0
 -1
 -2
 -3
(4 rows)

Aliases that aren't the names you think

Here is one that cost me an afternoon once. Easy to come across once an ORM or a generated view declared the alias for you. SQLAlchemy, Hibernate, jOOQ, and most code generators quote anything that isn't pure lowercase. Two queries, identical except that the alias is quoted in one. Two different result sets.

SELECT -a AS  A  FROM nums ORDER BY a;    -- sorts by alias (-3,-2,-1,0)
SELECT -a AS "A" FROM nums ORDER BY a;    -- sorts by input (0,-1,-2,-3)

The bare-identifier check compares names with strcmp. Unquoted A folds to lowercase a and matches. Quoted "A" preserves case, stays A, and does not match the lowercase a in the ORDER BY. The lookup fails, the parser falls through to the expression path, the expression path finds the column a in nums, and the query runs successfully while doing something different from what you meant.

GROUP BY checks the opposite scope first

Both GROUP BY and ORDER BY accept a bare identifier, and both can resolve it either way: to a table column or to a SELECT-list alias. The difference is the order they check:

ORDER BY a looks at the SELECT list first, then the table.
GROUP BY a looks at the table first, then the SELECT list.

For most queries this never matters. The two clauses end up picking the same thing because nothing is shadowed. The surprise happens when an alias has the same name as a base column but a different value:

SELECT a/2 AS a, count(*)
FROM nums
GROUP BY a
ORDER BY a;

Now the two clauses disagree about what a means. GROUP BY a picks the input column (four distinct values, four groups, one row each). ORDER BY a picks the alias, which is a/2. The result has four rows because the grouping was on a finer-grained key than the projection:

 a | count
---+-------
 0 |     1
 0 |     1
 1 |     1
 1 |     1

Two rows where a/2 = 0 (from input 0 and 1), two where a/2 = 1 (from input 2 and 3). The duplicates are real. The same identifier means two different columns in two adjacent clauses of one query.

Window ORDER BY does not even pretend

This one trips people up because it does not look like a different clause.

SELECT a, -a AS neg, row_number() OVER (ORDER BY neg) FROM nums;
-- ERROR: column "neg" does not exist

OVER (ORDER BY ...) is a different parse path entirely. It does not check the targetlist at all, only the FROM scope. The bare-name fast path simply does not exist here.

SELECT a, -a AS neg, row_number() OVER (ORDER BY -a) FROM nums;
-- this works

Two ORDER BY clauses in the same query, two different scoping rules.

UNION ORDER BY is name-only

When ORDER BY follows a UNION, neither path is fully open.

-- ok
(SELECT a FROM nums) UNION ALL (SELECT 9) ORDER BY a;
-- ERROR
(SELECT a FROM nums) UNION ALL (SELECT 9) ORDER BY -a;
-- ERROR
(SELECT a FROM nums) UNION ALL (SELECT 9) ORDER BY a COLLATE "C";

The error message is unusually helpful:

Only result column names can be used, not expressions or functions. HINT: Add the expression/function to every SELECT, or move the UNION into a FROM clause.

Set operations do not have a single FROM scope to fall back to, so the expression path is closed entirely. Bare names or nothing.

The seam, in the source

Full disclosure: I got this section wrong three times before Claude Code helped me trace the actual parse tree. Lack of sleep from a whole night of geeking out over ORDER BY is the other plausible explanation.

Open `src/backend/parser/parse_clause.c` and find `findTargetlistEntrySQL92`. It is forty lines of comment, two `if` blocks, and a final `return`. SQL92's two resolution rules are tried first; SQL99 is the fallback.

Block one: the bare-name path. The gate is a ColumnRef node with exactly one name part, and that part must be a string identifier (not *, which is also a ColumnRef but with an A_Star field). If the node passes, the function walks the target list looking for a non-resjunk entry whose resname equals the identifier. The loop keeps going past the first match to detect ambiguity: identical expressions are fine (this is why SELECT a, a FROM nums ORDER BY a works), different expressions error out. On a unique match, return.

If the loop finds nothing, the block does not return. Control falls through. This is the case behind the quoted-alias surprise earlier in the post: AS "A" stores resname = "A", ORDER BY a looks up resname = "a", the strcmp fails, and the function moves on as if no SQL92 fast path applied.

GROUP BY is the small exception inside this block. The name is first tested against the FROM scope, and a hit there causes the targetlist loop to be skipped. That is how GROUP BY ends up preferring the input column.

Block two: the positional path. The gate is IsA(node, A_Const). A non-integer constant errors immediately ("non-integer constant in ..."), which catches ORDER BY NULL, ORDER BY 'a', ORDER BY TRUE. An integer is used as a 1-based position into the non-resjunk target list; anything outside the range errors as "position %d is not in select list". Block two never falls through.

Both 1 and -1 arrive here as integer A_Consts. doNegate in the grammar folds '-' Iconst into a single integer constant before the function ever runs, so ORDER BY 1 and ORDER BY -1 go through the same code, with only the integer value (and the result of the position lookup) differing.

The fallthrough. Anything not caught above reaches the last line:

/*
 * Otherwise, we have an expression, so process it per SQL99 rules.
 */
return findTargetlistEntrySQL99(pstate, node, tlist, exprKind);

That is the seam. SQL92 succeeds in two narrow shapes: a bare identifier with a matching alias, or an in-range positive integer. Everything else, including a bare identifier whose alias lookup found nothing, becomes a SQL99 expression resolved against FROM.

A useful workaround

If you want the alias inside an expression in ORDER BY, the portable trick is to wrap the query in a subselect:

SELECT *
FROM (SELECT -a AS x FROM nums) s
ORDER BY x + 0;

Now x is a real column in the FROM scope of the outer query. The expression path finds it. The seam has been moved out of the way.

This is, conceptually, what you would want the engine to do for you when you write ORDER BY x + 0 directly. The SQL-99 standard does not actually require that, though, and Postgres (along with SQL Server) documents explicitly that an alias inside an ORDER BY expression is not supported. So you do it by hand.

The boring takeaway

Most of the time none of this matters. You sort by a column you just selected, the alias and the input column have the same name and the same value, and either parser path gives the same answer. The seam is invisible.

The minute the alias and the input column disagree in expression, value, case, or anything wrapped around the identifier, the parser picks one or the other silently, by a rule older than most working programmers.

There are two parsers. The bare-name path is SQL-92, the expression path is SQL-99, and they were stitched together in the late 1990s. They still disagree about which scope your identifiers live in, and knowing which one you triggered tells you which scope.

If after reading this post you still have to stop and think for a minute before predicting what

SELECT -a AS a FROM nums ORDER BY a COLLATE "C";

does, that is the right reaction. It means you have the mental model.

The opening puzzle queries are from Jamie Brandon's comment on the Lobsters thread discussing Markus Winand's history of ORDER BY on modern-sql.com. Everything that follows here is the explanation that comment did not give. Both pieces are worth reading on their own.