query optimizer assumes uniform distribution
The query optimizer makes 4 assumptions about the data in tables.
One of them is that the values in statistics are uniformly distributed in the table.
For example, we have a table with 1,000,000 rows and
in one column half are '0' and '1'.
The column statistics show this.
All the '1' values are at the front of the table and all the '0' at the back.
For a query like "select .. where flag = '0'",
the query optimizer picks a "cluster index scan" execution plan
because the uniformity assumptions makes in think the values are
0 1 0 1 0 1 0 1
and it will find a '0' row in the first couple of rows.
But because the assumption is invalid on this table, the plan ends
up scanning half the table on every execution.
We evaluated 3 ways to trick SQL into picking a better plan.
But why should we need to do this?
Give us a way to tell the query optimizer when its 4 assumptions do
not apply to a table or query.
Or better yet, have update stats figure it out.