Now let's examine the case of semi-open, semi-closed ranges:
select x from t where x > low_x and x <= high_x; -- (open, closed)
select x from t where x >= low_x and x < high_x; -- (closed, open)
Instead of repeating the same steps, that would be a rather tedious process, let's see how we can apply "formal method of ranges decomposition " to re-use the results for open and closed ranges already discussed.
The strategy is:
Similarly for three, four subranges.
We'll experiment (script range_sel_semi.sql ) on the case of num_distinct=4 only; remember that for this case we have:
left band : 0 < x < 2.5
central region: 2.5 < x < 7.5
right band : 7.5 < x < 10
For ranges completely contained in the left band:
For x > low_x and x <= high_x, choose X at random, eg X = 1.0:
SQL> select x from t where x > 0.1 and x <= 2.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1920K| 5625K| 1440 (6)| 00:00:18 | (ORIGINAL)
|* 1 | TABLE ACCESS FULL| T | 1920K| 5625K| 1440 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0.1 and x < 1.0;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 960K| 2812K| 1436 (6)| 00:00:18 | (LEFT)
|* 1 | TABLE ACCESS FULL| T | 960K| 2812K| 1436 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x >= 1.0 and x <= 2.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1960K| 5742K| 1440 (6)| 00:00:18 | (RIGHT)
|* 1 | TABLE ACCESS FULL| T | 1960K| 5742K| 1440 (6)| 00:00:18 |
--------------------------------------------------------------------------
960K + 1960K - 1000k = 1920K, as requested.
Note that the choice of X is irrelevant, since (LEFT) is open and so the right extreme(X) is ignored, and (RIGHT) is closed and so the left extreme (X again) is ignored as well.
Analitically:
(LEFT) + (RIGHT) - num_rows / num_distinct
= height_ramp - height_ramp * (low_x - min_x) / B
+ height_ramp + height_ramp * (high_x - min_x) / B
- num_rows / num_distinct
= (num_rows / num_distinct) (high_x - low_x) / B + num_rows / num_distinct
= 1000000 * (2.4 - 0.1) / 2.5 + 1000000 = 1920K.
For x >= low_x and x < high_x, choose X at random, eg X = 1.0:
SQL> select x from t where x >= 0.1 and x < 2.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 2929K| 1436 (6)| 00:00:18 | (ORIGINAL)
|* 1 | TABLE ACCESS FULL| T | 1000K| 2929K| 1436 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x >= 0.1 and x <= 1.0;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1400K| 4101K| 1437 (6)| 00:00:18 | (LEFT)
|* 1 | TABLE ACCESS FULL| T | 1400K| 4101K| 1437 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 1.0 and x < 2.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 600K| 1757K| 1436 (6)| 00:00:18 | (RIGHT)
|* 1 | TABLE ACCESS FULL| T | 600K| 1757K| 1436 (6)| 00:00:18 |
--------------------------------------------------------------------------
1400K + 600K - 1000K = 1000K, as requested.
Note that the choice of X is irrelevant, but also that the actual values of low_x and high_x do not matter(!) .
Analitically:
(LEFT) + (RIGHT) - num_rows / num_distinct
= height_ramp + height_ramp * (X - min_x) / B
+ height_ramp - height_ramp * (X - min_x) / B
- num_rows / num_distinct
= num_rows / num_distinct = 1000K.
It's funny that just moving the equality from one extreme to the other can have such a big impact on the cardinality formula.
For ranges partially contained in the left band, and partially in the central region:
choose X = min_x + B
For x > low_x and x <= high_x:
SQL> select x from t where x > 0.1 and x <= 6.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3520K| 10M| 1447 (7)| 00:00:18 | (ORIGINAL)
|* 1 | TABLE ACCESS FULL| T | 3520K| 10M| 1447 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0.1 and x < 2.5;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 960K| 2812K| 1436 (6)| 00:00:18 | (LEFT)
|* 1 | TABLE ACCESS FULL| T | 960K| 2812K| 1436 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x >= 2.5 and x <= 6.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3560K| 10M| 1447 (7)| 00:00:18 | (CENTRAL)
|* 1 | TABLE ACCESS FULL| T | 3560K| 10M| 1447 (7)| 00:00:18 |
--------------------------------------------------------------------------
960K + 3560K - 1000K = 3520K, as requested.
Note that the standard formula applies :
4000000 * (6.4 - 0.1) / (10-0) + 1000000 = 3520K.
For x >= low_x and x < high_x:
SQL> select x from t where x >= 0.1 and x < 6.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2560K| 7500K| 1443 (6)| 00:00:18 | (ORIGINAL)
|* 1 | TABLE ACCESS FULL| T | 2560K| 7500K| 1443 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x >= 0.1 and x <= 2.5;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000K| 5859K| 1440 (6)| 00:00:18 | (LEFT)
|* 1 | TABLE ACCESS FULL| T | 2000K| 5859K| 1440 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 2.5 and x < 6.4;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1560K| 4570K| 1443 (6)| 00:00:18 | (CENTRAL)
|* 1 | TABLE ACCESS FULL| T | 1560K| 4570K| 1443 (6)| 00:00:18 |
--------------------------------------------------------------------------
2000K + 1560K - 1000K = 2560K, as requested.
Note that the standard formula applies if you chop the "<=" extreme to the band boundary (2.5):
4000000 * (6.4 - 2.5 ) / (10-0) + 1000000 = 2560K
For ranges partially contained in the left band, in the central region, and partially in the right band:
For x > low_x and x <= high_x:
SQL> select x from t where x > 0.1 and x <= 9.8;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3960K| 11M| 1449 (7)| 00:00:18 | (ORIGINAL)
|* 1 | TABLE ACCESS FULL| T | 3960K| 11M| 1449 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0.1 and x < 2.5;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 960K| 2812K| 1436 (6)| 00:00:18 | (LEFT)
|* 1 | TABLE ACCESS FULL| T | 960K| 2812K| 1436 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 2.5 and x < 7.5;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000K| 5859K| 1444 (7)| 00:00:18 | (CENTRAL)
|* 1 | TABLE ACCESS FULL| T | 2000K| 5859K| 1444 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x >= 7.5 and x <= 9.8;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000K| 5859K| 1440 (6)| 00:00:18 | (RIGHT)
|* 1 | TABLE ACCESS FULL| T | 2000K| 5859K| 1440 (6)| 00:00:18 |
--------------------------------------------------------------------------
960K + 2000K + 2000K - 1000k = 3960K, as requested.
Note that the standard formula applies if you chop the "<=" extreme to the band boundary (7.5):
4000000 * (7.5 - 0.1) / (10-0) + 1000000 = 3960K
For x >= low_x and x < high_x:
SQL> select x from t where x >= 0.1 and x < 9.8;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3920K| 11M| 1449 (7)| 00:00:18 | (ORIGINAL)
|* 1 | TABLE ACCESS FULL| T | 3920K| 11M| 1449 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x >= 0.1 and x <= 2.5;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000K| 5859K| 1440 (6)| 00:00:18 | (LEFT)
|* 1 | TABLE ACCESS FULL| T | 2000K| 5859K| 1440 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 2.5 and x < 7.5;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000K| 5859K| 1444 (7)| 00:00:18 | (CENTRAL)
|* 1 | TABLE ACCESS FULL| T | 2000K| 5859K| 1444 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 7.5 and x < 9.8;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 920K| 2695K| 1436 (6)| 00:00:18 | (RIGHT)
|* 1 | TABLE ACCESS FULL| T | 920K| 2695K| 1436 (6)| 00:00:18 |
--------------------------------------------------------------------------
2000K + 2000K + 920K - 1000k = 3920K, as requested.
Note that the standard formula applies if you chop the ">=" extreme to the band boundary (2.5):
4000000 * (9.8 - 2.5 ) / (10-0) + 1000000 = 3920K
Special case for x > low_x = min_x (or x < high_x = max_x for the right ramp):
SQL> select x from t where x > 0 and x <= 1;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 400K| 1171K| 1437 (6)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 400K| 1171K| 1437 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0 + 1e-12 and x <= 1;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1400K| 4101K| 1437 (6)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 1400K| 4101K| 1437 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0 and x <= 3;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1200K| 3515K| 1441 (6)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 1200K| 3515K| 1441 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0 + 1e-12 and x <= 3;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2200K| 6445K| 1441 (6)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 2200K| 6445K| 1441 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0 and x <= 9;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3000K| 8789K| 1444 (7)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 3000K| 8789K| 1444 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0 + 1e-12 and x <= 9;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4000K| 11M| 1449 (7)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 4000K| 11M| 1449 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0 and x < 10;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000K| 5859K| 1444 (7)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 2000K| 5859K| 1444 (7)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x > 0 + 1e-12 and x < 10-1e-12;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4000K| 11M| 1449 (7)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 4000K| 11M| 1449 (7)| 00:00:18 |
--------------------------------------------------------------------------
The rule seems to be "compute as normal, then subtract num_rows/num_distinct for each range extreme matching the corresponding interval extreme".
Note that for x >= low_x = min_x instead, we don't see anything special:
SQL> select x from t where x >= 0 and x <= 1;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1400K| 4101K| 1437 (6)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 1400K| 4101K| 1437 (6)| 00:00:18 |
--------------------------------------------------------------------------
SQL> select x from t where x >= 0 + 1e-12 and x <= 1;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1400K| 4101K| 1437 (6)| 00:00:18 |
|* 1 | TABLE ACCESS FULL| T | 1400K| 4101K| 1437 (6)| 00:00:18 |
--------------------------------------------------------------------------
Let's move to the conclusion .
For corrections / feedback:
alberto.dellera@gmail.com