I am using Oracle 10g Enterprise edition.
A table in our Oracle database stores the soundex value representation of another text column. We are using a custom soundex implementation in which the soundex values are longer than are generated by traditional soundex algorithms (such as the one Oracle uses). That's really beside the point.
Basically I have a varchar2 column that has values containing a single character followed by a dynamic number of numeric values (e.g. 'A12345', 'S382771', etc). The table is partitioned by another column, but I'd like to add a partitioned index to the soundex column since it is often searched. When trying to add a range partitioned index using the first character of the soundex column it worked great:
create index IDX_NAMES_SOUNDEX on NAMES_SOUNDEX (soundex)
global partition by range (soundex) (
partition IDX_NAMES_SOUNDEX_PART_A values less than ('B'), -- 'A%'
partition IDX_NAMES_SOUNDEX_PART_B values less than ('C'), -- 'B%'
...
);
However, I in order to more evenly distribute the size of the partitions, I want to define some partitions by the first two chars, like so:
create index IDX_NAMES_SOUNDEX on NAMES_SOUNDEX (soundex)
global partition by range (soundex) (
partition IDX_NAMES_SOUNDEX_PART_A5 values less than ('A5'), -- 'A0% - A4%'
partition IDX_NAMES_SOUNDEX_PART_A values less than ('B'), -- 'A4% - A9%'
partition IDX_NAMES_SOUNDEX_PART_B values less than ('C'), -- 'B%'
...
);
I'm not sure how to properly range partition using varchar2 columns. I'm sure this is a less than ideal choice, so perhaps someone can recommend a better solution. Here's a distribution of the soundex data in my table:
-----------------------------------
| SUBSTR(SOUNDEX,1,1) | COUNT |
-----------------------------------
| A | 6476349 |
| B | 854880 |
| D | 520676 |
| F | 1200045 |
| G | 280647 |
| H | 3048637 |
| J | 711031 |
| K | 1336522 |
| L | 348743 |
| M | 3259464 |
| N | 1510070 |
| Q | 276769 |
| R | 1263008 |
| S | 3396223 |
| V | 533844 |
| W | 555007 |
| Y | 348504 |
| Z | 1079179 |
-----------------------------------
As you can see, the distribution is not evenly spread, which is why I want to define range partitions using the first two characters instead of just the first character.
Suggestions?
Thanks!
Best Solution
What exactly is your question?
Don't you know how you can split your table in n equal parts to avoid skew?
You can do that with analytic function percentile_disc().
Here an SQL PLUS example with n=100, I admit that it isn't very sophisticated but it will do the job.
This will output in file parts.lst:
Now you can run script parts.lst to get the partition values. Each partition will contain 1% of the data initially.
Script parts.lst will output: