- Queries die many times because range queries do a full table scan, but the partitioned table never runs out of spool!.
- In Range Query, it uses the keyword BETWEEN.
- The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date.
- The BETWEEN statement is said to be inclusive. If someone said to me tell me what is BETWEEN the numbers 8 and 10 I would normally say, “The number 9”.
- In Teradata land I would be wrong because the BETWEEN statement is inclusive, so it INCLUDES the starting and ending numbers. What is BETWEEN 8 and 10? The numbers 8, 9 and 10!.
- Partitioned tables work very well on Range Queries using the keyword BETWEEN.
Why we had to perform a Full Table Scan
- Order_Table spread across the AMPs. Notice that two colors are color coded the January and February dates.
- Also notice that January and February dates are mixed on every AMP in what is a random order.
- Because the January Data is on all AMPs and because the January Dates are randomly mixed we have to do Full Table Scan.
- We had no indexes on Order_Date so it is obvious the PE will command the AMPs to do a Full Table Scan.
- By Partitioning the table and we can prevent the Full Table Scan.
- We partition tables so we won’t have to do a Full Table Scan on our Range Queries.
A Partitioned Table
- Each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries.
- Partitioned Table will have the AMPs first sort their rows by the Partition.
- Rows on an AMP don’t change AMPs because the table is partitioned If the table is partitioned then the AMP will sort its rows by the partition.
- Partitioning doesn't affect distribution. Partitioning only affects how each AMP sorts the rows they get!!!
Add the Partition to the Row-ID for the Row Key
- A partitioned table will always add two bytes to every row as part of the Row-ID.
- If a table is partitioned, the partition number is placed in front of the Row-ID for each row.
- This combination of the Partition number, Row-Hash, and Uniqueness value are now called the ROW KEY.
- Instead of sorting by the Row-ID we are merely first sorting by the Partition Number. We are really just sorting by the Row Key!
- If a table is NOT partitioned the Partition Number is merely set to ZERO!
Partition Primary Index:-
- Suppose we are retrieving some portion of data from table.
- To prevent Full table scan, we are using partitioning table.
- In PPI we are using some keywords. Those are Range_n, Case_n ,No case, Unknown.
- In Range_n we can retrieve range data, between dates or between some numbers…etc.
- In Case_n we can retrieve data by using case operations.
- In ‘no case’ partition data will be, but the data is, which data not satisfy the case condition.
- The table can have up to 65,535 partitions.
- Partitioning never determines which AMP gets a row.
Multi-Level Partitioning
- You can have up to 15 levels of partitions within partitions.
- Partitioned Table merely tells each AMP how to sort their rows for the table.
- So think of Multi-Level partitioning as a table with multiple sort keys. The first partition statement is how the data is sorted first. The second partition statement is the second sort key.
- Entire purpose of partitioning is to eliminate the Full Table Scan. Instead of reading all rows in a table each AMP merely has to one or more of their partitions.
,CUST_NUM INTEGER
,ORDER_DATE DATE
,ORDER_TOT DECIMAL(10,2))PRIMARY
INDEX(ORDER_NUM)
PARTITION BY (RANGE_N ( ORDER_DATE BETWEEN DATE ‘2012-01-01’
AND DATE 2012-12-31 EACH INTERVAL ‘1’ DAY), CASE_N(ORDER_TOT<1000,
ORDER_TOT<5000, ORDER_TOT< 6000, NO CASE, UNKNOWN));
No comments:
Post a Comment