StartwithTeradata.com: Partitioning table and Partition Primary Index

Queries die many times because range queries do a full table scan, but the partitioned table never runs out of spool!.
In Range Query, it uses the keyword BETWEEN.
The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date.
The BETWEEN statement is said to be inclusive. If someone said to me tell me what is BETWEEN the numbers 8 and 10 I would normally say, “The number 9”.
In Teradata land I would be wrong because the BETWEEN statement is inclusive, so it INCLUDES the starting and ending numbers. What is BETWEEN 8 and 10? The numbers 8, 9 and 10!.
Partitioned tables work very well on Range Queries using the keyword BETWEEN.

Why we had to perform a Full Table Scan

Order_Table spread across the AMPs. Notice that two colors are color coded the January and February dates.
Also notice that January and February dates are mixed on every AMP in what is a random order.
Because the January Data is on all AMPs and because the January Dates are randomly mixed we have to do Full Table Scan.
We had no indexes on Order_Date so it is obvious the PE will command the AMPs to do a Full Table Scan.
By Partitioning the table and we can prevent the Full Table Scan.
We partition tables so we won’t have to do a Full Table Scan on our Range Queries.

A Partitioned Table

Each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries.
Partitioned Table will have the AMPs first sort their rows by the Partition.
Rows on an AMP don’t change AMPs because the table is partitioned If the table is partitioned then the AMP will sort its rows by the partition.
Partitioning doesn't affect distribution. Partitioning only affects how each AMP sorts the rows they get!!!

Add the Partition to the Row-ID for the Row Key

A partitioned table will always add two bytes to every row as part of the Row-ID.
If a table is partitioned, the partition number is placed in front of the Row-ID for each row.
This combination of the Partition number, Row-Hash, and Uniqueness value are now called the ROW KEY.
Instead of sorting by the Row-ID we are merely first sorting by the Partition Number. We are really just sorting by the Row Key!
If a table is NOT partitioned the Partition Number is merely set to ZERO!

Partition Primary Index:-

Suppose we are retrieving some portion of data from table.
To prevent Full table scan, we are using partitioning table.
In PPI we are using some keywords. Those are Range_n, Case_n ,No case, Unknown.
In Range_n we can retrieve range data, between dates or between some numbers…etc.
In Case_n we can retrieve data by using case operations.
In ‘no case’ partition data will be, but the data is, which data not satisfy the case condition.
The table can have up to 65,535 partitions.
Partitioning never determines which AMP gets a row.

Multi-Level Partitioning

You can have up to 15 levels of partitions within partitions.
Partitioned Table merely tells each AMP how to sort their rows for the table.
So think of Multi-Level partitioning as a table with multiple sort keys. The first partition statement is how the data is sorted first. The second partition statement is the second sort key.
Entire purpose of partitioning is to eliminate the Full Table Scan. Instead of reading all rows in a table each AMP merely has to one or more of their partitions.

Ex:- CREATE TABLE ORDER_DATA ( ORDER_NUM INTEGER NOT NULL

,CUST_NUM INTEGER

,ORDER_DATE DATE

,ORDER_TOT DECIMAL(10,2))PRIMARY INDEX(ORDER_NUM)

PARTITION BY (RANGE_N ( ORDER_DATE BETWEEN DATE ‘2012-01-01’ AND DATE 2012-12-31 EACH INTERVAL ‘1’ DAY), CASE_N(ORDER_TOT<1000, ORDER_TOT<5000, ORDER_TOT< 6000, NO CASE, UNKNOWN));

StartwithTeradata.com

Pages

About my Blog

Monday, 5 November 2012

Partitioning table and Partition Primary Index

Why we had to perform a Full Table Scan

A Partitioned Table

Add the Partition to the Row-ID for the Row Key

Partition Primary Index:-

Multi-Level Partitioning

No comments:

Post a Comment