About my Blog

This blog is for people who want to learn Teradata basics in depth. It should also be useful for certification and interview preparation.

By Santhosh.B

Thursday, 1 November 2012

How Does Teradata Distribute the Data?

  • The Primary Index is the column (or columns) that determines which AMP each data row is placed on.
  • The Primary Index is also the fastest way to retrieve a row from that same AMP (we will discuss this in the next post).
  • Teradata takes a table and spreads its rows across the AMPs one row at a time.
  • A Unique Primary Index on the table spreads the data rows almost perfectly evenly across the AMPs.
  • Because the AMP is computed from the Primary Index value, Teradata knows exactly which rows went to which AMPs, so retrieval is a 1-AMP operation when users supply the Primary Index value in the WHERE clause of their SQL. Here is how that works.


Hashing the Primary Index Value and Placing the Rows

  • The Teradata Parsing Engine (PE) takes the Primary Index value of a row and runs a math calculation, called the hash formula, on that value.
  • The hash formula doesn't change, and it can be calculated on any value of any data type.
  • The result of the hash formula is a 32-bit number called the Row Hash value.
  • The Teradata hash map is a table of hash buckets, and each bucket contains an AMP number. When the first 16 bits of the Row Hash select the bucket there are 65,536 buckets; systems configured for 20-bit buckets have 1,048,576 (about one million). For example, if we have 4 AMPs in our TD system, the buckets contain numbers like 1,2,3,4,1,2,3,4,1,2,3,4,... The 1st bucket contains AMP number 1, the 2nd bucket AMP number 2, the 3rd bucket AMP number 3, the 4th bucket AMP number 4, the 5th bucket AMP number 1 again, the 6th bucket AMP number 2, and so on up to the last bucket.
  • First, the PE generates the 32-bit Row Hash of the Primary Index value.
  • The first 16 bits of that 32-bit value give the hash bucket number. (For example, if the first 16 bits of the Row Hash give the value 20, the PE goes to bucket number 20.)
  • That bucket gives us the AMP number.
  • Based on this, the row is sent to that AMP.
  • For example, suppose we have a table with two columns, number and name, where 'number' is the Primary Index, and we insert one row into the table.
  • INSERT INTO TABLENAME VALUES(101,'SANTHOSH'); Here the Primary Index value is 101, so 101 goes through the hash formula, which gives a 32-bit Row Hash. The first 16 bits of that Row Hash give a bucket number, say 34. Bucket 34 in the hash map holds an AMP number, say 2, so our row goes to AMP 2.
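
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Teradata's real hash formula is internal to the product, so MD5 is used here as a stand-in, the bucket number is taken as the first 16 bits of the 32-bit Row Hash (65,536 buckets), and the hash map is filled with AMP numbers in the simple 1,2,3,4 cycle described above. The function names (row_hash, hash_bucket, build_hash_map, amp_for) are made up for this example.

```python
import hashlib
from collections import Counter

ROW_HASH_BITS = 32
BUCKET_BITS = 16                    # assumption: first 16 bits pick the bucket
NUM_BUCKETS = 1 << BUCKET_BITS      # 65,536 buckets


def row_hash(pi_value) -> int:
    """Stand-in hash formula (NOT Teradata's): any value -> 32-bit number."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_bucket(rh: int) -> int:
    """Keep only the first 16 bits of the 32-bit row hash."""
    return rh >> (ROW_HASH_BITS - BUCKET_BITS)


def build_hash_map(num_amps: int) -> list:
    """Fill the buckets with AMP numbers 1,2,...,num_amps,1,2,... in a cycle."""
    return [(bucket % num_amps) + 1 for bucket in range(NUM_BUCKETS)]


def amp_for(pi_value, hash_map) -> int:
    """Primary Index value -> row hash -> bucket -> AMP number."""
    return hash_map[hash_bucket(row_hash(pi_value))]


hash_map = build_hash_map(num_amps=4)

# Placing the row (101, 'SANTHOSH'): the PI value 101 decides the AMP.
target_amp = amp_for(101, hash_map)

# Retrieval with WHERE number = 101 repeats the same calculation,
# so only that one AMP has to be asked.
lookup_amp = amp_for(101, hash_map)

# 10,000 unique PI values should land almost evenly on the 4 AMPs.
rows_per_amp = Counter(amp_for(n, hash_map) for n in range(1, 10001))
print(target_amp, lookup_amp, dict(rows_per_amp))
```

Because the same value always produces the same Row Hash, the insert and the later lookup arrive at the same AMP, which is why a Primary Index lookup is a 1-AMP operation.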


Thanks for visiting my blog. If I have missed anything, please let me know, and if you have any doubts, please leave a comment and I will do my best to answer.

Thank you all.

Tuesday, 30 October 2012

Teradata Architecture


The Teradata architecture has three basic components:
1) PE (Parsing Engine)
2) BYNET
3) AMP (Access Module Processor)


PE (Parsing Engine)
  • The heart of Teradata.
  • The Parsing Engines are evenly balanced, and each can handle up to 120 sessions at a time.
  • This could be 120 distinct users, or a single user using all 120 sessions for a single application.
  • That is why there are multiple PEs in every Teradata system.
  • Each PE has total command over every AMP.
  • Each PE takes the user's SQL and does three things:
1. Syntax check - checks the user's SQL syntax.
2. Security check - checks the user's ACCESS RIGHTS.
3. Plan - builds a PLAN to satisfy the user's request.
  • The fastest plan is a single-AMP retrieve.
  • The second fastest plan is a two-AMP retrieve.
  • The next fastest plan has all AMPs reading only a portion of the table, and the slowest plan is the full table scan, where each AMP reads every row it holds for the table.
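
As a rough sketch of the points above, the Python below works out how many PEs are needed for a given number of sessions (at 120 sessions per PE) and walks a request through the three PE steps. Everything here is a toy model, not how the real parser or optimizer works: the names pes_needed, Plan, and handle_request, the one-word syntax rule, and the access rights dictionary are all assumptions made for illustration.

```python
import math
from enum import IntEnum

SESSIONS_PER_PE = 120  # per-PE limit stated in this post


def pes_needed(concurrent_sessions: int) -> int:
    """Smallest number of PEs that can host the given number of sessions."""
    return max(1, math.ceil(concurrent_sessions / SESSIONS_PER_PE))


class Plan(IntEnum):
    """Plan types from fastest (1) to slowest (4), in the order listed above."""
    SINGLE_AMP = 1
    TWO_AMP = 2
    ALL_AMP_PARTIAL = 3
    FULL_TABLE_SCAN = 4


KNOWN_VERBS = {"SELECT", "INSERT", "UPDATE", "DELETE"}  # toy syntax rule


def handle_request(sql, user, table, access_rights, candidate_plans):
    # 1. Syntax check (toy): the statement must start with a known verb.
    words = sql.split()
    if not words or words[0].upper() not in KNOWN_VERBS:
        raise ValueError("syntax error")
    # 2. Security check: the user must hold access rights on the table.
    if table not in access_rights.get(user, set()):
        raise PermissionError(f"{user} has no access to {table}")
    # 3. Plan: pick the fastest plan among those that can answer the request.
    return min(candidate_plans)


rights = {"santhosh": {"TABLENAME"}}  # hypothetical access rights
chosen = handle_request(
    "SELECT * FROM TABLENAME WHERE number = 101",
    user="santhosh",
    table="TABLENAME",
    access_rights=rights,
    candidate_plans=[Plan.FULL_TABLE_SCAN, Plan.SINGLE_AMP],
)
print(chosen.name, pes_needed(300))
```

Here the single-AMP plan wins over the full table scan, and 300 sessions need 3 PEs.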

AMP (Access Module Processor)

  • Every PE can direct all the AMPs, because the rows of every table are spread across all the AMPs.
  • AMPs organize every table in separate blocks.
  • The PE passes the PLAN to the AMPs over the BYNET.
  • When a table is first created, each AMP creates a table header on its disk.
  • When the table is loaded, each AMP receives the rows of that table that it, and only it, owns.
  • The AMPs carefully place the rows inside data blocks where they can easily be retrieved.

BYNET

  • The PE comes up with a PLAN and passes it to the AMPs in steps over the BYNET.
  • The AMPs then retrieve the data requested by the PE and deliver their portion of the answer set to the PE over the BYNET.
  • The BYNET provides the communication between AMPs and PEs.
  • There are always two BYNETs, for redundancy and extra bandwidth. AMPs and PEs can use both BYNETs to send and receive data simultaneously.
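
The flow described above (the PE sends a step to the AMPs, each AMP works on its own rows, and the portions come back to the PE) can be pictured with a small Python sketch. It is only a model under assumptions: a thread pool stands in for the BYNET, the Amp class and all_amp_request function are names invented for this example, and the rows are made-up sample data.

```python
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """One AMP with the rows that it, and only it, owns."""

    def __init__(self, amp_id, rows):
        self.amp_id = amp_id
        self.rows = list(rows)

    def run_step(self, predicate):
        # Each AMP scans only its own rows and returns its portion of the answer set.
        return [row for row in self.rows if predicate(row)]


def all_amp_request(amps, predicate):
    """PE side: send one plan step to every AMP and merge the returned portions."""
    # The thread pool stands in for the BYNET: it carries the step out
    # to the AMPs and carries each AMP's rows back to the PE.
    with ThreadPoolExecutor(max_workers=len(amps)) as bynet:
        portions = list(bynet.map(lambda amp: amp.run_step(predicate), amps))
    return [row for portion in portions for row in portion]


# Made-up (number, name) rows, already distributed across 4 AMPs.
amps = [
    Amp(1, [(104, "D"), (108, "H")]),
    Amp(2, [(101, "SANTHOSH"), (105, "E")]),
    Amp(3, [(102, "B"), (106, "F")]),
    Amp(4, [(103, "C"), (107, "G")]),
]

# An all-AMP request: every AMP checks its own rows for number > 104.
answer_set = all_amp_request(amps, lambda row: row[0] > 104)
print(sorted(answer_set))
```

All four AMPs work at the same time on their own rows, and the PE only has to merge the pieces it gets back.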


Database and Logical Modeling


Database:
A database is a collection of permanently stored data used by an application or enterprise.
A database contains logically related data, which means that the database was created with a purpose in mind.
A database supports shared access by many users.
·Protected: access to the data is controlled.
·Managed: the data has integrity and value.
·Based on the relational model.

Logical Modeling:
Tables are logically created for all database systems.
The logical model should be independent of usage. A variety of front-end tools can be accommodated at the same time, so the database can be created more quickly. Teradata supports normalized logical models because it can perform 64-table joins and large aggregations during queries.
A key Teradata strength is its ability to model the customer's business. Teradata business models are fully normalized, which avoids the cost of star and snowflake schemas. Teradata can handle star schemas and other types of relational modeling, but 3NF is recommended.

Teradata Advantages

·Automatic, even data distribution.
·High scalability.
·Mature optimizer (complex queries, 64-table joins, ad hoc processing).
·Models the business: 3NF, star schema, etc.
·Low TCO (Total Cost of Ownership): easy to install, easy to work with, easy to manage, with robust utilities.
·Acts like a single data store.
·Many load and export utilities: BTEQ, FastLoad, MultiLoad, TPump, FastExport…

Why Teradata?

·Compared with other RDBMSs, it performs well because it has a shared-nothing architecture.
·It can store billions of rows.
·Unconditional parallelism.
·Uses indexes for better storage and fast retrieval.
·Scales easily from a small (10 GB) to a massive (100+ TB) database.
·The system can grow to support more users, data, queries, and query complexity without performance degradation.
·Provides a parallel-aware optimizer, so query tuning is not needed just to get a query to run.
·The optimizer determines the least expensive plan (time-wise) to process queries fast and in parallel.
·Automatic and even distribution avoids complex indexing schemes and time-consuming reorganizations.
·Single operational view of the entire MPP (massively parallel processing) system and a single point of control for the DBA (TD Manager).

Monday, 29 October 2012

What is Teradata (TD)?

·TD is an RDBMS for data warehouse (DW) environments.
·TD is an open system, meaning it is platform independent and follows industry standards.
·Compatible with ANSI industry standards.
·It is currently available for the UNIX and Windows operating systems.
Of these, Teradata 13.0 installs directly only on 32-bit Windows 7. To install it on 64-bit Windows 7, you should use VMware.
TD 13.0 does not support Windows XP; TD 12.0 does install on Windows XP.

See the table below.

           Windows 7 64bit    Windows 7 32bit    Windows XP
TD 13.0    Use VMware         Supported          Use VMware
TD 12.0    Use VMware         Use VMware         Supported

·It runs on a single node or on multiple nodes (servers).
·It acts as a server.
·Built-in parallelism (I will explain this in depth in coming posts).
·Client platforms access the database through a TCP/IP connection or across an IBM mainframe channel connection.
·Large database server.
·The Teradata Database was the first commercial database system to support a trillion bytes (one terabyte) of data: 10^12 = 1,000,000,000,000 bytes.
·Built on a parallel architecture.