Saturday, August 30, 2014

How do I make the Oracle EE Stage Read Operator Run in Parallel?

Posted by Venkat ♥ Duvvuri 11:49 PM, under | 1 comment


By default the oraread operator is set to run in sequential mode. To enable parallel mode on oraread , you have to add the partition table property. This is set in the Source properties section. Then you specify the table name of this property. If more than one table is being used in the select statement, then specify only one of the tables used.

You can then specify a Partitioning Algorithm to use when partitioning the table across nodes.

It is recommend that you add the environment APT_ORAREAD_PARALLEL_ALGORITHM to the job and set its value to ROWID_HASH

APT_ORAREAD_PARALLEL_ALGORITHM

This environment variable is used to determine which partitioning algorithm to use during Oracle Enterprise parallel read operations. The algorithm defines how the stage divides the input dataset into subsets so that each parallel instance of the stage reads one subset of the data. The environment variable can be set to one of the following values:


ROUND_ROBIN - The operator divides the rows from the input dataset in a round-robin fashion using modulus function applied on the row identifier (ROWID) values of the rows within the storage blocks in which they reside

ROWID_HASH - The operator divides the rows from the input dataset in an approximately random fashion using modulus function applied on the hash codes calculated from the rowid values of the rows (recommended)

ROWID_RANGE - The operator divides the rows from the input dataset by taking into account the physical collocation of rows in the table segment and splitting the overall range of rowid values into sub-ranges. This is the default option and is a preferred option for use.

If the environment variable is not defined or is set to a value other than the values listed above, the ROWID_RANGE value is used by default.

If the ROWID_RANGE option is selected (either explicitly or implicitly) and the Oracle user does not have access to DBA_EXTENTS dictionary view, or the target table is an index-organized table (IOT) or a view, then the stage cannot use ROWID_RANGE algorithm and it automatically switches at runtime to using ROWID_HASH partitioning algorithm instead.

Article Source: IBM Support Guide

Saturday, May 10, 2014

DataStage Scenario on Finding Unique Distance from Source and Destination

Posted by Venkat ♥ Duvvuri 2:46 AM, under | 1 comment

Source,Destination,Distance
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bang,hyd,1000
bombay,pune,1000
bang,chennai,600

**Sorted on Distance**

Source,Destination,Distance
bang,chennai,600
chennai,bang,600
bang,hyd,1000
bombay,pune,1000
hyd,bang,1000
delhi,chennai,1500

**Stg Variables**
PrevActualStr: ActualStr
ActualStr: inp.Src:inp.Dest
RevStr: inp.Dest:inp.Src
isDup: if PrevActualStr = RevStr then 1 else 0

**Constraint**
isDup = 0 [Constraint]

The below steps will explain you, How the records will be processed to o/p based on the above criteria.

Rec-1:

PrevActualStr: Garbage value
ActualStr: bangchennai
RevStr: chennaibang
isDup: 0 [if PrevActualStr = RevStr then 1 else 0]

isDup = 0 [Rec will be processed to o/p as per our constraint isDup=0]

O/P
Src Dest Distance
bang chennai 600


Rec-2:

PrevActualStr: bangchennai
ActualStr: chennaibang
RevStr: bangchennai
isDup: 1 [if PrevActualStr = RevStr then 1 else 0]

isDup = 1 [Rec willn't be processed to o/p as our constraint isDup=0]

O/P
Src Dest Distance
bang chennai 600

Rec-3

PrevActualStr: chennaibang
ActualStr: banghyd
RevStr: hydbang
isDup: 0 [if PrevActualStr = RevStr then 1 else 0]

isDup = 0

O/P
Src Dest Distance
bang chennai 600
bang hyd 1000
---------------------
====================