August 2014 ~ IBM InfoSphere DataStage

By default the oraread operator is set to run in sequential mode. To enable parallel mode on oraread , you have to add the partition table property. This is set in the Source properties section. Then you specify the table name of this property. If more than one table is being used in the select statement, then specify only one of the tables used.

You can then specify a Partitioning Algorithm to use when partitioning the table across nodes.

It is recommend that you add the environment APT_ORAREAD_PARALLEL_ALGORITHM to the job and set its value to ROWID_HASH

APT_ORAREAD_PARALLEL_ALGORITHM

This environment variable is used to determine which partitioning algorithm to use during Oracle Enterprise parallel read operations. The algorithm defines how the stage divides the input dataset into subsets so that each parallel instance of the stage reads one subset of the data. The environment variable can be set to one of the following values:

ROUND_ROBIN - The operator divides the rows from the input dataset in a round-robin fashion using modulus function applied on the row identifier (ROWID) values of the rows within the storage blocks in which they reside

ROWID_HASH - The operator divides the rows from the input dataset in an approximately random fashion using modulus function applied on the hash codes calculated from the rowid values of the rows (recommended)

ROWID_RANGE - The operator divides the rows from the input dataset by taking into account the physical collocation of rows in the table segment and splitting the overall range of rowid values into sub-ranges. This is the default option and is a preferred option for use.

If the environment variable is not defined or is set to a value other than the values listed above, the ROWID_RANGE value is used by default.

If the ROWID_RANGE option is selected (either explicitly or implicitly) and the Oracle user does not have access to DBA_EXTENTS dictionary view, or the target table is an index-organized table (IOT) or a view, then the stage cannot use ROWID_RANGE algorithm and it automatically switches at runtime to using ROWID_HASH partitioning algorithm instead.

Article Source: IBM Support Guide

Venkat ♥ Duvvuri

Saturday, August 30, 2014

How do I make the Oracle EE Stage Read Operator Run in Parallel?

Categories

Labels

Blog Archive

Blog Archive

Popular Posts