Tuesday, December 7, 2010

DataStage Stages and Jobs

Posted by Venkat ♥ Duvvuri 9:48 PM, under | 1 comment

An IBM InfoSphere DataStage job consists of individual stages linked together which describe the flow of data from a data source to a data target. A stage usually has at least one data input and/or one data output. However, some stages can accept more than one data input, and output to more than one stage. Each stage has a set of predefined and editable properties that tell it how to perform or process data. Properties might include the file name for the Sequential File stage, the columns to sort, the transformations to perform, and the database table name for the DB2 stage. These properties are viewed or edited using stage editors. Stages are added to a job and linked together using the Designer. Figure shows some of the stages and their iconic representations.


Stages and links can be grouped in a shared container. Instances of the shared container can then be reused in different parallel jobs. You can also define a local container within a job — this groups stages and links into a single unit, but can only be used within the job in which it is defined. The different types of jobs have different stage types. The stages that are available in the Designer depend on the type of job that is currently open in the Designer. Parallel Job stages are organized into different groups on the Designer palette as follows:
General includes stages such as Container and Link. Data Quality includes stages such as Investigate, Standardize, Reference Match, and Survive.

Database includes stages such as Classic Federation, DB2 UDB, DB2 UDB/Enterprise, Oracle, Sybase, SQL Server®, Teradata, Distributed Transaction, and ODBC.
Development/Debug includes stages such as Peek, Sample, Head, Tail, and Row Generator.
File includes stages such as Complex Flat File, Data Set, Lookup File Set, and Sequential File.
Processing includes stages such as Aggregator, Copy, FTP, Funnel, Join, Lookup, Merge, Remove Duplicates, Slowly Changing Dimension, Surrogate Key Generator, Sort, and Transformer
Real Time includes stages such as Web Services Transformer, WebSphere MQ, and Web Services Client.
Restructure includes stages such as Column Export and Column Import.

1 comments:

Post a Comment