In its simplest form, IBM InfoSphere DataStage performs data transformation and movement from source systems to target systems in batch and in real time. The data sources might include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications, and message queues.
DataStage manages data that arrives and data that is received on a periodic or scheduled basis. It enables companies to solve large-scale business problems with high-performance processing of massive data volumes. By leveraging the parallel processing capabilities of multiprocessor hardware platforms, DataStage can scale to satisfy the demands of ever-growing data volumes, stringent real-time requirements, and ever-shrinking batch windows.
Leveraging the combined suite of IBM Information Server, DataStage can simplify the development of authoritative master data by showing where and how information is stored across source systems. DataStage can also consolidate disparate data into a single, reliable record, cleanses and standardizes information, removes duplicates, and links records together across systems. This master record can be loaded into operational data stores, data warehouses, or master data applications such as IBM MDM using IBM InfoSphere DataStage.
IBM InfoSphere DataStage delivers four core capabilities:
* Connectivity to a wide range of mainframe, legacy, and enterprise applications, databases, file formats, and external information sources.
* Prebuilt library of more than 300 functions including data validation rules and very complex transformations.
* Maximum throughput using a parallel, high-performance processing architecture.
* Enterprise-class capabilities for development, deployment, maintenance, and high-availability. It leverages metadata for analysis and maintenance. It also operates in batch, real time, or as a Web service.
IBM InfoSphere DataStage enables an integral part of the information integration process.
Functions used in IBM InfoSphere DataStage and QualityStage
The functions that are valid in IBM® InfoSphere® DataStage® and QualityStage are also valid in IBM Information Server
FastTrack, The following is a list of functions that are generally used when defining a column derivation in a Transformer stage.
- Date and time functions
- Logical functions
- Mathematical functions
- Null handling functions
- Number functions
- Raw functions
- String functions
- Vector function
- Type conversion functions
- Type casting functions
- Utility functions
Date and time functions
The following table lists the functions that are available in the Date and Time category (Square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
DateFromDaysSince | Returns a date by adding an integer to a baseline date | number (int32) [baseline date] | date |
DateFromJulianDay | Returns a date from the given julian date | juliandate (uint32) | date |
DaysSinceFromDate | Returns the number of days from source date to the given date | source_date | days since (int32) |
HoursFromTime | Returns the hour portion of a time | time | hours (int8) |
JulianDayFromDate | Returns julian day from the given date | date | julian date (int32) |
MicroSecondsFromTime | Returns the microsecond portion from a time | time | microseconds (int32) |
MinutesFromTime | Returns the minute portion from a time | time | minutes (int8) |
MonthDayFromDate | Returns the day of the month given the date | date | day (int8) |
MonthFromDate | Returns the month number given the date | date | month number (int8) |
NextWeekdayFromDate | Returns the date of the specified day of the week soonest after the source date | source date | date |
PreviousWeekdayFromDate | Returns the date of the specified day of the week most recent before the source date | source date | date |
SecondsFromTime | Returns the second portion from a time | time | seconds (dfloat) |
SecondsSinceFromTimestamp | Returns the number of seconds between two timestamps | timestamp base timestamp | seconds (dfloat) |
TimeDate | Returns the system time and date as a formatted string | - | system time and date (string) |
TimeFromMidnightSeconds | Returns the time given the number of seconds since midnight | seconds (dfloat) | time |
TimestampFromDateTime | Returns a timestamp form the given date and time | date time | timestamp |
TimestampFromSecondsSince | Returns the timestamp from the number of seconds from the base timestamp | seconds (dfloat) | timestamp |
TimestampFromTimet | Returns a timestamp from the given unix time_t value | timet (int32) | timestamp |
TimetFromTimestamp | Returns a unix time_t value from the given timestamp | timestamp | timet (int32) |
WeekdayFromDate | Returns the day number of the week from the given date. Origin day optionally specifies the day regarded as the first in the week and is Sunday by default | date [origin day] | day (int8) |
YeardayFromDate | Returns the day number in the year from the given date | date | day (int16) |
YearFromDate | Returns the year from the given date | date | year (int16) |
YearweekFromDate | Returns the week number in the year from the given date | date | week (int16) |
Logical Functions
The following table lists the functions available in the Logical category (square brackets indicate an argument is optional):
Mathematical Functions
The following table lists the functions available in the Mathematical category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
Abs | Absolute value of any numeric expression | number (int32) | result (dfloat) |
Acos | Calculates the trigonometric arc-cosine of an expression | number (dfloat) | result (dfloat) |
Asin | Calculates the trigonometric arc-sine of an expression | number (dfloat) | result (dfloat) |
Atan | Calculates the trigonometric arc-tangent of an expression | number (dfloat) | result (dfloat) |
Ceil | Calculates the smallest dfloat value greater than or equal to the given decimal value | number (decimal) | result (dfloat) |
Cos | Calculates the trigonometric cosine of an expression | number (dfloat) | result (dfloat) |
Cosh | Calculates the hyperbolic cosine of an expression | number (dfloat) | result (dfloat) |
Div | Outputs the whole part of the real division of two real numbers (dividend, divisor) | dividend (dfloat) divisor (dfloat) | result (dfloat) |
Exp | Calculates the result of base 'e' raised to the power designated by the value of the expression | number (dfloat) | result (dfloat) |
Fabs | Calculates the absolute value of the given value | number (dfloat) | result (dfloat) |
Floor | Calculates the largest dfloat value less than or equal to the given decimal value | number (decimal) | result (dfloat) |
Ldexp | Calculates a number from an exponent and mantissa | mantissa (dfloat) | result (dfloat) |
Llabs | Returns the absolute value of the given integer | number (uint64) | result (int64) |
Ln | Calculates the natural logarithm of an expression in base 'e' | number (dfloat) | result (dfloat) |
Log10 | Returns the log to the base 10 of the given value | number (dfloat) | result (dfloat) |
Max | Returns the greater of the two argument values | number 1 (int32) number 2(int32) | result (int32) |
Min | Returns the lower of the two argument values | number 1 (int32) number 2 (int32) | result (int32) |
Mod | Calculates the modulo (the remainder) of two expressions (dividend, divisor) | dividend (int32) divisor (int32) | result (int32) |
Neg | Negate a number | number (dfloat) | result (dfloat) |
Pwr | Calculates the value of an expression when raised to a specified power (expression, power) | expression (dfloat) power (dfloat) | result (dfloat) |
Rand | Return a psuedo random integer between 0 and 232-1 | - | result (uint32) |
Random | Returns a random number between 0 232-1 | - | result (uint32) |
Sin | Calculates the trigonometric sine of an angle | number (dfloat) | result (dfloat) |
Sinh | Calculates the hyperbolic sine of an expression | number (dfloat) | result (dfloat) |
Sqrt | Calculates the square root of a number | number (dfloat) | result (dfloat) |
Tan | Calculates the trigonometric tangent of an angle | number (dfloat) | result (dfloat) |
Tanh | Calculates the hyperbolic tangent of an expression | number (dfloat) | result (dfloat) |
Null handling functions
The following table lists the functions available in the Null Handling category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
IsNotNull | Returns true when an expression does not evaluate to the null value | any | true/false (int8) |
IsNull | Returns true when an expression evaluates to the null value | any | true/false (int8) |
MakeNull | Change an in-band null to out of band null | any (column) | - |
NullToEmpty | Returns an empty string if input column is null, otherwise returns the input column value | input column | input column value or empty string |
NullToZero | Returns zero if input column is null, otherwise returns the input column value | input column | input column value or zero |
NullToValue | Returns specified value if input column is null, otherwise returns the input column value | input column, value | input column value or value |
SetNull | Assign a null value to the target column | - | - |
Number functions
The following table lists the functions available in the Number category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
MantissaFromDecimal | Returns the mantissa from the given decimal | number (decimal) | result (dfloat) |
MantissaFromDFloat | Returns the mantissa from the given dfloat | number (dfloat) | result (dfloat) |
Raw functions
The following table lists the functions available in the Raw category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
RawLength | Returns the length of a raw string | input string (raw) | Result (int32) |
String functions
The following table lists the functions available in the String category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
AlNum | Return whether the given string consists of alphanumeric characters | string (string) | true/false (int8) |
Alpha | Returns 1 if string is purely alphabetic | string (string) | result (int8) |
CompactWhiteSpace | Return the string after reducing all consective whitespace to a single space | string (string) | result (string) |
Compare | Compares two strings for sorting | string1 (string) | result (int8) |
ComparNoCase | Case insensitive comparison of two strings | string1 (string) string2 (string) | result (int8) |
ComparNum | Compare the first n characters of the two strings | string1 (string) string2 (string) | result (int8) |
CompareNumNoCase | Caseless comparison of the first n characters of the two strings | string1 (string) string2 (string) | result (int8) |
Convert | Converts specified characters in a string to designated replacement characters | fromlist (string) | result (string) |
Count | Count number of times a substring occurs in a string | string (string) | result (int32) |
Dcount | Count number of delimited fields in a string | string (string) | result (int32) |
DownCase | Change all uppercase letters in a string to lowercase | string (string) | result (string) |
DQuote | Enclose a string in double quotation marks | string (string) | result (string) |
Field | Return 1 or more delimited substrings | string (string) delimiter (string) | result (string) |
Index | Find starting character position of substring | string (string) substring (string) occurrence (int32) | result (int32) |
Left | Leftmost n characters of string | string (string) | result (string) |
Len | Length of string in characters | string (string) | result (int32) |
Num | Return 1 if string can be converted to a number | string (string) | result (int8) |
PadString | Return the string padded with the optional pad character and optional length | string (string) | result (string) |
Right | Rightmost n characters of string | string (string) | result (string) |
Soundex | Returns a string which identifies a set of words that are (roughly) phonetically alike based on the standard, open algorithm for SOUNDEX evaluation | string (string) | result (string) |
Space | Return a string of N space characters | length (int32) | result (string) |
Squote | Enclose a string in single quotation marks | string (string) | result (string) |
Str | Repeat a string | string (string) | result (string) |
StripWhiteSpace | Return the string after stripping all whitespace from it | string (string) | result (string) |
Trim | Remove all leading and trailing spaces and tabs plus reduce internal occurrences to one | string (string) [stripchar (string)] [options (string)] | result (string) |
TrimB | Remove all trailing spaces and tabs | string (string) | result (string) |
TrimF | Remove all leading spaces and tabs | string (string) | result (string) |
Trim Leading Trailing | Returns a string with leading and trailing whitespace removed | string (string) | result (string) |
Upcase | Change all lowercase letters in a string to uppercase | string (string) | result (string) |
Possible options for the Trim function are:
* L Removes leading occurrences of character.
* T Removes trailing occurrences of character.
* B Removes leading and trailing occurrences of character.
* R Removes leading and trailing occurrences of character, and reduces multiple occurrences to a single occurrence.
* A Removes all occurrences of character.
* F Removes leading spaces and tabs.
* E Removes trailing spaces and tabs.
* D Removes leading and trailing spaces and tabs, and reduces multiple spaces and tabs to single ones.
Vector function
The following function can be used within expressions to access an element in a vector column. The vector index starts at 0.
Name | Description | Arguments | Output |
---|---|---|---|
ElementAt | Accesses an element of a vector | input column index (int) | element of vector |
Type Conversion Functions
The following table lists the functions available in the Type Conversion category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
DateToString | Return the string representation of the given date | date | result (string) |
DecimalToDecimal | Returns the given decimal in decimal representation with specified precision and scale | decimal (decimal) [rtype (string)] [packedflag (int8)] | result (decimal) |
DecimalToDFloat | Returns the given decimal in dfloat representation | number (decimal) ["fix_zero"] | result (dfloat) |
DecimalToString | Return the string representation of the given decimal | number (decimal) ["fix_zero"] | result (string) |
DfloatToDecimal | Returns the given dfloat in decimal representation | number (dfloat) [rtype (string)] | result (decimal) |
DfloatToStringNoExp | Returns the given dfloat in its string representation with no exponent, using the specified scale | number (dfloat) scale (string) | result (string) |
IsValid | Return whether the given string is valid for the given type. Valid types are "date", "decimal", "dfloat", "sfloat", "int8", "uint8", "int16", "uint16", "int32", "uint32", "int64", "uint64", "raw", "string", "time", "timestamp". "ustring" | type (string) format (string) | result (int8) |
StringToDate | Returns a date from the given string in the given format | date (string) | date |
StringToDecimal | Returns the given string in decimal representation | string (string) [rtype (string)] | result (decimal) |
StringToRaw | Returns a string in raw representation | string (string) | result (raw) |
StringToTime | Returns a time representation of the given string | string (string) [format (string)] | time |
StringToTimestamp | Returns a timestamp representation of the given string | string (string) [format (string)] | timestamp |
TimestampToDate | Returns a date from the given timestamp | timestamp | date |
TimestampToString | Return the string representation of the given timestamp | timestamp [format (string)] | result (string) |
TimestampToTime | Returns the time from a given timestamp | timestamp | time |
TimeToString | Return the string representation of the given time | time [format (string)] | result (string) |
StringToUstring | Returns a ustring from the given string, optionally using the specified map (otherwise uses project default) | string (string) [,mapname (string)] | result (ustring) |
UstringToString | Returns a string from the given ustring, optionally using the specified map (otherwise uses project default) | string(ustring) | result (string) |
* ceil: Round the source field toward positive infinity. E.g, 1.4 -> 2, -1.6 -> -1.
* floor: Round the source field toward negative infinity. E.g, 1.6 -> 1, -1.4 -> -2.
* round_inf: Round or truncate the source field toward the nearest representable value, breaking ties by rounding positive values toward positive infinity and negative values toward negative infinity. E.g, 1.4 -> 1, 1.5 -> 2, -1.4 -> -1, -1.5 -> -2.
* trunc_zero. Discard any fractional digits to the right of the rightmost fractional digit supported in the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, round or truncate to the scale size of the destination decimal. E.g, 1.6 -> 1, -1.6 -> -1.
The default is trunc_zero.
Format string: Date, Time, and Timestamp functions that take a format string (e.g., timetostring(time, stringformat)) need to have the date format specified. The format strings are described in Date and time formats. Where your dates, times, or timestamps convert to or from ustrings, InfoSphere DataStage will pick this up automatically. In these cases the separators in your format string (for example, `:' or `-') can themselves be Unicode characters.
fix_zero: By default decimal numbers comprising all zeros are treated as invalid. If the string fix_zero is specified as a second argument, then all zero decimal values are regarded as valid.
Type "casting" functions
There is a special class of type conversion function to help you when performing mathematical calculations using numeric fields. For example, if you have a calculation using an output column of type float derived from an input column of type integer in a Parallel Transformer stage the result will be derived as an integer regardless of its float type. If you want a non-integral result for a calculation using integral operands, you can use the following functions (which act in a similar way as casting in C) to cast the integer operands into non-integral operands:
Name | Description | Arguments | Output |
---|---|---|---|
AsDouble | Treat the given number as a double | number (number) | number (double) |
AsFloat | Treat the given number as a float | number (number) | number (float) |
AsInteger | Treat the given number as an integer | number (number) | number (int) |
Utility functions
The following table lists the functions available in the Utility category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
---|---|---|---|
GetEnvironment | Returns the value of the given environment variable | environment variable (string) | result (string) |
NextSKChain | Returns the value of the surrogate key column for the next record in the chain, or value for the newest record | value (number) | result (int64) |
NextSurrogateKey | Returns the value of the next surrogate key | None | result (int64) |
PrevSKChain | Returns the value of the surrogate key column for the previous record in the chain, or value for the first record | value (number) | result (int64) |