Bottom
In its simplest form, IBM InfoSphere DataStage performs data transformation and movement from source systems to target systems in batch and in real time. The data sources might include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications, and message queues.
DataStage manages data that arrives and data that is received on a periodic or scheduled basis. It enables companies to solve large-scale business problems with high-performance processing of massive data volumes. By leveraging the parallel processing capabilities of multiprocessor hardware platforms, DataStage can scale to satisfy the demands of ever-growing data volumes, stringent real-time requirements, and ever-shrinking batch windows.
Leveraging the combined suite of IBM Information Server, DataStage can simplify the development of authoritative master data by showing where and how information is stored across source systems. DataStage can also consolidate disparate data into a single, reliable record, cleanses and standardizes information, removes duplicates, and links records together across systems. This master record can be loaded into operational data stores, data warehouses, or master data applications such as IBM MDM using IBM InfoSphere DataStage.
IBM InfoSphere DataStage delivers four core capabilities:
* Connectivity to a wide range of mainframe, legacy, and enterprise applications, databases, file formats, and external information sources.
* Prebuilt library of more than 300 functions including data validation rules and very complex transformations.
* Maximum throughput using a parallel, high-performance processing architecture.
* Enterprise-class capabilities for development, deployment, maintenance, and high-availability. It leverages metadata for analysis and maintenance. It also operates in batch, real time, or as a Web service.
IBM InfoSphere DataStage enables an integral part of the information integration process.
Functions used in IBM InfoSphere DataStage and QualityStage
The functions that are valid in IBM® InfoSphere® DataStage® and QualityStage are also valid in IBM Information Server
FastTrack, The following is a list of functions that are generally used when defining a column derivation in a Transformer stage.
Top
| Next
Date and time functions
Bottom
The following table lists the functions that are available in the Date and Time category (Square brackets indicate an
argument is optional):
Name | Description | Arguments | Output |
DateFromDaysSince | Returns a date by adding an integer to a baseline date | number (int32) [baseline date] | date |
DateFromJulianDay | Returns a date from the given julian date | juliandate (uint32) | date |
DaysSinceFromDate | Returns the number of days from source date to the given date | source_date
given_date
| days since (int32) |
HoursFromTime | Returns the hour portion of a time | time | hours (int8) |
JulianDayFromDate | Returns julian day from the given date | date | julian date (int32) |
MicroSecondsFromTime | Returns the microsecond portion from a time | time | microseconds (int32) |
MinutesFromTime | Returns the minute portion from a time | time | minutes (int8) |
MonthDayFromDate | Returns the day of the month given the date | date | day (int8) |
MonthFromDate | Returns the month number given the date | date | month number (int8) |
NextWeekdayFromDate | Returns the date of the specified day of the week soonest after the
source date | source date
day of week (string)
| date |
PreviousWeekdayFromDate | Returns the date of the specified day of the week most recent
before
the source date | source date
day of week (string)
| date |
SecondsFromTime | Returns the second portion from a time | time | seconds (dfloat) |
SecondsSinceFromTimestamp | Returns the number of seconds between two timestamps | timestamp base timestamp | seconds (dfloat) |
TimeDate | Returns the system time and date as a formatted string | - | system time and date (string) |
TimeFromMidnightSeconds | Returns the time given the number of seconds since midnight | seconds (dfloat) | time |
TimestampFromDateTime | Returns a timestamp form the given date and time | date time | timestamp |
TimestampFromSecondsSince | Returns the timestamp from the number of seconds from the base
timestamp | seconds (dfloat)
[base timestamp]
| timestamp |
TimestampFromTimet | Returns a timestamp from the given unix time_t value | timet (int32) | timestamp |
TimetFromTimestamp | Returns a unix time_t value from the given timestamp | timestamp | timet (int32) |
WeekdayFromDate | Returns the day number of the week from the given date. Origin day
optionally specifies the day regarded as the first in the week and is Sunday
by default | date [origin day] | day (int8) |
YeardayFromDate | Returns the day number in the year from the given date | date | day (int16) |
YearFromDate | Returns the year from the given date | date | year (int16) |
YearweekFromDate | Returns the week number in the year from the given date | date | week (int16) |
Date, Time, and Timestamp functions that specify dates, times, or timestamps in the argument use strings with specific
formats:
For a date, the format is %yyyy-%mm-%dd
For a time, the format is %hh:%nn:%ss, or, if extended to include microseconds, %hh:%nn:%ss.x where x gives the number of decimal places seconds is given to.
For a timestamp the format is %yyyy-%mm-%dd %hh:%nn:%ss, or, if extended to include microseconds, %yyyy-%mm-%dd %hh:%nn:%ss.x
where x gives the number of decimal places seconds is given to.
This applies to the arguments date, baseline date, given date, time, timestamp, and base timestamp.
Functions that have days of week in the argument take a string specifying the day of the week, this applies to day of week and origin day.
Top
| Next
Logical Functions
Previous | Bottom
The following table lists the functions available in the Logical category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
Not | Returns the complement of the logical value of an expression | expression | Complement (int8) |
BitAnd | Returns the bitwise AND of the two integer arguments | number 1 (uint64) number 2 (uint64) | number (uint64) |
BitOr | Returns the bitwise OR of the two integer arguments | number 1 (uint64) number 2 (uint64) | number (uint64) |
BitXOr | Returns the bitwise Exclusive OR of the two integer arguments | number 1 (uint64) number 2 (uint64) | number (uint64) |
BitExpand | Returns a string containing the binary representation in "1"s and "0"s of the given integer | number (uint64) | string |
BitCompress | Returns the integer made from the string argument, which contains a
binary representation of "1"s and "0"s. | number (string) | number (uint64) |
SetBit | Returns an integer with specific bits set to a specific state, where
origfield is the input value to perform the action on,
bitlist is a string containing a list of comma separated bit numbers to set the state of, and bitstate is either 1 or 0, indicating which state to set those bits.
| origfield (uint64) bitlist (string)
bitstate (uint8)
| number (uint64) |
Top
| Next
Mathematical Functions
Previous | Bottom
The following table lists the functions available in the Mathematical category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
Abs | Absolute value of any numeric expression | number (int32) | result (dfloat) |
Acos | Calculates the trigonometric arc-cosine of an expression | number (dfloat) | result (dfloat) |
Asin | Calculates the trigonometric arc-sine of an expression | number (dfloat) | result (dfloat) |
Atan | Calculates the trigonometric arc-tangent of
an expression | number (dfloat) | result
(dfloat) |
Ceil | Calculates the smallest dfloat value
greater than or equal to the given
decimal value | number (decimal) | result
(dfloat) |
Cos | Calculates the trigonometric cosine of an
expression | number (dfloat) | result (dfloat) |
Cosh | Calculates the hyperbolic cosine of an
expression | number (dfloat) | result (dfloat) |
Div | Outputs the whole part of the real division
of two real numbers (dividend,
divisor) | dividend (dfloat) divisor (dfloat) | result (dfloat) |
Exp | Calculates the result of base 'e' raised to
the power designated by
the value of the expression | number (dfloat) | result (dfloat) |
Fabs | Calculates the absolute value of the given
value | number (dfloat) | result (dfloat) |
Floor | Calculates the largest dfloat value less
than or equal to the given
decimal value | number (decimal) | result
(dfloat) |
Ldexp | Calculates a number from an exponent and
mantissa | mantissa (dfloat)
exponent (int32)
| result (dfloat) |
Llabs | Returns the absolute value of the given
integer | number (uint64) | result (int64) |
Ln | Calculates the natural logarithm of an
expression in base 'e' | number (dfloat) | result
(dfloat) |
Log10 | Returns the log to the base 10 of the
given value | number (dfloat) | result
(dfloat) |
Max | Returns the greater of the two argument
values | number 1 (int32) number 2(int32) | result
(int32) |
Min | Returns the lower of the two argument
values | number 1 (int32) number 2 (int32) | result
(int32) |
Mod | Calculates the modulo (the remainder) of two
expressions (dividend,
divisor) | dividend (int32) divisor (int32) | result (int32) |
Neg | Negate a number | number (dfloat) | result (dfloat) |
Pwr | Calculates the value of an expression when
raised to a specified power
(expression, power) | expression (dfloat) power (dfloat) | result (dfloat) |
Rand | Return a psuedo random integer between 0
and 232-1 | - | result (uint32) |
Random | Returns a random number between 0
232-1 | - | result (uint32) |
Sin | Calculates the trigonometric sine of an
angle | number (dfloat) | result (dfloat) |
Sinh | Calculates the hyperbolic sine of an
expression | number (dfloat) | result (dfloat) |
Sqrt | Calculates the square root of a number | number (dfloat) | result (dfloat) |
Tan | Calculates the trigonometric tangent of an
angle | number (dfloat) | result (dfloat) |
Tanh | Calculates the hyperbolic tangent of an
expression | number (dfloat) | result (dfloat) |
Top
| Next
Null handling functions
Previous | Bottom
The following table lists the functions available in the Null Handling category (square brackets indicate an argument is
optional):
Name | Description | Arguments | Output |
IsNotNull | Returns true when an expression does
not evaluate to the null value | any | true/false
(int8) |
IsNull | Returns true when an expression evaluates
to the null value | any | true/false (int8) |
MakeNull | Change an in-band null to out of band
null | any (column)
string (string)
| - |
NullToEmpty | Returns an empty string if input
column is null, otherwise returns
the input column value | input column | input
column value or empty string |
NullToZero | Returns zero if input column is null,
otherwise returns the input column
value | input column | input column value or
zero |
NullToValue | Returns specified value if
input column is null, otherwise returns
the input column value | input column, value | input column value or value |
SetNull | Assign a null value to the target
column | - | - |
Hint: true = 1
false = 0
Top
| Next
Number functions
Previous | Bottom
The following table lists the functions available in the Number category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
MantissaFromDecimal | Returns the mantissa from
the given decimal | number (decimal) | result
(dfloat) |
MantissaFromDFloat | Returns the mantissa from the
given dfloat | number (dfloat) | result
(dfloat) |
Top
| Next
Raw functions
Previous | Bottom
The following table lists the functions available in the Raw category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
RawLength | Returns the length of a raw
string | input string (raw) | Result (int32) |
Top
| Next
String functions
Previous | Bottom
The following table lists the functions available in the String category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
AlNum | Return whether the given string consists
of alphanumeric characters | string (string) | true/false (int8) |
Alpha | Returns 1 if string is purely
alphabetic | string (string) | result (int8) |
CompactWhiteSpace | Return the string after
reducing all consective whitespace to a single
space | string (string) | result (string) |
Compare | Compares two strings for sorting | string1 (string)
string2 (string)
[justification
(L or R)]
| result (int8) |
ComparNoCase | Case insensitive comparison of two
strings | string1 (string) string2 (string) | result (int8) |
ComparNum | Compare the first n
characters of the two strings | string1 (string) string2 (string)
length (int16)
| result (int8) |
CompareNumNoCase | Caseless comparison of the
first n characters of the two strings | string1 (string) string2 (string)
length (int16)
| result (int8) |
Convert | Converts specified characters in a
string to designated replacement
characters | fromlist (string)
tolist (string)
expression
(string)
| result (string) |
Count | Count number of times a substring occurs
in a string | string (string)
substring (string)
| result (int32) |
Dcount | Count number of delimited fields in a
string | string (string)
delimiter (string)
| result (int32) |
DownCase | Change all uppercase letters in a
string to lowercase | string (string) | result
(string) |
DQuote | Enclose a string in double quotation
marks | string (string) | result (string) |
Field | Return 1 or more delimited substrings | string (string) delimiter (string)
occurrence (int32)
[number (int32)]
| result (string) |
Index | Find starting character position of
substring | string (string) substring (string) occurrence (int32) | result (int32) |
Left | Leftmost n characters of
string | string (string)
number (int32)
| result (string) |
Len | Length of string in characters | string (string) | result (int32) |
Num | Return 1 if string can be converted to a
number | string (string) | result (int8) |
PadString | Return the string padded with the
optional pad character and optional
length | string (string)
padlength (int32)
| result (string) |
Right | Rightmost n characters of
string | string (string)
number (int32)
| result (string) |
Soundex | Returns a string which identifies a set
of words that are (roughly)
phonetically alike based on the standard, open algorithm for SOUNDEX evaluation | string
(string) | result (string) |
Space | Return a string of N space
characters | length (int32) | result (string) |
Squote | Enclose a string in single quotation
marks | string (string) | result (string) |
Str | Repeat a string | string (string)
repeats (int32)
| result (string) |
StripWhiteSpace | Return the string after
stripping all whitespace from it | string (string) | result (string) |
Trim | Remove all leading and trailing spaces and
tabs plus reduce internal
occurrences to one | string (string) [stripchar (string)] [options (string)] | result (string) |
TrimB | Remove all trailing spaces and tabs | string (string) | result (string) |
TrimF | Remove all leading spaces and tabs | string (string) | result (string) |
Trim Leading Trailing | Returns a string with
leading and trailing whitespace removed | string (string) | result (string) |
Upcase | Change all lowercase letters in a string
to uppercase | string (string) | result
(string) |
Hint: true = 1
false = 0
Possible options for the Trim function are:
* L Removes leading occurrences of character.
* T Removes trailing occurrences of character.
* B Removes leading and trailing occurrences of character.
* R Removes leading and trailing occurrences of character, and reduces multiple occurrences to a single occurrence.
* A Removes all occurrences of character.
* F Removes leading spaces and tabs.
* E Removes trailing spaces and tabs.
* D Removes leading and trailing spaces and tabs, and reduces multiple spaces and tabs to single ones.
Top
| Next
Vector function
Previous | Bottom
The following function can be used within expressions to access an element in a vector column. The vector index starts at 0.
Name | Description | Arguments | Output |
ElementAt | Accesses an element of a vector | input column index (int) | element of vector |
This can be used as part of, or the whole of an expression. For example, an expression to add 1 to the third element of an
vector input column 'InLink.col1' would be:
ElementAt(InLink.col1, 2) + 1
Top
| Next
Type Conversion Functions
Previous | Bottom
The following table lists the functions available in the Type Conversion category (square brackets indicate an argument is
optional):
Name | Description | Arguments | Output |
DateToString | Return the string representation of
the given date | date
[format (string)]
| result (string) |
DecimalToDecimal | Returns the given decimal in
decimal representation with specified
precision and scale | decimal (decimal) [rtype (string)] [packedflag (int8)] | result (decimal) |
DecimalToDFloat | Returns the given decimal in
dfloat representation | number (decimal) ["fix_zero"] | result (dfloat) |
DecimalToString | Return the string representation
of the given decimal | number (decimal) ["fix_zero"] | result (string) |
DfloatToDecimal | Returns the given dfloat in
decimal representation | number (dfloat) [rtype (string)] | result (decimal) |
DfloatToStringNoExp | Returns the given dfloat in
its string representation with no exponent,
using the specified scale | number (dfloat) scale (string) | result (string) |
IsValid | Return whether the given string is valid
for the given type. Valid
types are "date", "decimal", "dfloat", "sfloat", "int8", "uint8", "int16",
"uint16", "int32", "uint32", "int64", "uint64", "raw", "string", "time", "timestamp".
"ustring" | type (string) format (string) | result
(int8) |
StringToDate | Returns a date from the given
string in the given format | date (string)
format (string)
| date |
StringToDecimal | Returns the given string in
decimal representation | string (string) [rtype (string)] | result (decimal) |
StringToRaw | Returns a string in raw
representation | string (string) | result
(raw) |
StringToTime | Returns a time representation of
the given string | string (string) [format (string)] | time |
StringToTimestamp | Returns a timestamp
representation of the given string | string (string) [format (string)] | timestamp |
TimestampToDate | Returns a date from the given
timestamp | timestamp | date |
TimestampToString | Return the string
representation of the given timestamp | timestamp [format (string)] | result (string) |
TimestampToTime | Returns the time from a given
timestamp | timestamp | time |
TimeToString | Return the string representation of
the given time | time [format (string)] | result
(string) |
StringToUstring | Returns a ustring from the given
string, optionally using the specified
map (otherwise uses project default) | string (string) [,mapname (string)] | result (ustring) |
UstringToString | Returns a string from the given
ustring, optionally using the specified
map (otherwise uses project default) | string(ustring)
[,mapname (string)]
| result (string) |
Rtype: The rtype argument is a string, and should contain one of the following:
*
ceil: Round the source field toward positive infinity. E.g, 1.4 -> 2, -1.6 -> -1.
*
floor: Round the source field toward negative infinity. E.g, 1.6 -> 1, -1.4 -> -2.
*
round_inf: Round or truncate the source field toward the nearest representable value, breaking ties by rounding
positive values toward positive infinity and negative values toward negative infinity. E.g, 1.4 -> 1, 1.5 -> 2, -1.4 -> -1,
-1.5 -> -2.
*
trunc_zero. Discard any fractional digits to the right of the rightmost fractional digit supported in the
destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the
destination is another decimal with a smaller scale, round or truncate to the scale size of the destination decimal. E.g, 1.6
-> 1, -1.6 -> -1.
The default is trunc_zero.
Format string: Date, Time, and Timestamp functions that take a format string (e.g., timetostring(time, stringformat))
need to have the date format specified. The format strings are described in Date and time formats.
Where your dates, times, or timestamps convert to or from ustrings, InfoSphere DataStage will pick this up automatically. In
these cases the separators in your format string (for example, `:' or `-') can themselves be Unicode characters.
fix_zero: By default decimal numbers comprising all zeros are treated as invalid. If the string fix_zero is specified
as a second argument, then all zero decimal values are regarded as valid.
Top
| Next
Type "casting" functions
Previous | Bottom
There is a special class of type conversion function to help you when performing mathematical calculations using numeric
fields. For example, if you have a calculation using an output column of type float derived from an input column of type
integer in a Parallel Transformer stage the result will be derived as an integer regardless of its float type. If you want a
non-integral result for a calculation using integral operands, you can use the following functions (which act in a similar
way as casting in C) to cast the integer operands into non-integral operands:
Name | Description | Arguments | Output |
AsDouble | Treat the given number as a double | number (number) | number (double) |
AsFloat | Treat the given number as a float | number (number) | number (float) |
AsInteger | Treat the given number as an
integer | number (number) | number (int) |
Top
Utility functions
Previous | Bottom
The following table lists the functions available in the Utility category (square brackets indicate an argument is optional):
Name | Description | Arguments | Output |
GetEnvironment | Returns the value of the given
environment variable | environment variable (string) | result (string) |
NextSKChain | Returns the value of the surrogate
key column for the
next record in the chain, or value for the newest
record | value (number) | result (int64) |
NextSurrogateKey | Returns the value of the next
surrogate key | None | result (int64) |
PrevSKChain | Returns the value of the surrogate
key column for the
previous record in the chain, or value for the first
record | value (number) | result (int64) |
Top