Wednesday, April 24, 2019

TIBCO BusinessWorks (BW) 5.x Best Practices - Part 1 - Processing Big Data sets

This is Part 1 of a series on best practices for the TIBCO BusinessWorks (BW) 5.x platform. Later posts in the series will enumerate the practices to follow when developing and deploying optimized BusinessWorks engines. To check whether your BW 5.x projects or EAR files adhere to these best practices and are optimized, you can run a code review of your BW 5.x projects in our free online cloud version at BW5 Code Scanner.

As an integration platform / ESB, TIBCO BW 5.x is typically used to retrieve large sets of data and process them according to business logic. These large objects can consume a significant amount of memory, and performance degrades quadratically as the number of records grows, because the lookup time for each record increases as the data tree is traversed from the start element to the current node. The solution for processing large data sets without excessive memory usage or performance degradation is to process the data in chunks or subsets.
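To make the quadratic growth concrete, here is a rough cost model in plain Java (not BW code). It assumes, as a simplification, that reaching record i costs i node visits because the tree is walked from the first record each time. The class and method names are hypothetical and exist only for this illustration.

```java
final class TraversalCost {
    /** Node visits when each record is reached by walking from the first one: 1 + 2 + ... + n. */
    static long visits(long n) {
        return n * (n + 1) / 2;
    }

    /** Node visits when the same n records are handled in chunks of at most chunkSize records. */
    static long visitsChunked(long n, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long fullChunks = n / chunkSize;   // chunks that are completely filled
        long remainder = n % chunkSize;    // records left over in the final, shorter chunk
        return fullChunks * visits(chunkSize) + visits(remainder);
    }
}
```

Under this model, 100,000 records processed as one tree cost 5,000,050,000 visits, while the same records in chunks of 1,000 cost 50,050,000 visits, roughly a hundredfold reduction. Real engine behavior will differ, but the shape of the curve is the point.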
Fetching data in subsets is recommended to reduce memory usage.
  • To process a database table with a large number of records, or a query that returns a large result set, use the Process in Subsets option of the JDBC Query activity. In the JDBC Query activity, open the Advanced tab and select the "Process in Subsets" checkbox. When this is checked, BW processes the result set in small batches instead of processing the entire result set at once. Enabling Process in Subsets adds a subsetSize input element, where you specify the size of each batch, and a lastSubset output element, which BW automatically sets to true when the last batch is being processed. To retrieve the subsets, place the activity in a Repeat Until True loop group that iterates until the entire result set is processed. To exit the loop when the entire result set has been consumed, set the loop condition to: $JDBCQuery/resultSet/lastSubset = "true" (assuming the activity is named JDBCQuery).
  • To process a flat file with a large number of records, such as a CSV file, create the required Data Format for the file and, in the Parse Data activity, select File as the input type. In the Configuration tab, select the Manually Specify Start Record checkbox. Set the noOfRecords input field to the batch size you want for each iteration of the loop. Create a "Repeat Until True" group around the Parse Data activity and set the loop condition so that the loop exits when the BW engine sets the EOF output field of the Parse Data activity to true: string($Parse-Data/Output/EOF) = string(true()) (assuming the default activity name, Parse Data). Each iteration of the group then provides one chunk of records for processing.
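The loop pattern behind both options can be sketched in plain Java. This is an analogy, not BW code: the `eof` flag plays the role of the lastSubset and EOF output fields, the `while` loop plays the role of the Repeat Until True group, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ChunkedReader {
    /** Reads up to batchSize lines; returns fewer (possibly none) once the input is exhausted. */
    static List<String> readBatch(BufferedReader in, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        try {
            String line;
            while (batch.size() < batchSize && (line = in.readLine()) != null) {
                batch.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batch;
    }

    /** Processes the input batch by batch and returns the number of non-empty batches. */
    static int processInBatches(BufferedReader in, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        boolean eof = false;
        while (!eof) {                        // repeat until the end-of-input flag is true
            List<String> batch = readBatch(in, batchSize);
            eof = batch.size() < batchSize;   // a short batch means the input is exhausted
            if (!batch.isEmpty()) {
                batches++;
                // Business logic for this batch goes here; at most batchSize
                // records are held in memory at any time.
            }
        }
        return batches;
    }
}
```

For example, wrapping a seven-line input in a `BufferedReader` and calling `processInBatches(reader, 3)` handles batches of three, three, and one record. The batch size is the main tuning knob: larger batches mean fewer loop iterations but more memory per iteration.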

When processing large data sets, developers should follow these mapping guidelines.
  • Push complex mapping and list processing functions to the mapper (XSLT engine) for best performance.
  • Use a for-each statement in the mapper instead of an Iterate group.
  • Use the render-xml function instead of the Render XML activity.
  • Use the Mapper activity sparingly, and only when the mapping result is needed downstream.
  • Use local variables while processing large XML trees. At runtime, the mapper activity retrieves data by evaluating XPath expressions and traverses the input tree to access matching data. If the tree is large and the mappings define many lookups of related or identical data, using variables to store that data can improve performance significantly.
  • Use a canonical data model and give similar elements the same name to make for-each mapping easier.
  • Use job shared variables for data that is used across multiple sub-process instances. Mapping large data objects to the start and end activities of sub-processes creates multiple copies of those objects in memory and degrades performance.
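The local-variable guideline above can be illustrated outside BW with the standard Java XPath API. This is a minimal sketch, not mapper code: the order document layout (header/currency, lines/line/amount) and the class name are hypothetical, chosen only to show a repeated lookup being evaluated once and reused.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CachedLookup {
    /** Returns one "amount currency" line per order line, looking up the currency only once. */
    static String labelLines(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node order = (Node) xpath.evaluate(
                    "/order", new InputSource(new StringReader(xml)), XPathConstants.NODE);

            // Evaluate the shared value once and keep it in a local variable,
            // instead of repeating the header lookup for every line.
            String currency = xpath.evaluate("header/currency", order);

            NodeList lines = (NodeList) xpath.evaluate("lines/line", order, XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < lines.getLength(); i++) {
                // Relative path: evaluated against the current line node only.
                String amount = xpath.evaluate("amount", lines.item(i));
                out.append(amount).append(' ').append(currency).append('\n');
            }
            return out.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }
}
```

The saving grows with the size of the tree and the number of lines, since each repeated absolute lookup would otherwise walk the tree again. In the BW mapper, the equivalent step is to store the looked-up value in a variable and reference that variable inside the for-each.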

For memory-hungry processes, enable the memory saving mode of BW. To enable it for specific process instances, set the “EnableMemorySavingMode.<processname>” property to true. To enable it for all process instances, set the “EnableMemorySavingMode” property to true.

In Summary
When processing large data sets, retrieve data in chunks, use XSLT to process the data, and use variables to improve performance. For memory-hungry processes, set the EnableMemorySavingMode property to true.
