Quick Tip: Row Multiplication

Sometimes we need to multiply rows based on a column in that row, multiple columns in that row, or by some other factor dependent on the row. In Talend there is a really nice mechanism that allows this to be done quite simply....but it is not very well known about. This QuickTip shows how to use this functionality with a very basic example.

Lets say we have some data which has a numeric column indicating how many times that row should be multiplied.  For example....

1|1y0jujayeB|3
2|cPXkYSxhEG|5
3|Dzb0e8Xw8B|4
4|PAMKnXNjYY|1

.....and the end column indicates how many times that row should be multiplied. So for the above, we should get the below.....

1|1y0jujayeB|3
1|1y0jujayeB|3
1|1y0jujayeB|3
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
3|Dzb0e8Xw8B|4
3|Dzb0e8Xw8B|4
3|Dzb0e8Xw8B|4
3|Dzb0e8Xw8B|4
4|PAMKnXNjYY|1

How can this be done?

The below screenshot shows you a basic job which enables this functionality.

The Random Data component simply produces some random data in the format "id" (an integer), "someText" (a String), and "repeatRow" (an integer). The first thing we need to do is talk about the tMap component.

1) "tMap_1" (tMap)

Below you can see the config of this component. I will talk about each of the important points which I identified by the coloured circles in the screenshot.

This component has a single lookup which returns an "id". We will talk about this component next, but it must be noted that it will supply the "id" field. It can supply whatever you wish, but I have just chosen to get it to return the "id" supplied by the main row.

The lookup configuration is the important thing here. Notice the red oval around "Reload at each row". This is important and must be set. This causes the lookup to be re-run on every row.  

The blue oval around "Inner join" isn't actually terribly important for variations on this. But I'll be honest, I circled it so felt I had to say something about it :-) 

The orange oval around the two "globalMap Key" fields is very important. When you select "Reload at each row", the "globalMap" section reveals itself. Normally, this will be used to supply a value to a globalMap variable, then for that variable to be used in a SQL query which will be fired. Then the result of that will be passed into the tMap. In this case, we are going to use a bit of Java to generate some data. What we are doing here is supplying the "repeatRow" and "id" values for the row to the globalMap HashMap. This will be used in the "Row Multiplier" tJavaFlex component.

 

2) "Row Multiplier" (tJavaFlex)

This component has 3 sections; Start Code, Main Code and End Code. We will be using this component to perform a loop to be iterated over the number of times supplied by the "repeatRow" value. The 3 code sections are described below....


Start Code

// start part of your Java code

for(int i = 0; i<((Integer)globalMap.get("repeatRow")).intValue(); i++){

This section defines the For loop. This is based on the value from the "repeatRow" column supplied to the globalMap.

Main Code
// here is the main part of the component,
// a piece of code executed in the row
// loop
row2.id = ((Integer)globalMap.get("id")).intValue();

This section simply supplies the "id" to the "id" column (this component has been configured to output a single "id" column). Since this is inside the For loop, it will be repeated for as many times as the loop is fired. Here is where you might want to carry out any logic required in your row multiplication.

End Code
// end of the component, outside/closing the loop
}

The For loop is closed here.

 

Once this has been configured, you can run your job and see that the rows have been multiplied according to your required logic.

 

Type of content: