Using Neural Networks with Talend DI and ESB

Many times during Data Integration projects we have to analyse the data in order to come up with acceptance criteria for it. In a lot of cases this is pretty straightforward and can easily be written as simple rule-based logic. But in some situations it is not so cut and dried. In these situations a lot of people will write rule-of-thumb logic which isolates certain rows to be double-checked by a human. This works. It is time-consuming and requires human intervention, but it works. However, in a lot of those situations we can use Neural Networks to do that job for us.

In this tutorial I will be demonstrating how to use a Multilayer Perceptron Neural Network to learn Tic Tac Toe end game states. I have chosen this as it is an easy game to understand and the data set to learn is relatively small. I used a data set found here in this example. In order to implement the Neural Network, I am using a Java API from Neuroph. Neuroph is a lightweight Neural Network framework which allows you to make use of this powerful machine learning technique quickly and easily. I won't be going into too much detail explaining Multilayer Perceptrons in this tutorial, only explaining what is necessary to follow it. For information on Neural Networks in general, I recommend exploring the Neuroph site where there are tutorials using their Neuroph Studio.

So, let's start. First of all I will talk about the training data.

Training Data

For the training data for this tutorial I have made use of data provided by the University of California, Irvine's Center for Machine Learning and Intelligent Systems. You can find it here.  This data set holds all of the end game scenarios for the situation where X starts and X is the focus. For example, if X wins the result is POSITIVE, if X loses the result is NEGATIVE. Blank fields are represented by a "b". For example....

x,x,o,x,x,o,o,b,o,negative

The first thing we have to do before even thinking about our Neural Network is to make our data suitable for a Neural Network. For Neural Networks we need to standardise our data. In this situation it is reasonably simple since we only have a choice of up to 3 alternatives for each value. However, it can be a lot more complicated. Take a look here for a good explanation of this with examples. For this tutorial I chose to convert this data as follows....

b = 0
o = -1
x = 1
positive = 1 
negative = 0
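
As a rough illustration of this mapping, the sketch below converts one raw line of the data set into numbers. BoardEncoder is a hypothetical class written only for this example (the tutorial's real conversion logic lives in the TicTacToeUtils routine), and it assumes every line holds 9 squares followed by the result.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the tutorial's routines.
class BoardEncoder {

    // Maps one raw token to the numeric value listed in the mapping above.
    static double encode(String token) {
        switch (token.trim().toLowerCase()) {
            case "x":        return 1.0;
            case "o":        return -1.0;
            case "b":        return 0.0;
            case "positive": return 1.0;
            case "negative": return 0.0;
            default:
                throw new IllegalArgumentException("Unknown token: " + token);
        }
    }

    // Converts a full comma separated line (9 squares + result) into numbers.
    static double[] encodeRow(String csvLine) {
        String[] tokens = csvLine.split(",");
        double[] encoded = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            encoded[i] = encode(tokens[i]);
        }
        return encoded;
    }

    public static void main(String[] args) {
        double[] row = encodeRow("x,x,o,x,x,o,o,b,o,negative");
        // prints [1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 0.0, -1.0, 0.0]
        System.out.println(Arrays.toString(row));
    }
}
```

Throwing an exception for unknown tokens is a choice made for this sketch; the routine in the tutorial returns -9999 instead.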


TicTacToeUtils Routine

Since the Neural Network will be used with data in the unconverted format, I have built the logic for converting the data into a Talend Routine. This is used by both the job training the Neural Network and the service using the trained Neural Network. This routine is shown below; it is also included with the job and service at the bottom of this tutorial.....

package routines;

public class TicTacToeUtils {

    /**
     * Translates String values to int values to suit Neural Network requirements
     * @param data - A String value to be changed to an int
     * @return - The corresponding int value (-9999 if the value is not recognised)
     */
    public static int translateStringValueToNumber(String data) {

        int returnVal = -9999;

        data = data.trim();

        if (data.compareToIgnoreCase("X") == 0) {
            returnVal = 1;
        } else if (data.compareToIgnoreCase("O") == 0) {
            returnVal = -1;
        } else if (data.compareToIgnoreCase("B") == 0) {
            returnVal = 0;
        } else if (data.compareToIgnoreCase("POSITIVE") == 0) {
            returnVal = 1;
        } else if (data.compareToIgnoreCase("NEGATIVE") == 0) {
            returnVal = 0;
        }

        return returnVal;
    }

    /**
     * Translate from an int value to a String TicTacToe value "X", "O", "B" (blank)
     * @param data - an int value
     * @return - The corresponding String value
     */
    public static String translateNumberValueToString(int data) {

        String returnVal = "";

        if (data == 1) {
            returnVal = "X";
        } else if (data == -1) {
            returnVal = "O";
        } else if (data == 0) {
            returnVal = "B";
        }
        return returnVal;
    }

    /**
     * Translate the result value into a String value representing the result of the TicTacToe
     * game from player X's perspective.
     * @param data - The double response from the Neural Network
     * @return - The String result
     */
    public static String translateResultValueToString(double data) {

        String returnVal = "";

        //Round the network output to the nearest whole number (0 or 1)
        long tmpData = Math.round(data);

        if (tmpData == 1) {
            returnVal = "POSITIVE";
        } else if (tmpData == 0) {
            returnVal = "NEGATIVE";
        }
        return returnVal;
    }

    /**
     * A method for retrieving a section of a String according to its position. Used
     * to extract TicTacToe board data from value supplied to REST service
     * @param data - The complete TicTacToe board in a String
     * @param position - an int representing the section of the String data to be returned
     * @return - A String section of the String data supplied
     */
    public static String getStringAtPosition(String data, int position){
        String[] dataArray = data.split(",");
        String returnVal = "";
        //Assumed guard: return an empty String if the position is outside the array
        if (position >= 0 && position < dataArray.length) {
            returnVal = dataArray[position].trim();
        }
        return returnVal;
    }
}

NeuralNetworkUtils Routine

In order to use the Neuroph API in a Talend job, I have built some methods to simplify the process. This is by no means the "perfect solution" for all Talend jobs, but it suits the requirements for this one. The routine I put together is shown below; it is also included with the job and service at the bottom of this tutorial.....

package routines;

import java.util.ArrayList;
import java.util.Arrays;
import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;
import org.neuroph.core.events.LearningEvent;
import org.neuroph.core.events.LearningEventListener;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.BackPropagation;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.NeuronProperties;
import org.neuroph.util.TransferFunctionType;

/**
 * A class making use of the Neuroph API. The methods here have been
 * written to demonstrate how this API can be used with Talend to enable Neural Network functionality in a Talend job or
 * Service.
 */
public class NeuralNetworkUtils {

    //Constants for use with TransferFunctionType - currently only SIGMOID, but can be extended
    public static final Enum SIGMOID = TransferFunctionType.SIGMOID;

    //Private Static variables shared by the Static methods
    private static DataSet trainingSet;
    private static NeuralNetwork loadedMlPerceptron;
    private static MultiLayerPerceptron myMlPerceptron;
    private static int numOfIterations;

    /**
     * Returns the number of iterations that took place training the
     * Neural Network
     * @return - an int representing the number of iterations
     */
    public static int getNumOfIterations() {
        return numOfIterations;
    }

    /**
     * Creates a new training data set
     * @param dataColumns - an int representing the number of input columns
     * @param resultColumns - an int representing the number of expected result columns
     */
    public static void createTrainingSet(int dataColumns, int resultColumns) {
        trainingSet = new DataSet(dataColumns, resultColumns);
    }

    /**
     * Adds data to the training data set created using "createTrainingSet" method
     * @param dataColumns - a double array containing one row of input data
     * @param resultColumns - a double array containing one row of expected result data
     */
    public static void addTrainingData(double[] dataColumns,
            double[] resultColumns) {
        trainingSet.addRow(dataColumns, resultColumns);
    }

    /**
     * A method which creates a Multi-layer Perceptron Neural Network using backpropagation with momentum
     * @param learnRate - a double which sets the learning rate for the network (0 < learnRate < 1)
     * @param momentum - a double which sets the momentum for the network (0 < momentum < 1)
     * @param maxError - a double representing the maximum error permitted for the network to be considered trained
     * @param maxIterations - an int representing the max number of iterations while training
     * @param transferFunctionType - Set to one of the constants (see constants)
     * @param neuronsInLayers - The number of neurons in layers as int values separated by "," (9,20,1)
     */
    public static void trainMultiLayerPerceptronWithMomentumBackProp(
            double learnRate, double momentum, double maxError,
            int maxIterations, Enum transferFunctionType,
            int... neuronsInLayers) {

        //Reset numOfIterations variable
        numOfIterations = 0;

        //Set the NeuronProperties
        NeuronProperties neuronProperties = new NeuronProperties();
        neuronProperties.setProperty("useBias", true);
        neuronProperties.setProperty("transferFunction", transferFunctionType);

        //Create the neuron layers
        ArrayList<Integer> neuronsInLayersVector = new ArrayList<>();
        for (int i = 0; i < neuronsInLayers.length; i++) {
            neuronsInLayersVector.add(Integer.valueOf(neuronsInLayers[i]));
        }

        // create multi layer perceptron
        myMlPerceptron = new MultiLayerPerceptron(neuronsInLayersVector,
                neuronProperties);

        // Set learning rules using the parameters passed in
        MomentumBackpropagation mbp = new MomentumBackpropagation();
        mbp.setLearningRate(learnRate);
        mbp.setMomentum(momentum);
        mbp.setMaxError(maxError);
        mbp.setMaxIterations(maxIterations);

        //Learning event listener to keep track of iterations
        mbp.addListener(new LearningEventListener() {

            @Override
            public void handleLearningEvent(LearningEvent arg0) {
                //The event source is the learning rule that raised the event
                BackPropagation bp = ((org.neuroph.nnet.learning.BackPropagation) arg0
                        .getSource());
                numOfIterations = bp.getCurrentIteration();
            }
        });

        // learn using the training set
        myMlPerceptron.learn(trainingSet, mbp);

        // test neural network
        testNeuralNetwork(myMlPerceptron, trainingSet);

        //Used for outputting neuron configuration
        String neurons = "";
        for(int x=0; x<neuronsInLayers.length;x++){
            neurons = neurons+neuronsInLayers[x]+",";
        }
        neurons = neurons.substring(0, neurons.length()-1);

        //Print the training parameters and result
        System.out.println("LearnRate = " + learnRate + "| Momentum = "
                + momentum + "|Neurons = " + neurons+ "| Iterations = " + numOfIterations);
    }

    /**
     * Save the neural network
     * @param savePath - A String path
     */
    public static void saveMultiLayerPerceptron( String savePath){
        // save trained neural network
        myMlPerceptron.save(savePath);
    }

    /**
     * Load a previously trained neural network
     * @param path - A String path
     */
    public static void loadNeuralNetwork(String path) {
        // load saved neural network
        loadedMlPerceptron = NeuralNetwork.createFromFile(path);
    }

    /**
     * A method to use the trained neural network to calculate a result with data supplied in the format of a double array.
     * @param data - A double array containing the data to calculate
     * @return - a double array containing the output of the network
     */
    public static double[] calcData(double[] data) {
        //Feed the input into the network and calculate before reading the output
        loadedMlPerceptron.setInput(data);
        loadedMlPerceptron.calculate();
        return loadedMlPerceptron.getOutput();
    }

    /**
     * A method to test the result of training the neural network using a dataset
     * @param nnet - the NeuralNetwork object
     * @param testSet - the DataSet object
     */
    private static void testNeuralNetwork(NeuralNetwork nnet, DataSet testSet) {

        for (DataSetRow dataRow : testSet.getRows()) {
            //Run each row through the network before reading its output
            nnet.setInput(dataRow.getInput());
            nnet.calculate();
            double[] networkOutput = nnet.getOutput();

            System.out.print("Input: " + Arrays.toString(dataRow.getInput()));
            System.out.print(" Expected Output: " + Arrays.toString(dataRow.getDesiredOutput()));
            System.out.println(" Output: " + Arrays.toString(networkOutput));
        }
    }
}


Since this routine makes use of third party APIs, we need to link the related Jars to the Talend routine. The API can be downloaded from

To link the Jars to the Talend Routine do the following....

1) Right click on the routine and select "Edit Routine Libraries"
2) Click "New"
3) Select "Browse a library file"
4) Click "Browse" and search for the required Jars

For this routine, the required Jars are...


The TrainNeuralNetworkForTicTacToe Job

This job is used to train the Neural Network. It is a pretty straightforward Talend job and can be seen below....

There are two tLogRow components which are deactivated in the screenshot above. It is sometimes quite useful to add these and deactivate them so that you don't have to make major changes to your job in order to simply debug what goes in and comes out of a component. I use them a lot with tMap and tXMLMap components.

Context Variables

For this job I only used two context variables: one for the training set file and one for the serialized neural network object. These can be seen below. If you download this job you will need to change these to suit your system.

Now I will explain each component in this job.

1) "Data" (tFileInputDelimited)

This component is used to read the data file (downloaded from here). You can see the configuration of the component below....

2) "Convert to suitable format" (tMap)

This component is used to simply convert the String input type of the column data to an Integer type. This can be seen below...

In order to carry out the conversion of the data, we are using the "translateStringValueToNumber" method from the TicTacToeUtils routine that is shown above. The code used is shown below. It is exactly the same for each column, with just a change in the column name supplied.



3) "Train Network" (tJavaFlex)

This component is where the magic happens. Since it is a tJavaFlex and only has 3 Java sections (Start Code, Main Code and End Code) I will not post a screenshot here. Instead I will go through each of the Java sections and explain what is happening.

Start Code

Below is the code in the Start Code section.

//Create a training set object
routines.NeuralNetworkUtils.createTrainingSet(9, 1);

Here we are creating a training set. This is an object for storing the training data, which is made up of 9 input columns and 1 result column. The configuration of the training set depends on the data you will be working with. In this TicTacToe tutorial we have 9 squares that make up the 3x3 board state and 1 result column which returns whether a positive or negative result has been obtained by the X player.
The Start Code section is only fired once at the beginning when the component is initialised.

Main Code

Below is the code in the Main Code section.

//Add data to the training set object
double[] inputData = new double[9];
double[] resultData = new double[1];

inputData[0] = row10.a1;
inputData[1] = row10.a2;
inputData[2] = row10.a3;
inputData[3] = row10.b1;
inputData[4] = row10.b2;
inputData[5] = row10.b3;
inputData[6] = row10.c1;
inputData[7] = row10.c2;
inputData[8] = row10.c3;

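resultData[0] = row10.result;

//Add this row to the training set
routines.NeuralNetworkUtils.addTrainingData(inputData, resultData);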


Here we are creating two double arrays. The inputData array is made up of 9 elements (1 for each of the squares in a TicTacToe board) and the resultData is made up of 1 element. This is then added to the training set using the "addTrainingData" method. The Main Code section is fired for every row passed to the component. 
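
To make the firing order of the three sections concrete, here is a plain Java stand-in for the flow. It is only a sketch: JavaFlexLifecycleSketch and collectRows are hypothetical names, the rows are made up, and the training call is replaced by a simple row count.

```java
import java.util.ArrayList;
import java.util.List;

// Plain Java stand-in for the three tJavaFlex sections; all names are hypothetical.
class JavaFlexLifecycleSketch {

    static int collectRows(double[][] incomingRows) {
        // Start Code: runs once, before the first row arrives.
        List<double[]> trainingRows = new ArrayList<>();

        // Main Code: runs once for every incoming row.
        for (double[] row : incomingRows) {
            trainingRows.add(row.clone());
        }

        // End Code: runs once, after the last row. Training would start here.
        return trainingRows.size();
    }

    public static void main(String[] args) {
        double[][] rows = {
            {1, 1, -1, 1, 1, -1, -1, 0, -1},
            {1, -1, 1, -1, -1, 1, 1, 1, -1}
        };
        System.out.println("Rows collected before training: " + collectRows(rows));
    }
}
```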

End Code

Below is the code in the End Code section.

//Create the Neural Network
routines.NeuralNetworkUtils.trainMultiLayerPerceptronWithMomentumBackProp(0.5, 0.7, 0.000001, 1000, routines.NeuralNetworkUtils.SIGMOID, 9,26,1);

//Save trained Neural Network - The filename and path may need changing in your environment

Here we use the "trainMultiLayerPerceptronWithMomentumBackProp" method to create a Neural Network and initiate the training. The important things here are the parameters that have been used. I will explain those below....

learnRate = 0.5
The learning rate applies a greater or lesser adjustment to the old weight based on the new result. The lower the value, the slower the learning that takes place. However, the greater the value, the more likely it is that the wrong thing will be learnt when there is a great variance in the input data. This value needs to be tweaked until you hit the sweet spot. For this data I have found that 0.5 is a good value.

momentum = 0.7
The momentum simply adds a fraction of the previous weight update to the current one. The reason for this is that sometimes the functions being calculated are not smoothly moving in a constant direction or gradient. Imagine a ball rolling down a hill. During its descent, it might hit the occasional bump that hinders its progress. Momentum allows the ball to ride over the bump and continue rolling down the hill. Both learning rate and momentum are explained quite well here.

maxError = 0.000001
The max error is the maximum total net error between the actual and desired outputs we will allow over a training iteration before the network is considered trained. Since this data should be easily trained, I have set this to quite a low level of tolerance for errors. Usually this value will be much higher.

maxIterations = 1000
The total number of iterations before we give up training. This is low compared to other environments you might wish to train.

transferFunctionType = routines.NeuralNetworkUtils.SIGMOID
Transfer function choice is a big question in Neural Networks. Without going into any detail, the choice here was somewhat arbitrary for this problem. For your Neural Networks you will want to experiment and research the function you use. However, for simple problems SIGMOID is a reasonable one to start with.

neuronsInLayers = 9,26,1
The number of neurons at each level in the network. In this Neural Network I tried a few combinations and found that 26 hidden neurons worked best. The number of input neurons (9) is dictated by the number of input columns and the number of output neurons (1) is dictated by the expected result.
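
As a rough illustration of how learnRate and momentum interact, the sketch below applies the usual momentum update rule to a single weight on a toy error function f(w) = (w - 3)^2. This is not Neuroph's internal code: MomentumSketch and minimise are made-up names, the error function is invented for the example, and only the values 0.5 and 0.7 come from the job above.

```java
// Illustrative only: one weight trained on the toy error function f(w) = (w - 3)^2.
class MomentumSketch {

    static double minimise(double learnRate, double momentum, int iterations) {
        double weight = 0.0;
        double previousUpdate = 0.0;
        for (int i = 0; i < iterations; i++) {
            // Slope of the error function at the current weight.
            double gradient = 2.0 * (weight - 3.0);
            // Learning rate step plus a fraction of the previous update (the momentum term).
            double update = -learnRate * gradient + momentum * previousUpdate;
            weight += update;
            previousUpdate = update;
        }
        return weight;
    }

    public static void main(String[] args) {
        // The weight overshoots at first, then settles close to the minimum at 3.0.
        System.out.println(minimise(0.5, 0.7, 200));
    }
}
```

In a real network the same kind of update is applied to every weight, with the gradient coming from backpropagation rather than from a fixed formula.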

The last thing that is done in the End Code section is to save the Neural Network to a file. Once trained (so long as your data doesn't change all that much) the Neural Network can be saved and used in jobs/services making use of the same sort of data.


The TicTacToeStateScore Service

To show how to use the trained Neural Network I decided to use a REST Service example. I could have used a DI job, but felt that a service might open up some ideas as to how Neural Networks can be used in realtime environments as well as batch. Also, a REST Service is pretty simple and quick to show this working. The Service can be seen below....

Context Variables

Below are the context variables created for this Service. In this Service we are using just 1 for the path to the Neural Network file.

1) "tRESTRequest_1" (tRESTRequest)

This component is where we configure the REST Service. The screenshot below shows how this has been configured....

We are using the "GET" verb and a relative path for the endpoint. When you run this through the Studio it will use the port that is specified for your REST Service testing. When using it in Apache Karaf it will use Karaf's default port.

In this example we make use of a REST Service Query Parameter. This is configured in Talend in a way which is not immediately obvious. The configuration for this is shown in the screenshot below....

First we open the Output Flow schema tool by clicking on the button circled in red.
Once the window appears we configure a column called "state" as a String and add "query" to the comment box. This is important. If this is not done, you will not be able to use it as a query parameter. Now that this is set, we can call this Service with a variation on the following URL.....

http://{ip address}:{port}/statescore?state=X,O,O,O,X,X,X,O,X


2) "Get data from state" (tMap)

This component is used simply to extract each of the 9 state positions from the "state" query parameter that is supplied in the URL, and output them to the next component as individual Strings. This can be seen below....

To extract the values we are using a method in the TicTacToeUtils routine called "getStringAtPosition". This extracts the section of the String indicated by the second parameter which is used to identify position. The use of this method can be seen below....
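
For readers without the screenshot, the effect of the tMap can be sketched in plain Java. StateSplitter is a hypothetical class for illustration; it assumes positions are zero-based (as in "getStringAtPosition") and that position 0 feeds column a1, position 1 feeds a2, and so on.

```java
// Hypothetical stand-in for what the tMap produces: one String per board square.
class StateSplitter {

    // Column names used in the tutorial's code, assumed to be in board order.
    static final String[] COLUMNS = {"a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"};

    static String[] split(String state) {
        String[] cells = state.split(",");
        if (cells.length != COLUMNS.length) {
            throw new IllegalArgumentException("Expected 9 squares but got " + cells.length);
        }
        for (int i = 0; i < cells.length; i++) {
            cells[i] = cells[i].trim();
        }
        return cells;
    }

    public static void main(String[] args) {
        String[] cells = split("X,O,O,O,X,X,X,O,X");
        for (int i = 0; i < cells.length; i++) {
            System.out.println(COLUMNS[i] + " = " + cells[i]);
        }
    }
}
```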



3) "tJavaFlex_1" (tJavaFlex)

Like the last tJavaFlex that was used, this is where the magic happens. Also like the last one, I will not post screenshots of this, I will simply go through each of the code sections. In this tJavaFlex we only use the Start Code and Main Code sections.

Start Code

This is what is used in the Start Code section.

//Load Neural Network

In this section we simply load the Neural Network that we want to use. This was the Neural Network trained in the last job.

Main Code

This is what is used in the Main Code section.

//Convert state String values to numbers suitable for a Neural Network
double[] inputData = new double[9];

String a1 = out1.a1;
String a2 = out1.a2;
String a3 = out1.a3;
String b1 = out1.b1;
String b2 = out1.b2;
String b3 = out1.b3;
String c1 = out1.c1;
String c2 = out1.c2;
String c3 = out1.c3;

inputData[0] = routines.TicTacToeUtils.translateStringValueToNumber(a1);
inputData[1] = routines.TicTacToeUtils.translateStringValueToNumber(a2);
inputData[2] = routines.TicTacToeUtils.translateStringValueToNumber(a3);
inputData[3] = routines.TicTacToeUtils.translateStringValueToNumber(b1);
inputData[4] = routines.TicTacToeUtils.translateStringValueToNumber(b2);
inputData[5] = routines.TicTacToeUtils.translateStringValueToNumber(b3);
inputData[6] = routines.TicTacToeUtils.translateStringValueToNumber(c1);
inputData[7] = routines.TicTacToeUtils.translateStringValueToNumber(c2);
inputData[8] = routines.TicTacToeUtils.translateStringValueToNumber(c3);

//Calculate result using the previously trained Neural Network
double[] output = routines.NeuralNetworkUtils.calcData(inputData);

System.out.println("Actual Result:"+ routines.TicTacToeUtils.translateResultValueToString(output[0]));

//Format state for Sys out
String tictactoe = a1+","+a2+","+a3+"\n"+b1+","+b2+","+b3+"\n"+c1+","+c2+","+c3+"\n";


//Pass result and board state to be formatted for the XML output
row3.result = routines.TicTacToeUtils.translateResultValueToString(output[0]);
row3.tictactoe = tictactoe;

In this section we create a double array called "inputData" to hold our state values.
We then use the "translateStringValueToNumber" method from the TicTacToeUtils routine to convert the String values to their corresponding numeric values.
We then use the "calcData" method to run that data through the trained Neural Network. This returns a double array with the result.
After some "System.out" calls to show what is happening in the output window, we pass the result (converted to a String using the "translateResultValueToString" method from TicTacToeUtils) and the tictactoe board state on to the next component.
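
The conversion from the network's double output to a String comes down to rounding. The sketch below mirrors what "translateResultValueToString" does and includes a standard sigmoid function to show why the output falls between 0 and 1. ResultThresholdSketch is a hypothetical class for illustration, and 0.98 is a made-up output value.

```java
// Hypothetical illustration; the tutorial's own logic lives in TicTacToeUtils.
class ResultThresholdSketch {

    // Standard logistic (sigmoid) function: maps any input to a value between 0 and 1.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Mirrors translateResultValueToString: round to the nearest whole number, then map it.
    static String label(double networkOutput) {
        long rounded = Math.round(networkOutput);
        if (rounded == 1) {
            return "POSITIVE";
        } else if (rounded == 0) {
            return "NEGATIVE";
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(label(0.0073486272363958655)); // NEGATIVE
        System.out.println(label(0.98));                  // POSITIVE
        System.out.println(label(sigmoid(0.0)));          // 0.5 rounds up, so POSITIVE
    }
}
```

In effect, any output of 0.5 or above is reported as POSITIVE and anything below as NEGATIVE.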

4) "Format the XML output" (tXMLMap)

This component is used to format the response into an XML output. It is really very basic and the configuration can be seen in the screenshot below.....

The reason we wrap the "tictactoes_state" element value with "<![CDATA[" and "]]>" is so that the formatting carried out in the previous component is shown in the web browser (it works in some browsers, not in others). This isn't terribly important, but it allows you to easily see the board state as it would be written on a piece of paper.

The output to the browser looks like below....

<?xml version="1.0" encoding="UTF-8"?>


5) "tRESTResponse_1" (tRESTResponse)

This component simply allows us to return the XML to the browser. REST Services can be (and usually are) a lot more complicated than this one. I have chosen to put together a bare bones REST Service in this case and it will not handle incorrect formats being supplied as the "state". As such this component is simply configured to return a 200 status and the XML. The config can be seen below...
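
Since the Service does no validation itself, one option is to check the "state" value before it reaches the Neural Network. The sketch below is a hypothetical helper (StateValidator is not part of the downloadable Service) and only applies necessary checks: nine squares, each X, O or B, and mark counts consistent with X moving first. Passing these checks does not prove that a board is a genuine end game state.

```java
// Hypothetical pre-check for the "state" query parameter; not part of the tutorial's Service.
class StateValidator {

    static boolean looksLegal(String state) {
        if (state == null) {
            return false;
        }
        // The -1 limit keeps trailing empty squares so they are counted and rejected.
        String[] cells = state.split(",", -1);
        if (cells.length != 9) {
            return false;
        }
        int xCount = 0;
        int oCount = 0;
        for (String cell : cells) {
            String value = cell.trim().toUpperCase();
            if (value.equals("X")) {
                xCount++;
            } else if (value.equals("O")) {
                oCount++;
            } else if (!value.equals("B")) {
                return false; // anything other than X, O or B is malformed
            }
        }
        // X moves first, so X has either the same number of marks as O or one more.
        return xCount == oCount || xCount == oCount + 1;
    }

    public static void main(String[] args) {
        System.out.println(looksLegal("X,O,O,O,X,X,X,O,X")); // true
        System.out.println(looksLegal("X,X,X,X,X,X,X,X,X")); // false
    }
}
```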


Running the TrainNeuralNetworkForTicTacToe Job

To run this job simply make sure the source file is downloaded and in the correct location (configured in the context variables), then click Run. If this runs successfully, you should see something like the following in the output window....

Input: [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0] Expected Output: [0.0] Output: [0.0073486272363958655]
Input: [1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0, -1.0] Expected Output: [0.0] Output: [0.0010815233039514665]
Input: [-1.0, 1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0] Expected Output: [0.0] Output: [0.002757025113506054]
Input: [-1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0] Expected Output: [0.0] Output: [0.0022573148740257934]
Input: [-1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0] Expected Output: [0.0] Output: [0.001865857943502293]
Input: [-1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0] Expected Output: [0.0] Output: [0.003494778673600675]
Input: [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0] Expected Output: [0.0] Output: [0.020374120565192142]
Input: [-1.0, 1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] Expected Output: [0.0] Output: [0.008408965104047168]
Input: [-1.0, -1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0] Expected Output: [0.0] Output: [0.022969116290342522]
LearnRate = 0.5| Momentum = 0.7|Neurons = 9,26,1| Iterations = 98
[statistics] disconnected
Job TrainNeuralNetworkForTicTacToe ended at 17:22 01/06/2016. [exit code=0]


Running the TicTacToeStateScore Service

To run this Service simply make sure the path to the Neural Network file is set, that it has been trained, then click Run. 
Once the Service is started, you will see a message like below in the output window....

Starting job TicTacToeStateScore at 17:26 01/06/2016.

[statistics] connecting to socket on port 3652
[statistics] connected
Jun 01, 2016 5:26:26 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be

In order to work out how to call this Service from your web browser, look at the last line I have copied above. That tells you the endpoint you need to use, minus the state query parameter. Be aware that the IP address above is just for localhost. If you want to use the service from another computer on your network, you will need to find the IP address of the machine the service is running on. To call the above Service, the following endpoint should be used ....,X,O,X,O,X,X,O,O

Remember that the "state" should be changed according to whatever state you want to assess. Since only legal states were trained, you can only get reliable results from legal states. The above call should result in the following XML response....

<?xml version="1.0" encoding="UTF-8"?>


A copy of the completed tutorial can be found here. You will also need the Neuroph Jars which can be downloaded here (we are using Neuroph 2.92 in this tutorial). The TicTacToe data can be downloaded here. This tutorial was built using Talend ESB 6.1.1 but can be imported into subsequent versions. It cannot be imported into earlier versions, so you will either need to upgrade or recreate it following the tutorial. You will need to set the Context variables according to your system before running it.

