This short tutorial was inspired by a question I had from a customer. They were trying to insert rows into a "log" table at the beginning of a job and update that same row at the end of the job. The problem was that they were using an auto generated primary key in a MySQL database. This in itself isn't a bad thing to do, in fact it could be argued that it is the right thing to do (control sequences/ids at the database end), but they could not work out how to get hold of that generated "id" without running the risk of another job interacting with the table causing errors. They had considered the following methods....
1) Querying the table for the "latest id" when they wanted to update. But this could be broken if another process has created another "id" in that table.
2) Using another key field or a composite key for which they had total control. While this was an option, it rendered the primary key pointless and was still open to potential errors in the future if the way the table was being used changed.
Both of those methods are reasonable, but are not terribly safe if other processes are using the table OR if the use of the table evolves.
Talend Data Integration actually provides a component for retrieving this "id" value. It is called the "tMySqlLastInsertId" component. This component simply returns an "id" field. Now this is great and would have solved their problem, but it is not necessary and this task can be achieved by using just the "tMySqlOutput" component in a far more elegant way. This tutuorial shows how this can be done and demonstrates the use of the "Additional Columns" section of the "Advanced Settings" of database components.
The job shown below is a simple job that simply inserts a value in a MySQL database and then updates it.
The GetLastInsertIDValue Job
Below we can see a screenshot of this Job. I will not go into too much detail as to how to build it, but will focus on the important components. A copy of this job can be downloaded via a link at the bottom of the page....
1) "lastidvaluetest" (tMySqlConnection)
This component is imply used to share a connection to the database rather than permitting each MySQL component to make its own connection.
2+7) "tFixedFlowInput_1 and tFixedFlowInput_1" (tFixedFlowInput)
These components are used to provide fixed values. A screenshot of tFixedFlowInput_1 can be seen below. The schemas are configured by simply connecting the components to the tMySqlOutput components.
The "Number of rows" is set to 1 as we only want 1 row.
The "Values" section only has the "test_text" value set. The "id" value is empty as this is auto generated by the database.
3) "lastidvaluetest" (tMySqlOutput)
This component is used to carry out the first insert. The "Basic settings" configuration can be seen below....
We are using a shared connection, so the "Use an existing connection" tick box is ticked. The "Component List" selection box holds the name of the "tMySqlConnection" component used.
The "Action on data" has been set to "insert" as we only want to insert using this component.
The "Advanced settings" can be seen below....
First of all we tick the "Use field options" tick box. This is done so that we can tailor how this component will work with individual columns.
As we only want to provide the "test_text" value and want the database to deal with the "id" itself, we untick the "id" field in the "Insertable" column. All other tick boxes are unticked.
4+9) "tLogRow_1 and tLogRow_2" (tLogRow)
These components are simply connected to show the state of the data as it passes through the Job. There is no configuration required.
5+10) "tMySqlCommit_1 and tMySqlCommit_2" (tMySqlCommit)
These components are used to commit the data that has been either inserted or updated. The configuration of these components is very similar with 1 difference. The configuration of "tMySqlCommit_1" is shopwn below.....
The "Component List" drop down box holds the name of the "tMySqlConnection" component. This is the same for both "tMySqlCommit" components.
The "Close Connection" tick box is not ticked for the "tMySqlCommit_1" component as we want to keep the connection alive for the update. The "tMySqlCommit_2" component has this ticked as we can close the connection at the end.
6) "tMsgBox_1" (tMsgBox)
This component is simply used to put a pause between the "insert" and "update" to allow us to check the value in the database after the insert and before the update. The configuration of this component is not important.
8) "lastidvaluetest" (tMySqlOutput)
This component is used to carry out the update using the auto generated id from the first insert. The "Basic settings" configuration can be seen below....
As with the other "tMySqlOutput" component we have the "Use an existing connection" tick box ticked and the "tMySqlConnection_1" component selected in the "Component List" drop down. However, this component has "Update" selected in the "Action on data" drop down.
The "Advanced settings" of this component can be seen below.....
The above screenshot is what this tutorial is all about. The "Additional columns" section. The "Additional columns" section allows you to add or replace column values used in updates and inserts. The "Name" field is the name of the column. In this example we set this to be the same as the "id" column in the database.
The "SQL expression" column is the important piece of information here. It holds "LAST_INSERT_ID()". This is a MySQL function that allows you to retrieve the last inserted id when inserting a single a record. If multiple records are inserted, it will return the first auto generated id of the batch. This function is explained here. The "SQL expression" column is very useful for making use of database specific functions on inserts or updates.
The "Position" column allows you to specify whether the column you are setting up should be used to "replace" (as is the case here) or be supplied to the insert/update "before" or "after" the "Reference column". In this case we are replacing the supplied "id" which will be used as the "update key".
If we look at the "Field options" section we can see that the "Update Key" is ticked for "id" and the "Updatable" column has "test_text" ticked. This basically tells the database to use th "id" (which has been replaced by the configuration above) as the update key and to update the "test_text" column only.
The "lastidvaluetest" table
For this tutorial I made use of a simple MySQL table to demonstrate the functionality. The create statement of this table can be seen below....
|CREATE TABLE `lastidvaluetest` (|
`id` int(11) NOT NULL AUTO_INCREMENT,
`test_text` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
Running the Job
To run this Job simply click on the "Run" button on the "Run" tab.
The first thing that will happen is that the insert will take place. Then a popup window (caused by the "tMsgBox" component) will appear instructing the user to take a look at the database table. If you look at the database table at this point you will see a row added to the "lastidvaluetest" table with a vaue of "Test1".
Once this has been witnessed, click "OK" on the popup window and the update will take place. If you check the database table now you will see that the row that was created in the last step has now been updated to hold the value "TestUpdated".
A copy of the completed tutorial can be found here. It was built using Talend 5.5.1 but can be imported into subsequent versions. It cannot be imported into earlier versions, so you will either need to upgrade or recreate it following the tutorial.