Using a Talend Job to create the AMC database schema

To log the AMC data in a database, a predefined schema is required. Tables for "Logs", "Stats" and "Flowmeter" data are required. These can be built by using your database administration tools in the traditional way, or you can create a simple Talend Data Integration job to do it for you. This tutorial shows how to create such a job. This sort of job may only be required once if you are working on a single site with one installation, but if you are regularly changing sites it can be a very useful tool to have in your Talend toolkit.

This tutorial was carried out using Talend Version: 5.4.1 Build id: r111943-20131212-1154. This tutorial and the export of this job should be able to be used with and imported into future versions.

Job Layout

By now, you should have got to grips with how to build a basic Talend Data Integration job. If not, take a look at this example first. If you are happy to proceed and feel you are familiar enough with the Talend Data Integration environment, open a new job and create the layout of components depicted in the diagram below. We will go through the configuration of each component by the numbering in the diagram.

Component Configuration

This section details how the components should be configured.

1) tRowGenerator_1

This component is used to generate 10 rows of random data. It produces 1 string column which uses the "TalendString.getAsciiRandomString" function to generate the content. To set this up configure the component as shown in th diagram below which depicts the component configuration.

 

2) Row1

This row is not a component as such, but a row joining the tRowGenerator_1 to the tLogRow_1. The purpose of this row is to count the number of times individual rows of data are passed along it. This functionality can be switched on for any row and allows you to keep track of the data flowing through the job. The row count is logged in the FlowMeter data table. The purpose of this row in this job is to produce some flow data so that the data can initiate the generation of the FlowMeter data table. The diagram below shows the row highlighted and the "component tab" where the row needs to be configured to be monitored. All that is required for this example is that the "Monitor this connection" tick box is ticked.

3) tLogRow_1

This component is simply used to display the row content in the output. There is no configuration required for this component. The way the data is displayed is entirely up to the developer.

4) tWarn_1

This component is used to create a warning to be caught by the "Logs" data table. This component will fire an exception but will not stop the job. A tDie component would act similarly, but would stop the job as soon as it is fired. This was chosen so that the whole job would finish. Like the tLogRow_1, this component requires no extra configuration. A specific error can be added if required.

5) tStatCatcher_1

The tStatCatcher_1 component is used to catch statistics on when the job starts and finishes as well as individual component statistics. In this example we only care about catching job start and stop stats. All that is required is that this component is placed on the job. Normally when catching AMC stats, you will not need to use a tStatCatcher component as you can set the job to listen for stats by setting the "Stats&Logs" settings for the job. However the purpose of this job is to create the AMC schema, so we need to explicitly catch these stats here.

6) tMysqlOutput_1

This component is used to create and insert rows into the AMCSTATS table. In this example we are using a MySQL database. The configuration of this component is shown below. An important area to pay attention to is the "Action on table" drop down. This needs to have "Create table if does not exist" selected. This will create the table if it doesn't already exsit when a row needs to be inserted.

7) tFlowMeterCatcher_1

The tFlowMeterCatcher_1 component is used to catch data flow stats. This component will catch the row count created by row1. All that is required is that this component is placed on the job. Normally when catching AMC flow meter data, you will not need to use a tFlowMeterCatcher component as you can set the job to listen for flow meter data by setting the "Stats&Logs" settings for the job. However the purpose of this job is to create the AMC schema, so we need to explicitly catch the flow data. here.

8) tMysqlOutput_2

This component is used to create and insert rows into the AMCMETER table. In this example we are using a MySQL database. The configuration of this component is shown below. An important area to pay attention to is the "Action on table" drop down. This needs to have "Create table if does not exist" selected. This will create the table if it doesn't already exsit when a row needs to be inserted.

 

9) tLogCatcher_1

The tStatCatcher_1 component is used to catch statistics on when the job starts and finishes as well as individual component statistics. In this example we only care about catching job start and stop stats. All that is required is that this component is placed on the job. Normally when catching AMC stats, you will not need to use a tStatCatcher component as you can set the job to listen for stats by setting the "Stats&Logs" settings for the job. However the purpose of this job is to create the AMC schema, so we need to explicitly catch these stats here.

10) tMysqlOutput_3

This component is used to create and insert rows into the AMCLOGS table. In this example we are using a MySQL database. The configuration of this component is shown below. An important area to pay attention to is the "Action on table" drop down. This needs to have "Create table if does not exist" selected. This will create the table if it doesn't already exsit when a row needs to be inserted.

A copy of the completed tutorial can be found here. It was built using Talend 5.4.1 but can be imported into subsequent versions. Remember to configure/replace the database components to suit the environment you are working with. 

Tutorial: 
Talend Version: 
Type of content: