Data Integration

Load Data into a Dynamic Number of Files

A question that I have seen many times on forums, and have been asked several times while on site, is "How can I load data to a dynamic number of files when I don't know the total number?". When I first heard this, I have to admit I thought the person was just trying to pick holes in what Talend can do. Why would you want to load to files that you have no idea about? But then it occurred to me that with Big Data we are seeing a resurgence of flat file data storage and usage. Maybe there is a requirement to split files into smaller chunks by a key within those files?
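To make the idea of splitting by a key concrete, here is a minimal plain Java sketch. It is not the Talend job from the tutorial; the class name `KeySplitter`, the `;` delimiter, the sample rows and the `output_<key>.csv` naming are all assumptions made for illustration. It also assumes key values are safe to use in file names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class KeySplitter {

    // Group delimited rows by the value at keyIndex, keeping keys in first-seen order.
    public static Map<String, List<String>> groupByKey(List<String> rows, String delimiter, int keyIndex) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            // Quote the delimiter so characters such as '|' are not treated as regex syntax.
            String[] fields = row.split(Pattern.quote(delimiter), -1);
            groups.computeIfAbsent(fields[keyIndex], k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // Write one file per key. The number of files is only known once the data has been read.
    public static List<Path> writeGroups(Map<String, List<String>> groups, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            // Hypothetical naming scheme: output_<key>.csv
            Path target = outputDir.resolve("output_" + entry.getKey() + ".csv");
            Files.write(target, entry.getValue());
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = List.of("UK;Alice", "FR;Bruno", "UK;Carol");
        Map<String, List<String>> groups = groupByKey(rows, ";", 0);
        List<Path> files = writeGroups(groups, Files.createTempDirectory("split"));
        System.out.println(files.size() + " files written");
    }
}
```

The point of the sketch is that the set of output files falls out of the data itself: nothing in the code needs to know in advance how many distinct keys exist.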

Load XML files in batches of records

There are many things that Talend is great at doing with XML, and there are also many things that it is not so great at. One example of the latter is loading multiple XML files with batches of data. Say, for example, you have 100 records and you want to load multiple records into each XML file, but with a limit on the number of records per file. How would you do that with the tXMLMap component?
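The batching requirement itself is easy to state in plain Java, independent of how it is built with tXMLMap. The sketch below is only an illustration of the splitting logic: the class name `RecordBatcher`, the limit of 30 records per file and the `batch_N.xml` names are assumptions, not something taken from the tutorial.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordBatcher {

    // Split records into consecutive batches of at most maxPerFile items.
    public static <T> List<List<T>> partition(List<T> records, int maxPerFile) {
        if (maxPerFile <= 0) {
            throw new IllegalArgumentException("maxPerFile must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += maxPerFile) {
            int end = Math.min(start + maxPerFile, records.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            records.add(i);
        }
        // Assumed limit of 30 records per file: 100 records give batches of 30, 30, 30 and 10.
        List<List<Integer>> batches = partition(records, 30);
        for (int i = 0; i < batches.size(); i++) {
            System.out.println("batch_" + (i + 1) + ".xml would hold " + batches.get(i).size() + " records");
        }
    }
}
```

Each batch would then be written out as one XML document, so the last file simply holds whatever records are left over.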

Using Neural Networks with Talend DI and ESB

Many times during Data Integration projects we have situations where we have to analyse the data in order to come up with acceptance criteria for it. In a lot of cases, this is pretty straightforward and can easily be written as simple rule-based logic. But in some situations, it is not so cut and dried. In these situations a lot of people will write rule-of-thumb logic which isolates certain rows to be double-checked by a human. This works. It is time consuming and requires human intervention, but it works.

Using a third party Java library to scrape the content of a table on a web page

Recently I was contacted by a visitor to this site who asked me to put together a tutorial on using Talend for web data crawling purposes. This interested me as I have myself come across situations where I have used other software to scrape websites for data (links, pictures, email addresses). While it is not difficult to find software to do this, it usually comes with a cost or is very limited in what you can do. After a few minutes of Googling, I came across several Java libraries which offered this functionality.

Dynamic column order

This tutorial was inspired by questions I get asked a lot when out on sites, and which I have also seen asked on forums. The question is "Is there a way to deal with files that have the same columns, but in different orders?" or "Can I identify the column order from the header row?". There are several ways in which this can be achieved and this is just one of them. Some ways may require much more complicated logic and maybe a bit of Java. This way makes use of the tMap component and the ordered processing of variables in that component.
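For comparison, the "bit of Java" route can be sketched as follows. This is not the tMap variable approach the tutorial uses; it is a minimal plain Java illustration of reading the column order from the header row. The class name `HeaderOrder`, the comma delimiter and the column names are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class HeaderOrder {

    // Map each column name in the header row to its position in the file.
    public static Map<String, Integer> indexHeader(String headerLine, String delimiter) {
        String[] names = headerLine.split(Pattern.quote(delimiter), -1);
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i].trim(), i);
        }
        return positions;
    }

    // Return the fields of a data row rearranged into the fixed target order.
    public static String[] reorder(String line, String delimiter,
                                   Map<String, Integer> positions, List<String> targetOrder) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        String[] result = new String[targetOrder.size()];
        for (int i = 0; i < targetOrder.size(); i++) {
            Integer position = positions.get(targetOrder.get(i));
            if (position == null) {
                throw new IllegalArgumentException("Column not found in header: " + targetOrder.get(i));
            }
            result[i] = fields[position];
        }
        return result;
    }

    public static void main(String[] args) {
        // Two files with the same columns in different orders end up in the same shape.
        List<String> target = List.of("id", "name", "city");
        String[] a = reorder("Alice,1,Leeds", ",", indexHeader("name,id,city", ","), target);
        String[] b = reorder("Leeds,1,Alice", ",", indexHeader("city,id,name", ","), target);
        System.out.println(String.join("|", a));
        System.out.println(String.join("|", b));
    }
}
```

Both calls in `main` produce the fields in `id`, `name`, `city` order, whatever order the source file used.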

Using UPnP enabled devices with Talend - Control Sonos Speakers

In the last tutorial published here, we discussed using Talend with a UPnP device by Belkin. The tutorial looked at how to discover UPnP devices and how to use the device's UPnP description XML to work out how to use the actions available. The tutorial can be found here and will be useful to keep in mind before looking at this approach.

Using UPnP enabled devices with Talend - A Belkin WEMO Switch

Universal Plug and Play (UPnP) devices are ubiquitous these days. More and more homes are filling up with devices that make use of UPnP functionality, and this opens lots of doors for Talend users to derive more functionality from connecting these devices. In my home I have the following devices which make use of UPnP protocols:

Samsung Smart TV
Belkin WEMO Switch
Belkin WEMO Sensors
Philips HUE lighting
SONOS Speakers
BT Home Hub Router

Using an auto-generated primary key to update a row just inserted in a MySQL database

This short tutorial was inspired by a question I had from a customer. They were trying to insert rows into a "log" table at the beginning of a job and update that same row at the end of the job. The problem was that they were using an auto-generated primary key in a MySQL database. This in itself isn't a bad thing to do; in fact it could be argued that it is the right thing to do (control sequences/ids at the database end). However, they could not work out how to get hold of that generated "id" without running the risk of another job interacting with the table and causing errors.

Using OAuth 2.0 with Talend to Access Google APIs

This tutorial deals with a reasonably complicated process, and the Talend DI part is arguably the simple bit. As this is the case, I will not be going through each of the steps in as much detail as in some of the other tutorials. It is assumed that if this functionality is required, most of the Talend DI basics will have been mastered, or at least understood.

A Talend "Connect By" Example

Out of the box, Talend's Data Integration components cover many scenarios. However, there are some where there simply isn't a component that will do the job. In situations like that you can search for bespoke components that people have built, or maybe even build one yourself. But the problem is that these components are not supported and are not guaranteed to be upgraded when the version you are using is upgraded. A good way to get around this is to create a child job that provides the functionality you require.