SSIS – Is there a Data Flow Source component that will handle CSV files where the column order may change

csvssis

We have written a number of SSIS packages that import data from CSV files using the Flat File Source.

It now seems that after these packages are deployed into production, the providers of these files may deliver files where the column order of the files changes (Don't ask!). Currently if this happens, our packages will fail.

For example, an additional column is inserted at the beginning of each row. In this case, the flat file source continues to use the existing column order, which obviously has a detrimental effect on the transformation!

Eg. Using a trivial example, the original file has the following content :

OurReference,Client,Amount
235,MFI,20000.00
236,MS,30000.00

The output from the flat file source is :

OurReference   Client   Amount
235            ClientA  20000.00
236            ClientB  30000.00

Subsequently, the file delivered changes to :

OurReference,ClientReference,Client,Amount
235,A244,ClientA,20000.00
236,B222,ClientB,30000.00

When the existing unchanged package is run against this file, the output from the flat file source is :

OurReference   Client   Amount
235            A244     ClientA,20000.00
236            B222     ClientB,30000.00

Ideally, we would like to use a data source that will cope with this problem – ie which produces output based on the column names, instead of the column order.

Any suggestions would be welcomed!

Best Answer

Not that I know of.

A possibility to check for the problem in advance is to set up two different connection managers, one with a single flat row. This one can read the first row and tell if it's OK or not and abort.

If you want to do the work, you can take it a step further and make that flat one-field row the only connection manager, and use a script component in your flow to parse the row and assign to the columns you need later in the flow.

As far as I know, there is no way to dynamically add columns to the flow at runtime - so all the columns you need will need to be added to the script task output. Whether they can be found and get parsed from the each line is up to you. Any "new" (i.e. unanticipated) columns cannot be used. Columns which are missing you could default or throw an exception.

A final possibility is to use the SSIS object model to modify the package before running to alter the connection manager - or even to write the entire package dynamically using the object model based on an inspection of the input file. I have done quite a bit of package generation in C# using templates and then adding information based on metadata I obtained from master files describing the mainframe files.

Related Topic