There is a saying about a student who constantly complained to his teacher that he always had so much work to do that there was never enough time for him. “If you never stop working, when do you actually think about the things you should do and how you should do them?” was the question the teacher asked him. In that spirit, when my daily workload gets high, I try to stop once in a while and think about the bigger picture: what am I actually trying to achieve, and am I doing it effectively?
For a couple of days now I have been sitting down to play a bit with Microsoft SQL Server Integration Services (SSIS) error handling. The reason is that my colleagues and I were working on an SSIS project, and at a certain point the error analysis of the migrated data became something of a pain in the triple-letter word :).
We identified the following issues:
- We spent more time analyzing errors than actually transferring the data.
- While doing so, we tended to make mistakes in the statistics we provided to the client.
Then we tried to nail down the actual reasons for this:
- We had too many error destination components, and we had to go over each one every time the migration finished. This slowed us down.
- We did not have a common format for the error content written by the destination components, so each one required a different analysis approach, which is an error-prone way of doing things.
Having figured this out, the next step was to set goals:
- We need as few error destinations to log data into as possible. One error output per data flow task was an excellent optimization for our case.
- We wanted a data format that is relatively easy to analyze, so we targeted at least an Excel sheet as output, not just a flat file.
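To make the second goal concrete, the common error-row shape we converged on can be sketched roughly like this (a hypothetical illustration; the class and property names are mine, not part of the package):

```csharp
// Hypothetical sketch of the single error-row shape every error path
// should produce before landing in the Excel sheet.
public class ErrorRow
{
    public int? ErrorCode { get; set; }          // SSIS error code; null for business logic errors
    public string ErrorDescription { get; set; } // resolved description, or the custom reason
    public string ErrorColumnName { get; set; }  // name of the failing column, if it can be resolved
    public string Reason { get; set; }           // custom reason attached per business rule
}
```

Once every error output emits the same columns, a single destination and a single analysis procedure cover all of them.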
So I put some effort into building an example of a problematic (in the context of our case) package. It had two types of error outputs (all pointing to flat file destinations): a) business logic errors and b) data manipulation errors:
Starting almost from scratch:
The data flow task was transformed into this:
The key points here are:
- We have multiple error outputs (both business logic and data manipulation) united into a single path flow.
- For every business logic type of error we add a custom reason (i.e. the “Add Reason” component in the image).
- We have a Script component that extracts the actual error description with the following code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.ErrorCode_IsNull == false)
    {
        // A data manipulation error: resolve the description for the SSIS error code.
        Row.ErrorDescription = ComponentMetaData.GetErrorDescription(Row.ErrorCode);
        try
        {
            // Map the failing column's ID back to its name.
            Row.ErrorColumnName = ComponentMetaData.InputCollection.InputColumnCollection[Row.ErrorColumn].Name;
        }
        catch (Exception ex)
        {
            // The ErrorColumn ID may not be present in this input's column collection.
            Row.ErrorColumnName = "Column Name retrieval failure";
        }
    }
    else
    {
        // A business logic error: use the custom reason added earlier in the flow.
        Row.ErrorDescription = Row.Reason;
    }
}
- All information is gathered into an Excel sheet for later analysis.
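With everything landing in one sheet in a uniform shape, the per-client statistics that used to be error-prone become a simple aggregation. A hypothetical C# sketch (not part of the package; the sample descriptions are made up) of counting errors by type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ErrorStats
{
    static void Main()
    {
        // Hypothetical error descriptions read back from the Excel sheet.
        var descriptions = new List<string>
        {
            "Missing customer ID", "Missing customer ID", "Invalid date format"
        };

        // Count occurrences per error type, most frequent first.
        foreach (var group in descriptions
                     .GroupBy(d => d)
                     .OrderByDescending(g => g.Count()))
        {
            Console.WriteLine("{0}: {1}", group.Key, group.Count());
        }
    }
}
```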
If someone wants to dig deeper, here is the source code of the package:
ErrorTest.zip (28.83 kb)