31 Days of SSIS – Create XML Document (13/31)

Posted by & filed under , , , , .

31 FingersDay thirteen and we are trucking along with the 31 Days of SSIS blog series.  The last couple posts discussed methods that could be used to unpivot (1, 2) data to populate data into an entity-attribute-value (EAV) architecture.

Package Requirements

Sometimes, though, it isn’t necessary to populate customer data that doesn’t fit into the portion of the data model for custom values.  Consider for a moment data that needs to be stored for a customer, but will only ever be used for troubleshooting or to be returned in some later process.

Would this data be better stored in the EAV architecture, or some other storage mechanism?  As the name of the post infers, it might be better to store that data in XML.  You can’t throw it away, but you need to store it where it won’t get in the way.

This is exactly the issue that was brought up to me a number of months ago.  A new import would have millions of records, each of which might have dozens of custom values. Storing these values would drastically change the statistics of the existing EAV architecture that only contained a few million rows by adding millions of more rows.

Script Task Solution

For this SSIS package, we need a task that will be able to consume one or more columns.  These columns should be added to an XML document that is created and returned by the transformation.  The package will then be able to take the XML document created and move it to wherever it needs to be delivered.

To keep the solution flexible it should be portable to other SSIS packages.  The process for creating the XML document should be easily configurable and require limited coding for each new implementation.

To accomplish these needs, we’ll implement some code within a Script Component transformation.  The code will be dynamic with configuration relying on the columns passed into the transformation.

Before getting too far into explaining this solution, the idea for this solution arose after reading a blog post by John Welch (Blog | @John_Welch).  In his post, John discussed how to create an XML Destination Script Component.  Some of the code in this post is derived from his post.  I highly recommend reading this and other posts on his blog – there is some great content on his site.

imagePackage Control Flow

To demonstrate this solution, we’ll build a fairly simple SSIS package.  The package only contains a couple Data Flow tasks.  The first data flow task creates a file that will be used to demonstrate the functionality in the second data flow task.  The second data flow task will consolidate a number of columns into an XML document that can be inserted or update into an XML column for reference.

Package Variables

Only two tasks, so why not only two variables as well for the package?  Well this wasn’t intentional, but it was all that I needed.

image

The variables from above were added for the following purposes:

  • FileProduce: Name and location of the text file the source for the XML creation transformation.
  • WorkFolder: This is my one variable to rule them all that sets the FileDestination and FileSource paths.

Package Data Flows

As noted above there are two data flows.  In this section, we’ll dissect them to explain what you will see when you download the SSIS package.

imageDFT_Demo_File

Within this data flow task a text file will be created.  The purpose of the text file is to provide data for the second data flow.  This isn’t entirely necessary to demonstrate this functionality but is included to implement the demonstration with the data source that the original design utilized.

There are two transformations in this data flow:

  • DS_MSS_Product: OLE DB Source that retrieves data from the AdventureWorks database.  The data source exports sales information.
  • DD_FF_Product: Flat file destination for the sales information.

imageDFT_XMLDocument

This data flow is where the real action is.  The data flow uses the file created in the previous data flow as it’s source data.  The data is transformed in a Script Component transformation.  The last step is a Row Sampling transformation that provides a data flow path that we can hook a data viewer onto.

The transformations in this data flow are:

  • DS_FF_Product: Flat file source containing the product information.
  • SCR_Fill_XMLData: Script Component transformation that accepts columns from the data flow path and returns those columns as an XML document in a new column.  The configuration of this component will be describe in detail below
  • RSAMP_Data: As mentioned above, this is here to provide a place for a data viewer.

SCR_Fill_XMLData Configuration

To use a Script Component transformation for building XML there are a few configuration points that will be important to understand.  We’ll go over setting this up, and how to change the contents of the XML document without changing any code.

First browse to the Input Columns tab.  On this tab, select all of the columns that you want to include in the XML document that is created.  This is the primary configuration point for this task.

The columns selected can be changed at anytime.  The only restriction will be the need to recompile the code in order for the underlying collections to be updated.  No edits will need to be made.

SNAGHTMLfcc67e

Next on the Inputs and Outputs tab add a new column to the Output 0 output.  This column should be configured as a string data type with a maximum length of 4,000.  This is shown below.

SNAGHTMLfeccd5

Now that these two pieces are configured, go back to the Script tab and select the Edit Script button.  The script window will open and you see a basic transformation template.

Input0_ProcessInputRow Function

The first step in setting up the script will be to replace the Input0_ProcessInputRow function with the one below.  Before we do that lets review what this script is doing to make certain that you understand the code before you use it.

The code will do the following:

  1. For each column that has been selected as an input for the Script Component transformation the function will loop across each column.
  2. If the column name is XMLData, don’t do anything.  Since we will assign the XML document to the column later we don’t want to include it in the data being fed into the XML document.
  3. Get the name of the column.  Spaces and periods are removed from the column name since the Script Component transformation will do this itself.
  4. Get the value of the column.
  5. Get the name of the NULL column created for the column.  When columns are added to a Script Component transformation a NULL is created to allow tests for NULL.
  6. If the column is configured as ReadOnly then consider it for the XML document.  This was added as a fail safe to the XMLData.  It is redundant and could be removed.
  7. Check if the column value is NULL.  If so, add an empty element to the XML document.  Otherwise, add an element to the XML document with the column value.

Now that we’ve gone through the psuedo-code, here’s that function:


public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Type rowType = Row.GetType();
PropertyInfo columnValue;
PropertyInfo nullState;
string XMLData = "<Data>";
string columnName = "";

try
{
foreach (IDTSInputColumn100 column in ComponentMetaData.InputCollection[0].InputColumnCollection)
{
if (column.Name != "XMLData")
{
columnName = column.Name.Trim().Replace(" ", "").Replace(".", "");
columnValue = rowType.GetProperty(columnName);
nullState = rowType.GetProperty(columnName + "_IsNull");

if (columnValue.CanWrite == false)
{
if (Convert.ToBoolean(nullState.GetValue(Row, null)) == true)
{
XMLData = XMLData + FormatElement(columnName);
}
else
{
XMLData = XMLData + FormatElement(columnName, columnValue.GetValue(Row, null).ToString().Trim());
}

}
}
}

Row.XMLData = XMLData + "</Data>";
}
catch (Exception ex)
{
throw new Exception("Error processing column:" + columnName + " " + ex.StackTrace);
}
}

The code does what the pseudo-code outlined.

Formatting Functions

There are two formatting functions that came from the post mentioned above.  These allow for converting values to XML element easier with less code.  The one distinction with these functions to mention is the inclusion of CDATA.  CDATA is used to force the encoding of values that can be in the source data to prevent them from causing the XML document to be malformed.


public string FormatElement(string elementName)
{
return FormatElement(elementName, "");
}

public string FormatElement(string elementName, string data)
{
string elementValue;

if (data.Trim() == string.Empty)
{
elementValue = String.Format("<{0}/>", elementName);
}
else
{
elementValue = String.Format("<{0}><!{2}CDATA{2}{1}{3}{3}></{0}>", elementName, data, "[", "]");
}
return elementValue;
}

XML Script Task Wrap-Up

With that you can execute the SSIS package.  If you have a data viewer on the package you’ll see something like this:

SNAGHTML1197f61

Pull out one of the values, and you get the following:


<Data><Name><![CDATA[Adjustable Race]]></Name><ProductNumber><![CDATA[AR-5381]]></ProductNumber><MakeFlag><![CDATA[False]]></MakeFlag><FinishedGoodsFlag><![CDATA[False]]></FinishedGoodsFlag><Color/><SafetyStockLevel><![CDATA[1000]]></SafetyStockLevel><ReorderPoint><![CDATA[750]]></ReorderPoint><StandardCost><![CDATA[0.0000]]></StandardCost><ListPrice><![CDATA[0.0000]]></ListPrice><Size/><DaysToManufacture><![CDATA[0]]></DaysToManufacture></Data>

Wait a minute, this doesn’t look like XML!  But remember what I mentioned about he CDATA.  Drop these values into a variable of the XML data type and select it back out.  Try that and you will get the following:


<Data>
<Name>Adjustable Race</Name>
<ProductNumber>AR-5381</ProductNumber>
<MakeFlag>False</MakeFlag>
<FinishedGoodsFlag>False</FinishedGoodsFlag>
<Color />
<SafetyStockLevel>1000</SafetyStockLevel>
<ReorderPoint>750</ReorderPoint>
<StandardCost>0.0000</StandardCost>
<ListPrice>0.0000</ListPrice>
<Size />
<DaysToManufacture>0</DaysToManufacture>
</Data>

This, of course, looks like what you were expecting.  Now go forth and cut and paste this into all of the new SSIS packages where you need this functionality.

The one word of caution what I will add to using this is the limit to 4,000 characters.  Right now, strings can’t exceed that within SSIS and that can be an issue.  If you run into this, this solution will not completely fit for you.

Hopefully, with this and the previous SSIS package you are starting to see that there are a lot of creative things that can be done with Script Component transformations.  For the most part, the sky is the limit with these.

Package Download