It’s been estimated that up to 60% of the effort in building a data warehouse can be spent in developing the ETL processes that are needed to copy data from source systems and transform it before loading it in to the data warehouse.
Many of the ETL processes, especially those loading a staging environment, have a common structure and are repetitive in nature. They lend themselves to a development process that can be automated and repeated programmatically.
Most ETL tools present the user with a graphical user interface that makes developing a single ETL process easier, but falls short in providing several key software development factors such as; reuse, naming standards enforcement, automation, source control, and refactoring.
BIML, which stands for Business Intelligence Markup Language, gives us the capability of incorporating these key software development factors into our development of ETL solutions in a SQL Server Integration Services environment.
BIML is a dialect of XML, developed by Varigence. There is an open-source implementation of BIML in BIDS Helper which is a free ad-on for Visual Studio. This means that if you use the open-source implementation, you can use BIML to automate your ETL solution development at no additional cost.
What is Biml?
Biml is a declarative language that specifies BI assets and metadata. It generates compiled code that is used in the Microsoft BI tool set. So, BIML allows you to describe SSIS, SSAS and Database Objects in human readable XML. The compilation of BIML code results in standard SSIS packages that can be targeted to SQL Server 2005 thru 2014.
Biml allows us to take advantage of code reuse, refactoring, naming standards enforcement, and source control.
What is BimlScript?
BimlScript is an extension to the Biml language where C# and VB.NET code nuggets are embedded directly within Biml. This allows you to turn tedious, repetitive work into reusable scripts. It supports the automation of package generation and creation of reusable patterns for common BI tasks.
BimlScript allows us to take advantage of refactoring and automation. Together Biml and BimlScript resolve the missing software development factors when developing SSIS ETL solutions.
Scenario
Let’s look at a typical data warehouse development scenario and see how Biml and BimlScript can help.
You are given the task of building a new data warehouse, the source data resides in 100 tables on the system of record, and each table design is unique. Without Biml and BimlScript you would be faced with manually creating 100 packages, a very tedious, error prone and time consuming prospect. If you estimate that on average creating and testing an SSIS package that uses the incremental load design pattern takes 2 hours, then you are looking at 200 hours of work.
Using Biml you can create a template that will generate an Incremental Load SSIS package. This will speed things up considerably by encapsulating in the template all the naming standards, data base object creation (i.e. tables, indexes, stored procedures, etc.), ETL work flow and data flow.
For example, let’s say that your data warehouse architectural standards state that each source record must contain a source name, and extract date when it is copied to the data warehouse. Your Biml template for the Incremental Load pattern, can enforce this rule by creating the target table with all of the source table columns plus source_name and extract_date columns with default values.
So far, we have saved a lot of work and ensured a high degree of quality when creating an ETL package for a single table. However, the real savings comes when we use BimlScript to point to a database, interrogate each table in the database and then execute the Biml Incremental Load pattern for each of those tables. This would reduce the 200 hours of work down to just minutes. And because the packages are generated from the same template, the quality of the code increases while the development time decreases.
Conclusion
The idea here is to develop a library of templates for all types of load patterns and have the templates implement a framework for capturing meta data about the load. Using Biml and BimlScript in this manner could significantly reduce complexity, errors and development time, while at the same time improving the quality of ETL solutions.