Jigging with Dataflux

I have read a few “how to” and “case study” books on Data Warehousing over the last few years, and they all pretty much state that if the quality of your data is rubbish, then the success of your Data Warehouse will be limited.

However, it is often difficult to get an organisation to rectify all of its Data Quality issues before it embarks on delivering reports and information to the business users who need them.

One of the interesting sessions at next year's SAS Users Conference in New Zealand is by Zeeman van der Merwe, who is talking about the work he is doing at ACC. I had the pleasure of meeting Zeeman a while ago and talking to him about his project, and he is definitely taking the recommended approach: sponsorship from the top, and covering areas such as Data Governance, Data Stewardship and Data Quality reporting.

One of the Data Warehouse projects we are working on has a sister project dealing with Data Quality. It is fair to say that we have yet to get the organisation to fully understand the impact data quality has on the business and the necessity to rectify the issues. Everybody does of course agree there are a lot of issues with the quality of the data, which is a good start.

From my presales days at SAS, I always remember the words customers uttered: “yes we have major data quality issues”, shortly followed by “but we don’t have any money to pay to fix them”.

Anyway, on this project we are lucky enough to have SAS Enterprise Data Integration Server at our beck and call, and so have the ability to use Dataflux on the Warehouse data. So we have done a number of tactical Data Discovery and Data Validation pieces of work.

So far we have completed:

  • Validation of Phone Numbers
  • Validation of Addresses
  • Customer/Person matching

The Phone Number validation was the first one we did, and we picked it because it was a discrete piece of work we could time-bound while we worked through the process of using Dataflux. We are now looking to close the loop by loading the augmented phone number data Dataflux produced back into the source system, and changing the business rules in the source system to rectify some data entry issues we identified.
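To give a flavour of the kind of rule involved, here is a minimal sketch in Python. It is not how Dataflux does it and not the rules we used on the project; the function name `standardise_phone` and the length bounds are made up purely for illustration. It strips the formatting out of a free-text phone number and flags values that cannot plausibly be a phone number.

```python
import re


def standardise_phone(raw, min_digits=7, max_digits=12):
    """Return (cleaned, is_valid) for a free-text phone number.

    The length bounds are illustrative assumptions, not project rules.
    """
    if raw is None:
        return "", False
    text = raw.strip()
    # Keep a leading "+" (international prefix); drop every other non-digit.
    prefix = "+" if text.startswith("+") else ""
    digits = re.sub(r"\D", "", text)
    cleaned = prefix + digits
    # Flag the value as valid only if the digit count is within the bounds.
    is_valid = min_digits <= len(digits) <= max_digits
    return cleaned, is_valid


if __name__ == "__main__":
    for value in ["(04) 555-0123", "+64 21 555 012", "n/a", None]:
        print(repr(value), "->", standardise_phone(value))
```

Even a check this crude is handy for a first pass of Data Discovery, because the rows it flags tend to point straight at the data entry habits that need fixing in the source system.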

I really recommend the idea of picking something small to start out with.

We are now looking at how we productionise the Data Quality routines into our standard Data Warehouse load and reporting processes. So far the options (in 9.1.3) look like:

  • Purchase the full use Dataflux Integration Server
  • Schedule Dataflux routines to run on a PC
  • Manually run the jobs
  • Rewrite the Dataflux jobs in SAS DI Studio

An interesting thing to note is that in SAS 9.2 the Dataflux Integration Server component is bundled in eDI, so you can just deploy the Dataflux Architect jobs and run them in your warehouse's standard process flows.

We still haven't decided which option will work best, but we think it is going to be the DI Studio option in the interim, as consistency and stability of loads is one of our major focuses.

I have to say I love Dataflux and all that it does (I even believe the Dataflux team now have a stronger presence in the development of SAS Data Integration Server under the “Project Unity” banner).

I note that Dataflux jumped to the top of the Gartner Magic Quadrant in 2009. I always struggle to find these when I need them, so here are the Data Quality ones for 2008 and 2009.

Gartner 2008 Data Quality Magic Quadrant

Gartner 2009 Data Quality Magic Quadrant


Q: When is the latest SAS version not the latest SAS version?

On a project I am working on we upgraded our SAS licenses from Data Integration Server to Enterprise Data Integration Server.

We were really impressed with the capabilities of Dataflux, and as we were using SAS DI Studio in anger to populate our warehouse, the idea of integrating the Dataflux rules into our ETL processes also appealed.

We had a demo of Dataflux version 8, which looks sexy and has a lot of great features (which you would expect from one of the top 3 data quality tools in the world).

Imagine our disappointment when we got the CDs and found we had been shipped Dataflux 7.0, not the version 8 that was demoed.

Simple mistake, we thought, so on to our capable SAS account manager.

Well, we were wrong. It seems that Dataflux 7.0 is the version certified (and working) with DI Studio; Dataflux 7.1 and 8.x will be supported in SAS 9.2.

Tricks for young players I suppose, but then again I am not so young anymore…

I am sure Dataflux 7.0 and DI Studio will do what we need, but it is still slightly disappointing.
