Monday, February 24

Data Quality Services (SQL Server 2012)

Introduction: SQL Server Data Quality Services enables you to cleanse, match, standardize with user generated or third party knowledge. Here I’ll show you how to standardize data. The file contains customer name and country name.The country field has values such as "us","usa","united states of america" and we'll standardize it to "u.s.a".

Steps:
Open DQS client and connect to Server.
Click “New Knowledge Base”.

Enter name, description and select “knowledge discovery”.



 Click next.
Select “Excel” as data source and browse your file. The file I’m using has customer full name, email and country.
Select country, email and custname as source column.
Create  domain “country”,”custname” and “email” and map the source columns to domains just created. 





Click next. Click Start.
The profiler statistics show the “country” column has five unique values. Click “next.


Select the domain “country” and see the values. Change the values to invalid if value is not “U.S.A” and correct the values to “U.S.A”.

Click “finish” and publish the KB.

We’ll use this KB to standardize country name in our next project.
Open DQS client and click on “New Data Quality Project” under create or maintain data quality project.

Enter name and select "data standardization" we just created as “knowledge base”. Under “activity”, select “cleansing” and click “next”.

Select Excel as data source and browse your data. Map Source column to domain name.Click “next”.Click “start”.

The profiler statistics shows it corrected two records of country domain.

 Click “next”. Select the “corrected” tab and note that “US” and “USA” have been corrected to “U.S.A” as per our KB. Select the “new” tab and note the values are not in our previously created KB.

 We need to add these new values to our KB and then we can run our data to standardize country name.

Click “next”. It shows the cleansing result. Select export destination and export data.

We can use the KB to SSIS to integrate DQS with SSIS.


No comments:

Post a Comment