Introduction: SQL Server
Data Quality Services enables you to cleanse, match, standardize with user
generated or third party knowledge. Here I’ll show you how to standardize data. The file contains customer name and country name.The country field has values such as "us","usa","united states of america" and we'll standardize it to "u.s.a".
Steps:
Open DQS client and connect to Server.
Click “New Knowledge Base”.
Enter name, description
and select “knowledge discovery”.
Click next.
Select “Excel” as data source and browse your file. The
file I’m using has customer full name, email and country.
Select country, email and custname as source column.
Create domain
“country”,”custname” and “email” and map the source columns to domains just
created.
Click next. Click Start.
The profiler statistics show the “country” column has
five unique values. Click “next.
Select the domain “country” and see the values. Change
the values to invalid if value is not “U.S.A” and correct the values to “U.S.A”.
Click “finish” and publish the KB.
We’ll use this KB to standardize country name in our next
project.
Open DQS client and click on “New Data Quality Project”
under create or maintain data quality project.
Enter name and select "data standardization" we just
created as “knowledge base”. Under “activity”, select “cleansing” and click
“next”.
Select Excel as data source and browse your data. Map
Source column to domain name.Click “next”.Click “start”.
The profiler statistics shows it corrected two records of
country domain.
Click “next”. Select the “corrected” tab and note that
“US” and “USA” have been corrected to “U.S.A” as per our KB. Select the “new” tab and note the values are not in our previously created KB.
We
need to add these new values to our KB and then we can run our data to
standardize country name.
Click “next”. It shows the cleansing result. Select
export destination and export data.
We can use the KB to SSIS to integrate DQS with SSIS.
No comments:
Post a Comment