Monday, February 24

Data Quality Services (SQL Server 2012)

Introduction: SQL Server Data Quality Services enables you to cleanse, match, standardize with user generated or third party knowledge. Here I’ll show you how to standardize data. The file contains customer name and country name.The country field has values such as "us","usa","united states of america" and we'll standardize it to "u.s.a".

Steps:
Open DQS client and connect to Server.
Click “New Knowledge Base”.

Enter name, description and select “knowledge discovery”.



 Click next.
Select “Excel” as data source and browse your file. The file I’m using has customer full name, email and country.
Select country, email and custname as source column.
Create  domain “country”,”custname” and “email” and map the source columns to domains just created. 





Click next. Click Start.
The profiler statistics show the “country” column has five unique values. Click “next.


Select the domain “country” and see the values. Change the values to invalid if value is not “U.S.A” and correct the values to “U.S.A”.

Click “finish” and publish the KB.

We’ll use this KB to standardize country name in our next project.
Open DQS client and click on “New Data Quality Project” under create or maintain data quality project.

Enter name and select "data standardization" we just created as “knowledge base”. Under “activity”, select “cleansing” and click “next”.

Select Excel as data source and browse your data. Map Source column to domain name.Click “next”.Click “start”.

The profiler statistics shows it corrected two records of country domain.

 Click “next”. Select the “corrected” tab and note that “US” and “USA” have been corrected to “U.S.A” as per our KB. Select the “new” tab and note the values are not in our previously created KB.

 We need to add these new values to our KB and then we can run our data to standardize country name.

Click “next”. It shows the cleansing result. Select export destination and export data.

We can use the KB to SSIS to integrate DQS with SSIS.


Monday, February 3

Test Driven Development (TDD) in Visual Studio

Introduction: Visual studio 2012/2013 supports Test Driven Development (TDD) through its unit testing framework.This demonstrates how to build test method in c# using unit testing framework and viewing results in test explorer.

Steps:
 Create a new project as shown below:


The solution explorer should look like this:

Create a new Class: Right click on TDD| Add New Item| Add Class and name it Calculator.

Write a basic test method: Double click UnitTest1.cs under TDD.Tests folder.

Generate the method from test:

Place the cursor on sum and then on the short cut menu choose Generate, Method Stub.
This will create a sum method in Calculator.cs.


Run the unit Test: On the Test menu,choose Unit Test,All Tests. The Test Explorer opens and displays the result.

The test fails because sum() is not implemented.
Let's implement the sum() in Calculator.cs file.

Now run the unit test again.

It fails because the expected result and actual result is not same.
Lets modify the sum() as follows:


Run the unit test again. The method passes this time.































Saturday, February 1

Storage and Management of unstructured data using File Table and File Stream

Introduction: SQL Server provides a special table of files, also referred to as a File Table, which lets an application to integrate its storage and data management components to allow non-transactional access. A File Table is a specialized user table with a pre-defined schema that stores FILESTREAM data, as well as file and directory hierarchy information and file attributes. Each row in file table represents a file or directory.  It supports full-text search and semantic search over files and documents. FILESTREAM enables SQL Server-based applications to store unstructured data, such as documents and images, on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. FILESTREAM integrates the SQL Server Database Engine with an NTFS file system.

Steps:

Enable Filestream:
1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2012, point to Configuration Tools, and then click SQL Server Configuration Manager.
  1. In the list of services, right-click SQL Server Services, and then click Open.
  2. In the SQL Server Configuration Manager snap-in, locate the instance of SQL Server on which you want to enable FILESTREAM.
  3. Right-click the instance, and then click Properties.
  4. In the SQL Server Properties dialog box, click the FILESTREAM tab.
  5. Select the Enable FILESTREAM for Transact-SQL access check box.
  6. If you want to read and write FILESTREAM data from Windows, click Enable FILESTREAM for file I/O streaming access. Enter the name of the Windows share in the Windows Share Name box.
  7. If remote clients must access the FILESTREAM data that is stored on this share, select  Allow remote clients to have streaming access to FILESTREAM data.
  8. Click Apply.


In Sql server Management studio,click new query to display the query editor and enter the following sql statement:








Create File Stream Enabled Database:
CREATE DATABASE Demo
 ON
 PRIMARY (NAME = Book, FILENAME = 'C:\demo\Book.mdf'),
  FILEGROUP BookGroup CONTAINS FILESTREAM (NAME = BookFS, FILENAME = 'C:\demo\BookFS')
  LOG ON (NAME = 'BookLog', FILENAME = 'C:\demo\BookLog.ldf');       
GO

After execution of the statements, you’ll see the folder as seen in below pic.


Enable Non-Transactional Access at database level
ALTER DATABASE Demo
SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'BookFS')

Create a File Table:
Use Demo
CREATE TABLE BookDocument AS FileTable
GO



I created two text files – collegeText.txt and KidsBook.txt. The text files contain Title,Author,Publish Date and Price. Now, right click on “BookDocument” in the object explorer and select “Explore File Table Directory”, a new window will open. Drop the collegeText.txt and KidsBook.txt files here.




Run the select statement:
SELECT *
FROM dbo.bookdocument


Enable full text search
--Create Full-Text Catalog
Create FullText Catalog demoFulltextSearch as Default
GO

--Create FullText Index
Create FullText Index on dbo.bookDocument
(name Language 1033, File_stream type column file_type Language 1033)
key Index PK__BookDocu__5A5B77D5A5D5A685
on demoFullTextSearch
with Change_Tracking Auto, StopList=system
Go

Execute the following sql:

SELECT *
FROM dbo.bookdocument
WHERE CONTAINS (file_stream, 'c#');

This returns the record of collegeText.txt.