Wednesday, September 11

Import and Process XML Data into SQL Server Database



Introduction:  How to import an xml file to SQL Database and how to parse the data ? In my last post I
demonstrated how to convert XML file to CSV file. Once we have a CSV file we can import data either by Sql Server management studio import task or by SSIS.  Here I will demonstrate a different approach in following two steps.

(1)Import XML data to SQL Database using OPENROWSET function
(2)Parse data using OPENXML function.

 Import XML Data to SQL Server using OPENROWSET:

Steps:

(A) Get a XML File and save it.  Here is a dummy xml file (blSurvey.xml) that I am going to import to SQL Database.

<?xml version="1.0" encoding="iso-8859-1" ?>
<HEADER>
<SURVEYDATA>
       <SURVEY_ID>1234567</SURVEY_ID>
       <CLIENT_ID>6666</CLIENT_ID>
       <SERVICE>MD</SERVICE>
       <RECDATE>2013-02-18</RECDATE>
       <DISDATE>2012-02-13</DISDATE>
       <ANALYSIS>
              <RESPONSE>
                     <VARNAME>A1</VARNAME>
                     <VALUE>4</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>A14</VARNAME>
                     <VALUE>5</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>CP1</VARNAME>
                     <VALUE>5</VALUE>
              </RESPONSE>
       </ANALYSIS>
       <DEMOGRAPHICS>
              <RESPONSE>
                     <VARNAME>AGE</VARNAME>
                     <VALUE>56</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>BARCODE</VARNAME>
                     <VALUE>1</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>COMPLETE</VARNAME>
                     <VALUE>YES</VALUE>
              </RESPONSE>
       </DEMOGRAPHICS>
       <HCAHPS>
              <RESPONSE>
                     <VARNAME>CG_1</VARNAME>
                     <VALUE>Yes</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>CG_11</VARNAME>
                     <VALUE>No</VALUE>
              </RESPONSE>
       </HCAHPS>
</SURVEYDATA>
<SURVEYDATA>
       <SURVEY_ID>678910</SURVEY_ID>
       <CLIENT_ID>5555</CLIENT_ID>
       <SERVICE>MD</SERVICE>
       <RECDATE>2013-01-18</RECDATE>
       <DISDATE>2012-01-12</DISDATE>
       <ANALYSIS>
              <RESPONSE>
                     <VARNAME>A1</VARNAME>
                     <VALUE>5</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>A14</VARNAME>
                     <VALUE>5</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>CP1</VARNAME>
                     <VALUE>5</VALUE>
              </RESPONSE>
       </ANALYSIS>
       <DEMOGRAPHICS>
              <RESPONSE>
                     <VARNAME>AGE</VARNAME>
                     <VALUE>44</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>BARCODE</VARNAME>
                     <VALUE>1</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>DISTRIB</VARNAME>
                     <VALUE>Paper</VALUE>
              </RESPONSE>
       </DEMOGRAPHICS>
       <HCAHPS>
              <RESPONSE>
                     <VARNAME>CG_1</VARNAME>
                     <VALUE>Yes</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>CG_11</VARNAME>
                     <VALUE>No</VALUE>
              </RESPONSE>
              <RESPONSE>
                     <VARNAME>CG_13</VARNAME>
                     <VALUE>Always</VALUE>
              </RESPONSE>
       </HCAHPS>
</SURVEYDATA>
</HEADER>

 (B) Create a table and use OPENROWSET method to load the data. Here is the script.
I saved the above file as blSurvey.xml in my c drive. Change your file name and location.
I created a table called tmp_surveydata and imported blSurvey.xml as XML data type.

--create table--
CREATE TABLE tmp_surveydata
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML, --load data as XML data type
LoadedDateTime DATETIME
)
--Load data
INSERT INTO tmp_surveyData(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
FROM OPENROWSET(BULK 'C:\blSurvey.xml', SINGLE_BLOB) AS x;

--check data
SELECT * FROM tmp_surveydata

Querying the table will show the output as  hyperlink. Data is loaded as XMLdata type.





When I click the hyperlink, the data is displayed as following:

 HEADER>
  <SURVEYDATA>
    <SURVEY_ID>1234567</SURVEY_ID>
    <CLIENT_ID>6666</CLIENT_ID>
    <SERVICE>MD</SERVICE>
    <RECDATE>2013-02-18</RECDATE>
    <DISDATE>2012-02-13</DISDATE>
    <ANALYSIS>
      <RESPONSE>
        <VARNAME>A1</VARNAME>
        <VALUE>4</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>A14</VARNAME>
        <VALUE>5</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>CP1</VARNAME>
        <VALUE>5</VALUE>
      </RESPONSE>
    </ANALYSIS>
    <DEMOGRAPHICS>
      <RESPONSE>
        <VARNAME>AGE</VARNAME>
        <VALUE>56</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>BARCODE</VARNAME>
        <VALUE>1</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>COMPLETE</VARNAME>
        <VALUE>YES</VALUE>
      </RESPONSE>
    </DEMOGRAPHICS>
    <HCAHPS>
      <RESPONSE>
        <VARNAME>CG_1</VARNAME>
        <VALUE>Yes</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>CG_11</VARNAME>
        <VALUE>No</VALUE>
      </RESPONSE>
    </HCAHPS>
  </SURVEYDATA>
  <SURVEYDATA>
    <SURVEY_ID>678910</SURVEY_ID>
    <CLIENT_ID>5555</CLIENT_ID>
    <SERVICE>MD</SERVICE>
    <RECDATE>2013-01-18</RECDATE>
    <DISDATE>2012-01-12</DISDATE>
    <ANALYSIS>
      <RESPONSE>
        <VARNAME>A1</VARNAME>
        <VALUE>5</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>A14</VARNAME>
        <VALUE>5</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>CP1</VARNAME>
        <VALUE>5</VALUE>
      </RESPONSE>
    </ANALYSIS>
    <DEMOGRAPHICS>
      <RESPONSE>
        <VARNAME>AGE</VARNAME>
        <VALUE>44</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>BARCODE</VARNAME>
        <VALUE>1</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>DISTRIB</VARNAME>
        <VALUE>Paper</VALUE>
      </RESPONSE>
    </DEMOGRAPHICS>
    <HCAHPS>
      <RESPONSE>
        <VARNAME>CG_1</VARNAME>
        <VALUE>Yes</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>CG_11</VARNAME>
        <VALUE>No</VALUE>
      </RESPONSE>
      <RESPONSE>
        <VARNAME>CG_13</VARNAME>
        <VALUE>Always</VALUE>
      </RESPONSE>
    </HCAHPS>
  </SURVEYDATA>
</HEADER>

Now, data has been loaded into SQL table as XML Data type. It’s time to process the data.
(2) Parse XML data using OPENXML and sp_xml_preparedocument: The following script is used to parse the XML document.

DECLARE @XML AS XML, @Doc AS INT, @SQL NVARCHAR (MAX)

SELECT @XML = XMLData FROM tmp_Surveydata

EXEC sp_xml_preparedocument @Doc OUTPUT, @XML

select SURVEY_ID,CLIENT_ID,SERVICE,RECDATE,DISDATE,ANALYSIS_VARNAME,ANALYSIS_VALUE,DEMOGRAPHICS_VARNAME,DEMOGRAPHICS_VALUE,HCAHPS_VARNAME,HCAHPS_VALUE
from OPENXML(@Doc,'HEADER/SURVEYDATA/ANALYSIS/RESPONSE')
WITH (
SURVEY_ID [varchar] (50) '../../SURVEY_ID',
CLIENT_ID [varchar] (50) '../../CLIENT_ID',
SERVICE [varchar](50) '../../SERVICE',
RECDATE [varchar](50) '../../RECDATE',
DISDATE [varchar](50) '../../DISDATE',
ANALYSIS_VARNAME [varchar](50) 'VARNAME',
ANALYSIS_VALUE [varchar](50) 'VALUE',
DEMOGRAPHICS_VARNAME [varchar](50) '../../DEMOGRAPHICS/RESPONSE/VARNAME',
DEMOGRAPHICS_VALUE [varchar](50) '../../DEMOGRAPHICS/RESPONSE/VALUE',
HCAHPS_VARNAME [varchar](50) '../../HCAHPS/RESPONSE/VARNAME',
HCAHPS_VALUE [varchar](50) '../../HCAHPS/RESPONSE/VALUE')

exec sp_xml_removedocument @Doc



The stored procedure sp_xml_predpareddocument prepare the xml document and retuns an integer value. It is stored in internal cache.The OPENXML function takes parameters – (a) Internal representation of an XML document created by calling sp_xml_preparedocument (b) Xpath pattern to identify the nodes in XML document.
To retrieve all survey data,I started from RESPONSE node under Analysis( HEADER/SURVEYDATA/ANALYSIS/RESPONSE) and getting VARNAME and VALUE directly from this node. To get  SURVEY_ID or CLIENT_ID, I need to go two levels up to reach SURVEYDATA node and I am using ../../ to reach SURVEYDATA node.
../ is used to reach parent node and ../../ is used to reach grand parent node.
To get Demographics information,I am going to SurveyData node from Analysis/Response node by using ../../and then navigating to DEMOGRAPHICS/RESPONSE node.


The output:




Finally, I called sp_xml_removedocument stored procedure to free the internal cache.

If we like to insert this data into a SQL table in this format we can simply change the query as follows:

SELECT @XML = XMLData FROM tmp_Surveydata

EXEC sp_xml_preparedocument @Doc OUTPUT, @XML

select SURVEY_ID,CLIENT_ID,SERVICE,RECDATE,DISDATE,ANALYSIS_VARNAME,ANALYSIS_VALUE,DEMOGRAPHICS_VARNAME,DEMOGRAPHICS_VALUE,HCAHPS_VARNAME,HCAHPS_VALUE into tableName
from OPENXML(@Doc,'HEADER/SURVEYDATA/ANALYSIS/RESPONSE')
WITH (
SURVEY_ID [varchar] (50) '../../SURVEY_ID',
CLIENT_ID [varchar] (50) '../../CLIENT_ID',
SERVICE [varchar](50) '../../SERVICE',
RECDATE [varchar](50) '../../RECDATE',
DISDATE [varchar](50) '../../DISDATE',
ANALYSIS_VARNAME [varchar](50) 'VARNAME',
ANALYSIS_VALUE [varchar](50) 'VALUE',
DEMOGRAPHICS_VARNAME [varchar](50) '../../DEMOGRAPHICS/RESPONSE/VARNAME',
DEMOGRAPHICS_VALUE [varchar](50) '../../DEMOGRAPHICS/RESPONSE/VALUE',
HCAHPS_VARNAME [varchar](50) '../../HCAHPS/RESPONSE/VARNAME',
HCAHPS_VALUE [varchar](50) '../../HCAHPS/RESPONSE/VALUE')



No comments:

Post a Comment