Introduction: How to import an xml file to SQL Database and
how to parse the data ? In my last post I
demonstrated how to convert XML file to CSV file. Once we
have a CSV file we can import data either by Sql Server management studio import
task or by SSIS. Here I will demonstrate
a different approach in following two steps.
(1)Import XML data to SQL Database using OPENROWSET function
(2)Parse data using OPENXML function.
Steps:
(A) Get a XML File and save it. Here is a dummy xml file (blSurvey.xml) that I
am going to import to SQL Database.
<?xml version="1.0" encoding="iso-8859-1" ?>
<HEADER>
<SURVEYDATA>
<SURVEY_ID>1234567</SURVEY_ID>
<CLIENT_ID>6666</CLIENT_ID>
<SERVICE>MD</SERVICE>
<RECDATE>2013-02-18</RECDATE>
<DISDATE>2012-02-13</DISDATE>
<ANALYSIS>
<RESPONSE>
<VARNAME>A1</VARNAME>
<VALUE>4</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>A14</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CP1</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
</ANALYSIS>
<DEMOGRAPHICS>
<RESPONSE>
<VARNAME>AGE</VARNAME>
<VALUE>56</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>BARCODE</VARNAME>
<VALUE>1</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>COMPLETE</VARNAME>
<VALUE>YES</VALUE>
</RESPONSE>
</DEMOGRAPHICS>
<HCAHPS>
<RESPONSE>
<VARNAME>CG_1</VARNAME>
<VALUE>Yes</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CG_11</VARNAME>
<VALUE>No</VALUE>
</RESPONSE>
</HCAHPS>
</SURVEYDATA>
<SURVEYDATA>
<SURVEY_ID>678910</SURVEY_ID>
<CLIENT_ID>5555</CLIENT_ID>
<SERVICE>MD</SERVICE>
<RECDATE>2013-01-18</RECDATE>
<DISDATE>2012-01-12</DISDATE>
<ANALYSIS>
<RESPONSE>
<VARNAME>A1</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>A14</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CP1</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
</ANALYSIS>
<DEMOGRAPHICS>
<RESPONSE>
<VARNAME>AGE</VARNAME>
<VALUE>44</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>BARCODE</VARNAME>
<VALUE>1</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>DISTRIB</VARNAME>
<VALUE>Paper</VALUE>
</RESPONSE>
</DEMOGRAPHICS>
<HCAHPS>
<RESPONSE>
<VARNAME>CG_1</VARNAME>
<VALUE>Yes</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CG_11</VARNAME>
<VALUE>No</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CG_13</VARNAME>
<VALUE>Always</VALUE>
</RESPONSE>
</HCAHPS>
</SURVEYDATA>
</HEADER>
(B) Create a table and use OPENROWSET
method to load the data. Here is the script.
I saved the above file as
blSurvey.xml in my c drive. Change your file name and location.
I created a table called
tmp_surveydata and imported blSurvey.xml as XML data type.
--create table--
CREATE TABLE tmp_surveydata
(
Id INT
IDENTITY PRIMARY
KEY,
XMLData XML, --load data as XML data type
LoadedDateTime DATETIME
)
--Load data
INSERT INTO tmp_surveyData(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
FROM OPENROWSET(BULK 'C:\blSurvey.xml', SINGLE_BLOB) AS x;
--check data
SELECT * FROM tmp_surveydata
Querying the table will show the output as hyperlink. Data is loaded as XMLdata type.
When I click the hyperlink, the data is displayed as following:
<SURVEYDATA>
<SURVEY_ID>1234567</SURVEY_ID>
<CLIENT_ID>6666</CLIENT_ID>
<SERVICE>MD</SERVICE>
<RECDATE>2013-02-18</RECDATE>
<DISDATE>2012-02-13</DISDATE>
<ANALYSIS>
<RESPONSE>
<VARNAME>A1</VARNAME>
<VALUE>4</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>A14</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CP1</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
</ANALYSIS>
<DEMOGRAPHICS>
<RESPONSE>
<VARNAME>AGE</VARNAME>
<VALUE>56</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>BARCODE</VARNAME>
<VALUE>1</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>COMPLETE</VARNAME>
<VALUE>YES</VALUE>
</RESPONSE>
</DEMOGRAPHICS>
<HCAHPS>
<RESPONSE>
<VARNAME>CG_1</VARNAME>
<VALUE>Yes</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CG_11</VARNAME>
<VALUE>No</VALUE>
</RESPONSE>
</HCAHPS>
</SURVEYDATA>
<SURVEYDATA>
<SURVEY_ID>678910</SURVEY_ID>
<CLIENT_ID>5555</CLIENT_ID>
<SERVICE>MD</SERVICE>
<RECDATE>2013-01-18</RECDATE>
<DISDATE>2012-01-12</DISDATE>
<ANALYSIS>
<RESPONSE>
<VARNAME>A1</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>A14</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CP1</VARNAME>
<VALUE>5</VALUE>
</RESPONSE>
</ANALYSIS>
<DEMOGRAPHICS>
<RESPONSE>
<VARNAME>AGE</VARNAME>
<VALUE>44</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>BARCODE</VARNAME>
<VALUE>1</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>DISTRIB</VARNAME>
<VALUE>Paper</VALUE>
</RESPONSE>
</DEMOGRAPHICS>
<HCAHPS>
<RESPONSE>
<VARNAME>CG_1</VARNAME>
<VALUE>Yes</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CG_11</VARNAME>
<VALUE>No</VALUE>
</RESPONSE>
<RESPONSE>
<VARNAME>CG_13</VARNAME>
<VALUE>Always</VALUE>
</RESPONSE>
</HCAHPS>
</SURVEYDATA>
</HEADER>
Now, data has been loaded into SQL table as XML Data
type. It’s time to process the data.
(2) Parse XML data using OPENXML and sp_xml_preparedocument:
The following script is used to parse the XML document.
DECLARE @XML AS XML, @Doc AS INT, @SQL NVARCHAR (MAX)
SELECT @XML = XMLData FROM
tmp_Surveydata
EXEC sp_xml_preparedocument @Doc
OUTPUT, @XML
select SURVEY_ID,CLIENT_ID,SERVICE,RECDATE,DISDATE,ANALYSIS_VARNAME,ANALYSIS_VALUE,DEMOGRAPHICS_VARNAME,DEMOGRAPHICS_VALUE,HCAHPS_VARNAME,HCAHPS_VALUE
from OPENXML(@Doc,'HEADER/SURVEYDATA/ANALYSIS/RESPONSE')
WITH (
SURVEY_ID [varchar] (50) '../../SURVEY_ID',
CLIENT_ID [varchar] (50) '../../CLIENT_ID',
SERVICE [varchar](50) '../../SERVICE',
RECDATE [varchar](50) '../../RECDATE',
DISDATE [varchar](50) '../../DISDATE',
ANALYSIS_VARNAME [varchar](50) 'VARNAME',
ANALYSIS_VALUE [varchar](50) 'VALUE',
DEMOGRAPHICS_VARNAME
[varchar](50) '../../DEMOGRAPHICS/RESPONSE/VARNAME',
DEMOGRAPHICS_VALUE [varchar](50) '../../DEMOGRAPHICS/RESPONSE/VALUE',
HCAHPS_VARNAME [varchar](50) '../../HCAHPS/RESPONSE/VARNAME',
HCAHPS_VALUE [varchar](50) '../../HCAHPS/RESPONSE/VALUE')
exec sp_xml_removedocument @Doc
The stored procedure sp_xml_predpareddocument prepare the xml
document and retuns an integer value. It is stored in internal cache.The
OPENXML function takes parameters – (a) Internal representation of an XML
document created by calling sp_xml_preparedocument (b) Xpath pattern to
identify the nodes in XML document.
To retrieve all survey data,I started from RESPONSE node
under Analysis( HEADER/SURVEYDATA/ANALYSIS/RESPONSE) and getting VARNAME and
VALUE directly from this node. To get
SURVEY_ID or CLIENT_ID, I need to go two levels up to reach SURVEYDATA
node and I am using ../../ to reach SURVEYDATA node.
To get Demographics information,I am going to SurveyData node
from Analysis/Response node by using ../../and then navigating to
DEMOGRAPHICS/RESPONSE node.
The output:
Finally, I called sp_xml_removedocument stored procedure
to free the internal cache.
If we like to insert this data into a SQL table in this
format we can simply change the query as follows:
SELECT @XML = XMLData FROM
tmp_Surveydata
EXEC sp_xml_preparedocument @Doc
OUTPUT, @XML
select SURVEY_ID,CLIENT_ID,SERVICE,RECDATE,DISDATE,ANALYSIS_VARNAME,ANALYSIS_VALUE,DEMOGRAPHICS_VARNAME,DEMOGRAPHICS_VALUE,HCAHPS_VARNAME,HCAHPS_VALUE into tableName
from OPENXML(@Doc,'HEADER/SURVEYDATA/ANALYSIS/RESPONSE')
WITH (
SURVEY_ID [varchar] (50) '../../SURVEY_ID',
CLIENT_ID [varchar] (50) '../../CLIENT_ID',
SERVICE [varchar](50) '../../SERVICE',
RECDATE [varchar](50) '../../RECDATE',
DISDATE [varchar](50) '../../DISDATE',
ANALYSIS_VARNAME [varchar](50) 'VARNAME',
ANALYSIS_VALUE [varchar](50) 'VALUE',
DEMOGRAPHICS_VARNAME
[varchar](50) '../../DEMOGRAPHICS/RESPONSE/VARNAME',
DEMOGRAPHICS_VALUE [varchar](50) '../../DEMOGRAPHICS/RESPONSE/VALUE',
HCAHPS_VARNAME [varchar](50) '../../HCAHPS/RESPONSE/VARNAME',
HCAHPS_VALUE [varchar](50) '../../HCAHPS/RESPONSE/VALUE')
No comments:
Post a Comment