Do you have a comment or suggestion to improve SQL Server? We’d love to hear it!

Fully support position() in xquery

Right now position() can only be used in predict in xquery. However, in lots of scenarios, the position of the xml data is significant.

DECLARE @x xml;
SET @x = N'<SelectAnswers xmlns="http://MyNamespace">
<AnswerType Type="selectOne" IsRequired="true">
<SelectAnswer>Once a month</SelectAnswer>
<SelectAnswer>Two or three times a month</SelectAnswer>
<SelectAnswer>Usually every week</SelectAnswer>
<SelectAnswer>More than once a week</SelectAnswer>
</AnswerType>
<AnswerType Type="MultipleChoice" IsRequired="true">
<SelectAnswer>This is my first time</SelectAnswer>
<SelectAnswer>Hardly ever/special occasions only</SelectAnswer>
<SelectAnswer>Less than once a month</SelectAnswer>
</AnswerType>
</SelectAnswers>'

-- Suppose you need to get the postion of AnswerType in the xml as AnswerTypeID and the position of SelectAnswer in each AnswerType as AnswerID, you can not use position() to achieve it:

;WITH XMLNAMESPACES (DEFAULT 'http://MyNamespace')
SELECT
T.answer.value('position()','int') AS AnswerTypeID,
T.answer.value('@Type', 'nvarchar(50)') as AnswerTypeName,
T.answer.value('@IsRequired', 'nvarchar(50)') as IsRequired,
TS.A.value('position()','int') AS AnswerID,
TS.A.value('.', 'nvarchar(50)') as SelectAnswer
FROM @x.nodes('SelectAnswers/AnswerType') AS T(answer)
CROSS APPLY T.answer.nodes('SelectAnswer') TS(A)
ORDER BY AnswerTypeID,AnswerID;

--It gives the following error
/*
Msg 2371, Level 16, State 1, Line 58
XQuery [value()]: 'position()' can only be used within a predicate or XPath selector
*/

-- There are two workarounds now. One is to use row_number() function:

;WITH XMLNAMESPACES (DEFAULT 'http://MyNamespace')
SELECT
TAnswerTypes.AnswerTypeID,
TAnswerTypes.AnswerTypeName,
TAnswerTypes.IsRequired,
TS.A.value('.', 'nvarchar(50)') as SelectAnswer,
ROW_NUMBER() OVER(PARTITION BY TAnswerTypes.AnswerTypeID ORDER BY TS.A.value('.', 'nvarchar(50)')) AS AnswerID
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY T.answer.value('@Type', 'nvarchar(50)')) AS AnswerTypeID,
T.answer.value('@Type', 'nvarchar(50)') as AnswerTypeName,
T.answer.value('@IsRequired', 'nvarchar(50)') as IsRequired,
T.answer.query('.') AS xmlAnswer
FROM @x.nodes('SelectAnswers/AnswerType') AS T(answer) ) TAnswerTypes
CROSS APPLY TAnswerTypes.xmlAnswer.nodes('AnswerType/SelectAnswer') TS(A)
ORDER BY AnswerTypeID,AnswerID;

-- However, the AnswerTypeID and AnswerID may be different than the postion of the data in the xml because we need to specify the order by column for row_number function

-- One more work around is to use a number table like master..spt_values (http://www.sqlserverandxml.com/2008/08/tsql-lab-23-retrieving-values-and.html):

;WITH XMLNAMESPACES (DEFAULT 'http://MyNamespace')
SELECT
TAnswerTypes.AnswerTypeID,
TAnswerTypes.AnswerTypeName,
TAnswerTypes.IsRequired,
TS.A.value('.', 'nvarchar(50)') as SelectAnswer,
--ROW_NUMBER() OVER(PARTITION BY TAnswerTypes.AnswerTypeID ORDER BY TS.A.value('.', 'nvarchar(50)'))
p1.number AS AnswerID
FROM master..spt_values p1 CROSS APPLY (
SELECT
--ROW_NUMBER() OVER(ORDER BY T.answer.value('@Type', 'nvarchar(50)'))
p.number AS AnswerTypeID,
T.answer.value('@Type', 'nvarchar(50)') as AnswerTypeName,
T.answer.value('@IsRequired', 'nvarchar(50)') as IsRequired,
T.answer.query('.') AS xmlAnswer
FROM master..spt_values p CROSS APPLY @x.nodes('SelectAnswers/AnswerType[position()=sql:column("p.number")]') AS T(answer)
WHERE p.type=N'P') TAnswerTypes
CROSS APPLY TAnswerTypes.xmlAnswer.nodes('AnswerType/SelectAnswer[position()=sql:column("p1.number")]') TS(A)
WHERE p1.type=N'P'
ORDER BY AnswerTypeID,AnswerID

-- The problem is the workaround will be slow for large xml and not convenient for queries of complex xml documents.

If position() can be used in .value method in the first query, it will be a neat,simple, and quick solution.

Note: Sample data/queires are based on a post in MS forum: http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=4152191&SiteID=1

1 vote
Sign in
Check!
(thinking…)
Reset
or sign in with
  • facebook
  • google
    Password icon
    Signed in as (Sign out)

    We’ll send you updates on this idea

    AdminMicrosoft SQL Server (Admin, Microsoft Azure) shared this idea  ·   ·  Flag idea as inappropriate…  ·  Admin →
    unplanned  ·  AdminMicrosoft SQL Server (Admin, Microsoft Azure) responded  · 

    Upvotes: 39

    <=-=Dec 3 2008 4:28PM=-=>

    Thanks for the feedback. The following shows how we would have to approach this.

    The nodes() method returns N rows one for each node selected by the expression inside the nodes() method. Each of that node is being set as the context item for the expressions inside the value/query/exist/nodes methods. The definition of the position() function is

    http://www.w3.org/TR/2007/REC-xpath-functions-20070123/#func-position

    16.1 fn:position
    fn:position() as xs:integer
    Summary: Returns the context position from the dynamic context.

    where context position is defined as:

    [Definition: The context position is the position of the context item within the sequence of items currently being processed.] It changes whenever the context item changes. When the focus is defined, the value of the context position is an integer greater than zero. The context position is returned by the expression fn:position(). When an expression E1/E2 or E1[E2] is evaluated, the context position in the inner focus for an evaluation of E2 is the position of the context item in the sequence obtained by evaluating E1. The position of the first item in a sequence is always 1 (one). The context position is always less than or equal to the context size.

    So we would need to define the context position for each of this node and consider each nodes() method to create an inner focus for the generated rows. If we would not do that, we would end up always returning 1 for every node.value(‘position()’, ‘int’).

    We will take a closer look at this in conjunction with some other improvements (like integrating hierarchyID with the nodes() method), based on the urgency of your feedback and resource availability.

    Best regards
    Michael

    <=-=Dec 3 2008 4:42PM=-=>

    Just as a quick add-on…

    currently are explaining

    select x.value(‘c’, ‘int’)
    from @x.nodes(‘/a/b’) N(x)

    as being equivalent in a sense to /a/b/c

    however, /a/b/position() based on the definition above would always return 1 for every row. So we would need to change that definition and before we do that, we need to understand the general impact.

    Thanks
    Michael

    <=-=Feb 5 2009 10:11PM=-=>

    Just wanted to correct the url mentioned above. The resource http://www.sqlserverandxml.com/2008/08/tsql-lab-23-retrieving-values-and.htm is moved to http://blog.sqlserver.me/2008/08/tsql-lab-23-retrieving-values-and.html

    <=-=Feb 14 2014 2:30AM=-=>

    Workaround: Use row_number() over (partition by 0 order by nullif(0*b.value(‘count(.)’,‘int’),0))

    <=-=Feb 14 2014 2:33AM=-=>

    Also works (but not deterministic): row_number() over (partition by 0 order by nullif(0*rand(),0))

    <=-=Dec 28 2015 9:55PM=-=>

    If position() is already allowed to be used in the context of a context-dependent predicate, i.e. [position()<=2], that means the position() is already known during execution and it just need to be output, isn’t it?

    0 comments

    Sign in
    Check!
    (thinking…)
    Reset
    or sign in with
    • facebook
    • google
      Password icon
      Signed in as (Sign out)
      Submitting...

      Feedback and Knowledge Base