Parent navigations in .value results in absymal performance.
Say that you have an XML document which contains Orders and Order Details and you want to shred it into relational format. A natural way to get the details would be something like:
SELECT OrderID = T.Item.value('../@OrderID', 'int'),
ProductID = T.Item.value('@ProductID', 'smallint'),
Price = T.Item.value('@Price', 'decimal(10,2)'),
Qty = T.Item.value('@Qty', 'int')
FROM @x.nodes('/Orders/Order/OrderDetails') AS T(Item)
But performance of this can be very poor. In the repro there are 830 orders and 2155 details, which does not seem very extreme. But on my machine it takes 45 seconds to shred it. If I remove the reference to ../@OrderID, the execution time is subsecond. There is no logical reason why it would be so. There is an unambious reference to the parent, and there is no reason to search all over the XML document (or whatever you are doing.)
Upvotes: 9<=-=Aug 11 2010 6:17PM=-=>
Thanks for reporting this issue. We are aware of this performance issue and currently recommend that such queries are being rewritten using multiple nodes() methods and cross apply:SELECT OrderID = O.Item.value(‘@OrderID’, ‘int’), ProductID = T.Item.value(‘@ProductID’, ‘smallint’), Price = T.Item.value(‘@Price’, ‘decimal(10,2)’), Qty = T.Item.value(‘@Qty’, ‘int’) FROM @x.nodes(‘/Orders/Order’) AS O(Item) cross apply T.Item.nodes(‘OrderDetails’) AS T(Item)
Given that the workaround has been widely communicated, we are having this item on a low priority in our things to fix.
Please let me know if that causes any problems.
Permit me to say that I find this answer as quite arrogant. The workaround may be “widely communicated”, but I had not seen it before. Nor I can find any text in Books Online that recommends against the use of the parent axis. Thus, as widely commented as the workaround maybe, there will still be people who will use parent-axis, and therefore answering that is is low priority is not a very satisfactory from a customer perspective.
As it happens, there are two small changes needed to make the repro to perform well. The first concerns how the XML is constructed. Take out TYPE to produce an nvarchar value (this because of an issue I have reported in
The second change concerns the query itself. Change the retrieval of the attribute to:
This gives a different query plan, that does not include any table-spool operator.
OrderID and ../OrderID1 means the same thing, there is no reason that there should be different query plans, and I have reported this spearately,
As I have suggested in yet a Connect item, there is all reason to review how query plans for XML are generated. Given that it is very difficult to get accurate estimates, there is all reason to consider more rule-based optimization and avoid plans that will perform badly when the actual number of rows grossly exceeds the estimates. For instance, plans that depends on spool operators should probably be avoided.