Performance of shredding the XML data type using nodes() / value().
The performance of shredding XML to rows using nodes() and value() degrades linearly with the number of columns returned.
The reason is that each column is handled by a branch of its own in the query plan: a nested loop join against a table-valued function that returns that single value.
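As a minimal sketch of that pattern (the table, column and element names below are made up for illustration), a typical shredding query has one value() call per output column, and each of those calls shows up as its own function call in the plan:

-- One nodes() call to produce the rows, then one value() call per column.
SELECT  t.id,
        x.n.value('(Name/text())[1]',  'nvarchar(100)') AS Name,
        x.n.value('(Price/text())[1]', 'decimal(10,2)') AS Price,
        x.n.value('(Qty/text())[1]',   'int')           AS Qty
FROM    dbo.Orders AS t
CROSS APPLY t.OrderXml.nodes('/Order/Item') AS x(n);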
Comparing the performance of shredding with nodes() against OPENJSON, I have seen that as soon as three or more columns are shredded, OPENJSON is the clear winner.
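For comparison, the JSON equivalent (again with made-up names) returns all the rows and columns from a single table-valued function in the plan:

-- One OPENJSON call with a WITH clause produces every column at once.
SELECT  t.id, j.Name, j.Price, j.Qty
FROM    dbo.Orders AS t
CROSS APPLY OPENJSON(t.OrderJson, '$.Items')
            WITH (Name  nvarchar(100) '$.Name',
                  Price decimal(10,2) '$.Price',
                  Qty   int           '$.Qty') AS j;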
My suggestion is that you build a way of shredding XML similar to how JSON is shredded: an OPENXML that can be used in queries the same way as OPENJSON.
That way the plan could, like an OPENJSON plan, have one table-valued function that returns all the rows and columns from the shredding, instead of, as it is now, one function call for the nodes() method plus one per column specified per row.
More complicated shredding would still have to use the XML data type methods nodes(), value() and so on, but the simple cases could be handled by the new OPENXML with much better performance.
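Purely as an illustration of the idea (this syntax does not exist today; the current OPENXML requires a document handle from sp_xml_preparedocument and cannot be applied inline like this), the proposed usage might look roughly like:

-- Hypothetical inline OPENXML, mirroring the OPENJSON WITH clause.
SELECT  t.id, x.Name, x.Price, x.Qty
FROM    dbo.Orders AS t
CROSS APPLY OPENXML(t.OrderXml, '/Order/Item')   -- imagined inline form
            WITH (Name  nvarchar(100) 'Name',
                  Price decimal(10,2) 'Price',
                  Qty   int           'Qty') AS x;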
One way to improve the performance of shredding large XML documents with nodes() today is to pre-shred the XML into smaller chunks in a temp table and then do the final shredding from that table.
The new OPENXML could be used to do the same within one query: pre-shred using OPENXML to return multiple rows of smaller XML fragments, and then do further processing on those fragments with the existing XML methods.
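A sketch of the pre-shredding workaround as it can be written today (names are illustrative): first break each large document into per-item fragments in a temp table, then shred the small fragments:

-- Step 1: split the large documents into one small XML fragment per row.
SELECT  x.n.query('.') AS ItemXml
INTO    #Items
FROM    dbo.Orders AS t
CROSS APPLY t.OrderXml.nodes('/Order/Item') AS x(n);

-- Step 2: shred the small fragments into columns.
SELECT  i.ItemXml.value('(/Item/Name/text())[1]',  'nvarchar(100)') AS Name,
        i.ItemXml.value('(/Item/Price/text())[1]', 'decimal(10,2)') AS Price,
        i.ItemXml.value('(/Item/Qty/text())[1]',   'int')           AS Qty
FROM    #Items AS i;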
Upvotes: 8

Microsoft response (Mar 4 2017, 9:33 AM):
We understand the problem. XML is designed differently, and the goal was to provide a rich query language with XPath support, which might require a lot of memory and processing logic. OPENJSON is designed to be more lightweight: it just scans the JSON text and returns values where it finds them. There are pros and cons to both approaches (similar to the pros and cons of DOM and SAX parsers).
The key differentiator between JSON and XML is that JSON is better for scan-based processing, where you just pick a few values out of the JSON text, while XML is better for rich querying and indexing.
I will keep this item open and let people vote for it; however, we cannot guarantee that this kind of redesign of XML will be done in the near future.
If you need to use shredded nodes in filters, my advice would be to use XML indexes, which might speed up your query.
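For reference, a minimal example of the XML indexes mentioned above (table and column names are assumed): a primary XML index, optionally followed by a secondary PATH index that helps path-based filters:

-- Requires a clustered primary key on dbo.Orders.
CREATE PRIMARY XML INDEX IX_Orders_OrderXml
    ON dbo.Orders (OrderXml);

-- Optional secondary index tuned for path-based predicates.
CREATE XML INDEX IX_Orders_OrderXml_Path
    ON dbo.Orders (OrderXml)
    USING XML INDEX IX_Orders_OrderXml
    FOR PATH;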
SAinCA Mi9 commented
Our need is for .nodes to be far faster when processing large XML payloads, i.e., thousands of 40-element rows, where the nested loop joins fill the screen! Importing millions of rows, albeit in batches, could be faster, please. Sadly, this isn't a one-time import deal, it's regular. Thanks!