This behavior is by design and predates the UTF-8 implementation. In any comparison between non-UTF-8 and UTF-8 data, both sides are always converted to UTF-16. A design change would have to be evaluated for a future version.
Thank you for taking the time to work with this preview feature. As it stands, a comparison between UTF-8 and non-Unicode data is done by converting both to UTF-16. We will investigate further as we continue to develop this preview feature.
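To make the conversion step concrete, here is a minimal Python sketch of comparing UTF-8 data with non-Unicode data by bringing both sides to UTF-16 first. It is only an illustration of the idea, not the engine's actual code: the function name `equal_via_utf16` and the choice of `cp1252` as the non-Unicode code page are assumptions, and collation rules are left out entirely.

```python
def equal_via_utf16(utf8_bytes: bytes, legacy_bytes: bytes,
                    legacy_codepage: str = "cp1252") -> bool:
    """Compare two byte strings after converting both to UTF-16.

    Hypothetical helper: `legacy_codepage` stands in for whatever
    non-Unicode code page the second value is stored in.
    """
    # Convert each side from its own encoding to UTF-16 (little-endian, no BOM).
    left = utf8_bytes.decode("utf-8").encode("utf-16-le")
    right = legacy_bytes.decode(legacy_codepage).encode("utf-16-le")
    # Only the common UTF-16 forms are compared, never the raw bytes.
    return left == right


# The raw bytes differ, but the UTF-16 forms are identical.
print(equal_via_utf16("café".encode("utf-8"), "café".encode("cp1252")))
```

The point of the sketch is that neither side is compared in its stored encoding, so both values pay a conversion cost on every comparison.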
Thank you for providing the examples of handling invalid characters. While we are trying to detect invalid sequences as we encounter them, we also need to be cautious about performance. As such, the current behavior is by design, under the general principle that if a partially invalid character sequence is input but certain characters were valid, the valid part of the character sequence may be output without error. If a fully invalid sequence is input, an error is generated. This doesn’t mean that the logic of handling damaged sequences cannot be improved, so we encourage you to share your suggestions on how to make it better.
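The general principle described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the product's implementation: `decode_lenient` is a hypothetical name, and treating "fully invalid" as "no valid character could be recovered" is one possible reading of the rule.

```python
def decode_lenient(data: bytes) -> str:
    """Return the valid part of a UTF-8 byte sequence.

    Hypothetical helper: raises only when the input is non-empty
    and contains no valid UTF-8 characters at all.
    """
    # errors="ignore" silently drops bytes that do not form valid UTF-8.
    text = data.decode("utf-8", errors="ignore")
    if data and not text:
        # Nothing valid survived, so the whole sequence is treated as invalid.
        raise ValueError("input contains no valid UTF-8 characters")
    return text


# Partially invalid input: the valid characters are returned without error.
print(decode_lenient(b"ab\xffc"))

# Fully invalid input: an error is raised instead.
try:
    decode_lenient(b"\xff\xfe")
except ValueError as exc:
    print("error:", exc)
```

A stricter design would reject any input containing an invalid byte, at the cost of validating every sequence up front, which is the performance trade-off mentioned above.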
Thank you for taking the time to work with this preview feature. We will investigate.
@sqlkiter also available in the Tiger Toolbox is https://github.com/Microsoft/tigertoolbox/tree/master/MaintenanceSolution