Fix ReFS or allow pre-MBS storage option in DPM 2016
MBS has clearly been broken for a long time, and since a fix is not imminent, please allow us to return to the DPM 2012 R2 style of storage.
Even on 100% SSD storage (we have a 30TB enterprise SSD backend), performance degrades by up to 50% after about 6 months. It's still relatively fast, but there is clearly a lot more going on than simple fragmentation. Fragmentation shouldn't matter on SSDs because there is no seek time, yet it does. Something is broken in how ReFS chains files together for the block cloning feature, which is what DPM relies on.
Alexander Klimenok commented
How to disable ReFS on SCDPM 2016/2019.
You must use a cleanly installed SCDPM 2016/2019 with no PGs on ReFS, or first remove all PGs and MBS storage.
Run regedit and create 2 registry values under:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Configuration\DiskStorage
and these values
You also need to replace RHL.dll in the DPM bin directory for 2019CU2/2016CU10. Get the DLL from MS Support.
Flash storage doesn't prevent degradation; the fragmentation still occurs due to what is, IMO, poor software design. The 4% SSD storage is a band-aid; it just provides enough IOPS for the fragmented data to mask the problem.
Reading between the lines here and on the Veeam forums, there are two issues. First, ReFS is poorly implemented in the OS. It's especially bad in 2016 and somewhat better in 2019, but it still requires a large number of registry tweaks and a couple of hotfixes to remotely work, if you can call its current state working.
The second issue is the lack of fragmentation management. I don't know if this is DPM's fault or the OS's fault, but we have known how to deal with fragmentation for a long time. Increasing the ReFS chunk size from 4KB to 64KB, as Veeam does, would cut the fragmentation problem by a factor of 16 at the cost of some disk space efficiency.
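The factor of 16 is simply the ratio of the two sizes. A rough Python sketch, assuming the worst case where every allocation unit ends up as its own fragment, and using a hypothetical 40 GiB file (the function name and file size are made up for illustration):

```python
KIB = 1024
GIB = 1024 ** 3

def max_fragments(size_bytes: int, unit_bytes: int) -> int:
    """Worst case: each allocation unit is a separate fragment (ceiling division)."""
    return -(-size_bytes // unit_bytes)

file_size = 40 * GIB  # hypothetical file size, not a measured value
frags_4k = max_fragments(file_size, 4 * KIB)
frags_64k = max_fragments(file_size, 64 * KIB)

print(frags_4k)               # worst-case fragment count at 4 KiB
print(frags_64k)              # worst-case fragment count at 64 KiB
print(frags_4k // frags_64k)  # ratio between the two
```

This only bounds the worst case; how fragmented a real volume gets depends on the workload.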
Other than DPM2016/2019, I haven't had to care about managing disk fragmentation in 15 years. I don't understand why MS created a file system that is either intrinsically broken or at best requires Herculean out-of-band management effort to keep running. I also don't understand why it has been broken for 5 years.
I was interested in Azure Backup, but I wanted to be able to do host level Hyper-V VM backups, and I wanted to manage local disk backups in the same place - so I started doing a lot of research into DPM. I found this feedback page, and the link AlbertoL posted at the bottom of this thread. I ended up very disappointed.
I did note Microsoft's recommendation in their DPM documentation to use tiered storage with a small SSD element. I do understand the advantages, and I also read the Azure Team comment on 1st September chiming in with that just below.
However, a lot of people are talking about ReFS performance that is good when storage is freshly configured but then deteriorates over ~3 months, after which the storage has to be reformatted.
For me to trust them, and to invest further time in testing/research + potentially storage hardware, I need Microsoft to explain clearly how the use of e.g. 4% SSD storage would ACTUALLY PREVENT DEGRADATION OVER TIME specifically, as opposed to providing a general performance uplift. I'm not yet sure about that.
More detail on precisely how to avoid ReFS problems, including various storage configuration examples, would be welcome. This can be complicated: once you include virtualisation (say Hyper-V) and the potential use of storage pools, JBOD, RAID, pass-through and so on, there are quite a few ways to skin the cat as you take the physical disks up through the stack from host disk controller, through host OS, to guest DPM server OS. I'm not clear precisely what ReFS needs the admin to AVOID configuring, if indeed avoiding anything in particular will definitely help.
Dear Azure Team on UserVoice, throwing hardware at this design problem is a band-aid. Increase the ReFS chunk size from 4K to 64K like your competitors do, build in some periodic defrag, or something. I'm attempting a storage migration of a 380GB, highly fragmented protection item, and at the rate it's going it will take 53 hours to complete, and this is on a mid-size SAN.
This is utterly unusable.
P.S. For the next 53 hours my DPM server is dead in the water, as storage migration preempts all other jobs on the server. I hope nobody needs a restore.
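For scale, 380GB in 53 hours works out to roughly 2 MB/s. A quick Python check, assuming decimal units (1 GB = 10^9 bytes) and a hypothetical helper name:

```python
def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1e9 bytes and 1 MB = 1e6 bytes."""
    seconds = hours * 3600
    return (size_gb * 1e9) / seconds / 1e6

# Figures taken from the comment above: 380GB projected to take 53 hours.
rate = throughput_mb_per_s(380, 53)
print(f"{rate:.2f} MB/s")
```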
We strongly recommend that you use tiered storage with DPM. With a small amount of SSD (4% of your total storage for DPM), DPM performance can be much better. The blog gives a bit more detail about tiered storage and how it helps improve backup performance.
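As a sizing illustration only, here is a small Python sketch of the 4% figure. It assumes the 4% is taken against the total DPM storage capacity; the 100 TB example pool and the function name are hypothetical:

```python
def ssd_tier_size_tb(total_storage_tb: float, ssd_percent: float = 4) -> float:
    """SSD tier size as a percentage of total DPM storage (4% per the recommendation)."""
    return total_storage_tb * ssd_percent / 100

# Hypothetical 100 TB DPM storage pool.
ssd_tb = ssd_tier_size_tb(100)
print(f"{ssd_tb} TB of SSD for a 100 TB pool")
```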
Agree with this - ReFS is nonfunctional with DPM backups. RCT is nice, but the fragmentation it creates is untenable. I attempted a FS-level defrag of our volume; it ran for 4 weeks just trying to build a catalog of fragments. It never even began the defrag.
DPM has always been a poor product - early on it was the ridiculous VSS partition engine that broke twice a week. Now they fixed that so DPM doesn't crash, but you have to rebuild the server every 3 months or nightly backups start taking more than 24 hours.
DPM 2016 UR9 includes a note about a new PowerShell cmdlet parameter, -CheckReplicaFragmentation.
I ran that plus the copy step on one of my VMs. It still took 90 minutes to back up 40GB; 5 minutes longer than it took yesterday.
MS supposedly "dogfoods" their own products. They must really like the taste of dog food then, because that's what this is. We all knew Ballmer was a sales moron. Satya is supposedly a visionary engineer. As much as I like the guy's personality, the majority of products MS has been churning out in his tenure have been unusably broken or unfinished, with newer versions arguably worse than the previous.
Same situation at our site. Since moving from DPM on Server 2012 to DPM 2019 on Server 2016/2019, we are forced to use ReFS and Modern Backup Storage. Generally a good idea, if it worked. But after some weeks backups become extremely slow and unreliable. A ticket with our Premier Support came back with the advice to buy new storage with SSD disks. A very ridiculous "solution" in my opinion.
I have now installed DPM 2019 on Windows Server 2012R2 using the same SAN storage "that is too slow for ReFS". Now I am able to use dynamic NTFS volumes again, and the backups are stable. MICROSOFT, FIX THIS ERROR.
Since upgrading 2 x 2012 R2 DPM servers to 2019 MABS with ReFS we have experienced nothing but issues, and we are now being advised to purchase SSD tiered storage, which will be triple the cost!!
Richard Swainsaton commented
Please, Please Fix this!
Aaron Arnold commented
I agree, this has gone on for WAY WAY too long, and now that we have to wait for other fixes as well, this is becoming a bit of a joke if you ask me.
This is ridiculous. They must be ashamed of themselves.
still no solution.
Alexander Kanakaris commented
Issues for us too, with DPM2019 on Server 2016, where System State backups suddenly stop working!
Can't believe it's been 2 years and still no fix. This is making DPM an unusable product both in version 2016 AND 2019.
Andrej Trusevic commented
Fix it finally
Anatolie Criucov commented
This is out of control and has not been fixed for many years. MBS and ReFS are not working together. We are spending hours on this every single day. When can we expect the fix?
John Westerwell commented
After reading the thread posted below, we tried running DPM2019 on Windows 2012 R2 in a test environment. While suspicious at first, so far there are no ill effects and backups complete properly.
We will "upgrade" the current Windows 2019 to the older system, because LBS is the only solution we have found that works.
Andrew McCarthyy commented
ReFS/MBS is useless. The interim solution for all who come here is Server 2012R2/DPM2019.
The combo works perfectly, even though it is not officially supported, and allows 2019 workloads AND legacy storage!
Jon Lee commented
MBS and ReFS are garbage.
DPM with MBS on ReFS UNUSABLE!!!