Windows Server 2012 Data Deduplication

Data deduplication is one of Windows Server 2012’s many new features. It’s a role that you can enable on volumes to save space. It looks for duplicate data in files on a volume and, rather than keeping the same data in multiple files, consolidates it. It is not the same as simply compressing each file. It sees duplicate data in files and keeps only one copy of that data, while the other files hold pointers to it. This saves space. From an end user’s standpoint it’s seamless, and they don’t even know what is going on in the background. For detailed information please refer to the TechNet article found at http://bit.ly/MBTvdo .

Data deduplication is not recommended on volumes that contain files that are locked open. This means running virtual machines, SQL 2012 data files, Exchange data files, and files like that. Data deduplication is designed for your file shares, software deployment shares, VHD libraries, offline VMs, and other files that are not locked open all the time. One thing to note is that data deduplication will not work on ReFS-formatted volumes. ReFS is designed for locked-open files like running virtual machines, SQL 2012 data files, Exchange data files, etc. Data deduplication will only work on NTFS volumes.

The first thing to do is install the data deduplication role. It’s under File and Storage Services, File and iSCSI Services. When you select it you may get a prompt to install some other features. A reboot is not required after you install the role.

The next step is to configure it. It’s as simple as right-clicking the volume in Server Manager, clicking Configure Data Deduplication, and supplying a few bits of information. To demonstrate that data deduplication will not work on ReFS volumes, you’ll see in the screenshot below that the option is not available. Just because ReFS is newer than NTFS doesn’t mean it’s better and should be chosen for everything. NTFS should still be used for general file storage, where files get opened and closed.
ReFS is great for files that are open and locked most of the time. After I formatted the drive as NTFS, the option is now available.

Before we actually enable data deduplication, let’s use the ddpeval.exe tool. It’s installed in C:\Windows\System32 when you install the data deduplication role. This tool gives you an estimate of how much space you’ll save on a volume if you decide to use data deduplication. It’s a command-line tool with some switches. I’m just going to run it against my E drive, where I have 3 VHDs and some XML files totaling 24.9 GB.

In the screenshots above you’ll see that running ddpeval.exe uses the processor a lot: between 50% and 75% of your CPU cycles. I wouldn’t recommend running it during the day. Also, depending on how many files you have and how large they are, it could take some time to run. With 3 large offline VHDs and a few XML files, the ddpeval.exe process took 19 minutes.

According to ddpeval.exe, if I decide to deduplicate this volume it should go from 24.9 GB to 5.13 GB! WOW, let’s do it!

The next few screenshots show setting up data deduplication. I’m going to enable it on volume E, deduplicate files older than 0 days (the default is 5, but since this server was built today anything other than 0 will not work), and start it at 10:45 AM. I’m using the scheduled method, which uses more CPU time but lets you choose when it runs. The other option is to enable background optimization, which runs at low CPU priority when the server is idle, but for demonstration purposes I wanted to start this manually. I’m also not going to exclude anything, as this is a demonstration. Depending on your servers you might exclude certain things.

That’s about it. When you set this up it’s basically a scheduled task. In the next few screenshots you’ll see it running with high CPU utilization. Remember that we kicked this off manually, and when you do that it runs at normal priority.
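
While the job runs, it may help to picture the "one copy plus pointers" idea described at the start of this post. The Python snippet below is only a toy sketch of that idea, not how Windows implements it: the 4-byte fixed chunk size, the file names, and the `deduplicate` and `rebuild` helpers are all made up for illustration, and the real feature is far more sophisticated.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, an assumption purely for illustration


def deduplicate(files):
    """Split each file into chunks and store every unique chunk only once.

    Returns (chunk_store, pointers): chunk_store maps a hash to the chunk
    bytes, and pointers maps each file name to its ordered list of hashes.
    """
    chunk_store = {}
    pointers = {}
    for name, data in files.items():
        hashes = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_store.setdefault(digest, chunk)  # keep one copy per unique chunk
            hashes.append(digest)
        pointers[name] = hashes
    return chunk_store, pointers


def rebuild(name, chunk_store, pointers):
    """Reassemble a file by following its pointers into the chunk store."""
    return b"".join(chunk_store[d] for d in pointers[name])


# Two hypothetical files that share their first two chunks.
files = {
    "a.txt": b"AAAABBBBCCCC",
    "b.txt": b"AAAABBBBDDDD",
}
store, ptrs = deduplicate(files)
logical_size = sum(len(d) for d in files.values())
stored_size = sum(len(c) for c in store.values())
print(logical_size, stored_size)
```

The two files add up to 24 bytes, but only 16 bytes of unique chunks need to be stored, and each file can still be rebuilt exactly from its pointers. That is why users never notice anything has changed.
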
It took 36 minutes, but in the end enabling data deduplication saved a lot of room. You’ll see in the properties window that the used space is now 6.46 GB! Data deduplication can’t be used for everything, but if you have a file server it’s worth looking into.
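
To put numbers on "a lot of room", here is a short Python sketch that compares the ddpeval.exe estimate with the actual result. It uses only the three figures from this post. The `savings` helper is a made-up name, and the sketch assumes the 6.46 GB of used space can be compared directly against the original 24.9 GB of files.

```python
def savings(before_gb, after_gb):
    """Return (space saved in GB, percentage saved) for a volume."""
    saved = before_gb - after_gb
    return saved, saved / before_gb * 100


original_gb = 24.9    # data on the E drive before deduplication
estimated_gb = 5.13   # size predicted by ddpeval.exe
actual_gb = 6.46      # used space shown in the properties window afterwards

est_saved, est_pct = savings(original_gb, estimated_gb)
act_saved, act_pct = savings(original_gb, actual_gb)

print(f"Estimated: {est_saved:.2f} GB saved ({est_pct:.1f}%)")
print(f"Actual:    {act_saved:.2f} GB saved ({act_pct:.1f}%)")
```

By this arithmetic the estimate works out to about 79% savings and the actual result to about 74%, so ddpeval.exe was in the right ballpark for this volume.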