Storage vendors, including but reportedly not limited to Western Digital, have quietly begun shipping SMR (Shingled Magnetic Recording) disks in place of earlier CMR (Conventional Magnetic Recording) disks.
SMR is a technology that allows vendors to eke out higher storage densities, netting more capacity from the same number of platters, or the same capacity from fewer platters.
Until recently, the technology has only been seen in very large disks, which were typically clearly marked as “archival”. In addition to higher capacities, SMR is associated with much lower random I/O performance than CMR disks offer.
This is going to be another one of those stupid things we technology buyers have to look out for when buying storage, isn’t it?
I nearly bought one before seeing the performance issues come up in a review.
The main problem is that manufacturers and stores are not clearly labeling these, and it’s not until you experience bad performance that you realize something is “wrong”. Even armed with the exact model number, the manufacturer spec sheets can’t be relied on to tell us the facts.
The statements WD made in the article are deceptive as hell!
Now WD’s got a lawsuit.
https://arstechnica.com/gadgets/2020/05/western-digital-gets-sued-for-sneaking-smr-disks-into-its-nas-channel/
I hope all manufacturers take note. Selling SMR drives isn’t the problem, but not clearly and openly disclosing the fact is.
SMR has a place, but that place is not NAS disks, which are supposed to provide high performance and low latency. WD knew this, and so did the other manufacturers (I don’t have a list, but I think virtually all HDD manufacturers ran a similar scheme).
And if you think about it, 10+TB disks are excluded from this. They would actually benefit from SMR, but their customers are more savvy. (If you fill your NAS with 4TB disks, you don’t actually need a NAS at this point in time.) So I think they assumed those customers would not notice the difference.
Those buying the higher capacity ones would obviously notice the difference. One of the first things I do when I get a new disk is actually testing it for performance and defects (and let’s not go into having bad blocks before filling a single pass).
It is nice to see that they could not get away with it. Hope this becomes a lesson.
sukru,
I don’t have an issue if they make a good-faith attempt to provide the information. However, it’s downright unethical to hide it and omit it from the specs. It’s got nothing to do with whether it’s a “NAS drive” or an “x TB capacity” drive; a lot of applications are affected by the bad random access performance regardless of capacity and regardless of what kind of system the disk is plugged into.
Ultimately the consumer needs to have the facts in order to make an informed choice.
That’s a good practice, although a lot of people test by copying large files without testing random access, which can be a more significant bottleneck (YMMV).
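For what it’s worth, here’s roughly the kind of sanity check I mean. This is a minimal Python sketch under a few assumptions, not a real benchmark (a dedicated tool like fio is better for that), and /dev/sdX is a placeholder path; the point is just that a handful of scattered 4K reads tells you far more about a drive’s latency than copying one big file does.

```
#!/usr/bin/env python3
"""Rough sequential-vs-random read comparison for a disk (illustrative only, Linux)."""
import os, random, time

PATH = "/dev/sdX"        # placeholder device or big file to test; change this
SEQ_BLOCK = 1024 * 1024  # 1 MiB sequential reads
RND_BLOCK = 4096         # 4 KiB random reads
RND_COUNT = 2000         # number of random reads to sample

fd = os.open(PATH, os.O_RDONLY)
size = os.lseek(fd, 0, os.SEEK_END)

# Ask the kernel to drop cached pages so we actually hit the disk.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

# Sequential pass over the first ~1 GiB.
t0 = time.monotonic()
done = 0
while done < min(size, 1 << 30):
    done += len(os.pread(fd, SEQ_BLOCK, done))
seq_mbps = done / (1 << 20) / (time.monotonic() - t0)

# Random 4 KiB reads scattered across the whole device.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
t0 = time.monotonic()
for _ in range(RND_COUNT):
    offset = random.randrange(0, size - RND_BLOCK) & ~4095  # 4K-aligned offset
    os.pread(fd, RND_BLOCK, offset)
iops = RND_COUNT / (time.monotonic() - t0)

os.close(fd)
print(f"sequential: ~{seq_mbps:.0f} MB/s, random 4K: ~{iops:.0f} IOPS")
```

Note that drive-managed SMR mostly reads like a normal disk; it’s sustained and random writes (and RAID rebuilds) where it falls over, so a read-only test like this mainly catches generally weak random I/O rather than SMR specifically. Write tests are destructive, so only run those against a scratch disk.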
I hope so. We deserve to know what we’re buying! If consumers are willing to buy the drives, say at a discount, then that is OK. But it’s totally unfair to pawn these off on unsuspecting consumers. I seriously hope the lawsuit forces manufacturers to truthfully label the disks.
NO, for high performance and low latency you use (caching) SSDs. Most NAS disks are only used by one user at a time for storing big files (audio/video/photo/compressed-backups). This would actually be a good match for SMR: budget disks in a network-attached device handling large file transfers.
If you are doing web development (many small files): use (caching) SSDs.
If you are running a (small) business: a proper file server with SSDs might be better than a NAS.
If you are running a large business: a SAN, not a NAS.
However, what a NAS needs is going to depend on what people put on it; generalizing isn’t helpful. You can’t just say “these are good for NAS” without knowing more about the file access patterns on the NAS server. Realistically, some users (myself included) use a NAS as regular storage and not just for archives/backups, so one cannot automatically say SMR is always suited for “NAS”. Sure, we understand that, but unfortunately that’s not the way these drives are being marketed.
A separate issue that comes up when you read more complaints about SMR drives is that they’re being auto-marked as defective in some RAID arrays because they’re timing out.
I’d argue that if there’s any application these are well suited for, it’s probably media PCs / security DVRs, which process long continuous streams all day.
It depends. Hard drives are significantly cheaper at large capacities, and they don’t suffer from bad write endurance, which has been getting worse with SSDs as they trade off lifetime for more capacity. I’d advise everyone to look at the write endurance on SSDs when making purchasing decisions. You can buy enterprise SSDs that have more endurance, but these are extremely pricey, especially if you need to buy many for RAID. SSDs give a nice performance boost, but until SSDs evolve to the point where there are no tradeoffs, I’d say hard disks will continue to have merit for some users and businesses. YMMV.
Funnily enough, the Purple line for video surveillance systems is the first consumer application of SMR drives from WD.
Yes, capacities above 1TB. Below that, there isn’t a reason to buy an HD over an SSD.
This isn’t as true as it used to be. Write endurance has stayed stable and gotten better in some cases.
Looking at a 1TB WD Blue with 3D NAND, which is a mainstream drive and not an ultra-expensive prosumer drive, it has an MTTF of 1.75M hours and a TBW of 400TB. People don’t write 1TB of data a day. That is a crazy high number for most people, and it is more than adequate for the vast majority of use cases.
https://www.newegg.com/western-digital-blue-1tb/p/N82E16820250088
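To put 400 TBW in perspective, here’s a quick back-of-the-envelope calculation in Python; the daily write volumes are just illustrative guesses, not measurements of anyone’s workload:

```
# Rough time-to-exhaust-endurance at a few assumed daily write volumes.
TBW_BYTES = 400e12                  # rated endurance: 400 TB written
for gb_per_day in (20, 50, 100):
    days = TBW_BYTES / (gb_per_day * 1e9)
    print(f"{gb_per_day:>3} GB/day -> ~{days / 365:.0f} years to reach 400 TBW")
# 20 GB/day -> ~55 years; 50 GB/day -> ~22 years; 100 GB/day -> ~11 years
```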
That will never happen because we have to deal with physics. HDs have the problem of being mechanical devices, and SSDs have the problem of write endurance. Both are wear items which need to be checked periodically. They’re like car tires.
Overall, SSDs have similar durability to HDs, better in some metrics, and much better performance. HDs are niche tech for home users and are becoming niche tech in the datacenter, relegated to storage arrays.
Flatland_Spider,
P/E cycles have only gotten worse as they pack cells more and more densely. Flash chips designed for TLC/QLC bit densities are spec’d down to as low as 1k cycles for raw storage. Of course there are mitigations that drive manufacturers can employ in the controllers to mask the low endurance of modern NAND chips:
– wear leveling algorithms to balance out the remaining life across all cells (see the sketch after this list)
– over-provisioning (a big difference between enterprise and consumer SSDs is substantial over-provisioning to increase the overall lifespan)
– more ECC bits
– using a more durable SLC/MLC cache in front of the TLC/QLC storage
– compressing data to occupy fewer cells
– battery-backed cache
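To make the first item concrete, here’s a toy wear-leveling sketch in Python. It’s purely illustrative and not modeled on any real controller’s flash translation layer; the names and block counts are made up:

```
# Toy wear leveling: logical blocks are remapped so each write lands on the
# least-erased free physical block, spreading P/E cycles evenly.
import heapq

class ToyFlash:
    def __init__(self, physical_blocks):
        self.erase_count = [0] * physical_blocks
        self.mapping = {}                    # logical block -> physical block
        self.data = {}                       # physical block -> contents
        # free blocks kept in a heap ordered by how worn they already are
        self.free = [(0, pb) for pb in range(physical_blocks)]
        heapq.heapify(self.free)

    def write(self, logical_block, payload):
        # Instead of rewriting in place (which would wear one block out),
        # retire the old physical block and pick the least-worn free one.
        if logical_block in self.mapping:
            old = self.mapping.pop(logical_block)
            self.erase_count[old] += 1       # old block must be erased for reuse
            heapq.heappush(self.free, (self.erase_count[old], old))
        _, pb = heapq.heappop(self.free)
        self.mapping[logical_block] = pb
        self.data[pb] = payload
        # (real controllers also migrate *static* data off barely-worn blocks)

flash = ToyFlash(physical_blocks=8)
for _ in range(1000):
    flash.write(logical_block=0, payload=b"x")   # hammer one logical block
print(flash.erase_count)   # wear ends up spread across all 8 physical blocks
```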
But it’s gotten harder to find the detailed specs in the last decade as manufacturers have become less forthcoming.
It does not say “it has an MTTF of 1.75M hours and a TBW of 400TB”; it says “MTTF Up to 1.75M hours” and “up to 400 terabytes written (TBW)”. I know we’re programmed to ignore that these days, but it does change the meaning. Also, the estimate is probably under ideal conditions (i.e., long sequential writes). For random access workloads, fragmentation/write amplification takes a larger toll on flash. As always, engineers can employ mitigations, but given that no detailed specs are published, it’s impossible for us to know whether they actually did. Unless you buy an enterprise drive, you genuinely don’t know how much the manufacturer compromised on endurance to save on manufacturing costs.
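To illustrate how much the workload can matter, here’s a rough sketch; the write amplification factors are assumptions for illustration, not figures for this or any particular drive:

```
# Hypothetical effect of write amplification (WAF) on usable endurance.
# Assumption: the 400 TBW rating roughly reflects a near-ideal workload
# with a WAF around 1.1; a workload with a higher WAF burns through the
# same NAND sooner.
RATED_TBW = 400
RATED_WAF = 1.1
for label, waf in (("mostly sequential", 1.1), ("mixed", 2.0), ("heavy random 4K", 4.0)):
    effective_tbw = RATED_TBW * RATED_WAF / waf
    print(f"{label:>17}: ~{effective_tbw:.0f} TB of host writes")
# mostly sequential: ~400 TB; mixed: ~220 TB; heavy random 4K: ~110 TB
```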
I’ve had hard disks running for decades in systems with no signs of failing. I’ve also had HDs fail, but by far most still work. I’m curious whether the hard drives that went on to fail could have been distinguished, when brand new, from those that did not. It would probably be feasible to make magnetic drives that last a lifetime with better manufacturing tolerances, but then again they’d probably become obsolete before they fail.
Flatland_Spider,
Not in the same sense, though. Given enough time, NAND hardware will eventually break, and hard disks will eventually break, in the sense that everything eventually “breaks” due to component failures, with the average time described by MTBF. But on top of that, NAND chips wear out (and become slower too) predictably over time as they’re written to. That type of wear is specific to SSDs.
In other words, for hard disks failure over time is a probability, whereas an SSD wearing out over time is a function of how many times its cells were erased and rewritten. An HD that’s written 400PB might manage to write another 400PB, whereas an SSD will have used up all of its write endurance. With this in mind, I have to object to characterizing hard drives as “wear items” in the same sense that SSDs are.
I still object to your classifying HDs as a “wear item”. Presumably the CD/radio in your car has an MTBF, but it is not a “wear item” in the same sense as the brakes on your car. One literally wears out, whereas the other only breaks if you are abusive and/or unfortunate. I know it’s nitpicking, but when you call every electronic or mechanical item a “wear item”, it truly loses all meaning. There is a distinction to be made between products that could fail and those that are expected to wear out during their lifetime.
I found your lightbulb example kind of ironic, given that more expensive LED lighting technology that lasts longer is replacing cheaper incandescent bulbs that need to be replaced more frequently, so I don’t think it was the best example to make that point.
Anyways, we’re kind of off track here, haha. I know what you’re saying and I don’t fundamentally disagree; longevity isn’t the end-all be-all, especially at lower price points. The main thing is that consumers know what they’re buying. Manufacturers need to improve in this area.
For high performance and low latency, you buy lots of servers with adequate RAM and SSDs from the get-go.
SSD caches aren’t as useful as people think. Performance drops off considerably when the cache gets full or data outside the cache is requested, and they are still slower than RAM. Moral of the story: buy more RAM first.
LOL No.
USB drives are only used by one user at a time. A NAS is meant to be used by multiple people concurrently. Otherwise, what is the point? Buy a USB drive and save some money.
For lots of small files, get an SSD. 1TB SSDs are cheap.
What do you think a NAS is? A NAS and a file server are basically the same thing, and the same goes for a SAN. The line between all of them is very blurry; the real difference is just the protocols they use.
> Most NAS disks are only used by one user at a time for storing big files (audio/video/photo/compressed-backups).
Even if you assume this, NAS systems pretty universally use storage layouts that suffer from the performance characteristics of SMR drives. Back when RAID1 was the norm, yes, SMR would have been fine. These days the norm is RAID5/6, ZFS, or BTRFS, all of which have serious problems with SMR drives even if you are storing mostly very big files.
However, you’re also basing your assumption on classical terminology, where a NAS is little more than a storage device with a dinky little CPU and next to no RAM, used just for providing storage on a conventional network. These days the term NAS refers just as easily to purpose-built file servers as to the poorly designed classical storage appliances, and the marketing gets very fuzzy between NAS and SAN hardware once you go beyond the network interface.
But none of that matters, because HDDs marketed for NAS usage have always been understood to be targeted at running in near-line and/or always-on RAID arrays. IOW, they’re not supposed to be super high performance like HDDs marketed for enterprise usage (and thus cost correspondingly less than the price gouging going on there), but they still provide the behavioral characteristics and features required to make always-on and near-line RAID arrays work correctly. The big distinction here has historically been that they just return write and read errors when they happen, instead of retrying in firmware for multiple minutes before giving up and returning errors. More recently, though, there has also been a general assumption that they are _not_ SMR disks, because SMR makes it impossible to run RAID5/6 arrays efficiently or to run ZFS/BTRFS safely.
ahferroin7,
It bugs me that the manufacturers are selling these for markets where they don’t belong. We can’t pay any attention to what the manufacturer thinks, for many of the reasons that you brought up, and because it just doesn’t make any sense to categorize all “NAS” systems as the same. The fact that these drives happen to be in a NAS is not an indication of how they’ll be configured or what kind of performance will be needed.
I see the WD40EFAX is “Specifically designed for use in NAS systems with up to 8 bays”. Well, what the hell does that mean? This is one of the drives with terrible RAID performance, BTW. Not only is labeling it a NAS drive ambiguous, it can be downright counterproductive and misleading.
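For what it’s worth, on Linux you can at least ask the block layer what it reports for each disk; here’s a minimal sketch that reads the standard `zoned` queue attribute in sysfs. The catch is that only host-aware and host-managed SMR drives identify themselves there; drive-managed SMR like these Reds shows up as “none”, which is exactly why buyers couldn’t tell what they were getting:

```
# Print what the Linux block layer reports for each SATA/SAS disk.
# Only host-aware/host-managed SMR drives identify themselves here;
# drive-managed SMR (e.g. the WD Red EFAX models) reports "none".
import glob

for path in sorted(glob.glob("/sys/block/sd*/queue/zoned")):
    dev = path.split("/")[3]            # e.g. "sda"
    with open(path) as f:
        zoned = f.read().strip()        # "none", "host-aware", or "host-managed"
    print(f"{dev}: {zoned}")
```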
As other sites have pointed out, it’s about money. High capacity disks are high margin, and lower capacity disks are not.
Switching the lower end Reds to SMR was a cost-cutting move by WD to squeeze out more margin. Nothing more, nothing less.
It’s about the access patterns and usage, not data size. “Network” and “Attached” are the key words here.
They assumed it would work, but it didn’t. If it had worked, no one would have cared. The lower end market is cost sensitive, and as long as the performance had been comparable, people would have been fine with it.
No, because 1TB SSDs have dropped below $149, with many dipping into the $109-$129 range at times.
1TB is still a huge amount of space, and more people are using SaaS software these days. SMR drives really only affect data hoarders with RAID arrays.
This affects me because running storage servers with RAID arrays is part of my day job, and at home I need space for VMs to prototype storage-heavy things. (I fully understand I am on the high side of the bell curve.) However, this doesn’t even register for normal people. I’d guess they don’t have more than 0.5 TB worth of local data that matters to them.
The SMR drives are good enough for backup drives and warm storage. It’s when they get used in RAID arrays that they start falling down, which is highly likely considering the market Reds are targeted at. It says so right on the box: “designed for RAID arrays and high disk systems.”
Performance numbers: https://www.servethehome.com/wd-red-smr-vs-cmr-tested-avoid-red-smr/
Flatland_Spider,
Thanks for posting that!
Some of those benchmarks are reasonable; I’m sure the 256MB cache helped a lot to smooth out bursts of I/O.
I was surprised at how badly the SMR disk did copying a large 125GB file; I expected better. Granted, average consumers may have lighter workloads, but it does highlight the performance deficits of the underlying media once the cache fills up.
And the RAID rebuild performance was just atrocious.
Yeah, the SMR drive performance isn’t atrocious until you do the one thing people buy Reds for. LOL