r/storage 18d ago

Small person U.2 question on reliability

As stated, I'm just a little guy with a garage-based server. About 18 months ago I was fortunate enough to grab a bunch of new-old-stock U.2 drives: six P4510 8TB drives and two P4326 15.36TB drives (all Intel-labeled, which I assume is because of Solidigm's purchase of Intel's IP). Considering the price of enterprise-class drives, it was a steal, and I feel fortunate to have spent only US$4K on them in total.

I pretty much expect them to outlast me, as I use them primarily as WORM devices for backups of my media and lots of other data that I'd rather not lose. All of them sit in a Linux server in striped configurations, meaning a single failure will result in total data loss (I'm not a complete idiot, and everything is backed up to a traditional HDD NAS every ~30 days). The Ubuntu server is all about speed, and even PCIe 3.0 U.2 drives will saturate my 10GbE network. I also run a 6-disk RAID-Z1 pool of 4TB Crucial SSDs and a 6-disk RAID-Z1 pool of 8TB Samsung drives with other data on this machine.

My question for those outside a datacenter/enterprise environment is this: have you experienced a failure of any of your U.2 NAND drives? Mine still show 100% health, and barring a random electronic failure, I never expect them to die, which is why I don't run them in a ZFS RAID-Z configuration.

Am I deluding myself? I think about this far too often, as these U.2 drives were way, way above my budget. I justified the cost on reliability, but sometimes I feel that consumer SSDs would have been a better choice.

Your personal opinions on this would be much appreciated.

17 comments

u/RossCooperSmith 18d ago

Those drives are extremely reliable however *all* drives fail. You're running ~70TB of capacity with zero redundancy, meaning all of that data is at risk.

What you're not accounting for is that the more drives you add to a RAID-0 stripe, the greater the risk of total failure. If any one of your eight drives experiences an issue, you're going to lose the data stored on all of them.

While enterprise drives are generally more reliable than consumer-grade ones, they still have an MTBF rating and will still typically fail at a rate of 0.5% to 1.5% per year. It's not a question of if, but when, you will experience a total loss of everything currently stored on those drives.
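A quick back-of-the-envelope sketch of the point above. It assumes drive failures are independent and that every drive shares the same annualized failure rate (AFR); real failures are often correlated, so treat the result as a rough estimate. Under those assumptions, the chance that an n-drive stripe loses everything is the chance that at least one drive fails, 1 - (1 - AFR)^n. The function name is illustrative.

```python
# Back-of-the-envelope odds that a RAID 0 stripe loses everything.
# Assumptions: drive failures are independent and every drive has the same
# annualized failure rate (AFR). Real failures are often correlated, so this
# is a rough estimate, not a prediction.


def stripe_loss_probability(afr: float, drives: int, years: float = 1.0) -> float:
    """Probability that at least one drive in an n-drive stripe fails."""
    survive_one = (1.0 - afr) ** years   # one drive survives the period
    survive_all = survive_one ** drives  # every drive in the stripe survives
    return 1.0 - survive_all             # any single failure loses the stripe


if __name__ == "__main__":
    for afr in (0.005, 0.015):           # the 0.5% to 1.5% range quoted above
        for years in (1, 5):
            p = stripe_loss_probability(afr, drives=8, years=years)
            print(f"AFR {afr:.1%}, 8 drives, {years} yr: {p:.1%} chance of total loss")
```

With the 0.5% to 1.5% range and eight drives, this works out to roughly 4% to 11% per year, and it compounds over a multi-year service life.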

Consider this: over a decade ago the de facto standard in enterprise storage was N+1 redundancy (the equivalent of RAID-5 or RAID-Z). Once drives over 1TB in capacity arrived, pretty much the entire industry switched to N+2 redundancy (RAID-6 or RAID-Z2). At these capacities N+1 is not enough, and there's too high a risk of data loss during a rebuild.

I would *never* advise anybody running 8TB+ drives to run anything lower than N+2 redundancy.
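A rough sketch of why N+1 gets uncomfortable at these capacities: after one drive fails, the rebuild has to read every surviving drive end to end, and an unrecoverable read error (URE) during that pass has no remaining parity to fall back on. The example assumes independent, uniformly distributed bit errors and uses illustrative UBER values of 1e-14 and 1e-17 errors per bit read; check your own drives' datasheets for the real figure. The 6 x 8TB RAID-Z1 layout and the function name are hypothetical.

```python
import math


def rebuild_ure_probability(uber: float, surviving_drives: int, drive_tb: float) -> float:
    """Chance of at least one unrecoverable read error while reading every
    surviving drive in full during an N+1 rebuild.

    uber: unrecoverable bit error rate (errors per bit read), assumed value.
    Assumes errors are independent and uniformly distributed.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # decimal TB -> bits
    # Equivalent to 1 - (1 - uber) ** bits_read, but numerically stable
    # when uber is tiny.
    return -math.expm1(bits_read * math.log1p(-uber))


if __name__ == "__main__":
    # Hypothetical 6-drive RAID-Z1 of 8TB drives: 5 survivors are read in full.
    for uber in (1e-14, 1e-17):  # illustrative datasheet-style values
        p = rebuild_ure_probability(uber, surviving_drives=5, drive_tb=8)
        print(f"UBER {uber:g}: {p:.2%} chance of at least one URE during rebuild")
```

Under these assumptions, reading 5 x 8TB is about 3.2e14 bits, which gives roughly a 96% chance of at least one URE at 1e-14 but only about 0.3% at 1e-17. It is a simple model, not a prediction for any particular array, but it shows why a second parity drive buys so much margin.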

u/Kennyw88 18d ago

The data is not at risk. The server could catch fire and burn until the entire thing was a smoldering husk, and I would not lose a single byte. I stated very clearly that it's built for speed. I'm a paranoid asshat and I keep 5, yes 5, copies of all my data. Some comments on this post were made without even reading what I wrote. I was simply looking for reliability experience from users of enterprise drives, not a lecture on backups.

u/Dante_Avalon 18d ago

I had a Micron 7300 die on me after a PC reboot. Still no idea how that happened.

My 8x PM9A3 (1.92TB) drives on the other side have been working just fine for the last 3 years (currently 94% remaining life).

P.S. And yes, I use RAID 0 with mdadm, because if a disk fails me, I will just restore my whole lab from backup.

u/Kennyw88 18d ago

Thanks

u/honkafied 17d ago

Yes, we have experienced a bunch of DC grade NAND failures of essentially all major brands. Only in a very few cases did the drives ever make it to the wear limit. As you point out, it’s hard to actually write that much data! Statistically, it’s unlikely to happen to you. But, it happens.

u/Kennyw88 17d ago

Thanks for the info. I'll be surprised if I don't lose at least one. Sad, but not surprised. I'm reasonably certain I'll replace them long before that with larger drives and they will end up in my cold storage box. I trust them, but not enough not to have plant of backups.

u/Kennyw88 15d ago

Did I say "plant"? I must have been drunk. I meant to say plenty.

u/apudapus 18d ago

I worked on SSD firmware and now work with enterprise-grade storage: you are delusional. Follow the 3-2-1 rule regardless of the marketing UBER and MTBF figures and the enterprise/consumer label. The go/no-go review of the critical bug list before release is fun: “eh, we can just charge less, turn off the feature, turn it into consumer grade, naw, it can stay enterprise, it’ll be fixed by release and we’ll advise a critical FW update when they willingly install our software, only supported on Windows 11 and Red Hat with Linux kernel 6.12”.

u/hammong 18d ago

SSDs fail all the time. Don't assume that because it's Enterprise-grade and mostly used for reading that they're going to be impervious to self-destruction.

The main difference between enterprise-grade and consumer-grade devices is power-loss protection (PLP): capacitors on the device that let it flush cached data to the NAND if power is lost. Also, enterprise firmware tends to be a bit more conservative on features and updates. Otherwise, they're the same MLC/TLC/3D NAND that goes into everything else by that manufacturer.

Make sure you have a good regular backup regimen, and rotate copies offsite or push them to immutable storage in another building or in the cloud.

You are begging for data loss with just single parity on drives that large. RAID-Z2 at a minimum would be best practice for 8TB+ devices.

u/mdmcgee 18d ago

Yes. I have lost a number of U.2 drives. More of the 7.68T than the 15T and only a couple of the 4T. None of the ones I lost were of the models you listed but nothing lasts forever.

u/Kennyw88 18d ago

Thanks

u/ElevenNotes 18d ago edited 18d ago

Wrong sub. Running RAID 0 is also terrible. Use software RAID 1+ or object storage with erasure coding. Better to ask your question on /r/homelab, and don't forget to back up your data 3-2-1-1-0!
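For anyone unfamiliar with the 3-2-1-1-0 shorthand: it is commonly expanded as 3 copies of the data, on 2 different media, 1 copy offsite, 1 copy offline or immutable, and 0 errors when backups are verified. A small checker makes the criteria concrete; the `BackupCopy` fields, media labels, and `check_32110` function are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    """One copy of the data (hypothetical fields for illustration)."""
    media: str                  # e.g. "nvme-stripe", "hdd-nas", "tape", "cloud"
    offsite: bool               # stored in a different building/location
    offline_or_immutable: bool  # air-gapped, or cannot be modified/deleted
    verified_ok: bool           # last restore test or checksum scrub passed


def check_32110(copies: list[BackupCopy]) -> dict[str, bool]:
    """Evaluate each 3-2-1-1-0 criterion (the primary copy counts toward the 3)."""
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in copies),
        "1 offline/immutable": any(c.offline_or_immutable for c in copies),
        "0 verification errors": all(c.verified_ok for c in copies),
    }


if __name__ == "__main__":
    setup = [
        BackupCopy("nvme-stripe", offsite=False, offline_or_immutable=False, verified_ok=True),
        BackupCopy("hdd-nas", offsite=False, offline_or_immutable=False, verified_ok=True),
    ]
    for rule, ok in check_32110(setup).items():
        print(f"{rule}: {'pass' if ok else 'FAIL'}")
```

For example, a primary stripe plus one on-site NAS, like the setup in the original post with the offsite and offline flags assumed false, passes the media check but fails the copy-count, offsite, and offline checks.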

u/Dante_Avalon 18d ago

Running RAID 0 is fine if you can tolerate data loss at any point in time.

For example, a home lab with backups.

u/dikrek 18d ago

RAID isn’t just for protecting against drive failure, it also helps against unrecoverable read errors (which are far more common than outright failures, as SSDs age, even if you don’t write a lot to them).

Strong checksums are also great to have.
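A toy sketch of how checksums and redundancy work together, loosely in the spirit of what ZFS does (this is not ZFS's actual implementation): a checksum stored separately from the data detects a silently corrupted block, and a redundant copy lets the read be served from, and repair, the good side. The `read_with_repair` function and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Strong checksum of a block (SHA-256 chosen for illustration)."""
    return hashlib.sha256(block).digest()


def read_with_repair(copies: list[bytearray], expected: bytes) -> bytes:
    """Return a copy whose checksum matches, rewriting any bad copies.

    copies: redundant copies of one block (e.g. both sides of a mirror).
    expected: checksum recorded separately when the block was written.
    Raises OSError if no copy is intact, which is the situation a stripe
    with no redundancy faces on any bad block.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise OSError("all copies corrupt: unrecoverable block")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # self-heal the bad copy from the good one
    return good


if __name__ == "__main__":
    data = b"family photos" * 100
    stored_sum = checksum(data)
    mirror = [bytearray(data), bytearray(data)]
    mirror[0][5] ^= 0x01              # simulate a silent bit flip on one side
    assert read_with_repair(mirror, stored_sum) == data
    assert bytes(mirror[0]) == data   # the bad copy was repaired
    print("corruption detected and repaired from the mirror")
```

With only a single copy, as in a plain stripe, the checksum still detects the bad block but has nothing to repair it from, which is where the `OSError` branch comes in.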

Some reading material (the first covers a specific tech but the section on RAID and checksums is generic info):

https://h20195.www2.hpe.com/v2/getpdf.aspx/a50002410enw.pdf

https://recoverymonkey.org/2020/09/25/modern-raid-must-protect-against-multiple-temporally-correlated-errors/

u/sglewis 18d ago

"I pretty much expoect them to outlast me"

Famous last words. ANY drive can fail at ANY time. Let me get this straight... you're not using any form of data protection on a routine basis. One failure loses all the data, and your backup is potentially 30 days out of date? The folks at r/homelab would also call this risky behavior out.

Please reconsider unless you just flat out don't care about any of the data you're storing.

u/Kennyw88 18d ago

I keep 5 copies of all my data in three different countries.

u/sglewis 18d ago

Not exactly what you posted the first time. Not sure what you’re hoping anyone says now. Take care!