ameel

HDDs Clicking


Thanks for the input, guys, but I'm rather confused now...

Basically I have 15 drives in the setup.

I have no idea which one (or ones) is at fault.

At least one drive clicks when I access data from the RAID.

I do note from SMART that three have bad sectors (one has 1 bad sector, one has 5, and I forget the last one).

How can I figure out which drive actually needs replacing?

 

With an Ubuntu system, you should be able to determine which drives have bad sectors and which /dev/sdX each one refers to. If you are using a GUI tool, that should also give you the make, model and firmware ID (serial number? - not sure), which should be written on the drive somewhere.
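For example, from a terminal (a rough sketch - this assumes the smartmontools package is installed, and /dev/sda is just a placeholder):

  # Install the SMART tools if you don't have them yet
  sudo apt-get install smartmontools

  # Print the identity info (model, serial number, firmware) for one drive
  sudo smartctl -i /dev/sda

  # Dump the SMART attributes - Reallocated_Sector_Ct (attribute 5)
  # is the usual bad-sector counter
  sudo smartctl -A /dev/sda

Then match the serial number from smartctl -i against the label on the physical drive.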

 

To determine which is clicking, I would suggest opening up the side panel and doing whatever it is you do that makes it click. Perhaps put a finger on a drive (a non-sensitive area) and listen; you should be able to determine which one it is. If you notice a click when you first boot up, I guess you could unplug one at a time to determine which is making the noise - just don't let it get to the point where it boots, as that will obviously spaz out your RAID.



Yes, I'm aware of how to check for bad sectors (which is how I knew which HDD had how many bad sectors).

I just want to confirm that the ones with the bad sectors are the ones clicking. The setup is really cramped, as 15 HDDs are fitted in an ATX case, but I guess there's no other way.

I received an email today from Hitachi confirming the two drives are under warranty. Any idea on RMA time?


No idea on RMA time. If you know how to check for bad sectors, then you should also know where to look for the drive identification information. Once you have determined which drives are clicking, see if they match, I guess.


After opening my server yesterday and feeling the hard drives while accessing data to see which HDDs click, I noticed that the temperatures were CRAZY HOT. I then realised I forgot to plug in the fan cables god knows how long ago >_<

In any case, I left the server open, plugged in the HDD fan cables, and the temperatures dropped. I haven't heard any clicking yet, and data seems to transfer well - no lag/freeze.

Is it possible that high temperature can cause occasional clicking? Would it have permanently damaged the HDDs in any case? Any way to test for that, or was the clicking just temporary?

Basically, how can I test/determine whether the heat damage is permanent or was temporary?

Edit: interesting to note that my data transfers were freezing because the HDDs were hot.


I notice some of the data that was created/copied when the HDDs started clicking appears corrupt (which is OK), but I'm worried: will this persist as a general trend for all data from now on, or are just those particular files damaged?


Basically, how can I test/determine whether the heat damage is permanent or was temporary?

Not sure, to be honest.

I would suggest you just go through the drives exhibiting bad sectors, replace them with new drives, and get the old ones RMA'd.

If you are worried about having extra drives as a result, you could put them towards a backup solution, which is about the only advice I could offer that would potentially prove beneficial to you.


+1 The Tick.

Basically, ameel, a HDD should take a LOT of abuse (within reason). Heat shouldn't ever take a drive out of operating spec unless it was 100°C+ (and you'll know, because you'll have burnt your hand; if it just felt 'hot', it was probably more in the 50°C range). While that's about 15 degrees higher than most drives will run, servers live through cooling failures, and clients' personal PCs have survived with no fans and dust stopping any airflow. I've burnt myself (minorly) on the underside of a drive before (though I have a thing - I burn easily - so it could have been cooler than you'd need).

All in all, yeah, cooling them will help, and I'm glad it's stopped, but The Tick is right: drives with bad sectors need replacing ASAP. Maybe run a HDD benchmark overnight? See if any errors get thrown, and on which disk.
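SMART's built-in self-tests would do the overnight job (a rough sketch - assumes smartmontools is installed, and the device names are only examples):

  # Start a long (full surface scan) self-test; it runs inside the
  # drive's own firmware, so it's safe to leave overnight
  sudo smartctl -t long /dev/sdd

  # Next morning, read back the result - a failed test reports the
  # first bad LBA it hit
  sudo smartctl -l selftest /dev/sdd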



 

Sounds good, thanks!

 


I'm running Ubuntu. What HDD benchmark do you suggest?

Just to clarify, I'm running a 10x2TB RAID6 and a 5x2TB RAID5. Which tool would help test and determine which HDDs are throwing errors in the RAID?
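At the moment I'm just pulling the SMART health verdicts and error counts in a loop, something like this (rough sketch, assuming smartmontools):

  # Overall health verdict plus ATA error-log count for every disk
  for d in /dev/sd[a-p]; do
      echo "=== $d ==="
      sudo smartctl -H -l error "$d" | grep -E 'result|Error Count|No Errors'
  done

but I don't know whether that stresses the disks enough to actually throw new errors.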

 

Thanks


/dev/sda - http://pastebin.com/y2qW5nMG - 18 errors, but the last one was 262 days + 8 hours ago
/dev/sdb - http://pastebin.com/FXqXay45 - 46 errors, the last one 4 hours ago
/dev/sdc - http://pastebin.com/9Y4BuH5m - [this one was just kicked out of the RAID and I had to restart the PC, add it back and wait for it to resync. I suspect this is one of the most "damaged" drives atm?]
/dev/sdd - http://pastebin.com/4rUtvrbw - 851 errors, the last one 923 days + 6 hours ago (is that right?)
/dev/sde - http://pastebin.com/bNzmCjXT
/dev/sdf - http://pastebin.com/cPpXgxNb
/dev/sdg - http://pastebin.com/FsdsA2Cn
/dev/sdh - http://pastebin.com/MnSaQzZ9 - 137 errors, the last one 261 days + 17 hours ago
/dev/sdi - http://pastebin.com/9PUwgU7G
/dev/sdj - http://pastebin.com/W8K79F9v
/dev/sdk - http://pastebin.com/kCYLnwv9
/dev/sdl - http://pastebin.com/dxDxTBnB
/dev/sdm - http://pastebin.com/ZwBV7Vi1
/dev/sdn - http://pastebin.com/9n1Q8Z1A
/dev/sdo - http://pastebin.com/KaQbtbQc - 43 errors, the last one 156 days + 10 hours ago
/dev/sdp - http://pastebin.com/vsmfFvYa

 

So Disk Utility reports /dev/sdd and /dev/sdo as having bad sectors (not surprisingly). For some reason /dev/sdc isn't even showing up in the assembled RAID array in Disk Utility, although mdstat says it's in there!!
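To double-check what the kernel itself thinks (rather than trusting Disk Utility), something like this should show it (md0 is just a placeholder for the array name):

  # Quick view of all arrays and their member disks
  cat /proc/mdstat

  # Per-member state for one array: active, faulty, spare, removed
  sudo mdadm --detail /dev/md0

  # Read the md superblock straight off a member device
  sudo mdadm --examine /dev/sdc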

So I guess I have to replace /dev/sdc and /dev/sdd as a priority, is that right?

BTW, don't worry about /dev/sdb - that's the OS drive, I couldn't care less (I'm replacing it soon enough with an SSD, dw).


Looks like 2 drives died. Well, not really, but one lost its superblock altogether, and the other's superblock ID has gone wrong and can't be recognised (it appears to be /dev/sdd from the RAID6 and /dev/sda from the RAID5).

I've added one spare drive to each RAID and they're resyncing now. One will take 500 minutes, the other 1000 minutes. Ah well...
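For reference, the swap itself was just the usual mdadm dance (the device names here are examples, not my real ones):

  # Mark the dead member failed and pull it from the array
  sudo mdadm /dev/md0 --fail /dev/sdd --remove /dev/sdd

  # Add the replacement; the resync kicks off automatically
  sudo mdadm /dev/md0 --add /dev/sdq

  # Watch the resync progress
  watch cat /proc/mdstat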

 

I will most likely replace /dev/sdo as well. I've got another two spare drives in my main rig, but I need to test whether they are healthy, etc.


Ah crap, looks like one of the fans wasn't working. Looking at the temperatures, the HDDs were running at 70 degrees @_@


This is why I bought an iCute case:

http://www.allneeds.com.au/images/case_icute_super18.jpg

I pulled all the drive bay fans out and just created a huge positive-pressure environment in my case:

Side fan > in
Rear fan x 2 > in
PSU fan > in (yep, I reversed it)

I then sealed all vents with aluminium tape and silicone.

So all the positive pressure has only one place to go: out over the drive bays. I've never seen higher than 45°C, and that was on a 32°C ambient day.


Doesn't look too good with multiple drives failing.


I think 2 HDDs are as good as dead, another 2 are concerning, and the others are OK.

I've prioritised and replaced the drives in the RAID6. I'm copying my data off the RAID5 atm and will get rid of the RAID5 altogether; I'll keep one spare drive for the RAID6 and create a new RAID5 with fewer disks (4 instead of 5).

Temperatures all look good now - the drives are sitting at 32-33 degrees under full load when my fans are working.

Oh, and I use an iCute case as well:

http://forums.atomicmpc.com.au/index.php?s...t&p=1022262


I learnt the lesson a long time ago. HDDs suck.

Get the biggest ones you can, so you have the fewest number of them.

 

Non-server drives also HATE RAID. It doesn't seem to matter what type; my failure rate for RAIDed drives vs unRAIDed/synced ones is about 10:1.

Though this is possibly my bad luck speaking - just like I've had an 80% DOA rate on every Gigabyte product I've ever owned, and never on any other brand.

 

Keep us updated on how your resilver etc. goes.


Is there really any advantage to running RAID these days? Drives are no longer slow, and when a RAID fails, it's bye-bye to the data.


You do realise there is more to RAID than simply RAID0, right?

RAID1 and above offer redundancy.

Yep, and that's where the trouble starts, because users seem to think this is equivalent to a backup, so when the RAID falls over for some reason they scream and yell that it shouldn't have happened and the data shouldn't be lost.

In my opinion, ordinary users should avoid RAID like the plague and use a proper backup procedure to keep copies of their data off the in-use drives.


That would only really be a problem for the uninformed, though.

RAID with redundancy + backup, or GTFO :D


Yeah, that's pretty much what I go for: RAID with redundancy + backup HDDs.

Looks like I'll be sending a number of drives off for RMA :/ What does the warranty cover, btw?


Yep, I know there is more to it than RAID0. But these days RAID is, IMHO, of limited benefit, and the problems ali points out make it even less relevant. I for one do not see any use for it these days. I have seen it go wrong too many times for people, so I say fuck RAID and the horse it rode in on lol. Far better to do a proper backup.


I think I might just do that, actually... I'll switch back to non-RAID, just regular HDDs, once I get my drives back from RMA or something, so I can transfer the data onto single non-RAID drives and break my current RAID. I'll most likely mirror a couple of the "more important" drives as well as keeping proper backups. Would Win8 be an OK OS? I originally wanted to run BWMeter or equivalent to monitor home bandwidth, but given my Ubuntu RAID and inefficient virtualisation options, I decided to forgo bandwidth monitoring.


That would only really be a problem for the uninformed, though.

RAID with redundancy + backup, or GTFO :D

Trouble is, that is the majority, and a surprising number of them use RAID thinking it is a backup. Heck, there are quite a few posts on Whirlpool about people losing RAID1 or RAID5 arrays and complaining about lost data and how RAID is supposed to be a backup. I've even had it directly with a couple of customers.

I have seen it go wrong too many times for people, so I say fuck RAID and the horse it rode in on lol. Far better to do a proper backup.

+ eleventy billion. Leave RAID for specialised solutions where the system owner/manager knows what he or she is doing.


I'll switch back to non-RAID, just regular HDDs... I'll most likely mirror a couple of the "more important" drives as well as keeping proper backups.

You won't notice any difference in speed, but your data will be much safer, and backups will save you if a HDD fails for some reason.
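For the mirroring side, even a nightly rsync of each important drive onto its backup twin would cover you (a rough sketch - the mount points are just examples):

  # Mirror the source drive onto the backup drive: -a preserves
  # permissions/times, -H keeps hard links, --delete keeps the copy exact
  rsync -aH --delete /mnt/important/ /mnt/backup-important/

Stick that in cron and you have a poor man's mirror without any RAID involved.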

