g__day

Windows10 BSOD WHEA


Hi folks

 

On my new big rig the PC now just randomly crashes, usually within 5-10 mins of booting, with a WHEA uncorrectable error.

 

I haven't been overclocking or using it hard, and I haven't made any changes. Any ideas how to fix this very persistent and annoying issue? System Restore hasn't helped!

 

Thanks,

 

Matthew


Can you check the Event Log (via "Manage" on "This PC") to see if there's more specific info?
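If you'd rather pull the hardware-error entries out without clicking through Event Viewer, here's a minimal sketch, assuming Windows 10 with the built-in `wevtutil` tool. The provider name `Microsoft-Windows-WHEA-Logger` is my assumption for where Windows writes these events, and `build_wevtutil_command` is just a hypothetical helper name. WHEA-Logger entries, if there are any, may say which component (e.g. a processor core or a PCIe device) reported the error.

```python
import subprocess
import sys

# XPath filter for System-log events written by the WHEA-Logger provider
# (assumption: this is the provider name Windows uses for hardware errors).
WHEA_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def build_wevtutil_command(count: int = 10) -> list[str]:
    """Build a wevtutil command printing the newest `count` WHEA events as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}",
        f"/c:{count}",
        "/rd:true",  # newest events first
        "/f:text",   # human-readable output instead of XML
    ]


if __name__ == "__main__":
    cmd = build_wevtutil_command()
    if sys.platform != "win32":
        # wevtutil only exists on Windows; just show what would be run.
        print("Windows only; command would be:", " ".join(cmd))
    else:
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

Run it from an elevated prompt if it returns an access error.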

 

Can't say I've dealt with this exact problem, but it might come down to some dodgy RAM or other components.


Will do mate.

 

BTW, I just played Serious Sam 3 for an hour or two and it was rock solid, which makes me wonder whether it's a registry error, if it's crashing in a browser or Excel?


The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID
{6B3B8D23-FA8D-40B9-8DBD-B950333E2C52}
and APPID
{4839DDB7-58C2-48F5-8283-E1D1807D0D7D}
to the user NT AUTHORITY\LOCAL SERVICE SID (S-1-5-19) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.


The above error happened many times in the last 24 hours.

 

The other error was:

 

The computer has rebooted from a bugcheck. The bugcheck was: 0x00000124 (0x0000000000000000, 0xffff830d0cd75028, 0x00000000b2000000, 0x0000000000070005). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 86634aa3-7259-4e7a-87ce-5e06da5d2039..
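For reference, the numbers in that message can be pulled apart programmatically. A minimal Python sketch, assuming the message text is formatted exactly like the one above (the regex and the `parse_bugcheck` name are my own, not part of any Windows API). Worth noting: the WhoCrashed report later in the thread shows the same first, third and fourth parameters (0x0, 0xB2000000, 0x70005), which suggests both crashes were the same kind of error.

```python
import re

# Matches the "bugcheck was: 0xCODE (p1, p2, p3, p4)" part of the System log message.
BUGCHECK_RE = re.compile(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(message: str) -> tuple[int, list[int]]:
    """Return (bugcheck code, [param1, param2, param3, param4]) from the log message."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        raise ValueError("no bugcheck found in message")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


if __name__ == "__main__":
    msg = ("The computer has rebooted from a bugcheck. The bugcheck was: 0x00000124 "
           "(0x0000000000000000, 0xffff830d0cd75028, 0x00000000b2000000, "
           "0x0000000000070005). A dump was saved in: C:\\Windows\\MEMORY.DMP.")
    code, params = parse_bugcheck(msg)
    print(hex(code), [hex(p) for p in params])
    # 0x124 ['0x0', '0xffff830d0cd75028', '0xb2000000', '0x70005']
```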


Does anything go wrong if you just start it up and let it sit there for half an hour?

 

I do suspect though you might have been screwed over by a dodgy update of some sort.

Would you be able to use a spare HDD to create a dummy system with a similar OS version, then see how its stability goes?

Edited by Rybags


Hmm,

 

Does sound like RAM to me. I've had a few issues with WIN10, but not BSODs. RAM is cheap these days, but that does not imply reliability.

 

Swap it around stick by stick; just the remove/re-insert often clears things up. For what we pay for RAM/MOBOs these days, it's not surprising if an OS like WIN10, which uses a lot of RAM cache, chucks a fit over one bad byte :)

 

The other alternative is a badly seated CPU, but knowing you, G_day, I'd doubt that.

 

I've seen quite a lot of bad RAM this past year; put a few sticks under the microscope and it's not hard to see the shoddy, non-QA-tested wave soldering.

 

Cheers


Wha? RAM has used BGA since DDR2 and even before. It's been part of the official specs since DDR2. No wave soldering there.

 

It's a pain but my advice would be first try a minimal hardware config with the existing Windows install. If that's of no help do the minimal config on the "dummy" Windows install.


99.9% of the time (and even the 'official' techs on Microsoft's support forums say this), that error is the motherboard (rarely) or a RAID card/HDD controller (commonly).

 

BCCode: 124 0x00000124
The WHEA_UNCORRECTABLE_ERROR bug check has a value of 0x00000124. This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).

 

I don't recall how crazy your storage solution is, but if this is happening while you're crunching your astronomy data, it could be as simple as an SSD reaching the end of its rated write life, or a RAID card on its way out.

Of course, it could be any controller on nearly any component.

 

Grab OCCT.

Grab AIDA64

and the odd one out:

Grab CrystalDiskMark.

 

Run each of their stress tests area by area. Don't run any 'blend' tests (like "CPU and RAM"); run just CPU, then just RAM.

My gut says that benchmarking the HDDs in CrystalDiskMark will throw the error. (Also grab CrystalDiskInfo and check the SMART health.)

 

Just a random side note: don't use Prime95 for testing anymore without EXTREME cooling. The artificial load it creates is higher than any possible real load, and on some CPU series it can literally damage them.

It's because of the AVX workloads it runs on Skylake-and-later CPUs: the AVX units were never meant to be stressed to the max long term, and the voltage can spike if you stress them.

A lot of BIOSes account for this now and limit CPU voltage increases, and rumor is that the new CPU microcode handles it too, but it's just not worth the risk.

Not sure if the same risk applies to AMD (likely not).

Edited by Master_Scythe

Quote: "it's not surprising if an OS like WIN10, which uses a lot of RAM cache, chucks a fit over one bad byte :)"

 

Actually, it is.

Windows 10's built-in error checking and correction is so strong that a completely faulty stick can take literally months to diagnose based on symptoms alone.

I've had things as minor as Word exiting once every 4 months or so turn out to be an entirely dead stick of RAM.



XP? Sure. 7? Sometimes. 8.1? Rarely. 10? Impossible to diagnose from 'symptoms'; it's just too stable.

Edited by Master_Scythe


The caching algorithm is much the same, though Win 10 is probably better tuned. I've got real doubts about it tolerating bad RAM. That said, with modern CPUs having onboard memory controllers, bad RAM (failing hardware or wrongly configured) can still report its full size while the system can only use less, and this shows in System Properties even in Win 7.


One surprising thing: it never seems to crash while playing a game for hours, but surf the net or watch YouTube for 5 mins...

 

So maybe some h/w isn't playing nice at low utilisation? Maybe a voltage is too low somewhere.

 

Scorptec have been good: they said just drop it in to us (they're 1 km away), it's under warranty and they'll replace whatever may be broken. The first guy wondered if it was memory or the CPU; not cheap fixes.

 

I did run System Restore, memtest and chkdsk; all fine.

 

Will try the options suggested above, given the PC is stable enough. Thanks all!


I had a similar issue with my old HD4870.

 

Could play 3D games for hours, do standard 2D stuff for days but it had an intermittent lockup issue with video and some 2D accelerated gaming.

So maybe it's the graphics card. There are distinct parts of the GPU core that are only used by the video codecs, so that might be your problem.

 

But if they're willing to take it back and replace whatever they find wrong, then take advantage.


Quote: "Wha? RAM has used BGA since DDR2 and even before. It's been part of the official specs since DDR2. No wave soldering there."

:)

 

It depends on the manufacturer perhaps, Ry. I've seen the Corsair RAM production line, and they wave soldered (or at least used to) from the underside of their boards after chip insertion. Perhaps that is why they are so reliable :)

 

Cheers


Quote: "The caching algorithm is much the same, likely Win 10 is probably better tuned. I've got real doubts about it tolerating bad RAM ..."

 

I don't know what they've changed. What I think it is, is that they've managed to better control crash handling.

Where previously a memory error in a program or driver could result in a BSOD (usually IRQL_NOT_LESS_OR_EQUAL), now I think it just 'reloads' the driver.

 

I haven't looked into it, but as someone who has to diagnose that sort of issue remotely tens of times a day, I can assure you that on XP or 7, "Explorer crashes 4 or 5 times a day" or "blue screen saying XYZ" meant OK, it's RAM.

These days? You'll often just get things as minor as "oh, my tab reloaded... OK" or "oh, my resolution re-configured for 1 second and is now normal". As Atomicans we could probably spot it just as quickly, but when the end user doesn't know that something as simple as a tab reloading can be a bad sign, it's nearly IMPOSSIBLE to diagnose.

 

tl;dr: actual BSODs from RAM have absolutely got rarer.


Yeah... but I do suspect that onboard memory controllers help a bit. I can't recall traditional northbridge hardware ever detecting and reporting the full amount of RAM but disabling specific sticks because of errors (disregarding 32-bit OSes or inbuilt graphics adaptors).


Scorptec said send it back and we will fix / replace whatever is wrong :) So will likely do that next week!

 

From WhoCrashed:

 

Just ran WhoCrashed, thanks for the suggestion. It provides very little information; it just says it thinks it is an unspecified hardware failure:

On Sat 17/03/2018 2:04:55 AM your computer crashed or a problem was reported crash dump file: C:\Windows\MEMORY.DMP

This was probably caused by the following module: hal.dll (hal!HalBugCheckSystem+0xCF)
Bugcheck code: 0x124 (0x0, 0xFFFFAD0960174028, 0xB2000000, 0x70005)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL

Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).

This is likely to be caused by a hardware problem.

The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.

--------------------------------------------------------------------------------
Conclusion
--------------------------------------------------------------------------------

2 crash dumps have been found and analyzed. No offending third party drivers have been found. Connsider using WhoCrashed Professional which offers more detailed analysis using symbol resolution. Also configuring your system to produce a full memory dump may help you.
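If you want to read a bit more out of those parameters than WhoCrashed does: per Microsoft's bug check 0x124 reference (as I understand it), when Parameter 1 is 0x0 the error came from a machine check exception, and Parameters 3 and 4 are the high and low 32 bits of the MCi_STATUS register of the CPU machine-check bank that logged it. Below is a minimal Python sketch that decodes the architectural flag bits; the names `decode_mci_status` and `MciStatus` are hypothetical.

```python
from dataclasses import dataclass

# Architectural flag bits of the 64-bit MCi_STATUS machine-check register.
FLAG_BITS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # a previous error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled for this error type
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds the address involved
    "PCC": 57,    # processor context corrupt: execution cannot safely continue
}


@dataclass
class MciStatus:
    raw: int
    flags: dict[str, bool]
    mca_error_code: int       # bits 15:0, architecturally defined error code
    model_specific_code: int  # bits 31:16, meaning depends on the CPU model


def decode_mci_status(param3: int, param4: int) -> MciStatus:
    """Rebuild MCi_STATUS from bug check 0x124 Parameter 3 (high) and 4 (low)."""
    raw = ((param3 & 0xFFFFFFFF) << 32) | (param4 & 0xFFFFFFFF)
    flags = {name: bool((raw >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return MciStatus(
        raw=raw,
        flags=flags,
        mca_error_code=raw & 0xFFFF,
        model_specific_code=(raw >> 16) & 0xFFFF,
    )


if __name__ == "__main__":
    status = decode_mci_status(0xB2000000, 0x70005)
    print([name for name, is_set in status.flags.items() if is_set])
    # ['VAL', 'UC', 'EN', 'PCC']
    print(hex(status.mca_error_code), hex(status.model_specific_code))
    # 0x5 0x7
```

For 0xB2000000 / 0x70005 this gives VAL, UC, EN and PCC set: an uncorrected error with the processor context corrupt, which is why Windows has to stop. Intel's manual lists a low 16-bit code of 0x0005 as an internal parity error, but that reading assumes an Intel CPU and is worth checking against your vendor's documentation. The sketch can't tell which bank (core, cache, memory controller) raised it; the WHEA-Logger events or the dump itself would be needed for that.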

Edited by g__day


OK, then the thing to do now would be to back up your personal data in case they decide to do a Windoze reinstall without bothering to preserve anything.


Do the stress testing I listed anyway.

 

It'll be nicer, and more likely to get resolved, if you can find a trigger for the crashes so you can tell them "It always crashes in XYZ at about 30 minutes" and they can use it as a testing ground.


Asked on the Whirlpool forums; they are quite good. They ran through my dump file, agreed it was a hardware-related bug, and said the idle CPU voltages seem low. They suggested I update my motherboard BIOS to see if a new BIOS better manages these new CPUs. Did so, and no crashes for two weeks :)


Sounds likely!

It's why anyone who builds PCs and is worth their salt doesn't leave anything set to 'auto'.

And if you do, always add the minimum positive voltage offset to the auto setting.

 

(Auto +0.05v)


Whirlpool got it... WTF? Why didn't we think of it?


I highly doubt they are, and even then the layout and location would likely vary enormously from one BIOS to another, even for the same board.

 

More likely, "dump analysis" and "hey, update the BIOS!" are two unrelated events.


It's possible the BIOS update fixed an issue related to the recent Intel Spectre CPU flaw on Windows.
