Windows10 BSOD WHEA

Oh no

25 replies to this topic

#1 g__day

g__day

    Champion

  • Hero
  • 7,902 posts

Posted 15 March 2018 - 07:11 PM

Hi folks

 

On my new big rig the PC now just randomly crashes - usually within 5-10 mins of booting - with a WHEA uncorrectable error.

 

I haven't been overclocking or using it hard, nor have any changes been put through.  Any ideas how to fix this very persistent and annoying issue? System restore hasn't helped!

 

Thanks,

 

   Matthew


Talent + Integrity = Atomic!

#2 Rybags

Rybags

    Immortal

  • Super Hero
  • 35,847 posts

Posted 15 March 2018 - 07:22 PM

Can you check the Event Log (via "Manage" on "This PC") to see if there's more specific info?

 

Can't say I've dealt with this exact problem but it might come down to some dodgy RAM or other components.



#3 g__day

g__day

    Champion

  • Hero
  • 7,902 posts

Posted 15 March 2018 - 10:07 PM

Will do mate.

 

BTW - just played Serious Sam 3 for an hour or two - rock solid - which makes me wonder if it's a registry error, given it's crashing in a browser or Excel?


From the Event Log:

The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID {6B3B8D23-FA8D-40B9-8DBD-B950333E2C52} and APPID {4839DDB7-58C2-48F5-8283-E1D1807D0D7D} to the user NT AUTHORITY\LOCAL SERVICE SID (S-1-5-19) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.


The above error happened many times in the last 24 hours.

 

The other error was:

 

The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000124 (0x0000000000000000, 0xffff830d0cd75028, 0x00000000b2000000, 0x0000000000070005). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 86634aa3-7259-4e7a-87ce-5e06da5d2039..
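
For anyone wanting to read those bugcheck numbers, here is a minimal Python sketch. It assumes Microsoft's documented layout for bug check 0x124: parameter 1 = 0 means a machine check exception, parameter 2 is the address of the WHEA error record, and parameters 3 and 4 are the high and low 32 bits of the CPU's MCi_STATUS register. The function names and the flag table are purely illustrative and not taken from any tool.

```python
import re

# Text copied from the Event Log entry quoted above.
EVENT_TEXT = (
    "The computer has rebooted from a bugcheck.  The bugcheck was: "
    "0x00000124 (0x0000000000000000, 0xffff830d0cd75028, "
    "0x00000000b2000000, 0x0000000000070005)."
)

# Flag bits in the upper part of the x86 MCi_STATUS register.
STATUS_FLAGS = {
    63: "VAL (record is valid)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def parse_bugcheck(text):
    """Return (code, [p1, p2, p3, p4]) as integers from an event-log message."""
    match = re.search(r"(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", text)
    if match is None:
        raise ValueError("no bugcheck found in text")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def decode_mci_status(high32, low32):
    """Combine the two 32-bit halves and pull out the flags and code fields."""
    status = (high32 << 32) | low32
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "status": status,
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    code, params = parse_bugcheck(EVENT_TEXT)
    info = decode_mci_status(params[2], params[3])
    print(f"bugcheck 0x{code:X}, MCi_STATUS 0x{info['status']:016X}")
    for flag in info["flags"]:
        print("  set:", flag)
    print(f"  MCA error code: 0x{info['mca_error_code']:04X}")
```

For the values in this thread it reports VAL, UC, EN and PCC set with MCA error code 0x0005, i.e. the CPU itself logged an uncorrected error. That is consistent with a hardware-level fault, but on its own it does not say which component is to blame.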



#4 Rybags

Rybags

    Immortal

  • Super Hero
  • 35,847 posts

Posted 15 March 2018 - 10:19 PM

Does anything go wrong if you just start it up and let it sit there for half an hour?

 

I do suspect though you might have been screwed over by a dodgy update of some sort.

Would you be able to use a spare HDD and create a dummy system with a similar OS version, then see how its stability goes?


Edited by Rybags, 15 March 2018 - 10:39 PM.


#5 chrisg

chrisg

    Immortal

  • Super Hero
  • 35,029 posts
  • Location:Perth

Posted 16 March 2018 - 04:56 AM

Hmm,

 

Does sound like RAM to me. I've had a few issues with WIN10, but not BSODs. RAM is cheap these days, but that does not imply reliability.

Swap it around stick by stick; just the insert/re-insert often clears things up. For what we pay for RAM/MOBOs these days it's not surprising if an OS like WIN10, which uses a lot of RAM cache, chucks a fit over one bad byte :)

The other alternative is a badly seated CPU, but knowing you, g__day, I'd doubt that.

I've seen quite a lot of bad RAM this past year. Stick a few sticks under the microscope and it's not hard to see the shoddy, non-QA-tested wave soldering.

 

Cheers


"Specialisation is for Insects" RAH

#6 Rybags

Rybags

    Immortal

  • Super Hero
  • 35,847 posts

Posted 16 March 2018 - 08:24 AM

Wha?  RAM has used BGA since DDR2 and even before.  It's been part of the official specs since DDR2.  No wave soldering there.

 

It's a pain but my advice would be first try a minimal hardware config with the existing Windows install.  If that's of no help do the minimal config on the "dummy" Windows install.



#7 Master_Scythe

Master_Scythe

    Titan

  • Hero
  • 20,571 posts
  • Location:QLD

Posted 16 March 2018 - 09:11 AM

99.9% of the time (and even Microsoft support forums say this from their 'official' techs) that error is motherboard (rarely) or RAID card/HDD controller (commonly).

 

BCCode: 124   0x00000124
The WHEA_UNCORRECTABLE_ERROR bug check has a value of 0x00000124. This bug check indicates
that a fatal hardware error has occurred. This bug check uses the error data that is provided by the
Windows Hardware Error Architecture (WHEA).

 

I don't recall how crazy or not your storage solution is; but if this is happening while you're crunching your astronomy data, it could be as simple as an SSD reaching its write threshold on life, or a RAID card on its way out.

Of course, it could be anything controller-related on nearly any component.

 

Grab OCCT.

Grab AIDA64

and the odd one out:

Grab CrystalDiskMark.

Run each of their stress tests area-by-area; don't run any 'blend' tests (like "CPU and RAM"). Just CPU, then just RAM.

My gut says that benchmarking the HDDs in CrystalDiskMark will throw the error. (Also grab CrystalDiskInfo, and check SMART health.)

 

Just a random side note: don't use Prime95 for testing anymore without EXTREME cooling; the artificial load it creates is higher than any realistic load, and in some series of CPUs it can literally damage them.

It's because of the new AVX capabilities of Skylake-onwards CPUs; they were never meant to be stressed to the max long term, and the voltage can spike if you stress them.

A lot of BIOSes know this now and limit CPU voltage increases, and rumor is that the new CPU microcode handles it too, but it's just not worth the risk.

Not sure if the same risk applies to AMD... (likely not)


Edited by Master_Scythe, 16 March 2018 - 09:11 AM.

Wherever you go in life, watch out for Scythe, the tackling IT support guy.

"I don't care what race you are, not one f*cking bit, if you want to be seen as a good people, you go in there and you f*ck up the people who (unofficially) represent you in a negative light!"


#8 Jeruselem

Jeruselem

    Guru

  • Atomican
  • 15,000 posts
  • Location:Not Trump-Land

Posted 16 March 2018 - 10:36 AM

Run this Windows diagnostic RAM check

https://technet.micr...y/ff700221.aspx (built into Windows, no need for boot USB or CD)


“We’re not going to stop the wheel. I’m going to break the wheel.” - Daenerys Targaryen

 

"We have some of the most beautiful hookers in the world" - Putin to Trump


#9 Master_Scythe

Master_Scythe

    Titan

  • Hero
  • 20,571 posts
  • Location:QLD

Posted 16 March 2018 - 11:35 AM

chrisg wrote: "it's not surprising if an OS like WIN10, which uses a lot of RAM cache, chucks a fit over one bad byte :)"

 

Actually, it is.

Windows 10's built-in error checking and correction is so strong, a completely faulty stick can take literally months to diagnose, based on symptoms alone.

I've had things as simple as Word exiting once every 4 months or so turn out to be an entirely dead stick of RAM.

XP? Sure. 7? Sometimes. 8.1? Rarely. 10? Impossible to diagnose from 'symptoms'; it's just too stable.


Edited by Master_Scythe, 16 March 2018 - 11:35 AM.



#10 Rybags

Rybags

    Immortal

  • Super Hero
  • 35,847 posts

Posted 16 March 2018 - 02:33 PM

The caching algorithm is much the same, though Win 10 is likely better tuned.  I've got real doubts about it tolerating bad RAM. That said, the modern CPU architecture with onboard memory controllers can mean that bad RAM installs (failing hardware or wrongly configured) will report their actual size while the system uses less, and it'll show the fact in System Properties even in Win 7.



#11 g__day

g__day

    Champion

  • Hero
  • 7,902 posts

Posted 16 March 2018 - 11:16 PM

One surprising thing - never seems to crash in playing a game for hours - but surf the net or watch youtube for 5 mins...

 

So maybe some h/w isn't playing nice at low utilisation? A voltage is too low somewhere.

 

Scorptec have been good - saying just drop it in to us (1km away) and it's under warranty - we will replace whatever may be broken. The first guy wondered whether it was memory or CPU - not cheap fixes.

 

I did run System restore - memtest - Chkdsk - all fine.

 

Will try options suggested above - given PC is stable enough.  Thanks all!



#12 Rybags

Rybags

    Immortal

  • Super Hero
  • 35,847 posts

Posted 16 March 2018 - 11:20 PM

I had a similar issue with my old HD4870.

 

Could play 3D games for hours, do standard 2D stuff for days but it had an intermittent lockup issue with video and some 2D accelerated gaming.

So maybe it's the graphics card.  There are distinct parts of the core that are only used by the video codecs, so it might be your problem.

 

But if they're willing to take it back and replace whatever they find wrong, then take advantage.



#13 chrisg

chrisg

    Immortal

  • Super Hero
  • 35,029 posts
  • Location:Perth

Posted 17 March 2018 - 07:04 AM

Rybags wrote: "Wha?  RAM has used BGA since DDR2 and even before.  It's been part of the official specs since DDR2.  No wave soldering there.

It's a pain but my advice would be first try a minimal hardware config with the existing Windows install.  If that's of no help do the minimal config on the "dummy" Windows install."

:)

 

It depends on the manufacturer perhaps, Ry. I've seen the Corsair RAM production line; they wave soldered (or at least used to) after chip insertion from the underside of their boards. Perhaps that is why they are so reliable :)

 

Cheers

#14 Master_Scythe

Master_Scythe

    Titan

  • Hero
  • 20,571 posts
  • Location:QLD

Posted 17 March 2018 - 09:02 AM

Rybags wrote: "The caching algorithm is much the same, though Win 10 is likely better tuned.  I've got real doubts about it tolerating bad RAM. That said, the modern CPU architecture with onboard memory controllers can mean that bad RAM installs (failing hardware or wrongly configured) will report their actual size while the system uses less, and it'll show the fact in System Properties even in Win 7."

 

I don't know what they've changed. What I think it is, is that they've managed to better control crash handling.

Where previously a memory fault from a program or driver could result in a BSOD (usually IRQL_NOT_LESS_OR_EQUAL), now I think it just 'reloads' the driver.

I haven't looked into it, but as someone who has to diagnose that sort of issue remotely, tens of times a day, I can assure you that on XP or 7, "Explorer crashes 4 or 5 times a day" or "Blue screen saying XYZ" meant: OK, RAM.

These days? You'll often just get things as simple as "oh, my tab reloaded.... ok" or "oh my resolution re-configured for 1 second and is now normal". As Atomicans we could probably spot it just as quick, but in circumstances where the end user doesn't know that things as simple as a tab reloading can be a bad sign? It's nearly IMPOSSIBLE to diagnose.

 

tldr, actual BSOD from RAM absolutely got rarer.




#15 Rybags

Rybags

    Immortal

  • Super Hero
  • 35,847 posts

Posted 17 March 2018 - 09:36 AM

Yeah... but I do suspect that onboard memory controllers help a bit.  I can't recall traditional N'Bridge hardware ever detecting and reporting the full amount of RAM but disabling specific sticks because of errors (disregarding 32-bit OSes or inbuilt graphics adaptors).



#16 g__day

g__day

    Champion

  • Hero
  • 7,902 posts

Posted 17 March 2018 - 01:04 PM

Scorptec said send it back and we will fix / replace whatever is wrong :)  So will likely do that next week!

 

From WhoCrashed:

 

Just ran WhoCrashed, thanks for the suggestion.  It could provide very little information - it just says it thinks it is an unspecified hardware failure:

On Sat 17/03/2018 2:04:55 AM your computer crashed or a problem was reported
crash dump file: C:\Windows\MEMORY.DMP

This was probably caused by the following module: hal.dll (hal!HalBugCheckSystem+0xCF)
Bugcheck code: 0x124 (0x0, 0xFFFFAD0960174028, 0xB2000000, 0x70005)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL

Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).

This is likely to be caused by a hardware problem.

The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.

--------------------------------------------------------------------------------
Conclusion
--------------------------------------------------------------------------------

2 crash dumps have been found and analyzed. No offending third party drivers have been found. Consider using WhoCrashed Professional which offers more detailed analysis using symbol resolution. Also configuring your system to produce a full memory dump may help you.
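
One thing worth checking is whether this crash matches the earlier Event Log bugcheck. A rough Python sketch, assuming both tools print the code followed by four hex parameters in parentheses; the helper names are made up for illustration:

```python
import re

# The two bugcheck lines reported in this thread.
EVENT_LOG_LINE = (
    "The bugcheck was: 0x00000124 (0x0000000000000000, 0xffff830d0cd75028, "
    "0x00000000b2000000, 0x0000000000070005)."
)
WHOCRASHED_LINE = "Bugcheck code: 0x124 (0x0, 0xFFFFAD0960174028, 0xB2000000, 0x70005)"

FIELD_NAMES = ["code", "param1", "param2", "param3", "param4"]


def bugcheck_values(text):
    """Return [code, p1, p2, p3, p4] as integers from a bugcheck line."""
    match = re.search(r"(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", text)
    if match is None:
        raise ValueError("no bugcheck found in text")
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return [int(match.group(1), 16)] + params


def differing_fields(first, second):
    """Name the fields whose values differ between two bugchecks."""
    pairs = zip(FIELD_NAMES, bugcheck_values(first), bugcheck_values(second))
    return [name for name, a, b in pairs if a != b]


if __name__ == "__main__":
    print(differing_fields(EVENT_LOG_LINE, WHOCRASHED_LINE))
```

Only param2 differs between the two crashes. For bug check 0x124 that parameter is the memory address of the error record, so it is expected to change from crash to crash, while the code and the other parameters repeating suggests the same kind of hardware error each time.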


Edited by g__day, 17 March 2018 - 01:05 PM.


#17 Rybags

Rybags

    Immortal

  • Super Hero
  • 35,847 posts

Posted 17 March 2018 - 01:50 PM

OK, then the thing to do now would be to back up your personal data in case they decide to do a Windoze reinstall without bothering to preserve anything.



#18 Master_Scythe

Master_Scythe

    Titan

  • Hero
  • 20,571 posts
  • Location:QLD

Posted 19 March 2018 - 09:41 AM

Do the stress testing I listed anyway.

 

It'll be nicer, and more likely to be resolved, if you can find a trigger for the crashes so you can tell them "It always crashes in XYZ at about 30 minutes" and they can use it as a testing ground.




#19 Jeruselem

Jeruselem

    Guru

  • Atomican
  • 15,000 posts
  • Location:Not Trump-Land

Posted 19 March 2018 - 10:27 AM

I think it's RAM related. RAM crashes are just random!




#20 g__day

g__day

    Champion

  • Hero
  • 7,902 posts

Posted 31 March 2018 - 08:23 PM

Asked on the Whirlpool forums - they are quite good.  They ran through my dump file, concurred it was a hardware-related bug, said the idle CPU voltages seem low, and suggested I update my motherboard BIOS to see if a new BIOS better manages these new CPUs.  Did so, and no crashes for two weeks :)





