Tuesday, January 29, 2013

Troubleshooting Configuration Manager 2012/2007 and SMS 2003 unhandled exceptions (Crash Dump)


In my experience, when the SMS Executive service stops unexpectedly, the crash folder under the logs folder usually shows that the service has crashed again. This happens because of an unhandled exception.
What is an unhandled exception?
Almost every Configuration Manager crash involves an exception. An exception occurs when an instruction is attempted but fails for some reason (e.g. an Access Violation). To troubleshoot it, we need information about the exception itself and about what was in memory when it occurred.
Most applications have their own exception handling code and Configuration Manager is no different. Configuration Manager has its own exception handler that is designed to collect certain predefined data such as thread stack information and other data when the exception has occurred. Note that it is also sometimes necessary to do live debugging or post mortem debugging when an application/OS crashes using the Windows debugging tools.
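To make the idea of an application-level exception handler concrete, here is a minimal Python sketch. It is only an analogy and not how Configuration Manager is implemented: the names record_crash, crash_records, faulty_worker and DEMO_DISCOVERY_AGENT are invented for illustration. It uses the standard threading.excepthook hook (Python 3.8 or later) to capture the thread name and stack when a worker thread raises an exception that nothing catches.

```python
import threading
import traceback

# Hypothetical in-memory "crash log"; a real handler would write to disk.
crash_records = []


def record_crash(args):
    """Handler for threading.excepthook: capture what failed and where."""
    crash_records.append({
        "thread": args.thread.name if args.thread else "unknown",
        "exception": args.exc_type.__name__,
        "stack": "".join(
            traceback.format_exception(
                args.exc_type, args.exc_value, args.exc_traceback
            )
        ),
    })


# Install the handler for exceptions that escape any thread's run() method.
threading.excepthook = record_crash


def faulty_worker():
    data = None
    return data["key"]  # raises TypeError, which nothing in this thread catches


worker = threading.Thread(target=faulty_worker, name="DEMO_DISCOVERY_AGENT")
worker.start()
worker.join()

for record in crash_records:
    print(record["thread"], record["exception"])
```

One difference to keep in mind: in this sketch the Python process keeps running after the handler returns, whereas the behavior described in this post is that the service terminates once the data has been collected.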
Components that could cause an unhandled exception
SMS Executive: SMSEXEC.EXE is the main service process and runs many component threads. If an unhandled exception occurs in any of these threads, the SMS_EXECUTIVE service terminates, and the Configuration Manager site server exception handler collects the required data.
Data collected when the Configuration Manager site server encounters an exception
- A log file (CRASH.LOG) that details the thread stacks and very basic information.
- All current .LOG files from the \LOGS folder. These are saved in the \LOGS\CRASHDUMPS\YYYYMMDD_000XX folder, where YYYYMMDD is the date when the crash occurred and XX is the number of the crash on that day.
- An individual thread log for every component at the time of the failure. These files have no extension but can be viewed in any text editor or SMS Trace or CM Trace.
Depending on the nature of the crash and the memory conditions at the time, not all of the above information will be captured.
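When a site server has crashed several times, it helps to find the most recent crash folder quickly. The sketch below decodes folder names that follow the YYYYMMDD_000XX convention described above and picks the newest one. The function names are hypothetical, and the pattern assumes the part after the underscore is purely numeric, so adjust it if your folders look different.

```python
import re
from datetime import datetime

# Assumed shape: eight date digits, an underscore, then a numeric counter.
FOLDER_PATTERN = re.compile(r"^(\d{8})_(\d+)$")


def parse_crash_folder(name):
    """Return (date, sequence) for a crash folder name, or None if it does not match."""
    match = FOLDER_PATTERN.match(name)
    if not match:
        return None
    try:
        day = datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a valid calendar date
    return day, int(match.group(2))


def latest_crash_folder(names):
    """Return the name of the most recent crash folder, or None if none match."""
    candidates = []
    for name in names:
        key = parse_crash_folder(name)
        if key is not None:
            candidates.append((key, name))
    return max(candidates)[1] if candidates else None
```

In practice the list of names could come from os.listdir() on the CRASHDUMPS folder of the site server.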
With this in mind, here are some steps you can take if you experience one of these crashes:
1. Check the LOGS\CRASHDUMPS\CRASH.LOG file and make a note of the failing component and thread ID.

2. Locate the <component>_thread_<thread number> file in \Logs and open it in a text editor such as Notepad.
3. Look at the bottom of the log to identify the last thing the component was doing when the crash occurred.
4. Take corrective action based on what was occurring. Often there will be a reference in the log to a specific file or object that is causing the crash.
NOTE: If nothing useful is found in the log file, a memory dump can be used to analyze the issue in more depth.
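Steps 2 and 3 can be scripted when there are many thread logs to go through. The following sketch uses hypothetical helper names. It matches files by the <component>_thread_ prefix rather than guessing how the thread number is formatted, and it returns only the last few lines of a log.

```python
from collections import deque


def matching_thread_logs(filenames, component):
    """Return the names that start with '<component>_thread_' (case-insensitive)."""
    prefix = f"{component}_thread_".lower()
    return sorted(name for name in filenames if name.lower().startswith(prefix))


def last_lines(lines, count=20):
    """Return the final `count` lines from any iterable of lines, without line endings."""
    return list(deque((line.rstrip("\r\n") for line in lines), maxlen=count))
```

For example, last_lines(open(path, errors="replace")) reads a thread log from disk and keeps only its tail, which is where the last activity before the crash is recorded.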
In our example, examining the CRASH.LOG shows the following:
EXCEPTION INFORMATION
Time = 08/29/2012 17:28:47.406
Service name = SMS_EXECUTIVE
Thread name = SMS_AD_SYSTEM_DISCOVERY_AGENT
Executable = D:\Microsoft Configuration Manager\bin\i386\smsexec.exe
Process ID = 11789 (0x2E0D)
Thread ID = 13565 (0x33FD)
Instruction address = 77bd8efa
Exception = c0000005 (EXCEPTION_ACCESS_VIOLATION)
Description = "The thread tried to read from the virtual address 00000000 for which it does not have the appropriate access."
Raised inside CService mutex = No
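The EXCEPTION INFORMATION block is a simple list of "Key = value" lines, so it is easy to pull out the failing component and exception code with a script. This is a minimal sketch that assumes the block looks like the excerpt above. The function names are made up for illustration, and parsing stops at the first non-blank line that is not a "Key = value" pair.

```python
import re

# Matches values such as "c0000005 (EXCEPTION_ACCESS_VIOLATION)".
EXCEPTION_PATTERN = re.compile(r"^([0-9A-Fa-f]+)\s*\((\w+)\)$")


def parse_exception_info(text):
    """Collect the 'Key = value' lines that follow an EXCEPTION INFORMATION header."""
    fields = {}
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_block:
            in_block = line == "EXCEPTION INFORMATION"
            continue
        if not line:
            continue  # tolerate blank lines inside the block
        if " = " not in line:
            break  # first line that is not a key/value pair ends the block
        key, value = line.split(" = ", 1)
        fields[key] = value
    return fields


def split_exception(value):
    """Split the Exception field into (hex code, symbolic name), or return None."""
    match = EXCEPTION_PATTERN.match(value.strip())
    return match.groups() if match else None
```

With the excerpt above, the "Thread name" and "Thread ID" fields identify which thread log to open next.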
Examining the corresponding <component>_thread_<thread number> file, we can see the following:
Starting the data discovery. SMS_AD_SYSTEM_DISCOVERY_AGENT
INFO: Processing search path: 'LDAP://OU=xxx ,OU=xx,DC=GLOBAL,DC=xx,DC=xx'. SMS_AD_SYSTEM_DISCOVERY_AGENT
INFO: Full synchronization requested SMS_AD_SYSTEM_DISCOVERY_AGENT
INFO: DC DNS name = 'FQDN' SMS_AD_SYSTEM_DISCOVERY_AGENT
From this it is apparent that the exception occurred while the Active Directory System Discovery method was running. From here you can continue troubleshooting Active Directory System Discovery itself or, if this is a secondary site and you do not need the discovery method there, disable it.
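If you want to narrow the problem down further, it can be useful to know which search path discovery was processing last. The sketch below scans thread log lines for the "Processing search path" message seen in the excerpt and returns the last path found. The function name is hypothetical, and the message format is assumed to match the excerpt exactly.

```python
import re

# Assumed message format, taken from the thread log excerpt in this post.
SEARCH_PATH_PATTERN = re.compile(r"Processing search path: '([^']*)'")


def last_search_path(lines):
    """Return the last LDAP search path mentioned in the log lines, or None."""
    found = None
    for line in lines:
        match = SEARCH_PATH_PATTERN.search(line)
        if match:
            found = match.group(1)
    return found
```

The returned path is only a starting point: it tells you which container was being processed, not which object in it triggered the exception.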
