Windows Kernel Mode Internals, Debugging and Dump Analysis

5 days lecture + hands-on lab

Target Audience

Driver Developers, Support Engineers and Software QA Engineers

Description

This course teaches the architecture and internals of the Windows operating system, with emphasis on production debugging of kernel mode drivers. It helps attendees understand the behind-the-scenes workings of the Windows operating system and debug common crashes and hangs that occur during kernel mode code execution.

The hands-on lab familiarizes attendees with the debugging and instrumentation tools, the relevant debugger extension commands, and the interpretation of command output to investigate the state of device drivers and the system. It also covers debugging techniques to isolate faulting modules and to root-cause crashes and hangs caused by drivers.

Pre-requisites

Attendees must be familiar with the basic usage of the debugger (Debugging Tools for Windows). Basic usage includes the symbol server and the debugger commands for displaying call stacks, data structures, memory contents and system information. To get the most value from the course, attendees must also be familiar with the Windows kernel API and the C programming language.

Goals

Upon completion of this course, attendees will be able to:

Configure the host and target systems for live kernel debugging. Apply live debugging techniques to debug kernel mode issues.

Understand the effect that enabling Driver Verifier has on debugging and learn techniques to debug the failures it reports. Identify the Driver Verifier settings required to debug different categories of problems.

Understand the memory dump generation mechanism and configure the system to generate memory dumps for hangs and crashes.

Interpret the information displayed by the debugger's automated analyzer and identify subsequent analysis steps on memory dumps.

Understand how hardware failures cause system bug-checks and debug WHEA_UNCORRECTABLE_ERROR bug-checks.

Understand IRQLs and the restrictions they impose, and debug IRQL-related problems like IRQL_NOT_LESS_OR_EQUAL bug-checks.

Understand the different execution contexts and conditions under which kernel mode drivers execute. Deploy tools and apply techniques for debugging performance issues, intermittent CPU spikes and consistent high CPU usage in the system.

Understand APCs, critical and guarded regions, and process attachment, and use this knowledge to debug bug-checks like APC_INDEX_MISMATCH, KERNEL_APC_PENDING_DURING_EXIT, etc.

Understand system worker threads and work items, and identify worker thread depletion issues that cause system hangs.

Understand the principles of synchronization in kernel mode code and the different synchronization options provided by the kernel. Distinguish normal threads from stuck threads, identify the modules and resources involved in kernel mode deadlocks, and root-cause the deadlocks.

Understand the distribution and utilization of system virtual address space. Identify causes of system PTE depletion and debug NO_MORE_SYSTEM_PTES bug-checks. Debug invalid memory accesses leading to bug-checks like KMODE_EXCEPTION_NOT_HANDLED and PAGE_FAULT_IN_NONPAGED_AREA.

Understand stack usage in the kernel, stack jumping and the causes of stack overflows, and debug issues like double faults.

Understand the layout, types and utilization of pool memory. Debug pool depletion indicated by Event IDs 2019 and 2020. Detect corrupted data structures, identify the scope of corruption, isolate the modules responsible for the corruption and debug BAD_POOL_CALLER bug-checks.

Understand how virtual memory is mapped to physical memory on x86 and x64 systems, along with memory locking, mapping and memory descriptor lists, and debug issues related to DMA.

Understand key I/O manager data structures and navigate between them. Understand the interactions between device drivers and the I/O manager. Find and identify drivers blocking I/O requests and leading to system hangs. Debug bug-checks like PROCESS_HAS_LOCKED_PAGES, NO_MORE_IRP_STACK_LOCATIONS, MULTIPLE_IRP_COMPLETE_REQUESTS, etc. and identify the drivers that cause them.

Understand the behind-the-scenes workings of PnP and Power transitions and how device drivers respond to PnP and Power state changes. Identify the drivers and device stacks responsible for blocking power IRPs and causing DRIVER_POWER_STATE_FAILURE bug-checks.

Topics

Kernel Mode Debugging Tools
Debugging Tools for Windows
Collecting System Information
Gflags for Kernel Debugging
Performance Analysis Tools
Driver Verifier
Driver Verifier Logging
Kernel Architecture
Kernel Mode Components
System Service Dispatching
Process Context
Thread Context
Exception Handling
Trap Frames
Task State Segment (TSS)
Context Structures
System Bug-Checks
Dump Generation
Live Debugging
Memory Dump Generation
Memory Dump Types & Contents
Hang vs. Crash Dumps
Types of Crashes
Memory Dump Navigation
Dump Analysis
Common Analysis Steps
Register Contexts
Analyzing System State
Identifying Faulting Modules
Hardware Failures
Access Violations
Assembly Language
Call Stacks
Kernel Mechanisms
Processor Control Region (PCR)
IRQLs
Interrupt Service Routines (ISRs)
Deferred Procedure Calls (DPCs)
Asynchronous Procedure Calls (APCs)
Intermittent CPU Spikes
High CPU Usage
System Threads
Work Items
Memory Manager
Kernel Virtual Address Space
Dynamic Kernel Space Management
SysPTE Depletion
Kernel Stacks
Stack Overflows and Double Faults
Page Table Entries (PTEs)
Page Frame Number (PFN) Database
Memory Descriptor Lists (MDLs)
Direct Memory Access (DMA) Issues
Pools and Look-aside Lists
Pool Corruption
Pool Depletion
Kernel Synchronization
Dispatcher Objects
Fast Mutexes and Guarded Mutexes
ERESOURCEs
Deadlocks
Spin Locks
Queued Spin Locks
Livelocks
I/O Manager
Driver Architecture & Entry Points
I/O Manager Data Structures
I/O Request Packet (IRP) Flow
IRP Processing
Synchronous and Asynchronous I/O Processing
Completion Routines
Stuck I/O Requests
Cancel Routines
I/O Cancellation Hangs
PnP & Power
Driver, Device Types and Device Nodes
Device Object Layering
PnP IRPs and PnP State Transitions
Device Enumeration
Device Startup Failures
Device Manager Error Codes
System, Device and CPU Power States
Power IRPs and Power State Transitions
Idle Power Management
Remote Wakeup
Power Watchdog Timeouts