Incident Case Studies — Michael Krawczyk


Portfolio · Incident Investigation & Resolution

Incident Case Studies

Real-world investigations, root cause analyses, and remediations from managed IT environments. All client details redacted. Each case documents the diagnostic approach, technical findings, and resolution actions taken.

6 case studies · 100% resolved · Multi-platform coverage · Client details redacted

Featured Investigation

Featured · Deepest Technical Investigation REDACTED

Security Software · Windows Crash Forensics

ESET Agent BEX64 Crash Loop Destabilizing RMM Communications

Multi-server WER analysis · Kernel driver forensics · DEP violation root cause

Fatal buffer overflow crashes of ERAAgent.exe (build 12.4.1124.0) were identified on three servers at the same time, covering the DATA, HOST, and DOMAIN CONTROLLER roles. Each crash left ESET's kernel filter drivers loaded but unmanaged, causing the ScreenConnect and ConnectWise Automate agents to flap and flood monitoring with false-positive offline alerts. Windows Error Reporting analysis confirmed identical crash signatures across machines, pinpointing a defective July 2025 agent build as the root cause.

ESET ERAAgent · BEX64 Crash Analysis · WER Forensics · Kernel Filter Drivers · ConnectWise Automate · ScreenConnect · DEP Violation · RMM Reliability

3 SERVERS AFFECTED · FATAL CRASH · IsFatal=1 · RESOLVED
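The cross-server comparison described in this case can be illustrated with a minimal sketch. It assumes the relevant WER fields have already been extracted into plain records; the record layout, the field names, and the host labels (`DATA`, `HOST`, `DC`) are hypothetical stand-ins, and only the event type, executable, build number, and fatal flag come from the summary above.

```python
from collections import defaultdict

# Hypothetical records standing in for fields pulled from WER reports.
# Only BEX64, ERAAgent.exe, 12.4.1124.0 and IsFatal=1 come from the case summary.
reports = [
    {"host": "DATA", "event_type": "BEX64", "app": "ERAAgent.exe",
     "app_version": "12.4.1124.0", "is_fatal": 1},
    {"host": "HOST", "event_type": "BEX64", "app": "ERAAgent.exe",
     "app_version": "12.4.1124.0", "is_fatal": 1},
    {"host": "DC", "event_type": "BEX64", "app": "ERAAgent.exe",
     "app_version": "12.4.1124.0", "is_fatal": 1},
]


def signature(report):
    """Reduce a report to the fields that identify the crash."""
    return (report["event_type"], report["app"],
            report["app_version"], report["is_fatal"])


def group_by_signature(items):
    """Map each distinct crash signature to the set of hosts that reported it."""
    groups = defaultdict(set)
    for item in items:
        groups[signature(item)].add(item["host"])
    return dict(groups)


groups = group_by_signature(reports)
for sig, hosts in groups.items():
    print(sig, sorted(hosts))
```

A single group containing every host suggests a shared cause, such as one agent build, rather than three unrelated machine-level faults.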

All Case Studies

Compliance & Governance REDACTED

SOC 2 CC7 · NIST 800-171 · Patch Governance

Patch Management Control Effectiveness Assessment

0% compliance discovered · 72-day exposure window · Governance failure

A patch cycle that failed on 11/24/2025 went undetected for 72 days, leaving patch compliance at 0% into early 2026. The investigation identified a failure of the closed-loop process: patching occurred, but the validation, exception handling, and remediation stages all collapsed. Root cause was a governance gap, not a tooling failure. Full remediation included a segregation-of-duties framework and continuous compliance monitoring.

SOC 2 CC7 · NIST 800-171 · NinjaOne · Patch Compliance · Root Cause Analysis · Segregation of Duties

0% COMPLIANCE · 72-DAY WINDOW · RESOLVED
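The 72-day window can be checked with simple date arithmetic. The sketch below assumes the window is counted from the failed cycle on 11/24/2025; the end date is derived here, not stated in the case summary. The `validation_overdue` helper and its 7-day default are hypothetical, shown only to illustrate the kind of check continuous compliance monitoring might run.

```python
from datetime import date, timedelta

failed_on = date(2025, 11, 24)   # failed patch cycle, from the case summary
window_days = 72                 # exposure window, from the case summary

# Derived end of the window (an inference, assuming the count starts at failed_on).
window_end = failed_on + timedelta(days=window_days)
print(window_end.isoformat())    # 2026-02-04


def validation_overdue(last_validated: date, today: date,
                       max_gap_days: int = 7) -> bool:
    """Hypothetical check: flag when patch results have gone unvalidated too long."""
    return (today - last_validated).days > max_gap_days


print(validation_overdue(failed_on, window_end))  # True
```

Under these assumptions the window runs into early February 2026, and a scheduled validation check of this kind would have flagged the gap within days instead of weeks.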

Connectivity Investigation · Disputed Root Cause REDACTED

Onboarding Incident · Multi-Incident Report

Agent Connectivity Disruption — Post-Onboarding Investigation

ScreenConnect correlation · Disputed Windows Defender attribution · Session interruption during evidence collection

Within 5 days of onboarding, repeated agent connectivity disruptions clustered in a 4–6 PM window. Engineering attributed the issue to Windows Defender, but no supporting evidence aligned with the disruption timestamps. The investigation instead documented a stronger, timestamped correlation with ScreenConnect activity. A second incident occurred during evidence collection: an observed RDP session displacement followed by a reboot.

Agent Connectivity · ScreenConnect · RDP Forensics · RMM Onboarding · Disputed Root Cause · Session Security

2 INCIDENTS · 5-DAY WINDOW · DOCUMENTED
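The timestamp comparison behind this finding can be sketched as a simple proximity count. All timestamps below are invented placeholders, not data from the case, and the 5-minute tolerance and the `count_near` helper are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def count_near(disruptions, events, window=timedelta(minutes=5)):
    """Count disruptions with at least one candidate event within +/- window."""
    return sum(
        any(abs(d - e) <= window for e in events)
        for d in disruptions
    )


# Placeholder timestamps for illustration only.
day = "2025-01-06"
disruptions = [datetime.fromisoformat(f"{day} {t}") for t in ("16:10", "16:45", "17:30")]
screenconnect = [datetime.fromisoformat(f"{day} {t}") for t in ("16:08", "16:47", "17:29")]
defender = [datetime.fromisoformat(f"{day} {t}") for t in ("09:00", "13:15")]

print("ScreenConnect matches:", count_near(disruptions, screenconnect))
print("Defender matches:", count_near(disruptions, defender))
```

A candidate cause that lines up with most disruptions is better supported than one that lines up with none, although proximity alone shows correlation rather than causation.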

Hardware Investigation · Storage Validation REDACTED

VMware ESXi · HPE Smart Array · Storage Forensics

Drive / RAID Controller Alerts Validated as Monitoring Noise

iLO + ESXi SSH + Smart Array CLI · 7-layer investigation · No hardware fault found

Drive-related "Disk Error: red" entries in NinjaOne were triggered alongside memory alerts on an HPE ProLiant Gen10 running ESXi 7.0.3. A 7-step multi-layer investigation traversed the iLO firmware, ESXi SSH, Smart Array CLI, logical drive, and physical disk layers. All storage components were confirmed healthy. The red entries were traced to empty, non-present drive bays and unsupported SMART health lookups on the array LUN, not to a hardware fault.

VMware ESXi · HPE Smart Array · iLO Management · RAID Validation · NinjaOne · SMART Health · Monitoring Noise

P408i-a SR GEN10 · ESXi 7.0.3 · STORAGE HEALTHY

EDR Security · Deployment Governance REDACTED

SentinelOne · EDR Deployment · Policy Enforcement

SentinelOne Agent Misconfiguration Analysis & Remediation

Services hung in stopping state · Tamper protection absent · Wrong console instance

SentinelOne services were found stuck in a "stopping" state across multiple servers — reboot-persistent, not transient. Local service stop and process termination were permitted, indicating tamper protection was not enforced. The organization had been onboarded into a new console instance without access being provisioned. Investigation identified incomplete deployment as the root cause, not a product defect.

SentinelOne · EDR Deployment · Tamper Protection · Policy Enforcement · Console Access · Deployment Governance

SERVICES HUNG · TAMPER OPEN · RESOLVED

Monitoring Operations · Platform Engineering REDACTED

NinjaOne · VMware ESXi · Monitoring Policy Engineering

NinjaOne Monitoring Tuning — VMware Host False Positive Reduction

6 targeted policy changes · Phantom drive alerts eliminated · Critical escalation preserved

Following confirmed storage validation on an HPE ProLiant Gen10 / ESXi 7.0.3 host, NinjaOne continued generating drive alerts and holding the device in a chronic "Needs attention" state. Produced a structured 6-action tuning specification for an L2–L3 administrator — eliminating SMART health false positives, de-emphasizing a known memory warning, converting the uptime alert to maintenance hygiene, and preserving escalation for genuinely critical sensors.

NinjaOne · VMware ESXi · Monitoring Tuning · Alert Noise Reduction · Policy Engineering · SNMP Sensors

6 ADMIN ACTIONS · L2–L3 SPEC · SIGNAL QUALITY RESTORED