A security incident is the wrong time to figure out your security incident process. Every minute of confusion — who has authority to take the system offline, who calls legal, where the runbooks live — is a minute an attacker spends in your systems. Organizations that respond well to security incidents are the ones that prepared before the event.
This article walks through building a practical incident response playbook: what to include, how to structure it, and specific technical procedures your team should have ready.
What Makes a Good Playbook
A playbook is not a policy document. It is an operational guide written for the person on call at 2am who just got paged about something alarming. That means:
- Concrete steps: Not "investigate the incident" but "run this query to find affected accounts"
- Clear decision trees: If X, do Y. If Y reveals Z, escalate to person A
- Pre-approved authorities: Who can take the application offline without a manager's approval
- Contact lists with backup contacts: Primary responders get sick; know who the second call is
- Defined roles: One person coordinates; others investigate; someone handles communication
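To make the "backup contacts" point concrete, here is a minimal Python sketch of walking a call order until someone is reachable. The roster contents, role names, and the `available` flag are all hypothetical; in practice availability would come from whatever paging or on-call tool you use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    name: str
    available: bool  # assumption: supplied by your on-call or paging tool


# Hypothetical roster: each role lists contacts in call order.
ROSTER = {
    "incident_commander": [Contact("primary-ic", False), Contact("backup-ic", True)],
    "legal": [Contact("primary-counsel", True)],
}


def next_contact(role: str, roster: dict[str, list[Contact]] = ROSTER) -> Optional[Contact]:
    """Return the first available contact for a role, or None if nobody is reachable."""
    for contact in roster.get(role, []):
        if contact.available:
            return contact
    return None
```

A `None` result is itself a finding: it means the role has no reachable backup and the contact list needs another entry.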
Incident Severity Classification
Not all incidents are equal. Define severity levels upfront so responders know which procedures to invoke:
SEV-1 (Critical)
- Active data breach with confirmed exfiltration
- Ransomware or destructive malware active in systems
- Complete service outage caused by an attack
- Customer PII or payment data confirmed exposed
Actions: Immediate executive and legal notification; consider taking the system offline
Response time: Immediate (24/7)
SEV-2 (High)
- Suspected breach under investigation
- Unauthorized access confirmed, scope unknown
- Credentials for production systems confirmed compromised
- Vulnerability actively exploited in production
Actions: Security lead + on-call engineer, legal notification if data involved
Response time: Within 1 hour
SEV-3 (Medium)
- Unusual access patterns detected, no confirmed breach
- Vulnerability discovered in production, no exploitation detected
- External security researcher report
Actions: Security team during business hours, document and investigate
Response time: Within 4 hours
SEV-4 (Low)
- Failed attack attempts (port scans, brute force attempts)
- Phishing emails targeting employees
- Minor policy violations
Actions: Document, analyze, improve controls
Response time: Within 24 hours
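The levels above can be encoded so that tooling, not memory, decides which response window applies. The following Python sketch is one possible encoding: the `Signals` fields are a hypothetical simplification of the bullet lists, and only the response windows are taken directly from the classification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Severity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Response windows from the classification above; SEV-1 is immediate.
RESPONSE_WINDOW = {
    Severity.SEV1: timedelta(0),
    Severity.SEV2: timedelta(hours=1),
    Severity.SEV3: timedelta(hours=4),
    Severity.SEV4: timedelta(hours=24),
}


@dataclass(frozen=True)
class Signals:
    # Hypothetical flags summarizing what responders know so far.
    confirmed_data_loss_or_destruction: bool
    confirmed_unauthorized_access: bool
    unconfirmed_anomaly_or_vulnerability: bool


def classify(signals: Signals) -> Severity:
    """Pick the most severe level whose condition is met."""
    if signals.confirmed_data_loss_or_destruction:
        return Severity.SEV1
    if signals.confirmed_unauthorized_access:
        return Severity.SEV2
    if signals.unconfirmed_anomaly_or_vulnerability:
        return Severity.SEV3
    return Severity.SEV4


def respond_by(severity: Severity, detected_at: datetime) -> datetime:
    """Latest time by which a responder should have engaged."""
    return detected_at + RESPONSE_WINDOW[severity]


if __name__ == "__main__":
    detected = datetime(2026, 1, 1, 2, 0, tzinfo=timezone.utc)
    level = classify(Signals(False, True, False))
    print(level.name, respond_by(level, detected).isoformat())
```

Checking the most severe condition first means an incident that matches several levels is always treated at the highest one.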
The Incident Response Lifecycle
Formalize the six phases of response:
1. Preparation
This is everything you do before an incident occurs. The playbook itself is preparation. So is:
- Deploying centralized logging before you need to investigate logs
- Establishing an out-of-band communication channel (if your email is compromised, you need a backup)
- Running tabletop exercises to find gaps in your playbook
- Maintaining an up-to-date asset inventory
2. Detection and Analysis
Write specific detection queries for common attack patterns. Store these in your runbook so responders can run them immediately.
-- Detect unusual authentication patterns (many failed logins followed by success)
-- MySQL syntax; adjust the interval expression for other databases
SELECT
    user_id,
    ip_address,
    COUNT(CASE WHEN event_type = 'login_failed' THEN 1 END) AS failed_attempts,
    MIN(CASE WHEN event_type = 'login_failed' THEN created_at END) AS first_failed_at,
    MAX(CASE WHEN event_type = 'login_success' THEN created_at END) AS success_at
FROM audit_logs
WHERE created_at > NOW() - INTERVAL 1 HOUR
  AND event_type IN ('login_failed', 'login_success')
GROUP BY user_id, ip_address
HAVING failed_attempts > 10
   AND success_at > first_failed_at  -- a success that came after the failures began
ORDER BY failed_attempts DESC;
-- Detect bulk data access (possible exfiltration)
SELECT
    user_id,
    COUNT(*) AS records_accessed,
    MIN(created_at) AS first_access,
    MAX(created_at) AS last_access
FROM audit_logs
WHERE event_type = 'data_accessed'
  AND created_at > NOW() - INTERVAL 1 HOUR
GROUP BY user_id
HAVING records_accessed > 1000
ORDER BY records_accessed DESC;
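# Check for recently created admin accounts
# (echo the result explicitly so it is printed in non-interactive mode)
php artisan tinker --execute="
    echo User::where('role', 'admin')
        ->where('created_at', '>', now()->subDays(7))
        ->get(['id', 'email', 'created_at', 'created_by'])
        ->toJson(JSON_PRETTY_PRINT);
"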
3. Containment
Containment stops the bleeding. Short-term containment prioritizes stopping ongoing damage, sometimes at the cost of evidence:
// Emergency: Revoke all sessions for a compromised user
public function revokeAllSessions(User $user, string $incidentId): void
{
    DB::table('sessions')
        ->where('user_id', $user->id)
        ->delete();

    // Revoke all API tokens
    $user->tokens()->delete();

    // Force password reset on next login
    $user->forceFill(['force_password_reset' => true])->save();

    // Log the action for the incident record
    Log::channel('security')->warning('Emergency session revocation', [
        'user_id' => $user->id,
        'incident_id' => $incidentId,
        'revoked_by' => auth()->id(),
        'timestamp' => now()->toIso8601String(),
    ]);
}

// Emergency: Take a user's account offline entirely
public function suspendAccount(User $user, string $incidentId, string $reason): void
{
    $this->revokeAllSessions($user, $incidentId);

    $user->forceFill([
        'account_suspended' => true,
        'suspended_at' => now(),
        'suspension_reason' => $reason,
        'suspended_by' => auth()->id(),
    ])->save();
}
For infrastructure-level containment, have these commands documented and tested:
# Isolate a compromised EC2 instance (replace security group with isolation group)
aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --groups sg-isolation-00000000
# The isolation security group should allow:
# - No inbound traffic
# - No outbound traffic
# This preserves the instance for forensics while cutting it off
# Enable detailed CloudWatch monitoring (1-minute metrics) on the instance while investigating
aws ec2 monitor-instances --instance-ids i-0123456789abcdef0
4. Eradication
Eradication removes the attacker's foothold. This must be thorough: if any foothold survives, the attacker can be back within hours.
Checklist for account compromise:
[ ] Identify how the account was compromised (phishing, credential stuffing, stolen token)
[ ] Rotate ALL credentials for the affected user — not just the known compromised one
[ ] Check for persistence mechanisms (new API keys, new OAuth apps, forwarding rules on email)
[ ] Review all actions taken by the account during the compromise window
[ ] Identify any other accounts the compromised account had access to
[ ] Rotate credentials for those downstream accounts
[ ] Review audit logs for evidence of lateral movement
Checklist for server compromise:
[ ] Capture forensic snapshot before any changes
[ ] Check for new user accounts (cat /etc/passwd)
[ ] Check for modified system binaries (debsums -c or rpm -Va)
[ ] Check crontabs for all users (crontab -l -u <user> for each user)
[ ] Check for new SSH authorized_keys
[ ] Check running processes for unexpected entries
[ ] Check outbound network connections for C2 beacons
[ ] Rebuild from known-good image rather than attempting to clean
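Reading `/etc/passwd` by eye is error-prone on hosts with many service accounts. As a rough sketch of the "check for new user accounts" step, the Python below diffs account names against a baseline and flags every account with UID 0. It assumes you have a baseline copy of the file from a known-good image; both function names are illustrative.

```python
def _fields(passwd_text: str) -> list[list[str]]:
    """Split /etc/passwd-formatted text into per-account field lists."""
    rows = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append(line.split(":"))
    return rows


def unexpected_accounts(current: str, baseline: str) -> set[str]:
    """Account names present now but absent from the known-good baseline."""
    current_names = {row[0] for row in _fields(current)}
    baseline_names = {row[0] for row in _fields(baseline)}
    return current_names - baseline_names


def uid_zero_accounts(passwd_text: str) -> set[str]:
    """Accounts with UID 0; anything other than root deserves scrutiny."""
    return {row[0] for row in _fields(passwd_text) if len(row) > 2 and row[2] == "0"}
```

Run this against a copy taken from the forensic snapshot rather than on the live host, since a compromised system may not report its own state truthfully.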
5. Recovery
Recovery restores normal operations. Document the verification criteria before you declare recovery complete:
Recovery verification checklist:
[ ] Replacement credentials for every compromised secret are deployed and working
[ ] Monitoring shows no further anomalous access patterns
[ ] Security team has reviewed the eradication steps and signed off
[ ] Legal/compliance has been notified per regulatory requirements
[ ] Affected users/customers notified (if required)
[ ] New controls preventing recurrence are in place or scheduled
[ ] System behavior has been normal for a minimum of [X] hours under monitoring
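The monitoring-window item in the checklist can be checked mechanically. This Python sketch assumes you can export the timestamps of anomalous events from your monitoring system; the function name and inputs are illustrative. Any anomaly after eradication restarts the clock.

```python
from datetime import datetime, timedelta, timezone


def quiet_period_met(
    anomaly_times: list[datetime],
    eradicated_at: datetime,
    now: datetime,
    required_hours: float,
) -> bool:
    """True if no anomaly has been seen for `required_hours` since eradication."""
    # Anomalies before eradication cannot be the latest event, so they are ignored.
    last_event = max([eradicated_at, *anomaly_times])
    return now - last_event >= timedelta(hours=required_hours)


if __name__ == "__main__":
    eradicated = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
    anomalies = [eradicated + timedelta(hours=10)]
    print(quiet_period_met(anomalies, eradicated, eradicated + timedelta(hours=48), 48))
```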
6. Post-Incident Review
Conduct a blameless review within 5 business days. The goal is to learn, not to assign fault.
Post-incident report template:
# Incident Report: [Incident ID] - [Title]
**Date of incident:** [date]
**Duration:** [detection time] to [resolution time]
**Severity:** SEV-[1-4]
**Authored by:** [name]
**Review date:** [date]
## Summary
[2-3 sentence non-technical summary suitable for executives]
## Timeline
| Time | Event |
|------|-------|
| HH:MM | First anomalous signal detected |
| HH:MM | Incident declared |
| HH:MM | Containment complete |
| HH:MM | Eradication complete |
| HH:MM | Recovery complete |
## Root Cause
[What was the underlying cause?]
## Contributing Factors
[What conditions allowed this to happen?]
## What Went Well
[Be specific — this is how you know what processes to keep]
## What Could Be Improved
[Be specific — these become action items]
## Action Items
| Item | Owner | Due Date | Priority |
|------|-------|----------|----------|
| Implement MFA enforcement | @devops | 2026-04-30 | P1 |
| Add alerting for bulk data access | @security | 2026-05-15 | P2 |
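The timeline table also yields useful metrics, such as how long containment took after the incident was declared. A small Python sketch, assuming the five milestones from the template are recorded as timestamps under the hypothetical keys below:

```python
from datetime import datetime

# Milestones in the order they appear in the report timeline.
MILESTONES = ["detected", "declared", "contained", "eradicated", "recovered"]


def phase_durations(timeline: dict[str, datetime]) -> dict[str, float]:
    """Minutes elapsed between each pair of consecutive milestones."""
    durations = {}
    for start, end in zip(MILESTONES, MILESTONES[1:]):
        delta = timeline[end] - timeline[start]
        if delta.total_seconds() < 0:
            raise ValueError(f"{end} is recorded earlier than {start}")
        durations[f"{start}_to_{end}"] = delta.total_seconds() / 60
    return durations
```

Tracking these numbers across incidents shows whether changes to the playbook are actually shortening the response.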
Communication Templates
Prepare communication templates in advance. Under pressure, writing a customer-facing notification from scratch leads to mistakes.
Internal incident declaration:
Subject: [SECURITY INCIDENT - SEV-{level}] {brief description}
Incident declared at {time} UTC.
Severity: SEV-{level}
Description: {one sentence}
Incident commander: {name}
Communication channel: {link to incident Slack channel or bridge}
All responders please join {channel}. Do not discuss via email.
External customer notification (data breach):
Subject: Important Security Notice
We are writing to inform you of a security incident that may have affected
your account.
What happened: [Clear, non-technical description]
When it happened: [Date range]
What information was involved: [Specific data types]
What we have done: [Remediation steps taken]
What you should do: [Specific actions for customers]
We apologize for this incident and take the security of your data seriously.
If you have questions, contact {security email}.
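Templates only help if nobody sends one with a placeholder still in it. The Python sketch below renders an abbreviated version of the internal declaration and refuses to produce output when a field is missing. The field names (`level`, `description`, and so on) are illustrative shorthand for the placeholders in the template above.

```python
import string

# Abbreviated version of the internal declaration template.
INTERNAL_DECLARATION = (
    "Subject: [SECURITY INCIDENT - SEV-{level}] {description}\n"
    "Incident declared at {time} UTC.\n"
    "Incident commander: {commander}\n"
    "All responders please join {channel}. Do not discuss via email.\n"
)


def required_fields(template: str) -> set[str]:
    """Names of all {placeholders} that appear in the template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def render(template: str, **fields: str) -> str:
    """Fill the template, failing loudly if any placeholder has no value."""
    missing = required_fields(template) - set(fields)
    if missing:
        raise ValueError(f"missing template fields: {sorted(missing)}")
    return template.format(**fields)
```

Failing before sending is the point of the design: an incomplete notice to customers or staff is harder to retract than a delayed one.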
Keep Your Playbook Current
A playbook written once and never revisited is nearly as dangerous as no playbook. Build maintenance into your process:
- Review and update after every incident
- Run a tabletop exercise at least twice a year
- Update contact lists quarterly
- Review detection queries against new attack patterns
- Test recovery procedures annually (can you actually restore from backup in the documented time?)
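These cadences are easy to let slip, so it can help to check them automatically. A minimal Python sketch, assuming you record the date each task was last completed; the task names and the day counts approximating "twice a year" and "quarterly" are illustrative choices.

```python
from datetime import date

# Maximum allowed age in days, approximating the cadences listed above.
CADENCE_DAYS = {
    "tabletop_exercise": 182,    # at least twice a year
    "contact_list_review": 91,   # quarterly
    "recovery_test": 365,        # annually
}


def overdue_tasks(last_done: dict[str, date], today: date) -> list[str]:
    """Tasks never recorded as done, or last done longer ago than their cadence allows."""
    overdue = []
    for task, max_age in CADENCE_DAYS.items():
        done = last_done.get(task)
        if done is None or (today - done).days > max_age:
            overdue.append(task)
    return sorted(overdue)
```

A task with no recorded date counts as overdue, so a maintenance step that was never performed cannot pass silently.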
The best time to build your incident response playbook was six months ago. The second best time is today.