Why BCP and DRP matter for SaaS founders
Business Continuity Planning (BCP) and Disaster Recovery Planning (DRP) are two of the most commonly deferred compliance tasks in early-stage SaaS companies. The reasoning is understandable: you're focused on shipping features and closing customers, not writing plans for scenarios that may never happen.
But there are two concrete reasons to take this seriously earlier than feels comfortable. First, SOC 2's Availability criteria (A1) and the Common Criteria for system operations (CC7) and risk mitigation (CC9) both reference continuity capabilities. If Availability is in your audit scope, you can't pass a SOC 2 Type II audit without evidence that your BCP/DRP exists, is tested, and works. Second, GDPR Article 32 requires "a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures", and those measures include your ability to restore availability of personal data after an incident.
Enterprise sales teams increasingly use BCP/DRP documentation as a filter during vendor due diligence. Not having it written down signals operational immaturity, even if your actual infrastructure is resilient.
BCP vs DRP: what's the difference?
These terms are often used interchangeably but they describe distinct scopes:
- Business Continuity Plan (BCP): the broader plan for how your organisation continues operating during and after a disruption. It covers people, processes, communications, alternative work arrangements, and stakeholder management — not just the technology layer.
- Disaster Recovery Plan (DRP): the technical subset of the BCP. It focuses specifically on restoring IT systems, data, and infrastructure after a disaster. RTOs and RPOs live here.
A complete programme has both. In practice, early-stage SaaS companies often start with a single combined BCP/DRP document. That's fine: what matters is that the content is substantive, not that it's split into separate binders.
Key concepts: RTO and RPO
Two metrics anchor every DRP:
- Recovery Time Objective (RTO): the maximum acceptable time for restoring a system or service after a disruption. If your RTO is 4 hours, your DR plan must demonstrate you can restore service within 4 hours. RTOs drive your infrastructure architecture — active-active, hot standby, or cold standby.
- Recovery Point Objective (RPO): the maximum acceptable amount of data loss measured in time. If your RPO is 1 hour, you need database backups or replication with at most 1-hour lag. RPOs drive your backup frequency and replication strategy.
These are not aspirational numbers — they need to be validated by actual tests. An auditor will ask: "What is your RTO? When did you last test that you can actually meet it?" If you've never run a DR drill, your RTO is theoretical.
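One way to make that audit question answerable is to record each drill and compare the measured values with the targets. The following is a minimal sketch under stated assumptions, not a prescribed method: the `DrillResult` class, its field names, the system name, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """One DR drill for one system. All field names are illustrative."""
    system: str
    rto: timedelta              # target: maximum acceptable time to restore
    rpo: timedelta              # target: maximum acceptable data loss window
    outage_start: datetime      # when the simulated failure began
    service_restored: datetime  # when the service was verified healthy again
    last_recoverable: datetime  # newest data point present after the restore

    @property
    def recovery_time(self) -> timedelta:
        return self.service_restored - self.outage_start

    @property
    def data_loss(self) -> timedelta:
        return self.outage_start - self.last_recoverable

    def met_rto(self) -> bool:
        return self.recovery_time <= self.rto

    def met_rpo(self) -> bool:
        return self.data_loss <= self.rpo


# Hypothetical drill using the 4-hour RTO and 1-hour RPO from the examples above.
drill = DrillResult(
    system="production-api",
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    outage_start=datetime(2024, 3, 1, 10, 0),
    service_restored=datetime(2024, 3, 1, 13, 15),
    last_recoverable=datetime(2024, 3, 1, 9, 20),
)
print(drill.recovery_time, drill.met_rto())  # 3:15:00 True
print(drill.data_loss, drill.met_rpo())      # 0:40:00 True
```

Keeping the measured recovery time and data loss next to the targets gives you a dated record to show an auditor, rather than a number that exists only in the plan.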
What SOC 2 Availability Criteria actually require
SOC 2's Trust Services Criteria include an Availability category (A1). It is optional, but enterprise customers increasingly ask for it in their SOC 2 scope requests. The relevant criteria are:
- A1.1: The entity maintains, monitors, and evaluates current processing capacity and use of system components to manage capacity demand.
- A1.2: The entity authorises, designs, develops or acquires, implements, operates, approves, maintains, and monitors environmental protections, software, data back-up processes, and recovery infrastructure to meet its objectives.
- A1.3: The entity tests recovery plan procedures supporting system recovery to meet its objectives. Test results are evaluated and recovery plan procedures are updated based on test results.
The key requirement is A1.3: you must actually test the recovery plan and have evidence of those tests. A plan that's never been tested doesn't satisfy the criterion. Annual table-top exercises (walk-throughs where you simulate a failure scenario) are the minimum; full failover tests are better evidence.
The Common Criteria also touch BCP/DRP in CC7 (System Operations) and CC9 (Risk Mitigation). CC9.1 requires identifying business disruption risks and developing controls to manage them. CC7.4 and CC7.5 cover responding to and recovering from security incidents.
What a minimal viable BCP/DRP document includes
Here's the content an auditor expects to see:
1. Scope and objectives
Define which systems and services are in scope (typically: your production application, customer database, payment processing, and support systems). State your target RTOs and RPOs for each tier. Tier 1 systems (customer-facing production) typically have lower RTOs than Tier 3 systems (internal analytics).
2. Risk assessment and impact analysis
List the disruption scenarios you're planning for:
- Cloud provider regional outage (e.g. AWS us-east-1 down);
- Database corruption or accidental deletion;
- Ransomware / destructive cyber attack;
- Key personnel unavailability (single points of failure);
- Third-party service provider failure (Stripe down, Auth0 down);
- Data centre physical disaster (fire, flood, power loss).
For each scenario, document the estimated impact (downtime, data loss, revenue impact, customer impact) and the likelihood. This becomes your risk matrix.
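A risk matrix can be as simple as a likelihood score multiplied by an impact score. The sketch below assumes a 1 to 5 scale for both and illustrative thresholds for the rating bands; neither is mandated by SOC 2 or GDPR, and the ratings shown are placeholders rather than real assessments.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); the scale is an assumption
    impact: int      # 1 (negligible) to 5 (severe); the scale is an assumption

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def rating(score: int) -> str:
    """Map a score to a band. The thresholds are illustrative, not a standard."""
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


# Placeholder ratings; replace them with your own assessment.
scenarios = [
    Scenario("Cloud provider regional outage", likelihood=2, impact=5),
    Scenario("Database corruption or accidental deletion", likelihood=3, impact=5),
    Scenario("Third-party service provider failure", likelihood=4, impact=3),
    Scenario("Key personnel unavailability", likelihood=3, impact=2),
]

# Highest-scoring scenarios first, so the plan addresses them first.
risk_matrix = sorted(scenarios, key=lambda s: s.score, reverse=True)
for s in risk_matrix:
    print(f"{s.score:>2}  {rating(s.score):<6}  {s.name}")
```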
3. Recovery procedures
For each in-scope system, document step-by-step recovery procedures:
- Who is responsible for declaring a disaster and initiating the DRP;
- How to restore from backups (include actual commands or runbook links);
- How to fail over to a standby environment (if applicable);
- How to verify data integrity post-restoration;
- How to cut over DNS/traffic once recovery is confirmed.
These procedures need to be specific enough for someone who wasn't the original engineer to execute. "Restore from S3" is not a procedure. "Log into AWS console → navigate to S3 bucket [name] → select backup [naming convention] → run [specific restore command] → verify with [test]" is a procedure.
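The "verify with [test]" step is the one most often left vague. For file-based backups, one possible check is to record a checksum manifest at backup time and compare the restored files against it. This is a sketch under that assumption; the function names are hypothetical, and a database restore would need an analogous check such as row counts or application-level queries.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(expected: dict[str, str], restored_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the restore matches."""
    actual = build_manifest(restored_root)
    problems = []
    for name, checksum in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != checksum:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

In a runbook, the manifest would be built when the backup is taken and stored alongside it, and `verify_restore` would run against the staging directory after the restore completes.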
4. Communication plan
Who gets notified when? Document:
- Internal escalation chain (who declares the DR event, who needs to know);
- Customer communication timeline (when and how you notify affected customers);
- Status page update procedures;
- Regulatory notification requirements (GDPR breach notification within 72 hours if personal data is affected — GDPR Art. 33).
5. Testing schedule and results
Document when and how you test. Minimum viable:
- Annual table-top exercise: walk through a disaster scenario with key personnel, identify gaps, update the plan.
- Quarterly backup restoration test: actually restore from a recent backup to a staging environment and verify data integrity.
- Annual failover test (if you have a standby environment): actually fail over and verify RTO is achievable.
Record test dates, participants, outcomes, and any remediation actions. This is what auditors want to see.
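A small script can flag when a scheduled test has lapsed. The sketch below encodes the minimum schedule above; the test-type labels, the 92-day approximation of a quarter, and the sample dates are assumptions for illustration.

```python
from datetime import date, timedelta

# Maximum interval between tests, taken from the minimum schedule above.
# The keys are hypothetical labels.
MAX_INTERVAL = {
    "tabletop_exercise": timedelta(days=365),
    "backup_restore_test": timedelta(days=92),  # roughly one quarter
    "failover_test": timedelta(days=365),
}


def overdue_tests(last_run: dict[str, date], today: date) -> list[str]:
    """Return test types that have never run or are past their interval."""
    overdue = []
    for test_type, interval in MAX_INTERVAL.items():
        last = last_run.get(test_type)
        if last is None or today - last > interval:
            overdue.append(test_type)
    return overdue


# Hypothetical test log: the most recent run date per test type.
last_run = {
    "tabletop_exercise": date(2024, 2, 10),
    "backup_restore_test": date(2024, 1, 5),
}
print(overdue_tests(last_run, today=date(2024, 6, 1)))
# ['backup_restore_test', 'failover_test']
```

If you have no standby environment, drop the failover entry so it does not show up as permanently overdue.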
6. Roles and responsibilities
Name a Business Continuity Plan Owner (typically the CTO or Head of Engineering) and a Communications Lead (typically the CEO or Head of Operations). Define their responsibilities during an active DR event.
7. Plan maintenance
The plan must be reviewed and updated at least annually and after any major infrastructure change. Date-stamp every review and keep a version history.
BCP/DRP and GDPR
GDPR Article 32 specifically requires "the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident." Your DRP directly addresses this. If your tests demonstrate that you can restore customer data within your stated RTO, with data loss no greater than your stated RPO, you have concrete evidence that you are meeting this GDPR obligation.
Importantly, if a data loss event occurs during a disaster, you may also trigger GDPR's breach notification requirements (Art. 33: notify your data protection authority (DPA) within 72 hours of becoming aware of the breach; Art. 34: notify affected data subjects if the breach creates a high risk to their rights and freedoms). Your BCP communication plan should include GDPR breach notification procedures.
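During an incident it helps to compute the Art. 33 deadline once and put it in the incident channel. A minimal sketch, assuming the 72-hour window is counted from the moment you became aware of the breach; the function names and sample timestamps are hypothetical, and whether an event counts as a notifiable breach remains a judgement for your privacy lead or counsel.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)  # GDPR Art. 33


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the DPA, given when you became aware."""
    if became_aware_at.tzinfo is None:
        # Naive timestamps are ambiguous when the team spans time zones.
        raise ValueError("use a timezone-aware timestamp")
    return became_aware_at + NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(became_aware_at) - now).total_seconds() / 3600


# Hypothetical incident timestamps.
aware = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)
now = datetime(2024, 3, 2, 8, 30, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())  # 2024-03-04T14:30:00+00:00
print(hours_remaining(aware, now))               # 54.0
```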
Practical infrastructure choices that affect your RTO/RPO
- Multi-region deployment: if you're on a single cloud region, your RTO for a regional outage depends on that region recovering. Multi-region active-active or active-passive can cut your RTO to minutes for infrastructure failures.
- Database replication: managed databases like AWS RDS, Supabase, and PlanetScale offer automated backups and point-in-time recovery. Enable them. Know what your actual RPO is (the backup lag) — don't assume.
- Infrastructure as Code: if your entire infrastructure is defined in Terraform or Pulumi, you can recreate it in a new region in hours, not days. This is one of the highest-leverage investments in DRP capability.
- Runbooks in version control: DR procedures stored in a repo that your team can access even when your primary systems are down. Don't store runbooks only in Notion or Confluence if those are hosted on infrastructure that could also be affected.
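To "know what your actual RPO is" for snapshot-style backups, you can measure the largest gap between consecutive backup completion times, including the time since the most recent one. This is a sketch with hypothetical timestamps; it assumes you can export completion times from your provider, and continuous point-in-time recovery would need to be measured differently.

```python
from datetime import datetime, timedelta


def worst_case_gap(backup_times: list[datetime], now: datetime) -> timedelta:
    """Largest interval with no backup, including the time since the last one."""
    if not backup_times:
        raise ValueError("no backups found")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


# Hypothetical completion times, e.g. exported from your provider's backup list.
backups = [
    datetime(2024, 3, 1, 0, 0),
    datetime(2024, 3, 1, 1, 0),
    datetime(2024, 3, 1, 3, 30),  # the 02:00 backup was skipped
    datetime(2024, 3, 1, 4, 0),
]
gap = worst_case_gap(backups, now=datetime(2024, 3, 1, 4, 45))
print(gap)                        # 2:30:00
print(gap <= timedelta(hours=1))  # False: a 1-hour RPO is not being met
```

A single skipped backup is enough to break a stated RPO, which is why the measured worst case matters more than the configured schedule.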
Generate your BCP/DRP documentation
ComplyKit's Incident Response Plan Generator covers the incident phase — detection, containment, GDPR notification, and post-mortem. Combine this with your BCP/DRP for a complete operational resilience programme.
For your pre-audit compliance stack, you'll also need an Information Security Policy (SOC 2 CC6) and a Data Retention Policy (SOC 2 CC6.5 / GDPR Art. 5).