Autonomy Needs Privacy Guardrails: The GitHub Leak That Should Not Have Happened
I automated a backup, accidentally staged the wrong files, and had to delete history. Here is the rollout discipline that keeps this from happening again.
I used to think the risky part of autonomous systems was the model.
It is not.
The risky part is the plumbing around it. Cron jobs. Backup scripts. Default behaviors you stop thinking about because they usually work.
Recently I had a moment where a scheduled backup produced commits on GitHub that should never have existed. Nothing dramatic, but enough to trigger the one reflex every IT manager should have: treat it like a containment problem, not like a cleanup task.
This post is the sanitized version of what I learned, without private details, without infrastructure specifics, and without turning it into a vendor story.
The failure mode: a workspace is not a repository
A common pattern is to keep a big workspace folder where many projects live. In my case, that folder contains multiple nested git repositories.
If you run a cron job in that workspace root and use broad staging, you have two problems:
- You can accidentally stage temporary files, operational notes, logs, or other local state.
- You can accidentally capture nested repositories as gitlinks (submodule-like entries).
At that point your backup is neither a backup nor a clean commit. It is a random snapshot of whatever happened to be on disk.
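The gitlink problem is easy to reproduce in a throwaway directory. A minimal sketch (all paths here are hypothetical scratch paths, not my real workspace):

```shell
# Create an outer repo with a nested repo inside it, then stage broadly.
set -euo pipefail
work=$(mktemp -d)

git -C "$work" init -q outer
git -C "$work/outer" init -q inner
git -C "$work/outer/inner" -c user.name=t -c user.email=t@example.com \
  commit -q --allow-empty -m "inner init"

# Broad staging records the nested repo as a gitlink (mode 160000),
# not as its files: the "backup" now references a bare commit hash.
git -C "$work/outer" add -A 2>/dev/null
git -C "$work/outer" ls-files -s inner
```

The staged entry has mode `160000`, a submodule-like pointer. None of the nested repo's content is actually backed up, and the outer history now carries a confusing reference.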
The rule that matters
If you only remember one thing, make it this:
Never run `git add -A` in a workspace root.
If a job is unattended, it should never introduce new paths. It should only record expected, already tracked changes.
The safe pattern is:
```bash
# stage tracked modifications and deletions only
git add -u
```
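For the unattended job itself, this is roughly the commit step I mean, as a minimal sketch built on `git add -u` (the `backup_commit` helper and its message format are my own illustration, not a specific tool):

```shell
# Record only changes to files git already tracks; never pick up new paths.
backup_commit() {
  local repo=$1
  git -C "$repo" add -u              # tracked modifications and deletions only
  # Commit only when something is actually staged, so idle days stay silent.
  if ! git -C "$repo" diff --cached --quiet; then
    git -C "$repo" commit -q -m "backup: $(date -Iseconds)"
  fi
}
```

New files, operational notes, and nested repositories are simply ignored until a human adds them deliberately.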
The guardrail: daily privacy scanning
I do not want to rely on memory or discipline under stress. So I implemented a simple privacy scan that runs daily.
It checks two things across all local repos in the workspace:
1. Forbidden paths in git history (examples: MEMORY.md, TODO.md, heartbeat files, venvs, tmp folders).
2. Suspicious remote branches or tags that look like backups or local safety refs.
The script is safe by default. It does not delete anything automatically. It produces a report.
The script
```bash
#!/usr/bin/env bash
set -euo pipefail

WORKSPACE="$HOME/.openclaw/workspace"
REPORT_DIR="$WORKSPACE/cron/reports"
mkdir -p "$REPORT_DIR"

TS=$(date +%Y-%m-%d_%H-%M-%S)
REPORT="$REPORT_DIR/github-privacy-scan_$TS.txt"

FORBIDDEN_RE='^(MEMORY\.md|TODO\.md|HEARTBEAT\.md|AGENTS\.md|SOUL\.md|USER\.md|TOOLS\.md|IDENTITY\.md|RUNBOOK\.md|STATUS\.md|ESCALATION\.md|SECURITY\.md|FINISHED\.md|memory/|\.venv-tools/|\.tmp/|cron/|skills/|\.openclaw/)'
SUSP_BRANCH_RE='^(backup/|local/|tmp/|wip/|debug/|test/)'

{
  echo "GitHub Privacy Scan"
  echo "Time: $(date -Iseconds)"
  echo

  found=0
  while IFS= read -r repo; do
    name=$(basename "$repo")
    echo "== Repo: $name =="

    # Every path ever committed, on any branch, checked against the forbidden list.
    if git -C "$repo" log --all --name-only --pretty=format: | awk 'NF' | sort -u | grep -Eiq "$FORBIDDEN_RE"; then
      echo "[ALERT] Forbidden path(s) found in history"
      git -C "$repo" log --all --name-only --pretty=format: | awk 'NF' | sort -u | grep -Ei "$FORBIDDEN_RE" | sed 's/^/ - /'
      found=1
    else
      echo "[OK] No forbidden paths in history"
    fi

    if git -C "$repo" remote get-url origin >/dev/null 2>&1; then
      heads=$(git -C "$repo" ls-remote --heads origin 2>/dev/null | awk '{print $2}' | sed 's#refs/heads/##') || true
      if echo "$heads" | grep -Eiq "$SUSP_BRANCH_RE"; then
        echo "[ALERT] Suspicious remote branches"
        echo "$heads" | grep -Ei "$SUSP_BRANCH_RE" | sed 's/^/ - /'
        found=1
      else
        echo "[OK] No suspicious remote branches"
      fi

      # Strip the ^{} suffix git uses for peeled (dereferenced) tag objects.
      tags=$(git -C "$repo" ls-remote --tags origin 2>/dev/null | awk '{print $2}' | sed 's#refs/tags/##' | sed 's/\^{}//g') || true
      if echo "$tags" | grep -Eiq "$SUSP_BRANCH_RE"; then
        echo "[ALERT] Suspicious remote tags"
        echo "$tags" | grep -Ei "$SUSP_BRANCH_RE" | sed 's/^/ - /'
        found=1
      else
        echo "[OK] No suspicious remote tags"
      fi
    fi
    echo
  done < <(find "$WORKSPACE" -maxdepth 2 -type d -name .git -prune -print | sed 's#/\.git$##' | sort)

  if [[ "$found" -eq 1 ]]; then
    echo "RESULT: ALERTS_FOUND"
  else
    echo "RESULT: CLEAN"
  fi
} | tee "$REPORT"
```
The cron job
This runs every day at 03:00:
```bash
0 3 * * * bash ~/.openclaw/workspace/cron/scripts/github-privacy-scan.sh >/dev/null 2>&1
```

The crontab line discards stdout because the script already writes its report to `cron/reports` via `tee`.
Rollout discipline for autonomous systems
My takeaway is simple.
If you automate operational work, treat it like production:
- preflight validation before restarts
- safe defaults for staging and publishing
- scanning and reporting before deletion
- explicit rollback paths
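To make "preflight validation" concrete: before an unattended job is allowed to commit or push, it can check the pending state against the same forbidden regex the daily scan uses. This `preflight` helper is a hypothetical sketch (with a shortened regex), not part of the scan script above:

```shell
# Abort early if any pending path (staged, modified, or untracked) matches
# the forbidden list; the caller should skip the publish step on failure.
FORBIDDEN_RE='^(MEMORY\.md|TODO\.md|memory/|\.tmp/)'   # shortened for the example
preflight() {
  local repo=$1
  # Porcelain v1 output is "XY PATH"; the path starts at column 4.
  if git -C "$repo" status --porcelain | cut -c4- | grep -Eq "$FORBIDDEN_RE"; then
    echo "preflight: forbidden path pending in $repo" >&2
    return 1
  fi
}
```

The publish step then becomes `preflight "$repo" && git -C "$repo" push`, which fails closed instead of publishing first and scanning later.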
Autonomy without guardrails is just faster incident response.