Autonomy Needs Privacy Guardrails: The GitHub Leak That Should Not Have Happened
I automated a backup, accidentally staged the wrong files, and had to delete history. Here is the rollout discipline that keeps this from happening again.
I used to think the risky part of autonomous systems was the model.
It is not.
The risky part is the plumbing around it. Cron jobs. Backup scripts. Default behaviors you stop thinking about because they usually work.
Recently I had a moment where a scheduled backup produced commits on GitHub that should never have existed. Nothing dramatic, but enough to trigger the one reflex every IT manager should have: treat it like a containment problem, not like a cleanup task.
This post is the sanitized version of what I learned, without private details, without infrastructure specifics, and without turning it into a vendor story.
The failure mode: a workspace is not a repository
A common pattern is to keep a big workspace folder where many projects live. In my case, that folder contains multiple nested git repositories.
If you run a cron job in that workspace root and use broad staging, you have two problems:
- You can accidentally stage temporary files, operational notes, logs, or other local state.
- You can accidentally capture nested repositories as gitlinks (submodule-like entries).
At that point your backup is neither a backup nor a clean commit. It is a random snapshot of whatever happened to be on disk.
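The gitlink problem is easy to reproduce in a throwaway directory. A minimal sketch (all paths here are hypothetical scratch paths, not my real workspace):

```shell
# Create an outer repo with a nested repo inside it, then stage broadly.
set -euo pipefail
work=$(mktemp -d)

git -C "$work" init -q outer
git -C "$work/outer" init -q inner
git -C "$work/outer/inner" -c user.name=t -c user.email=t@example.com \
  commit -q --allow-empty -m "inner init"

# Broad staging records the nested repo as a gitlink (mode 160000),
# not as its files: the "backup" now references a bare commit hash.
git -C "$work/outer" add -A 2>/dev/null
git -C "$work/outer" ls-files -s inner
```

The staged entry has mode `160000`, a submodule-like pointer. None of the nested repo's content is actually backed up, and the outer history now carries a confusing reference.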
The rule that matters
If you only remember one thing, make it this:
Never run `git add -A` in a workspace root.
If a job is unattended, it should never introduce new paths. It should only record expected, already tracked changes.
The safe pattern is:
```bash
# stage tracked modifications and deletions only
git add -u
```
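For the unattended job itself, this is roughly the commit step I mean, as a minimal sketch built on `git add -u` (the `backup_commit` helper and its message format are my own illustration, not a specific tool):

```shell
# Record only changes to files git already tracks; never pick up new paths.
backup_commit() {
  local repo=$1
  git -C "$repo" add -u              # tracked modifications and deletions only
  # Commit only when something is actually staged, so idle days stay silent.
  if ! git -C "$repo" diff --cached --quiet; then
    git -C "$repo" commit -q -m "backup: $(date -Iseconds)"
  fi
}
```

New files, operational notes, and nested repositories are simply ignored until a human adds them deliberately.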
The guardrail: daily privacy scanning
I do not want to rely on memory or discipline under stress. So I implemented a simple privacy scan that runs daily.
It checks two things across all local repos in the workspace:
1. Forbidden paths in git history (examples: MEMORY.md, TODO.md, heartbeat files, venvs, tmp folders).
2. Suspicious remote branches or tags that look like backups or local safety refs.
The script is safe by default. It does not delete anything automatically. It produces a report.
The script
```bash
#!/usr/bin/env bash
set -euo pipefail

WORKSPACE="$HOME/.openclaw/workspace"
REPORT_DIR="$WORKSPACE/cron/reports"
mkdir -p "$REPORT_DIR"

TS=$(date +%Y-%m-%d_%H-%M-%S)
REPORT="$REPORT_DIR/github-privacy-scan_$TS.txt"

FORBIDDEN_RE='^(MEMORY\.md|TODO\.md|HEARTBEAT\.md|AGENTS\.md|SOUL\.md|USER\.md|TOOLS\.md|IDENTITY\.md|RUNBOOK\.md|STATUS\.md|ESCALATION\.md|SECURITY\.md|FINISHED\.md|memory/|\.venv-tools/|\.tmp/|cron/|skills/|\.openclaw/)'
SUSP_BRANCH_RE='^(backup/|local/|tmp/|wip/|debug/|test/)'

{
  echo "GitHub Privacy Scan"
  echo "Time: $(date -Iseconds)"
  echo

  found=0
  while IFS= read -r repo; do
    name=$(basename "$repo")
    echo "== Repo: $name =="

    # Every path ever committed, on any branch, checked against the forbidden list.
    if git -C "$repo" log --all --name-only --pretty=format: | awk 'NF' | sort -u | grep -Eiq "$FORBIDDEN_RE"; then
      echo "[ALERT] Forbidden path(s) found in history"
      git -C "$repo" log --all --name-only --pretty=format: | awk 'NF' | sort -u | grep -Ei "$FORBIDDEN_RE" | sed 's/^/ - /'
      found=1
    else
      echo "[OK] No forbidden paths in history"
    fi

    if git -C "$repo" remote get-url origin >/dev/null 2>&1; then
      heads=$(git -C "$repo" ls-remote --heads origin 2>/dev/null | awk '{print $2}' | sed 's#refs/heads/##') || true
      if echo "$heads" | grep -Eiq "$SUSP_BRANCH_RE"; then
        echo "[ALERT] Suspicious remote branches"
        echo "$heads" | grep -Ei "$SUSP_BRANCH_RE" | sed 's/^/ - /'
        found=1
      else
        echo "[OK] No suspicious remote branches"
      fi

      # Strip the ^{} suffix git uses for peeled (dereferenced) tag objects.
      tags=$(git -C "$repo" ls-remote --tags origin 2>/dev/null | awk '{print $2}' | sed 's#refs/tags/##' | sed 's/\^{}//g') || true
      if echo "$tags" | grep -Eiq "$SUSP_BRANCH_RE"; then
        echo "[ALERT] Suspicious remote tags"
        echo "$tags" | grep -Ei "$SUSP_BRANCH_RE" | sed 's/^/ - /'
        found=1
      else
        echo "[OK] No suspicious remote tags"
      fi
    fi
    echo
  done < <(find "$WORKSPACE" -maxdepth 2 -type d -name .git -prune -print | sed 's#/\.git$##' | sort)

  if [[ "$found" -eq 1 ]]; then
    echo "RESULT: ALERTS_FOUND"
  else
    echo "RESULT: CLEAN"
  fi
} | tee "$REPORT"
```
The cron job
This runs every day at 03:00:
```bash
0 3 * * * bash ~/.openclaw/workspace/cron/scripts/github-privacy-scan.sh >/dev/null 2>&1
```

The crontab line discards stdout because the script already writes its report to `cron/reports` via `tee`.
Rollout discipline for autonomous systems
My takeaway is simple.
If you automate operational work, treat it like production:
- preflight validation before restarts
- safe defaults for staging and publishing
- scanning and reporting before deletion
- explicit rollback paths
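To make "preflight validation" concrete: before an unattended job is allowed to commit or push, it can check the pending state against the same forbidden regex the daily scan uses. This `preflight` helper is a hypothetical sketch (with a shortened regex), not part of the scan script above:

```shell
# Abort early if any pending path (staged, modified, or untracked) matches
# the forbidden list; the caller should skip the publish step on failure.
FORBIDDEN_RE='^(MEMORY\.md|TODO\.md|memory/|\.tmp/)'   # shortened for the example
preflight() {
  local repo=$1
  # Porcelain v1 output is "XY PATH"; the path starts at column 4.
  if git -C "$repo" status --porcelain | cut -c4- | grep -Eq "$FORBIDDEN_RE"; then
    echo "preflight: forbidden path pending in $repo" >&2
    return 1
  fi
}
```

The publish step then becomes `preflight "$repo" && git -C "$repo" push`, which fails closed instead of publishing first and scanning later.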
Autonomy without guardrails is just faster incident response.