Research: Self-Mutating Files


This document contains some research into “self-modifying” programs. The goal is to create a single C program that can run and update arbitrary state encoded in its own binary.

Source Code

To compile the source code below and run:

# compile the binary program
gcc self_memory.c -o self_memory

./self_memory   # first run:  ends with "Updated run count to 1"
./self_memory   # second run: ends with "Updated run count to 2"
./self_memory   # third run:  ends with "Updated run count to 3"

Here is the original C source code (targeted at macOS on an M1 chip):

// self_memory.c
// A program that remembers how many times it has been run by appending
// state past the end of the signed Mach-O binary content.
//
// macOS only signature-verifies the Mach-O content, not bytes appended
// after it — so we can safely read/write state there without breaking
// the signature or needing to re-sign.
//
// State block layout (body first, header last so it can be found at EOF):
//   [body (header.length bytes)][StateHeader (12 bytes)]
//
// Compile:  gcc self_memory.c -o self_memory
// Run:      ./self_memory

#include <mach-o/dyld.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/** --- Definitions --- **/

#define MAGIC_LEN 4
#define MAGIC_BYTES {0xDE, 0xAD, 0xBE, 0xEF}

// Fixed header: always written at the end of our appended block (at EOF)
typedef struct {
  uint8_t magic[MAGIC_LEN]; // sentinel to confirm block exists
  uint32_t version;         // schema version, for future changes
  uint32_t length;          // byte length of the body that follows
} StateHeader;

// Dynamic body — add/remove fields here as needed
typedef struct {
  int32_t run_count;
  int64_t last_run; // unix timestamp of last run
} StateBody;

/** --- Binary helpers --- **/

// Get path to our own executable
static int get_exe_path(char *buf, uint32_t size) {
  if (_NSGetExecutablePath(buf, &size) != 0) {
    fprintf(stderr, "Error: executable path buffer too small\n");
    return -1;
  }
  return 0;
}

// Open our own binary for reading and writing
static FILE *open_self(char *path_buf, uint32_t buf_size) {
  if (get_exe_path(path_buf, buf_size) != 0)
    return NULL;
  FILE *f = fopen(path_buf, "r+b");
  if (!f)
    perror("fopen");
  return f;
}

// Flush and close a file
static void flush_and_close(FILE *f) {
  fflush(f);
  fsync(fileno(f));
  fclose(f);
}

/** --- State block helpers --- **/

// Check if a header has a valid magic sentinel
static int header_is_valid(const StateHeader *h) {
  static const uint8_t magic[] = MAGIC_BYTES;
  return memcmp(h->magic, magic, MAGIC_LEN) == 0;
}

// Seek to end of file and return file size
static long seek_to_end(FILE *f) {
  fseek(f, 0, SEEK_END);
  return ftell(f);
}

// Find offset of our state body, or -1 if not present
// Header lives at EOF - sizeof(StateHeader), body lives before it
static long find_block_offset(FILE *f, long file_size) {
  const long header_size = (long)sizeof(StateHeader);
  if (file_size < header_size + (long)sizeof(StateBody))
    return -1;

  StateHeader h;
  if (fseek(f, file_size - header_size, SEEK_SET) != 0 ||
      fread(&h, sizeof(StateHeader), 1, f) != 1)
    return -1;

  if (!header_is_valid(&h))
    return -1;

  // Reject a length that would place the body before the start of the file
  if ((long)h.length > file_size - header_size)
    return -1;

  // Body starts before the header
  return file_size - header_size - (long)h.length;
}

// Read state body from file at given block offset
// Layout on disk: [body][header], body comes first, header at end
// Returns 0 on success, -1 on I/O error
static int read_body(FILE *f, long block_offset, StateHeader *h,
                     StateBody *body) {
  size_t read_len =
      h->length < sizeof(StateBody) ? h->length : sizeof(StateBody);
  memset(body, 0, sizeof(StateBody)); // zero out in case schema grew
  if (fseek(f, block_offset, SEEK_SET) != 0)
    return -1;
  if (read_len > 0 && fread(body, read_len, 1, f) != 1)
    return -1;
  return 0;
}

// Write body first, then header — header lives at EOF so we can always find it
static void write_block(FILE *f, const StateBody *body) {
  static const uint8_t magic[] = MAGIC_BYTES;

  StateHeader h;
  memcpy(h.magic, magic, MAGIC_LEN);
  h.version = 2;
  h.length = sizeof(StateBody);

  fwrite(body, sizeof(StateBody), 1, f); // body first
  fwrite(&h, sizeof(StateHeader), 1, f); // header last (findable at EOF)
}

/** --- Main --- **/

int main(void) {
  char path[4096]; // NOTE: large enough for any macOS path
  FILE *f = open_self(path, sizeof(path));
  if (!f)
    return 1;

  long file_size = seek_to_end(f);
  long block_offset = find_block_offset(f, file_size);

  StateHeader h;
  StateBody body = {0};

  if (block_offset != -1) {
    // Re-read header so read_body has the length
    fseek(f, file_size - sizeof(StateHeader), SEEK_SET);
    fread(&h, sizeof(StateHeader), 1, f);
    read_body(f, block_offset, &h, &body);

    char time_buf[64] = "unknown";
    time_t t = (time_t)body.last_run;
    struct tm *tm = localtime(&t); // may return NULL for an invalid time
    if (tm)
      strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S", tm);
    printf("Run count: %d (last run: %s)\n", body.run_count, time_buf);
  } else {
    printf("Run count: 0 (first run — appending state block)\n");
  }

  // Update state
  body.run_count++;
  body.last_run = (int64_t)time(NULL);

  // Write back — overwrite in place or append fresh
  if (block_offset != -1) {
    fseek(f, block_offset, SEEK_SET);
  } else {
    fseek(f, 0, SEEK_END);
  }

  write_block(f, &body);
  flush_and_close(f);

  printf("Updated run count to %d\n", body.run_count);
  return 0;
}

The problem

When macOS runs a binary, it verifies the code signature against the Mach-O content — the structured binary format that contains your compiled code, data segments, and metadata. Any modification to this region invalidates the signature and the OS kills the process.

This is why our earlier attempts failed:

  • Patching bytes inside __TEXT or __DATA segments → signature mismatch → killed
  • Re-signing after patching → OS restores from signed snapshot on next load → state reset

Why appending works

The macOS code signing system only verifies up to the end of the Mach-O content, which is recorded in the binary’s own header. Bytes written past that boundary are completely ignored by the signature verification process.

┌─────────────────────────────┬──────────────────┬─────────────┐
│        Mach-O content       │  Code Signature  │  Our state  │
│   (__TEXT, __DATA, etc.)    │  (verified ✓)    │  (ignored)  │
└─────────────────────────────┴──────────────────┴─────────────┘

                                             We write here

The OS loads the binary, verifies the Mach-O region, and ignores anything after it. Our state block (a 16-byte body plus a 12-byte header in the listing above) sits in that unchecked region, free to be read and written between runs.
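The idea that a check over a bounded region is unaffected by appended bytes can be illustrated with a toy model. This is only a sketch and not how macOS implements verification: FNV-1a stands in for the real cryptographic page hashes, and the names fnv1a, verify_signed_region and code_limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// FNV-1a (64-bit), used here only as a stand-in for the real page hashes
static uint64_t fnv1a(const uint8_t *data, size_t len) {
  uint64_t h = 0xcbf29ce484222325ULL; // FNV offset basis
  for (size_t i = 0; i < len; i++) {
    h ^= data[i];
    h *= 0x100000001b3ULL; // FNV prime
  }
  return h;
}

// Hypothetical verifier: hashes only the first code_limit bytes of the
// file, so anything stored after code_limit cannot affect the result.
static int verify_signed_region(const uint8_t *file, size_t file_len,
                                size_t code_limit, uint64_t expected) {
  assert(file != NULL);
  if (code_limit > file_len)
    return 0; // file was truncated below the signed region
  return fnv1a(file, code_limit) == expected;
}
```

In this model, writing bytes past code_limit leaves the check passing, while flipping any byte before code_limit makes it fail.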


The state block layout

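With the structs in the listing above, we append 28 bytes at the end of the binary, body first and header last (byte values shown for little-endian arm64):

Offset +0    XX XX XX XX               ← run_count as int32 (4 bytes)
Offset +4    XX XX XX XX               ← alignment padding (4 bytes)
Offset +8    XX XX XX XX XX XX XX XX   ← last_run as int64 (8 bytes)
Offset +16   DE AD BE EF               ← magic sentinel (4 bytes)
Offset +20   02 00 00 00               ← version as uint32 (4 bytes)
Offset +24   10 00 00 00               ← length as uint32 (4 bytes)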

The magic sentinel 0xDEADBEEF serves two purposes:

  1. Identification: lets us confirm the state block actually exists before reading it (on the first run it won’t be there yet)
  2. Collision avoidance: the sequence is unlikely to appear naturally at exactly 12 bytes before the end of a real Mach-O binary
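The sizes and offsets of the block can be checked directly from the struct definitions. A minimal sketch, assuming the same StateHeader and StateBody as in self_memory.c and a typical 64-bit ABI where int64_t is 8-byte aligned; the helpers block_size and magic_offset_from_eof are illustrative names, not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_LEN 4

// Same definitions as in self_memory.c
typedef struct {
  uint8_t magic[MAGIC_LEN];
  uint32_t version;
  uint32_t length;
} StateHeader;

typedef struct {
  int32_t run_count;
  int64_t last_run; // 8-byte alignment inserts 4 padding bytes before this
} StateBody;

// Total bytes appended to the binary: body first, then header
static size_t block_size(void) {
  return sizeof(StateBody) + sizeof(StateHeader);
}

// Distance from EOF back to the first magic byte
static size_t magic_offset_from_eof(void) {
  return sizeof(StateHeader) - offsetof(StateHeader, magic);
}
```

On such an ABI the body is 16 bytes rather than 12 because of the padding after run_count, which is why the header's length field, and not a hard-coded constant, should be used to locate the body.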

What the program does each run

  1. Opens its own binary via _NSGetExecutablePath() (macOS equivalent of /proc/self/exe on Linux)
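  2. Seeks to EOF - 12 (the size of StateHeader) and checks for the magic sentinel
  3. If found: uses header.length to locate the body just before the header, reads run_count and last_run, updates them, then seeks back and overwrites the block in place
  4. If not found: treats this as the first run, seeks to EOF and appends a fresh body (run_count = 1) followed by the header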
  5. Calls fsync() to guarantee the write hits disk before exit
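The steps above can be sketched in a portable form that works on any file opened for update, which makes the find-or-append logic easy to try without touching a real executable. This is a simplified illustration, not the program's actual layout: the 8-byte Trailer struct and the function bump_run_count are hypothetical, and fsync() is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const uint8_t MAGIC[4] = {0xDE, 0xAD, 0xBE, 0xEF};

// Hypothetical simplified trailer: counter first, magic in the last 4 bytes
typedef struct {
  int32_t run_count;
  uint8_t magic[4];
} Trailer;

// Increments the counter stored at the end of f, appending a trailer if
// none exists yet. Returns the new count, or -1 on I/O error.
static int32_t bump_run_count(FILE *f) {
  assert(f != NULL);
  Trailer t;
  int found = 0;

  if (fseek(f, 0, SEEK_END) != 0)
    return -1;
  long size = ftell(f);
  if (size < 0)
    return -1;

  // Look for an existing trailer in the last sizeof(Trailer) bytes
  if (size >= (long)sizeof(Trailer)) {
    if (fseek(f, size - (long)sizeof(Trailer), SEEK_SET) != 0)
      return -1;
    if (fread(&t, sizeof(Trailer), 1, f) != 1)
      return -1;
    found = memcmp(t.magic, MAGIC, sizeof(MAGIC)) == 0;
  }

  long offset = found ? size - (long)sizeof(Trailer) // overwrite in place
                      : size;                        // first run: append
  if (!found) {
    t.run_count = 0;
    memcpy(t.magic, MAGIC, sizeof(MAGIC));
  }
  t.run_count++;

  if (fseek(f, offset, SEEK_SET) != 0)
    return -1;
  if (fwrite(&t, sizeof(Trailer), 1, f) != 1)
    return -1;
  if (fflush(f) != 0)
    return -1;
  return t.run_count;
}
```

Calling bump_run_count repeatedly on the same file grows it by 8 bytes on the first call only; later calls rewrite those same 8 bytes.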

Why this doesn’t work for arbitrary state changes

The state block is appended once and then overwritten in place, so its size is effectively fixed. To store more complex state (e.g. a struct with more fields), you would extend StateBody and update the header’s version and length accordingly. Growing the body works because the header is rewritten at the new EOF, but shrinking it leaves stale bytes (including the old header) at the end of the file unless the file is also truncated.

A cleaner extension would be to store a fixed-size state block (e.g. 64 bytes) on first run, then always overwrite at the same offset on subsequent runs.
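One way to sketch such a fixed-size block is a union that pads the real fields out to a constant size, so the block always starts the same distance from EOF no matter how many fields are in use. The names FixedState, STATE_BLOCK_SIZE, fixed_block_offset and fixed_state_init are hypothetical, and the field set is only an example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATE_BLOCK_SIZE 64

// Hypothetical fixed-size block: fields can be added later without
// changing the block's size or its offset from EOF.
typedef union {
  struct {
    uint8_t magic[4];
    uint32_t version;
    int32_t run_count;
    int64_t last_run;
  } fields;
  uint8_t raw[STATE_BLOCK_SIZE]; // pads the block to exactly 64 bytes
} FixedState;

_Static_assert(sizeof(FixedState) == STATE_BLOCK_SIZE,
               "state block must stay exactly 64 bytes");

// The block always starts STATE_BLOCK_SIZE bytes before EOF;
// returns -1 if the file is too small to contain one.
static long fixed_block_offset(long file_size) {
  return file_size >= STATE_BLOCK_SIZE ? file_size - STATE_BLOCK_SIZE : -1;
}

// Zero the whole block (including unused padding) and stamp the magic
static void fixed_state_init(FixedState *s) {
  static const uint8_t magic[4] = {0xDE, 0xAD, 0xBE, 0xEF};
  assert(s != NULL);
  memset(s, 0, sizeof(*s));
  memcpy(s->fields.magic, magic, sizeof(magic));
  s->fields.version = 1;
}
```

The compile-time assertion fails the build if the fields ever outgrow the reserved 64 bytes, which is the point at which a new block size (and a migration step) would be needed.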


Binary Patching as an Attack Vector — Historical Context

Viruses (the OG case)

Classic 80s/90s viruses worked exactly this way — a program would find other executables on disk and inject code into them. This is literally how the word “virus” applies: self-replicating by infecting hosts. MS-DOS had zero protection — no signing, no permissions, binaries were just files anyone could write to.

Notable Real-World Examples

Stuxnet (2010)

Didn’t patch binaries per se, but injected code into running processes (via DLLs). It targeted the PLCs controlling Iranian nuclear centrifuges and is widely considered the first cyberweapon.

XcodeGhost (2015)

Attackers distributed a trojanized copy of Xcode through third-party download sites, so any app compiled with it got malicious code injected. A supply chain attack at the toolchain level.

SolarWinds (2020)

Attackers got into the build pipeline and inserted code before signing. The signed binary was the malware.

Self-Modifying Code as a Feature

  • Old copy protection schemes (90s games) would intentionally scramble their own code at runtime to defeat debuggers/disassemblers
  • Packers like UPX compress a binary and prepend a stub that decompresses and executes it in memory — not malicious, but the same mechanic

The Turning Point

Code signing + W^X + mandatory sandboxing (iOS first, then macOS) basically killed the classic virus model. Modern attacks shifted to supply chain (compromise before signing) or privilege escalation rather than patching binaries directly.


Modern Protections Against Binary Tampering

Code Signing

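Binaries are signed over cryptographic hashes of their contents. Any modification to the signed region invalidates the signature, and on Apple Silicon macOS kills a tampered binary rather than run it (bytes appended after the signed region are the exception described above).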

SIP (System Integrity Protection) — macOS

Prevents even root from modifying protected paths like /usr/bin and system apps. Introduced in OS X El Capitan (2015). Requires booting into recovery mode to disable.

Hardened Runtime — macOS

An opt-in code signing option (not itself an entitlement) that blocks self-modifying code, JIT injection, and loading of unsigned libraries unless specific entitlements grant exceptions. Required for notarization.

Notarization — macOS

Apple scans your already-signed binary server-side and issues a notarization ticket for it. Tampering with the signed content means it no longer matches the ticket, and Gatekeeper blocks it from launching.

W^X (Write XOR Execute)

Memory pages cannot be both writable and executable simultaneously. Prevents injected code from being run directly from a writable region.

ASLR (Address Space Layout Randomization)

Randomizes where code and libraries are loaded in memory at runtime, making it hard to predict addresses for injection or exploitation.

Secure Boot

Modern Macs (Apple Silicon) verify the entire boot chain cryptographically — from firmware to kernel to OS. A tampered kernel simply won’t boot.

Supply Chain Remains the Weak Point

All of the above protections assume the binary was clean before signing. SolarWinds and XcodeGhost both bypassed every modern protection by compromising code before it was signed — the signature itself became the attacker’s shield.