
Zero-Knowledge Encryption

How BackupEngine encrypts your data with AES-256-GCM so that no one — not even us — can read it.

What Is Zero-Knowledge Encryption?

Zero-knowledge encryption means that BackupEngine encrypts your data on your device before upload and never has access to your encryption key. Only you (or someone with your passphrase) can decrypt your backed-up data. BackupEngine employees, server administrators, and anyone who gains access to the storage infrastructure cannot read your files.

This is not a marketing claim — it is a cryptographic guarantee enforced by the architecture. The encryption key is derived from your passphrase on your device and is never transmitted to any server.

Encryption Algorithm: AES-256-GCM

BackupEngine uses AES-256-GCM (Galois/Counter Mode) for all data encryption. AES-256-GCM is an authenticated encryption algorithm that provides both confidentiality and integrity in a single operation.

  • AES-256: The Advanced Encryption Standard with a 256-bit key. No practical attack on the full cipher is publicly known, and brute-forcing a 256-bit key is infeasible with current and foreseeable computing technology.
  • GCM mode: Galois/Counter Mode provides authenticated encryption. Each encrypted chunk carries an authentication tag that detects any tampering.
  • Per-chunk IV: Every chunk is encrypted with a unique 96-bit Initialization Vector (IV) generated from a cryptographically secure random number generator.
  • Authentication tags: The GCM auth tag (128-bit) is stored alongside each encrypted chunk. On restore, the tag is verified before decryption — any modification is rejected.

Encryption process (simplified)

For each chunk:
  1. Generate a unique 96-bit IV (cryptographically random)
  2. Encrypt: ciphertext, tag = AES-256-GCM(key, IV, plaintext)
  3. Store: [IV (12 bytes)] + [ciphertext] + [GCM auth tag (16 bytes)]
  4. Upload the encrypted blob to storage

On restore:
  1. Download the encrypted blob
  2. Extract IV, ciphertext, and auth tag
  3. Verify auth tag — reject if tampered
  4. Decrypt: plaintext = AES-256-GCM-Decrypt(key, IV, ciphertext, tag)
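
The steps above can be sketched in Python. This is an illustrative sketch, not BackupEngine's actual code: the function names are hypothetical, and it assumes the third-party `cryptography` package is installed for AES-GCM. That package's `AESGCM.encrypt` returns the ciphertext with the 16-byte tag appended, which matches the blob layout in step 3.

```python
import secrets

IV_LEN = 12   # 96-bit IV, as described above
TAG_LEN = 16  # 128-bit GCM authentication tag


def split_blob(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a stored blob into (IV, ciphertext, tag) using the fixed layout."""
    if len(blob) < IV_LEN + TAG_LEN:
        raise ValueError("blob too short to contain an IV and an auth tag")
    return blob[:IV_LEN], blob[IV_LEN:-TAG_LEN], blob[-TAG_LEN:]


def encrypt_chunk(key: bytes, plaintext: bytes) -> bytes:
    """Hypothetical helper: returns IV + ciphertext + tag for one chunk."""
    # Assumption: the 'cryptography' package is available.
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    iv = secrets.token_bytes(IV_LEN)  # fresh random IV for every chunk
    return iv + AESGCM(key).encrypt(iv, plaintext, None)


def decrypt_chunk(key: bytes, blob: bytes) -> bytes:
    """Hypothetical helper: verifies the tag and returns the plaintext."""
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    iv, ciphertext, tag = split_blob(blob)
    # Raises cryptography.exceptions.InvalidTag if the blob was modified.
    return AESGCM(key).decrypt(iv, ciphertext + tag, None)
```

In this sketch, tag verification and decryption happen inside a single library call, so no plaintext is returned for a blob that fails authentication.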

Key Derivation: Argon2id

Your encryption key is derived from your passphrase using Argon2id, the hybrid variant of Argon2, which won the Password Hashing Competition in 2015 and is widely recommended for password-based key derivation. Argon2id is designed to resist both GPU-based brute-force attacks and side-channel attacks.

  • Argon2id combines Argon2i (side-channel resistant) and Argon2d (GPU resistant) for the best of both approaches.
  • Parameters: 64 MB memory, 3 iterations, parallelism of 4. These parameters make every passphrase guess costly in both memory and time, which sharply limits how many guesses an attacker can try per second.
  • A unique 128-bit salt is generated per account and stored server-side. The salt is not secret but ensures that identical passphrases produce different keys.
  • The derived key is 256 bits, used directly as the AES-256-GCM encryption key.
  • Key derivation happens entirely on your device. The passphrase and derived key never leave your machine.
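
A minimal sketch of this derivation, using the parameters listed above. It assumes the third-party `argon2-cffi` package and reads "64 MB" as 64 MiB (65,536 KiB, the unit that library expects). `KdfParams`, `new_salt`, and `derive_key` are hypothetical names for illustration, not BackupEngine's API.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class KdfParams:
    memory_kib: int = 64 * 1024  # assumption: "64 MB" means 64 MiB
    iterations: int = 3
    parallelism: int = 4
    key_len: int = 32            # 256-bit derived key
    salt_len: int = 16           # 128-bit salt


def new_salt(params: KdfParams = KdfParams()) -> bytes:
    """Generate a random per-account salt (not secret, stored server-side)."""
    return secrets.token_bytes(params.salt_len)


def derive_key(passphrase: str, salt: bytes,
               params: KdfParams = KdfParams()) -> bytes:
    """Derive the 256-bit AES key locally from the passphrase and salt."""
    # Assumption: the 'argon2-cffi' package is available.
    from argon2.low_level import Type, hash_secret_raw

    return hash_secret_raw(
        secret=passphrase.encode("utf-8"),
        salt=salt,
        time_cost=params.iterations,
        memory_cost=params.memory_kib,
        parallelism=params.parallelism,
        hash_len=params.key_len,
        type=Type.ID,  # Argon2id
    )
```

Only the salt would need to be fetched from the server in such a design; the passphrase and the returned key stay in local memory.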

⚠ Warning

Choose a strong passphrase. While Argon2id makes brute-force attacks difficult, a weak passphrase (e.g., a single dictionary word) can still be vulnerable. Use a passphrase with at least 12 characters, mixing words, numbers, and symbols.

Per-Chunk Security

BackupEngine does not encrypt files as a whole — it encrypts each chunk independently after content-defined chunking with FastCDC. This design enables deduplication to work alongside encryption.

  • Each chunk gets a unique IV, ensuring that identical plaintext chunks produce different ciphertext.
  • Chunks are identified by the SHA-256 hash of the plaintext content, enabling content-addressed deduplication.
  • The chunk hash is computed before encryption. Only the hash (not the content) is shared with the server for dedup lookups.
  • This approach means that deduplication happens at the chunk level while maintaining full per-chunk encryption with unique IVs.
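
The dedup lookup described above can be illustrated with a short sketch. The function names and the in-memory set standing in for the server's hash index are assumptions for illustration; only the use of SHA-256 over the plaintext chunk comes from the description above.

```python
import hashlib


def chunk_id(plaintext: bytes) -> str:
    """Content address of a chunk: SHA-256 of the plaintext, hex-encoded."""
    return hashlib.sha256(plaintext).hexdigest()


def chunks_to_upload(chunks, known_ids):
    """Return (id, chunk) pairs for chunks the server has not stored yet.

    known_ids stands in for the server-side index; only hashes are compared,
    never chunk contents.
    """
    seen = set(known_ids)
    pending = []
    for chunk in chunks:
        cid = chunk_id(chunk)
        if cid not in seen:
            seen.add(cid)  # also skip repeats within the same backup run
            pending.append((cid, chunk))
    return pending
```

Each pending chunk would then be encrypted with its own fresh IV before upload, so the stored ciphertexts differ even where plaintexts repeat across accounts or runs.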

ℹ Note

Content-addressed deduplication with per-chunk encryption is a carefully designed balance. The server knows which chunks are duplicates (by hash) but cannot read any chunk's content. This enables significant storage savings without compromising the zero-knowledge guarantee.