Over-engineer your own file-backup solution

Until yesterday, my entire digital life was backed up to a single physical drive using Time Machine. Occasionally I would feel guilty about that Time Machine reminder that I hadn't backed up in a while, and would rummage through my desk for the external hard drive I occasionally used to back up my main laptop.

Lame.

I could have just paid for a service like Backblaze, but nah: I want to be in control of this data, so I decided I was going to roll my own solution.

There are three main folders I care about; everything else lives in a remote git repo. I want an encrypted, private S3 bucket, one IAM user able to put and get files from the bucket, and after 30 days I'd like all the files moved to long-term storage in Glacier.

I had just started using Terraform in a limited fashion so I wanted to use this as an opportunity to tinker with it.

First up, I created an empty git repo in Keybase (bjoernw) where I keep the Terraform files along with a backup script.

Now for some Terraform. Create a file called providers.tf with the following:

provider "aws" {
  profile = "personal"
  region = "${var.region}"
  version = "~> 1.29"
}

terraform {
  backend "local" {
    path = "terraform-backend/terraform.tfstate"
  }
}

Here we set up our provider, tell Terraform which AWS profile to use (personal, as defined in ~/.aws/credentials), and tell Terraform where to save its state. I could have used any of several backends, but I wanted to keep it simple and just commit the state to git whenever I make changes.
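For reference, that personal profile is just a named entry in ~/.aws/credentials; something along these lines, with placeholder key values:

```ini
[personal]
aws_access_key_id = AKIAEXAMPLEKEYID
aws_secret_access_key = exampleSecretAccessKey
```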

Next, let's create a file called s3.tf where we will set up everything related to the S3 bucket.

resource "aws_s3_bucket" "backup-bucket" {
  bucket = "${var.backup-bucket-name}"
  acl = "private"

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        kms_master_key_id = "${aws_kms_key.backupKey.arn}"
        sse_algorithm = "aws:kms"
      }
    }
  }

  lifecycle_rule {
    id = "main"
    enabled = true

    transition {
      days = 30
      storage_class = "GLACIER"
    }

  }
  tags {
    Name = "Backups"
  }
}

resource "aws_kms_key" "backupKey" {
  description = "This key is used to encrypt backup bucket objects"
  deletion_window_in_days = 30
  tags {
    Name="BackupBucketKey"
  }
}

resource "aws_kms_alias" "key_alias" {
  name = "alias/${var.key_name}"
  target_key_id = "${aws_kms_key.backupKey.id}"
}

We tell S3 that we want a private bucket, encrypted using the key backupKey, with a lifecycle rule that ships our S3 objects to Glacier after 30 days. At the bottom we create our backup key and assign it an alias.

Now let's add some permissions.

resource "aws_s3_bucket_policy" "backup-bucket-policy" {
  bucket = "${aws_s3_bucket.backup-bucket.id}"
  policy = "${data.aws_iam_policy_document.default.json}"
}

data "aws_iam_policy_document" "default" {
  statement {
    actions = ["s3:*"]

    resources = ["${aws_s3_bucket.backup-bucket.arn}/*"]

    principals {
      type        = "AWS"
      identifiers = ["${var.main-user}"]
    }
  }
}

Here we created an S3 bucket policy that grants our "main-user" all permissions on the objects in that bucket.
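For reference, the policy document data source above renders to ordinary IAM policy JSON, roughly like this (the bucket name and user ARN match the placeholders from variables.tf below):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::your-personal-backup-bucket/*",
      "Principal": {
        "AWS": "arn:aws:iam::5623203034562:user/someUser"
      }
    }
  ]
}
```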

Lastly, we need to create our variables.tf file, which holds the variables used throughout this Terraform module.

variable "region" {
  default = "us-west-2"
}

variable "key_name" {
  description = "Name of the key"
  default = "backup-s3-bucket-key"
}

variable "backup-bucket-name" {
  default = "your-personal-backup-bucket"
}

variable "main-user" {
  default = "arn:aws:iam::5623203034562:user/someUser"
}

Run Terraform to create the infrastructure:
terraform plan
terraform apply

Now we need something that efficiently syncs local files to an S3 bucket. We could go the traditional route and use rsync, but I've always wanted to try a tool called rclone, and it does everything I need.
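The personal: remote used in the script below lives in rclone's config file, which rclone config creates interactively. Here's a sketch of what it might look like; the credentials and key ARN are placeholders, and the server_side_encryption/sse_kms_key_id options tell rclone to upload with the same KMS key the bucket defaults to:

```ini
[personal]
type = s3
provider = AWS
env_auth = false
access_key_id = AKIAEXAMPLEKEYID
secret_access_key = exampleSecretAccessKey
region = us-west-2
server_side_encryption = aws:kms
sse_kms_key_id = arn:aws:kms:us-west-2:5623203034562:key/your-key-id
```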

brew install rclone && rclone config should get you up and running. Now all that's left is to create a shell script that backs up what you want backed up. Here is my version:

backup.sh

#!/usr/bin/env bash
rclone sync ~/Pictures personal:your-personal-backup-bucket/pictures
rclone sync ~/Documents personal:your-personal-backup-bucket/documents
rclone sync ~/books personal:your-personal-backup-bucket/books

Then run ./backup.sh and voilà: now it's painfully obvious how slow your upload speed is, so take your laptop to work and run it from there ;) The neat thing about rclone is that it picks up where it left off, so you can interrupt it without losing progress.
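If you'd rather not rely on guilt as a scheduler, you could also run the script from cron (crontab -e); for example, nightly at 2am, with the script and log paths below being placeholders for your own:

```
0 2 * * * /path/to/backup.sh >> /tmp/backup.log 2>&1
```

One caveat on macOS: cron jobs don't fire while the machine is asleep, so a launchd agent is the more robust option if your laptop isn't reliably awake at that hour.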

Happy over-engineering!

-Bjoern

