Matthew Davidson - Site Reliability Engineer

Summary

Seasoned Site Reliability Engineer with 10+ years of experience designing, automating, and maintaining scalable infrastructure. Proven track record in transforming legacy systems into modern DevOps pipelines, with deep expertise in cloud infrastructure, monitoring, and configuration management. Adept at building internal tooling with Go and managing hybrid cloud environments with a focus on performance, cost, and reliability.

Professional Experience

Site Reliability Engineer / DevOps Engineer

The Gifting Company Nov 2018 – Present

Designed and deployed a containerized monorepo, enabling unified CI/CD pipelines and seamless service integration across business units.

Built and maintained internal full-stack Go applications to automate repetitive tasks such as employee onboarding, build orchestration, and asset inventory management.

Managed and versioned infrastructure as code using Terraform, with Packer-built machine images, enabling repeatable, modular deployments across multiple environments.

Replaced legacy Jenkins jobs with GitHub Actions workflows, introducing caching, matrix builds, and secure secrets management, which improved build reliability and increased build speed by 30%.

Implemented a Prometheus and Grafana monitoring stack for critical workloads, including custom metrics exporters and alerting rules that enabled proactive mitigation of production issues.

Automated system provisioning and application configuration using Chef, reducing configuration drift and manual remediation effort by over 20%.

Led multi-year GCP cost and resource optimization efforts, introducing autoscaling and committed use discount planning to improve ROI.

Mentored junior engineers in DevOps practices, troubleshooting, and automation, and established onboarding documentation for faster ramp-up.

Systems Administrator

Facebook Jun 2015 – Nov 2018

Managed over 5,500 physical and virtual systems across multiple data centers, specializing in large-scale Linux deployments.

Performed system tuning, patching, and kernel upgrades with minimal downtime, ensuring high availability for production systems.

Diagnosed and resolved critical issues involving storage, networking, and compute resource bottlenecks.

Created internal documentation and SOPs covering escalation paths, root cause analysis templates, and system recovery guides.

Trained new team members and served as a knowledge bridge between systems engineering and software engineering teams.

Skills

Cloud & Infrastructure

Google Cloud Platform (GCP)
Linux (Debian, Ubuntu, RHEL)
NGINX, Load Balancers, DNS

Automation & DevOps

Terraform, Packer
Chef, GitHub Actions
CI/CD, Secrets Management

Programming & Scripting

Go (Golang)
PowerShell, Bash
Ruby (Chef, internal tools)

Containerization & Orchestration

Docker
Kubernetes (basic usage)

Monitoring & Logging

Prometheus, Grafana
Google Cloud Logging
Custom Metrics & Alerting