Matthew Davidson
Summary
Seasoned Site Reliability Engineer with 10+ years of experience designing, automating, and maintaining scalable infrastructure. Proven track record of transforming legacy systems into modern DevOps pipelines, with deep expertise in cloud infrastructure, monitoring, and configuration management. Adept at building internal tooling in Go and managing hybrid cloud environments with a focus on performance, cost, and reliability.
Professional Experience
Site Reliability Engineer / DevOps Engineer
The Gifting Company
Nov 2018 – Present
- Designed and deployed a containerized monorepo, enabling unified CI/CD pipelines and seamless service integration across business units.
- Built and maintained internal full-stack Go applications to automate repetitive tasks such as employee onboarding, build orchestration, and asset inventory management.
- Managed infrastructure as code with Terraform and built versioned machine images with Packer, enabling repeatable, modular deployments across multiple environments.
- Replaced legacy Jenkins jobs with GitHub Actions workflows, introducing caching, matrix builds, and secure secrets management to improve build reliability and speed by 30%.
- Implemented a Prometheus and Grafana monitoring stack for critical workloads, including custom metrics exporters and alert rules to detect and mitigate production issues proactively.
- Automated system provisioning and application configuration using Chef, reducing configuration drift and manual remediation effort by over 20%.
- Led multi-year GCP cost and resource optimization efforts, introducing autoscaling and committed use discount planning to improve ROI.
- Mentored junior engineers in DevOps practices, troubleshooting, and automation, and established onboarding documentation for faster ramp-up.
Systems Administrator
Facebook
Jun 2015 – Nov 2018
- Managed over 5,500 physical and virtual systems across multiple data centers, specializing in large-scale Linux deployments.
- Performed system tuning, patching, and kernel upgrades with minimal downtime, ensuring high availability for production systems.
- Diagnosed and resolved critical issues involving storage, networking, and compute resource bottlenecks.
- Created internal documentation and SOPs covering escalation paths, root cause analysis templates, and system recovery guides.
- Trained new team members and served as a knowledge bridge between systems engineering and software engineering teams.
Skills
Cloud & Infrastructure
- Google Cloud Platform (GCP)
- Linux (Debian, Ubuntu, RHEL)
- NGINX, Load Balancers, DNS
Automation & DevOps
- Terraform, Packer
- Chef, GitHub Actions
- CI/CD, Secrets Management
Programming & Scripting
- Go (Golang)
- PowerShell, Bash
- Ruby (Chef, internal tools)
Containerization & Orchestration
- Docker
- Kubernetes (basic usage)
Monitoring & Logging
- Prometheus, Grafana
- Google Cloud Logging
- Custom Metrics & Alerting