Senior Site Reliability Engineer

  • FTE
  • Remote (U.S.)


At JDA TSG, we partner with many of the world’s top brands to make their operations more efficient, flexible, scalable and agile. Our business process outsourcing services free them to focus on delivering the very best experience to their customers. Our secret? Exceptional people, along with the business acumen, collaborative culture and deep personal respect to make the most of their abilities. We are proud of our diversity and welcome people of all backgrounds, beliefs and sexual orientations.

Job Details

At JDA TSG, we equip many of the world’s major brands with top-tier specialized talent, business process expertise, and innovations that drive their organizations in exciting new directions. We have established a reputation for bringing exceptional focus, flexibility, and confidence to every client we serve.

We have an immediate opportunity for a Senior Site Reliability Engineer with our client, which has a 100% cloud-based infrastructure and is seeking an engineer with strong experience in Infrastructure as Code, automation, CI/CD, containers, AWS, and DevOps best practices to join its Site Reliability Engineering team. This is a full-time, direct-hire role that can be performed remotely (in the U.S.). This position reports directly to the Director of Developer Operations, who will assist in grooming and assigning work but will rely on the Engineer to provide documentation and regular progress updates to the TechOps team and key stakeholders.

Who you are:

The ideal candidate has a very strong sense of ownership and a passion for learning. Excellent communication skills are desired, as the TechOps team has developed a close working relationship with both development owners and product owners to define clear objectives and deliver fast, robust, and future-proof results.

What you will do:

  • Improve the observability of legacy and newly designed systems.
  • Leverage infrastructure-as-code and AWS services to evolve and retire our legacy infrastructure.
  • Provide support for emergent problems, identify the root cause, and drive improvements through automation and self-healing services.
  • Support network infrastructure including WAN, LAN, and wireless technologies.
  • Run team meetings, groom tickets, and manage the team's workload, with the assistance of the Director.
  • Provide clear and professional communication of ideas or feedback to Leadership and other internal team members.
  • Handle Tier III escalations from the Sys Admins as needed.
  • Remove toil from our development teams by training them and creating self-service tools alongside our Architect.
  • Actively participate in team activities and discussions such as suggesting architecture improvements, best practices, new processes, etc.
  • Perform on-call duty as required (about once every 6 weeks, for a week at a time).
  • Write Infrastructure-as-Code in such a way that it can be leveraged across multiple application stacks.
  • Perform additional tasks as assigned.

The experience you need to thrive in the role:

  • Working with CI/CD and Infrastructure-as-Code tools, such as GitHub Actions, Terraform, Jenkins, etc.
  • AWS managed service offerings, including successfully designing solutions with them. Familiarity with, in order of priority, ECS Fargate, EC2, S3, RDS, Lambda, CloudFront, and CloudWatch X-Ray/EventBus.
  • New Relic or other similar APM tools.
  • Software monitoring and log aggregation tools.
  • Training and mentoring junior members.
  • Strong sense of ownership and troubleshooting skills.
  • Strong working knowledge of Linux operating systems.
  • Strong working knowledge of Docker or Kubernetes.
  • Familiarity with microservice and event-driven architectures.

Preferred Skills:

  • 5+ years in a software engineering discipline.
  • 2+ years writing Python code.
  • 2+ years as a Site Reliability Engineer.
  • Running daily briefings or other team meetings.
  • The ideal candidate is an autonomous self-starter who has a passion for learning paired with a strong sense of responsibility and ownership.
  • Developing “cloud-native” applications.
  • “Containerizing” legacy applications.
  • Strong documentation skills.
  • Strong troubleshooting skills and an ability to come up with creative “outside the box” solutions in a cost-effective manner.
  • Demonstrable track record of dealing well with ambiguity, prioritizing needs, and delivering measurable results in an agile environment.
  • Familiarity with the Agile framework and with working in both sprint-based and Kanban methodologies.

Education Requirements:

  • Bachelor’s Degree in Computer Science or related field, or 5+ years relevant work experience.


Information Technology
January 20, 2023
