Sr. Site Reliability Engineer (SRE) - Charlotte North Carolina
Company: Siemens Mobility
Location: Charlotte, North Carolina
Posted On: 01/23/2025
In the ever-changing SaaS landscape, a few persistent practices contribute to developing quality solutions with speed: ensuring operational activities are treated as software development enhancements, remediating manual tasks through automation, reducing risk through compartmentalization of services and code, and consuming readily available provider services. Product and development teams require an accountable partner to advance on these topics. The SRE (Site Reliability Engineering) team will be this partner.
The SRE team will support the Siemens Xcelerator platform and will be responsible for identifying, managing, improving, and reporting on availability, resiliency, reliability, and stability efficiencies. This includes providing technical guidance and leadership to drive solutions and to create and enhance processes that deliver excellence. A strong relationship with the various product teams of the Xcelerator platform is necessary to support core objectives. This role's success will be defined by product teams meeting their SLOs (Service Level Objectives) with healthy product adoption and operational excellence.
This position is responsible for supporting technology and culture across an enterprise ecosystem to ensure developers and products exceed their SLOs and clearly, without dispute, benefit from every interaction with the SRE team.
Responsibilities
- Incident Management, Game Day coordination
- Create and drive metrics and observability solutions and reviews
- Support production readiness reviews
- Serve as a cross-division role model to advance the SRE practice in Siemens
- Maintain complete technological command of automation methods, codifying operational activities, microservice architecture, and platform engineering to ensure changes, updates, or technical advancements are in place for a product
- Ensure the team can provide the design, deployment, automation, and scripting solutions to drive new capabilities, visibility, and efficiency
- Simplify highly complex ideas, architectures and concepts to encourage achievable adoption
- Collaborate with other technical platforms and partners to engineer automated and integrated solutions between tools, services, and teams that increase availability, reliability, and performance
- Own the internal and external SLAs and ensure they meet and exceed expectations
- Be part of maintaining a 24x7, global, highly available SaaS environment
- Participate in an on-call rotation that supports our production infrastructure
- Troubleshoot production availability incidents that often span across multiple teams and services
- Ensure the SRE team can coordinate production incident post-mortems and contribute to solutions that prevent problem recurrence, with the goal of automated response to all non-exceptional service conditions
- Communicate with business and technical partners about incidents as they occur when they impact system performance or availability at a critical level
Required Knowledge/Skills, Education, and Experience
- Bachelor's Degree or equivalent experience
- Proven experience as a Site Reliability Engineer or equivalent role
- Experience working in a large organization through an SRE transformation where existing applications were adapted to contemporary targets
- Proven experience with automation via scripting & API development
- Experience with software development in the cloud
- Experience with monitoring tools (Datadog, CloudWatch, CloudTrail, Cloudability, or equivalent tools)
- Proven experience with containerization, specifically Kubernetes
- Experience with Amazon Web Services (AWS) services and Terraform, CloudFormation, Ansible, or equivalent tools
Preferred Knowledge/Skills, Education, and Experience