Reliability Engineer - Ansible and DataDog - WFH - 1099 / C2C ok - Atlanta Georgia

Company: Datamanagementgroup
Location: Atlanta, Georgia
Posted On: 01/18/2025

Looking for an experienced Reliability Engineer to support critical projects for our Technology, Infrastructure & Operations teams. This is a work-from-home position; work is to be done primarily in the US Eastern time zone.

  • Minimum of 7 years of performance engineering and performance testing experience
  • MUST HAVE 3+ years of recent work with Ansible
  • MUST HAVE 4+ years of work with DataDog
  • Excellent English communication skills, verbal and written (idiomatic English)
  • Experience managing performance engineering efforts for applications strongly preferred
  • Knowledge of developing scripts for monitoring using PowerShell, Python, and Shell scripting
  • 5 years of Splunk programming proficiency is highly preferred
  • 5-6 years of experience with .NET and Java applications and application monitoring tools such as AppDynamics or DataDog is highly preferred
  • Proficiency in performance tuning is preferred
  • Good understanding of the UI, Middleware, and backend Databases
  • BA/BS degree in Information Technology, Computer Science, or related field of study

Duties include:
    • Develop and maintain comprehensive monitoring solutions for cloud-based services and applications
    • Configure monitoring tools and systems to collect relevant metrics, logs, and traces
    • Create custom monitoring dashboards and reports using Splunk, DataDog, DynaTrace, or other tools, to provide real-time insights into system performance and health
    • Continuously monitor the cloud infrastructure's performance and capacity, anticipating and addressing potential scalability issues
    • Proactively suggest and implement improvements to enhance the system's reliability, resilience, and fault tolerance
    • Work on automating tasks to streamline operational processes and reduce manual intervention
    • Collaborate with cross-functional teams to investigate and resolve critical incidents, ensuring minimal impact on end-users
    • Work with the Problem Management team to complete post-mortem analysis of incidents, identify root causes, and implement preventive measures
    • Understand the overall architecture of our systems to identify gaps in monitoring and troubleshoot issues
    • Configure and maintain custom dashboards and alerts in various monitoring tools
    • Create custom reports and deliver report presentations to various stakeholders
    • Develop scripts for monitoring using PowerShell, Python, and Shell scripting
    • Develop metrics for both the business and technical teams to determine the health of systems
    • Provide on-call support as needed
    • Lead and coordinate performance engineering for medium to large initiatives
    • Collect and document expected system performance and operational characteristics
    • Collect and/or prepare test data for test execution
    • Develop and execute performance tests including load, stress, endurance, fail-over, and interoperability
    • Conduct technical analysis of performance test results and production systems, and provide recommendations on performance tuning, systems, and infrastructure. Identify, report, and review defects in assessing system performance and stability
    • Define the strategy for enabling performance diagnostics and monitoring using an Application Performance Management (APM) tool, other monitoring tools, and diagnostic techniques
    • Collaborate with developers to promote the concept of performance engineering during all phases of the SDLC to detect and correct performance issues earlier in the lifecycle
    • Lead peer reviews to ensure the completeness of all test assets created
    • Resolve performance and stability issues in the performance test environment
    • Develop a performance engineering work plan structure and project schedule
    • Review architectural design for performance risks and potential issues
    • Prepare capacity analysis when applicable
Copyright 2025 LocalJobBoard.com. All Rights Reserved.
