Staff Infrastructure Engineer, GroqCloud - Mountain View, California
Company: Groq Inc. Location: Mountain View, California
Posted On: 02/03/2025
At Groq, we believe AI will change humanity forever, and that making it affordable and universally accessible is the key to human agency in an AI economy. We're assembling a team of world-class engineers and business minds who believe in this mission. We're hiring problem-solvers with the skills and desire to build a business that leaves a fingerprint on civilization.
Mountain View, CA (Remote)
At Groq: We believe in an AI economy powered by human agency. We envision a world where AI is accessible to all, a world that demands processing power that is better, faster, and more affordable than is available today. AI applications are currently constrained by the limitations of the Graphics Processing Unit (GPU), a technology originally developed for the gaming market and soon to become the weakest link in the AI economy.
Enter Groq's LPU AI Inference Technology: Specifically engineered for the demands of large language models (LLMs), the Language Processing Unit outpaces the GPU in speed, power, efficiency, and cost-effectiveness. The quickest way to understand the opportunity is to watch the following talk - groq.link/scspdemo.
Why join Groq? AI will change humanity forever, and we believe preservation of human agency and self-determination is only possible if AI is made affordably and universally accessible. Groq's LPUs will power AI from an early stage, and you will get to leave your fingerprint on civilization.
Mission: Design, build, and maintain the infrastructure of GroqCloud as we 100x traffic in the coming year.
Responsibilities & opportunities in this role:
- Infrastructure Development: Design, build, and automate cloud infrastructure using Terraform to support a wide variety of needs.
- Service Deployment & Orchestration: Build and manage robust deployment pipelines and GitOps workflows into Kubernetes-based environments. Continuously improve CI/CD processes to facilitate rapid, reliable rollouts of new features and services, ensuring minimal downtime and maximum velocity.
- System troubleshooting: Lead investigations to determine root causes of system failures and develop scripts to repair and automate the upkeep of infrastructure components.
- Observability enhancement: Implement comprehensive monitoring (tracing, metrics, logging, alerting) to swiftly pinpoint, diagnose, and resolve system issues.
- Efficient incident response: Manage critical system incidents as a first responder, ensuring swift resolution and comprehensive post-incident analyses with implemented remediations.
- Cross-Functional Collaboration: Collaborate with software engineers, platform & networking engineers, product managers, and sales to enable feature delivery.
Ideal candidates have/are:
- 6+ years of experience in software engineering or a related field.
- 3+ years of experience with GCP (especially VPC, Hybrid Networking, IAM, and GKE).
- Actively working with modern Infrastructure-as-Code technologies (Kubernetes, Terraform, Flux/ArgoCD, Kustomize, Crossplane).
- Experience with open-source monitoring tools (Prometheus, Grafana, VictoriaMetrics, VictoriaLogs, and Alertmanager).
- Deep experience in cloud technologies, global scale applications, and automation.
- Familiarity with multi-region deployments, including the associated networking, latency, and failover challenges.
- History of debugging and mitigating production issues and driving them to efficient resolution.
- Comfortable reading, writing, and debugging software in multiple languages, especially Go and Rust.
- Thorough understanding of cloud-security best practices and modern compliance controls.
Attributes of a Groqster: