Service Reliability Sr. Admin - Live Operations, NOC
Job Id: REQ-0002950
The Network Operations Center (NOC) manages the 24x7 monitoring and response components of Riot's player-facing services. We are the first line of defense when things go wrong with any of our live services and many of our internal services as well. We combine technical familiarity with best-practice processes to rapidly remediate incidents. The team is staffed with Administrators and Specialists who provide reliable triage services across many levels of technical and process operations. The team also mentors other Riot teams and helps them establish best practices in alerting, monitoring, and operational processes.
As a Senior Service Reliability Administrator, you will work closely with the Live Operations team and Riot globally to establish and maintain a high-performing and highly available game service for players. You will monitor and support all aspects of production environments, development environments, and general system needs. Your technical skills and grasp of system integration will help you diagnose and communicate potential issues to Rioters and the community, improving the quality of the player experience. You will be a craft expert in operational and triaging skills. The team can rely on you as a proactive individual, focused on solving day-to-day problems that affect any aspect of running live games.
- Act as first responder, triage agent, or escalation point from the NOC to external teams
- Self-organize with the team around live incidents
- Work with internal and external teams to create and update documentation
- Execute technical runbooks in a fast-paced environment
- Multitask rapidly to address issues affecting our players and services
- Work in a fast-paced, constantly changing environment
- Perform initial technical troubleshooting (SSH, IP addressing, command-line interfaces)
- Gather and report data on the health and operation of Riot services
- 2-3 years of experience as a NOC Technician or in an equivalent role (Analyst, System Administrator, Live Operations, Network Administrator, etc.)
- Familiarity with the core concepts of operating systems, networking, and software life cycles
- Enthusiasm around operations and technology
- Highly driven and self-motivated
- Excellent logical troubleshooting skills
- Strong organizational skills
- Effective communication skills
- Scripting proficiency is highly desired
- Experience working on deployments in a live environment is a plus
- Proficiency in multiple languages is a plus, especially Mandarin
- Linux+ and Network+ certifications, or equivalents
- Experience with the following:
  - Monitoring solutions, e.g., New Relic, Nagios, Elasticsearch, Grafana
  - Event management tools, e.g., BigPanda, Moogsoft
  - ITIL-based ticketing systems, e.g., ServiceNow, Jira
For this role, you'll find success through craft expertise, a collaborative spirit, and decision-making that prioritizes the delight of players. We will certainly be looking at your past studies and experience, but we also look for dedicated people with a personal relationship with games. If you embody player empathy and care about the experiences of players, this could be the role for you!
- Full health insurance for you, your spouse, and your children
- Open paid time off
- Savings benefit with company matching
- Life insurance, parental leave, plus short-term and long-term disability
- Play Fund so you can broaden and deepen your knowledge of our players and community through games
- Wellness Fund to encourage a balanced body and mind
- Monthly phone bill allowance
- Monthly food allowance
- We will double down on your donations of time and money to non-profits
Don’t forget to include a resume and cover letter. We receive a lot of applications, but we’ll notice a fun, well-written intro that shows us you take play seriously.