Technical Infrastructure

Software Engineer, Site Reliability Engineering (SRE) (Hybrid Work)

Taipei City, Taipei City
Work Type: Full Time

KKLab is looking for a candidate who can combine software and system engineering skills to build and operate reliable, scalable, and distributed systems for our AI/ML-based SaaS.

As a member of the Technical Infrastructure team, you will help us manage the challenges of our cloud-native web services, such as KKRaaS, which enhances personalized experiences with musical DNA. KKRaaS is designed for media and e-commerce and is proven through its adoption by KKBOX.

To improve operational efficiency and reduce the complexity of system design, you will work closely with peers such as data engineers, software architects, and researchers. This collaboration helps ensure that best practices are followed.

To be successful in this role, you should be resilient and enjoy solving problems with curiosity. An individual who works efficiently in a hybrid environment (both remote and on-site) is highly desired, as we strive to create an open and fault-tolerant environment.


Responsibilities

  • Engage in and improve the whole lifecycle of services, from inception and design through to deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health. Practice sustainable incident response and blameless postmortems.
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Develop innovative solutions to ensure efficient SRE processes under regulatory requirements. Bring SRE expertise to the product teams that enforce regulatory requirements.

Requirements

Minimum Qualifications:

  • Experience in one or more of the following: Go, Python, PHP, or shell scripting.
  • Experience with version control systems (Git, Mercurial, etc.).
  • Experience developing in containerized and cloud environments (AWS, Azure, or GCP).
  • Experience with SQL, NoSQL, cache, or search engines.


Preferred Qualifications:

  • Experience with IaC (Ansible, Puppet, Terraform, etc.).
  • Experience with Unix/Linux or networking administration.
  • Experience operating containerized environments (Kubernetes, Nomad, etc.).
  • Experience in managing systems in cloud environments (AWS, Azure, or GCP).

Interview process

  1. AMA (Ask me anything about the job): Hiring Manager / 15-30 Minutes / Google Meet
  2. Online assessments (Coding Tests)
  3. In-Person: Line Manager, Team (Whiteboard Challenge) + HR / 60-90 Minutes / Google Meet with Google Docs
  4. Group: General Manager, Bar Raiser, Hiring Manager, HR / 60 Minutes / Google Meet
