Site Reliability Engineer (SRE) in Globaldev Group

Posted more than 30 days ago

Globaldev Group

Without experience

Kyiv

Requirements
Experience with public cloud infrastructure (e.g., AWS, Azure) and related technologies (e.g., Docker, Kubernetes, CloudFormation);
Good understanding of storage and database systems, caching and queueing, networking;
Experience of leading technical recoveries;
Working knowledge of Service Management practices (ITIL);
Experience designing, analyzing, and troubleshooting distributed systems;
Ability to debug, optimize code, and automate routine operational tasks;
Solid foundation in Linux or Windows administration and troubleshooting;
Monitoring/observability technologies like Prometheus, Grafana, Kibana, Elasticsearch are a plus;
Understanding of service level agreements and objectives;
Excellent command of the English language, both written and spoken;
Solid understanding of programming principles and good command of at least one programming language relevant for infrastructure work;

What we offer
Direct cooperation with the already successful, long-term, and growing project;
Truly competitive salary;
Working with top-notch equipment;
Help and support from our caring HR team;

Responsibilities
Design, develop and implement systems software that improves the stability, scalability, availability and robustness of Odido’s products and services, now and for years to come;
Develop patterns for automation, instrumentation etc., that can be reused across teams and products;
Take ownership of several services and products;
Automate instead of fixing operational issues manually;
Develop and implement strategies for effective and proactive monitoring and observability of our systems;
Provide senior technical leadership on Major Incident calls. Take technical ownership of service outage recoveries. Drive internal and partner resources to rapidly restore service, implementing best-practice technical fixes and workarounds. Utilize technical expertise to shape and implement recovery plans;
Manage cross-functional technical resources following Major Incidents to ensure root cause is fully understood and documented, and that robust service protection measures are in place. Provide technical expertise at Incident Wash-ups, ensuring that all appropriate actions are in place to prevent repeat Incidents and to improve recovery times;
Triage and fix system issues in a complicated distributed landscape;
Participate in an on-call rotation, including weekend or after-hours coverage;
Oversee and continuously improve incident-response processes at Odido;
Advocate engineering best practices across the company, mentor more junior engineers on automation and operational best practices;
Contribute to Odido’s growth through interviewing and onboarding.
