Site Reliability Engineer (SRE) | Job in Merseyside
Title: Site Reliability Engineer (SRE) - Consultant - Digital Factory
Location: North West, United Kingdom
*Company Overview:*
We are a leading digital transformation consultancy, specializing in helping businesses embrace the digital age and thrive in today's rapidly changing technological landscape. Our mission is to empower organizations with cutting-edge digital solutions that drive innovation, efficiency, and growth.
*Job Description:*
In this dynamic role as a Site Reliability Engineer (SRE) Consultant within our Digital Factory, you will be at the forefront of our technology efforts, ensuring the reliability, scalability, and performance of our clients' digital infrastructure. You will leverage your expertise in cloud technologies, automation, and DevOps practices to design, build, and maintain robust systems that deliver high-quality user experiences.
*Key Responsibilities:*
- Collaborate with cross-functional teams to design, deploy, and manage large-scale, distributed computing systems.
- Develop and implement monitoring tools and processes to ensure the reliability and scalability of our clients' digital infrastructure.
- Implement best practices for automation, continuous integration/continuous deployment (CI/CD), and incident response to optimize system performance and minimize downtime.
- Work closely with development teams to identify and resolve issues at the root cause, focusing on preventing recurrences through proactive measures.
- Stay current with the latest technologies and industry trends in SRE, DevOps, and cloud computing.
- Contribute to knowledge sharing and collaboration within our Digital Factory and across our organization.
- Mentor junior engineers, fostering a culture of learning and innovation.
- Participate in incident response and on-call rotations as needed.
*Qualifications:*
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Site Reliability Engineer or DevOps Engineer in a consulting role.
- Strong knowledge of cloud platforms (AWS, Google Cloud, Azure) and containerization technologies (Docker, Kubernetes).
- Experience with automation tools like Terraform, Ansible, Chef, Puppet, etc.
- Proficiency in scripting languages such as Python, Go, or Ruby.
- Excellent problem-solving skills and ability to work under pressure in fast-paced, high-stress situations.
- Strong communication skills, both verbal and written, with the ability to explain complex technical concepts to non-technical stakeholders.
- A passion for learning and a commitment to staying current with emerging technologies and best practices in SRE and DevOps.
By joining our team as a Site Reliability Engineer (SRE) Consultant, you will have the opportunity to work on exciting projects, collaborate with top talent, and make a tangible impact on businesses across various industries. If you're ready to take your career to the next level and help shape the digital future, we encourage you to apply today!
Hiring Purpose
At our Digital Factory, we are committed to leveraging technology to drive innovation and growth in the dynamic digital landscape. We are currently seeking a highly skilled and motivated Site Reliability Engineer (SRE) - Consultant to join our team based in the North West.
The ideal candidate will play a crucial role in ensuring the reliability, scalability, and efficiency of our IT infrastructure. This role requires a strong blend of software engineering, systems design, and operations expertise, with an emphasis on automation, monitoring, and incident management.
Key Responsibilities:
- Design, build, and maintain efficient, reliable, and cost-effective systems that meet our business needs.
- Collaborate with cross-functional teams to identify and resolve performance issues before they impact our customers.
- Implement and manage monitoring tools and processes to ensure system reliability and availability.
- Participate in the on-call rotation to handle incident management and troubleshooting when needed.
- Contribute to the continuous improvement of our IT operations by automating manual processes, improving system design, and implementing best practices.
- Collaborate with developers to ensure new features are deployed smoothly and do not impact the overall system reliability.
- Stay up-to-date with the latest industry trends and technologies in SRE and apply them to improve our systems.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
- Proven experience as a Site Reliability Engineer or similar role.
- Strong understanding of cloud technologies (e.g., AWS, Google Cloud, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Proficiency in at least one programming language (Python, Go, Java, etc.) and scripting languages (Bash, PowerShell, etc.).
- Experience with monitoring tools (Nagios, Prometheus, Grafana, etc.) and infrastructure automation tools (Terraform, Ansible, Puppet, etc.).
- Strong problem-solving skills and the ability to think analytically and systematically.
- Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts to non-technical stakeholders.
- Ability to work in a fast-paced environment and manage multiple tasks simultaneously.
- Strong collaboration skills and the ability to work effectively in cross-functional teams.
Join us at our Digital Factory, where you will have the opportunity to make a significant impact on our IT operations while working with a dynamic and innovative team. We offer competitive compensation packages, a flexible work environment, and opportunities for professional growth and development.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
To apply, please submit your resume and cover letter detailing your relevant experience and why you are interested in this role. We look forward to hearing from you!
Skill Requirements
Strong understanding of cloud computing platforms: Proficiency in Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure is essential. Familiarity with cloud-native tools and services such as Kubernetes, Docker, and Terraform is highly desirable.
Scripting & Programming: Expertise in at least one programming language, preferably Python, Go, or Java for automation scripts, monitoring tools, and CI/CD pipelines. Familiarity with YAML and JSON is also required.
Containerization & Orchestration: Proficiency in container technologies like Docker and Kubernetes is necessary. Experience with managing large-scale containerized applications and understanding of orchestration tools is essential.
Automation & Monitoring: Familiarity with infrastructure as code (IaC) using tools such as Terraform, Ansible, or CloudFormation. Strong knowledge of monitoring tools like Prometheus, Grafana, Nagios, or Splunk for maintaining system performance and availability.
Security & Compliance: Knowledge of security best practices for cloud environments, familiarity with OWASP principles, and an understanding of security groups, Identity and Access Management (IAM), and compliance regulations and standards such as HIPAA, GDPR, or PCI DSS.
Service Mesh & API Gateways: Familiarity with service mesh technologies like Istio or Linkerd, and API gateways like Kong, Apigee, or AWS API Gateway is beneficial.
Data Analysis: Ability to analyze large datasets using tools like ELK Stack (Elasticsearch, Logstash, Kibana), Apache NiFi, or Splunk for identifying trends, anomalies, and patterns to optimize system performance and reliability.
Problem-solving & Analytical Skills: Ability to analyze complex problems, design solutions, and execute them effectively in a fast-paced environment.
Communication & Collaboration: Excellent written and verbal communication skills, with the ability to collaborate effectively within a team and with other departments or clients when necessary.
Project Management & Organization: Strong organizational skills, with the ability to manage multiple projects simultaneously, prioritize tasks, and meet deadlines effectively.
Adaptability: Ability to adapt to new technologies quickly, as well as a willingness to learn and grow professionally.
Innovation & Creativity: Ability to think outside the box, devise creative solutions, and continuously improve existing processes and systems for increased efficiency and reliability.
This role requires a proactive, self-motivated individual who is passionate about technology, enjoys problem-solving, and is committed to delivering high-quality work in a rapidly evolving environment.
Candidate Expectations
As a Site Reliability Engineer (SRE) - Consultant within our Digital Factory, you are expected to bring your exceptional technical skills and problem-solving abilities to help us deliver high-quality, scalable, and reliable digital solutions for our clients in the North West region. In this role, you will work collaboratively with a dynamic team of professionals, leveraging cutting-edge technologies to ensure the optimal performance, stability, and resilience of our clients' digital ecosystems.
Key Responsibilities:
- Design, build, deploy, and maintain highly available, performant, and scalable systems and services using Google Cloud Platform (GCP) or other cloud providers.
- Implement and continually improve automation solutions to reduce human intervention in operational tasks.
- Monitor and analyze system performance and behavior to identify issues before they impact users or the business.
- Work collaboratively with cross-functional teams, including software engineers, data scientists, and product owners, to ensure that our digital solutions meet the performance and reliability requirements of our clients.
- Develop incident management processes to minimize the impact of service disruptions on our clients' businesses.
- Participate in on-call rotation to respond promptly to incidents and collaboratively work towards resolution.
- Actively contribute to our team's knowledge sharing, documentation, and training efforts to ensure that everyone is up-to-date on the latest best practices and technologies in SRE.
- Collaborate with our clients' internal teams to facilitate the adoption of DevOps principles and practices, helping them to achieve their digital transformation goals.
- Continuously evaluate new tools, technologies, and methodologies to ensure that we are using the most effective approaches to deliver high-quality SRE services to our clients.
- Contribute to the growth and development of our Digital Factory by sharing your knowledge, mentoring junior team members, and seeking opportunities for personal and professional growth.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
- Proven experience as a Site Reliability Engineer or similar role within the digital industry, preferably with a focus on cloud-native applications and microservices architectures.
- Strong knowledge of Google Cloud Platform (GCP) or other cloud providers, including their services, tools, and best practices for building scalable and reliable systems.
- Proficiency in one or more programming languages, such as Python, Go, or Java.
- Experience with infrastructure automation tools like Terraform, Ansible, Chef, or Puppet.
- Knowledge of containerization technologies, such as Docker and Kubernetes.
- Familiarity with CI/CD pipelines and tools such as Jenkins, GitLab, or CircleCI.
- Strong troubleshooting skills and the ability to work collaboratively with others to solve complex problems.
- Excellent communication and interpersonal skills, with the ability to explain technical concepts clearly and concisely to both technical and non-technical audiences.
- A proactive approach to identifying and addressing potential issues before they impact the business or users.
If you are passionate about delivering high-quality SRE services, thrive in a collaborative and dynamic environment, and are excited about helping our clients achieve their digital transformation goals, we would love to hear from you! Apply today to join our growing team at the Digital Factory in the North West region.
The Role
In our vibrant Digital Factory, we are seeking a dynamic and skilled Site Reliability Engineer (SRE) to join our team in the North West region. As an SRE, you will play a pivotal role in ensuring the reliability, scalability, and performance of our digital platforms and services.
Key Responsibilities:
- Collaborate with engineering, product management, and other teams to design, develop, deploy, maintain, and improve systems that are highly available, durable, and fault-tolerant.
- Participate in on-call rotation to ensure timely response to critical issues and service disruptions.
- Implement practices for continuous delivery and integration of code changes to production environments.
- Design, build, deploy, monitor, troubleshoot, and maintain the reliability, performance, and efficiency of our digital platforms and services.
- Collaborate with data engineers to improve observability and develop effective alerting policies.
- Contribute to incident response by performing root cause analysis during major incidents.
- Proactively identify systemic risks and help design and implement solutions for preventing failures before they occur.
- Provide mentorship, coaching, and guidance to junior engineers within the team.
- Participate in postmortem analysis following service incidents and contribute to the development of action plans to prevent recurrence.
- Collaborate with other teams on projects that improve our systems' stability and performance.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field; or equivalent practical experience.
- Proven experience as a Site Reliability Engineer or similar role.
- Strong understanding of infrastructure as code (IaC) and cloud services (AWS, GCP, Azure).
- Proficiency in programming languages such as Python, Go, or Java.
- Experience with containerization technologies like Docker and Kubernetes.
- Knowledge of monitoring tools such as Prometheus, Grafana, and ELK Stack.
- Familiarity with service mesh solutions like Istio or Linkerd.
- Strong problem-solving skills and the ability to work effectively in a fast-paced environment.
- Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts clearly and concisely.
- Ability to work collaboratively with a diverse team of engineers and other stakeholders.
Join us at our Digital Factory, where you will have the opportunity to apply your expertise, grow your skills, and make a significant impact on our digital platforms and services. We are committed to fostering an inclusive and supportive work environment that encourages innovation, creativity, and collaboration.
To learn more about this exciting opportunity, please submit your application today! We look forward to hearing from you soon.
Perks and Advantages
As a Site Reliability Engineer (SRE) - Consultant within the Digital Factory in the North West, you will enjoy numerous benefits that foster growth, creativity, and work-life balance. Here are some of the perks and advantages you can expect:
Cutting-Edge Technology: Work with the latest technologies, tools, and methodologies in a dynamic environment that encourages innovation and continuous learning. You'll have ample opportunities to grow your skills and stay at the forefront of the industry.
Collaborative Culture: Be part of a vibrant team where collaboration, transparency, and open communication are valued. We foster an environment that encourages knowledge sharing, allowing you to learn from some of the best minds in the field.
Project Diversity: Our clients come from various industries, ensuring that you'll never get bored with repetitive work. Each project offers unique challenges and opportunities to make a real impact on businesses across the digital landscape.
Professional Development Opportunities: Continuously expand your skills through our comprehensive training programs, workshops, and conference sponsorships. We encourage and support our team members in their pursuit of knowledge and expertise.
Competitive Compensation & Benefits: Enjoy a competitive salary package that includes comprehensive health, dental, and vision insurance, retirement plans, paid time off, and employee assistance programs for mental and emotional well-being.
Flexible Work Environment: We understand the importance of work-life balance. Our flexible work arrangements allow you to maintain a healthy personal life while achieving professional success.
Inclusive & Diverse Workplace: Join a team that values diversity, equity, and inclusion. We believe that diverse perspectives lead to better problem-solving, innovation, and creativity.
Opportunities for Advancement: Take your career to new heights with opportunities for advancement within our organization. We value growth from within and offer promotions based on merit and potential.
Community Engagement: Make a difference in the community through various volunteer initiatives, team building events, and charity drives, fostering a strong sense of connection to our region.
Join us as an SRE Consultant in our Digital Factory, where you'll have the opportunity to grow, learn, and make a lasting impact on businesses across the North West and beyond. Apply today to start your journey with us!
About the Company
Digital Factory North West is a leading digital transformation consultancy based in the vibrant tech hub of North West England. We specialize in helping businesses leverage cutting-edge technologies to drive growth, improve efficiency, and innovate at scale. Our team comprises seasoned professionals with diverse expertise, united by a passion for delivering exceptional solutions that cater to our clients' unique needs.
As a Site Reliability Engineer (SRE) Consultant at Digital Factory North West, you will be an integral part of our dynamic and innovative technology team. Your role will involve ensuring the reliability, scalability, and performance of our clients' digital infrastructure, enabling them to deliver high-quality services to their customers without interruption.
Our commitment to excellence extends beyond technical proficiency. We foster a culture of collaboration, continuous learning, and professional development. As an SRE Consultant, you will have ample opportunities to grow your skills, expand your knowledge, and make a significant impact on our clients' digital journeys.
At Digital Factory North West, we value diversity, inclusivity, and the unique perspectives that each team member brings. We offer a competitive salary, comprehensive benefits, flexible working arrangements, and a supportive work environment where everyone is empowered to excel. If you are ready to take your career to new heights while making a tangible difference in our clients' businesses, we invite you to apply for the Site Reliability Engineer (SRE) Consultant position at Digital Factory North West today.
Additional Information
Job Designation: Site Reliability Engineer (SRE) - Consultant - Digital Factory
Experience Requirements: 2 years' experience required
Work Hours: 38