We take a look behind the scenes at how contact center phone numbers are monitored. Patrick Lynch, site reliability engineer at Spearline, tells us that it’s all about balancing existing technology with developing new innovations.
Tell us briefly about your background and your role at Spearline.
I’m part of the development team at Spearline and my role is site reliability engineer. My job is to ensure that all of the systems are up and running and that the internal tools are working within the company. I liaise with other development team members to help make sure that their job is as easy as possible. We work together collaboratively as a team. I’m the point of contact for all of the testing environments.
How did you end up in the role?
I joined Spearline in 2016 as a junior software engineer. I was formerly a primary school teacher, but I went back to college in 2015 to do a conversion course in Applied Computing Technology at UCC.
Why did you change your career path?
When I was studying to be a teacher, I undertook modules on teaching through technology and it was an area I had always been interested in; not only using technology, but developing technology of my own. I was delighted when the opportunity arose to study the area and to create a career from it and I was lucky that there was an opportunity to work in the industry at Spearline.
It’s a relatively new role with a lot of companies. Google was one of the first companies to develop it.
Google developed it as a system back in 2003. They found there was a lot of bottlenecking between administrative work and development work, and that they were hiring more and more system administrators. They thought software could be developed to bridge the gap. They developed the concept of site reliability, a lot of which involves automating repetitive tasks, such as updating servers, that system administrators would otherwise have to do manually. Other companies have since adopted it.
What element of innovation is there in your role?
About 40% of the role is to fix bugs and the other 60% is to develop new features. We balance what exists with new ways to innovate.
What does your metrics and monitoring system look like?
We have a multi-pronged approach. We set up a logging system that our code sends logs to. We also have dashboards so that we can monitor load on the system. We have alerting systems set up whereby customers can set up alerts for any issues with their numbers. If there’s an issue with a certain testing environment, alerts are issued so that it can be resolved and the problem never reaches the customer.
We have an excellent QA team here who oversee all of our testing and feed back to us if there are any issues with the code or platform.
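As a rough illustration of the logging-plus-alerting idea described in this answer, here is a minimal Python sketch. It is not Spearline's implementation: the function names, the logger name, the environment names and the 5% failure-rate threshold are all assumptions made up for the example. Each test run is logged, and an alert is produced only when an environment's failure rate goes over the threshold.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("monitoring_sketch")  # hypothetical logger name


@dataclass
class Alert:
    environment: str
    failure_rate: float
    message: str


def check_environment(environment: str, failed: int, total: int,
                      threshold: float = 0.05) -> Optional[Alert]:
    """Log one test-run summary; return an Alert if the failure rate exceeds the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    failure_rate = failed / total
    # Every run is logged, whether or not it triggers an alert.
    logger.info("env=%s failed=%d total=%d rate=%.3f",
                environment, failed, total, failure_rate)
    if failure_rate > threshold:
        return Alert(
            environment,
            failure_rate,
            f"{environment}: failure rate {failure_rate:.1%} exceeds {threshold:.1%}",
        )
    return None


def collect_alerts(results: Dict[str, Tuple[int, int]],
                   threshold: float = 0.05) -> List[Alert]:
    """Check every environment and gather the alerts that fired."""
    alerts = []
    for environment, (failed, total) in results.items():
        alert = check_environment(environment, failed, total, threshold)
        if alert is not None:
            alerts.append(alert)
    return alerts


if __name__ == "__main__":
    # Hypothetical results: environment name -> (failed tests, total tests)
    for alert in collect_alerts({"env-a": (1, 100), "env-b": (10, 100)}):
        print(alert.message)
```

In a real setup the alerts would be sent to a dashboard or a notification channel rather than printed, and the threshold would probably be set per customer or per environment.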
Is your role cross-functional across the company?
There’s a lot of collaboration between every department in the company. For example, our customer engagement team and our technical solutions architects use our platform every day. Then, testing support use our job tester platform every day. In development, we’re developing new products. If an issue arises, we are informed about it so there is a lot of intercommunication between departments that way.
Our CE team are also our first point of contact with our customers. If there’s a new feature that customers want or need, they filter that back to us and we determine if it’s possible to develop it. Listening and responding to the customer results from internal collaboration between our teams.
There’s also communication within the development team itself, between the backend and frontend. For example, if there’s an issue with one of our servers, it’s relayed to the frontend team. Then, QA are constantly testing our systems to ensure that everything is running smoothly. We also have teams in India and Romania that we collaborate with regularly, so it’s a very interactive environment.
Can you tell us a few key aspects of your role?
Site reliability engineering is about making processes as easy and quick as possible for everyone in the company. For example, when someone raises a ticket for an issue they have, the process for developers is as automated as possible. As a result, customer-impacting issues are minimised. Where there are issues, they can be fixed very easily and then they go through QA. Ultimately, we want to ensure that the customer won’t have to face the same issues again.
Automation is the biggest part of site reliability - we aim to ensure that no developer would have to revisit the same code twice. They can then innovate to help the customer grow.
To listen in on more of this interview, be sure to tune in to the Spearline Podcast.