Network Monitoring, Load and Reliability

On Tuesday 06 October 2015 by Matthew Lawlor

Network monitoring overview

Spearline carry out tens of thousands of tests per day through our routes in 49 countries. These tests are run for many different customers and terminate to different countries across many carriers, which gives us a vast amount of data for monitoring our own infrastructure. Our internal monitoring constantly tracks the PESQ scores returned on all of our routes. If quality decreases across our customers, a ticket is automatically generated and sent to both our support team and our in-country provider. When an issue is suspected on our own network, we immediately stop testing in the affected country and stop generating alerts, which could otherwise be misleading.
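
As a rough illustration of the ticketing rule described above, the check could look something like the following Python sketch. The function name, the 0.5-point drop threshold and the use of a simple mean are assumptions made for the example, not a description of our production system.

```python
from statistics import mean


def should_raise_ticket(baseline_scores, recent_scores,
                        own_network_suspect, max_drop=0.5):
    """Decide whether a quality ticket should be generated for a route.

    baseline_scores / recent_scores: PESQ scores for the route.
    own_network_suspect: True when a fault on our own network is suspected.
    max_drop: hypothetical threshold, chosen only for illustration.
    """
    # Alerts raised while our own network is suspect could be misleading.
    if own_network_suspect:
        return False
    # Without data on both sides there is nothing to compare.
    if not baseline_scores or not recent_scores:
        return False
    # Raise a ticket when the recent average falls well below the baseline.
    return mean(baseline_scores) - mean(recent_scores) > max_drop
```

For example, a route whose average falls from about 4.2 to about 3.3 would raise a ticket, unless a fault on our own network is suspected at the time.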

Prevention

Before any route goes into production, it passes a rigorous certification process. This includes sourcing data centres that provide an excellent SLA and guaranteed uptime. For the PRI lines, the distance between the carrier’s router and our server is kept to a minimum. All new routes are tested for 60 days before we start passing customer calls. Only servers and PRI lines with excellent service levels are certified and put into production.

Server Load & Reliability

All our servers are built to a very high specification so that server load never introduces quality issues. Our servers are capable of handling many hundreds of calls, but we limit the number of calls each server can make to 32. This keeps the load on our hardware low and ensures that quality issues do not come from overworked hardware.
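
One simple way to enforce a cap like this is a counting semaphore. The Python sketch below is only an illustration: the class and method names are hypothetical, and it assumes the limit of 32 applies to calls in progress at the same time.

```python
import threading

MAX_CALLS_PER_SERVER = 32  # the per-server limit described above


class CallSlots:
    """Hypothetical helper that refuses new calls once the cap is reached."""

    def __init__(self, limit=MAX_CALLS_PER_SERVER):
        self._slots = threading.BoundedSemaphore(limit)

    def try_start_call(self):
        # Non-blocking: returns False immediately when no slot is free.
        return self._slots.acquire(blocking=False)

    def end_call(self):
        # Frees a slot; BoundedSemaphore raises ValueError if over-released.
        self._slots.release()
```

A scheduler would call try_start_call() before placing a test call and end_call() when the call finishes, so the server never has more than 32 calls in progress.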

How we monitor our voice network

We use a small number of SIP trunks to connect to our carriers. These SIP trunks are monitored constantly for network errors such as packet loss, high latency and jitter. If errors are detected during a test call, the test is automatically discarded and another test is run. If the issue persists for more than 4 minutes, our system will un-certify the affected route so that tests to that country are not run or reported on while the issue is being investigated.
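
The discard-and-retry rule and the 4-minute limit can be sketched as a small state tracker. The Python below is a minimal illustration under stated assumptions: the class name, the return values and the use of timestamps in seconds are all hypothetical, and how a route is certified again is left out because it is not covered here.

```python
UNCERTIFY_AFTER_SECONDS = 4 * 60  # the 4-minute limit described above


class RouteHealth:
    """Hypothetical tracker for one route's network error state."""

    def __init__(self):
        self.first_error_at = None  # start of the current error streak
        self.certified = True

    def record_test(self, timestamp, had_network_errors):
        """Return 'keep', 'retry' or 'stop' for a finished test call."""
        if not had_network_errors:
            # A clean test ends the error streak and its result is kept.
            self.first_error_at = None
            return "keep"
        if self.first_error_at is None:
            self.first_error_at = timestamp
        elif timestamp - self.first_error_at > UNCERTIFY_AFTER_SECONDS:
            # The issue has persisted too long: stop testing this route.
            self.certified = False
        # The affected test is discarded in both cases.
        return "retry" if self.certified else "stop"
```

In this sketch a single clean test resets the timer, so only an uninterrupted run of errors lasting more than 4 minutes un-certifies the route.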

All network information that is sent and received on a SIP call is stored in a PCAP file. After the call, this file is parsed to check for errors on the link between our server and the in-country provider. If any errors are detected, the PESQ score generated is discarded and another test is run.
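
To give an idea of what such a post-call check might involve, the Python sketch below computes packet loss and jitter for one RTP stream. It assumes the sequence number, arrival time and RTP timestamp of each packet have already been extracted from the PCAP file by a capture library. The function names, the 8000 Hz clock rate and the thresholds are assumptions for illustration only. The jitter estimate uses the smoothing formula from RFC 3550, computed here in seconds.

```python
def rtp_stream_stats(packets, clock_rate=8000):
    """Summarise one RTP stream.

    packets: list of (sequence_number, arrival_seconds, rtp_timestamp)
    tuples in arrival order. Sequence wraparound is ignored for brevity.
    """
    if not packets:
        return {"lost": 0, "jitter_ms": 0.0}
    seqs = [p[0] for p in packets]
    expected = max(seqs) - min(seqs) + 1
    lost = expected - len(set(seqs))  # gaps in the sequence numbers

    jitter = 0.0
    prev_transit = None
    for _, arrival, rtp_ts in packets:
        # Relative transit time: arrival clock minus sender media clock.
        transit = arrival - rtp_ts / clock_rate
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0  # RFC 3550 running estimate
        prev_transit = transit
    return {"lost": lost, "jitter_ms": jitter * 1000.0}


def call_has_link_errors(stats, max_lost=0, max_jitter_ms=30.0):
    """Hypothetical thresholds: any loss or high jitter counts as an error."""
    return stats["lost"] > max_lost or stats["jitter_ms"] > max_jitter_ms
```

If call_has_link_errors returns True for a call, the PESQ score from that call would be discarded and the test repeated.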