No smartphone manufacturer is more high-end than Vertu. The company's products are to mobile phones what Bentley is to cars.
Indeed, the firm's Signature model (in titanium and red gold livery) will set you back no less than £13,500 in Selfridges.
Unsurprisingly, the buyers of such phones are unforgiving of even the smallest failure.
Vertu cloud architect Rob Charlton put it very simply. "If you spend £10,000 on a phone it has to work," he told the audience at Computing's DevOps Summit 2016 this week.
At this level, any failure is much more likely to be down to software than to hardware. Vertu provides its own apps, including one that allows you to speak directly to a personal concierge to organise your life and another that allows you to bid for prizes such as dinner with President Obama.
Failures in the software are likely to originate lower down in the stack. The low-level systems, the Linux kernel and Android, between them comprise 35 million lines of code, and there is yet more piled on top of that.
"I checked the source tree on our current product and there are 60 million lines of code in it. That's a lot to go wrong. The phone might reboot, the camera might take fuzzy photos, the battery might run out quickly or overheat," Charlton said.
Five years ago it was clear that Vertu's IT team needed to be more agile and responsive to improve throughput and become better at nipping problems in the bud during the testing phase.
The company began an IT transformation process, moving on-premises infrastructure to the Amazon cloud, transitioning development to DevOps, automating manual processes with Puppet and then Ansible, and using Splunk to analyse operational data generated by the phones.
During the development of a new model, a software agent running on the device collects relevant metrics. The agent's data is uploaded to a server in AWS and then fed into a cluster running Splunk Enterprise. Feedback from this system helps the testers to pinpoint the causes of any failure.
"We look at how long has it been on, did it crash, what's the battery life like, all sorts of things," explained Charlton.
"We're constantly trawling through that data and if we see a signature of a phone crash, rather than waiting for the person to report it, an email is sent directly to the crash analysis team."
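As a rough illustration of that kind of signature check, the sketch below flags a suspected crash when a device's reported uptime drops between consecutive samples. It is not Vertu's method: the article does not describe the actual Splunk searches, and the `Sample` record, its field names and the uptime-reset heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One telemetry record from a test device (hypothetical schema)."""
    device_id: str
    timestamp: float       # seconds since epoch when the sample was taken
    uptime_seconds: float  # time since the device last booted


def find_suspected_crashes(samples):
    """Return the samples at which a device's uptime went backwards.

    A drop in uptime means the device rebooted since its previous sample.
    This is an assumed heuristic, not a signature taken from the article.
    """
    last_uptime = {}
    suspected = []
    # Process each device's samples in chronological order.
    for s in sorted(samples, key=lambda s: (s.device_id, s.timestamp)):
        previous = last_uptime.get(s.device_id)
        if previous is not None and s.uptime_seconds < previous:
            suspected.append(s)
        last_uptime[s.device_id] = s.uptime_seconds
    return suspected
```

A drop in uptime also follows a deliberate restart, so a real check would need further evidence, such as a crash log, before alerting anyone.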
Product managers have more visibility too: a dashboard lets them monitor the entire system, which removes the guesswork from timing the launch of a new product.
"They can see how many people are testing, what they've tested, who is on an old version of the software," Charlton said.
"Suddenly, they can get a really good picture of the software, and the main metric we use is the mean time between failures, or MTBF, which is used to assess the maturity of the software.
"This data has absolutely transformed the way we make our products. We are now data led. We can get an actual figure out saying the MTBF has now reached this key level so we know we can launch, rather than saying: 'Well, there aren't many bugs now so I think it's OK.'"
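MTBF is conventionally calculated as total operating time divided by the number of failures observed in that time. A minimal sketch of the launch check Charlton describes might look like the following, assuming hours as the unit. The function names and the threshold parameter are hypothetical, and the article does not give Vertu's actual figure.

```python
def mean_time_between_failures(total_operating_hours, failure_count):
    """Return MTBF in hours, or None if no failures were observed."""
    if failure_count == 0:
        # With no failures, MTBF cannot be estimated from this data.
        return None
    return total_operating_hours / failure_count


def ready_to_launch(total_operating_hours, failure_count, threshold_hours):
    """Return True if the measured MTBF meets the (hypothetical) launch threshold."""
    mtbf = mean_time_between_failures(total_operating_hours, failure_count)
    return mtbf is not None and mtbf >= threshold_hours
```

For example, 1,200 hours of testing with four failures gives an MTBF of 300 hours, which would pass a 250-hour threshold but not a 500-hour one.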