Ian Roop Design Case Study - Google Cloud

Google Cloud

Defined a new way of working and team direction with product and engineering managers. Took a microservices approach with a renewed focus on alerting and monitoring.

Google offers a platform as a service (PaaS) that lets customers run, manage and scale their infrastructure in the cloud. Google Cloud Platform (GCP) offers a host of products and services ranging from storage to networking to core products like App Engine and BigQuery. Companies like Snapchat and Spotify run on GCP.

After a short design sprint in Kirkland, WA, our smaller mobile team realized that we weren’t able to keep up with the larger cloud organization. It didn’t make sense to reproduce mostly desktop-based experiences in our mobile app. We would work for a few months on a new feature and then, after it launched, move on to something else. Google Cloud teams working on desktop would continue to iterate on their experience after we had already pivoted, effectively making our recently shipped mobile experience less relevant.

We took a step back and rethought our approach. Rather than redesigning the can opener, we redesigned the top of the can itself. Instead of releasing more features and duplicating the desktop experiences, we took a ‘microservices-esque’ technique and applied it to UI component creation. Our proposal would allow cloud teams to create their own widgets from a set of templates using their own data, essentially turning them into authors and publishers.

Our desktop-based teams inside Google Cloud could distill the most important parts of their product areas into digestible mobile components rather than having us build out custom experiences every time. We had to meet the needs of feature teams inside of Google and of third parties using GCP while addressing the real-world constraints of our own team, so I acted as a facilitator and guide to uncover these dependencies and offer possible solutions. I worked closely with the team to help us see a different path forward, one that was less about more features and more about redefining our way of working.

Use cases

Cloud engineers, SREs and others need insight into their environment whenever they are on call and away from their keyboard. Our goal was to support them during the triage and coordination phase of incident response. Engineers create an alert policy that matches their service level agreements. Any time a metric goes above a certain threshold or appears out of whack, they are notified by the system. We had been focused on building software that allowed users to get these types of alerts, monitor key metrics and coordinate during an outage, plus some resource management.

Critical customer-impacting issues can cost companies like PayPal millions of dollars. Incident response is usually a frantic and nerve-wracking experience as engineers try to figure out the severity and impact of a specific issue. When a service is down, every minute counts for multi-billion-dollar software companies.

Incident response & triage
The incident response user journey. On-call engineers typically create policies on desktop and start the triage process on mobile, then debug on desktop.

Problem

Cloud engineers need diagnostic tooling and insight into their environment running in the cloud. Teams have service level agreements and monitor key metrics like throughput, latency or errors. When incidents occur, they need to take action immediately and start the triage and debugging process. Outages are expensive, can be a big black eye for the brand and cause unnecessary damage to the user experience.

We had been partnering with feature teams inside of Google to translate their mostly desktop designs into mobile experiences and had shipped several features including mobile versions of Logs, Traffic Splitting, Permissions, Error Reporting, Trace and others. It was common for us to launch a feature and then shortly thereafter it would become out of date as those teams continued to iterate. We couldn't keep up. It became clear over time that duplicating the features from GCP wouldn't add up to a great mobile experience and wouldn't necessarily align with the on-call use cases.

Internal teams and third parties (companies running on GCP) each had different requirements. On top of that, there were different roles within those companies that we needed to consider and design for, including administrators and CEOs, not just on-call engineers.

Whiteboard session with the engineering team at the Google office in Kirkland, WA

Solution

Our team continued to support the Cloud Console app, which had the largest user base. We saw an increase in usage as we rolled out new features. Cloud had a mobile presence despite the fact that the vast majority of users integrated with GCP services without the use of an interface.

At the same time, we also worked directly with the Stackdriver team in Cambridge, MA. Stackdriver was a diagnostic tool acquired by Google in 2014. Their team had made a product and condition builder that let engineers create if/else statements and then be notified based on their SLOs/SLAs (service level objectives and agreements). For example, any time latency or another metric on a particular virtual machine or App Engine instance went above a certain threshold, they would get a push notification, an alert that would kick off a series of responses.
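
As a rough illustration of the threshold conditions described above, here is a minimal Python sketch. The names `AlertCondition` and `evaluate`, along with the metric and resource values, are hypothetical and are not Stackdriver's actual API or data model. The sketch only shows the basic if/else idea: compare a metric sample to a threshold and produce a notification when it is exceeded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertCondition:
    """Hypothetical threshold condition; not Stackdriver's real data model."""
    metric: str        # e.g. "latency_ms"
    resource: str      # e.g. a VM or App Engine instance name
    threshold: float   # the condition fires when a sample is above this value


def evaluate(condition: AlertCondition, value: float) -> Optional[str]:
    """Return a notification message if the sample exceeds the threshold."""
    if value > condition.threshold:
        return (f"ALERT: {condition.metric} on {condition.resource} "
                f"is {value}, above threshold {condition.threshold}")
    return None


# Illustrative values only.
condition = AlertCondition(metric="latency_ms", resource="vm-1", threshold=500.0)
print(evaluate(condition, 750.0))  # exceeds the threshold: a message is returned
print(evaluate(condition, 120.0))  # within limits: None
```

In a real system the returned message would be handed to a push notification service, which is what starts the on-call engineer's triage on mobile.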

Instead of duplicating desktop features that were bloated and didn't match the shorter attention spans and interaction patterns of mobile, we proved that there was a different path forward. We proposed a new platform and rethought not just what we built but how we worked. We had to give up control and empower teams inside of Google and third parties to create their own UI and experience via applets, much like moving from a monolithic to a microservices approach. Users would select a set of data visualization tools and inject their own data.

Stackdriver app focused on alerting and monitoring

Outcomes

We had success on the Cloud Console mobile app, with a 67% increase in monthly active users (MAU) over the six months from January to June 2017 and a 158% increase in MAU year over year from June 2016 to June 2017. Keep in mind that this was with a relatively small audience of around 35,000 total users. Our work supported cloud engineers, billing admins and others who needed insight into their cloud environment. When we focused more on alerting and monitoring scenarios, we were able to help SREs and those in DevOps with more diagnostic tooling.

The engineers created a proof of concept to demonstrate the feasibility of our proposal. The premise was relatively simple. Partner teams inside of Google Cloud could use JSON to render native UI through an interpretation engine. The team got the green light to continue working on this idea and I received two peer bonuses from the engineering and product managers in Google Cloud. We then socialized our thinking within the larger org.
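
To make the premise concrete, here is a minimal Python sketch of the general idea of interpreting JSON into UI. It is not the team's actual engine: the JSON schema, the widget types (`metric`, `list`) and the function names are all hypothetical, and plain text output stands in for native mobile components.

```python
import json
from typing import Any, Callable, Dict, List


def render_metric(spec: Dict[str, Any]) -> str:
    """Hypothetical 'metric' template: a title with a single value."""
    return f"[metric] {spec['title']}: {spec['value']} {spec.get('unit', '')}".rstrip()


def render_list(spec: Dict[str, Any]) -> str:
    """Hypothetical 'list' template: a title followed by its items."""
    items = "\n".join(f"  - {item}" for item in spec["items"])
    return f"[list] {spec['title']}\n{items}"


# The fixed set of templates that partner teams could choose from.
TEMPLATES: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "metric": render_metric,
    "list": render_list,
}


def render(document: str) -> List[str]:
    """Interpret a JSON document and render each widget via its template."""
    rendered = []
    for spec in json.loads(document)["widgets"]:
        template = TEMPLATES.get(spec["type"])
        if template is None:
            # Unknown types degrade gracefully instead of crashing the app.
            rendered.append(f"[unsupported widget type: {spec['type']}]")
        else:
            rendered.append(template(spec))
    return rendered


# Illustrative payload a partner team might publish with its own data.
payload = """{"widgets": [
  {"type": "metric", "title": "p95 latency", "value": 412, "unit": "ms"},
  {"type": "list", "title": "Open incidents", "items": ["checkout errors", "high CPU"]},
  {"type": "map", "title": "Regions"}
]}"""

for widget in render(payload):
    print(widget)
```

The design point is that the app ships a small set of templates once, and partner teams change what users see by changing data rather than by requesting a new custom mobile feature.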

There were huge challenges to overcome. This was a technical space to operate in, with many different requirements. There were over 200 customers using GCP and dozens of roles inside each company to consider, plus feature teams inside of Google. We were a smallish team of 10 to 12 people.

We took an initial pass at creating a charting library for mobile on Google Cloud with Material Design in mind. In the end we showed that there was a different way to tackle this problem given the needs of our users, the restrictions and the reality of our limited resources.

When I first joined the team, they needed someone to design a warm, welcoming onboarding experience. Two years later, even without a computer science background, I was helping the PMs and engineers make strategic decisions about how to work and imagine new possibilities. This project was proof that design can play a huge part in defining strategy.

Role

  • Interaction Design
  • Visual Design
  • Strategic Design
  • Speculative Design
  • Systems Thinking