The IBM Cloud console provides the web front end for IBM Cloud, an open and secure public cloud providing enterprise-class IaaS and PaaS capabilities with over 170 managed services covering data, AI, IoT, security, blockchain, and more. I’ve had the pleasure of working on the console for most of the last decade. As a more junior developer, I wrote the first lines of proof-of-concept front-end code before IBM Cloud even had a name. This cloud was made generally available as IBM Bluemix in 2014, rebranded as IBM Cloud in 2017, and incorporated the fully integrated SoftLayer experience in 2018.

I became the lead full-stack architect in 2014, and had the opportunity to work with countless talented developers, UI designers, and offering managers as the console’s architecture and user experience (UX) were regularly updated and improved to keep pace with the maturing IBM Cloud. Last year, I left this amazing team for another role as an architect on IBM Cloud for Financial Services. As I reflect on the experience, I’ll share with you how the console architecture has evolved from its monolithic origins into a highly scalable, modern microservice-based system composed of dozens of micro front ends. In particular, I’ll highlight the critical role that Node.js played in this evolution.

  IBM Cloud Console Architecture Diagram


Earlier this month, we launched a revamped IBM Cloud Platform Experience at our new location. Our primary goal was to unify our IaaS and PaaS offerings to better meet your needs. This was a massive undertaking with changes up and down the stack. We’re very excited by the outcome, and we think you will be too.

Prior to this unification, there were two distinct UIs at different URLs – one for IaaS and one for PaaS. This understandably led to a lot of frustration and confusion. We wanted to fix this by creating a seamless experience with one UI at a single URL to manage everything. With that in mind, the user experience of our console was overhauled to bring new and improved capabilities in the following areas:

  • dashboard (see screenshot below)
  • resource management
  • search and tagging
  • catalog/purchasing
  • account/user management
  • billing
  • support
  • overall performance
  • and more!

  IBM Cloud Console Dashboard

Rollout of the New Experience

With such massive changes, we faced a number of interesting questions about how to best roll out the new experience. I will summarize our thought process here, but for more details about our strategy, see our A/B Test Deployment blog post, written by my colleague Nic Sauriol and me.

We were very sensitive to the fact that the holiday season is a key time of year for many of our clients, and we did not want to do anything to disrupt their operations. On the other hand, we believe the new experience offers improvements that our clients could benefit from immediately. Ultimately, we decided to go with an A/B rollout strategy with two consoles running side by side. The diagram below shows a very high-level architecture. Both consoles are deployed separately, but use exactly the same set of IBM Cloud Platform APIs on the backend.

Rollout Architecture

Users in the A group see the existing experience at the original URL, while users in the B group get the enhanced console at the new URL. However, the choice of console is entirely up to each individual user. At some point in the new year, we will retire the original experience and force a transition. But, until then, users are free to switch back and forth between both experiences.

From a technical perspective, we use the IBM Cloud Kubernetes Service to host the microservices that make up the console. By using this service, it’s very simple to spin up Kubernetes clusters to host both experiences. We can easily make use of the same build pipeline, Helm Charts, and so on, but with different code branches for the new experience.

(NOTE: Something not captured in the high-level architecture is that both consoles are actually made up of many geographically load balanced clusters placed all over the world. This is used to provide for high availability and failover. The concepts behind this are described in my previous blog Global IBM Cloud Console Architecture.)

Other Related Content

Aside from the A/B rollout blog summarized above, many teammates of mine produced an extensive set of blogs with detailed information about all of the changes found in the unified experience. I highly recommend you check these out!


I sincerely hope you enjoy the new experience, and would love to get your feedback (both good and bad) in order to learn and better meet your needs. So, please reach out and let me know how it goes!


In a blog post a few months ago, I announced that my team had released the new “global” IBM Cloud Console (formerly Bluemix Console), allowing all public regions of the IBM Cloud platform to be managed from a single location. This took us from four addresses (one for each of the four public IBM Cloud regions at the time) to a single geo load-balanced address. Users would now always get the UI served from the geographically closest (and healthy) deployment, resulting in improved performance and enabling failover to greatly increase availability. In this post, I’ll dig a bit deeper so that you can gain insight into what it would take for you to build similar solutions with your own IBM Cloud apps. In particular, I’ll discuss:

  • High-level features of our architecture and the enabling third-party products we used
  • Things to think about while building a health check to determine when failover should occur
  • Considerations for coding apps so they are enabled to run in multiple data centers
  • Using the architecture to smooth the transition to new technology

Hybrid Architecture With Akamai and Dyn

The IBM Cloud Console has long been fronted by two offerings from Akamai:

  • Akamai Kona for web application firewall (WAF) and distributed denial of service (DDoS) protection.
  • Akamai Ion for serving static resources via a content delivery network (CDN), finding optimal network routes, reusing SSL connections, etc.

With the implementation of the global console, we added the Dyn Traffic Director (TD) product to the mix. All requests to the console are sent to the Akamai network, and then Akamai acts as a proxy. Akamai does a DNS lookup to determine the IP address to forward the request to by using a host name that has been configured using a Dyn TD. The Dyn configuration is set up to spread traffic based on geo location to one of six IBM Cloud data centers. This is shown in the diagram below.

Global Console High-level Architecture
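The routing behavior described above can be sketched in a few lines of Node.js. This is a toy illustration of what a geo-aware traffic director does, not how Dyn actually implements it; the data-center names and coordinates are made up for the example:

```javascript
// Toy illustration of geo-aware routing: pick the closest *healthy*
// deployment for a user. A real traffic director resolves location from
// the DNS resolver's IP and uses proper great-circle math.
function distance(a, b) {
  // Squared coordinate distance is enough to rank candidates here.
  const dLat = a.lat - b.lat;
  const dLon = a.lon - b.lon;
  return dLat * dLat + dLon * dLon;
}

function pickDeployment(user, deployments) {
  const healthy = deployments.filter((d) => d.healthy);
  if (healthy.length === 0) throw new Error('no healthy deployments');
  return healthy.reduce((best, d) =>
    distance(user, d) < distance(user, best) ? d : best
  );
}

// Illustrative deployment list; an unhealthy entry is skipped, which is
// exactly the "failover" behavior described in the next section.
const deployments = [
  { name: 'dallas', lat: 32.8, lon: -96.8, healthy: true },
  { name: 'london', lat: 51.5, lon: -0.1, healthy: true },
  { name: 'sydney', lat: -33.9, lon: 151.2, healthy: false },
];
```

A user in Paris would be routed to London; if London were then marked unhealthy, the same call would fall through to Dallas without the user doing anything differently.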

Since Akamai also offers its Global Traffic Management (GTM) product for load balancing, it may seem a bit strange to be using this hybrid solution. But, we had an existing contract with Dyn, and we simply decided to leverage that instead of adding GTM to our Akamai contract. This has worked quite well for us.

Importance of Health Check

Every 60 seconds or so, Dyn checks the health of each console deployment by probing a health check API written by my team. If the health check cannot be reached, or if it responds with a 4xx or 5xx error, then Dyn marks the associated data center as unhealthy and takes it out of the rotation. This is what is meant by a “failover.” At this point, requests that would go to the unhealthy deployment are instead sent to the next closest deployment. In this way, the user never knows there was an issue and continues working as if nothing has happened. Eventually, when the health of the failed deployment recovers, Dyn will put it back in the rotation and route traffic to it.

In the diagram below, the Tokyo deployment of the console is unhealthy, and traffic that would normally go there starts flowing to Sydney.

Global Console High-level Architecture With Failed Data Center

Clearly, the algorithm used by the health check plays a very important role in the overall success of the architecture. So, when building your own health checks, you should think carefully about the key components that influence the health of your deployments. For example, Redis is one of our absolutely critical components because we use it to store session state. Without Redis, we cannot maintain things like a user’s auth token. So, if one of our deployments can no longer connect to its local Redis, then we need to failover.

On the flip side, there may be other dependencies that are not nearly as critical. For example, if our Dallas console deployment cannot connect to the API for Cloud Foundry (CF) in Dallas, the majority of the console functionality will continue to work. Other console deployments probably can’t connect to the API either, so there is probably not much point in failing over.

Finally, the health check can be very helpful for making proactive failovers easy. For example, we made our health check configurable so we can force it to return an error code. We have made use of this on several occasions such as when we knew reboots were required while patches for Meltdown and Spectre were being deployed in SoftLayer. We elected to take console deployments in those data centers out of the rotation until we knew those data centers (and our deployments within them) were back online.

Impact to Microservice Implementation

As described in a previous post, each console deployment contains a set of roughly 40 microservices running behind a reverse proxy. In our original implementation, our microservices tended to be tied to APIs in the region they were deployed to. For example, our Dallas deployment could only manage CF resources in Dallas, our London deployment only CF resources in London, and so on. This is illustrated in the pre-Dyn diagram below where microservices in the three data centers only talk to the “backend” within the same region.

Microservices Tied to One Backend

This worked fine for us when we had a separate URL for each console deployment and users knew they had to go to the London console URL to manage their London resources. However, this architecture was not conducive to the goals of global console where we wanted the UI to be served from the geographically nearest data center and for it to continue to be accessible even if all but one deployment failed. In order to accomplish this, we needed to decouple the microservices from any one specific region and enable them to communicate with equivalent APIs in any of the other regions based on what the user was requesting. This is shown in the diagram below.

Microservices Communicate With Different Backends
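The decoupling described above boils down to resolving the backend endpoint from the region the user is acting on, rather than from where the microservice instance happens to be deployed. A minimal sketch, with entirely hypothetical hostnames:

```javascript
// Illustrative region-to-endpoint map. In the real console this kind of
// mapping would come from configuration, not a hard-coded object, and
// these hostnames are invented for the example.
const BACKENDS = {
  'us-south': 'https://api.us-south.example.cloud',
  'eu-gb': 'https://api.eu-gb.example.cloud',
  'au-syd': 'https://api.au-syd.example.cloud',
};

// Resolve the backend for the region the *user* requested, so any
// deployment (Dallas, London, ...) can serve requests for any region.
function resolveBackend(requestedRegion) {
  const base = BACKENDS[requestedRegion];
  if (!base) throw new Error(`Unknown region: ${requestedRegion}`);
  return base;
}
```

With globally load-balanced backend APIs, as mentioned below, this map would collapse to a single hostname per service.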

Of course, an astute reader might point out we’d be even better off if all of the backend APIs provided their own globally load balanced endpoints. Then a console microservice would be able to always point at the same host names no matter where deployed. And, indeed, we do have many APIs in the IBM Cloud ecosystem that are moving in that direction.

Smoothing Migration from Cloud Foundry to Kubernetes

This architectural update has been great for us in many ways, and has given us much more flexibility in determining where to deploy the console throughout the world. It has also had the added benefit of making it easy for us to roll out deployments running on different technologies without end users ever knowing.

Historically, the console has run on Cloud Foundry on the IBM Cloud, but we are nearly done with a migration to Kubernetes (managed by the IBM Cloud Container Service). We have been able to add Kubernetes deployments into the rotation simply by updating our Dyn configuration. This has allowed us to vet Kubernetes fully before turning off our CF deployments entirely. This is represented in the diagram below showing Dyn load balancing between two CF deployments and three Kubernetes deployments.

Load Balancing Between CF and Kubernetes Deployments


We’re excited by the improvements in performance and reliability we’ve been able to provide our customers with the global console. I hope some of the lessons and insights that my team has gained in the process will help your efforts as well.

Related Resources

For additional material on this subject, please see Configure and run a multiregion Bluemix application with IBM Cloudant and Dyn by colleague Lee Surprenant.


In June, I had the honor of attending the Cloud Foundry Summit Silicon Valley 2017 conference in Santa Clara, CA. My two submissions related to Bluemix UI architecture were selected, and I got the chance to present them as part of the conference’s Cloud Native Node.js track. In this post, I’ll briefly describe my talks as well as share some general takeaways from the conference.

Topic 1: Microservices Architecture of the Bluemix UI

The full title of my first topic was To Kill a Monolith: Slaying the Demons of a Monolith with Node.js Microservices on Cloud Foundry. The intent of the talk was to trace my team’s journey migrating the Bluemix UI from a monolithic app to a microservices architecture.

The Bluemix UI (which runs on Cloud Foundry) is the front-end to Bluemix, IBM’s open cloud hosting platform. The original implementation as a single-page, monolithic Java web app brought with it many demons, such as poor performance, lack of scalability, inability to push small updates, and difficulty for other teams to contribute code. Over the last 2 years, the team has been on a mission to slay these demons by embracing cloud native principles and splitting the monolith into smaller Node.js microservices. The effort to migrate to a more modern and scalable architecture has paid large dividends, but has also left behind a few battle scars from wrestling with the added complexity cloud native can bring. The team had to tackle problems in a wide variety of areas, including: large-scale deployments, continuous integration, monitoring, problem determination, high availability, and security.

In the talk, I went on to discuss the advantages of microservice architectures, ways that Node.js has increased developer productivity, approaches to phasing microservices into a live product, and real-life lessons learned in the deployment and management of Node.js microservices across multiple Cloud Foundry environments.

If you’d like to see the full presentation, check out the slide deck below:

Or, if you prefer video, you can watch the talk on YouTube:

Topic 2: Monitoring Node.js Microservices

My second topic was called Monitoring Node.js Microservices on Cloud Foundry with Open Source Tools and a Shoestring Budget. During the migration described in my first talk, we learned that while microservice architectures offer lots of great benefits, there’s also a downside. Perhaps most notably, there is an increased complexity in monitoring the overall reliability and performance of the system. In addition, when problems are identified, finding a root cause can be a challenge. To ease these pains in managing the Bluemix UI, we’ve built a lightweight system using Node.js and other open-source tools to capture key metrics for all microservices (such as memory usage, CPU usage, and the speed and response codes of all inbound/outbound requests).

In this approach, each microservice publishes lightweight messages (using MQTT) for all measurable events while a separate monitoring microservice subscribes to these messages. When the monitoring microservice receives a message, it stores the data in a time series DB (InfluxDB) and sends notifications if thresholds are violated. Once the data is stored, it can be visualized in Grafana to identify trends and bottlenecks.
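The publish side of this approach can be sketched as a small helper that turns a measurable event into an MQTT topic and JSON payload. The topic layout and field names below are hypothetical, not the Bluemix UI's actual schema:

```javascript
// Hypothetical sketch of the metric-publishing side: each microservice
// emits one small message per measurable event (inbound request,
// outbound call, memory sample, ...).
function buildMetricMessage(service, event) {
  // Topic encodes the source service and event type, so the monitoring
  // microservice can subscribe with a wildcard like 'metrics/#'.
  const topic = `metrics/${service}/${event.type}`;
  const payload = JSON.stringify({
    service,
    type: event.type,             // e.g. 'inbound-request'
    durationMs: event.durationMs, // latency of the measured operation
    statusCode: event.statusCode, // HTTP status, when applicable
    ts: event.ts || Date.now(),   // timestamp for the time-series DB
  });
  return { topic, payload };
}

// With a real MQTT client (e.g. the mqtt package), the message would be
// sent with something like:
//   mqttClient.publish(topic, payload);
// The monitoring microservice subscribes, writes each point to InfluxDB,
// and raises notifications when thresholds are violated.
```

Keeping the per-service footprint this small is what makes the approach cheap: the microservices only format and publish; all storage, aggregation, and alerting lives in the single monitoring service.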

In the presentation, I described the details of the Node.js implementation, real-world examples of how this system has been used to keep the Bluemix UI running smoothly without spending a lot of money, and how the system has acted as a “canary in the mine shaft” to find problems in non-UI subsystems before the relevant teams even knew there was an issue!

The slide deck for the presentation is available below:

And, you can also watch it on YouTube:

Takeaways from the Conference

This was my second trip to CF Summit, and in both cases it was a great experience. On my first trip in 2015, I gave a talk with Brian Martin when my team was basically just getting started on our journey to microservices. Then, I was a little naive about what we were getting into, but this time around I was far more battle-hardened and had more in-depth knowledge and experiences to share.

One thing I noticed this time was that more people came up to me afterward with questions specific to their own journeys re-architecting monoliths. This tells me there are a lot of organizations struggling with what to do with their legacy code bases, and that they are hungry for guidance. Of course, post-talk questions are far from scientific. But, I found it interesting nonetheless.

One thing I emphasized to these folks was to not underestimate the need for robust monitoring as they build out their own microservices. As I discussed in far more detail in my second talk, I think this was the biggest mistake we made when we started the Bluemix UI migration.

Oh, yeah… Wally World

I stayed at the Hilton across the street from the Santa Clara Convention Center where the conference was held. From my room, I had a great view of Levi’s Stadium and California’s Great America.

Every morning I’d look out from my window and see the vast parking lots for both facilities sitting empty:

Empty Parking at Great America

And, each day I kept hoping the Griswolds would come driving up in their Family Truckster and see the park was closed, just like Wally World was in 1983.

We have been listening to your feedback on the Bluemix UI and have used that to design a brand new user experience (UX) that we believe will streamline your workflows. The new experience is now live for your immediate use. When you visit the Bluemix UI, you can choose to opt-in for the new experience via a “Try the new Bluemix” link in the header bar:

Try New Bluemix Screenshot

In this blog, we’ll walk you through the new taxonomy organizing your resources, the redesigned catalog, the updated flows for creating new compute resources, the reorganized app details page, and more!

All Category Cards Screenshot

Original blog post co-authored with Amod Bhise.