The IBM Cloud console provides the web front end for IBM Cloud, an open and secure public cloud providing enterprise-class IaaS and PaaS capabilities with over 170 managed services covering data, AI, IoT, security, blockchain, and more. I’ve had the pleasure of working on the console for most of the last decade. As a more junior developer, I wrote the first lines of proof-of-concept front-end code before IBM Cloud even had a name. This cloud was made generally available as IBM Bluemix in 2014, rebranded as IBM Cloud in 2017, and incorporated the fully integrated SoftLayer experience in 2018.

I became the lead full-stack architect in 2014 and had the opportunity to work with countless talented developers, UI designers, and offering managers as the console’s architecture and user experience (UX) were regularly updated and improved to keep pace with the maturing IBM Cloud. Last year, I left this amazing team for another role as an architect on the IBM Cloud for Financial Services. As I reflect on the experience, I’ll share with you how the console architecture has evolved from its monolithic origins into a highly scalable, modern microservice-based system composed of dozens of micro front ends. In particular, I’ll highlight the critical role that Node.js played in this evolution.

What is the IBM Cloud console?

The IBM Cloud console serves as a UX consolidation point for the entire IBM Cloud ecosystem, with UIs contributed by more than 75 teams across IBM that cover a wide range of IBM Cloud features and functions. With these contributions, the console lets users create, view, and manage all of their IBM Cloud resources. It also provides a UX for a number of other core platform capabilities, including:

  • Registration: Allows users to sign up for IBM Cloud
  • Identity and access management: Lets users manage authentication and control access to IBM Cloud resources
  • Account: Offers the ability to manage IBM Cloud accounts
  • Catalog: Provides users the ability to view and order IBM Cloud offerings
  • Billing/usage: Allows users to monitor their IBM Cloud usage and view their bill
  • Support: Lets users request help and report problems
  • Docs and API docs: Provides information on how to use IBM Cloud and IBM Cloud APIs

The following image shows the console’s dashboard.

IBM Cloud Console: Dashboard UI

Origins as a monolith

The original console was deployed to the IBM public Cloud Foundry instance as a single monolithic Java application on the back end. This application served APIs as well as the static resources for the monolithic front end that was written using the Dojo Toolkit. The front end was a single-page application (SPA) where all HTML, CSS, and JavaScript resources were loaded within a single web page. The front end made AJAX calls to consume APIs provided by the Java server. At the time, this monolithic Dojo/Java combination was one of the most common architectural patterns in IBM.

The following image shows the original architecture.

IBM Cloud console: Original monolithic architecture

Cracks begin to show

The early console’s function was focused entirely on Cloud Foundry apps, services, and billing. A few development squads worked on the initial release, and generally things were going pretty well. However, cracks began to show in the monolithic architecture. Some of the issues included:

  • Fragile code: Developers started to become afraid of making changes for fear of introducing unintended consequences and breaking the application.
  • Lack of resiliency: The monolithic Java server was a single point of failure. If the Java back end went down, the entire UI was down. We had horizontal scaling in place with multiple instances and load balancing between them, but if null pointer exceptions, memory leaks, or other problems caused the instances to crash repeatedly, then the whole console was compromised.
  • Small changes difficult to deploy: Code updates required building, testing, and redeploying the entire application, even for very small changes.
  • Limited ability for other teams to contribute: The relatively small core console team was becoming a bottleneck for the ever-increasing number of features demanded of a rapidly expanding cloud. Teams throughout IBM wanted to extend the console to include UIs for a multitude of additional compute types and managed services, but there was no easy way for them to extend the existing code base with new features.
  • Locked into technology stack: The monolithic architecture locked us into using Dojo and Java. In particular, Dojo was beginning to lose favor within IBM as the world of front-end web development was exploding with interesting toolkits.
  • Poor performance: The static resources of the Dojo-based SPA were large, and the volume of client-initiated AJAX requests to fill in page content led to unacceptable performance.
  • Lack of search engine optimization (SEO): The HTML served by the SPA was really just a shell for loading required JavaScript and CSS. So, there was no content for web crawlers to process.

For these reasons, it became clear that the monolithic architecture was not going to scale to support our goals over the long term, and it was obvious that we needed a new direction.

New architecture to the rescue

Microservices and micro front ends

As we looked to break free of the constraints of the monolith, we settled on a microservices architecture with many smaller, loosely coupled services that we called “plug-ins.” Each plug-in would be a micro front end, or vertical slice representing a distinct area of responsibility within the overall UX. The plug-ins would be independent entities made up of client-side functions as well as a corresponding stateless, server-side backend-for-frontend (BFF) that would host the UI resources and provide any APIs needed by the front end. The composition of these plug-ins would form an experience that looked and behaved like one cohesive web application.

We expected this approach to provide numerous benefits, including:

  • Allowing for paced migration: We had real users who expected existing functions to remain in place while also gaining new features. We couldn’t just start from scratch, nor could we pause development while taking the time to rewrite our entire codebase. The microservices architecture would let us continue running the monolith while we steadily broke apart its features into micro front ends that would run alongside it.
  • Limited blast radius for changes: In general, new bugs would be isolated to one plug-in and not break other plug-ins.
  • Increased resiliency: The single point of failure of the monolith would be eliminated. So, if the instances of a plug-in started crashing, the rest of the console could still run.
  • More granularity when deploying updates: Plug-ins would be deployed independently, so teams would have the freedom to fix issues or make enhancements on their schedule without requiring the entire console to be rebuilt and redeployed.
  • Improved cross-team contributions: Plug-ins could be provided by any squad who wanted to build one. This eliminated the bottleneck of a single team being responsible for developing and maintaining a monolith.
  • Flexibility in technologies: The base architecture didn’t require that a plug-in use a particular technology stack. This would free us from Java and Dojo, and let teams make technology decisions for their back-end and front-end code that made sense for them.
  • Better performance: Instead of one large monolithic SPA, emphasis would be placed on building small services optimized for speed and page size. Perceived performance would be improved by generating more content on the server and sending it in the initial HTML payload to ensure that key parts of the page were in place on initial rendering. In addition, pages would try to use vanilla JavaScript where possible. And, where vanilla JavaScript wasn’t feasible, pages would use only lightweight frameworks.
  • Improved SEO: With more content being generated on the server, web crawlers would be able to index the parts of the console that we wanted to appear in search engine results.

Prominent role of Node.js

Even though one of the selling points of the new architecture was flexibility in technology choice across the plug-ins, we didn’t want to create a wild west with every plug-in making different choices. We recognized that being strongly opinionated in some areas would increase efficiencies in code reuse and enable knowledge transfer as developers moved between plug-in squads.

To that end, we made a key early decision to strongly recommend Node.js for back-end microservice development. Node.js was a great fit for several reasons, including:

  • JavaScript runtime: JavaScript is relatively easy to learn, and it’s used by nearly 68% of all developers (according to the 2020 Stack Overflow Developer Survey).

  • Transfer of skills up and down the stack: The use of JavaScript would enable our developers to transfer front-end JavaScript skills to the back end, and vice versa. This meant developers would be able to effectively work on the full stack without having to become proficient in a second language.
  • Robust ecosystem: We believed developers would be able to do more with less effort by virtue of the rich Node.js ecosystem. For example, the public NPM registry includes thousands (now millions) of open source tools and modules written and supported by developers all over the world.
  • Well-suited for our workloads: Node.js would be able to handle the demands of our highly scalable, non-CPU intensive BFFs by virtue of it being optimized as an asynchronous event-driven runtime.

Micro front end architectural pattern

The diagram shows the general micro front-end architectural pattern that was proposed.

IBM Cloud Console: Typical Micro Front end Pattern

The pattern’s flow (based heavily on work done by colleague Dejan Glozic) involved the following steps:

  1. The page request comes into the server-side proxy.
  2. The proxy routes the request to the appropriate Node.js microservice based on the URL (for example, a request for /catalog would get routed to our Catalog microservice acting as the BFF for the Catalog UI).
  3. The microservice handles the request to return HTML to the client:
    • Consults the shared session store (for example, Redis) for the user token (which would have been inserted in cache during the login process).
    • Collects information needed for the initial page rendering by invoking back-end IBM Cloud APIs, other microservices, databases, and so on. Special attention would need to be paid here to not make expensive calls on the server-side that would delay the HTML response and leave the user seeing a blank screen for an extended period.
    • Invokes the Common Header API (provided by a special-purpose microservice) to get HTML for a shared header to include in the initial payload.
    • Uses server-side templating to insert all data into one HTML payload.
  4. The browser renders the pages and loads any required (and hopefully lightweight) JavaScript and CSS.
  5. The page might then make additional API calls back to the BFF to get additional data and execute user-initiated actions.
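Steps 3 and 4 above can be sketched with a small server-side composition function. This is a simplified illustration rather than the console's actual code; `composePage` and its inputs are invented for the example (in the real flow, the header HTML comes from the Common Header API and the initial data from back-end IBM Cloud APIs):

```javascript
// Compose the initial HTML payload for a micro front end page.
function composePage({ title, headerHtml, bodyHtml, initialData }) {
  return [
    '<!DOCTYPE html>',
    `<html><head><title>${title}</title></head><body>`,
    headerHtml, // shared header unifies every page in the console
    `<main>${bodyHtml}</main>`,
    // Embed initial data so the page renders without extra AJAX round trips
    `<script>window.__INITIAL_DATA__ = ${JSON.stringify(initialData)};</script>`,
    '</body></html>',
  ].join('\n');
}

const page = composePage({
  title: 'Catalog',
  headerHtml: '<header id="common-header">IBM Cloud</header>',
  bodyHtml: '<h1>Catalog</h1>',
  initialData: { offerings: 3 },
});
```

Because the key content and data arrive in the first response, the browser can paint a meaningful page before any client-side JavaScript runs.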

The insertion of the common header is one of the most important parts of the process. All of the pages in the console, regardless of what service produced them, needed to look like they were part of one seamless application. Including the same header on every page would serve to unify the entire experience. This is shown in more detail in the following image.

IBM Cloud Console: Page Composition

Executing the migration from monolith to microservices

Armed with these ideas and the dreams of numerous benefits, we started down the path of transitioning to the new architecture. One problem every team faces when migrating from monolith to microservices is determining how best to partition the existing monolith into microservices. Unfortunately, there’s no one-size-fits-all paradigm for making this determination, and it’s often more art than science.

In our case, we wanted to start small, and we chose two relatively simple, mostly static parts of our UI – the home page and marketing solutions pages – for the initial proof-of-concept (PoC). The following image shows that architecture.

IBM Cloud Console: Early Interim Architecture

In the image, you see that we added a proxy plus micro front ends for Home, Solutions, and Common Header. Additionally, the Java monolith continued to run alongside these new microservices. As alluded to earlier, it was key that we could stage the migration to the new architecture because it wasn’t feasible to start over. This meant slowly breaking out logical pieces of the monolith and rewriting them for the new architecture.
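The proxy's job during this interim phase boils down to prefix-based routing with a fallback: anything not yet migrated still goes to the monolith. A minimal sketch (service names and addresses are illustrative, not our actual topology):

```javascript
// Map URL prefixes to the micro front ends that own them.
const routes = {
  '/home': 'http://home-svc:3000',
  '/solutions': 'http://solutions-svc:3000',
  '/api/header': 'http://common-header-svc:3000',
};

// Anything unmatched falls through to the Java monolith during migration.
function resolveUpstream(path, routes, fallback) {
  const prefix = Object.keys(routes).find(
    (p) => path === p || path.startsWith(p + '/')
  );
  return prefix ? routes[prefix] : fallback;
}

const monolith = 'http://java-monolith:8080';
```

As each piece of the monolith was rewritten, its prefix was simply added to the route table, which is what made the paced migration possible.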

Over the course of the next two years, we were able to entirely eliminate all vestiges of the original monolith and complete the migration to the new architecture. We were also able to greatly increase the number of developers and teams who could independently contribute plug-ins to the larger console on their own schedules. This was done by enabling the console proxy to route requests outside of the console cluster and out to microservices deployed and managed by other teams. All of this can be seen in the following figure:

IBM Cloud Console: Current Architecture with External Plugins

New challenges

You might think the story is over at this point – that we knew it all, the migration was easy, and everything has been smooth sailing ever since. But, make no mistake, there have been plenty of growing pains both during those first couple of years of migration and the few years since. Developing and running a scalable, resilient, highly available, performant microservice-based architecture on the cloud is hard. I’ve often joked that if we knew all of the problems we’d encounter along the way, we might not have done it at all.

But, of course, no one and no team can know everything up front, nor can a team do everything they’d like to all at once. Being agile enough to make mid-course corrections based on new information and new challenges is critical. To that end, the console architecture and ecosystem have not stayed static and continue to evolve today. The next several sections explore some of the major challenges that popped up, and how the team rose to the occasion to address them. Despite the challenges, the path we chose has paid off immensely.

Enabling console developers

One of our initial goals was to enable a large ecosystem of console developers from both the core console team as well as other teams from across IBM Cloud. In the early days, we had a few samples and reusable Node.js modules. This got us by, but we needed to make developing console plug-ins more efficient and as self-service as possible in order to scale.

So, over the years, we’ve built a pretty robust set of console developer resources. These include assets and processes needed for UI developers to quickly and easily build and maintain plug-ins, such as:

  • Best practices and guidelines
  • Node.js starter app
  • Reusable UX patterns and components
  • NPM modules for enabling session management
  • Console APIs (and associated Swagger docs)
  • Plug-in extension points
  • End-to-end test framework
  • CI/CD pipeline

Best practices and guidelines

We wanted to ensure that all plug-ins were held to the same high standards to ensure the best customer experience possible. A major part of achieving this goal was establishing a clear set of expectations and responsibilities required to participate in the console ecosystem.

These expectations were codified in a set of best practices in these areas:

  • Onboarding
  • UI design guidelines and reviews
  • Coding standards
  • Testing and monitoring
  • Performance and reliability
  • Security
  • Accessibility
  • Globalization and translation
  • Documentation
  • Incorporation of analytics tools

These plug-in best practices were made mandatory as part of the internal IBM Cloud Service Framework, which all service teams must be validated against before their offerings appear in the IBM Cloud catalog.

Opinionated Node.js starter app

One important requirement we levied is that new plug-ins must be based on a minimal standard UI starter app written with Node.js. The starter app includes everything needed for a basic micro front end and is intended to help developers adhere to the best practices laid out in the previous section. The starter uses the Express web application framework and pulls in custom modules acting as Express middleware to enable authentication, session management, and so on.

In support of front-end development, the starter includes packages for:

  • Carbon Design System: The IBM external open source design system for products and digital experiences. With the IBM Design Language as its foundation, the system consists of working code, design tools and resources, human interface guidelines, and a vibrant community of contributors.
  • Cloud Pattern and Asset Library (PAL): Internal to IBM, this library provides service team developers with reusable React components and hooks to build user interfaces on IBM Cloud. The library is built using the Carbon Design System and components from Carbon Components React to create other components and patterns that are consistent across IBM Cloud.
  • webpack: This is used to produce nicely packaged modules of UI resources that can be served to the browser.

The use of these UI libraries is a key element in promoting uniformity and consistency in the UX, while still giving teams freedom to be creative.

The starter highlights the power of the Node.js module system and support of build-time tools. Developers can run the starter app locally in a way that makes it seem like it is part of a development deployment of the full console. With that, we’ve found that developers can take the starter app and quickly become productive.
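The starter's middleware approach, in which shared modules for session handling, authentication, and so on run ahead of a plug-in's own handlers, can be illustrated with a tiny runner in the style of Express's `(req, res, next)` chain. The middleware names here are hypothetical; the real starter wires equivalent modules up through Express itself:

```javascript
// Minimal middleware runner mimicking Express's (req, res, next) chain.
function runChain(middlewares, req, res) {
  let i = 0;
  function next() {
    const mw = middlewares[i++];
    if (mw) mw(req, res, next);
  }
  next();
}

// Shared modules every plug-in pulls in (names are illustrative).
function session(req, res, next) {
  req.session = { userId: 'user-123' }; // looked up in Redis in reality
  next();
}
function auth(req, res, next) {
  if (!req.session.userId) { res.status = 401; return; } // stop the chain
  next();
}

// The plug-in's own handler runs only after the shared middleware.
function handler(req, res) {
  res.status = 200;
  res.body = `Hello ${req.session.userId}`;
}

const req = {};
const res = {};
runChain([session, auth, handler], req, res);
```

The point of this structure is that a plug-in team writes only the final handler; authentication and session management come for free and behave identically across all plug-ins.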

CI/CD pipeline

The importance of a robust CI/CD pipeline was one of the biggest things we underestimated, especially as the number of microservices increased. But, over time, we built a pipeline for the console that we’re proud of – so much so that we now strongly encourage all plug-in teams to use the pipeline rather than build their own. We still support routing proxy requests to externally hosted plug-in deployments, but teams who have integrated with this central pipeline have saved countless hours of development and maintenance.

The pipeline features:

  • Easy onboarding where the plug-in team simply provides a GitHub repo containing their microservice code
  • Automatic builds as code is pushed to GitHub
  • Enablement of rolling deployments through development, test, and production environments
  • Automatic deployment in all console clusters around the world
  • Ability to promote individual microservices on demand
  • Vulnerability scanning of container images
  • Integration of build, unit, and end-to-end testing to help improve quality
  • Red/black deployments giving both on-deck and live environments that allow for quick reverts to the previous code in case something goes wrong

The pipeline now deploys nearly 100 independent microservices per console deployment.

Monitoring and troubleshooting

We also greatly underestimated the importance of monitoring and troubleshooting when microservices were introduced. If you’ve worked on UIs, you know that defects often get routed to the UI team first regardless of where the problem might be in the system. I’ve often referred to the console as the “canary in the mine shaft” for IBM Cloud as a whole. Because it acts as the front end to IBM Cloud, anything not working correctly in any part of IBM Cloud probably impacts some part of the console.

It became a matter of self-preservation to add monitoring so that we could quickly:

  • Determine whether the problem was actually in the console or elsewhere in the cloud
  • Identify which microservices might be implicated
  • Troubleshoot and fix any console problems that were found

In the early days of our microservice migration, we developed a bespoke monitoring system to track status codes and response times for every inbound/outbound request on every microservice with results available in Grafana dashboards. After this system was in place, we were no longer “flying blind” and were often able to identify problems in various parts of IBM Cloud even before the affected teams knew there was an issue.
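The core of that bespoke tracking can be pictured as a wrapper that records a status code and elapsed time for every handled request. This is a simplification: the real system also instrumented outbound requests and shipped results to dashboards rather than an in-memory array.

```javascript
const metrics = [];

// Wrap a request handler so every call records path, status, and duration.
function instrument(path, handler) {
  return function (req, res) {
    const start = process.hrtime.bigint();
    handler(req, res);
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    metrics.push({ path, status: res.status, ms });
  };
}

const handler = instrument('/catalog', (req, res) => { res.status = 200; });
handler({}, {});
```

Aggregating records like these per microservice is what let us spot error-rate and latency anomalies, and attribute them to the right service, before users filed defects.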

The system has evolved, but is still in place today. One of our most recent and important changes was to leverage OpenCensus (now merged with OpenTelemetry). This has enabled us to construct distributed traces in Jaeger to see how requests flow from microservice to microservice in our system.

Improved high availability and resiliency

Moving to a microservice architecture was important for high availability and resiliency. However, that alone did not get us where we wanted to be. There were two other changes made that took our availability to the next level that had nothing to do with microservices or Node.js. I bring this up because even with the most solid of microservice architectures, you still need to think about how you deploy the system to achieve your availability goals.

Adopting Kubernetes

You might recall that our first microservices architecture ran on Cloud Foundry. A critical part of the evolution of the console was replatforming to the IBM Cloud Kubernetes Service. We moved to Kubernetes because it allowed us to achieve greater performance, scalability, reliability, and security than we had on Cloud Foundry.

Each console deployment uses a Kubernetes cluster with 9 worker nodes, each with 16 cores and 64 GB of memory.

Geo-load balancing and failover

The decision that made an even greater impact on our availability was implementing geo-load balancing and failover. This basically means routing a user’s request to the geographically closest console deployment that is also healthy. The odds of all deployments being unhealthy at the same time are much lower than the odds of any one deployment being unhealthy.
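That intuition can be quantified with a back-of-the-envelope model. If each of n deployments is independently unhealthy some small fraction of the time, the chance that all are down at once shrinks exponentially (real outages can be correlated, so treat this as an upper-bound intuition, not a guarantee):

```javascript
// Probability that every one of n independent deployments is down at once.
function allDownProbability(singleDownProbability, n) {
  return Math.pow(singleDownProbability, n);
}

// With 9 regions, each hypothetically unhealthy 1% of the time:
const pAllDown = allDownProbability(0.01, 9); // ≈ 1e-18
```

Even with fairly pessimistic per-region health, the fleet as a whole is effectively always reachable, which is exactly the property the failover design is after.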

The production console is deployed to clusters in 9 different regions all around the world. IBM Cloud Internet Services monitors the health of these deployments through a health check endpoint on each cluster provided by the console team. When a request goes to our single global URL, Cloud Internet Services returns the IP address of the nearest console deployment. If a health check in a particular region shows a problem, then Cloud Internet Services returns the IP address of the next closest healthy region.

IBM Cloud Console: Geo Load Balancing


I’m now in a new role, but it’s amazing to look back and see how far we’ve come since our monolithic beginning. There were challenges moving to a microservices architecture and the evolution since, but the benefits have far outweighed the problems. We’ve seen dramatic benefits in terms of improved resiliency, performance, developer productivity, and ability to incorporate plug-ins from teams across IBM Cloud.

The architecture continues to evolve, and there are things we’d do differently today based on what we’ve learned. But, of all of the decisions we’ve made along the way, we never regretted basing our microservices on Node.js. It was and continues to be the right runtime for the job and has enabled the console to become a true Node.js success story.


Special thanks to all of the amazing architects, developers, UI designers, and offering managers who have worked so hard over the years to make the console what it is today. It’s truly been a collaborative team effort, and I’m thankful I got to be a part of it.

Update 4/6/23: This blog was originally published on the IBM Developer site, and I had posted only an excerpt here. But, IBM Developer decided to archive the original, so now I’m including the full text on my site.


Earlier this month, we launched a revamped IBM Cloud Platform Experience at our new location. Our primary goal was to unify our IaaS and PaaS offerings to better meet your needs. This was a massive undertaking with changes up and down the stack. We’re very excited by the outcome, and we think you will be too.

Prior to this unification, there were two distinct UIs at different URLs – one for IaaS and one for PaaS. This understandably led to a lot of frustration and confusion. We wanted to fix this by creating a seamless experience with one UI at a single URL to manage everything. With that in mind, the user experience of our console was overhauled to bring new and/or improved capabilities in the following areas:

  • dashboard (see screenshot below)
  • resource management
  • search and tagging
  • catalog/purchasing
  • account/user management
  • billing
  • support
  • overall performance
  • and more!

  IBM Cloud Console Dashboard

Rollout of

With such massive changes, we faced a number of interesting questions about how to best roll out the new experience. I will summarize our thought process here, but for more details about our strategy, see A/B Test Deployment — Rollout of, written by me and my colleague Nic Sauriol.

We were very sensitive to the fact that the holiday season is a key time of year for many of our clients, and we did not want to do anything to disrupt their operations. On the other hand, we believe the new experience offers improvements that our clients could benefit from immediately. Ultimately, we decided to go with an A/B rollout strategy with two consoles running side by side. The diagram below shows a very high-level architecture. Both consoles are deployed separately, but use exactly the same set of IBM Cloud Platform APIs on the back end.

Rollout Architecture for

Users in the A group see the existing experience at the old URL, while users in the B group get the enhanced console at the new one. However, the choice of console is entirely up to each individual user. At some point in the new year, we will retire the old experience and force a transition. But, until then, users are free to switch back and forth between both experiences.
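One simple way to honor a per-user choice like this is a preference consulted at the routing layer. The sketch below shows the general mechanism with a hypothetical cookie name; it is not our exact implementation:

```javascript
// Decide which console experience a request should get.
// New users default to the existing (A) experience; an explicit opt-in
// cookie switches them to the enhanced (B) experience.
function chooseExperience(cookies) {
  return cookies['console-experience'] === 'B' ? 'B' : 'A';
}
```

Keeping the decision per-request and per-user is what makes switching back and forth cheap for users during the A/B period.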

From a technical perspective, we use the IBM Cloud Kubernetes Service to host the microservices that make up the console. By using this service, it’s very simple to spin up Kubernetes clusters to host both experiences. We can easily make use of the same build pipeline, Helm Charts, and so on, but with different code branches for the new experience.

(NOTE: Something not captured in the high-level architecture is that both consoles are actually made up of many geographically load balanced clusters placed all over the world. This is used to provide for high availability and failover. The concepts behind this are described in my previous blog Global IBM Cloud Console Architecture.)

Other Related Content

Aside from the A/B rollout blog summarized above, many teammates of mine produced an extensive set of blogs with detailed information about all of the changes found in the unified experience. I highly recommend you check these out!


I sincerely hope you enjoy the new experience, and would love to get your feedback (both good and bad) in order to learn and better meet your needs. So, please reach out and let me know how it goes!


In a blog post a few months ago, I announced that my team had released the new “global” IBM Cloud Console (formerly Bluemix Console), allowing all public regions of the IBM Cloud platform to be managed from a single location. This took us from four addresses (one for each of the four public IBM Cloud regions at the time) to a single geo load-balanced address. Users would now always get the UI served from the geographically closest (and healthy) deployment, resulting in improved performance and enabling failover to greatly increase availability. In this post, I’ll dig a bit deeper so that you can gain insight into what it would take for you to build similar solutions with your own IBM Cloud apps. In particular, I’ll discuss:

  • High-level features of our architecture and the enabling third-party products we used
  • Things to think about while building a health check to determine when failover should occur
  • Considerations for coding apps so they are enabled to run in multiple data centers
  • Using the architecture to smooth the transition to new technology

Hybrid Architecture With Akamai and Dyn

The IBM Cloud Console has long been fronted by two offerings from Akamai:

  • Akamai Kona for web application firewall (WAF) and distributed denial of service (DDoS) protection.
  • Akamai Ion for serving static resources via a content delivery network (CDN), finding optimal network routes, reusing SSL connections, etc.

With the implementation of the global console, we added the Dyn Traffic Director (TD) product to the mix. All requests are sent to the Akamai network, and then Akamai acts as a proxy. Akamai does a DNS lookup to determine the IP address to forward the request to by using a host name that has been configured using a Dyn TD. The Dyn configuration is set up to spread traffic based on geolocation to one of six IBM Cloud data centers. This is shown in the diagram below.

Global Console High-level Architecture

Since Akamai also offers its Global Traffic Management (GTM) product for load balancing, it may seem a bit strange to be using this hybrid solution. But, we had an existing contract with Dyn, and we simply decided to leverage that instead of adding GTM to our Akamai contract. This has worked quite well for us.

Importance of Health Check

Every 60 seconds or so, Dyn checks the health of each console deployment by probing a health check API written by my team. If the health check cannot be reached or if it responds with a 4xx or 5xx error, then Dyn marks the associated data center as unhealthy and takes it out of the rotation. This is what is meant by a “failover.” At this point, requests that would go to the unhealthy deployment are instead sent to the next closest deployment. In this way, the user never knows there was an issue and continues working as if nothing has happened. Eventually, when the health of the failed deployment recovers, Dyn will put it back in the rotation and route traffic to it.

In the diagram below, the Tokyo deployment of the console is unhealthy, and traffic that would normally go there starts flowing to Sydney.

Global Console High-level Architecture With Failed Data Center
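The routing behavior amounts to "pick the geographically closest deployment that is still healthy." A simplified selection function, with health flags and distances invented to mirror the Tokyo/Sydney scenario above:

```javascript
// Pick the closest deployment to the user that is currently healthy.
function pickDeployment(deployments) {
  const healthy = deployments.filter((d) => d.healthy);
  if (healthy.length === 0) return null; // total outage: nothing to route to
  return healthy.reduce((best, d) => (d.distanceKm < best.distanceKm ? d : best));
}

// Tokyo is nearest but unhealthy, so traffic flows to Sydney instead.
const deployments = [
  { region: 'tokyo', distanceKm: 100, healthy: false },
  { region: 'sydney', distanceKm: 7800, healthy: true },
  { region: 'dallas', distanceKm: 10300, healthy: true },
];
const chosen = pickDeployment(deployments);
```

In our setup this logic lives inside Dyn's traffic steering rather than our own code, but the effect on a user's request is the same.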

Clearly, the algorithm used by the health check plays a very important role in the overall success of the architecture. So, when building your own health checks, you should think carefully about the key components that influence the health of your deployments. For example, Redis is one of our absolutely critical components because we use it to store session state. Without Redis, we cannot maintain things like a user’s auth token. So, if one of our deployments can no longer connect to its local Redis, then we need to failover.

On the flip side, there may be other dependencies that are not nearly as critical. For example, if our Dallas console deployment cannot connect to the API for Cloud Foundry (CF) in Dallas, the majority of the console functionality will continue to work. Other console deployments probably can’t connect to the API either, so there is probably not much point in failing over.

Finally, the health check can be very helpful for making proactive failovers easy. For example, we made our health check configurable so we can force it to return an error code. We have made use of this on several occasions such as when we knew reboots were required while patches for Meltdown and Spectre were being deployed in SoftLayer. We elected to take console deployments in those data centers out of the rotation until we knew those data centers (and our deployments within them) were back online.
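Putting those ideas together, the decision logic of such a health check can be sketched as below. This follows the examples from the text: Redis is critical, a regional Cloud Foundry API outage is not, and an operator can force a failover. The function and flag names are invented for the sketch:

```javascript
// Return the HTTP status the health check endpoint should respond with.
function healthStatus({ redisOk, cfApiOk, forceFail }) {
  if (forceFail) return 503; // manual override for proactive failovers
  if (!redisOk) return 503;  // no Redis -> no sessions -> must fail over
  // A regional CF API outage is not worth failing over for: other regions
  // likely can't reach that API either, and most console function still works.
  return 200;
}
```

The key design choice is that only dependencies whose loss a *different* region could actually compensate for should flip the check to unhealthy.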

Impact to Microservice Implementation

As described in a previous post, each console deployment contains a set of roughly 40 microservices running behind a reverse proxy. In our original implementation, our microservices tended to be tied to APIs in the region they were deployed to. For example, our Dallas deployment could only manage CF resources in Dallas, our London deployment only CF resources in London, and so on. This is illustrated in the pre-Dyn diagram below where microservices in the three data centers only talk to the “backend” within the same region.

Microservices Tied to One Backend

This worked fine for us when we had a separate URL for each console deployment and users knew they had to go to the London console URL to manage their London resources. However, this architecture was not conducive to the goals of global console where we wanted the UI to be served from the geographically nearest data center and for it to continue to be accessible even if all but one deployment failed. In order to accomplish this, we needed to decouple the microservices from any one specific region and enable them to communicate with equivalent APIs in any of the other regions based on what the user was requesting. This is shown in the diagram below.

Microservices Communicate With Different Backends
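Concretely, this decoupling amounts to resolving the backend endpoint from the region the user is working with rather than from where the microservice happens to run. A minimal sketch (the region keys and hostnames below are made up for illustration):

```javascript
// Illustrative region-to-endpoint map. Instead of one hard-wired local API,
// each microservice looks up the endpoint for whichever region the user
// is managing resources in.
const CF_API_ENDPOINTS = {
  'us-south': 'https://api.us-south.cf.example.com',
  'eu-gb': 'https://api.eu-gb.cf.example.com',
  'au-syd': 'https://api.au-syd.cf.example.com',
};

function cfApiFor(region) {
  const endpoint = CF_API_ENDPOINTS[region];
  if (!endpoint) {
    throw new Error(`No CF API endpoint configured for region: ${region}`);
  }
  return endpoint;
}

// Any console deployment can now serve a request for London resources:
cfApiFor('eu-gb'); // → 'https://api.eu-gb.cf.example.com'
```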

Of course, an astute reader might point out that we'd be even better off if all of the backend APIs provided their own globally load-balanced endpoints. A console microservice could then always point at the same host names no matter where it was deployed. And, indeed, many APIs in the IBM Cloud ecosystem are moving in that direction.

Smoothing Migration from Cloud Foundry to Kubernetes

This architectural update has been great for us in many ways, giving us much more flexibility in determining where to deploy the console throughout the world. It has also had the added benefit of making it easy for us to roll out deployments running on different technologies without end users ever knowing.

Historically, the console has run on Cloud Foundry on the IBM Cloud, but we are nearly done with a migration to Kubernetes (managed by the IBM Cloud Container Service). We have been able to add Kubernetes deployments into the rotation simply by updating our Dyn configuration. This has allowed us to fully vet Kubernetes before turning off our CF deployments entirely. This is represented in the diagram below, which shows Dyn load balancing between two CF deployments and three Kubernetes deployments.

Load Balancing Between CF and Kubernetes Deployments


We’re excited by the improvements in performance and reliability we’ve been able to provide our customers with the global console. I hope some of the lessons and insights that my team has gained in the process will help your efforts as well.

Related Resources

For additional material on this subject, please see Configure and run a multiregion Bluemix application with IBM Cloudant and Dyn by my colleague Lee Surprenant.


In June, I had the honor of attending the Cloud Foundry Summit Silicon Valley 2017 conference in Santa Clara, CA. My two submissions related to Bluemix UI architecture were selected, and I got the chance to present them as part of the conference’s Cloud Native Node.js track. In this post, I’ll briefly describe my talks as well as share some general takeaways from the conference.

Topic 1: Microservices Architecture of the Bluemix UI

The full title of my first topic was To Kill a Monolith: Slaying the Demons of a Monolith with Node.js Microservices on Cloud Foundry. The intent of the talk was to trace my team's journey migrating the Bluemix UI from a monolithic app to a microservices architecture.

The Bluemix UI (which runs on Cloud Foundry) is the front end to Bluemix, IBM's open cloud hosting platform. The original implementation as a single-page, monolithic Java web app brought with it many demons, such as poor performance, lack of scalability, inability to push small updates, and difficulty for other teams to contribute code. Over the last two years, the team has been on a mission to slay these demons by embracing cloud-native principles and splitting the monolith into smaller Node.js microservices. The effort to migrate to a more modern and scalable architecture has paid large dividends, but it has also left behind a few battle scars from wrestling with the added complexity that cloud native can bring. The team had to tackle problems in a wide variety of areas, including large-scale deployments, continuous integration, monitoring, problem determination, high availability, and security.

In the talk, I went on to discuss the advantages of microservice architectures, ways that Node.js has increased developer productivity, approaches to phasing microservices into a live product, and real-life lessons learned in the deployment and management of Node.js microservices across multiple Cloud Foundry environments.

If you’d like to see the full presentation, check out the slide deck below:

Or, if you prefer video, you can watch the talk on YouTube:

Topic 2: Monitoring Node.js Microservices

My second topic was called Monitoring Node.js Microservices on Cloud Foundry with Open Source Tools and a Shoestring Budget. During the migration described in my first talk, we learned that while microservice architectures offer lots of great benefits, there's also a downside. Perhaps most notably, there is an increased complexity in monitoring the overall reliability and performance of the system. In addition, when problems are identified, finding a root cause can be a challenge. To ease these pains in managing the Bluemix UI, we built a lightweight system using Node.js and other open source tools to capture key metrics for all microservices (such as memory usage, CPU usage, and the speed and response codes of all inbound and outbound requests).

In this approach, each microservice publishes lightweight messages (using MQTT) for all measurable events while a separate monitoring microservice subscribes to these messages. When the monitoring microservice receives a message, it stores the data in a time series DB (InfluxDB) and sends notifications if thresholds are violated. Once the data is stored, it can be visualized in Grafana to identify trends and bottlenecks.
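To make the publish side concrete, here is a rough sketch of how a microservice might shape one of these metric messages. The topic layout and payload fields are my illustrative assumptions, not the team's actual schema; in practice the message would be handed to an MQTT client library for delivery to the broker:

```javascript
// Hypothetical metric message builder. Each measurable event (an outbound
// request completing, for example) becomes a small JSON payload published
// on a per-service, per-event-type topic that the monitoring microservice
// subscribes to.
function buildMetricMessage(service, event) {
  return {
    topic: `metrics/${service}/${event.type}`,
    payload: JSON.stringify({
      service,
      type: event.type,          // e.g. 'outbound-request'
      durationMs: event.durationMs,
      statusCode: event.statusCode,
    }),
  };
}

const msg = buildMetricMessage('catalog-ui', {
  type: 'outbound-request',
  durationMs: 87,
  statusCode: 200,
});
// msg.topic === 'metrics/catalog-ui/outbound-request'
```

Keeping the messages this small is what makes the approach cheap: publishing is fire-and-forget, so instrumenting a microservice adds almost no overhead to the request path.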

In the presentation, I described the details of the Node.js implementation, real-world examples of how this system has been used to keep the Bluemix UI running smoothly without spending a lot of money, and how the system has acted as a “canary in the mine shaft” to find problems in non-UI subsystems before the relevant teams even knew there was an issue!

The slide deck for the presentation is available below:

And, you can also watch it on YouTube:

Takeaways from the Conference

This was my second trip to CF Summit, and both were great experiences. On my first trip in 2015, I gave a talk with Brian Martin when my team was just getting started on our journey to microservices. Back then, I was a little naive about what we were getting into, but this time around I was far more battle-hardened and had more in-depth knowledge and experience to share.

One thing I noticed this time is that more people came up to me afterward with questions specific to their own journeys re-architecting monoliths. This tells me that a lot of organizations are struggling with what to do with their legacy code bases and are hungry for guidance. Of course, post-talk questions are far from scientific evidence, but I found it interesting nonetheless.

One thing I emphasized to these folks was not to underestimate the need for robust monitoring as they build out their own microservices. As I explained in far more detail in my second talk, I think this was the biggest mistake we made when we started the Bluemix UI migration.

Oh, yeah… Wally World

I stayed at the Hilton across the street from the Santa Clara Convention Center where the conference was held. From my room, I had a great view of Levi’s Stadium and California’s Great America.

Every morning I’d look out from my window and see the vast parking lots for both facilities sitting empty:

Empty Parking at Great America

And, each day I kept hoping the Griswolds would come driving up in their family truckster and find the park closed, just like Wally World was in 1983. 🙂

We have been listening to your feedback on the Bluemix UI and have used it to design a brand-new user experience (UX) that we believe will streamline your workflows. The new experience is now live for your immediate use. When you visit the Bluemix UI, you can opt in to the new experience via a “Try the new Bluemix” link in the header bar:

Try New Bluemix Screenshot

In this blog, we’ll walk you through the new taxonomy organizing your resources, the redesigned catalog, the updated flows for creating new compute resources, the reorganized app details page, and more!

All Category Cards Screenshot

Original blog post co-authored with Amod Bhise.