One of the core tenets of Bond engineering is focusing on building products that continue to add value to our customers and differentiate Bond in the market.
The Brand Experience product team at Bond was tasked with creating a brand-focused web application that gives our customers an easy-to-use dashboard and a suite of tools that consolidate everything they need to manage their program with Bond, all in one place. While we’re proud of the new product that we launched, which is called Bond Portal, this blog post focuses on the technological improvements that were made under the hood.
The first major decision we needed to make was whether to build on a web application we already had or to start fresh with a new application. At Bond, we debate quickly, then commit; being a fully remote team, Bond has embraced a culture of communication through writing. We start engineering projects by writing an RFC (Request for Comment) that details why we should do a given task, how we anticipate doing it, and how we’ll measure success. These RFCs are shared broadly for open feedback and comments across our engineering team. Ultimately, we ended up creating a new application in order to give us a stronger architectural foundation and improve developer productivity.
Bond’s number one goal was to improve the internal developer experience. The team generally had a very difficult time iterating on the existing codebase. We measured improvement along a few dimensions: do engineers enjoy working with the tech stack, do we feel it can attract top talent, and how is our general engineering velocity?
Iterating on the application and making any change in the code previously took too long because of the way the application code was structured, the quality of the code written, and unclear ownership. We've learned from our mistakes, and here's how and where we've corrected course to improve our DevEx.
Standardize code structure
Bond uses React as our frontend framework. React allows engineers to structure their applications in a myriad of ways, unlike something like Ember or Angular, which has a very clear front-end architecture it expects you to abide by. Because React projects are relatively loose, we had contractors and Bond engineers simply pushing code without much consideration for the bigger architecture.
Software engineering is already complex enough, but if your dependencies have no defined boundaries and can be imported from anywhere, the intricacies will grow rapidly over time. Our original and now-defunct project, Bond OS, had React contexts that lived everywhere, with some contexts nested in other contexts that were also in random places. This made it extremely difficult to track down where state lived, and it is a great example of how something that seemed unrelated could cause multiple issues.
The easiest way to solve issues caused by poor project structure is to not allow them in the first place, but hindsight is 20/20. This is where opinionated frameworks like Next.js come in handy; they tell you how to build the application. Bond selected Next.js as the framework for our new frontend, as it’s the fastest growing React framework with almost three million weekly downloads.
Next.js gives us an opinionated way to structure and build our application with its page model. We don’t have to mess around with React Router much, and it has made it extremely easy to triage issues and find the team responsible. We have product verticals at Bond, so if a page is having issues, they will not leak to other pages in the application and we can map the page 1:1 to the product vertical that owns it.
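As a rough illustration of the 1:1 mapping between pages and product verticals, the sketch below looks up an owning team from the first segment of a route. The route segments and team names are hypothetical and are not taken from Bond's codebase; the only assumption is that each top-level page directory belongs to exactly one vertical.

```typescript
// Hypothetical mapping from a top-level route segment (one page directory)
// to the product vertical that owns it. Names are illustrative only.
const ROUTE_OWNERS: ReadonlyArray<[string, string]> = [
  ["cards", "cards-team"],
  ["accounts", "accounts-team"],
  ["api-logs", "developer-tools-team"],
];

// Returns the owning team for a route such as "/cards/123/details",
// or undefined when the route has no registered owner.
function ownerForRoute(route: string): string | undefined {
  const firstSegment = route.split("/").filter((s) => s.length > 0)[0];
  if (firstSegment === undefined) {
    return undefined;
  }
  for (const [segment, owner] of ROUTE_OWNERS) {
    if (segment === firstSegment) {
      return owner;
    }
  }
  return undefined;
}
```

With a lookup like this, an error reported on any URL under a page directory can be routed to a single team during triage.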
Colocate related code
Bond has also restructured our application to improve the colocation of our code. Previously, we had to search up to 22 components deep, and sometimes even laterally, to debug an issue or write a feature. By colocating each file with the nearest child of its dependents, we get a top-down structure where a dependency can only be pulled from a parent or from the same level.
Consider a loose example of our directory with three folders: module_a, module_b, and shared. If we deleted or modified module_a, we would be assured that the shared components and module_b would be unaffected. The same applies if we deleted module_b; module_a’s dependencies would be unaffected as well. The only code that could potentially affect both modules lives in shared. Whenever a line of code is altered in a global-like directory such as this one, we can use our CODEOWNERS settings to alert multiple teams and require their approval, which decreases the chance of a bug with a wide blast radius.
Despite our efforts to solve the issues being reported in Bond OS with SonarQube code quality checks, coverage thresholds, and Datadog, Sentry, and PagerDuty alerts, it was clear that the codebase was still struggling.
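The dependency rule for module_a, module_b, and shared can be expressed as a small check. This is a minimal sketch under stated assumptions: the `src/` root, the file names, and the helper functions are hypothetical, and the rule is simplified to top-level directories only (a module may import from itself or from shared, and shared may only import from shared).

```typescript
// Returns the top-level directory under "src/", for example
// "module_a" for "src/module_a/components/Form.tsx".
// Assumes paths use "/" separators and contain a "src" segment.
function topLevelDir(filePath: string): string {
  const segments = filePath.split("/");
  const srcIndex = segments.indexOf("src");
  return segments[srcIndex + 1] ?? "";
}

// A module may depend on its own files or on shared code.
// Shared code may not depend on any module, so deleting a module
// can never break shared or a sibling module.
function isImportAllowed(importerPath: string, importedPath: string): boolean {
  const from = topLevelDir(importerPath);
  const to = topLevelDir(importedPath);
  if (from === "shared") {
    return to === "shared";
  }
  return to === from || to === "shared";
}
```

A check of this kind could run in a linter or a CI step so that a lateral import between modules fails before review rather than after.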
If Bond wished to thrive as a business and recruit strong engineers, we needed to change our practices.
Tests that make sense
The only thing that matters in testing is whether your tests are meaningful. It’s more useful to write tests that resemble how a real user or process would behave when using your systems.
For example, much of the 75%+ coverage that we had came from tests that simply checked whether a component had rendered. These are not meaningful tests and will rarely prevent an issue from being shipped.
Bond migrated away from using and paying for SonarQube as it did not, at the time, offer us any value that we couldn’t get from Jest, which is free.
We currently use MSW and Cypress to provide much of the value in our UI testing; Cypress acts as a real user that interacts with the browser. Bond has integration tests that navigate a real deployed website with user interactions such as clicking, mouse movement, and keyboard typing. These have offered a more realistic testing scenario and much less mocking. The less we mock, the more we can assert that our tests mimic real user scenarios.
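To make the difference between a render-only check and a behavior-driven check concrete, here is a small sketch. It deliberately avoids Cypress, MSW, and React so that it stays self-contained: the form is modeled as plain functions, and the form, its fields, and its validation rule are all hypothetical.

```typescript
// A hypothetical transfer form, modeled as plain state plus user actions.
interface TransferFormState {
  amount: string;
  error: string | null;
  submitted: boolean;
}

function createTransferForm(): TransferFormState {
  return { amount: "", error: null, submitted: false };
}

// Simulates the user typing into the amount field.
function typeAmount(state: TransferFormState, text: string): TransferFormState {
  return { ...state, amount: text };
}

// Simulates the user clicking submit; rejects empty, non-numeric,
// and non-positive amounts.
function clickSubmit(state: TransferFormState): TransferFormState {
  const value = Number(state.amount);
  if (state.amount.trim() === "" || !isFinite(value) || value <= 0) {
    return { ...state, error: "Enter an amount greater than zero", submitted: false };
  }
  return { ...state, error: null, submitted: true };
}

// Render-only check: passes as long as the form exists, even if
// clickSubmit were broken. It adds coverage but catches very little.
const renderOnlyPasses = createTransferForm() !== undefined;

// Behavior checks: follow the steps a real user would take and
// assert on what the user would see afterwards.
const afterBadInput = clickSubmit(typeAmount(createTransferForm(), "-5"));
const afterGoodInput = clickSubmit(typeAmount(createTransferForm(), "25.50"));
```

The render-only check would keep passing if the validation in `clickSubmit` were removed, while the behavior checks on `afterBadInput` and `afterGoodInput` would fail, which is the kind of regression a meaningful test is meant to catch.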
Don’t page me in the middle of the night
Bond had a lot of Datadog monitors that would trigger multiple times a day. There’s a familiar and almost communal feeling to teams seeing the bombardment of messages from monitors across our Slack channels. However, what made this difficult was that they were all attached to PagerDuty alerts that would call and text you, which is especially hard to accept when companies preach about work-life balance.
Bond’s number one rule for a PagerDuty alert is, “Can I do something about this alert and is it something that requires manual intervention?” Having too many alerts that are not actionable is almost the same as having no alerts at all. People will end up silencing their phones when they sleep, ignoring pages, and avoiding the Slack channels.
In Bond’s new Bond Portal, we made the conscious decision to get rid of all PagerDuty alerts that the person being paged could not immediately act on. Every alert is backed by a monitor that we created for a specific purpose, not something generic like, “Page us when we cross a five error threshold in five minutes.”
This approach provides defined ownership within the application; we know exactly what we need to look out for and the steps to mitigate it. All alerts are accompanied by runbooks and would be classified as SEV-2 or greater. We’re also constantly refining monitors that might be too sensitive or that page when we don’t think they should.
Bond has not had an incident of SEV-2 or greater since we cleaned up our monitoring for this application.
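The paging rule described above can be sketched as a single predicate. The alert shape and field names here are hypothetical, and the sketch assumes that a lower severity number means a more severe incident, so “SEV-2 or greater” is read as severity 1 or 2.

```typescript
// Hypothetical description of a monitor-backed alert.
interface MonitorAlert {
  name: string;
  severity: number; // 1 = SEV-1 (most severe); assumption, see lead-in
  actionable: boolean; // can the on-call engineer do something about it?
  needsManualIntervention: boolean;
  runbookUrl?: string; // every paging alert is expected to have a runbook
}

// An alert pages someone only if it is actionable, needs a human,
// has a runbook, and is SEV-2 or more severe. Everything else should
// go to a dashboard or a chat channel instead of a phone.
function shouldPage(alert: MonitorAlert): boolean {
  const hasRunbook = alert.runbookUrl !== undefined && alert.runbookUrl.length > 0;
  const severeEnough = alert.severity <= 2;
  return alert.actionable && alert.needsManualIntervention && hasRunbook && severeEnough;
}
```

Under this rule, a generic error-count threshold with no runbook would never page, no matter how often it fires.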
Bond made this application Continuously Deployed by default. Once a pull request is approved and merged to main, we no longer need to manually sign off on N stages to ship to production; it happens automatically (pending passing builds and tests). This gives Bond a serious competitive advantage because we ship code to production multiple times a day, which, in turn, means we’re continuously improving our application for our consumers. Our competitors might only be shipping to production every few weeks or, worse, every few quarters. Based on CircleCI’s 2022 State of Software Delivery report, Bond is doing pretty well!
Of course, continuous deployment is a privilege and not a right. To decrease the chance that buggy code is shipped to production, Bond abides by a series of continuous integration steps. We make sure that, at the PR step, the application can build, linters and unit tests pass, and the integration tests pass. Once a code reviewer approves the PR, we merge to our main branch, where our application is deployed on non-public resources for smoke tests to run again. Once those have passed, the application is deployed to the public internet, where we have a series of monitors in Datadog that offer observability into our application and regular synthetic tests that fire to make sure that parts of our application are functioning properly.
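The gates described above can be modeled as an ordered list in which the first failure stops the release. This is only a sketch of the idea, not Bond's actual pipeline configuration: the stage identifiers are hypothetical labels for the steps named in the text.

```typescript
// Ordered gates that a change must pass before reaching the public internet.
// The names are illustrative labels for the steps described in the text.
const RELEASE_STAGES = [
  "build",
  "lint",
  "unit-tests",
  "integration-tests",
  "review-approved",
  "internal-deploy",
  "smoke-tests",
] as const;

type ReleaseStage = typeof RELEASE_STAGES[number];

// Returns the first gate that failed, or null when every gate passed.
function firstFailingStage(results: Record<ReleaseStage, boolean>): ReleaseStage | null {
  for (const stage of RELEASE_STAGES) {
    if (!results[stage]) {
      return stage;
    }
  }
  return null;
}

// A change is deployed publicly only when no gate has failed.
function canDeployToPublic(results: Record<ReleaseStage, boolean>): boolean {
  return firstFailingStage(results) === null;
}
```

Because the gates are ordered, a failing unit test is reported before any later stage is considered, which mirrors how a failed PR check blocks the merge and everything after it.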
At Bond, we aim to make development easy, as easy development makes for better products and faster iteration cycles. This has paid dividends for us: we have now built API logs and Bond Instant, with customer search coming soon, each in a matter of weeks rather than months. These features allow customers to derive more value from our product, gain insights into what the market wants, and make differentiated products in the marketplace.