Harness R&D Update — Q3 ‘22

Sri Ramalingam
Nov 7, 2021

First of all, congratulations to the entire Harness team for a fantastic Q3 in every dimension. Here are quick highlights on product, engineering and operations.

Product

Thanks to the entire product, design and engineering teams, we now have five modules fully available on our next-generation platform. Our CD 2.0 module, with select K8S use cases, is now available in production. We have added about 90 new features and enhancements across Cloud Cost, CI Enterprise, Feature Flags, Drone, Test Intelligence and CD 1.0.

Our current portfolio

Cloud Operations

Our production SLAs have improved significantly while our deployment frequency went up by 20%. We introduced new platform features in Q3 to contain the blast radius of occasional production incidents, which resulted in five-nines availability for most of our customers in our two clusters.

Q3 SLAs

Harness production consists of about 40 microservices running in 3 isolated clusters. We compute our Change Failure Rate (CFR) not just from production failures affecting x% of our customers: every single regression or P1 issue is counted as a failure in production. Our CFR was slightly up from the previous quarter, but both recovery time and lead time improved by about 15%.

Deployment Frequency, Change Failure Rate, Recovery Time and Lead time to production
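
To make the stricter CFR definition concrete, here is a minimal Python sketch. It assumes the rate is failed deployments divided by total deployments (the common DORA-style definition), and that a deployment counts as failed if any regression or P1 issue is traced back to it. The `Deployment` record and its field names are hypothetical, used only for illustration; this is not our internal tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    service: str
    regressions: int = 0  # regressions traced back to this deployment
    p1_issues: int = 0    # P1 issues traced back to this deployment


def is_failure(deployment: Deployment) -> bool:
    # Any regression or P1 issue marks the deployment as failed,
    # no matter how many customers were affected.
    return deployment.regressions > 0 or deployment.p1_issues > 0


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that count as failures (0.0 if there are none)."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if is_failure(d))
    return failed / len(deployments)


# Made-up sample data: two of four deployments count as failures.
sample = [
    Deployment("service-a"),
    Deployment("service-b", regressions=1),
    Deployment("service-c", p1_issues=2),
    Deployment("service-d"),
]
print(change_failure_rate(sample))  # 0.5
```

Counting every regression this way pushes the reported rate higher than a customer-impact threshold would, which is the point: it keeps small failures visible.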

Running a complex SaaS platform in production with this kind of availability is no small feat. Massive congratulations to the entire platform and SRE teams for staying up late and working through weekends to look into every single production alert.

Quality

Our focus on delivering happiness to customers starts with tracking every single regression issue and treating it as a production showstopper. It extends to keeping a healthy balance between reducing the bug backlog, improving test automation, making sure our test cases reflect how our customers use our products, and continuing to invest in addressing technical debt. The total number of regressions was fewer than 13 across all the modules, including our core platform. CFDs (Customer Found Defects, a key quality metric we track every single day) averaged about 3–4 per week in Q3, which is well below our forecasted numbers. Congratulations to all of our CD engineering teams for their dedication and for focusing on and improving some of the areas where our automation was weak.

CFDs in CD

Developer Effectiveness

Developer effectiveness is key to executing at scale and speed. We have established a council of lead developers from different teams to track and measure the problem areas and inefficiencies in our internal tools and our build and test environments. While we need to do a lot more in the next two quarters to improve significantly in this area, our teams have made good progress. We have moved our back-end Java builds to Bazel to take advantage of its remote caching. With that, our back-end build times have come down to about 25 minutes. With 19,000+ unit tests that run on every pull request, Test Intelligence has become a vital platform: it picks the fraction of tests that need to run and brings down overall PR times and PR cost by 30%. We still have a list of larger issues to address (flaky tests, local dev pain points, IntelliJ with Bazel, functional tests), but we are on the right path.
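
The idea of running only a fraction of the tests can be illustrated with a small Python sketch. This is a simplified illustration of change-based test selection in general, not the actual Test Intelligence implementation: it assumes we already have a map from each test to the source files it depends on, and the test and file names below are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def select_tests(
    changed_files: Iterable[str],
    test_dependencies: Dict[str, Set[str]],
) -> List[str]:
    """Return, in sorted order, the tests that depend on any changed file."""
    changed = set(changed_files)
    return sorted(
        test
        for test, deps in test_dependencies.items()
        if deps & changed  # non-empty intersection: the test is affected
    )


# Hypothetical dependency map; a real system would derive this from
# build metadata or recorded test executions.
TEST_DEPENDENCIES = {
    "PipelineServiceTest": {"PipelineService.java", "PipelineRepo.java"},
    "DelegateTaskTest": {"DelegateTask.java"},
    "FeatureFlagTest": {"FeatureFlag.java", "PipelineRepo.java"},
}

selected = select_tests(["PipelineRepo.java"], TEST_DEPENDENCIES)
print(selected)  # ['FeatureFlagTest', 'PipelineServiceTest']
```

In this toy example a change to one file triggers two of the three tests. With thousands of unit tests, the same principle is what shortens PR runs.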

Cloud Cost

Thanks to our own Cloud Cost Platform, we closely track our non-production costs across all of our environments, as well as our spend on GCP services. We have enabled auto-stopping rules (part of our Cloud Cost Platform) in our internal environments and we are seeing significant cost savings. Huge congratulations to the entire cloud cost team for adding new features in Q3, including auto stopping for GKE.

Cost savings from auto stopping in internal environments

Our overall non-production cost can still be optimized, but getting real-time visibility with our CCM platform is huge for our leadership teams.

Cost visibility of our non-prod environments

UI Council

We have a UI council run by select lead engineers and architects, with the charter of establishing clear UI standards and frameworks, setting coding guidelines and mentoring other UI engineers across the teams. It is one of the well-run, successful engineering councils here at Harness, all the more so given that the council members are also working on other deliverables.

People

Thanks to our recruiting and HR teams (and our employees for referrals), we added more engineers, designers, engineering leaders, tech writers and product managers to our teams in Q3, and we ended the quarter with 220 people across North America, Europe and India. Delivering happiness to our own employees is key to making our customers successful. We run internal engagement surveys within R&D once every 100 days, and we continue to get feedback from our own teams on what more we can do to make their experience at Harness joyful and productive. I have a solemn obligation to make this happen, to keep listening to their feedback and to make Harness engineering a great place to learn and work.

Learning

One of the key objectives of running an engineering QBR is to understand what did not go well and what we do about it. There are a few areas we will continue to focus on to improve. Startups are about executing well at scale and speed while balancing and managing priorities across new products, new features in existing products, technical debt, customer requests, internal tooling, development process and developer effectiveness, with an inspired team who see a larger purpose and meaning in what they do every day.

And To Q4

We are working on new modules which will be launched in beta by the end of Q4. Stay tuned. We continue to expand our engineering, product and UX teams across North America (all of the US and Canada), Ireland, Belgrade, India and Moldova. Reach out to us if you want to be part of a talented team, work on the latest tech stacks across multiple products and/or on the platform, build a career and be part of an exciting journey at a well-respected and valued startup.

image credit: fineartamerica


Sri Ramalingam

Sri is currently SVP of Engineering at the fast-growing software delivery startup Harness.io. He writes about engineering leadership and strategy.