A Seasoned Tester's Crystal Ball: February 2024

Tuesday, February 27, 2024

Contemporary Bug Advocacy

A few weeks back in Mastodon, Bret Pettichord dusted up a conversation about something we talked about a lot in testing field years ago, Bug Advocacy. Bug advocacy was something Cem Kaner discussed a lot, and a word that I was struggling with translating to Finnish. It is a brilliant concept in English but does not translate. Just not.

Bug advocacy is this idea that there is work we must do to get the results of testing wrapped in their most useful package. A lot of the great stuff on BBST Bug Advocacy course leads one to think it is a bug reporting course, but no, it is a bug research course. A brilliant one at that. Bug advocacy is the idea that just saying it did not work for you does little. It actually has more of a negative impact. Do your research. Report the easiest route to the bug. Include the necessary logs. Make the case for someone wanting to invest time in reading the bug report, even under constraints of time and stress.

Bug advocacy was foundational for me as a learning tester 25 years ago. It was essential at time of publishing Lessons Learned in Software Testing. It was essential when Cem Kaner created the BBST training materials. And it is sometimes like a lost art these days. In other words, it is something people like me learned and practices so long, that we don't remember to teach it forward.

At the same time, I find myself broadcasting public notes in Mastodon and LinkedIn on theme that I would call Contemporary Bug Advocacy - an essential part of Contemporary Exploratory Testing. Like the quote from Kaner's and Pettichord's book says, we kind of need to do better than fridge lights that do all their work while no one is using the results.

Inspired by the last two weeks of testing with my team, I collected a listing of tactics I have employed in contemporary bug advocacy.

Drive by Data. I "reported" a bug by creating data in a shared test environment that made a bug visible. The bug vanished in a day. No report, no conversation. Totally intentional.
Power of Crowd. I organized an ensemble testing session. Bugs we all experienced vanished by end of the day. I have used this technique in the past to get complete API redesign needs by smart use of the right crowd.
Pull request with a fix. I fixed a bug, and sent the fix for review for the developer. Unsurprisingly, a fix can be more welcome than a task to fix something.
Silent fix. I just fix it so that we don't have to talk about it. People notice changes with their routines of looking at what is going on in the code.
Pairing on fix. I refused to report, and asked to pair on the fix for me to learn. Well, for me to teach too. Has been brilliant way of ramping up knowledge of problems dealing with root causes rather than repeated symptoms.
Holding space for fix to happen. A colleague sat next to me while I had not done a simple thing, making it clear they were available to help me but would not push me to pairing.
Timely one liner in Jira. I wrote title only bug report in Jira. That was all the developer needed to realize they could fix something, and the magic was this all happened within the day of the bug being created while they were still in context.
Whisper reporting. I mentioned a bug without reporting it. Developers look great when they don't have bug reports on them. I like the idea of best work winning when we care about the work over credit. Getting things fixed is work, claiming credit with report is sometimes necessary but often a smell.
Failing test. Add a failing test to report a bug, and shift work from there. Great for practicing very precise reporting.
Actual bug report. Writing great summary, minimal steps to repro and making clear your expected and actual results. Trust it comes around, or enhance your odds by other tactics.
Discuss with product owner. Your bug is more important when layered with other people's status. I apply this preferably before the report for less friction.
Discuss with developer. Showing you care about colleagues priorities and needs enhances a collaboration.
Praise and accolades. Focus your messaging on the progress of vanishing, not emerging. Show the great work your developer colleagues are doing. So much effort and so little recognition, use some of your energy in balancing to good.
Sharing your testing story. Fast-forward your learning and make it common learning. A story of struggles and insights is good. A shared experience is even better.
Time. Know when to report and how. Timing matters. Prioritise their time.

All of these tactics are ways to consider how to reduce friction for the developer. Advocate. Enable. Help. Do better than shine light when the door is closed.

I call this contemporary because writing a bug report is simple. It is basic. But adding layers of tactics to it, that is far from simple. It is not a recipe but a pack of recipes. And you need to figure out what to apply when.

I found nine problems yesterday. I applied four different tactics on those nine problems. And I do that because I care about the results. Results of testing is information we have acted upon. Getting the right things fixed, and getting the sense of accomplishment and pride for our shared work in building a product.

Wednesday, February 21, 2024

Everyone can test but their intent is off

Over my 8 years of ensemble and pair testing as primary means of teaching testers, I have come to a sad conclusion. Many people who are professionally hired as testers don't know how to test. Well, they know how to test but from their testing, there is a gaping results gap. Invisible one. One they don't manage or direct. And the sad part is that they think it is normal.

If you were hired to do 'testing' and you spent all your days doing 'testing', how dare I show up to say your testing is off?!? I look at results, and the only way to look at results you provide is to test after you.

My (contemporary) exploratory testing foundations course starts from a premise of giving you a tiny opportunity to assess your own results, because the course comes with the tool of turning invisible ink to visible, that is listing of problems me (and ensembles with me) have found across some hundreds of people. I used to call it 'catch-22' but like usual with results of testing, more work on doing better has grown the list to 26.

Everyone can test, like everyone can sing. We can do some slice of the work that resembled doing the work. We may not produce good enough results. We may not produce professional results that lead into paying for that work. But we can do something. Doing something is often better than doing nothing. So the bar can be low.

An experience at work leads me to think that some testers can test but their intent is so far off that it is tempting to say they did not test. But they did test, just the wrong thing.

Let me explain the example.

The feature being tested is one of sending pressure measurement values from one subsystem to another, to be displayed there. We used to have a design where the value of the first subsystem that was used on its user interfaces after various processing steps was sent forward to the second subsystem. Then we mangled that value because it followed no modern principles of programmatically processable data so that we could reliably show the value. We wanted to shift the mangling of the data to the source of the data, with information about the data, just in a beautiful modern wrapper of a consistent data model.

Pressure measurement was the first in the line of beautiful modern wrappers. The assignment for the tester was to test this. Full access to conversations with the developer was available. And some days after the developer said "done and tested", the tester also came back with "tested!". I started asking questions.

I asked if one of the things they tested was to change configurations that impact pressure values so that they can see differences of pressure at sea level (a global comparison point) and at the measurement location. I got an affirmative response. Yet I had a nagging feeling, and built on the yes inviting the whole team to create a demo script for this pressure end to end. Long story short, not only did the tester not know how to test it, it was not working. So whatever they called testing was not what I was calling testing. The ensemble testing session also showed that NO ONE in the team knows the feature end to end. Everyone conveniently looks at a component or at most a subsystem. So we all learned a thing or two.

Equipped with the information from the ensemble testing experience, the tester said to take more time testing before coming back with "tested!". They did, and today they came back - 7 working days later. I am well aware that this was not the only thing they had on their agenda - even if their agenda is on their full control - and we repeated the dance. I started asking questions.

They told me they updated the numbers in the model we created for the ensemble testing session. I was confused - what numbers? That model sketched early ideas of how three height parameters would impact the measurements in the end, but it was a quick sketch from an hour of work, not a fill in the ready blanks model. So I asked for demo to understand.

I was shown how they changed three parameters of height. Restart the subsystem. Basic operations of the subsystem they have been testing for over a year. The interesting conversation was on what did they then test. It turns out they did the exact same moves on the subsystem on a build without the changes and on a build with the changes. They concluded that since the difference they see is in sending the data forward but the data is the same, it must be right. Regression oracle. But very much partial oracle, and partial intent.

In the next minutes, I pulled up a 3rd party reference and entered the parameters to have comparable values. We learned that they had the parameters wrong, because if the values aren't broken with the latest change, then the change of configuration is likely incorrect. They did not explore values for plausibility, and they were way off.

I asked to show what values they compared, to learn they chose an internal value. I asked to pull up the first subsystem's user interface for the comparable values. Turns out that the value they compared is likely missing multiple steps of processing to become the right value.

For junior testers such as this one, I expect I will coach them by having these conversations. I have been delighted with picking up information as new information comes, and I follow the trend of not having to cover same ground all the time. I understand how this blindness to result comes about: in most cases testing a legacy system, the regression oracle keeps them on path. In this case, it leads them to the wrong path. They only took ISTQB course which does not teach you testing. I am their first person to teach this stuff. But it is exhausting. Especially since there are courses like mine or like BBST that they could learn from, and provide the results for the right intent. Learn to control their intent. Learn to *explore*.

At the same time, I am thinking that all too often juniors to testing - regardless of their years of experience - could learn slightly more before becoming same costs as developers. This level of thinking would not work for a junior developer.

Testers get by because they can be just without value. Teamwork may make them learn or become negative value. But developers don't get by because the without value of their work gets flagged sooner.

Wednesday, February 14, 2024

Making Releases Routine

Last year I experienced something I had not experienced for a while: a four month stabilisation period. A core of the work of testing-related transformations I had been doing with three different organizations was to bring down release timeframes, from these months long versions to less than an hour. Needless to say, I considered the four month stabilisation period a personal fail.

Just so that you don't think that you need to explain me that failing is ok, I am quite comfortable with failing. I like to think back to a phrase popularised by Bezos 'working on bigger failures right now' - a reminder that too safe means you won't find space to innovate. Failing is an opportunity for learning, and inevitable when experimenting in proportion to successes.

In a retrospecting session with the team, we inspected our ways and concluded that taking many steps away from a good known baseline with insufficient untimely testing, this is what you would get. This would best be fixed by making releases routine.

There is a fairly simple recipe to that:

Start from a known good baseline
Make changes that allow for the change you want for your users
Test the changes in a timely fashion
Release a new known good baseline

The simple recipe is far from easy. Change is not easy to understand. And it is particularly difficult if you only see the change in small scale (code line) and not in system (dependencies). And it is particularly difficult if you only see the system but not the small scale. In many teams developers have been pushed too small, and testers have not been pushed small enough. This leads to delayed feedback because the testing done in timely fashion misses results in testing that starts to lag behind from changes.

In addition to results gap on information we need and information we have and its time dimension, the recipe continues with release steps. Some include all of the results gap in release tests because testing can't learn to be timely, muddling the waters of how long it takes to do a release. But even when feature and release testing are properly separated, there can be many steps.

In our team's efforts of making releases routine, I just randomly decided this morning that today is a good day for release. We have a common agreement that we would do release AT LEAST once a month, but if practice is what we need, more is better. For various reasons, I had been feature testing changes as they come. I have two dedicated testers who were already two weeks behind on testing, and if I learned something from last year's failing, it's that junior testers have less developed sense of timing of feedback, partially rooted in the fact that skills in action need rehearsing at slower pace. Kind of like learning to drive a car, slow down while turning the wheel and looking around are hard to do at the same time! I was less than an hour away from completing feature testing at time of deciding for the release.

I completed testing - reviewed the last day of changes, planned tests I wanted, and tested those. All that was remaining then was the release.

It took me two hours more to get the release wrapped up. I listed the work I needed to do:

Write release notes - 26 individual changes to message worth saying
Create release checklist - while I know it by heart, others may find it useful to tick off what needs doing to say its done
Select / design title level tests for test execution (evidence in addition to TA - test automation)
Split epics to this release - other release so that epics reflect completed scope over aspirational scope. and can be closed for the release
Document per epic acceptance criteria, esp. out of scope things - documentation is an output not input, but if I was testing, it was a daily output not something to catch up at release time
Add Jira tasks into epics to match changes - this is totally unnecessary but I do that to keep a manager at bay, close them routinely since you already tested them at pull request stage
Link title level tests to epics - again something normally done daily as testing progresses, but this time was left outside the daily routine
Verify traceability matrix of epics ('requirements') to tests ('evidence') shows right status
Execute any tests in test execution - optimally one we call release testing and would take 15 minutes on the staging environment
Open Source license check - run license tool, compare to accepted OSS licenses and update licenses.txt to be compliant with attribution style licenses
Lock release version - Select release commit hash and lock exact version with a pull request
Review Artifactory Xray statistics for docker image licenses and vulnerabilities
Review TA (test automation) statistics to see it's staying and growing
Press Release-button in Jira so that issues get tagged - or work around reasons why you couldn't do just that
Run promotion that makes the release and confirm the package
Install to staging environment - this is something from 3 minute run a pipeline to 30 minutes do it like a customer does it
Announce the release - letting others know is usually useful
Change version for next release in configs

This took me about 2 hours. I skipped the install to staging though. And I have a significant routine in all these tasks. What I do in a few hours, the experience shows it takes about a week when moved forward and about a day for me in answering questions. Not a great process.

There are things that could be done to start a new release, in conjunction with Change version for next release:

Create release checklist

There are things that should become continuous on that list:

Select / design title level tests
Split epics
Document per epic acceptance criteria
Add Jira tasks into epics to match changes
Link title level tests to epics
Verify traceability matrix
Execute any tests in test execution

There's things that could happen on a cadence that have nothing to do with releases:

Review Artifactory Xray statistics
Review TA (test automation) statistics

And if we made these changes, the list would look a lot more reasonable:

Write release notes
Execute ONE test in test execution
Open Source license check
Lock release version
Press Release-button in Jira
Run promotion that makes the release
Install to staging environment
Announce the release
Change version for next release

And finally, looking at that list - there is no reason why all but the meaningful release notes message can't happen on ONE BUTTON.

I like to think of this type of task analysis visually:

This all leaves us with two challenges: extending the release promotion pipeline to all the chores and the real challenge: timely resultful testing by people who aren't me. Understanding that delta has proven a real challenge now that I am not a tester (with 26 years of experience coined into contemporary exploratory testing) and a significant chunk of my time is on my other role: being a manager.

A Seasoned Tester's Crystal Ball