Wednesday, February 14, 2024

Making Releases Routine

Last year I experienced something I had not experienced for a while: a four month stabilisation period. A core of the work of testing-related transformations I had been doing with three different organizations was to bring down release timeframes, from these months long versions to less than an hour. Needless to say, I considered the four month stabilisation period a personal fail. 

Just so that you don't think that you need to explain me that failing is ok, I am quite comfortable with failing. I like to think back to a phrase popularised by Bezos 'working on bigger failures right now' - a reminder that too safe means you won't find space to innovate. Failing is an opportunity for learning, and inevitable when experimenting in proportion to successes. 

In a retrospecting session with the team, we inspected our ways and concluded that taking many steps away from a good known baseline with insufficient untimely testing, this is what you would get. This would best be fixed by making releases routine

There is a fairly simple recipe to that:

  1. Start from a known good baseline
  2. Make changes that allow for the change you want for your users
  3. Test the changes in a timely fashion
  4. Release a new known good baseline
The simple recipe is far from easy. Change is not easy to understand. And it is particularly difficult if you only see the change in small scale (code line) and not in system (dependencies). And it is particularly difficult if you only see the system but not the small scale. In many teams developers have been pushed too small, and testers have not been pushed small enough. This leads to delayed feedback because the testing done in timely fashion misses results in testing that starts to lag behind from changes. 

In addition to results gap on information we need and information we have and its time dimension, the recipe continues with release steps. Some include all of the results gap in release tests because testing can't learn to be timely, muddling the waters of how long it takes to do a release. But even when feature and release testing are properly separated, there can be many steps. 

In our team's efforts of making releases routine, I just randomly decided this morning that today is a good day for release. We have a common agreement that we would do release AT LEAST once a month, but if practice is what we need, more is better. For various reasons, I had been feature testing changes as they come. I have two dedicated testers who were already two weeks behind on testing, and if I learned something from last year's failing, it's that junior testers have less developed sense of timing of feedback, partially rooted in the fact that skills in action need rehearsing at slower pace. Kind of like learning to drive a car, slow down while turning the wheel and looking around are hard to do at the same time! I was less than an hour away from completing feature testing at time of deciding for the release. 

I completed testing - reviewed the last day of changes, planned tests I wanted, and tested those. All that was remaining then was the release.

It took me two hours more to get the release wrapped up. I listed the work I needed to do: 
  • Write release notes - 26 individual changes to message worth saying
  • Create release checklist - while I know it by heart, others may find it useful to tick off what needs doing to say its done
  • Select / design title level tests for test execution (evidence in addition to TA - test automation)
  • Split epics to this release - other release so that epics reflect completed scope over aspirational scope. and can be closed for the release
  • Document per epic acceptance criteria, esp. out of scope things - documentation is an output not input, but if I was testing, it was a daily output not something to catch up at release time
  • Add Jira tasks into epics to match changes - this is totally unnecessary but I do that to keep a manager at bay, close them routinely since you already tested them at pull request stage
  • Link title level tests to epics - again something normally done daily as testing progresses, but this time was left outside the daily routine
  • Verify traceability matrix of epics ('requirements') to tests ('evidence') shows right status 
  • Execute any tests in test execution - optimally one we call release testing and would take 15 minutes on the staging environment
  • Open Source license check - run license tool, compare to accepted OSS licenses and update licenses.txt to be compliant with attribution style licenses
  • Lock release version - Select release commit hash and lock exact version with a pull request
  • Review Artifactory Xray statistics for docker image licenses and vulnerabilities
  • Review TA (test automation) statistics to see it's staying and growing
  • Press Release-button in Jira so that issues get tagged - or work around reasons why you couldn't do just that 
  • Run promotion that makes the release and confirm the package
  • Install to staging environment - this is something from 3 minute run a pipeline to 30 minutes do it like a customer does it
  • Announce the release - letting others know is usually useful
  • Change version for next release in configs
This took me about 2 hours. I skipped the install to staging though. And I have a significant routine in all these tasks. What I do in a few hours, the experience shows it takes about a week when moved forward and about a day for me in answering questions. Not a great process. 

There are things that could be done to start a new release, in conjunction with Change version for next release: 
  • Create release checklist
There are things that should become continuous on that list: 
  • Select / design title level tests 
  • Split epics
  • Document per epic acceptance criteria
  • Add Jira tasks into epics to match changes 
  • Link title level tests to epics
  • Verify traceability matrix
  • Execute any tests in test execution
There's things that could happen on a cadence that have nothing to do with releases:
  • Review Artifactory Xray statistics 
  • Review TA (test automation) statistics
And if we made these changes, the list would look a lot more reasonable:
  • Write release notes
  • Execute ONE test in test execution 
  • Open Source license check 
  • Lock release version
  • Press Release-button in Jira
  • Run promotion that makes the release
  • Install to staging environment
  • Announce the release
  • Change version for next release
And finally, looking at that list - there is no reason why all but the meaningful release notes message can't happen on ONE BUTTON. 

I like to think of this type of task analysis visually: 

This all leaves us with two challenges: extending the release promotion pipeline to all the chores and the real challenge: timely resultful testing by people who aren't me. Understanding that delta has proven a real challenge now that I am not a tester (with 26 years of experience coined into contemporary exploratory testing) and a significant chunk of my time is on my other role: being a manager. 

Friday, January 19, 2024

The Power of Framing

Sometimes, we write on topics we have not researched, but still have things to say on. This is how I frame this post: I am not an expert in framing. There is admirable levels of eloquence, excellent teaching materials I have seen, but my practice of this is one of a learner. 

Me setting the stage of the post is framing. You put a perspective around a thing, that allows you to see the thing. It might be that you are framing to see things in a similar light, or you might use framing to change the narrative on a topic. Today I had two examples in mind that I wanted to make a public note of. 

Example 1: "I'm a bad direct report" --> "I'm an employee with entrepreneurial touch" 

In a conversation about managing up - managing your manager - we ended up talking about understanding what is important to you, and that what you seek may differ from what others seek. I expressed importance of agency, the sense that I am on the controls of my work and expectations for my work are things I negotiate, rather than take as givens. When someone violates my agency, I react strongly.

This lead to someone else sharing how they consider themselves "bad direct report" and have chosen entrepreneurship where traits like established structures and rules, obeying without the why, questioning why, asking questions for deeper understanding, and built in drive for the better are helpful and welcome. 

I recognised the sentiment and similarity to how I think, yet noted my choices have turned very different. I have chosen to join organizations and swim against the stream. As reaction to the story, I realized I frame the same story as "I am an employee with entrepreneurial touch", and "rebel at work", who can move mountains for organizations if they appreciate the likes of me. 

I did not even think of this as framing, I just shared how I have placed a frame on something that I could choose to frame, very realistically that I am many bosses nightmare. I don't obey, I seek goals and I motivate routines through playing fairly with others as I don't think I am entitled to break flows other people rely on, without taking them along for the change ride. 

Example 2: "This company has not given me training for 2 years" --> "Learning matters"

Another sample of framing was inspired by noticing a frame of describing a true experience: "This company has not given me training for 2 years". To see the frame, we note the definition of training. 

Training in this case is not taking the compulsory e-learning courses where even a manager is required to check you have completed the training. 

Training is not taking online courses that have an immediate impact on the work ongoing. 

Training is money out of companies' pocket to send me somewhere I could not go with my salary. It is a salary surplus. 

To frame this for myself, I drew a picture 2 months ago, that I then shared on social media. 


If I frame training to learning, I can see a variety of options. I can see that while the visible money on my learning may not match my expectations, my experience of being allowed to use invisible money (doing work slower) has definitely been an investment to my learning. 

The picture includes a few "Done is 2023" ticks with visible money, as I got to go to EuroSTAR and HUSTEF (as keynote speaker, company paying my daily allowance). That's not where I learned though. I learned the most in 2023 from push benchmarks, meaning I shared how we work / what our problem is, and got guidance on things other people tried - the community approach. 

The same learning framing affords me agency. Instead of "no training given", I can assess "learning I made space for, with support/roadblocks". 

Framing changes how I feel about the same true experience. And how I feel about it changes what I can do. 

Friday, December 8, 2023

Model-Based Testing in Python

Some of the best things about being a tester but also a manager is that I have more direct control over some of the choices we make. A year ago I took part in deciding that my main team will have no testers, while my secondary team had two testers. Then I learned secondary team did some test automation not worth having around and fixed that by hiring Ru Cindrea, co-founder of Altom, to help the secondary team. She refactored all of the secondary team's automation (with great success I might add). She took over testing an embedded product the secondary team had ownership on (with great success again). And finally, I moved her to my primary team to contribute on a new concept I wanted since I joined Vaisala 3,5 years ago: applying model-based testing. This all has happened in a timeframe of one year. 

Some weeks back, I asked her to create a presentation on what we have on model-based testing and what that it about, and she did that. There is nothing secret in this type of testing, and I was thinking readers of this blog might also benefit from a glimpse into the concept and experience.

The team has plenty of other automated tests, traditional hand-crafted scripts in various scopes of the test target. These tests though, they are in scope of the whole system. The whole system means two different products with a dependency. The programmatic interfaces include a web user interface, a remote control interface to a simulator tool, and an interface to the operating system services the system runs on. This basically means a few python libraries for purposes of this testing, for each interface. And a plan to add more interfaces for control and visibility at different points of observation of the system. 

In explaining why of model-based testing, Ru used an example of a process of how we give control and support for people observing weather conditions with the product to create standard reports. In the context we work with, there are reports that someone is creating on a cadence, and these reports then get distributed globally to be part of a system of weather information. Since there's an elaborate network that relies on cadence, the product has features to remind about different things that need to happen. 


Choosing some states as screenshots of what a user sees, showing why model-based testing became a go-to approach for us makes sense. Traditionally we would analyse to draw these arrows, and then hand-craft test scenarios that would go through each arrow, and maybe even combinations of a few of the most important ones. Often designing those scripts was tedious work, and not all of us were visually representing them. We may even have relied on luck on what ended up as the scripts, and noting some of the paths we were missing through the exploratory testing sessions where we let go of the preconceived idea of the paths. 

The illustration of screenshots is just a lead in to the model. The model itself is a finite state model with nodes and vertices, and we used AltWalker (that uses GraphWalker) on creating the model. 

The model is just a representation of paths, and while visual, it is also available as a structured text file. Tests are generated based on the model to cover paths through the model, or to walk through the model even indefinitely. 

For the application to respond, we needed some code. For the webUI part, we use selenium, but instead of building tests, we build something akin to POM (page object model) with functions to map with the actions (arrows) and checks (boxes) in the model. 
For use of Selenium, we have three main reasons. One is that Playwright that we have used otherwise on UI tests in this product does not play with AltWalker (for now, I am sure it is pending to change) due to some dependency conflicts. The other that I quite prefer, is that Selenium WebDriver is the only driver for real browsers to the level of what our users use. It gives us access to real Safari, real and specific versions of Chrome and Firefox, instead of an approximation that works nicely for most application-level testing stuff we do with playwright. The third reasons is that while Ru learned Playwright in some days to no longer find it new, she has decade on Selenium and writes this stuff by heart. The same days of effort suffices for the others on playwright to learn enough of Selenium to work on these tests. 

We had other python libraries too, but obviously I am more comfortable sharing simple web page actions than showing details of how we do remote control of complex sensor networks and airport might have. 

In the end of Ru's presentation, she included a video of a test run with the visual showing ramping up the coverage. We watched boxes Turing from grey to green, our application doing things as per the model, and I was thinking about how I keep doing exploratory testing sometimes by looking at the application run, and allowing myself to pick up cues from what I see.  For purposes of this article, I did not include the video but a screenshot from it. 
In the end of the short illustration, Ru summed up what we have gained with model-based testing. 

It's been significantly useful in reliability testing we have needed to do, in pinpointing performance-oriented reliability issues. The reliability of the tests has been a non-issue. 

A system like this runs for years without reboots, and we need a way of testing that includes production-like behaviors with sufficient randomness. A model can run until it's covered. Or a model can run indefinitely until it's stopped. 

The obvious question from people is if we can generate the model from the code. Theoretically perhaps in a not distant future. But do we rather draw and think, or try to understand if a ready drawing really represents what we wanted it to represent. And if we don't create a different model, aren't we at risk of accepting all the mistakes we made and thus not serving the purpose we seek for testing it? 

We hope the approach will enable us to get a second organization acceptance testing our systems to consume test automation with its visual approach, and avoid duplication in organizational handoff, and to increase trust without having to rebuild the testing knowledge to same levels as with an iteratively working software development team would have. 

While the sample is one model, we already have two. We will have more. And we will have combinations of feature models creating even more paths we no longer need to manually design for. 

I'm optimistic. Ru can tell what it is like to create this, and she still speaks at conferences while I do not. I'm happy to introduce her in case you can't find her directly. 


Tuesday, November 21, 2023

GitHub Copilot Demos and Testing

I watched a talk on using generative AI in development that got me thinking. The talk started with a tiny demo of GitHub Copilot writing a week number function in Javascript. 


This was accepted as is. Later in the talk, the speaker even showed a couple of tests also created with the help of the tool. Yet the talk missed an important point: this code has quite significant bug(s).

In the era of generative AI, I could not resist throwing this at ChatGPT with prompt: 

Doesn't this give me incorrect week numbers for early January and leap years?

While the "Yes, you're correct." does not make me particularly happy, what I learn in the process of asking that this is an incorrect implementation of something called ISO week number, and I never said I wanted ISO week number. So I prompt further: 

What other week numbers are there than ISO week numbers?

I learn there's plenty - which I already knew having tested systems long enough. 

The original demo and code came with the claim of saving 55% of time, and having 46% of code in pull requests already being AI generated. This code lead me down an evening rabbit hole with a continuous beeping from mastodon by sharing this. 

Writing wrong code fast is not a positive trait. Accepting - and leading large audiences to accept this - is making me angry.

 Start with better intent. Learning TDD-like thinking actually helps. 

Writing wrong code fast. Writing useless documentation fast. Automating status updates in Jira based on other tools programmers use. Let's start with making sure that the thing we are automating is worth doing in the first place. Wrong code is't. Documentation needs readers. And management can learn to read pull requests. 

 


Thursday, November 16, 2023

Secret sauce to great testing is to change the managers

I had many intentions going into 2023, but sorting out testing in my product team was not one of them. After all, I had been the designated testing specialist for all of 2022. We had a lovely developer-tester collaboration going on, and had made agreement on not hiring a tester for the team, at all. Little did I know back then. 

The product was not one team, it was three teams. I was well versed on one, aware (and unhappy with) another, and oblivious to third. By end of this year, I see the three teams will succeed together, as they would have failed together not paying attention to the neighbours. 

Looking at the three teams, I started with a setup where two teams had no testers, and one team had two testers. I was far from happy with the team with two testers. They created a lot of documentation, and read a lot of documentation, still not knowing the right things. They avoided collaboration. They built test automation by first writing detailed scenarios in jira tickets, and then creating a roadmap of those tickets for an entire year. You may not be surprised that I had them delete the entire roadmap after completing the first ticket from that queue. 

So I made some emergent changes - after all, now I was the manager with the power. 

I found a brilliant system expert in the team I had been unaware of, and repositioned them to the designated testing position I had held the previous year with assumption of complete career retraining. Best choice ever. The magic of seeing problems emerged without me having to do all of that alone. 

I tried asking for better testing with the two testers I had, and watching the behaviors unfold I realized I had a manager problem. I removed two managers from between me and the tester that had potential, and have been delightedly observing how they know excel in what I call contemporary exploratory testing. They explore, they read (developer owned) automation, add to shared scope of automation what they learned should be in by exploring, and have conversations with the developers instead of writing them messages in Jira. 

I brought in two people from a testing consultancy I have high respect for, Altom. I know they know what I know of good testing, and I can't say that of most test consultancies. First consultant I brought in helped me by refactoring the useless test automation created by those scenarios in jira tickets instead of good information targeting thinking. We created ways of remotely controlling the end to end system, and grew that from one team to two teams. We introduced model based testing for reliability purposes, and really hit something that enabled success through a lot of pain this year. A month ago, I brought in a second person from Altom, and I am already delighted to see them stepping up to cross team system test responsibilities clarification while hands-on testing. 

So in total, I started off with idea of two traditional testers I'd rather remove, and ended up with four contemporary exploratory testers who test well, automate well on many scopes and levels, and collaborate well. In exchange, I let go a developer manager with bad ideas of testing (forcing idea of *manual testing with test cases* I had hard time getting away from) and a test lead following the advice others gave over advice I gave. 

We get so much more information to react on, in so much more actionable way this way. I may be biased as I dislike the test case based approaches to the core, but I am very proud of the unique perspectives these four bring in. 

Looks like our secret sauce to great testing was to change the managers. Your recipe may vary. 

¨

Across two teams, I facilitated a tester hat workshop. Tester is something we all are. But specialist testers often seek for the gaps the others leave behind. Wanting to think you are something does not yet mean you are skilled as that. Testing is many hats, and being a product and domain expert capturing that knowledge in executable format is something worth investing in to survive with products that stay around longer than we do. 



Wednesday, November 15, 2023

Planning in Bubbles

We have all experienced the "process" idea to work we are doing. Someone identifies work and gives it label, and uses some words to describe what this work is. That someone introduces the work they had in mind for the rest of us in software development team, and then we discuss what the work entails. 

Asking for something small and clear, you probably get clearer responses. But asking for something small and clear, you are also asked for vision, and what is around those small and clear things that we should be aware of. 

Some teams take in tasks, where work that wasn't asked for comes in as next task. Others carry more of a responsibility by taking in problems, discovering the problem and solution space, and carry somewhat more ownership. 

For a long time, I have been working with managers who like to bring tasks not problems.  And I have grown tired of being a tester (or developer) in a team where someone thinks they get to chew things for me. 

A year ago - time before managering - I used John Cutler's Mandate Levels model as ways of describing the two teams I was interfacing then, and how essentially they had different mandates. 

In a year of managering, team B now works on mandates A-C, and team A now works only on A-E, no matter how much I would like us to move all the way to F. 

With the larger mandate level and success of hitting good forward taking steps in measuring cost per pull request (and qualitatively assessing direction of steps), team A has not needed or wanted Jira. We gave up on it 1,5 years ago, cut cost of change to a fifth, and did fine until we did not. Three months ago I had no other choice but to confess to myself that with a new role and somewhat less hands-on testing on my days, we had accrued quality debt - unfinished work - in scales we needed Jira to manage it. Managing about half of it in Jira resulted in 90 tickets, and we still continued the encouragement of not needing a ticket to do a right thing. 

Three months later, we are a few items short of again having a baseline to work on. 

With the troubles accrued, I have theories of what causes that, and I call that theory the modeling gap. The request of a problem coming in leaves so much open for interpretation, that without extra care, we leave tails of unfinished work. So I need a way of limiting Work in Progress, and a way of changing the terms of modeling to language of majority (the team) for language of minority (product owner). 

I am now trying out an approach I call The Team Bubble Process. It is a way of visualising work ongoing with my large mandate team where instead of intaking tasks, we describe (and prioritise) tasks we have as a team for a planning period. We don't need the usual step by step process, but we need to show what work is in doing, and what is in preparing to do. 

The first bubble is doing - coding, testing, documenting, something that will see the customer and change what the product does for the customer. All other work is in support of this work. 

The second bubble is planning, but we like to do planning by spiking. Plan a little, try a little. Plan a little, try more. It's very concrete planning. Not the project plans, but more like something you do when you pair or ensemble. 

The third bubble is preplanning, and this is planning that feeds doing the right things. We are heavily tilted towards planning too much too early, which is why we want to call this preplanning. It usually needs revisiting when we are really available to start thinking about it. It's usually the work where people in support of the team want to collaborate on getting their thoughts sorted, typically about roadmap, designs and architectures. 

The fourth bubble is preparing. It's yet another form of planning, now the kind of planning that is so far from doing that done wrong can be detrimental. Done well, can be really helpful in supporting the team. Work of prepare often produces things that the team would consider outside the bubbles, just to be aware. 


I have now two sets of two-weeks modelled. 

For first we visualized 8 to Do, and completed 4, continue on 4; 4 to Plan, and continue on all yet never starting 1; 1 to Preplan we, and never started it; 11 to Prepare, completed 3, continue on 4 and never started 4. 

For second we visualized 14 to Do, 14 to Plan, 6 to Preplan, and 9 to Prepare. 

What we already learned is that we have people who hope their responsibility and collaboration with team on something they think they'd like to prepare (but too early) will do work outside the bubbles and not be keen on sharing the team agenda. It will be fascinating to see if this helps us, but at least it is better than pretending Jira "epics" and "stories" are useful way of modeling our work. 

Wednesday, October 25, 2023

Anyone can test but...

Some years ago, I was pairing with a developer. We often would do work on the code that needed creating, working on the developer home ground. So I invited us to try something different - work that was home ground to me. 

There was a new feature the other developers had just called complete, that I had not had time to look at before. No better frame of cross-pollinating ideas, or so I thought.

Just minutes into our pairing session of exploring the feature, I asked out loud something about sizes of things we were looking at. And before I knew, the session went from learning what the feature might miss for the user to the developer showing me bunch of tools to measure pixels and centimeters, after all, I had sort of asked about sizes of things. 

By end of pairing session, we did not do any of the work I would have done for the new feature. I was steamrolled and confused, even deflated. It took weeks or months and a conference talk to co-deliver on pairing before I got to express that we *never* paired on my work. 

I am recounting this memory today because I had a conversation with a new tester in a team with one of testerkind, and many of developerkind. The same feelings of being the odd one out, being told (and in pairing directed) to do the work the developer way resonated. And the end result was an observation on how hard it it to figure out if you are doing good work as tester when everyone thinks they know testing, but you can clearly see the results gap showing their behaviors and results don't match their idea of knowing testing so well.


I spoke about being the odd one out in TestBash in 2015. I have grown up as tester with testing teams, learning from my peers, and only when I became the only one with 18 developers as my colleagues, I started to realize that I had team members and lovely colleagues, but not a single person who could direct my work. I could talk to my team members to find out what I did not need to do, but never what I need to do. My work turned to finding some of what others may have missed. I think I would have struggled without having had those past colleagues who had grown me into what I had become. And I had become someone who reported (and got fixed) 2261 bugs over 2,5 years before I stopped reporting bugs and started pairing on fixes to shift the lessons to where they matter the most. 


For a better future, I hope two things: 
  1. Listen and make space more. Look for that true pairing where both contribute, and sit with the silent moments. Resist the temptation of showing off. You never know what you would learn.
  2. Appreciate that testing is not one thing, and it is very likely that the testing you do as developer is not the same as what someone could add on what you did. Be a better colleague and discover the results gap, resisting the refining the newer person to a different center of focus. 
I searched my peers and support in random people of the internet. I blogged and I paired like no other time when I was the odd one out. And random people of the internet showed up for me. They made me better, and they held me on growth path. 

If I can be your random people of the internet to hold you on growth path, let me know. I'm ready to pay back what people's kindness has given me. 

Anyone can test. Anyone can sing. But when you are paid to test as the core of your work week, coverage and results can be different to what they would be if you needed to squeeze it in.