Tuesday, November 21, 2023

GitHub Copilot Demos and Testing

I watched a talk on using generative AI in development that got me thinking. The talk started with a tiny demo of GitHub Copilot writing a week number function in JavaScript.

This was accepted as is. Later in the talk, the speaker even showed a couple of tests, also created with the help of the tool. Yet the talk missed an important point: this code has at least one significant bug.
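
The talk didn't dwell on the code itself, but the kind of naive week number function these demos tend to produce looks roughly like this. This is my reconstruction of the approach, not the speaker's exact code:

    // Naive week number: treats the week containing January 1st as week 1,
    // with weeks aligned to Sundays. This disagrees with the ISO 8601 week
    // number for early January, and drifts around year boundaries.
    function getWeekNumber(date) {
      const yearStart = new Date(date.getFullYear(), 0, 1);
      const daysSinceYearStart = (date.getTime() - yearStart.getTime()) / 86400000;
      return Math.ceil((daysSinceYearStart + yearStart.getDay() + 1) / 7);
    }

For January 1st, 2023 (a Sunday) this returns week 1, while the ISO week number for that date is week 52 of 2022.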

In the era of generative AI, I could not resist throwing this at ChatGPT with the prompt:

Doesn't this give me incorrect week numbers for early January and leap years?

While the "Yes, you're correct." does not make me particularly happy, what I learn in the process of asking that this is an incorrect implementation of something called ISO week number, and I never said I wanted ISO week number. So I prompt further: 

What other week numbers are there than ISO week numbers?

I learn there are plenty, which I already knew, having tested systems long enough.
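
For contrast, the ISO 8601 week number hinges on one rule: weeks start on Monday, and a week belongs to the year that contains its Thursday. A minimal sketch of a correct implementation:

    // ISO 8601 week number: weeks start on Monday, and a week belongs
    // to the year containing its Thursday. January 1st can therefore
    // fall in week 52 or 53 of the previous year.
    function isoWeekNumber(date) {
      const d = new Date(Date.UTC(date.getFullYear(), date.getMonth(), date.getDate()));
      const isoDay = d.getUTCDay() || 7;         // Monday = 1 ... Sunday = 7
      d.setUTCDate(d.getUTCDate() + 4 - isoDay); // move to the Thursday of this week
      const yearStart = new Date(Date.UTC(d.getUTCFullYear(), 0, 1));
      return Math.ceil(((d.getTime() - yearStart.getTime()) / 86400000 + 1) / 7);
    }

And this is just one of the schemes; others start weeks on Sunday, or number the week containing January 1st as week 1.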

The original demo and code came with claims of saving 55% of time, and of 46% of code in pull requests already being AI generated. This code led me down an evening rabbit hole, with continuous beeping from Mastodon after I shared it.

Writing wrong code fast is not a positive trait. Accepting this, and leading large audiences to accept it, is making me angry.

Start with better intent. Learning TDD-like thinking actually helps.
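
Even two tests written before accepting the code would have exposed the gap. These are hypothetical checks against the sketches above, picking dates where the naive and ISO week numbers disagree:

    // Jan 1, 2023 is a Sunday: ISO week 52 of 2022, while the naive version says 1.
    console.assert(isoWeekNumber(new Date(2023, 0, 1)) === 52, "Jan 1, 2023 is in ISO week 52");
    // Jan 4 falls in ISO week 1 every year, whatever the weekday.
    console.assert(isoWeekNumber(new Date(2021, 0, 4)) === 1, "Jan 4 is always in ISO week 1");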

Writing wrong code fast. Writing useless documentation fast. Automating status updates in Jira based on other tools programmers use. Let's start with making sure that the thing we are automating is worth doing in the first place. Wrong code isn't. Documentation needs readers. And management can learn to read pull requests.

 


Thursday, November 16, 2023

Secret sauce to great testing is to change the managers

I had many intentions going into 2023, but sorting out testing in my product team was not one of them. After all, I had been the designated testing specialist for all of 2022. We had a lovely developer-tester collaboration going on, and had agreed on not hiring a tester for the team at all. Little did I know back then.

The product was not one team, it was three teams. I was well versed in one, aware of (and unhappy with) another, and oblivious to the third. By the end of this year, I see the three teams will succeed together, as they would have failed together by not paying attention to their neighbours.

Looking at the three teams, I started with a setup where two teams had no testers, and one team had two testers. I was far from happy with the team with two testers. They created a lot of documentation, and read a lot of documentation, still not knowing the right things. They avoided collaboration. They built test automation by first writing detailed scenarios in Jira tickets, and then creating a roadmap of those tickets for an entire year. You may not be surprised that I had them delete the entire roadmap after completing the first ticket from that queue.

So I made some emergent changes; after all, now I was the manager with the power.

I found a brilliant system expert in the team I had been unaware of, and repositioned them to the designated testing position I had held the previous year, with the assumption of complete career retraining. Best choice ever. The magic of seeing problems emerge, without me having to do all of that alone.

I tried asking for better testing with the two testers I had, and watching the behaviors unfold, I realized I had a manager problem. I removed two managers from between me and the tester who had potential, and have been delightedly observing how they now excel in what I call contemporary exploratory testing. They explore, they read the (developer-owned) automation, they add to the shared scope of automation what they learned by exploring should be in it, and they have conversations with the developers instead of writing them messages in Jira.

I brought in two people from a testing consultancy I have high respect for, Altom. I know they know what I know of good testing, and I can't say that of most test consultancies. The first consultant I brought in helped me by refactoring the useless test automation created from those scenarios in Jira tickets rather than from thinking about the information we target. We created ways of remotely controlling the end-to-end system, and grew that from one team to two teams. We introduced model-based testing for reliability purposes, and really hit on something that enabled success through a lot of pain this year. A month ago, I brought in a second person from Altom, and I am already delighted to see them stepping up to clarify cross-team system test responsibilities while testing hands-on.

So in total, I started off with the idea of two traditional testers I'd rather remove, and ended up with four contemporary exploratory testers who test well, automate well on many scopes and levels, and collaborate well. In exchange, I let go of a developer manager with bad ideas about testing (forcing an idea of *manual testing with test cases* that I had a hard time getting away from) and a test lead who followed the advice others gave over the advice I gave.

We get so much more information to react to, in a much more actionable form, this way. I may be biased, as I dislike test case based approaches to the core, but I am very proud of the unique perspectives these four bring in.

Looks like our secret sauce to great testing was to change the managers. Your recipe may vary. 


Across two teams, I facilitated a tester hat workshop. Tester is something we all are. But specialist testers often seek out the gaps the others leave behind. Wanting to think you are something does not yet mean you are skilled at it. Testing is many hats, and being a product and domain expert who captures that knowledge in an executable format is something worth investing in, to survive with products that stay around longer than we do.



Wednesday, November 15, 2023

Planning in Bubbles

We have all experienced the "process" approach to the work we are doing. Someone identifies work, gives it a label, and uses some words to describe what this work is. That someone introduces the work they had in mind to the rest of us in the software development team, and then we discuss what the work entails.

When you ask for something small and clear, you probably get clearer responses. But when you ask for something small and clear, you are also asked for the vision, and for what is around those small and clear things that we should be aware of.

Some teams take in tasks, where work that wasn't asked for comes in as the next task. Others carry more responsibility by taking in problems, discovering the problem and solution space, and carrying somewhat more ownership.

For a long time, I have been working with managers who like to bring tasks, not problems. And I have grown tired of being a tester (or developer) on a team where someone thinks they get to chew things up for me.

A year ago, in the time before managering, I used John Cutler's Mandate Levels model as a way of describing the two teams I was interfacing with then, and how they essentially had different mandates.

In a year of managering, team B now works on mandates A-C, and team A still works only on A-E, no matter how much I would like us to move all the way to F.

With the larger mandate level, and the success of taking good steps forward measured as cost per pull request (and qualitatively assessing the direction of those steps), team A has not needed or wanted Jira. We gave up on it 1.5 years ago, cut the cost of change to a fifth, and did fine until we did not. Three months ago I had no choice but to confess to myself that, with a new role and somewhat less hands-on testing in my days, we had accrued quality debt, unfinished work, at a scale where we needed Jira to manage it. Managing about half of it in Jira resulted in 90 tickets, and we still continued to encourage not needing a ticket to do the right thing.

Three months later, we are a few items short of again having a baseline to work on. 

With the troubles accrued, I have a theory of what causes this, and I call that theory the modeling gap. A problem request coming in leaves so much open to interpretation that, without extra care, we leave tails of unfinished work. So I need a way of limiting Work in Progress, and a way of changing the terms of modeling from the language of the minority (the product owner) to the language of the majority (the team).

I am now trying out an approach I call The Team Bubble Process. It is a way of visualising ongoing work with my large-mandate team where, instead of taking in tasks, we describe (and prioritise) the tasks we have as a team for a planning period. We don't need the usual step-by-step process, but we do need to show what work is in doing, and what is in preparing to do.

The first bubble is doing: coding, testing, documenting, something that will reach the customer and change what the product does for the customer. All other work is in support of this work.

The second bubble is planning, but we like to do planning by spiking. Plan a little, try a little. Plan a little, try more. It's very concrete planning. Not the project plans, but more like something you do when you pair or ensemble. 

The third bubble is preplanning, and this is planning that feeds doing the right things. We are heavily tilted towards planning too much too early, which is why we want to call this preplanning. It usually needs revisiting when we are really available to start thinking about it. It's usually the work where people in support of the team want to collaborate on getting their thoughts sorted, typically about roadmap, designs and architectures. 

The fourth bubble is preparing. It's yet another form of planning, now the kind that is so far from doing that, done wrong, it can be detrimental. Done well, it can be really helpful in supporting the team. Prepare work often produces things that the team would consider outside the bubbles, just to be aware of.


I now have two sets of two weeks modelled.

For the first, we visualised 8 to Do (completed 4, continuing with 4); 4 to Plan (continuing with all, though 1 never got started); 1 to Preplan (never started); and 11 to Prepare (completed 3, continuing with 4, never started 4).

For the second, we visualised 14 to Do, 14 to Plan, 6 to Preplan, and 9 to Prepare.

What we have already learned is that we have people who, hoping to collaborate with the team on something they would like to prepare (but too early), do work outside the bubbles and are not keen on sharing the team agenda. It will be fascinating to see if this helps us, but at least it is better than pretending Jira "epics" and "stories" are a useful way of modeling our work.