Tuesday, November 21, 2023

GitHub Copilot Demos and Testing

I watched a talk on using generative AI in development that got me thinking. The talk started with a tiny demo of GitHub Copilot writing a week number function in JavaScript.


This was accepted as is. Later in the talk, the speaker even showed a couple of tests also created with the help of the tool. Yet the talk missed an important point: this code has a significant bug, or several.
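I don't have the demo's exact code at hand, so here is a sketch of the common naive pattern such generated functions tend to follow - my reconstruction, not the talk's actual code:

    // A reconstruction of the common naive week number pattern - my sketch,
    // not the demo's actual code.
    function getWeekNumber(date) {
      const startOfYear = new Date(date.getFullYear(), 0, 1);
      const days = Math.floor((date - startOfYear) / 86400000);
      // Counts weeks from January 1st. ISO-8601 week 1 is the week that
      // contains the first Thursday, so early January can belong to week
      // 52 or 53 of the previous year - this version returns 1 instead,
      // and 53-week years (like leap year 2020) go wrong at the other end.
      return Math.ceil((days + startOfYear.getDay() + 1) / 7);
    }

    // getWeekNumber(new Date(2021, 0, 1)) returns 1; ISO says week 53 of 2020.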

In the era of generative AI, I could not resist throwing this at ChatGPT with the prompt:

Doesn't this give me incorrect week numbers for early January and leap years?

While the "Yes, you're correct." does not make me particularly happy, what I learn in the process of asking that this is an incorrect implementation of something called ISO week number, and I never said I wanted ISO week number. So I prompt further: 

What other week numbers are there than ISO week numbers?

I learn there's plenty - which I already knew, having tested systems long enough.

The original demo and code came with the claim of saving 55% of time, and of 46% of code in pull requests already being AI generated. This code led me down an evening rabbit hole, and sharing it kept Mastodon beeping at me continuously.

Writing wrong code fast is not a positive trait. Accepting this - and leading large audiences to accept it - makes me angry.

Start with better intent. Learning TDD-like thinking actually helps.
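To make that TDD-like thinking concrete, here is a sketch of mine (in Node.js, using the standard shift-to-Thursday trick) of an ISO week number together with the tests that would have flagged the demo's bug - assuming ISO weeks are even what you wanted:

    const assert = require("node:assert");

    // ISO-8601 week number: find the Thursday of the date's week, then count
    // weeks from the start of that Thursday's year.
    function isoWeekNumber(date) {
      const d = new Date(Date.UTC(date.getFullYear(), date.getMonth(), date.getDate()));
      const day = d.getUTCDay() || 7;         // Monday=1 ... Sunday=7
      d.setUTCDate(d.getUTCDate() + 4 - day); // move to this week's Thursday
      const yearStart = new Date(Date.UTC(d.getUTCFullYear(), 0, 1));
      return Math.ceil(((d - yearStart) / 86400000 + 1) / 7);
    }

    // The tests a TDD mindset would start from - early January and leap years:
    assert.strictEqual(isoWeekNumber(new Date(2021, 0, 1)), 53);   // week 53 of 2020
    assert.strictEqual(isoWeekNumber(new Date(2020, 11, 31)), 53); // leap year with 53 weeks
    assert.strictEqual(isoWeekNumber(new Date(2023, 5, 15)), 24);  // mid-year sanity check

Starting from those first two assertions, the naive version above fails immediately - which is exactly the conversation the demo never had.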

Writing wrong code fast. Writing useless documentation fast. Automating status updates in Jira based on other tools programmers use. Let's start with making sure that the thing we are automating is worth doing in the first place. Wrong code isn't. Documentation needs readers. And management can learn to read pull requests.

 


Thursday, November 16, 2023

Secret sauce to great testing is to change the managers

I had many intentions going into 2023, but sorting out testing in my product team was not one of them. After all, I had been the designated testing specialist for all of 2022. We had a lovely developer-tester collaboration going on, and had agreed on not hiring a tester for the team at all. Little did I know back then.

The product was not one team, it was three teams. I was well versed in one, aware of (and unhappy with) another, and oblivious to the third. By the end of this year, I can see the three teams succeeding together, as they would have failed together had they not paid attention to their neighbours.

Looking at the three teams, I started with a setup where two teams had no testers, and one team had two testers. I was far from happy with the team with two testers. They created a lot of documentation, and read a lot of documentation, still not knowing the right things. They avoided collaboration. They built test automation by first writing detailed scenarios in Jira tickets, and then creating a roadmap of those tickets for an entire year. You may not be surprised that I had them delete the entire roadmap after they completed the first ticket from that queue.

So I made some emergent changes - after all, now I was the manager with the power. 

I found a brilliant system expert in the team I had been unaware of, and repositioned them to the designated testing position I had held the previous year, with the assumption of complete career retraining. Best choice ever. The magic of seeing problems emerge without me having to do all of that alone.

I tried asking for better testing from the two testers I had, and watching the behaviors unfold I realized I had a manager problem. I removed two managers from between me and the tester who had potential, and have been delightedly observing how they now excel in what I call contemporary exploratory testing. They explore, they read the (developer-owned) automation, they add to the shared scope of automation what their exploring taught them should be in it, and they have conversations with the developers instead of writing them messages in Jira.

I brought in two people from a testing consultancy I have high respect for, Altom. I know they know what I know of good testing, and I can't say that of most test consultancies. The first consultant I brought in helped me by refactoring the useless test automation created from those scenarios in Jira tickets rather than from good information-targeted thinking. We created ways of remotely controlling the end-to-end system, and grew that from one team to two teams. We introduced model-based testing for reliability purposes, and really hit on something that enabled success through a lot of pain this year. A month ago, I brought in a second person from Altom, and I am already delighted to see them stepping up to clarify cross-team system test responsibilities while testing hands-on.
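To give a flavour of what model-based testing means in principle, here is a minimal sketch with made-up states and actions - nothing like our actual system or harness:

    // Model-based testing in miniature: a state model plus a random walk.
    // States and actions here are hypothetical placeholders.
    const model = {
      loggedOut: { login: "dashboard" },
      dashboard: { openReport: "report", logout: "loggedOut" },
      report: { close: "dashboard" },
    };

    function randomWalk(steps) {
      let state = "loggedOut";
      for (let i = 0; i < steps; i++) {
        const actions = Object.keys(model[state]);
        const action = actions[Math.floor(Math.random() * actions.length)];
        const next = model[state][action];
        // A real harness would drive the remotely controlled system here
        // and assert invariants after every transition.
        console.log(`${state} --${action}--> ${next}`);
        state = next;
      }
    }

    randomWalk(20);

The reliability angle comes from letting the walk run long: it exercises sequences no scripted test would bother writing down.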

So in total, I started off with the idea of two traditional testers I'd rather remove, and ended up with four contemporary exploratory testers who test well, automate well on many scopes and levels, and collaborate well. In exchange, I let go a developer manager with bad ideas about testing (forcing an idea of *manual testing with test cases* that I had a hard time getting away from) and a test lead who followed the advice others gave over the advice I gave.

This way we get so much more information to react to, in a much more actionable form. I may be biased, as I dislike the test case based approaches to the core, but I am very proud of the unique perspectives these four bring in.

Looks like our secret sauce to great testing was to change the managers. Your recipe may vary. 


Across two teams, I facilitated a tester hat workshop. Tester is something we all are. But specialist testers often seek out the gaps the others leave behind. Wanting to think you are something does not yet mean you are skilled at it. Testing is many hats, and being a product and domain expert who captures that knowledge in executable format is something worth investing in, to survive with products that stay around longer than we do.



Wednesday, November 15, 2023

Planning in Bubbles

We have all experienced the "process" idea applied to the work we are doing. Someone identifies work, gives it a label, and uses some words to describe what this work is. That someone introduces the work they had in mind to the rest of us in the software development team, and then we discuss what the work entails.

Asking for something small and clear, you probably get clearer responses. But when you ask for something small and clear, you are also asked for the vision, and for what is around those small and clear things that we should be aware of.

Some teams take in tasks, where work that wasn't asked for comes in as the next task. Others carry more responsibility by taking in problems, discovering the problem and solution space, and owning somewhat more of it.

For a long time, I have been working with managers who like to bring tasks, not problems. And I have grown tired of being a tester (or developer) in a team where someone thinks they get to chew things for me.

A year ago - in the time before managering - I used John Cutler's Mandate Levels model as a way of describing the two teams I was interfacing with then, and how they essentially had different mandates.

After a year of managering, team B now works on mandate levels A-C, and team A still works only on A-E, no matter how much I would like us to move all the way to F.

With the larger mandate level and the success of taking good steps forward as measured in cost per pull request (and qualitatively assessing the direction of those steps), team A has not needed or wanted Jira. We gave it up 1.5 years ago, cut the cost of change to a fifth, and did fine until we did not. Three months ago I had no other choice but to confess to myself that with a new role and somewhat less hands-on testing in my days, we had accrued quality debt - unfinished work - on a scale where we needed Jira to manage it. Managing about half of it in Jira resulted in 90 tickets, and we still continued encouraging people that they don't need a ticket to do the right thing.

Three months later, we are a few items short of again having a baseline to work on. 

With the troubles accrued, I have a theory of what causes them, and I call that theory the modeling gap. A problem coming in as a request leaves so much open for interpretation that, without extra care, we leave tails of unfinished work. So I need a way of limiting Work in Progress, and a way of changing the terms of modeling from the language of the minority (the product owner) to the language of the majority (the team).

I am now trying out an approach I call The Team Bubble Process. It is a way of visualising ongoing work with my large-mandate team where, instead of taking in tasks, we describe (and prioritise) the tasks we have as a team for a planning period. We don't need the usual step-by-step process, but we need to show what work is in doing, and what is in preparing to do.

The first bubble is doing - coding, testing, documenting, something that will see the customer and change what the product does for the customer. All other work is in support of this work. 

The second bubble is planning, but we like to do planning by spiking. Plan a little, try a little. Plan a little, try more. It's very concrete planning. Not the project plans, but more like something you do when you pair or ensemble. 

The third bubble is preplanning, and this is planning that feeds doing the right things. We are heavily tilted towards planning too much too early, which is why we want to call this preplanning. It usually needs revisiting when we are really available to start thinking about it. It's usually the work where people in support of the team want to collaborate on getting their thoughts sorted, typically about roadmap, designs and architectures. 

The fourth bubble is preparing. It's yet another form of planning, now the kind of planning that is so far from doing that done wrong it can be detrimental. Done well, it can be really helpful in supporting the team. The work of preparing often produces things that the team would consider outside the bubbles - just something to be aware of.


I have now modelled two sets of two weeks.

For the first, we visualized 8 to Do: we completed 4 and continue on 4. We visualized 4 to Plan: we continue on all of them, though 1 was never started. We visualized 1 to Preplan, and never started it. We visualized 11 to Prepare: we completed 3, continue on 4, and never started 4.

For the second, we visualized 14 to Do, 14 to Plan, 6 to Preplan, and 9 to Prepare.
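For those who want these tallies in a more structured form than prose, here is a sketch of how a period snapshot could be captured - the counts mirror the first period above (with my reading of the Plan numbers), and the shape is purely illustrative, not our actual board:

    // Snapshot of one two-week period per bubble; illustrative only.
    const bubbles = {
      do:      { completed: 4, continuing: 4, notStarted: 0 },
      plan:    { completed: 0, continuing: 3, notStarted: 1 },
      preplan: { completed: 0, continuing: 0, notStarted: 1 },
      prepare: { completed: 3, continuing: 4, notStarted: 4 },
    };

    for (const [bubble, t] of Object.entries(bubbles)) {
      const visualized = t.completed + t.continuing + t.notStarted;
      console.log(`${bubble}: ${visualized} visualized, ${t.completed} completed`);
    }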

What we have already learned is that we have people who hope to keep their responsibility, and their collaboration with the team on something they think they'd like to prepare (but too early), as work outside the bubbles, and who are not keen on sharing the team agenda. It will be fascinating to see if this helps us, but at least it is better than pretending Jira "epics" and "stories" are a useful way of modeling our work.

Wednesday, October 25, 2023

Anyone can test but...

Some years ago, I was pairing with a developer. We would often work on the code that needed creating, on the developer's home ground. So I invited us to try something different - work that was home ground to me.

There was a new feature the other developers had just called complete, that I had not had time to look at before. No better frame for cross-pollinating ideas, or so I thought.

Just minutes into our pairing session of exploring the feature, I asked out loud something about the sizes of things we were looking at. And before I knew it, the session went from learning what the feature might miss for the user to the developer showing me a bunch of tools to measure pixels and centimeters - after all, I had sort of asked about sizes of things.

By the end of the pairing session, we had done none of the work I would have done for the new feature. I was steamrolled and confused, even deflated. It took weeks or months, and co-delivering a conference talk on pairing, before I got to express that we *never* paired on my work.

I am recounting this memory today because I had a conversation with a new tester in a team with one of testerkind, and many of developerkind. The same feelings resonated: being the odd one out, being told (and in pairing, directed) to do the work the developer way. And the end result was an observation on how hard it is to figure out whether you are doing good work as a tester when everyone thinks they know testing, but you can clearly see the results gap showing that their behaviors and results don't match their idea of knowing testing so well.


I spoke about being the odd one out at TestBash in 2015. I have grown up as a tester within testing teams, learning from my peers, and only when I became the only one, with 18 developers as my colleagues, did I start to realize that I had team members and lovely colleagues, but not a single person who could direct my work. I could talk to my team members to find out what I did not need to do, but never what I needed to do. My work turned into finding some of what others may have missed. I think I would have struggled without those past colleagues who had grown me into what I had become. And I had become someone who reported (and got fixed) 2261 bugs over 2.5 years before I stopped reporting bugs and started pairing on fixes, to shift the lessons to where they matter the most.


For a better future, I hope two things: 
  1. Listen more, and make more space. Look for that true pairing where both contribute, and sit with the silent moments. Resist the temptation of showing off. You never know what you might learn.
  2. Appreciate that testing is not one thing, and it is very likely that the testing you do as a developer is not the same as what someone else could add to what you did. Be a better colleague and discover the results gap, resisting refining the newer person toward a different center of focus.

I sought my peers and support in random people of the internet. I blogged and I paired like at no other time when I was the odd one out. And random people of the internet showed up for me. They made me better, and they held me on a growth path.

If I can be your random people of the internet, holding you on a growth path, let me know. I'm ready to pay back what people's kindness has given me.

Anyone can test. Anyone can sing. But when you are paid to test as the core of your work week, the coverage and results can be different from what they would be if you needed to squeeze it in.




Wednesday, October 11, 2023

How many test cases do we still have to execute?

In a one-on-one conversation to understand where we are with our release and testing efforts, a manager asked me the infamous question that always makes me raise my eyebrows:

How many test cases do we still have to execute?

This question was irrelevant 20 years ago when it sent me fuming and explaining, and it is still irrelevant today. But today it made me think about how I want to change the frame of the conversation, and how I now manage to do that almost routinely.

I have two framings: 

  1. The 'transform for intent while hearing the question' framing
  2. The 'teach better ways of asking so that you get the info when not talking to me' framing
It would be lovely if I were a big enough person to always go with the first framing. I know from years of answering this question that the real question is one about schedule and how testing supports it. What they really ask me is twofold: a) how much work do we need to do to have testing done? b) can we scale it to make it faster?

This time, the answer I gave in the moment was in the first framing. I explained that since September 1st, we have made it, by my estimate, to about 40% of what we need in time and effort. I illustrated the actions we take as a team to ensure that fixing - which is the real problem - supports testing, and how the critical knowledge of the domain specialist tester is made more available by team members pitching in on other work the domain specialist could otherwise pick up. I reminded them that, based on a day of early sampling and years of profiling projects, my estimate was that we would have to find and log 150 bugs, and that we are at 72 reported now, which indicates we are about halfway through.

I recognised the question was really not about test cases. We sort of don't have them for the kind of testing the manager asked about. Or, if I want to claim we have them for compliance purposes, they are high-level ones, there are 21 of them, and only one of them is "passed".

The second framing is one that I choose occasionally, and today I chose it in hindsight. If I had had my statistics in place and in memory, I could have also said something akin to this:

We do two kinds of testing. Programmatic tests do probably 10% of our testing work, and we have 1699 of them. They are executed 20 to 50 times per working day, and are green (passing) on merge.


These 1699 test cases miss out on things the 21 high-level test cases allow us to identify. And with those 21 high-level test cases, we are about 40% into the work that is needed.

There are real risks to quality we have accrued in recent times, particularly around controlling our scope (even if they say scope does not creep but understanding grows, we have had a lot coming in as must-fix items that are improvements and grow the scope for the entire team), reliability, and performance. Assessing and managing these risks is the work that remains.

Other people may have other ways of answering this question - have you paid attention to what your framings are?



Tuesday, August 1, 2023

A NoJira Health Check

One of my favorite transformation moves these days is around the habits we have gathered around Jira, and I call the move I pull the NoJira experiment. We are in one, and have been for a while. What is fascinating, though, is that there is a lifecycle to it that I am observing, live, and I wanted to report some observations.

On October 5th, 2022 we had a conversation about a NoJira experiment. We had scheduled a Jira training for the whole team a few days later, which, with the team deciding on the experiment, we then promptly cancelled. The observation leading to the experiment was that I, then this team's tester, knew well what we worked on and what was about to be deployed with the merging of pull requests, but the Jira tasks were never up to date. I found myself cleaning them up to reflect status, and the number of clicks to move tasks through steps was driving me out of my mind.

Negotiating a change this major, and against the stream, I booked a conversation with product ownership, the manager of the team, and the director responsible for the product area. I listened to concerns and wrote down what the group thought:

  • Uncontrolled, uncoordinated work
  • Slower progress, confusion with priorities and requirements
  • Business case value gets forgotten

We have not been using Jira since. Not quite a year for this project, but approaching it.

First things got better. Now they are kind of messy and not better. So what changed?

First of all, I stopped being the team's tester and became the team's manager, allocating another tester. And I learned that I have ideas of "common sense" that are not so common without good conversations to change the well-trained idea of "just assign me my ticket to work on".

The first idea that I did not know how to emphasize is that communication happens in 1:1 conversations. I have a learned habit of calling people to demo a problem, and stretching to fix it within the call. For quality topics, I held together a network for addressing these things, and writing a small note on "errors on console" was mostly enough for people to know who would fix it, without assigning it further. It's not that I wrote bug reports on a Confluence page instead of Jira. I did not just report the bugs; I collaborated on getting them fixed by knowing who to talk to, and by figuring out how we could talk less in big coordinating meetings.

The second idea that I did not know to emphasize enough is that we work with a zero bugs policy and stop the line when we collect a few bugs. So we finish the change, including the unfinished work (my preferred term for bugs), and we rely on fix-and-forget - with test automation being improved while fixing identified problems.

The third idea I missed in elaborating is that information prioritisation is as important as discovery. If you find 10 things to fix, even with a zero bugs policy, you need a gazillion heuristics for which conversations to have first, over throwing out a pile of information that suffocates your poor developer colleagues. It matters what you get fixed and how much waste that creates.

The fourth idea I missed is that the product owner's expectations of scope need to be anchored. If we don't have a baseline of what was in scope and what was not, we are up for disappointments about how little we got done compared to what the wishes may be. We cannot neglect turning the abstract backlog item on the product owner's external communications list into a more concretely scoped listing.

The fifth idea, to conclude my listing in reflection, is that you have to understand PULL. Pull is a principle, and if it is a principle you never really worked on, it is kind of hard. A tester can pull fixes towards better quality. Before we pull new work, we must finish current work. I underestimated the amount of built-up habit in thinking pull means taking tasks someone else built for your imagined silo, over figuring out together how we move the work effectively through our system.
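If pull as a principle feels abstract, here is a minimal sketch of its mechanics - the item names and the limit are made up for illustration:

    // Pull in miniature: new work is taken only when capacity frees up,
    // never pushed or pre-assigned. All names are made up.
    const WIP_LIMIT = 3;
    const backlog = ["console errors", "export feature", "flaky test"];
    const inProgress = new Set();

    function pullNext() {
      if (inProgress.size >= WIP_LIMIT) {
        console.log("At WIP limit: finish current work before pulling new.");
        return;
      }
      const item = backlog.shift();
      if (item) inProgress.add(item);
    }

    function finish(item) {
      inProgress.delete(item); // finishing is what creates room to pull
      pullNext();
    }

    pullNext(); // anyone free pulls the top item; nobody assigns it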

For the first six months, we were doing OK on these ideas. But it is clear that doubling the size of the team without a good and solid approach to rebuilding, or even communicating, the culture that the past practice and its success were built on does not provide the best of results. And I might be more comfortable applying my "common sense" when the results don't look right to me in the testing space than those coming after me are.

The original concerns have still not come true - the work is coordinated, and it progresses. It is just that nothing is done, and now we have some bugs to fix that we buried in a Confluence page we started to use like Jira - for assigning, conversing, and listing repro steps - over sticking with the idea of truly working together, not just side by side.





Saturday, July 29, 2023

Public Speaking Closure

A very long time ago, like three decades ago, I applied for a job I wanted in the student union. Most of the work in that space is volunteer-based and not paid, but there are a few key positions that hold the organization together, and the position I aspired to was one of those.

As part of the application process, I had to tell about myself in a large lecture hall, with some tens of people in the audience. I felt physically sick, remember nothing of what came out of my mouth, had my legs so shaky I had a hard time standing, and probably managed to just about say my name from that stage. It became obvious, if I had not given it credit before: my fear of public speaking was going to be a blocker on things I might want to do.

I did not get the job, nor do I know whether it was because of how awful my public speaking was, my absolute inability to tell jokes intentionally, or some of many other good reasons, but I can still remember how I felt on that stage that day. I can also remember the spite that drove my actions since then, and over time gave a sense of purpose to many efforts to get educated and to practice the skills related to public speaking.

Somewhere along the line, I got over the fear and had done talks and trainings in the hundreds. Then one year's EuroSTAR program chair decided to reflect on why he chose no women to keynote, quoting "no women of merit", and I found spite-driven development to become a woman of merit to keynote. I did my second EuroSTAR keynote this June.


Over the years of speaking, I learned that the motivations or personal benefits to a speaker getting on those stages are as diverse as the speakers. I was first motivated by a personal development need, getting over a fear that would impact my future. Then I was motivated by learning from people who would come and discuss topics I was speaking about, as I still suffer from a particular brand of social anxiety around small talk and opening conversations with strangers. But as I collected status, I lost a lot of that value, with people thinking they needed something really special to start a conversation with me. I travelled the world experiencing panic attacks, and sometimes felt the loneliest in big crowds and conferences, among all those extroverts without my limitations. In recent years, I find I have been speaking because I know my lessons from doing this - not consulting on this - are relevant, and speaking gives me a reflection mechanism on how things are changing as I learn. However, it has been a while since getting on that stage has felt like a worthwhile investment.

Recently, I have been speaking out of habit and policy. I don't submit talks to calls for proposals, and I have used that policy to illustrate that not having women speakers is the conference organizers' choice, as many people like myself will say yes to an invitation. The wasteful practice of preparing talks when we really should be collaborating is something I still feel strongly about today. The money from speaking on stages isn't relevant; the money from trainings is. In the last three years with Vaisala, I have had all the teams in the whole company to consult at will and availability, and I have not even wanted to make time for all the other organizations, even though I still have a side business permission. I just love the work I have there, where I have effectively already had four different positions within one due to the flexibility they built for me. Being away to travel to a conference feels a lot more like a stretch and a significant investment that is at odds with other things I want to do.

The final straw for changing my policy came from EuroSTAR feedback. In a group of my peers giving anonymous feedback, someone chose to give me feedback beyond the talk I delivered. I am OK with feedback that my talk today was shit. But the feedback did not stop there. It also made a point that all of my talks are shit, and that the feedback giver could not understand why people would let me on a stage. That was hateful, and we call people like this trolls when they hide behind anonymity. However, this troll is a colleague, a fellow test conference goer.

Reflecting on my boundaries and my aspirations, I decided: I retire from public speaking, delivering the talks I have already committed to but changing my default response to invitations to No. I have given 9 No responses since the resolution, and I expect to be asked less now that I have announced my unavailability.

It frees time for other things. And it tells all of you who would have wanted to learn from me that you need to rein in the trolls sitting in you and amongst you. I sit in my power of choice, and quit with 527 sessions, adding only paid trainings to my list of delivered things for now.

I'm very lucky and privileged to be able to choose this, as speaking has never been my work. It was always something I did because I felt there are people who would want to learn from what I had to offer from a stage. Now I have more time for 1:1 conversations where we both learn.

I will be doing more collaborative learning and benchmarking. More writing. Coding an app and maybe starting a business on it. Writing whenever I feel like it, to give access to my experiences. Hanging out with people who are not trolls to me, working to have fewer trolls by removing structures that enable them. Swimming and dancing. Something Selenium-related in the community. The list is long, and it's not my loss to not get on those stages - it's a loss for some people in the audiences. I am already a woman of merit, and there are plenty more of us to fill the keynote stages for me to be proud of.