Saturday, May 25, 2024

To 2004 and Back

The year 2004 was 20 years ago. Today I want to write about that year, for a few reasons. 

  • It's the first year for which I have systematically saved my talks on a drive
  • It's the year when I started doing joint talks with Erkki Pöyhönen and learned to keep materials accessible by using the Creative Commons Attribution license
  • I have just finished taking my 2004 into my digital legacy project, meaning my slides from that year are no longer on my drive but available under the Creative Commons Attribution license on GitHub.  

To put 2004 in context, I was a fairly new senior tester then. I became a senior because I became a consultant. There is no chance in hell I was actually a senior then. But I did wonders with the appreciation and encouragement of the title. 


My topics in 2004 were Test automation, Test Planning and Strategies, Test Process Improvement, and Agile Testing. Most of my material back then was in Finnish. I believed that we learn best in our native language. I was drawing from what was discussed in the world that I was aware of, and clearly I was aware of Cem Kaner. 

Looking at my materials from then, I notice a few things:

  • Most of my effort went into explaining to people here what people elsewhere say about testing. I saw my role as a speaker as someone who would help navigate the body of knowledge, not someone creating that body of knowledge. 
  • The materials are still relevant, and that is concerning. It's concerning because the same knowledge needs still exist and not enough has changed. 
  • I would not deliver any of the talks that I did then with the knowledge and understanding I have now. My talk on test automation was on setting up a project around it, which I now believe is more of a way of avoiding it than doing it. Many people do it with a project like that, and it's a source of why they fail. My talks on concepts in planning and process make sense but carry very little practical relevance to anything other than acquisition of vocabulary. 
  • My real work reflected in those slides is attributed to ConformiQ, my then employer. I did significant work going through Finnish companies, assessing their testing processes, and creating a benchmark. That data would be a really interesting reflection of what testing looked like in tens of companies in Finland back then, but it's proprietary, except for the memories I hold that have since impacted what I teach (and learn more on). 

I talked about Will Testing Be Automated in the Future, and while I did not even answer the question in writing in the first version of the talk, the second one came with my words written down:

  • Automation will play a role in testing, especially when we see automation as something applicable wider than in controlling the program and asserting things. 
  • Only when it is useful - moving from marketing promises to real substantial benefits
  • Automation is part of good testing strategy
  • Manual and automated aren't two ways of executing the same process, but it's a transformation of the process

None of this is untrue, but it is also missing the point. It's what I, knowing now what I did not know then, call feigned positivism. Words that sound like support but are really a positive way of being automation-averse. It's easy to recognize feigned positivism when you have changed your mind, but it's harder to spot live. 

Being able to go back and reflect is why I started this blog back in the day. I expected to be wrong, a lot. And I expected that practicing being wrong is a way of reinforcing a learning mindset. 

That one still holds. 2004 was fun, but it was only one beginning. 



Friday, May 17, 2024

Programming is for the juniors

It's the time of the year when I learn the most, because I have responsibility over a summer trainee. With four years at Vaisala, that's four summer trainees, and my commitment to follow through with the summer of success means I will coach and guide even when I will soon be employed by another company. With a bit of guidance, we bring out the remarkable in people, and there will not be seniors if we don't grow them from juniors. Because, let's face it, it's not like they come ready to do the work from *any* of the schools and training institutions. Work, and growing in the tasks and responsibilities you can take up, that is where you learn. And then teaching others is what solidifies what you learned. 

This summer, I took special care in setting up a position that was a true junior one. While it might be something that someone with more experience would get done faster, the junior pay allows for them to struggle, to learn, to grow, and still contribute. The work we do is one where success is most likely defined by *not leaving code behind for others to maintain* in any relevant scale, but by reusing the good things the open source community has to offer us in the space of open source compliance, licenses and vulnerabilities. I have dubbed this the "summer of compliance", perhaps because I'm the person who giggles my way through a lovely talk on malicious compliance, considering something akin to that a real solution when corporate approaches to compliance make us forget why these things matter in the first place. 

If the likely success is defined by leaving no code, why did I insist on searching for a programmer for this position? Why did I not allow people to do things the manual way, but instead set the expectation from the start that we write code?

In the world of concepts and abstractions, code is concrete. Code grounds our conversation. If you as a developer disagree with your code, your code in execution wins that argument. Even when we love saying things like "it's not supposed to do THAT". Watch the behavior and you're not discussing a theory but something you have. 

A screenshot of a post by Maaret Pyhäjärvi (@maaretp@mas.to), posted on May 16, 2024: "You are counting OS components, show me the code" => "Your code is counting OSS components but it's a one line change". This is why I don't let trainees do *manual* analysis but I force them to write scripts for the analysis. Code is documentation. It's a baseline. It is concrete. And it helps people of very different knowledge levels to communicate.

Sometimes the mistakes caught when reviewing the detailed steps of the work reveal things like this: a subtle difference between OS (operating system) and OSS (open source software) for someone who is entirely new to this space. The effort put into writing the wrong code was a small step to fix, whereas a similar mistake in manually done analysis would have wasted far more effort. 
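A minimal sketch of the kind of analysis script I push for, assuming a CycloneDX-style SBOM JSON with a `components` list; the field names and component names here are my illustration, not our actual tooling:

```python
import json
from collections import Counter

def count_licenses(sbom_text):
    """Count OSS components per license id in an SBOM-like JSON document."""
    sbom = json.loads(sbom_text)
    counts = Counter()
    for component in sbom.get("components", []):
        # Fall back to a marker so unlicensed components stay visible.
        counts[component.get("license", "UNKNOWN")] += 1
    return counts

sample = json.dumps({"components": [
    {"name": "requests", "license": "Apache-2.0"},
    {"name": "left-pad", "license": "MIT"},
    {"name": "mystery-lib"},
]})
print(count_licenses(sample))  # Apache-2.0, MIT and UNKNOWN each counted once
```

The point is less the counting and more that the script is reviewable: an OS/OSS kind of mistake shows up as a one-line diff instead of a redo of manual work.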

Watching the conversation unfold, I recognize we have already moved through three bumps on our road of the programming way. 

The first one was the argument of "I could learn this first by doing it manually". You can learn it by doing it with programming too, and I still remember how easy it is to let that bump ahead become the excuse that keeps you on the path of *one more manually done adjustment*, never reaching the scale. With our little analysis work, the first step of analyzing components on one image felt like we could have done it manually. But getting 20 done in 5 minutes of work, that we got from pushing ourselves through the automation bump. 


We also ran into two other bumps on our road. The next one, with scale, is the realization that what works for 5 may not work for 6, 11, 12, 17, and 20. The thing you built that works fine for one, even for a second, will teach you surprising things at scale. Not because of performance. But because the generated data you are relying on is not always the same. For the exceptions bump, you extend the logic to deal with the real world outside what you immediately imagined. 

Testers love the exceptions bump so much that they often get stuck on it, and miss more relevant aspects of quality. 

The third bump of our week is the quality bump, and oh my, the conversation on that one, that's not an easy one. So you run a scanner, and it detects X. You run a second scanner, and it does not detect X. Which one is correct? It could be correct as is (different tools, different expectations), but it could also be a false positive (it wasn't there for anyone to find anyway) or a false negative (it should be found). What if there is a Y to detect that no scanner finds, how would you know that then? And in the end, how far do you need to go in the quality of your component identification to say that you did enough to no longer be considered legally liable? 
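To make the scanner disagreement concrete, a sketch of the comparison, with made-up component names and a generic set-difference approach rather than any specific scanner's output format:

```python
def diff_findings(scanner_a, scanner_b):
    """Split two scanners' findings into agreed and disputed sets.

    Anything only one scanner reports is a candidate false positive
    (for the reporter) or false negative (for the silent one) --
    a human still has to decide which.
    """
    a, b = set(scanner_a), set(scanner_b)
    return {
        "agreed": a & b,
        "only_a": a - b,
        "only_b": b - a,
    }

result = diff_findings(
    ["zlib", "openssl", "libcurl"],
    ["zlib", "openssl", "busybox"],
)
print(sorted(result["agreed"]))                        # ['openssl', 'zlib']
print(sorted(result["only_a"]), sorted(result["only_b"]))  # ['libcurl'] ['busybox']
```

Note what this cannot answer: the Y that neither scanner finds stays invisible to any diff of their outputs.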

The realizations that popular scanners have significant percentage differences in what they find, and that if legal were an attack vector against your company, your tools' quality would matter for it - that's definitely sinking into the deep end with the quality bump. 

I am sure our summer trainee did not expect how big a part of the work asking why and what then is. But I'm sure the experience differs from a young person's first job in the service industry. And it makes me think of those times for myself nostalgically, appreciating that there's an intersection where our paths cross through time. 

Back to the click-bait-y title - programming really is for the juniors. It is for us all, and we would do well in not letting the bumpy road stop us from ever going that way. Your first hello world is programming, and there's a lifetime of learning still available. 


Wednesday, May 15, 2024

Done with Making These Releases Routine

I blogged earlier on the experience of task analysis of what we do with releases, and shared that work for others to reflect on. https://visible-quality.blogspot.com/2024/02/making-releases-routine.html

Since then, we have moved from the 2024.3 release to the 2024.7 release, and achieved routine. Which is lovely in the sense of me changing jobs, which then tests whether the routine can hold without me present. 

In the earlier analysis, I did two things. 

  1. Categorized the actions
  2. Proposed actions of low value to drop

Since then, I have learned that some of the things I would have dropped can't be, as they are so built in. Others that I would keep around have so little routine / traction from anyone other than me, and since they are compliance oriented, I could drop them too without impacting the end result. 

Taking a snapshot of the changes I have experimented through: for the 2024.3 release I chose to focus on building up two areas of competence. I coached the team's tester a lot on continuous system testing, and learned that many of the responsibilities we allocate to continuous system testing are actually patching of product ownership in terms of creating a compliance track record. 


The image of the same categories but the actual work done for 2024.3 versus 2024.7 shows the change. No more cleaning up other people's mess in Jira. No more manual compliance to insufficiently managed "requirements" to show a systematic approach that only exists with significant extra patching. The automation run is the tests done, and anything else, while invisible, is welcome without a track of it. Use time on doing testing over creating compliance records. 

While the same structure shows how much work one can throw out of a traditional release process, restructuring what remains does a better job describing the change. The thing we keep is the idea that master is ready for release any time, and developers test it. Nothing manual must go in between. This has been the key change that testers have struggled with on continuous releases. There is no space for their testing, and there is all the space for their testing. 


In the end, creating a release candidate and seeing it got deployed could be fully automatic. Anything release compliance related can be done in continuous fashion, chosen to be dropped or at least heavily lightened, and essentially driven by better backlog management. 

With this release, after 30 minutes I called my work done. From 4 months of "releasing" to 30 minutes. 

If only the release would reach further than our demo environment, I would not feel as much like I wasted a few years of my life as I am moving to a new job. But some things are still outside my scope of power and influence. 


Monday, May 13, 2024

Using Vocabulary to Make Things Special

Whenever I talk to new testers about why they think the ISTQB Foundation course was worth their time, one theme above all is raised: it teaches the lingo. Lingo, noun, informal - often humorous: a foreign language or local dialect; the vocabulary or jargon of a particular subject or group of people. Lingo is a rite of passage. If your words are wrong, you don't belong. 

And my oh my, do we love our lingo in testing. We identify the subcultures based on the lingo, and on our attitude towards the lingo. We correct each other in more and less subtle ways. To a level that is sometimes ridiculous.

How dare you call a test a developer wrote a unit test if it is a unit-in-integration test, not a unit-in-isolation test? Don't you know that contract testing is essentially different from API testing? Even worse, how can you call something test automation when it does not do testing but checking? Some days it feels like these are the only conversations we have going on. 

At the same time, folks such as me come up with new words to explain that things are different. My personal favorites of the words I am inflicting on the world are contemporary exploratory testing, to bring exploratory testing both back to its roots and modernize it with current understanding of how we build software, and ensemble testing, because I just could not call it mob testing and be stuck in the forever loop of "do you know mobbing means attacking". Yes, I know. And I know that grooming is a word I would prefer not to use for backlog refinement, and that I don't want to talk about a black-white continuum as something where one is considered good and the other bad, and thus I talk about allowlisting and blocklisting, and I run Jenkins nodes, not slaves. 

Lingo, however, is a power move. And one of those exploratory testing power moves with lingo is the word sessions. I called it a "fancy word" only to realize that people who did some work in the exploratory testing space back in the days clearly read what I write enough to show up to correct me. 

Really, let's think about the word sessions. What things other than testing do we do in sessions? What are the words we more commonly use, if sessions aren't it? 

We talk about work days. We talk about tasks. We talk about tasks that you do until you're done, and tasks you do until you run out of time. We talk about time-boxing. We talk about budgeting time. We talk about focus. But we don't end up talking about sessions. 

In our conversations on words, I watched a conversation between Emily Bache and Dave Farley on software engineering and software craftership. It was a good conversation, and it created nice common ground in wanting to identify more with the engineering disciplines. It was a good conversation to watch because while on a shallow level it was about the right term to use, it was really talking about cultures and beliefs. The words used around those words were valuable to me, inspirational, and thought provoking. 

We use words to compare and contrast, to belong, to communicate and to hopefully really hear what the others have to say. And we can do that centering curiosity. 

Whatever words you use, use them. More words tend to help. 

Monday, April 22, 2024

Learning the hard way, experience

Over my career, I have delivered my fair share of good results in testing. There are two signature moves I wanted to write a post on, and what I learned while failing at both of them after I first thought I had them pocketed. 

These two signature moves are changes I have been driving through successfully over multiple organizations and teams: 

  1. Frequent releases
  2. Resultful contemporary exploratory testing with good automation at heart of it

Frequent releases

Turning up the release frequency is an improvement initiative I have now done for over 10 different teams and products, and for four organizations. And of the products/teams/orgs I have done this at, only one is the optimal case of a web application with control over its distribution channels. I've been through this the hard way - removing reboots from upgrade cycles, integrating processes that were designed for non-continuous work such as localization and legal, including telemetry to know if we leave customers hanging on versions - all of that is an article of its own.

Succeeding with this is not the interesting part, except for the latest success of my current team making releases routine after dropping the ball and suffering through a 4-month stabilization period last year. I dare to call it a success again since we made our 5th release of this year last week, and included the best scope and process we have had in it. 

Why did we fail with a stabilization phase then? We can surely learn something from it. 

The experiences point us to failing at communicating and collaborating with a non-technical product owner. How can that create a four-month stabilization phase? Leaking uncertainty, and pushing uncertainty to a time when it piles up. We leaked uncertainty in functional scope, but also in parafunctional scope, and when pushed to address performance and reliability, we found functional scope we had not recognized existed. 

Breaking the main branch and not realizing how much we had broken it (automation was still passing!) cost us extra stress and uncertainty. We would look at tasks as "I did what you asked", not as "We together did what you did not know to ask". 

And when we tested, we failed at communicating our results effectively. New people, new troubles, and they accumulated quickly when we did not want to stop the line to fix before moving forward. We optimized for keeping people busy, over getting things done to a shape where they could be released.  

Having my team fail with my signature move taught me that there is much more of implicit knowledge on continuous timely testing than what we capture with automation. Designing for feedback, and making it timely takes learning, and this year we have been learning that with a short leash of releases.

Resultful contemporary exploratory testing with good automation at heart of it

Framing test automation as worthwhile notetaking mechanism while exploring has become so much of a signature move of mine, that I have given it a label of its own: contemporary exploratory testing. When we explore, some of it is attended and some unattended. Test automation, when failing, calls for us to attend. Usually pretty much continuously. 

Working with developers, we have learned to capture our notes as lovely English sentences describing the product capabilities in some tests; as developer intent captured in unit tests in others; as component tests, as subsystem tests, and as system-of-systems tests when those make sense. We have 2500 tests we work with on the new tech stack side, some hand-crafted, others model-based generated tests we use as reliability tests, allowing longer unique flows to be generated and tracked.

We learn about information we have been missing, and we extend our tests as we fix the problems. We extend the tests with new capabilities. How can all this successful routine be a failure?

It turns out there was a difficult conversation we did not have on time, one of validation. And when a problem is that you are building your product so that it does not address the user needs, none of the other tests you may have built matter. Instead of being a help, they become friction where you will explain how changing those all will be work, at a time when work is less welcome. 

This failure comes from two primary sources: carefulness in cross-organizational communication, due to project setup reasons, delaying the learning, but as much as that, the unavailability of more senior test/dev folks on a previous project delivery. 


Both failures are successes if we learned and can apply what we learned going forward. For me they taught that a career change is in order: I don't want to find myself spread so thin that I don't recognize these expensive mistakes happening. So I will be a tester again, fully. Well, surrounded by testers, as director, test consulting, with a hands-on consulting role starting in June. Looking forward to new mistakes, and to correcting the mistakes I make. Because: mistakes were made, and by me. Only taking ownership allows us to move forward and learn. 



Sunday, March 17, 2024

Urban Legends, Fact Checking and Speaking in Conferences

I consume a lot of material in the form of conference talks, and I know exactly the moment when conference talks changed for me forever. It was the Scan Agile conference in Helsinki many years ago, and I had just listened to a talk by an American speaker. I enjoyed their experience as told from the stage so much that I shared what I had learned with my family. Only to learn the story was fabricated. 

In one go, I became suspicious of all stories told from stage. I started recognizing that my stories are lies too; they are me-sided retellings of actual events. While I don't actively go and fabricate them, those who do will tell me that this is how human experience and memory work anyway. And the responsibility of taking everything with a grain of salt is on the consumer. 

The stage seeks memorable, and impactful. And if that includes creating urban legends, it bothers some popular speakers less than it bothers me. 

With that mindset, I still enjoy conference talks, but they cause me the extra work of fact-checking. I may go and have a conversation to link a personal story to a reference. But more often, it's the other people's stories they share that I end up searching online. 

In the last two weeks of conferences, I have been fact-checking two stories told from stage. They were told by my fellow testing professionals. And evidence points to word of mouth urban legends on AI. 

The first story was about some people sticking tiny traffic signs around to fool machine-vision-based models. The technique is called an adversarial attack. There are articles on how small changes with stickers mess up recognition models. There are ideas that this could be done. But with my searches, I did not find conclusive evidence that someone actually did this in production, in live environments, risking other people's health. I found the story dangerous as it came without warnings against testing this in production, or about the liability of doing something like this. 

In addition to searching online for this, I asked a colleague at work with experience of scanning millions of kilometers of imagery of roads where this could be happening. It just so happens I have worked very close to a product with machine vision, and could assess whether those in the problem space knew of this. I was unable to confirm the story. The evidence points to misdirections from stickers on buses, but not tiny stickers a computer sees but a human doesn't. 


The story was told to about 50 people. Those 50 people, trusting the presenter on the stage, would need to apply their fact-checking skills before they do what I do now: tell the story forward as it was told, with their flavor of it. 

The second story was told by two separate presenters. It was about someone getting a binding contract on company letterhead for buying a car for $1, where the justice system is still out deciding whether the contract must be upheld. One presenter showed it as a reimagined chat conversation translated to Finnish, but made no claims about letterhead or legal mitigation, as the more colorful version was already out there, presented to this audience. 

Fact-checking suggests that this is something that someone, specifically Chris Bakke, asked of a ChatGPT-based car sales bot: to give an offer for a 2024 Chevy Tahoe with "no takesies backsies", far from an official offer on company letterhead. 

So no binding contract. No company letterhead. No car bought for this price. No litigation pending. 

The third story told was a personal one. You can find it as a practical example of analyzing a screenshot for bugs. It suggested ChatGPT-4 does better at recognizing bugs from an image than people do. While it may not be entirely incorrect in the scope of that individual picture, the danger of this story is that the example used in the picture is a testing classic. Myers points out 7.8/14 is what a typical developer scored in 1979, and there are more detailed listings around in literature that show we do even worse in analyzing it and that the result is technology-dependent. 


Someone else at the conference also suggested we should not read books but ask for summaries from ChatGPT, completely missing a frame of reference on how well the model would then do compared to a human who has read many of these references. Reading less would not help us. 

Starting an urban legend, especially now that people are keen on hearing stories of good and bad in the realm of AI, is easy. Tell a story, they tell it forward. Educational and entertainment value over facts. 

So let's finish with a recognition of what I just did with this blog post. I told you four stories. I provided you no names of the people leaving impact significant enough in me to take the time to write this. It's up to you to assess, filter, and fact check and choose which stories become part of what you tell forward. Most information we have available is folklore. There are no absolutes. But there is harm in telling stories from stage, and I would think speakers would need to start stepping up to that responsibility. 

Then again, it's theater. Entertainment with a sprinkle of education. 

Wednesday, March 6, 2024

A sample of attended testing

Today, prompted by day 6 of 30 days of AI testing, I tried a tool: Testar. My reasons for giving it a go are many: 

  • Tanja Vos as project leader for the research that generated this would get my attention
  • I set up a research project at a previous employer on AI in / for testing, and some generation of this tool was one of that project's outcomes
  • The day 6 challenge said I should
  • Open source over commercial for hobbyist learning attention all the way

I read the code, read the website, and tried the tool. The tool did not crash but survived over an hour "testing" our software with the standards of testing I learned some large software contractors still expect from testers, so I could say it did well. I would not go as far as "beat manual testing" like the videos on the site did. 



The tool was not the point for me though. While the tool ran, clicking through 50 scenarios with 50 actions each and what seems to be a lot more than 50 clicks, I intertwined unattended testing (the tool doing what the tool does) and attended testing, watching a user interface the tool did not watch for fascinating patterns. 

In my test now I have two systems that depend on each other, and two ways of integrating the two systems for a sample value of pressure. And I have a view that allows me to look at the two ways side by side. 

As Testar was doing its best to eventually conclude "Test verdict for this sequence: No problem detected", I was watching the impact of running a tool such as this on the integration with the second system. 

I noted some patterns: 
  • The values on the left were blinking between no data available and expected values, whereas the values on the right were not. I would expect blinking. 
  • The values on the left were changed with delay compared to the values on the right. I would expect no difference in time of availability. 
  • The bling-sounds of trying something with a warning were connected to the patterns I was observing, and directed me to make visual comparisons sampled across the hour of Testar running. 

This is yet another example of why there is no *manual testing* and *automated testing*. With contemporary exploratory testing, this is a sample of attended testing with relevant results, while simultaneously doing unattended testing with no problem detected. 

This was not the first time we used generated test cases, and the types of programmatic oracles general enough to note a crash, hang, or error we have had around for quite a while - ever since we realized it's about programmatic tests, not automated testing. Programmatic tests provide some of their greatest value in attended modes of running them. 
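A general-purpose oracle of that kind can be surprisingly small. A sketch with a stand-in `app` object, since the real hooks depend on your GUI driver; none of the names below come from Testar itself:

```python
import random

class FakeApp:
    """Stand-in for a real application handle; a real GUI driver goes here."""
    def __init__(self):
        self.alive = True
        self.error_dialogs = []
    def actions(self):
        return ["click_ok", "open_menu", "type_text"]
    def perform(self, action):
        pass  # a real driver would act on the UI here

def run_unattended(app, steps=50, seed=42):
    """Random walk over available actions with crash/hang/error oracles."""
    rng = random.Random(seed)
    for _ in range(steps):
        app.perform(rng.choice(app.actions()))
        if not app.alive:
            return "crash detected"
        if app.error_dialogs:
            return "error dialog: " + app.error_dialogs[0]
    return "No problem detected"

print(run_unattended(FakeApp()))  # No problem detected
```

The oracle only knows crash, hang, and error; everything I noticed about blinking and delayed values above was the attended half that no such loop reports.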

For me as a contemporary exploratory tester, automation gives four things: 

Running Testar, I now have 50 scenarios of clicking with 50 actions each, with screenshots, on my disk. That alone serves as documenting in ways I could theoretically benefit from if anyone asked me how we tested things. It would most definitely replace the testing that I cut from a 30-day investment to a 2-day investment last year, and replace one tester. 

Testar gave me quick clicking on one system while my real interest was on the other. Like with many forms of automation, time and numbers are how we extend reach. 

Testar did not tell me to look at things, but our application's sound alerting did. Looking at the screenshots, I saw a state I would not expect, and went back to investigate it manually. That too was helpful, sampling what I care to attend to. 

The last piece is guiding to detail, which I usually get when I don't rely on auto-clickers but actually have to understand the application to write a programmatic test. 



Tuesday, March 5, 2024

A Bunnycode Case Study for AI in Testing

It's day 5 of 30 days of AI testing, and they ask for reading a case study or sharing your experience. I shared an experience already on an earlier day, so on the whim of a moment, I set up a teaching example. 

I google for obfuscated code in Python to find https://pyobfusc.com/. I'm drawn to the most reproducible one, authored by mindiell, and when I see the code, I'm sold. How would you test this? 

Pretty little rabbit, right? Reminds me of reading some code at work; work is just less intentional with its obfuscation. And I really do not have the time or energy to read that. I could test it as a black box, learning that given a number as a parameter, it gives me a number:


As if I didn’t know what the rabbit implements or recognize the pattern in the output, I was thinking of just asking ChatGPT about it. However, I did not get that far. 

Instead, I wrote def function(): in my IDE while GitHub Copilot was on, thinking I would wrap the program into a function. It reformatted it into something a bit more readable. 

Prompting some more in the context of code. 

Comment line “#This function” proposes “is obfuscated”. Duh. 

Comment line “#This function imp” proposes “lements the Fibonacci sequence using Binet’s formula.”

At this point, I ask ChatGPT how to test a function that implements the Fibonacci sequence using Binet’s formula. I get a long text saying to try values I already tried, but in code, and a hint to consider edge cases and performance. I try a few formats to ask for a value that would make a good performance benchmark, and lose patience. 

I google for performance benchmarks to learn that Binet’s formula is much faster than the recursive algorithm, and find performance benchmarks comparing the two. 
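My reconstruction of what the rabbit presumably computes (not mindiell's actual code): Binet's formula in floating point, checked against the plain iterative definition as an oracle. Float precision is the edge case ChatGPT's generic advice skirted around - the closed form drifts from the true values somewhere around n = 70 on typical doubles:

```python
import math

def fib_binet(n):
    """Fibonacci via Binet's closed form; float-exact only for small n."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)

def fib_iterative(n):
    """Reference oracle: the plain iterative definition."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Binet agrees with the oracle for small n...
assert all(fib_binet(n) == fib_iterative(n) for n in range(60))
# ...and float precision makes it drift somewhere around n = 70.
first_drift = next(n for n in range(60, 200) if fib_binet(n) != fib_iterative(n))
print(first_drift)
```

An oracle like this is also how I would have tested the rabbit as a black box: same inputs, two independent implementations, diff the outputs.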

I think of finalizing my work today by inserting the bunny code into ChatGPT and asking “what algorithm does this use” to get a second language model to generate the likely answer of Binet’s formula. Given the risk and importance of this testing at this time, I conclude it’s time to close my case. 

There are so many uses for figuring out what it is I am testing (while being aware of what I can share with tool vendors when giving access to code), and this serves as a simulation of the idea that you could ask about the pull request. This was the case I wanted to add to the world today.

I should write a real case study. After all, that was one of the accommodations we agreed on with multiple levels of management above me when my team at work started its GitHub Copilot tryouts some 6 months ago. I should publish this work in action, in real time. As soon as something generates me some time. 

Sunday, March 3, 2024

List Ways in Which AI is Used In Testing

I have now been through 3 out of 30 days of AI in Testing by Ministry of Testing. I was awarded my fifth Anniversary Badge, meaning that I have not shown up in that community for a while. 

The first day was an introduction. The second day was to read an introductory article. The third day asked me to list ways AI is used in testing. As with blogging, I filled the page and wanted to leave the notes from personal experience behind as a blog post. 

Practical applications, personal reflection rather than research:

Explaining code. Especially on a particularly tired day, while being aware that I cannot share secrets, I like to ask ChatGPT to explain to me what changes with the code in some pull request, so that I understand what to test. Answers vary from useful to hilarious, and overly extensive.

Test ideas for a feature. Whenever I have completed an analysis of a feature to brainstorm my ideas, I tend to ask how ChatGPT would recommend testing it. Works nicely on domain concepts that are not specific to this company only, and I have a lot of those with standards and weather phenomena.

Manipulating statistics. I seem to be bad at remembering Excel formulas, but I do a lot of cross-referencing of test-generated results in Excel. ChatGPT has been most helpful with formulas to manipulate masses of data in Excel.

Generating input/output data. Especially with Copilot, I get data values for parameterized tests. Same test, multiple generated inputs and outputs. More effort goes into reviewing whether I like them and find them useful.
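A sketch of what that looks like in practice. The function and the value table here are hypothetical, in the spirit of the weather domain mentioned above:

```python
# Hypothetical function under test.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# One test, many generated input/output pairs. Generating the table is
# the cheap part; reviewing each pair is where the human effort goes.
CASES = [
    (0, 32.0),
    (100, 212.0),
    (-40, -40.0),   # the crossover point, a classic edge case
    (37, 98.6),
]

for celsius, expected in CASES:
    assert abs(celsius_to_fahrenheit(celsius) - expected) < 1e-9
```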

Generating (manual) test cases. I have seen multiple tools do this, and I hate hate hate it. I always turn off steps from test cases and write down only core insights I would want the future me to remember in 3 months.

Generating programmatic tests. Copilot does well with these at the unit testing level, but I am not sure I would want all that stuff available. Sometimes it helps in capturing intent. But I prefer approvals of inputs and outputs over handcrafted scripts anyway for unit-level exploratory testing.

Generating tests based on models. Has nothing to do with AI, but is a pre-AI practice of avoiding writing scripts and working with more maintainable state-based models. Love this, especially for long running reliability testing.
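The model-based idea can be illustrated with a toy example. The three-state model below is made up; the point is that the test sequence is generated by walking the model rather than scripted by hand:

```python
import random

# Made-up model: each state maps to the actions allowed in it...
ACTIONS = {
    "logged_out": ["log_in"],
    "logged_in": ["open_report", "log_out"],
    "report_open": ["close_report", "log_out"],
}

# ...and each (state, action) pair maps to the next state.
NEXT_STATE = {
    ("logged_out", "log_in"): "logged_in",
    ("logged_in", "open_report"): "report_open",
    ("logged_in", "log_out"): "logged_out",
    ("report_open", "close_report"): "logged_in",
    ("report_open", "log_out"): "logged_out",
}

def random_walk(steps, seed=1):
    """Generate one long test sequence by walking the model."""
    rng = random.Random(seed)
    state, path = "logged_out", []
    for _ in range(steps):
        action = rng.choice(ACTIONS[state])
        # In a real harness the action would also drive the system under
        # test, and an oracle would compare its state to `state`.
        state = NEXT_STATE[(state, action)]
        path.append(action)
    return path

print(random_walk(10))
```

The maintainability win is that adding one state and its transitions updates every generated sequence, which is what makes this approach attractive for long-running reliability testing.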

Generating a database full of test data. I have liked tools for this. I think they are not AI though, even though they often claim they are. The problem of having realistic patterns but not real people's data is a thing.

Refactoring test code. Works fine at least for Robot Framework and Python tests, as long as we first open the code we want to move things from. Trust it to be aware of unopened code and you suffer the duplication. We've been measuring this a bit, and Copilot seems to encourage duplication for us.

Wrote down a few, will revisit when the time is right. 

Saturday, March 2, 2024

Fooled by Microservices, APIs and Common Components

These days writing software is not the problem. Reading software is the problem. And reading is a big part of the real problem, which is owning software. The last two years have been a particularly challenging experience in owning software, and navigating changes in owning software. I have not cracked it, and I am not sure I will, but I have learned a lot. 

To set the stage for my experience: imagine coming to a company with a product created over a period of 20 years. There's a lot of documentation, none of it particularly useful except the code. While the shape of the existing product is invisible, you join a team dedicated to modernisation. And the team has already chosen a rewrite approach. 

For the first year, the invisible is not a priority. After all, it will be replaced, and figuring out the new thing is enough work as is with a new product. With what feels like heroic effort, you complete the goal of the year with managed compromises. Instead of a full rewrite, it's a full rewrite of selected pieces. The release is called "Proof of Concept" and it does not survive the first customer contact. 

The second year has goals set: to add more functionality on top of the first year's. Customer feedback derails the goals, leading to an entire redesign of the user interface, addressing 9 of 19 individually listed pieces of feedback. Again with what feels like heroic effort, you complete the goal of the year with managed compromises, but now on an upwards rather than downwards trend. 

The second year starts to give a bit of shape to the existing invisible product, with deliberate actions. You learn you own something with 852k lines of code. It has 6.2% duplication and 16.4% unit test code coverage. The new thing you've been focusing on has grown to 34k lines of code, with 5% duplication, unit test code coverage of 70%, and a great set of programmatic tests that don't show in the unit test coverage numbers. 

Meanwhile, you start seeing other trends:

  • Management around you casually drops expectations of microservices, APIs that allow easily building other products than this one, and common components with expectations of organizational reuse.
  • You and your team are struggling with explaining that the compromises you took may not have turned it all into what some people now seem to be expecting.
  • You realize that what you now live with is four different generations of technological choices that no one told you the invisible mass would bring - time adds to your understanding.
In one particularly difficult discussion, someone throws microservices at you as the key thing, and you realize that the two sides of the table aren't talking about the same thing. 

One party is describing Scripts with flat file integration: 
  • Promised by shadow R&D without promises of support, usually done in scale of days
  • Automate a specific task
  • Deployment is file drop on filesystem, and not available if not dropped
  • Can break with product changes
  • Output: file or database, usually a file
  • Expected to not have dependencies
  • Needs monitoring extended separately
  • Using files comes with inherent synchronisation problems
  • Needs improving if scale is insufficient
  • Can be modified by a user
  • Batch work, will not be real time
  • File based processing steps quickly increase complexity
The other party is describing microservices with API integration: 
  • Promised by R&D assuming it will drive product perspective forward, usually in scale of months when deployment risks are addressed
  • Develop apps that scale
  • Deployment is with product and feature can be on or off
  • Protected by design as product capability with product changes
  • Output: well defined API
  • Deployed independently / separately with dependencies - API, DB, logic, UI
  • Follows a pattern that allows for common monitoring
  • Uses http and message queues as communication protocol
  • Can be scaled independently
  • Can't be modified by a user
  • Can be close to real time data synchronization
  • Plug in extra processing steps
One party thinks micro is about "fast additions of features"; the other thinks it is a way of creating common designs. 

You start paying more attention to what people are saying and expecting, and realize the pattern is everywhere. Living with so many generations of expectations and lessons makes owning software particularly tricky. 

We come to organizations at a point in time. We learn as we move along. And if we are lucky, we learn sooner rather than later. Meanwhile, we are fooled by unknown unknowns, and confused by the promises of the future in relation to the realities of today. 

The best you can do is ground conversations in what there is now, and manage towards better from there. 


Friday, March 1, 2024

How AI changes Software Testing?

This week Wednesday, two things happened. 

I received an email from Tieturi, a Finnish training company, to respond to the question "How AI changes software testing?". 

I went to Finnish Testing Meetup group to a session themed on AI & Testing. 

These two events make me want to write two pieces into a single post. 

  • My answer to the question
  • My thinking behind answering the way I do 

My Answer to the Question: How AI changes Software Testing

I know the question is asking me to speculate on the future, but the future is already here, it's just not evenly distributed - repurposing the quote from a sci-fi author. 

Five years ago, AI changed *my software testing* by becoming a core part of it. I tested systems with machine learning. I networked with people creating and testing systems with machine learning. Machine learning was a test target, and it was a tool applied to testing problems for me, in a research project I set up at the employer I was with at that time. 

Five years ago, I learned that AI -- effectively, applications of machine learning -- comes as components in our systems, "algorithms" generated from data. I learned that treating systems with these kinds of components as if the entire system were "AI" is not the right way to think about testing, and AI changed my software testing with the reality that it is more important than ever to decompose our systems into their pieces. These pieces serve a purpose in the overall flow, and there's a lot of other things around them. 

Once I understood that AI components are probabilistic and not hand-written, I also understood that the problem is not testing them, but fixing them. We had a world where we could fix bugs before. With AI, we no longer had that. We had the possibility of experimenting with parameters and data in case those produced a fix. We had the possibility of filtering out the worst results. But the control we had would never again be the same. 

For five years, I have had the privilege of working to support teams with such systems. I was very close to focusing solely on one such team, but felt there was another purpose to serve. 

Two years ago, AI changed *my software testing* by giving me GitHub Copilot. I got it early on, and used it for hobby and teaching projects. I created a talk and a workshop on it based on a Roman numerals example, and paired and ensembled on its use with some hundred people. I learned to make choices between what my IDE was capable of doing without it and with it, and reinforced my learning of intent in programming. If you have clarity of intent, you reject stupid proposals and let it fill in some of the text. I learned that my skills in exploratory testing (and intent in it) made me someone who would jump to identify bugs in talks showing Copilot-generated code. 

These two years culminated 6 months ago in me and my whole team starting to use Copilot on our production code, after making agreements on accommodations for ethical considerations. I believe erasing attribution from open source programmers may not be a direct violation of copyright, but it shifts the power balance ethically in ways I don't support. We agreed on the accommodations: using work time to contribute to open source projects and using direct money to support open source projects. 

One year ago, AI changed *my software testing* through access to ChatGPT. I was on it since its first week, suffering through the scaling issues. I had my Testing Dozen mentoring group test it as soon as it was out, and I learned that what I had learned in 5 years of AI about decomposing systems, newbies were lost on. From watching that group and then teaching ensembles, scaling after that to about 50 people including professional testers in the community, I realized the big change was that testers would need to skill up in their thinking. Noticing it has gender bias is too low a bar. Knowing how you would fix gender bias in the data used to teach the model would now be required. Saying there's a problem would not suffice for more than adding big blunders to filtering rules. Smart people at scale would fill social media with examples of how your data and filtering fixes were failing. 

One year ago, I also learned the problems of stupid testing -- test case writing would scale to unprecedented heights with this kind of genAI. I would be stuck in a perpetual loop of someone writing text not worth reading. Instead of inheriting 5000 (manual) test cases a human wrote and throwing them away after calculating it would take me 11 full working days just to read them at one minute each, I would now have this problem at scale, with humans babysitting computers creating materials we should not create in testing in the first place. 

Or creating code that is just flat out wrong, even when the programmer does not notice for lack of intent. 


AI would change testing to be potentially stuck in a perpetual loop of copy-pasting mistakes at scale and pointing the same ones out in systems. We would be reviewing code that was not thought through algorithmically. And this testing would be part of our programmer lives, because testing this without looking at the code would be nonsensical. 

They ask how AI changes Software Testing - it already changed it. Next we ask how people change software testing, understanding what they have at hand. 

I have laughed with AI, worked with tricky bugs that made me feel a sense of powerlessness like never before, and learned tons with great people around AI and its use. I have welcomed it as a tool, and worried about what it does when people struggle with asking for help, asking help from a source such as this without the skills to understand whether what is given is good. I've concluded that faster writing of code or text is not the problem - reading is the problem. Some things are worth reading for a laugh. 


AI changed software testing. Like all technology changes software testing. The most recent change is that we use the word "AI" to talk about automating things to get computers acting humanly:
  • natural language processing to communicate successfully in a human language.
  • knowledge representation to store what it knows or hears.
  • automated reasoning to answer questions and to draw new conclusions.
  • machine learning to adapt to new circumstances and to detect and extrapolate patterns.
  • computer vision and speech recognition to perceive the world. 
  • robotics to manipulate objects and move about.
I feel like adding specific acting-humanly use cases like 'parroting nonsense' or 'mansplaining as a service' to fill in the very human space of claims and stories that could be categorised as fake news or fake certainty. 

What we really need to work on is the problems (in testing) worth applying this to. Maybe it is the popular "routes a human would click" or the "changing locators" problems. Maybe it is the research-inspiring examples of combining bug reports from users with automated repro steps. Maybe it's the choice not to test everything for every change. We should fill the space more with decomposed problems over discussion about "AI".

My thinking behind answering the way I do 

This week the people on stage at the meetup said they are interested yet not experienced in this space. I shared some of my actual experience from the audience, as I am retired from public speaking. There is a chance I may have to unretire with a change of job I am considering, but until then I hold space for conversations as chair of events such as AI & Testing in a few weeks, or as a loud audience member of events such as the Finnish Testing Meetup this week. I don't speak from the stage, but I occasionally write, and I always have meaningful 1:1 conversations with my peers over video, the modern global face to face. 

I collaborate with a lot of different parties in the industry as part of my work-like hobbies. It's a kind of win-win for me to do my thing and write a blog post, and for someone else to make business out of intertwining my content with their ads. I have said yes to many such requests this last month, one of them allowing the Finnish training company Tieturi to nominate me for the title of "Tester of the Year 2023 in Finland". This award has been handed out 16 times before, and I have been nominated every single year for 16 years, except that I asked not to be included, actively opting out, after someone had nominated me for 4 or 5 years.  

The criteria for this award, which I have never been considered worthy to win, are: 

  • inspiring colleagues or other organizations towards better testing
  • bringing testing thoughts and trends from the world to Finnish testing
  • positively influencing the testing culture in their own organization
  • positively influencing the resultfulness of testing (coverage, found bugs etc.) in their own organization or community
  • creating testing innovations, rationalising improvements or new kinds of ways to do testing
  • influencing the establishment of the testing profession in Finland
  • positively influencing Finnish testing culture and the development of the tester's profession
  • OR in other ways improving the possibilities to do testing

I guess my 26 years, 529 talks, 848 blog posts in this blog, or the thousands of people I have taught testing to don't match these criteria. It was really hard to keep going at the 10-year mark of this award, and I worked hard to move past it. 

So asking me to freely contribute "How AI changes Software Testing?" as marketing material may have made me a little edgy. But I hope the edginess resulted in something useful for you to read. Getting it out of my system at least helped me. 




Tuesday, February 27, 2024

Contemporary Bug Advocacy

A few weeks back on Mastodon, Bret Pettichord dusted off a conversation about something we talked about a lot in the testing field years ago: bug advocacy. Bug advocacy was something Cem Kaner discussed a lot, and a term that I struggled to translate into Finnish. It is a brilliant concept in English but it does not translate. Just not. 

Bug advocacy is this idea that there is work we must do to get the results of testing wrapped in their most useful package. A lot of the great stuff on the BBST Bug Advocacy course leads one to think it is a bug reporting course, but no, it is a bug research course. A brilliant one at that. Bug advocacy is the idea that just saying it did not work for you does little. It actually has more of a negative impact. Do your research. Report the easiest route to the bug. Include the necessary logs. Make the case for someone wanting to invest time in reading the bug report, even under constraints of time and stress. 

Bug advocacy was foundational for me as a learning tester 25 years ago. It was essential at the time of publishing Lessons Learned in Software Testing. It was essential when Cem Kaner created the BBST training materials. And it is something of a lost art these days. In other words, it is something people like me learned and practiced for so long that we don't remember to teach it forward. 

At the same time, I find myself broadcasting public notes on Mastodon and LinkedIn on a theme that I would call Contemporary Bug Advocacy - an essential part of Contemporary Exploratory Testing. Like the quote in Kaner's and Pettichord's book says, we need to do better than fridge lights that do all their work while no one is using the results. 

Inspired by the last two weeks of testing with my team, I collected a listing of tactics I have employed in contemporary bug advocacy. 

  • Drive-by data. I "reported" a bug by creating data in a shared test environment that made the bug visible. The bug vanished in a day. No report, no conversation. Totally intentional. 
  • Power of crowd. I organized an ensemble testing session. Bugs we all experienced vanished by the end of the day. I have used this technique in the past to surface complete API redesign needs by smart use of the right crowd. 
  • Pull request with a fix. I fixed a bug and sent the fix for review to the developer. Unsurprisingly, a fix can be more welcome than a task to fix something. 
  • Silent fix. I just fix it so that we don't have to talk about it. People notice changes through their routines of looking at what is going on in the code. 
  • Pairing on a fix. I refused to report, and asked to pair on the fix for me to learn. Well, for me to teach too. Has been a brilliant way of ramping up knowledge of problems, dealing with root causes rather than repeated symptoms. 
  • Holding space for the fix to happen. A colleague sat next to me while I had not done a simple thing, making it clear they were available to help me but would not push me into pairing. 
  • Timely one-liner in Jira. I wrote a title-only bug report in Jira. That was all the developer needed to realize they could fix something, and the magic was that this all happened within the day the bug was created, while they were still in context. 
  • Whisper reporting. I mentioned a bug without reporting it. Developers look great when they don't have bug reports on them. I like the idea of the best work winning when we care about the work over credit. Getting things fixed is work; claiming credit with a report is sometimes necessary but often a smell. 
  • Failing test. Add a failing test to report a bug, and shift work from there. Great for practicing very precise reporting. 
  • Actual bug report. Writing a great summary, minimal steps to repro, and making clear your expected and actual results. Trust it comes around, or enhance your odds with other tactics. 
  • Discuss with product owner. Your bug is more important when layered with other people's status. I apply this preferably before the report, for less friction. 
  • Discuss with developer. Showing you care about colleagues' priorities and needs enhances collaboration. 
  • Praise and accolades. Focus your messaging on the progress of bugs vanishing, not emerging. Show the great work your developer colleagues are doing. So much effort and so little recognition - use some of your energy to balance it towards the good. 
  • Sharing your testing story. Fast-forward your learning and make it common learning. A story of struggles and insights is good. A shared experience is even better. 
  • Time. Know when to report and how. Timing matters. Prioritise their time. 
All of these tactics are ways to reduce friction for the developer. Advocate. Enable. Help. Do better than shining a light when the door is closed. 

I call this contemporary because writing a bug report is simple. It is basic. But adding layers of tactics to it - that is far from simple. It is not a recipe but a pack of recipes. And you need to figure out what to apply when. 

I found nine problems yesterday. I applied four different tactics to those nine problems. And I do that because I care about the results. The results of testing are information we have acted upon. Getting the right things fixed, and getting the sense of accomplishment and pride for our shared work in building a product.


Wednesday, February 21, 2024

Everyone can test but their intent is off

Over my 8 years of ensemble and pair testing as the primary means of teaching testers, I have come to a sad conclusion. Many people professionally hired as testers don't know how to test. Well, they know how to test, but in their testing there is a gaping results gap. An invisible one. One they don't manage or direct. And the sad part is that they think it is normal.  

If you were hired to do 'testing' and you spend all your days doing 'testing', how dare I show up to say your testing is off?! I look at results, and the only way to look at the results you provide is to test after you. 

My (contemporary) exploratory testing foundations course starts from the premise of giving you a tiny opportunity to assess your own results, because the course comes with a tool for turning invisible ink visible: a listing of problems that I (and ensembles with me) have found across some hundreds of people. I used to call it 'catch-22', but as usual with results of testing, more work on doing better has grown the list to 26. 

Everyone can test, like everyone can sing. We can do some slice of the work that resembles doing the work. We may not produce good enough results. We may not produce professional results that lead to being paid for that work. But we can do something. Doing something is often better than doing nothing. So the bar can be low. 

An experience at work leads me to think that some testers can test, but their intent is so far off that it is tempting to say they did not test. But they did test - just the wrong thing. 

Let me explain the example. 

The feature being tested is one of sending pressure measurement values from one subsystem to another, to be displayed there. We used to have a design where the value of the first subsystem, used in its user interfaces after various processing steps, was sent forward to the second subsystem. Then we mangled that value, because it followed no modern principles of programmatically processable data, so that we could reliably show it. We wanted to shift the mangling of the data to the source of the data, with information about the data, in a beautiful modern wrapper of a consistent data model.

Pressure measurement was the first in the line of beautiful modern wrappers. The assignment for the tester was to test this. Full access to conversations with the developer was available. And some days after the developer said "done and tested", the tester also came back with "tested!". I started asking questions. 

I asked if one of the things they tested was changing configurations that impact pressure values, so that they could see the difference between pressure at sea level (a global comparison point) and at the measurement location. I got an affirmative response. Yet I had a nagging feeling, and built on the yes by inviting the whole team to create a demo script for this pressure feature end to end. Long story short, not only did the tester not know how to test it, it was not working. So whatever they called testing was not what I was calling testing. The ensemble testing session also showed that NO ONE in the team knew the feature end to end. Everyone conveniently looks at a component or at most a subsystem. So we all learned a thing or two. 

Equipped with the information from the ensemble testing experience, the tester said they would take more time testing before coming back with "tested!". They did, and today they came back - 7 working days later. I am well aware that this was not the only thing on their agenda - even if their agenda is under their full control - and we repeated the dance. I started asking questions. 

They told me they had updated the numbers in the model we created in the ensemble testing session. I was confused - what numbers? That model sketched early ideas of how three height parameters would impact the measurements in the end, but it was a quick sketch from an hour of work, not a fill-in-the-blanks model. So I asked for a demo to understand. 

I was shown how they changed three parameters of height. Restarted the subsystem. Basic operations of the subsystem they have been testing for over a year. The interesting conversation was about what they then tested. It turns out they made the exact same moves on a build without the changes and on a build with the changes. They concluded that since the difference they see is in sending the data forward, but the data is the same, it must be right. A regression oracle. But a very partial oracle, with partial intent. 

In the next minutes, I pulled up a third-party reference and entered the parameters to get comparable values. We learned that they had the parameters wrong, because if the values aren't broken with the latest change, then the configuration change is likely incorrect. They did not explore the values for plausibility, and they were way off. 
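The plausibility check can be sketched with the standard simplified barometric reduction formula; the function name and the numbers here are illustrative, not from the actual product:

```python
# Reduce station pressure to sea level with the commonly used simplified
# reduction formula (0.0065 K/m lapse rate, exponent 5.257).
def sea_level_pressure(station_hpa, height_m, temp_c):
    return station_hpa * (
        1 - 0.0065 * height_m / (temp_c + 0.0065 * height_m + 273.15)
    ) ** -5.257

# At sea level the reduction must be a no-op...
assert sea_level_pressure(1013.25, 0.0, 15.0) == 1013.25

# ...and at altitude the reduced value must exceed the station reading,
# roughly 1 hPa per 8 m near the surface.
reduced = sea_level_pressure(950.0, 500.0, 15.0)
assert 55.0 < reduced - 950.0 < 65.0
```

A tester who runs even this crude sanity oracle spots implausible parameter combinations without needing the full processing chain.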

I asked them to show what values they compared, and learned they had chosen an internal value. I asked them to pull up the first subsystem's user interface for the comparable values. Turns out the value they compared is likely missing multiple steps of processing needed to become the right value. 

For junior testers such as this one, I expect I will coach them by having these conversations. I have been delighted with them picking up information as it comes, and I see the trend of not having to cover the same ground all the time. I understand how this blindness to results comes about: in most cases of testing a legacy system, the regression oracle keeps them on the path. In this case, it led them down the wrong path. They only took an ISTQB course, which does not teach you testing. I am the first person to teach them this stuff. But it is exhausting. Especially since there are courses like mine or like BBST that they could learn from, to provide the results for the right intent. Learn to control their intent. Learn to *explore*. 

At the same time, I am thinking that all too often juniors in testing - regardless of their years of experience - could learn slightly more before costing the same as developers. This level of thinking would not work for a junior developer. 

Testers get by because they can be just without value. Teamwork may make them learn, or become negative value. But developers don't get by, because the lack of value in their work gets flagged sooner.  

 

Wednesday, February 14, 2024

Making Releases Routine

Last year I experienced something I had not experienced for a while: a four-month stabilisation period. A core part of the testing-related transformations I had been doing with three different organizations was to bring down release timeframes, from months-long cycles to less than an hour. Needless to say, I considered the four-month stabilisation period a personal failure. 

Just so you don't think you need to explain to me that failing is ok: I am quite comfortable with failing. I like to think back to a phrase popularised by Bezos, 'working on bigger failures right now' - a reminder that playing too safe means you won't find space to innovate. Failing is an opportunity for learning, and inevitable when experimenting in proportion to successes. 

In a retrospective session with the team, we inspected our ways and concluded that taking many steps away from a known good baseline with insufficient, untimely testing is how you get there. This would best be fixed by making releases routine.

There is a fairly simple recipe to that:

  1. Start from a known good baseline
  2. Make changes that allow for the change you want for your users
  3. Test the changes in a timely fashion
  4. Release a new known good baseline
The simple recipe is far from easy. Change is not easy to understand. It is particularly difficult if you only see the change at small scale (a code line) and not in the system (dependencies). And it is equally difficult if you only see the system but not the small scale. In many teams, developers have been pushed too small, and testers have not been pushed small enough. This leads to delayed feedback, because missing the timely window results in testing that starts to lag behind the changes. 

In addition to the results gap - between the information we need and the information we have, and its time dimension - the recipe continues with release steps. Some include the entire results gap in release testing because testing can't learn to be timely, muddying the waters of how long it takes to do a release. But even when feature and release testing are properly separated, there can be many steps. 

In our team's efforts at making releases routine, I just randomly decided this morning that today is a good day for a release. We have a common agreement that we do a release AT LEAST once a month, but if practice is what we need, more is better. For various reasons, I had been feature testing changes as they came. I have two dedicated testers who were already two weeks behind on testing, and if I learned something from last year's failure, it's that junior testers have a less developed sense of the timing of feedback, partially rooted in the fact that skills in action need rehearsing at a slower pace. Kind of like learning to drive a car: slowing down while turning the wheel and looking around are hard to do at the same time! I was less than an hour away from completing feature testing at the time of deciding on the release. 

I completed testing - reviewed the last day of changes, planned the tests I wanted, and ran them. All that remained then was the release.

It took me two hours more to get the release wrapped up. I listed the work I needed to do: 
  • Write release notes - 26 individual changes to turn into a message worth saying
  • Create release checklist - while I know it by heart, others may find it useful to tick off what needs doing to say it's done
  • Select / design title level tests for test execution (evidence in addition to TA - test automation)
  • Split epics into this release / next release so that epics reflect completed scope over aspirational scope, and can be closed for the release
  • Document per-epic acceptance criteria, esp. out-of-scope things - documentation is an output, not an input, but while I was testing it was a daily output, not something to catch up on at release time
  • Add Jira tasks into epics to match changes - this is totally unnecessary but I do that to keep a manager at bay, close them routinely since you already tested them at pull request stage
  • Link title level tests to epics - again something normally done daily as testing progresses, but this time was left outside the daily routine
  • Verify traceability matrix of epics ('requirements') to tests ('evidence') shows right status 
  • Execute any tests in test execution - optimally one we call release testing and would take 15 minutes on the staging environment
  • Open Source license check - run license tool, compare to accepted OSS licenses and update licenses.txt to be compliant with attribution style licenses
  • Lock release version - Select release commit hash and lock exact version with a pull request
  • Review Artifactory Xray statistics for docker image licenses and vulnerabilities
  • Review TA (test automation) statistics to see it's staying and growing
  • Press Release-button in Jira so that issues get tagged - or work around reasons why you couldn't do just that 
  • Run promotion that makes the release and confirm the package
  • Install to staging environment - this is something from 3 minute run a pipeline to 30 minutes do it like a customer does it
  • Announce the release - letting others know is usually useful
  • Change version for next release in configs
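Of these chores, the Open Source license check is among the most mechanical. A minimal sketch of the compare-to-accepted-list step; the accepted set and the report format are my illustrative assumptions, not the team's actual tool:

```python
# Hedged sketch of the "Open Source license check" chore: compare the
# licenses a (hypothetical) license tool reports against an accepted
# list and flag anything that needs review. The accepted set below is
# illustrative only.

ACCEPTED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}

def check_licenses(report):
    """report maps package name -> license id; return the packages
    whose license is not on the accepted list, for manual review."""
    return sorted(pkg for pkg, lic in report.items()
                  if lic not in ACCEPTED_LICENSES)

violations = check_licenses({
    "left-pad": "MIT",
    "copyleft-lib": "GPL-3.0-only",  # not on the illustrative accepted list
})
print(violations)
```

In the real chore, the report would come from running the license tool, and the attribution-style licenses on the accepted list would also feed the licenses.txt update.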
This took me about 2 hours. I skipped the install to staging, though. And I have significant routine in all these tasks. What I do in a few hours, experience shows, takes about a week when handed over to others, plus about a day of me answering questions. Not a great process. 

There are things that could be done to start a new release, in conjunction with Change version for next release: 
  • Create release checklist
There are things that should become continuous on that list: 
  • Select / design title level tests 
  • Split epics
  • Document per epic acceptance criteria
  • Add Jira tasks into epics to match changes 
  • Link title level tests to epics
  • Verify traceability matrix
  • Execute any tests in test execution
There are things that could happen on a cadence that has nothing to do with releases:
  • Review Artifactory Xray statistics 
  • Review TA (test automation) statistics
And if we made these changes, the list would look a lot more reasonable:
  • Write release notes
  • Execute ONE test in test execution 
  • Open Source license check 
  • Lock release version
  • Press Release-button in Jira
  • Run promotion that makes the release
  • Install to staging environment
  • Announce the release
  • Change version for next release
And finally, looking at that list - there is no reason why all but the meaningful release notes message can't happen on ONE BUTTON. 
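The one-button idea can be sketched as an ordered chain of chores. Every step below is a hypothetical stub standing in for a call to Jira, the license tool, the promotion job, and so on; the release notes stay as the one human input:

```python
# Hedged sketch of the ONE BUTTON: chain the remaining release chores
# into a single ordered pipeline that stops at the first failure.
# Each step is a stub; in real life it would call the actual tooling.

def one_button_release(release_notes, steps):
    """Run each (name, chore) pair in order; stop at the first failure."""
    log = []
    for name, step in steps:
        if not step():
            log.append(f"FAILED: {name}")
            return log
        log.append(f"done: {name}")
    log.append(f"announced: {release_notes}")
    return log

# Usage with trivially passing stubs, in the post's remaining order.
steps = [
    ("execute release test", lambda: True),
    ("open source license check", lambda: True),
    ("lock release version", lambda: True),
    ("press Release button in Jira", lambda: True),
    ("run promotion", lambda: True),
    ("install to staging", lambda: True),
    ("bump version for next release", lambda: True),
]
log = one_button_release("26 changes worth a message", steps)
print(log)
```

Stopping at the first failure matters: a release that can't lock its version or pass its license check should never reach the announcement step.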

I like to think of this type of task analysis visually: 

This all leaves us with two challenges: extending the release promotion pipeline to all the chores, and the harder one: timely, resultful testing by people who aren't me. Understanding that delta has proven a real challenge now that I am no longer the tester (one with 26 years of experience distilled into contemporary exploratory testing) and a significant chunk of my time goes to my other role: being a manager. 

Friday, January 19, 2024

The Power of Framing

Sometimes, we write on topics we have not researched but still have things to say on. This is how I frame this post: I am not an expert in framing. I have seen admirable levels of eloquence and excellent teaching materials on it, but my practice of it is that of a learner. 

Me setting the stage for this post is framing. You put a perspective around a thing, and that allows you to see the thing. You might frame to see things in a similar light, or you might use framing to change the narrative on a topic. Today I had two examples in mind that I wanted to make a public note of. 

Example 1: "I'm a bad direct report" --> "I'm an employee with entrepreneurial touch" 

In a conversation about managing up - managing your manager - we ended up talking about understanding what is important to you, and that what you seek may differ from what others seek. I expressed the importance of agency: the sense that I am at the controls of my work, and that expectations for my work are things I negotiate rather than take as givens. When someone violates my agency, I react strongly.

This led to someone else sharing how they consider themselves a "bad direct report" and have chosen entrepreneurship, where traits like questioning established structures and rules, not obeying without the why, asking questions for deeper understanding, and a built-in drive for the better are helpful and welcome. 

I recognised the sentiment and the similarity to how I think, yet noted my choices have turned out very different. I have chosen to join organizations and swim against the stream. As a reaction to the story, I realized I frame the same story as "I am an employee with an entrepreneurial touch" and a "rebel at work", who can move mountains for organizations if they appreciate the likes of me. 

I did not even think of this as framing; I just shared how I have placed a frame on something I could choose to frame differently - very realistically, that I am many a boss's nightmare. I don't obey, I seek goals, and I motivate routines through playing fairly with others, as I don't think I am entitled to break flows other people rely on without taking them along for the change ride. 

Example 2: "This company has not given me training for 2 years" --> "Learning matters"

Another sample of framing was inspired by noticing a frame of describing a true experience: "This company has not given me training for 2 years". To see the frame, we note the definition of training. 

Training in this case is not taking the compulsory e-learning courses where even a manager is required to check you have completed the training. 

Training is not taking online courses that have an immediate impact on the work ongoing. 

Training is money out of the company's pocket to send me somewhere I could not go on my salary. It is a salary surplus. 

To frame this for myself, I drew a picture 2 months ago that I then shared on social media. 


If I reframe training as learning, I can see a variety of options. I can see that while the visible money spent on my learning may not match my expectations, being allowed to use invisible money (doing work slower) has definitely been an investment in my learning. 

The picture includes a few "Done in 2023" ticks with visible money, as I got to go to EuroSTAR and HUSTEF (as a keynote speaker, with the company paying my daily allowance). That's not where I learned the most, though. I learned the most in 2023 from push benchmarks: I shared how we work and what our problem is, and got guidance on things other people have tried - the community approach. 

The same learning framing affords me agency. Instead of "no training given", I can assess "learning I made space for, with support/roadblocks". 

Framing changes how I feel about the same true experience. And how I feel about it changes what I can do.