Friday, December 30, 2022

Baselining 2022

I had a busy year at work. While working mostly with one product (two teams), I was also figuring out how to work with other teams without either taking on so much that I became a bottleneck or taking on so little that I would be of no use. 

I got to experience my biggest test environment to date in one of the projects - a weather radar. 

I got to work with a team that was replacing and renewing, plus willing and able to move from test automation (where tests are still isolated and centered around a tester) to programmatic tests where tests are a whole-team asset indistinguishable from other code. We ended this year by finalising the first customer version and adding 275 integrated tests and 991 isolated tests to best support our work - and getting them to run green in blocking pipelines. The release process throughput time went from 7 hours to 19 minutes from last commit to release, and the fixing stage went from 27 days to 4 days. 

I became the owner of the testing process for a lot of projects, and equally frustrated with the idea of so many owners of slices that the coordination work is more than the value work. 

I volunteered with Tivia (ICT in Finland) as a board member for the whole year, and joined Selenium Open Source Project Leadership Committee early autumn. Formal community roles are a stretch, definitely. 

I got my fair share of positive reinforcement on doing good things for individuals' salaries and career progression, learning testing, and organising software development in smart ways some might consider modern agile. 

I spoke at conferences and delivered trainings, total 39 sessions out of which 5 were keynotes. I added a new country on my list of appearances, totalling now at 28 countries I have done talks in. I showed up at 7 podcasts as guest. 

I thought I did not write much into my blog, and yet this post is the 42nd one of this year on this blog, and I have one #TalksTurnedArticles on and one article in IT Insider. My blog now has 840 222 hits, which is 56 655 more than a year ago. 

I celebrated my Silver Jubilee (25 years of ICT career) and started out with a group mentoring experiment of #TestingDozen. 

I spent monthly time reflecting and benchmarking with Joep Schuurkes, did regular on-demand reflections with Irja Strauss, ensemble programmed tests regularly with Alex Schladebeck and Elizabeth Zagroba, and met so many serendipitous acquaintances from social media that I can't even count them. 

I said goodbye to Twitter, started public note taking on Mastodon and irregularly regular posting on LinkedIn. I am there (here) to learn together. 

Wednesday, December 28, 2022

A No Jira Experiment

If there is something I look back to from this year, it is making changes that were supposed to be impossible. I've been going against the dominant structures, making small changes and enabling a team where continuous change, always for the better, is taking root. 

Saying it has taken root would be overpromising. Because the journey is only in the beginning. But we have done something good. We have moved to more frequent releases. We have established programmatic tests (over test automation), and we have a nice feedback cycle that captures insights of things we miss into those tests. The journey has not been easy.

When I wanted to drop scrum as the dominant agile framework of how management around us wanted things planned early this year, the conversations took some weeks up until a point of inviting trust. I made sure we were worthy of that trust, collecting metrics and helping the team hit targets. I spent time making sure the progress was visible, and that there was progress. 

Yet, I felt constrained. 

In early October this year, I scheduled a meeting to drive through yet another change I knew was right for the team. I made notes of what was said.

This would result in:

  • uncontrolled, uncoordinated work
  • slower progress, confusion with priorities and requirements
  • business case and business value forgotten 

Again inviting that trust, I got to do something unusual for the context at hand: I got to stop using Jira for planning and tracking of work. The first temporary yes was four weeks, and those were four of our clearest, most value delivering weeks. What started off as temporary, turned to *for now*. 

Calling it No Jira Experiment is a bit of an overpromise. Our product owner uses Jira for epic-level tickets and creates a light illusion of visibility to our work with that. Unlike before, now with just a few things he is moving around, the statuses are correctly represented. The epics have documented acceptance criteria by the time they are accepted. Documentation is the output, not the input. 

While there are no tickets for the details, the high level is better.

At the same time, we have just completed our most complicated, everyone-involved feature. We've created light lists of unfinished work, and are just about to delete all the evidence of the 50+ things we needed to list. Because our future is better when we see the documentation we wanted to leave behind, not the documentation that presents what emerged while the work was being discovered. The commit messages were more meaningful, representing the work now done, not the work planned some weeks before. 

It was not uncontrolled or uncoordinated. It was faster, with clarity of priorities and requirements. And we did not forget business case or business value. 

It is far from perfect, or even good enough for the longer term, and we have a lot to work on. But it is so much better than following a ticket-centric process, faking that we know the shape of the work, and forcing an ineffective shape because that seems to be expected and asked for. 

Tuesday, December 6, 2022

There is such a thing as testing that is not exploratory

The team had been on a core practice of clarifying with tests for a while, and they invited an outsider to join their usual meeting routine.

Looking around, there were 8 people in a call. The one in charge of calling the meeting shared their screen, for what was about to be their routine of test design sessions. He copied the user story they had been assigned to work on into the Jira ticket he had open, and called on the group for ideas of tests. 

People started mentioning things that could be tried, and the one in charge wrote things down as the rest of the group watched. For a high level idea, he called for the detailed steps. Every idea that came about was written down and detailed. 

After the meeting, the group would split the work to automate it all. 

And two weeks later, they had all these tests passing, and a feature that just did not work.

The magic glue they miss is what I call exploratory testing. The sense of caring for results in testing by focusing on learning, and recognising that the time when most people create test cases like the ones above is the time when they know the least.

You can add exploratory testing on top of this.

You can lighten up the details you want to write to start with to leave more room for exploratory testing, where output of your testing is the documentation. 

You can seek to optimize for learning, being aware of the time used.

The team that follows the planning pattern did not do exploratory testing. You could argue the next team using their results did exploratory testing through trying to use it, to tell them it is failing. 

Working with teams like this is real. It is embarrassing but it is real. And we don't change it by playing with words, but by making the results we expect clear and permissive to excellence. 

This story, unfortunately, was brought to you by the "tell no stories that did not happen" and "only stories from last two years allowed" rules. This is real. This is what people still make of testing, and some folks meekly follow the agreed practice.  

Monday, December 5, 2022

Jokes and Poetry from a Stochastic Parrot

Tell a joke about a programmer and a tester attempting to enter Heaven and having to confess their sins.

Two programmers were on their way to Heaven and they came to a fork in the road. One path led to Heaven, the other to Hell. The programmer who had been a tester in life said, "I'm not taking any chances. I'm going to test both paths to see which one is safe." The programmer who had been a developer in life said, "You fool! We don't have time for that. We need to take one path and trust that it leads to Heaven." The tester replied, "Trust, but verify."


A programmer and a tester were standing at the Gates of Heaven, waiting to confess their sins. The programmer went first and said, "I wrote code that I knew was buggy and released it anyway." The tester said, "I found the bugs in that code and didn't report them." The Gates of Heaven opened, and the programmer was welcomed inside. The tester was shocked and asked, "Why was he allowed in and not me?" The gatekeeper replied, "We already have plenty of bugs in Heaven. What we need are more testers."


A programmer and a tester were arguing about whether it was better to be a programmer or a tester. The programmer said, "Programmers are the ones who create the software that makes the world go round. Without us, there would be no progress or innovation." The tester replied, "That may be true, but without testers, all of your buggy code would never be caught and fixed. Testers are the ones who ensure that the software works properly and is of high quality." The programmer thought for a moment and then said, "I suppose you have a point. I guess it's a good thing that programmers and testers need each other to make the world a better place."

Make it more funny.

A programmer and a tester were on their way to Heaven, and they came across a sign that said "Heaven: 10,000 miles." The programmer looked at the sign and said, "I don't believe it. They've made a mistake. Heaven can't be that far away." The tester said, "Of course it is. They've measured the distance with a ruler, and it's accurate to within plus or minus an inch." The programmer rolled his eyes and said, "That's why I always use double-precision floating-point numbers. They're much more precise." The tester replied, "Yes, but they can't measure the distance to Heaven. Only faith can do that."

Ok, just one more.  

A programmer and a tester were standing in line at the Gates of Heaven, waiting to confess their sins. The programmer went first and said, "I wrote code that was full of bugs and never tested it." The tester said, "I tested the code and found the bugs, but I never reported them." The Gates of Heaven opened, and the programmer was allowed inside. The tester was shocked and asked, "Why was he allowed in and not me?" The gatekeeper replied, "We have plenty of bugs in Heaven. What we need are more testers to help us find and fix them."

You already told that, I want a new one.

I'm sorry, I don't have access to new information at this time. I am a large language model trained by OpenAI, and my knowledge is limited to what I was trained on. I don't have the ability to browse the internet or access new information, so I can only provide responses based on what I already know. Is there something else I can help you with?

One more joke about testers and programmers, please. 

A programmer and a tester were at a restaurant, and the waiter brought them a menu. The programmer looked at the menu and said, "I don't see any bugs on this menu. This must be a high-quality restaurant." The tester replied, "That's what you think. I've already found three bugs on the menu: the prices are incorrect, the descriptions are misleading, and the options are limited. I'm not sure I want to eat here." The programmer laughed and said, "I guess that's why we need both programmers and testers. Without us, the world would be full of bugs and errors."

Give me a poem about exploratory testing.

Thank you, OpenAI ChatGPT.  The tools we use are changing the world in ways we don't yet understand. Whose jokes and poems are these, or are these indeed "AI originals"? 

On a more serious note, you should see what it does with Advent of Code day 5 on day 5. A separate leaderboard for AI and humans does not quite seem like the solution. 

Friday, November 18, 2022

1-2-3 model to test coverage

This afternoon I jumped on a call with a colleague from the community at large. This one had sent me a LinkedIn message asking to talk about test coverage, and our previous correspondence was limited. And like I sometimes do, I said yes to a discussion. After the call, I am grateful. For realizing there is a 1-2-3 model of how I explain test coverage, but also for the conversation channel that helps me steer to understanding, starting from wherever the other person is. 

The 1-2-3 model suggests there is one true measure of test coverage. Since that is unattainable, we have two we commonly use as starting point. And since the two are so bad, we need to remember three more to be able to explain further to people who may not understand the dimensions of testing. 

The One

There is really one true measure of coverage, and it is that of risk/results coverage. Imagine a list of all relevant and currently true information about the product that we should have a conversation on, listed on a paper - that is what you are seeking to cover. The trouble is, the paper when given to you is empty. There is no good way of creating a listing of all the relevant risks and results. But we should be having a conversation on this coverage, and here is how. 

If you are lucky and work in a team where developers truly test and care for quality, the level of coverage in this perspective is around the middle line in the illustration below. That is a level of quality information produced by a Good Team (tm). The measure determining if we indeed are with a Good Team (tm) is sending someone Really Good at testing after them. That Really Good could be a tester, but I find that most testers find themselves out of jobs with good teams - the challenge level is that much higher. Or that Really Good could be all your users combined over time, with an unfortunate delay in feedback and higher risk of the feedback being lost in translation. 

I call the difference between the output of a Good Team (tm) and the quality where our stakeholders are really happy, even delighted, the primary Results Gap. There are plenty of organizations who are not seeking to do anything with this results gap themselves but leave it to their users. That is possible, since the nature of the problems people find within the primary results gap is a surprise. 

I recognise that I am working with a team in this space by being surprised by problems. Sometimes I even exclaim: "this bug is so interesting that no one could have created this on purpose!". Consider yourself lucky if you get to work with a team like this that remains this way over time. After all, location on this map is dynamic, and depends on consistently doing good work across different kinds of changes. 

There is a secondary Results Gap too. Sometimes the level teams of developers get to is Less than Good Team's Output. We usually see this level in organizations where managers hire testers to do testing, even when they place the tester in the same team. Testing is too important to be left just to testers, and should be shared variably between different team members. Sometimes working as a tester in these teams feels like your job is to point out that there are pizza boxes in the middle of the living room floor and remind everyone that we should pick them up. Personally, when I recognise the secondary results gap, I find the best solution is to take away the tester and reorganize quality responsibilities among the remaining developers. The job of a tester in a team like this is to move the team to the primary results gap, not to deal with the pizza boxes, except temporarily as protection of the reputation of the organization. 

A long explanation on the one true measure of coverage - risks/results. Everything else is an approximation subject to this. It helps to understand if we are operating with a team on the secondary results gap or with a team on the primary results gap, and the lower we start, the less likely we are ever to get to address all of the gap. 

The Two

The two measures of coverage we commonly use and thus everyone needs to understand are code coverage and requirements/spec coverage. These are both test coverage, but very different by their nature. 

Code coverage can only give us information of what is in the code and whether the tests we have touch it. If we have functionality we promised to implement, that users expect to be included but we are missing out on, that perspective will not emerge with code coverage. Code coverage focuses on the chances of seeing what is there. 

Cem Kaner has an older article of 101 different criteria in the space of code coverage, so let's remember it is not one thing. There are many ways we can look at the code and discuss having seen it in action. Touching each line is one, taking every direction at every crossroad is one, and paying attention to the complex criteria of the crossroads is one. Tools are only capable of the simpler ways of assessing code coverage. 

Seeing a high percentage does not mean "well tested". It means "well touched". Whether we looked at the right things, and verified the right details is another question. Driving up code coverage does not usually mean good testing. Whereas being code coverage aware, not wanting code coverage to go down from where it has been even when adding new functionality, and taking time for thoughtful testing based on code coverage seem to support good teams in being good. 
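
The gap between "well touched" and "well tested" can be sketched in a few lines. This is only an illustration: `shipping_fee` is a hypothetical function invented for the example, not something from the projects described here.

```python
def shipping_fee(is_member):
    """Hypothetical rule: members ship for free, everyone else pays 5.0."""
    fee = 5.0
    if is_member:
        fee = 0.0
    return fee


# This single call executes every line of shipping_fee, so a tool that
# counts touched lines would report full line coverage for the function.
# It still takes only one direction at the `if` crossroad, and it
# verifies nothing about the result.
shipping_fee(True)

# Taking both directions at the crossroad, and checking what comes out,
# is a different and stronger claim than having touched each line.
assert shipping_fee(True) == 0.0
assert shipping_fee(False) == 5.0
```

Even with both directions taken and asserted, nothing here says whether free shipping for members was the right rule to build, which is the part code coverage cannot see.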

Requirement/spec coverage is about covering claims in authoritative documents. Sometimes requirements need to be rewritten as claims, sometimes we go about spending time with each claim we find, and sometimes we diligently link each requirement to one or more tests, but some form of this tends to exist. 

With requirements/spec coverage, we need to be aware that there are things the spec won't say and we still need to test for. We can never believe any material alone is authoritative, testing is about also discovering omissions. Omissions can be code that spec promises, or details spec fails at promising but users and customers would consider particularly problematic. 

Having one test for a claim is rarely sufficient. There is no set number of tests we need for each claim. So I prefer thinking in none / one / enough. Enough is about risk/results. And it changes from project to project, and requires us to be aware of what we are testing to do a good job testing. 

The Three

By this time, you may be a little exasperated with the One and the Two, and there is still the Three. These three are dimensions of coverage I find I need to explain again and again to help address the risks. 

Environment coverage starts with the idea that users' environments are different and testing in one may not represent them all. We could talk for hours on what makes environments essentially different, but for purposes of coverage, take my word for it: sometimes they are and sometimes they are not essentially different. So if there are 10 functionalities to cover with one test for each functionality, and we have three environments, we could have 30 tests to run. 

An easy example is browsers. Firefox on Linux is separate from Firefox on Mac and Firefox on Windows. Safari on Mac or Edge on Windows is only available there. Chrome is available on Mac, Windows and Linux. That small listing alone gives us 8 environments. The amount of testing - should we want to do it regularly - could easily explode. We may address this with various strategies, from having different people on different environments, to changing environments in a round-robin fashion, to cross-browser automation. Whether we care to depends on risks, and risks depend on the nature of the thing we are building. 
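
The arithmetic above can be sketched as follows. The browser listing follows the one in this post; the function names and the feature names are made up for illustration, and the round-robin plan is just one assumed way of rotating environments between runs.

```python
# Browser availability as listed in the paragraph above.
BROWSERS = {
    "Firefox": ["Linux", "Mac", "Windows"],
    "Safari": ["Mac"],
    "Edge": ["Windows"],
    "Chrome": ["Mac", "Windows", "Linux"],
}

# Every (browser, operating system) pair is one environment.
environments = [
    (browser, system)
    for browser, systems in BROWSERS.items()
    for system in systems
]


def runs_needed(functionality_count, environment_count, tests_per_functionality=1):
    """Test runs needed if everything is repeated in every environment."""
    return functionality_count * tests_per_functionality * environment_count


def round_robin_plan(functionalities, envs, run_number):
    """Pair each functionality with one environment, shifting by one per run."""
    return [
        (functionality, envs[(index + run_number) % len(envs)])
        for index, functionality in enumerate(functionalities)
    ]


print(len(environments))                   # 8 environments from the listing
print(runs_needed(10, 3))                  # 30 runs for three environments
print(runs_needed(10, len(environments)))  # 80 runs for all eight

features = [f"feature-{n}" for n in range(1, 11)]  # hypothetical names
for feature, env in round_robin_plan(features, environments, run_number=0):
    print(feature, env)
```

With the round-robin plan each run stays at 10 tests, and each functionality meets all 8 environments over 8 consecutive runs instead of in one.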

Data coverage starts with the idea that each functionality processing data is covered with one data value, but that may be far from sufficient. Working with embedded devices over the last three years, I find it surprising how often covering such a simple thing as positive and negative temperature is necessary with the registry manipulation technologies. For this coverage, we would heavily rely on sampling, and it is part of every requirement test, making it flexible to consider what percentage we are getting. Well, at least enough to note that percentages are generally useless measures in the space of coverage.
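
A minimal sketch of why positive and negative temperatures deserve separate samples. Everything specific here is an assumption made for the example: a 16-bit two's-complement register, a scale of 0.1 degrees Celsius per count, and the name `decode_temperature`. The devices in this post may encode values differently.

```python
def decode_temperature(raw_register):
    """Decode an assumed 16-bit two's-complement register into degrees Celsius.

    Assumption for illustration: one count equals 0.1 degrees Celsius.
    """
    if not 0 <= raw_register <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    # If the sign bit is set, the value is negative in two's complement.
    signed = raw_register - 0x10000 if raw_register & 0x8000 else raw_register
    return signed / 10.0


# Sampled data: one positive value alone would pass even if the sign
# handling above were missing, so the samples cross the zero boundary.
samples = {
    0x00FA: 25.0,   # positive
    0x0000: 0.0,    # zero
    0xFFFF: -0.1,   # just below zero
    0xFF38: -20.0,  # negative
}

for raw, expected in samples.items():
    assert decode_temperature(raw) == expected
```

Without the sign handling, 0xFF38 would decode to 6533.6 instead of -20.0, and a test using only the positive sample would never notice.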

Parafunctional coverage is a reminder of dimensions other than positive outputs. Security would be having functionality that does something that, in the wrong hands, can be used for harm. Performance would be considerations of being fast and resource effective, particularly now in the era of green code considerations. Reliability would be running the same things over a longer period of time. And so on. 

Plus One

Today's call concluded with us then discussing automation coverage. Usually what we end up putting in our automation is a subset of all the things we do, a subset we want to keep on repeating. Great automation isn't created from listing test cases and implementing them. For good automation we tend to decompose the feedback needed differently, so that the sum of the whole is similar. 

Automation coverage is the ratio of what we have automated to something we care for. Some people care for documented test cases but I don't. If and when I care about this, I talk about automation coverage in terms of plans of growing it, and I avoid the conversation a lot. 

In one project we did test automation coverage by assigning zero, one, enough values for requirements by tagging all automation with requirements identifiers. A lot of work, some good communications included on planning for what more we need (first), but the percentage was very much the same as I could estimate off the cuff. 
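
The zero / one / enough tagging could be sketched roughly like this. It is not the tooling from that project: the requirement identifiers, test names, and the idea of recording "enough" as an agreed number per requirement are all assumptions for illustration. In practice "enough" is a judgment about risk, not a count.

```python
from collections import Counter


def coverage_levels(requirements, tagged_tests, enough):
    """Classify each requirement as 'zero', 'one', or 'enough'.

    tagged_tests maps a test name to the requirement identifiers it is tagged with.
    enough maps a requirement identifier to the number of tests people agreed
    is enough for it; without an agreed number it is never reported as 'enough'.
    """
    counts = Counter(
        requirement
        for tags in tagged_tests.values()
        for requirement in tags
    )
    levels = {}
    for requirement in requirements:
        count = counts[requirement]
        threshold = enough.get(requirement)
        if count == 0:
            levels[requirement] = "zero"
        elif threshold is not None and count >= threshold:
            levels[requirement] = "enough"
        else:
            levels[requirement] = "one"
    return levels


# Hypothetical data for illustration only.
requirements = ["REQ-1", "REQ-2", "REQ-3"]
tagged_tests = {
    "test_login": ["REQ-1"],
    "test_login_locked_account": ["REQ-1"],
    "test_export": ["REQ-2"],
}
enough = {"REQ-1": 2, "REQ-2": 4}

print(coverage_levels(requirements, tagged_tests, enough))
# {'REQ-1': 'enough', 'REQ-2': 'one', 'REQ-3': 'zero'}
```

The useful output is less the percentage and more the list of requirements sitting at zero or one, since that is what the planning conversation needs first.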

You may not have the half an hour it took for us to discuss 1-2-3 on the call, but knowing how to ground conversations of coverage is an invaluable skill. If you spend time with testing, you are likely to get as many chances of practicing this conversation as I have had by now. 

Wednesday, November 9, 2022

Why Have You Not Added More Test Automation?

We sat down at a weekly meeting, looking at two graphs created to illustrate the progress with system level test automation. Given the idea that we want to run some tests in an end-user like environment and an end-user like composition, installing the software like end-users would, the pictures illustrated how much work there is to a first milestone we had set for ourselves. 

The first picture showed a plan of how things could be added incrementally, normalised with story points (a practice I very much recommend against) and spread over time in what had seemed like a realistic plan of many months. In addition to the linear projection, it showed the progress achieved, and the slope projected from that reality showed we were lagging behind. 

The second picture showed a plan of use of time on test automation. Or more precisely, only visualized time used, there was no fixed plan which was a problem in itself. But you could see that in addition to having fairly little time for the test automation work in general, the fluctuation in focus time was significant. 

It was not hard to explain the relation of the two. No time to do means no progress on the work. Or so I thought. Why were we then again answering the age-old question:

Why have you not added more test automation? 

The test automation effort was taking place within a team that had a fixed number of people and multiple conflicting priorities. 

  • They were expected to address a bug backlog (by fixing) so that the number of bugs would go down from hundreds to tens. This would be a significant effort of testing to confirm - development to fix - testing to search for side effects. For every fix, the testers had two tasks for each one of the developers'. 
  • They were expected to make multiple releases from branches the team did not continuously develop on. This would be a significant effort to verify right fixes and no side effects from two baselines that differed from what they would use when testing the changes. 
  • They were expected to learn a new system and document everything they learned so that when they would be moved to new team at latest one year from joining, the software factory they were working on could run forward with them gone.
  • With the fixed budget and requests to spend time on concept work of a new feature, they were expected to get by with two people where there previously had been three. Starting something new was deemed important. 

We were asking the wrong question. We should have been asking why we under-allocated something we thought we needed to take forward, and still thought there would be progress. Why did we not understand the basic premise of how adding more work makes things more late? Why did we not understand that while ideas are cheap and we can juggle many at a time, turning those ideas to reality is a pipe of learning and doing that just won't happen without investing the time? 

While I routinely explain these things, I can't help but wonder at the epidemic in management of thinking that asking for something, or rather requiring something, gets it done without considering the frame in which there are chances of success with time available to do the work. 

If you ask the same people to do *everything else* and this one drops, how can you even imagine asking anyone but your own mirror image the reasons of why your choices produce these results? 

Monday, October 24, 2022

How to Maximize the Need for Testers

Back when I was growing up as a tester, one conversation was particularly common: the ratio of testers to developers in our teams. A particularly influential writing was from Cem Kaner et al, from the Fall 2000 Software Test Managers Roundtable, on "Managing the Proportion of Testers to (Other) Developers". 

The industry followed the changes in the proportion from having fewer testers than developers, to peaking at the famous 1:1 tester developer ratio that Microsoft popularized, to again having fewer testers than developers, to an extent where it was considered good to have no developers with testing emphasis (testers), but have everyone share the tester role. 

If anything, the whole trend of looking for particular kinds of developers to be responsible for test systems added to the confusion of whom we count as testers, especially when people are keen to give up on the title when salary levels associated with the same job end up essentially different - and not in favor of the tester title. 

The ratios - or task analysis of what tasks and skills we have in the team that we should next hire for a human-shaped unique individual - are still kind of core to managing team composition. Some go with the ratio to have at least ONE tester in each team. Others go with looking of tasks and results, and bring in a tester to coach on recognising what we can target in the testing space. Others have it built into the past experiences of the developers they've hired. It is not uncommon to have developers who started off with testing, and later changed focus from specializing into creating feedback systems to creating customer-oriented general purpose systems - test systems included. 

As I was watching some good testing unfold in a team where testing is done by everyone, not only by the resident tester, I felt the need for a wry smile at how invisible the testing I do as the tester would be. Having ensured that no developer is expected to work alone, and having made space for it, I could tick off yet another problem I suspected I might have to test for to find. Now instead I could most likely be enjoying that it works - others pointed out the problem. 

To appreciate how small structural changes can make my work more invisible and harder to point at, I collected the *sarcastic* list of how to maximise the need for testers by ensuring there will be visible work for you. Here's my to-avoid list that makes the testing I end up doing more straightforward, the need to report bugs very infrequent, and allows me to focus more of my tester energies on telling the positive stories of how well things work out in the team. 

  1. Feature for Every Developer
    Make sure to support the optimising managers and directors who are seeking a single name for each feature. Surely we get more done when everyone works on a different feature. With 8 people in the team, we can take forward 8 things, right? That must be efficient. Except that when allocating features to developers, we should not optimize for direct translation of requirements to code, but for learning. Pairing them up, or even better *single piece flow* of one feature for the whole team, would make them cross-test while building. Remember, we want to maximise the need for testers, and having them do some of that isn't optimising for it. The developers fix problems before we get our hands on them, and we are again down a bug on reporting! So make sure two developers work together as little as possible and review only while busy running with their own work, so that the only true second pair of eyes available is a tester. 

  2. Detail, and externalised responsibility
    Let's write detailed specifications: Build exactly this [to a predetermined specification]. All other mandates of thinking belong with those who don't code, because developers are expensive. That leaves figuring out all higher mandate levels to testers, and we can point out how we built the wrong thing (but as specified). When developer assumptions end up in implementation, let's make sure they have as many assumptions as possible, strongly held, with an appearance of great answers in detail. *this model is from John Cutler (@johncutlefish)
    There's so much fun work in finding out how they went off the expected rails when you work on a higher mandate level. Wider mandate for testers, but let's not defend the developers' access to user research, learning and seeing the bigger picture. That could take a bug away from us testers! Ensure developers hold on to assumptions that could end up in production, and then it's a tester to the rescue. Starting a fire just to put it out, why not. 

  3. Overwhelm with walls of text
    It is known that some people don't do so well with reading essential parts of text, so let's have a lot of text. Maybe we could even try an appearance of structure, with links and links, endless links to more information - where some of the links contain essential key pieces of that information. Distribute information so that only the patient survive. And if testers by profession are something, we are patient. And we survive to find all the details you missed. And with our highlighter pens for finding every possible claim that may not be true, we do well when exceptional patience for reading and analyzing is required. That's what "test cases" are for - rewriting the bad documentation into claims with a step by step plan. And those must be needed because concise shared documentation could make us less needed. 

  4. Smaller tasks for devs, and count the tasks all the time
    Visibility and continuously testing, so let's make sure the developers have to give very detailed plans of what they do and how long that will take. Also, make sure they feel bad when they use even an hour longer than they thought - that will train them to cut quality to fit the task into the time they allocated. Never leave time between tasks to look at more effective ways of doing the same things, learning new tech or making better use of current tech. Make sure the tasks focus on what needs *programming*, because it's not like they need to know about how accessibility requirements come from the most recent legislation, or how supply chain security attacks have hit their tech, or expectations of common UX heuristics. More for the testers to point out that they missed! Given any freedom, developers would be gold-plating anyway, so better not give room for interpretation there. 

  5. Tell developers they are too valuable to test and we need to avoid overlapping work
    Don't lose out on the opportunities to tell developers how they were hired to develop and you were hired to test. Remember to mention that if they test it, you will test it anyway, and ensure the tone makes sure they don't really even try. You could also use your managers to create a no talking zone between a development team and testing team, and a very clear outline of everything the testing team will do that makes it clear that the development team does not need to do. Make sure every change comes through you, and that you can regularly say it was not good enough. You will be needed the more the less your developers opt in to test. Don't care that the time consuming part is not avoiding testing work overlap, but avoiding delayed fixing - testing could be quite fast if and when everything works. But that wouldn't maximise need of testers, so make sure the narrative makes them expect you pick up the trash. 

  6. Belittle feedback and triage it all
    Make sure it takes a proper fight to get any fixes on the developers' task lists. A great way of doing this is making management very very concerned over change management and triaging bugs beforehand, so that developers get only well-chewed, clear instructions. No mentions of bugs in passing so that they might be fixed without anyone noticing! And absolutely no ensemble programming where you would mention a bug as it is about to be created, use that for collecting private notes to show later how you know where the bugs are. You may get as far as getting managers telling developers they are not allowed to fix bugs without manager consent. That is a great ticket to all the work of those triage meetings. Nothing is important anyway, make the case for it. 

  7. Branching policy to test in branch and for release
    Make sure to require every feature to be fully tested in isolation on a branch, manually since automation is limited. Keep things in branches until they are tested. But also be sure to insist on a process where the same things get tested integrated with other recent changes, at latest release time. Testing twice is more testing than testing once, and types of testing requiring patience are cut out for testers. Maximize the effect by making sure this branch testing cannot be done by anyone other than a tester or the gate leaks bad quality. Gatekeep. And count how many changes you hold at the gates to scale up the testers. 

  8. Don't talk to people
    Never share your ideas of what might fail when you have them. They are less likely to find a problem when you use them if someone else got there first. It might also be good to not use PRs but also not talk about changes. Rely on interface documentation over any conversation, and when documentation is off, write a jira ticket. Remember to make the ticket clear and perfect with full info, that is what testers do after all. A winning strategy is making sure people end up working on neighbouring changes that don't really like each other, the developers not talking bodes ill for software not talking either. Incentivising people to not work together is really easy to do through management. 

Sadly, each of these is a behavior I still keep seeing in teams. 

In a world set up for failing in the usual ways, we need to put special attention to doing the right thing.

It's not about maximising the need of testers. The world will take care of that with bigger and harder systems. Time will take care of us growing to work with the changing landscape of projects' expectations on what the best value from us is, today.

There is still a gap in results of testing. It requires focused work. Make space for the hard work. 

Saturday, October 22, 2022

Being a Part of Solution

A software development team integrated a scanning tool that provides two lists: one about licenses in use, and another one about supply chain vulnerabilities in all of the components the project relies on. So now we know. We know to drop one component for licenses list to follow an established list of what we can use. And we know we have some vulnerabilities at hand. 

The team thinks of the most natural way of going forward, updating the components to their latest. Being realistic, they scan again, to realize the numbers are changing and while totals are down some, the list is far from empty. The list is, in fact, relevant enough that there is a good chance there are not new, more relevant vulnerabilities on the list. 

Seeking guidance, team talks to security experts. The sentiment is clear: the team has a problem and the team owns the solution. Experts reiterate the importance of the problem the team is well aware of. But what about the solution? How do we go about solving this? 

I find this same thing - saying fixing bugs is important - is what testers do too. We list all the ways the software can fail, old and new, and at best we help remind that some of the things we are now mentioning are old but their priority is changing. All too much, we work on the problem space, and we shy away from the solutions.

To fix that listing that security scanners provide, you need to make good choices. If you haven't made some bad choices and some better choices, you may not have the necessary information of experimenting into even better choices. Proposals on certainly effective choices are invaluable. 

To address those bugs, the context of use - acting as a proxy for the users to drive most important fixes first - is important. 

Testers are not only information providers, but also information enrichers, and part of teams making the better choices on what we react on. 

Security experts are not just holders of the truth that security is important, but also people who help teams make better choices so that the people spending more time on specializing aren't only specializing on knowing the problem, but also possible solutions.

How we come across matters. Not knowing it all is a reality, but stepping away from sharing that responsibility of doing something about it is necessary. 

Monday, October 17, 2022

Test Automation with Python, an Ecosystem Perspective

Earlier this year, we taught 'Python for Testing' internally at our company. We framed it as four half-day sessions, ensemble testing on everyone's computers to move along on the same exercise keeping everyone along for the ride. We started with installing and configuring vscode and git, python and pytest, and worked our way through tests that look for patterns on logs, to tests on REST apis, to tests on MQTT protocol, to tests on webUI. Each new type of test would be just importing libraries, and we could have continued the list for a very long time. 
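To make the first step on that path concrete, here is a minimal sketch of a test that looks for patterns in logs. The log excerpt, the line format and the function names are made up for illustration and are not the course material. The `test_` naming follows the pytest discovery convention, though nothing here imports pytest.

```python
import re

# Hypothetical log excerpt; a real test would read this from a file or device.
SAMPLE_LOG = """\
2022-10-17 08:00:01 INFO  service started
2022-10-17 08:00:02 WARN  retrying connection attempt=1
2022-10-17 08:00:03 INFO  connected to broker
2022-10-17 08:00:09 ERROR measurement timeout sensor=A1
"""

# One log line: date, time, level, message (an assumed format).
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def find_lines(log_text: str, level: str) -> list[str]:
    """Return the messages of all log lines with the given level."""
    messages = []
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match and match.group("level") == level:
            messages.append(match.group("message"))
    return messages


def test_log_has_exactly_one_error() -> None:
    assert find_lines(SAMPLE_LOG, "ERROR") == ["measurement timeout sensor=A1"]


def test_service_connects_after_start() -> None:
    info = find_lines(SAMPLE_LOG, "INFO")
    assert info.index("service started") < info.index("connected to broker")
```

Saved as a file named along the lines of `test_logs.py`, pytest would collect and run both test functions as they are.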

Incrementally, very deliberately growing the use of libraries works great when working with new language and new people. We imported pytest for decorating with fixtures and parametrised tests. We imported assertpy for soft assertions. We imported approval tests for push-results-to-file type of comparisons. We imported pytest-allure for prettier reports. We imported requests for REST API calls. We imported paho-mqtt for dealing with mqtt-messages. And finally, we imported selenium to drive webUIs. 
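Soft assertions may be the least familiar item on that list, so here is a rough sketch of the idea using only the standard library. This is not assertpy's API. The names `soft_checks` and `SoftAssertionError` and the payload are assumptions made for illustration. The point is that every check runs, and the failures are reported together at the end.

```python
from contextlib import contextmanager


class SoftAssertionError(AssertionError):
    """Raised once, at the end, listing every check that failed."""


@contextmanager
def soft_checks():
    """Collect failed checks and report them together when the block ends."""
    failures: list[str] = []

    def check(condition: bool, message: str) -> None:
        # Record the failure instead of stopping at the first one.
        if not condition:
            failures.append(message)

    yield check
    if failures:
        raise SoftAssertionError("; ".join(failures))


def test_response_fields() -> None:
    response = {"status": 200, "unit": "C", "value": 21.5}  # made-up payload
    with soft_checks() as check:
        check(response["status"] == 200, "status should be 200")
        check(response["unit"] == "C", "unit should be C")
        check(isinstance(response["value"], float), "value should be a float")
```

With a hard assert, a wrong status would hide a wrong unit until the next run. With the soft version, one run tells you about both.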

Alongside selenium, we built the very same tests importing playwright to drive webUIs, to have concrete conversations on the fact that while there are differences on the exact lines of code you need to write, we can do very much the same things. The only reason we ended up with this is the ten years of one of our pair of teachers on selenium, and the two years from another one of our pair of teachers on playwright. We taught on selenium. 

You could say, we built our own framework. That is, we introduced the necessary fixtures, agreed on file structures and naming, selected runners and reports - all for the purpose of what we needed to level up our participants. And they learned many things, even the ones with years of experience in the python and testing space. 

Libraries and Frameworks

A library is something you call. A framework is something that sits on top, like a toolset you build within. 
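A toy sketch of that difference, with everything in it hypothetical: `slugify` stands in for a library, and `TinyRunner` stands in for a framework that is far simpler than pytest or Robot Framework. With the library, your code decides when to call. With the framework, you register your code and the framework decides when to call it.

```python
from typing import Callable


# Library style: your code is in charge and calls the helper when it wants to.
def slugify(title: str) -> str:
    """A library-like helper: does one thing when called."""
    return "-".join(title.lower().split())


# Framework style: you register code, and the framework decides when to call it.
class TinyRunner:
    """A toy runner, made up for this example."""

    def __init__(self) -> None:
        self._checks: dict[str, Callable[[], None]] = {}

    def check(self, func: Callable[[], None]) -> Callable[[], None]:
        """Decorator that registers a check function under its name."""
        self._checks[func.__name__] = func
        return func

    def run(self) -> dict[str, str]:
        """Call every registered check and record 'passed' or 'failed'."""
        results = {}
        for name, func in self._checks.items():
            try:
                func()
                results[name] = "passed"
            except AssertionError:
                results[name] = "failed"
        return results


runner = TinyRunner()


@runner.check
def slug_uses_dashes() -> None:
    assert slugify("WebUI Testing") == "webui-testing"


@runner.check
def slug_keeps_case() -> None:
    assert slugify("WebUI") == "WebUI"  # wrong on purpose, to show a failure


results = runner.run()
```

The checks never call the runner's internals. They only follow its conventions, which is the sense in which a framework sets the frame you build within.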

Selenium library on python is for automating the browsers. Like the project web page says, what you do with that power is entirely up to you. 

If you need other selections made on the side of choosing to drive webUIs with selenium (or playwright for that matter), you make those choices. It would be quite a safe bet to say that the most popular python framework for selenium is proprietary - each making their own choices. 

But what if you don't want to be making that many choices? You seem to have three general purpose selenium-included test frameworks to consider: 
  • Robot Framework with 7.2k stars on GitHub
  • Helium with 3.1k stars on GitHub (and less active maintainer on new requests) 
  • SeleniumBase with 2.9k stars on GitHub
Making sense of what these include is a whole other story. Robot Framework centers around its own language you can extend with python. Helium and SeleniumBase collect together python ecosystem tools, and use conventions to streamline the getting started perspective. All three are a dependency that sets the frame for your other dependencies. If the framework does not include (yet) support for Selenium 4.5, then you won't be using Selenium 4.5. 

Many testers who use frameworks may not be aware what exactly they are using. Especially with Robot Framework. Also, Robot Framework is actively driving people from the selenium library for RF to a newer browser library for RF, which includes playwright. 

I made a small comparison of framework features, comparing generally available choices to choices we have ended up with in our proprietary frameworks. 

Frameworks give you features you'd have to build yourself, and centralise and share maintenance of those features and dependencies. They also bind you to those choices. For new people they offer early productivity, sometimes at the expense of later understanding. 

The gap in later understanding, particularly with Robot Framework being popular in Finland, may not be visible, and in some circles it has become a common way of recognising people stuck in an automation box we want to get out of. 

Friday, October 14, 2022

WebUI Testing

I don't know about you, but my recent years have seen a lot of systems where users are presented with a webUI. The embedded devices I've tested ship with a built-in web server serving pages. The complex data processing pipelines end up presenting chewed up observations and insights on a webUI. Some are hosted in cloud, others in own server rooms. Even the windows user interfaces turned into webUIs wrapped in some sort of frame that appears less of a browser, but is really just a specialized browser. 

With this in mind, it is no surprise that test automation tooling in this space is both evolving, and a source of active buzz. The buzz is often on the new things, introducing something new and shiny. Be it a lovely API, the popularized 'self-healing' or the rebranded 'low-code/no-code' that has been around at least as long as I have been in this industry, there is buzz. 

And where there's buzz, there's sides. I have chosen one side intentionally, which is against Robot Framework and for libraries in the developer team's dominant language. For libraries I am trying very hard to be, as they say, Switzerland - neutral ground. But how could I be neutral ground, as a self-identified Playwright girl, and a member of the Selenium Project Leadership Committee? I don't really care for any of the tools, but I care for the problems. I want to make sense of the problems and how they are solved. 

In the organization I spend my work days in, we have a variety. We have Robot Framework (with Selenium library). We have Robot Framework (with Playwright library). We have python-pytest with Selenium, and python-pytest with Playwright. We have javascript with Selenium, Playwright, TestCafe and Cypress. The love and enthusiasm of doing well seems to matter more for success than the libraries, but the jury is still out. 

I have spent a significant amount of time trying to make sense of Playwright and Selenium. Cypress, with all the love it is getting in the world and in my org, seems to come with functional limitations, yet people always test what they can with the tools, and figure out ways of telling that is the most important thing we needed to do, anyway. Playwright and Selenium, that pair is a lot trickier. The discussion seems to center around both testing *real browsers*. Playwright appears to mean a superset-engine browser that real users don't use and would not recognise as a real browser. Selenium appears to mean the real browser, the users-use-this browser, with all the hairy versions and stuff that add to the real-world complexity in this space. The one users download, install on their machines and use. 

Understanding this difference on what Playwright and Selenium drive for you isn't making it easy for me.

I have a strong affinity for the idea of risk-based testing, and build the need of it on top of experiences of maintaining cross-browser tests being more work than value. In many of the organizations I have tested in, we choose one browser we automate on, and cover other browsers in one of three ways: agreeing a rotation based on days of the week in testing, taking time for one-off runs of automation that half-fail and need significant analysis time, or agreeing on different people using different browsers while we use our webUI. Hearing the customer feedback and analyzing behaviors from telemetry, we have thought we have so few cross-browser problems that the investment in cross-browser testing has just not been worth it. 
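The weekday rotation can be as small as a lookup table. A minimal sketch follows, where the browser names, their order and the function name `browser_for` are all assumptions for illustration. It is not a record of what any of those organizations used.

```python
import datetime

# Hypothetical rotation; the browser names and the order are assumptions.
ROTATION = {
    0: "firefox",  # Monday
    1: "edge",     # Tuesday
    2: "safari",   # Wednesday
    3: "firefox",  # Thursday
    4: "edge",     # Friday
}
AUTOMATED_BROWSER = "chrome"  # the one browser the automation runs on


def browser_for(day: datetime.date) -> str:
    """Pick the browser for eyes-on testing on the given day."""
    # Weekends are not in the rotation and fall back to the automated browser.
    return ROTATION.get(day.weekday(), AUTOMATED_BROWSER)
```

Calling `browser_for(datetime.date.today())` at the start of a testing session is enough to spread the eyes-on coverage without anyone maintaining cross-browser automation.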

With a single-browser strategy in mind, it matters less if we use that superset-engine browser and automation never sees users-use-this browser. There is the eyes-on-application on our own computers that adds users-use-this browsers, even if not as continuous feedback for each change automation can provide. Risk has appeared both low in likelihood and low in impact when it rarely has hit a customer. We use the infamous words "try Chrome as workaround" while we deliver a fix in the next release. 

Reality is that since we don't test across browsers, we believe this is true. It could be true. It could be untrue. The eyes-on sampling has not shown it to be untrue but it is also limited in coverage. Users rarely complain, they just leave if they can. And recognising problems from telemetry is still quite much of a form of art. We don't know if there are bugs we miss on our applications if we rely on superset-engine browsers over users-use-this browsers. 

Browsers of today are not browsers of the future. At least I am picking up a sense of differentiation emerging, where one seems to focus on privacy related features, another being more strict on security, and so on. Even if superset-engine browsers are sufficient for testing of today, are they sufficient for testing in five years, with browsers in the stack becoming more and more different from one another? 

Yet that is not all. The answers you end up giving to these questions are going to be different depending on where your team's changes sit on the stack. Your team's contribution to the world of webUIs may be your very own application, and that is where we have large numbers. Each of these application teams needs to test their very own application. Your team's contribution may also be on the framework applications are built on. Be it Wordpress or Drupal, or React or Vue, these exist to increase productivity in creating applications and come to an application team as a 3rd party dependency. Your team's contribution could also be in the browser space, providing a platform webUIs run on. 

Picture. Ecosystem Stack

This adds to the trickiness of the question of how do we test for the results we seek. Me on top of that stack with my team of seven will not want to inherit testing of the framework and browser we rely on, when most likely there are bigger teams already testing those and we have enough on our own. But our customers using that webUI we give them, they have no idea if the problem is created by our code, the components we depend on, or the browser we depend on to run this all. They just know they saw a problem with us. That puts us in a more responsible spot, and when the foundation under us leaks and gives us a bad name, we try making new choices of the platform when possible. And we try clear timely reports hoping our tiny voices are heard with that clarity in the game with mammoths. 

For application teams we have the scale that matters the most for creators of web driver libraries. And with this risk profile and team size, we often need ease, even shortcuts. 

The story is quite different on the platforms the scale of applications rely on. For both browsers and frameworks, it would be great if they lived with users-use-this browsers with versions, variants and all that, and did not shortcut to superset-engine type of an approach where then figuring out something is essentially different becomes a problem for their customers, the webUI development community. The browser and framework vendors won't have access (or means to cover even if they had access) to all our applications, so they sample applications based on some sampling strategy to think their contributions are tested and work. 

We need to test the integrated system, not only our own code, for our customers. Sitting on top of that stack puts our name on all the problems. But if it costs us extra time to maintain tests cross-browser for users-use-this browser, we may just choose we can't afford to - the cost and the value our customers would get are not in balance. I'm tired of testing for browser and framework problems in the ever-changing world because those organizations wouldn't test their own, but our customers will never understand the complexities of responsibilities across this ecosystem stack. 

We would love if our teams could test what we have coded, and a whole category of cross-browser bugs would be someone else's problem. 

Saturday, October 8, 2022

When do YOU start testing?

This week I was a guest on a podcast whose host and I found one another on Polywork. It's a development podcast, and we talked of testing, and the podcaster is an absolute delight to talk to. The end result will air sometime in December. 

One question he asked left me thinking. Paraphrasing, he asked: 

"When do you start testing?" 

I have been thinking about it since. I was thinking about it today listening to TestFlix and the lovely talk by Seema Prabhu and the vivid discussion her talk created on the sense of wall between implementing and testing. And I am still thinking about it. 

The reason I think about this is that there is no single recipe I follow. 

I start testing when I join a project, and will test from whatever I have there at that moment. Joining a new product early on makes the testing I do look very different than joining a legacy product in support mode. But in both cases, I start with learning the product hands-on, asking myself: "why would anyone want to use this?" 

After being established in the project, I find myself usually working with a continuous flow of features and changes. It would be easy to say I start testing features as we hear about them, and I test them forever after, until they are retired. More concretely, I am often taking part in handoff of the request of that feature, us clarifying it with acceptance criteria that we write down and the product owner reviews to ensure it matches their idea. But not always, because I don't need to work on every feature, as long as we never leave anyone alone carrying the responsibility of (breaking) changes. 

When we figure out the feature, it would be easy to say that as a tester, I am part of architecture discussions. But instead, I have to say I am invited to be part of architecture discussions, but particularly recently I have felt like the learning and ownership that needs to happen in that space benefits from my absence, and my lovely team gives a few sentence summary that makes me feel like I got everything from their 3-hours - well, everything that I needed anyway, in that moment. Sometimes me participating as a tester is great, but not always. 

When the first changes are merged without being integrated with the system, I can start testing them. And sometimes I do. Yet more often, I don't. I look at the unit tests, and engage with questions about them and may not execute anything. And sometimes I don't look at the change when it is first made.

When a series of changes becomes "feature complete", I can start testing it. And sometimes I don't. It's not like I am the only one testing it. I choose to look when it feels like there is a gap I could help identify. But I don't test all the features as they become feature complete, sometimes I test some of the features after they have been released. 

Recently, I have started testing of features we are planning for roadmap next year. I test to make sure we are promising on a level that is realistic and allows us to exceed expectations. As a tester here, I test before a developer has even heard of the features. 

In the past, I have specialized in vendor management and contracts. I learned I can test early, but reporting results of my testing early can double the price of the fixed price contract without adding value. Early conversations of risks are delicate, and contracts have interesting features requiring a very special kind of testing. 

When people ask when do I start testing, they seem to look for a recipe, a process. But my recipe is varied. I seek the work that serves my team best within the capabilities that I have in that moment of time. I work with the knowledge that testing is too important to be left only for testers, but that does not mean that testers would not add value in it. But to find value, we need to accept that it is not always located in the same place. 

Instead of asking when do I start testing, I feel like reminding that I never stop. And when you don't stop, you also don't start. 

When do YOU start testing? 

Saturday, September 24, 2022

Empty Calendar Policy

If you look around for scheduling advice, you will see proposals of scheduling yourself working time. Even if you don't look for advice but happen to be a user of the Microsoft platforms, you may find your inbox filled with that advice. 

"Maybe you should consider scheduling working time for yourself?" 

"You have many meetings in your calendar you have not confirmed"

"You spend N hours a week in meetings and these are the people who appear to be your people"

I regularly delete these messages, but I have not turned them off. They remind me regularly that you can run your life and your calendar differently. 

My calendar is very close to empty. It does not mean that I would not have things to do, I just like being in control of those things more and let my calendar be in control of me less. 

A lot of the things people send invites for are information sessions, and I put them in my calendar as non-committal tentative reservations. I generally prefer doing information sessions with 2x speed, and the calendar invites I consider less as meetings in calendar, but reminders of cadence of checking out particular categories of information. 

Some things people need me for, and my rule around them is that everything that is supposed to be committed by me, you run by me first. I decide actively what goes into my calendar. 

Towards the end of a week, I reflect on what I got done and what I want to do next. And I schedule my goals into the invisible slots without putting them in my calendar. This keeps me flexible to saying yes on things that fit my commitments on various levels, and makes it easy for people who I really need to share work with to find time in my calendar. 

But this also means that a great way of upsetting me is to decide you want something done, put it in my calendar and not accept no for an answer, claiming that my calendar is empty. 

There is no universal way of how people deal with their scheduling to suit their needs. A safer way is to assume it does not hurt to ping first, ask for consent on scheduling something for a particular timeframe in calendar, and let people make their own decisions on how "it is only 30 minutes" is sometimes just that, and other times leads into a massive time sink of interrupting the type of work that requires attentive time. 

My calendar is free so that I can get things done by limiting work in progress. 

Friday, September 16, 2022

Selenium and Me

In this blog, as usual, I speak for myself, not for the organizations I occasionally represent. I represent so many organizations in general that I find it hard to say which hat, and when. I try to step away when there is a conflict of interests. I am an employee of Vaisala and an entrepreneur at maaretp. I am a board member of Tivia and a project leadership committee (PLC) member of Selenium. I'm chairman of Software Testing Finland ry (Ohjelmistotestaus ry). And I'm running the TechVoices initiative to help new speakers with a few other lovely people. My work and my hobbies are not the same, but they are synergetic. 

This is all relevant, because I am about to write about Selenium. I'm writing this as me, with all the experiences I have, and not sitting on the powers assigned to me on taking part on decisions about any of these organizations. 

The little old me has now had a month of being welcomed with open hearts and minds into the lovely Selenium community. My paths had crossed with Selenium, first in organizations I work in, then in other communities having love/hate relationship with all things GUI automation including Selenium, and then as keynote speaker at Selenium Conference India. I had come to learn Simon Stewart is a lovely human, and that Manoj Kumar is great to have around when you aspire to learn a lot, and that David Burns cares about all the right things. I had been infected with the knowledge of Pallavi Sharma, reinvigorated with the conversations with Pooja Shah, and watched in admiration the work Puja Jagani is doing with contributing to the project. Now that I hang out in the Selenium community slack, I have discovered how great and welcoming Titus Fortner is, and how much heart and time everyone there puts on things. The people who invited and welcomed me into Selenium PLC, thank you: Marcus Merrell, Bill McGee, Diego Molina and my trustworthy contact, Manoj Kumar. Manoj is the reason I joined, just to get the chance of working more with him. 

It is fair to say I am here for the people. Not only for the people I now mentioned, or the people in that slack group, a totally intimidating number of 11,133, or the people who already use Selenium, or the 24.6k that star Selenium in GitHub (add a star here!). But the people who get to cross paths with Selenium also in the future.

I joined the project work, even if I am a self-proclaimed 'playwright girl', because I believe the ethos of choosing one to rule them all isn't the best of the industry. Having many means we look around for inspirations, and there is a significant thing in Selenium that should inspire us all: 

It's been around a long time and has never stopped evolving. 18 years is an impressive commitment, and that is just looking back! 

So if you have heard the rumours of "Selenium losing steam", I think you may be working with information that is constantly changing. The jury is still out, and the evidence, where I see it, isn't very conclusive.

In this world of fast changes and things evolving, with difficult to grasp messages, I'm adding to the confusion until I reach clarity. I'm starting my modeling of the world from a corner close to me, python. 

I hope to find us on a mutual route of discovering clarity, and would - just as me - welcome you to join whatever we together can cook up with the Selenium open source project. Open means that you can see too much, but also choose what your interests are.  

Thursday, August 18, 2022

The Result Gap - Describe the Invisible

At work, and at a planning meeting. The product owner asks about the status of a thing we're working on in the team. It's done, and tested, exclaims the developer. It's done, tested by you but not tested by me, corrects the tester. 

This repeats with small word variations all the time. Sometimes it's the eyeroll that says your testing may not be all the testing. Sometimes it is words. Sometimes it is the silence. 

But it is hard to explain to people how it is possible that testing is done but testing is not done. What is this funny business here? 

I started modeling this after one particularly frustrating planning meeting like that. I can explain the gap between testing and testing as the seven functional problems we still needed to correct, and the one completely lost test environment and another completely lost mailsink service that blocked verifying things work. Of the seven functional issues on three minor features, one was that a feature would not work at all in the customer environment, the famous "works on my machine" for the development environment. 

While the conversation of testing being done while testing isn't done is frustrating, describing the difference as the results gap can work. There's the level we really need to have on knowing and working before we bother our customers, and there's the level developers in my great team tend to hit. It is generally pretty good. When they say they tested, they really did. It just left behind a gap. 

I call this type of results gap one of "Surprise!" nature. We are all surprised about what weird things we can miss, and how versatile the ways of software failing can be. We add things with each surprise to our automation tests, yet there seem to always be new ways things can fail. But the surprise nature says these are new ways it fails. 

The top gap is what I expect a developer with a testing emphasis to bring in an agile team. Catch surprises. Make sure they and the team spend enough time and focus to close the results gap. It's a gap we close together over time by learning, but it also may be a gap where learning new things continually is needed. I have yet to meet a team where their best effort to make me useless really made me useless. 

There is another possible meaning for the results gap, and this type of results gap makes me sad. I call this "Pick up the pizza boxes..." results gap. In some teams, the developer with a testing emphasis is tasked to create mechanisms to notice repeating errors. A little like assuming all kids who eat pizza leave the boxes on the living room floor and you either automate the reminder making it just right so that kids will react to it and take out the garbage, or you go tell that with your authoritative voice while smiling. Some people and teams think this is what testers are supposed to do - be the developers' clean-up reminder service. 

When working in teams with pizza box results gap, it is hard to ever get to the levels of excellence. You may see all the energy going into managing the bug list of hundreds or thousands that we expect to leave lying around, and half your energy goes into just categorising which piles of garbage we should and should not mention. 

This sad level I see often in teams where management has decided testing is for testers and developers just do unit testing - if even that. The developers are rewarded for fast changes, and fixing those changes is invisible in the ways we account for work. 

What does it look like then in teams where we are minimising the results gap? 

It looks like we are on schedule, or ahead of schedule. 

It looks like we are not inviting developers to continuously do on-call work because their latest changes broke something their automations were not yet attending to. 

It looks like we pair, ensemble, collaborate and learn loads as a team. It looks like the developer with testing emphasis is doing exploratory testing and documenting it with automation, or like I now think of it: all manual testing gets done while preparing the automated testing. It might be that 99% of the time is prep, and if prep is too little, your developer with testing emphasis may simplify the world, joining the rest of the team on the good team's output level of information, and no one attends to the results gap on top but the customer. 

Do you know which category of a results gap your team is working on, and how much of a gap there is? Describe the invisible. 

Thursday, July 28, 2022

Language change in Gherkin Experiment

I find myself a Gherkin (the language often associated with BDD) sceptic. The idea that makes other people giddy with joy on writing gherkin scenarios instead of manual tests makes me feel despair, as I was never writing the manual tests. The more I think about it and look at it, the more clear it is that the Gherkin examples when done well are examples rather than tests, and some of the test case thinking is causing us trouble. 

What we seek with Gherkin and BDD is primarily a conversation of clarity with the customer and business. When different, business-relevant examples illustrate the scenarios our software needs to work through, the language of the user is essential. 

In our experiment of concise-but-code tests vs gherkin-on-top tests, I find myself still heavily favouring concise-but-code. 

def test_admin_can_delete_users(
    assigned_user: User, users_page_logged_in_as_admin: UsersPage, users_page: UsersPage
) -> None:
    users_page_logged_in_as_admin.create_new_user(, assigned_user.password)

I'm certain that the current state of fixtures has some cleaning up to do, but I can also have a conversation in the user's language with this style. Before implementing, we talk in the Friends episode title format, "the one where admins can delete all users including themselves", and after implementing, we have the format turned into something where just the name of the test(s) and the main steps need to be occasionally looked at together. 
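To make the concise-but-code style easier to picture, here is a minimal self-contained sketch. It is not our project's code: the `User` fields, the in-memory `UsersPage` stand-in and the `delete_user` method are all assumptions made up for illustration, and a real page object would drive the UI through fixtures instead.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    # Hypothetical user record; the field names are assumptions for this sketch.
    name: str
    password: str
    is_admin: bool = False


@dataclass
class UsersPage:
    # In-memory stand-in for a page object; a real one would drive the UI.
    logged_in_as: User
    users: dict[str, User] = field(default_factory=dict)

    def create_new_user(self, name: str, password: str, is_admin: bool = False) -> User:
        self._require_admin()
        user = User(name, password, is_admin)
        self.users[name] = user
        return user

    def delete_user(self, name: str) -> None:
        self._require_admin()
        del self.users[name]

    def _require_admin(self) -> None:
        if not self.logged_in_as.is_admin:
            raise PermissionError("only admins can manage users")


def test_admin_can_delete_all_users_including_themselves() -> None:
    # The test name carries the "the one where..." conversation.
    admin = User("admin", "secret", is_admin=True)
    page = UsersPage(logged_in_as=admin, users={admin.name: admin})
    page.create_new_user("assigned", "pass123")

    # Copy the keys first, since deleting while iterating a dict is an error.
    for name in list(page.users):
        page.delete_user(name)

    assert page.users == {}
```

The point of the sketch is that the test name and the three main steps stay readable in the user's language, even though nobody outside the team would be writing it.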

It is clear though that this is not a format in which the users/business would already be writing our tests for us, and currently I am in a team where we have a product owner who would love nothing more than to be able to do that. There is also a sense of security for external customers if we could share with them the specs that we test against, and that makes considering something like Gherkin worthwhile.

In this post, I want to look at the Gherkin created in user/business collaboration for this feature in our project, compared to the result that is running in our pipeline today. Luckily, the feature we are experimenting with is so commonplace that we reveal absolutely nothing of our system by sharing the examples. 

The user/business collaboration generated examples started with two on creating new users: 
From: old

These turned to three by now. 
From: new

You can see a difference in language. Earlier we talked about users, where some of them get admin rights and others don't. Now, the language emphasis is on having admin and user. Also the new language isn't yet consistent on these, having admin and admin user used interchangeably in the steps. The new language also reveals a concept of default admin, which just happens to be one user's information on the database when it starts - clearly something where we should have a better conversation than the one I was around to document with the user/business collaboration session. 

The second one of the new also threw me off now that I compare it. I first reacted to admins not being able to create users without admin rights - there is no admin without rights, those are the same thing. But then I realized that it is trying to say that an admin can create users that are not admins. 

Another thing that this sampling makes obvious is that two scenarios turned to three, and the split to creating admins and users makes more sense from a testing point of view than from a business point of view. Again, admin is just a user with admin rights. Any user could be an admin, permanently or temporarily.  

Similarly, it turned longer. Extra steps - deleting in particular - were introduced to each scenario. That is clearly a testing step made visible, and most likely not an essential part of the example for user/business.  

And it also turned shorter. Changing password, the essential functionality originally described vanished and found a new home, simplifying this scenario as it was also separately described in the original. 
From: new

From: old

With this one, steps exist for testing purposes only. They now describe the implementation. A common pitfall, and one we most definitely fell into: not illustrating the requirement as concisely as we could, but illustrating the operation of the system as test steps. 

Interestingly, after the first example and mapping the original and the new, we have six things that originate with user/business, and only one that remains in the implemented side. 
From: new

We can find the match from the originals to this one. 
From: new

So what else got lost in translation? These three. 

From: new

The first two are the real reason why such functionality exists in the first place, and it is telling that those are missing. They require testing two applications together, a "system test" so to speak. 

The last one missing is also interesting. It's one that we can only do by manually editing the database once there is no admin access, again a "system test". 

Having looked at these, my takeaways are: 
  1. Gherkin fooled us into losing business information while feeling like it was "easier" and "more straightforward". The value of reading the end result is not the same as the value of reading the business input. 
  2. Tests will be created on testing terms, and creating an appearance of connection with business isn't yet a connection. 
  3. The conversation mattered, the automated examples didn't. 

Tuesday, July 26, 2022

Limiting what's held in my head

I'm struggling with the work I have now. For a long time, I could not quite make sense of why. I knew what I was observing: 

  • a tester before me assigned into this team could not find their corner of being useful in a year
  • a tester after me stayed less than a month
  • a tester before us all felt overwhelmed and changed to something else where work is clearer

This project repels testers left and right. Yet it has some of the loveliest developers I have met. It has a collaborative and caring product owner. But somehow it repels testers. And I have never before experienced projects that repel testers while having developers who love testing like these folks do. 

In the six months I have spent with this team, I have managed to figure out some of the testing I want to do and I've:

  • clarified each new feature for what is in scope and what is not in scope, and tested to discover when those boundaries are fuzzy
  • learned how to control inputs, the transformations happening on the way, and how to watch the outputs
  • shortened release cycles from years to months
  • introduced some test automation that helps me track when things change
  • introduced test automation other people write that exercise things that would otherwise be hard to cover, allowing for the devs to find bugs while writing tests
  • had lovely conversations with the team resulting in better ways of working

I have plenty of frustrations:

  • I can barely run our dev environment because I hate how complex we've made it, and I can almost always opt to avoid working with tests the way the rest of the team does - it took me 5 months of avoiding it and keeping all my code on the side, and now I am putting some of it together and losing debug
  • We use Linux because devs like it, but customers expect Windows because they like it. Discrepancies like this can bite us later, and they already do if you happen to join the team with a Windows workstation (like all testers other than myself, who changed to Mac recently)
  • Our pipelines fail a lot, and we're spending way too much time on individual branches that live longer than I would like
  • Our organization loves Jira, and our team does not. That means that we don't properly fake using it. The truth is in the commits and conversations (great) but I feel continuously guilty for not living up to some expectation, which then makes it harder for testers who think they can rely on Jira for info.
  • We have so much documentation that I can't get through it in a lifetime.

So I have done my share of testing, I live with frustrations, and navigate day to day by juggling too many responsibilities anyway. What am I really struggling with? The adaptations.

I have come to heavily limit which of the things the team discusses I hold in my head. I don't need all of it to do what I have set out to do. I model the users' and other stakeholders' space and I care about tech for the interfaces it gives me for visibility. I model what ready looks like and how we can make it smaller. I don't accept all the tools and tech, because there's more going on than I can consume. I focus on what others may not look at on the system. 

I start my days with checking if there is anything new in master, and design my days around change for the users. I choose to mention many things over doing them myself. I limit what I hold in my head to have energy for seeing what others may have missed. 

There is no simple "here's your assignment, test it - write test automation code for it" for me; for that there is always someone else (currently a junior I'm training up on programming/testing). Most test automation in this team requires diving deep into the implementation. The work is intertwined and messy, and it often invites you into the deep end without focus time unless you create it for yourself. 

So I struggle: finding a tester to do this work seems increasingly difficult. Training newbies feels a more likely direction. And it makes me think that some of this is the future we see: traditional testers and test automation folks won't find their corner of contribution. 

This world is different and it calls for different focus - intentional limiting of what we hold in our head to collaboratively work while being different. 


Saturday, July 23, 2022

Optimising start of your exploratory testing

Whenever I teach people exploratory testing, we start the teaching experience with an application they don't yet know. Let's face it: exploring something you have no knowledge of, and exploring something you have baseline knowledge of are different activities. Unless we take a longer course or use the application you work with as the context, we really start with something we don't know. 

I have two favourite targets of testing I use now. Both of them are small, and thus learning them and completing the activity of testing them is possible. In addition, I teach with targets that are too large to complete testing on, for the variety they offer. 

The first one is a web application that presents input fields and produces outputs on a user interface. The second one is code that gets written as we are exploring the problem we should have an implementation for. 

With both test targets, I have had the pleasure of observing tens of testers work with me on the testing problem, and thus would like to mention a few things people stereotypically may do that aren't great choices. 

1. Start with input filtering tests

Someone somewhere taught a lot of testers that they should be doing negative testing. This means that when they see an input field, be it at UI or API level, they start with the things they should not be able to input. It is a relevant test, but only if you first:

  • Know what positive cases with the application look like
  • Know that there is an attempt of implementing error handling
  • Specifically want to test input filtering after it exists

With the code-oriented activity, we can see that input filtering exists only when we have expressed the intent of having input filtering and error handling. With both activities, we can't properly understand and appreciate what is incorrect input before we know a baseline of what is a correct input. 

A lot of testers skip the baseline of how this might work and why would anyone care. Don't. 
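As a small sketch of that ordering: the `parse_age` function below is a hypothetical target invented for illustration, including its 0 to 120 range. The baseline checks come first and establish what correct input looks like; the input filtering checks only make sense after them, and only because this target expresses an intent to reject bad input.

```python
def parse_age(text: str) -> int:
    """Hypothetical function under test: accepts whole numbers from 0 to 120."""
    value = int(text.strip())  # int() raises ValueError for non-numeric text
    if not 0 <= value <= 120:
        raise ValueError(f"age out of range: {value}")
    return value


def check_baseline() -> None:
    # Step 1: learn what working looks like with inputs that should succeed.
    assert parse_age("42") == 42
    assert parse_age(" 7 ") == 7
    assert parse_age("0") == 0


def check_input_filtering() -> None:
    # Step 2: only now probe inputs that should be rejected.
    for bad in ["-1", "121", "abc", ""]:
        try:
            parse_age(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected {bad!r} to be rejected")


check_baseline()
check_input_filtering()
```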

2. Only one sunny day

Just as testers were taught about negative tests, they were also taught about sunny day scenarios. In action, it appears many testers hold the false belief that there is one sunny day scenario, when in fact there are many, and a lot of variation in all of them. We have plenty to explore without trying incorrect inputs. We can vary the order of things. We can vary what we type. We can vary times between our actions. We can vary how we observe. 

There are plenty of positive variations for our sunny day scenario and we need to start with searching for them. When problems happen in sunny day scenarios, they are more important to address.
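A rough sketch of how quickly sunny days multiply, assuming a made-up two-field form: the valid names, valid ages and the `fill_form` stand-in are all hypothetical. Every combination below is a positive case, and it only varies what we type and the order of things, not timing or how we observe.

```python
from itertools import permutations, product

# Hypothetical valid inputs; every value here is a "positive" case.
VALID_NAMES = ["Ann", "Åsa Öberg", "O'Neil"]
VALID_AGES = [0, 18, 120]
FIELD_ORDERS = list(permutations(["name", "age"]))


def fill_form(order, values):
    """Stand-in for the application: fills fields in the given order."""
    form = {}
    for field_name in order:
        form[field_name] = values[field_name]
    return form


def sunny_day_variations():
    """Yield every combination of valid values and field order."""
    for name, age, order in product(VALID_NAMES, VALID_AGES, FIELD_ORDERS):
        yield order, {"name": name, "age": age}


results = [fill_form(order, values) for order, values in sunny_day_variations()]
print(len(results))  # 3 names * 3 ages * 2 orders = 18
```

Even this tiny form gives 18 sunny days before a single incorrect input is tried.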

Imagining one sunny day also leads to people stopping exploring prematurely, before the stakeholders asking for testing have the information, the results they'd expect. 

3. Start with complex and long

To ground exploring to something relevant, many people come up with a scenario that is complex or long, trying to capture all-in-one story of the application. As an anchor for learning it's a great way of exploring, but it becomes two things that aren't that great: 

  • Scenario we must go through, even if that means blinding ourselves to the reality of the application
  • Scenario we think we got through no matter what we ended up doing

I find that people are better at tracking small things to see variation than they are at tracking large things to see variation. Thus an idea of a scenario is great, but making notes and naming things that are smaller tends to yield better results. 

Also, setting up a complex thing that takes long to get through delays finding basic information. I've watched people again and again do something really complex and slow, only to learn late about problems they could have shown and had fixed early if they did small before large, and quick before slow. 

The question with this one is though - does speed of feedback matter? In the sense of having to repeat it all after the fix, it should matter to whoever was testing, and knowing of a problem sooner tends to allow motivation to fix it without forgetting what introduced the problem.  Yet better later than never. 

4. Focus on the thing that isn't yours

Many people seem to think exploratory testing should not care for team limits but for the customer's overall experience. Anything and everything that piques the tester's curiosity is fair game. I've watched people test JavaScript's random function. I've watched people test browser functionalities. I've watched people test the things that aren't theirs so much that they forget to test what is theirs. 

This is usually a symptom of not thinking at all in terms of what is yours - architecture wise. When you find something does not work, who will be able to address it? Your team can address how they use 3rd party services and change to a different one. Just because you can test the things that you rely on, does not mean you always should. 

I find that if we think in terms of feedback we want to react on and can react on, we can find more sense to information we provide for our teams. Yes, it all needs to work together, but if we are aware of who is providing certain functionalities, we can have conversations of reacting to feedback we otherwise miss.

5. Overemphasis on usability

Usability is important and since you are learning a new domain and application while exploring, you probably have ideas on it. Sometimes we push these ideas to the center so early that we don't get to the other kinds of things expected as results. 

For me this is usually a symptom of the "even a broken clock is right twice a day" syndrome, where any results of testing are considered good, instead of looking at the purpose of testing as a whole. It feels great to be able to say that I found 10 bugs, and it sometimes makes us forget to ask whether those 10 bugs included the ones people care about the most. 

Delaying reporting of some types of bugs and particularly usability bugs is often a good approach. It allows for you, the tester, to consider if with 4 hours of experience you still see the same problems the same way, and why is learning the application changing your feedback.

6. Explicit requirements only

Finally, my pet peeve of giving up control of requirements to an external source. Exploratory testing, in fact, is discovering theories of requirements and having conversations on these discovered requirements. Limiting to explicit requirements is a loss of results. 

There's a whole category of important problems exploratory testing is able to uncover - bugs of omission. The things that we reasonably should expect to be there and be true, but aren't. While we try to think of these in advance to the best of our abilities, we are extra creative with the application as our external imagination, letting us think in terms of what we may be missing. 

With the code-oriented exploratory testing target, I have two dimensions of things the PO holds true and assumes the others would know, yet the others will come with a completely different set of assumptions.

I'll leave this today with the action call for exploratory testing: 

Go find (some of) what the others have missed! 

The recipe is open, but some ingredients make it less likely to do with great results.