Wednesday, March 22, 2023

Memoirs of Bad Testing

I'm in a peculiar position right now. I am a manager of a manager who manages test manager managing themselves and a tester. Confused already - I am too.

The dynamic is clear through. I get to let go the tester for not thinking their work isn't worth paying for. And that is kind of what I am thinking. But it is not that straightforward. All testers, gather around, this needs a lot more than just throwing it at your face like this.

What Good Testing Looks Like

Having been a tester for such a long time and still being a tester on side of my other dark-side managerial duties. I know what good testing looks like and I would buy some in a heartbeat. Some examples of that testing I vouch for are embodied in people like Ru Cindrea (and pretty much anyone at Altom I have seen test, including past people from Altom like Alessandra Casapu, pure testing brilliance), Elizabeth Zagroba, Joep Schuurkes and James Thomas. And my idol of all times, Fiona Charles, who I attribute to a special blend of excellence in orchestrating testing of this kind. 

If there is a phrase that good testing often comes to me with, it is the words *exploratory testing*. Not the exploratory testing that you put into a single task as a reason to say that you need extra time to think, but the kind that engulfs all your testing, makes you hyper focus on results and aware of cost of premature documentation as well as opportunity cost, and positively pushes you to think about risks and results as an unsplittable pair where there is not one without the other. 

These kinds of testers have product research mindset, they are always on the lookout of how they invest their limited time the best in the context at hand and they regularly surprise their teams with information we did not have available regardless of how much automation we have already managed to accrue. 

Seniority much? 

While it may sound like I am describing very senior people, I have seen this in 1st year testers, and I have seen lack of this in 10th year testers. It is not about level of experience, it is about level of learning, and attitude to what it means to do testing. 

What does it look like when the work may not be worth paying for?

While writing documentation and notes and checklist is good, writing test cases is bad. Generally speaking of course. So my first smell towards recognising I might come to a case of testing not worth paying for is test case obsession. 

Test Case Obsession is counting test cases. Defining goals of test automation in perspective of manual test cases automated. Not understanding that good test automation means decomposing the same testing problems in a different way, not mimicking your movements manually. 

Its listing test cases and using the very same test cases framing them into "regression suite", "minor functionality suite" and "major functionality suite" and avoiding thinking about why we test today - what is the change and risk - and hoping that following a predetermined pattern would be a catch all important with regards to results. 

If this is what you are asked to do, you would do it. But if you are asked to follow the risk and adjust your ideas of what you test, and you still refuse, you are dangerously on the territory of testing not worth paying for.  When that is enhanced with missing simple basic things that are breaking because you prioritise safety and comfort of repeating plans over finding the problems with lack of information your organization hired you to work on, you are squarely on my testing not worth paying for turf. 

The Fixing Manoeuvres

I've done this so many times over the last 25 years that it almost scares me. 

First you move the testing not worth paying for into a team of its own and if you can't stop doing that work, you send it somewhere where it gets even thinner on results by outsourcing it. And in 6-12 months, you stop having that team. In the process you have often replaced 10 testers with 1 tester and  better results. 

Manager (1) - Manager (2) - Manager (3) - Tester

Being the Manager to Testers, I can do the Fixing Manoeuvres. But add more layers, and you have a whole different dynamic.

First of all, being the manager (1), I can ask for Testing Worth Paying For (tm) while manager (2) is requiring testing not worth paying for, and manager (3) and tester being great and competent folks find themselves squeezed from two sides. 

Do what your manager says and their manager will remove you.

Do what your managers manager say and your manager will nag you because they have been taught that test cases are testing, and know of nothing better. 

Long story, and Morale  

We have an epidemic of good managers with bad ideas about how to manage testing and this is why not all testing is not exploratory testing. Bad ideas remove a lot of the good from testing. Great testers do good testing even in presence of bad ideas, but it is so much harder. 

We need to educate good managers with bad ideas and I have not done my part as I find myself in a Manager - Manager - Manager - Tester chain. I will try again. 

Firing good people you could grow is the last resort while also a common manouvre. Expecting good testing (with mistakes and learning) should be what we grow people to. 

Friday, February 17, 2023

The Browser Automation Newbie Experience

Last week I called for Selenium newbies to learn to set up and run tests on their own computer, and create material I could edit a video out of. I have no idea if I ever manage to get to editing the video, but having now had a few sessions, writing about it seemed like a good use of an evening. 

There are two essentially different things I mean by a newbie experience: 

  1. Newbies to it all - testing, programming, browser automation. I love teaching this group and observing how quickly people learn in context of doing. 
  2. Newbies to browser automation library. I love teaching this group too. Usually more about testing (and finding bugs with automation), and observing what comes in the way of taking a new tool into use. 
I teach both groups in either pair or ensemble setting. In pairs, it's about *their* computer we pair on. In ensemble, it about one person's computer in the group we pair on. 

The very first test experience is slightly different on the two and comprises three parts:
  1. From empty project to tools installed
  2. From tools installed to first test
  3. Assessing the testing this enables
I have expressed a lot of love for Playwright (to the extent of calling myself a playwright girl) and with actions of using it in projects at work. But I have also expressed a lot of love for Selenium (to the extent that I volunteer for the Selenium Project Leadership Committee) and with encouraging using it in projects at work. I'm a firm believer that *our* world is better without competition and war, rather admiration and understanding and sustainability. 

So with this said, let's compare Playwright and Selenium *Python* on each of these three. 

From Empty Project To Tools Installed - Experience Over Time

Starting with listing dependencies in requirements.txt, I get the two projects set up side by side. 

Playwright         Selenium

I run pip install -r requirements.txt and ... I have some more to do for playwright: running playwright install from command line, and watching it download superset browsers and node dependencies. 

If everything goes well, this is it. At time of updating the dependencies, I will repeat the same steps, and I find the playwright error message telling me to update in plain English quite helpful. 

If we tend to lock our dependencies, need of update actions are on our control. On many of the projects, I find failing forward to the very latest to be a good option, and I take the unusual surprise of update over the pain of not having updated almost any day. 

From Tools Installed to First Test

At this point, we have done the visible tool installation and we only see if everything is fine if we use them tools and write a test. Today we're sampling the same basic test on todo mvc app, the js vanilla flavour

We have the Playwright-python code

With test run on chromium, headed: 
== 1 passed in 2.71s ==

And we have the Selenium-python code

With test run on chrome Version 110.0.5481.100 (Official Build) (arm64)
== 1 passed in 6.48s ==

What I probably should stress in explaining this is that while 3 seconds is less than 7 seconds, more relevant for overall time is the design we go to from here on what we test while on the same browser instance. Also, I have run tests for both libraries on my machine earlier today which is kind of relevant for the time consideration.

That is, with Playwright we installed the superset browsers with the command line. With Selenium, we did not because we are using the real browsers and the browser versions on the computer we run the tests on. For that to work some of you may know that it requires a browser-specific driver. The most recent Selenium versions have built-in SeleniumDriver that is - if all things are as intended - is invisible. 

I must mention that this is not what I got from reading Selenium documentation, and since I am sure this vanishes soon as in gets fixed, I am documenting my experience with a screenshot.

I know it says "No extra configuration needed" but since I had -- outside path - an older ChromeDriver, I ended up with an error message that made no sense with the info I had and reading the blog post, downloaded a command line tool I could not figure out how I would use. Coming to this with Playwright, I was keen to expected command line tool instead of automated built-in dependency maintenance feature and the blog post just built on the assumption. 

Assessing the Testing This Enables

All too often we look at the lines of code, the commands to run, and the minutiae of maintaining the tool, but at least on this level, it's not like the difference is that huge. With python, both libraries on my scenarios rely on existence of pytest and much of the asserting stuff can be done the same. Playwright has been recently (ehh, over few years, time flies I have no clue anymore) been introducing more convenience methods for web style expects like expect(locator()).to_have_text() and web style locators like get_by_placeholder(), yet these feel like syntactic sugar to me. And sugar feels good but sometimes causes trouble for some people in the long run. Happy to admit I have both been one of those people, and observed those people. Growing in tech understanding isn't easier for newbie programmers when they get comfortable with avoiding layered learning for that tech understanding. 

If it's not time or lines, what it it then? 

People who have used Selenium are proficient with Selenium syntax and struggle with the domain vocabulary change. I am currently struggling the other way around with syntax, and with Selenium the ample sample sets of old versions of APIs that even GitHub copilot seems to suggest wrong syntax from isn't helping me. Longer history means many versions of learning what the API should look like showing in the samples. 

That too is time and learning, and a temporary struggle. So not that either. 

The real consideration can be summed with *Selenium automates real browsers*. The things your customers use your software on. Selenium runs on the webdriver protocol that is a standard the browser vendors use to allow programmatic control. There's other ways of controlling browsers too, but using remote debugging protocols (like CFP for Chromium) isn't running the full real browser. Packaging superset browsers isn't working with the real versions the users are using.  

Does that matter? I have been thinking that for most of my use cases, not. Cross-browser testing the layers that sit on top of frontend framework or on top of a platform (like salesforce, odoo or Wordpress), there is only so much I want to test the framework or the platform. For projects creating those frameworks and platforms, please run on real browsers or I will have to as you are risking bugs my customers will suffer from. 

And this, my friend, this is where experience over time matters. 

We need multiple browser vendors so that one of them does not use vendor lock in to do something evil(tm). We need the choice of priorities different browser vendors make as our reason to choose one or another. We need to do something now and change our mind when the world changes. We want our options open. 

Monoculture is bad for social media (twitter) with change. Monoculture is bad for browsers with change. And monoculture is bad for browser automation. Monoculture is bad for financing open source communities. We need variety. 

It's browser automation, not test automation. We can do so much more than test with this stuff. And if it wasn't obvious from what I said earlier in this post, I think 18-year old selenium has many many good years ahead of it. 

Finally, the tool matters less than what you do with it. Do good testing. With all of these tools, we can do much better on the resultful testing. 

Friday, January 6, 2023

Contemporary Regression Testing

One of the first things I remember learning about testing is the repeating nature of it. Test results are like milk and stay fresh only a limited time, so we keep replenishing our tests. We write code and it stays and does the same (even if wrong) thing until changes, but testing repeats. It's not just the code changing that breaks systems, it's also the dependencies and platform changing, people and expectations changing. 

An illustration of kawaii box of milk from my time of practicing sketchnoting

There's corrective, adaptive, perfective and preventive maintenance. There's the project and then there's "maintenance". And maintenance is 80% of products lifecycle costs since maintenance starts with first time you put the system in production. 

  • Corrective maintenance is when we had problems and need to fix them.
  • Adaptive maintenance is when we will have problems if we allow for world around us to change and we really can't stop it, but we emphasize that everything was FINE before the law changes, the new operating system emerged or that 3rd party vendor figured out they had a security bug that we have to react to because of a dependency we have.
  • Perfective maintenance is when we add new features while maintaining, because customers learn what they really need when they use systems. 
  • Preventive maintenance is when we foresee adaptive maintenance and change our structures so that we wouldn't always be needing to adapt individually. 

It's all change, and in a lot of cases it matters that only the first one is defects and implying work you complete without invoicing for the work. 

The thing about change is that it is small development work, and large testing work. This can be true considering the traditional expectations of projects:

  1. Code, components and architecture are spaghetti
  2. Systems are designed, delivered and updated as integrated end-to-end tested monoliths
  3. Infrastructure and dependencies are not version controlled
With all this, the *repeating nature* becomes central, and we have devised terminology for it. There is re-testing (verifying a fix indeed fixed the problem) and regression testing (verifying that things that used to work still work), and made it a central concept in how we discuss testing.

For some people, it feels regression testing is all the testing they think of. When this is true, it almost makes sense to talk about doing this manual or automated. After all, we are only talking of the part of testing that we are replenishing results for. 

Looking at the traditional expectations, we come to expectations of two ways to think about regression testing. One takes a literal interpretation of "used to work", as in we clicked through exactly this and it worked, and I would call this test-case based regression testing. The other takes a liberal interpretation of "used to work" remembering that with risk-based testing we never looked at it all working but some of it worked even when we did not test it, and thus continuing with risk-based perspective, the new changes drive entirely new tests. I would call this exploratory regression testing. This discrepancy of thinking is a source of a lot of conversation in automated space because the latter would need to actively choose to pick tests as output to leave behind that we consider worthwhile repeating - and it is absolutely not all the tests we currently are leaving behind. 

So far, we have talked in traditional  expectations. What is contemporary expectation then?

The things we believe are true of projects are sometimes changing:
  1. Code is clean, components are microservices and architecture creates clear domain-driven architecture where tech and business concepts meet
  2. Systems are designed, delivered and updated incrementally, but also per service basis
  3. Infrastructure and dependencies are code
This leads to thinking many things are different. Things mostly break only when we break them with a change. We can see changes. We can review the change as code. We can test the change from a working baseline, instead of a ball of change spaghetti described in vague promises of tickets. 

Contemporary regression testing can more easily rely on exploratory regression testing with improved change control. Risk-based thinking helps us uncover really surprising side effects of our changes without using major efforts. But also, contemporary exploratory testing relies on teams doing programmatic test-case based regression testing whenever it is hard for developers to hold their past intent in their heads. Which is a lot, with people changing and us needing safety nets. 

Where with traditional regression testing we could choose one or the other, with contemporary regression testing we can't.   

Monday, January 2, 2023

The Three Cultures

Over the last 25 years, I have been dropped to a lot of projects and organizations. While I gave up on consulting early on and deemed it unsuited for my aspirations, I have been a tester with an entrepreneurial attitude - a consultant / mentor / coach even within the team I deliver as part of. 

Being dropped to a lot of projects and organizations, I have come to accept that two are rarely the same. Sometimes the drop feels like time travel to past. Rarely it feels like time travel to future. I find myself often brought in to help with some sort of trouble, or if there was no trouble, I can surely create some like with a past employer where we experimented with no product owner. There was trouble, we just did not recognise it without breaking away from some of our strong-held assumptions. 

I have come to categorize the culture, the essential belief systems around testing to three stages:

  1. Manual testing is the label I use for organizations predominantly stuck in test case creation. They may even automate some of those test cases, usually with the idea speeding up regression testing, but majority of what they do relies on the idea that testing is predominantly without automation, for various reasons. Exploratory testing is something done on top of everything else. 
  2. Automated testing is the label I use for organizations predominantly stuck in spearing manual and automated testing. Automated testing is protected from manual testing (because it includes so much of its own kind of manual testing), and the groups doing automation are usually specialists in test automation space. The core of automated testing is user interfaces and mostly integrated systems, something a user would use. Exploratory testing is something for the manual testers. 
  3. Programmatic tests is the label I use for whole team test efforts that center automation as a way of capturing developer intent, user intent and past intent. Exploratory testing is what drives the understanding of intent. 
The way we talk, and our foundational beliefs in these three different cultures just don't align. 

These cultures don't map just to testing, but the overall ideas of how we organize for software development. For the first, we test because we can't trust. For the middle, we test because we are supposed to. For the last, we test because not testing threatens value and developer happiness. 

Just like testing shifts, other things shift too. The kind of problems we solve. The power of business decisions. Testing (in the large) as part of business decisions. The labels we use for our processes in explaining those to the world. 

This weekend I watched an old talk from Agile India, by Fred George on 'Programmer Anarchy'. I would not be comfortable taking things to anarchy, but there there is a definite shift in where the decision power is held, with everyone caring for business success in programmer-centric ways of working. 

The gaps are where we need essentially new cultures and beliefs accepted. Working right now with the rightmost cultural gap, the ideas of emergent design are harder to achieve than programmed tests. 

Documentation is an output and should be created at times we know the best. Programmed tests are a great way of doing living documentation that, used responsibly, gives us a green on our past intent in the scope we care to document it. 

Friday, December 30, 2022

Baselining 2022

I had a busy year at work. While working mostly with one product (two teams), I was also figuring out how to work with other teams without either taking too much to become a bottleneck or to take so little I would be of no use. 

I got to experience by biggest test environment to this date in one of the projects - a weather radar. 

I got to work with a team that was replacing and renewing, plus willing and able to move from test automation (where tests are still isolated and entered around a tester) to programmatic tests where tests are whole team asset indistinguishable from other code. We ended up this year finalising the first customer version and to add 275 integrated tests and 991 isolated tests to best support our work - and get them to run green in blocking pipelines. The release process throughput times went from 7 hours from last commit to release to 19 minutes from last commit to release, and the fixing stage went from 27 days to 4 days. 

I became the owner of testing process for a lot of projects, and equally frustrated on the idea of so many owners of slices that coordination work is more than the value work. 

I volunteered with Tivia (ICT in Finland) as a board member for the whole year, and joined Selenium Open Source Project Leadership Committee early autumn. Formal community roles are a stretch, definitely. 

I got my fair share of positive reinforcement on doing good things for individuals salaries and career progression, learning testing and organising software development in smart ways some might consider modern agile. 

I spoke at conferences and delivered trainings, total 39 sessions out of which 5 were keynotes. I added a new country on my list of appearances, totalling now at 28 countries I have done talks in. I showed up at 7 podcasts as guest. 

I thought I did not write much into my blog, and yet this post is 42nd one of this year on this blog, and I have one #TalksTurnedArticles on and one article in IT Insider. My blog has now 840 222 hits, which is 56 655 more than a year ago. 

I celebrated my Silver Jubilee (25 years of ICT career) and started out with a group mentoring experiment of #TestingDozen. 

I spent monthly time reflecting and benchmarking with Joep Schuurkes, did regular on-demand reflections with Irja Strauss, ensemble programmed tests regularly with Alex Schladebeck and Elizabeth Zagroba and met so many serendipitous acquaintances from social media that I can't even count it. 

I said goodbye to twitter, started public note taking at mastodon and irregularly regular posting on LinkedIn. I am there (here) to learn together. 

Wednesday, December 28, 2022

A No Jira Experiment

If there is something I look back to from this year, it is doing changes that are impossible. I've been going against the dominant structures, making small changes and enabling a team where continuous change, always for the better is taking root. 

Saying it has taken root would be overpromising. Because the journey is only in the beginning. But we have done something good. We have moved to more frequent releases. We have established programmatic tests (over test automation), and we have a nice feedback cycle that captures insights of things we miss into those tests. The journey has not been easy.

When I wanted to drop scrum as the dominant agile framework of how management around us wanted things planned early this year, the conversations took some weeks up until a point of inviting trust. I made sure we were worthy of that trust, collecting metrics and helping the team hit targets. I spent time making sure the progress was visible, and that there was progress. 

Yet, I felt constrained. 

In early October this year, I scheduled a meeting to drive through yet another change I knew was right for the team. I made notes of what was said.

This would result in:

  • uncontrolled, uncoordinated work
  • slower progress, confusion with priorities and requirements
  • business case and business value forgotten 

Again inviting that trust, I got to do something unusual for the context at hand: I got to stop using Jira for planning and tracking of work. The first temporary yes was four weeks, and those were four of our clearest, most value delivering weeks. What started off as temporary, turned to *for now*. 

Calling it No Jira Experiment is a bit of an overpromise. Our product owner uses Jira on epic level tickets and creates this light illusion of visibility to our work in that. Unlike before, now with just few things he is moving around, the statuses are correctly represented. The epics have a documented acceptance criteria by the time they are accepted. Documentation is the output, not the input. 

While there is no tickets of details, the high level is better.

At the same time, we have just completed our most complicated everyone involved feature. We've created light lists of unfinished work, and are just about to delete all the evidence of the 50+ things we needed to list. Because our future is better when we see the documentation we wanted to leave behind, not the documentation that presents what emerged while the work was being discovered. The commit messages were more meaningful representing the work now done, not the work planned some weeks before. 

It was not uncontrolled or uncoordinated. It was faster, with clarity of priorities and requirements. And we did not forget business case or business value. 

It is far from perfect, or even good enough for longer term, we have a lot to work on. But it is so much better than following a ticket-centric process, faking that we know the shape of the work forcing an ineffective shape because that seems to be expected and asked for. 

Tuesday, December 6, 2022

There is such a thing as testing that is not exploratory

The team had been on a core practice of clarifying with tests for a while, and they invited an outsider to join their usual meeting routine.

Looking around, there were 8 people in a call. The one in charge of inviting the meeting shared their screen, for what was about to be their routine of test design sessions. He copied the user story they had been assigned to work on into the Jira ticket he had open, and called the group for ideas of tests. 

People started mentioning things that could be tried, and the one in charge wrote things down as the rest of the group watched. For a high level idea, he called for the detailed steps. Every idea that came about was written down and detailed. 

After the meeting, the group would split the work to automate it all. 

And two weeks later, they had all these tests passing, and a feature that just did not work.

The magic glue they miss is what I call exploratory testing. The sense of caring for results in testing by focusing on learning and recognising that the time most people create test cases like above, it is the time when they know the least.

You can add exploratory testing on top of this.

You can lighten up the details you want to write to start with to leave more room for exploratory testing, where output of your testing is the documentation. 

You can seek to optimize for learning, being aware of the time used.

The team that follows the planning pattern did not do exploratory testing. You could argue the next team using their results did exploratory testing through trying to use it, to tell them it is failing. 

Working with teams like this is real. It is embarrassing but it is real. And we don't change it by playing with words, but by making the results we expect clear and permissive to excellence. 

This story, unfortunately, was brought to you by the "tell no stories that did not happen" and "only stories from last two years allowed" rules. This is real. This is what people still make of testing, and some folks meekly follow the agreed practice.