A Seasoned Tester's Crystal Ball

Friday, February 17, 2023

The Browser Automation Newbie Experience

Last week I called for Selenium newbies to learn to set up and run tests on their own computer, and create material I could edit a video out of. I have no idea if I ever manage to get to editing the video, but having now had a few sessions, writing about it seemed like a good use of an evening.

There are two essentially different things I mean by a newbie experience:

Newbies to it all - testing, programming, browser automation. I love teaching this group and observing how quickly people learn in context of doing.
Newbies to browser automation library. I love teaching this group too. Usually more about testing (and finding bugs with automation), and observing what comes in the way of taking a new tool into use.

I teach both groups in either pair or ensemble setting. In pairs, it's about *their* computer we pair on. In ensemble, it about one person's computer in the group we pair on.

The very first test experience is slightly different on the two and comprises three parts:

From empty project to tools installed
From tools installed to first test
Assessing the testing this enables

I have expressed a lot of love for Playwright (to the extent of calling myself a playwright girl) and with actions of using it in projects at work. But I have also expressed a lot of love for Selenium (to the extent that I volunteer for the Selenium Project Leadership Committee) and with encouraging using it in projects at work. I'm a firm believer that *our* world is better without competition and war, rather admiration and understanding and sustainability.

So with this said, let's compare Playwright and Selenium *Python* on each of these three.

From Empty Project To Tools Installed - Experience Over Time

Starting with listing dependencies in requirements.txt, I get the two projects set up side by side.

Playwright	Selenium
pytest playwright pytest-playwright	pytest selenium

I run pip install -r requirements.txt and ... I have some more to do for playwright: running playwright install from command line, and watching it download superset browsers and node dependencies.

If everything goes well, this is it. At time of updating the dependencies, I will repeat the same steps, and I find the playwright error message telling me to update in plain English quite helpful.

If we tend to lock our dependencies, need of update actions are on our control. On many of the projects, I find failing forward to the very latest to be a good option, and I take the unusual surprise of update over the pain of not having updated almost any day.

From Tools Installed to First Test

At this point, we have done the visible tool installation and we only see if everything is fine if we use them tools and write a test. Today we're sampling the same basic test on todo mvc app, the js vanilla flavour.

We have the Playwright-python code

With test run on chromium, headed:

== 1 passed in 2.71s ==

And we have the Selenium-python code

With test run on chrome Version 110.0.5481.100 (Official Build) (arm64)

== 1 passed in 6.48s ==

What I probably should stress in explaining this is that while 3 seconds is less than 7 seconds, more relevant for overall time is the design we go to from here on what we test while on the same browser instance. Also, I have run tests for both libraries on my machine earlier today which is kind of relevant for the time consideration.

That is, with Playwright we installed the superset browsers with the command line. With Selenium, we did not because we are using the real browsers and the browser versions on the computer we run the tests on. For that to work some of you may know that it requires a browser-specific driver. The most recent Selenium versions have built-in SeleniumDriver that is - if all things are as intended - is invisible.

I must mention that this is not what I got from reading Selenium documentation, and since I am sure this vanishes soon as in gets fixed, I am documenting my experience with a screenshot.

I know it says "No extra configuration needed" but since I had -- outside path - an older ChromeDriver, I ended up with an error message that made no sense with the info I had and reading the blog post, downloaded a command line tool I could not figure out how I would use. Coming to this with Playwright, I was keen to expected command line tool instead of automated built-in dependency maintenance feature and the blog post just built on the assumption.

Assessing the Testing This Enables

All too often we look at the lines of code, the commands to run, and the minutiae of maintaining the tool, but at least on this level, it's not like the difference is that huge. With python, both libraries on my scenarios rely on existence of pytest and much of the asserting stuff can be done the same. Playwright has been recently (ehh, over few years, time flies I have no clue anymore) been introducing more convenience methods for web style expects like expect(locator()).to_have_text() and web style locators like get_by_placeholder(), yet these feel like syntactic sugar to me. And sugar feels good but sometimes causes trouble for some people in the long run. Happy to admit I have both been one of those people, and observed those people. Growing in tech understanding isn't easier for newbie programmers when they get comfortable with avoiding layered learning for that tech understanding.

If it's not time or lines, what it it then?

People who have used Selenium are proficient with Selenium syntax and struggle with the domain vocabulary change. I am currently struggling the other way around with syntax, and with Selenium the ample sample sets of old versions of APIs that even GitHub copilot seems to suggest wrong syntax from isn't helping me. Longer history means many versions of learning what the API should look like showing in the samples.

That too is time and learning, and a temporary struggle. So not that either.

The real consideration can be summed with *Selenium automates real browsers*. The things your customers use your software on. Selenium runs on the webdriver protocol that is a standard the browser vendors use to allow programmatic control. There's other ways of controlling browsers too, but using remote debugging protocols (like CFP for Chromium) isn't running the full real browser. Packaging superset browsers isn't working with the real versions the users are using.

Does that matter? I have been thinking that for most of my use cases, not. Cross-browser testing the layers that sit on top of frontend framework or on top of a platform (like salesforce, odoo or Wordpress), there is only so much I want to test the framework or the platform. For projects creating those frameworks and platforms, please run on real browsers or I will have to as you are risking bugs my customers will suffer from.

And this, my friend, this is where experience over time matters.

We need multiple browser vendors so that one of them does not use vendor lock in to do something evil(tm). We need the choice of priorities different browser vendors make as our reason to choose one or another. We need to do something now and change our mind when the world changes. We want our options open.

Monoculture is bad for social media (twitter) with change. Monoculture is bad for browsers with change. And monoculture is bad for browser automation. Monoculture is bad for financing open source communities. We need variety.

It's browser automation, not test automation. We can do so much more than test with this stuff. And if it wasn't obvious from what I said earlier in this post, I think 18-year old selenium has many many good years ahead of it.

Finally, the tool matters less than what you do with it. Do good testing. With all of these tools, we can do much better on the resultful testing.

Friday, January 6, 2023

Contemporary Regression Testing

One of the first things I remember learning about testing is the repeating nature of it. Test results are like milk and stay fresh only a limited time, so we keep replenishing our tests. We write code and it stays and does the same (even if wrong) thing until changes, but testing repeats. It's not just the code changing that breaks systems, it's also the dependencies and platform changing, people and expectations changing.

An illustration of kawaii box of milk from my time of practicing sketchnoting

There's corrective, adaptive, perfective and preventive maintenance. There's the project and then there's "maintenance". And maintenance is 80% of products lifecycle costs since maintenance starts with first time you put the system in production.

Corrective maintenance is when we had problems and need to fix them.
Adaptive maintenance is when we will have problems if we allow for world around us to change and we really can't stop it, but we emphasize that everything was FINE before the law changes, the new operating system emerged or that 3rd party vendor figured out they had a security bug that we have to react to because of a dependency we have.
Perfective maintenance is when we add new features while maintaining, because customers learn what they really need when they use systems.
Preventive maintenance is when we foresee adaptive maintenance and change our structures so that we wouldn't always be needing to adapt individually.

It's all change, and in a lot of cases it matters that only the first one is defects and implying work you complete without invoicing for the work.

The thing about change is that it is small development work, and large testing work. This can be true considering the traditional expectations of projects:

Code, components and architecture are spaghetti
Systems are designed, delivered and updated as integrated end-to-end tested monoliths
Infrastructure and dependencies are not version controlled

With all this, the *repeating nature* becomes central, and we have devised terminology for it. There is re-testing (verifying a fix indeed fixed the problem) and regression testing (verifying that things that used to work still work), and made it a central concept in how we discuss testing.

For some people, it feels regression testing is all the testing they think of. When this is true, it almost makes sense to talk about doing this manual or automated. After all, we are only talking of the part of testing that we are replenishing results for.

Looking at the traditional expectations, we come to expectations of two ways to think about regression testing. One takes a literal interpretation of "used to work", as in we clicked through exactly this and it worked, and I would call this test-case based regression testing. The other takes a liberal interpretation of "used to work" remembering that with risk-based testing we never looked at it all working but some of it worked even when we did not test it, and thus continuing with risk-based perspective, the new changes drive entirely new tests. I would call this exploratory regression testing. This discrepancy of thinking is a source of a lot of conversation in automated space because the latter would need to actively choose to pick tests as output to leave behind that we consider worthwhile repeating - and it is absolutely not all the tests we currently are leaving behind.

So far, we have talked in traditional expectations. What is contemporary expectation then?

The things we believe are true of projects are sometimes changing:

Code is clean, components are microservices and architecture creates clear domain-driven architecture where tech and business concepts meet
Systems are designed, delivered and updated incrementally, but also per service basis
Infrastructure and dependencies are code

This leads to thinking many things are different. Things mostly break only when we break them with a change. We can see changes. We can review the change as code. We can test the change from a working baseline, instead of a ball of change spaghetti described in vague promises of tickets.

Contemporary regression testing can more easily rely on exploratory regression testing with improved change control. Risk-based thinking helps us uncover really surprising side effects of our changes without using major efforts. But also, contemporary exploratory testing relies on teams doing programmatic test-case based regression testing whenever it is hard for developers to hold their past intent in their heads. Which is a lot, with people changing and us needing safety nets.

Where with traditional regression testing we could choose one or the other, with contemporary regression testing we can't.

Monday, January 2, 2023

The Three Cultures

Over the last 25 years, I have been dropped to a lot of projects and organizations. While I gave up on consulting early on and deemed it unsuited for my aspirations, I have been a tester with an entrepreneurial attitude - a consultant / mentor / coach even within the team I deliver as part of.

Being dropped to a lot of projects and organizations, I have come to accept that two are rarely the same. Sometimes the drop feels like time travel to past. Rarely it feels like time travel to future. I find myself often brought in to help with some sort of trouble, or if there was no trouble, I can surely create some like with a past employer where we experimented with no product owner. There was trouble, we just did not recognise it without breaking away from some of our strong-held assumptions.

I have come to categorize the culture, the essential belief systems around testing to three stages:

Manual testing is the label I use for organizations predominantly stuck in test case creation. They may even automate some of those test cases, usually with the idea speeding up regression testing, but majority of what they do relies on the idea that testing is predominantly without automation, for various reasons. Exploratory testing is something done on top of everything else.
Automated testing is the label I use for organizations predominantly stuck in spearing manual and automated testing. Automated testing is protected from manual testing (because it includes so much of its own kind of manual testing), and the groups doing automation are usually specialists in test automation space. The core of automated testing is user interfaces and mostly integrated systems, something a user would use. Exploratory testing is something for the manual testers.
Programmatic tests is the label I use for whole team test efforts that center automation as a way of capturing developer intent, user intent and past intent. Exploratory testing is what drives the understanding of intent.

The way we talk, and our foundational beliefs in these three different cultures just don't align.

These cultures don't map just to testing, but the overall ideas of how we organize for software development. For the first, we test because we can't trust. For the middle, we test because we are supposed to. For the last, we test because not testing threatens value and developer happiness.

Just like testing shifts, other things shift too. The kind of problems we solve. The power of business decisions. Testing (in the large) as part of business decisions. The labels we use for our processes in explaining those to the world.

This weekend I watched an old talk from Agile India, by Fred George on 'Programmer Anarchy'. I would not be comfortable taking things to anarchy, but there there is a definite shift in where the decision power is held, with everyone caring for business success in programmer-centric ways of working.

The gaps are where we need essentially new cultures and beliefs accepted. Working right now with the rightmost cultural gap, the ideas of emergent design are harder to achieve than programmed tests.

Documentation is an output and should be created at times we know the best. Programmed tests are a great way of doing living documentation that, used responsibly, gives us a green on our past intent in the scope we care to document it.

Friday, December 30, 2022

Baselining 2022

I had a busy year at work. While working mostly with one product (two teams), I was also figuring out how to work with other teams without either taking too much to become a bottleneck or to take so little I would be of no use.

I got to experience by biggest test environment to this date in one of the projects - a weather radar.

I got to work with a team that was replacing and renewing, plus willing and able to move from test automation (where tests are still isolated and entered around a tester) to programmatic tests where tests are whole team asset indistinguishable from other code. We ended up this year finalising the first customer version and to add 275 integrated tests and 991 isolated tests to best support our work - and get them to run green in blocking pipelines. The release process throughput times went from 7 hours from last commit to release to 19 minutes from last commit to release, and the fixing stage went from 27 days to 4 days.

I became the owner of testing process for a lot of projects, and equally frustrated on the idea of so many owners of slices that coordination work is more than the value work.

I volunteered with Tivia (ICT in Finland) as a board member for the whole year, and joined Selenium Open Source Project Leadership Committee early autumn. Formal community roles are a stretch, definitely.

I got my fair share of positive reinforcement on doing good things for individuals salaries and career progression, learning testing and organising software development in smart ways some might consider modern agile.

I spoke at conferences and delivered trainings, total 39 sessions out of which 5 were keynotes. I added a new country on my list of appearances, totalling now at 28 countries I have done talks in. I showed up at 7 podcasts as guest.

I thought I did not write much into my blog, and yet this post is 42nd one of this year on this blog, and I have one #TalksTurnedArticles on dev.to and one article in IT Insider. My blog has now 840 222 hits, which is 56 655 more than a year ago.

I celebrated my Silver Jubilee (25 years of ICT career) and started out with a group mentoring experiment of #TestingDozen.

I spent monthly time reflecting and benchmarking with Joep Schuurkes, did regular on-demand reflections with Irja Strauss, ensemble programmed tests regularly with Alex Schladebeck and Elizabeth Zagroba and met so many serendipitous acquaintances from social media that I can't even count it.

I said goodbye to twitter, started public note taking at mastodon and irregularly regular posting on LinkedIn. I am there (here) to learn together.

Wednesday, December 28, 2022

A No Jira Experiment

If there is something I look back to from this year, it is doing changes that are impossible. I've been going against the dominant structures, making small changes and enabling a team where continuous change, always for the better is taking root.

Saying it has taken root would be overpromising. Because the journey is only in the beginning. But we have done something good. We have moved to more frequent releases. We have established programmatic tests (over test automation), and we have a nice feedback cycle that captures insights of things we miss into those tests. The journey has not been easy.

When I wanted to drop scrum as the dominant agile framework of how management around us wanted things planned early this year, the conversations took some weeks up until a point of inviting trust. I made sure we were worthy of that trust, collecting metrics and helping the team hit targets. I spent time making sure the progress was visible, and that there was progress.

Yet, I felt constrained.

In early October this year, I scheduled a meeting to drive through yet another change I knew was right for the team. I made notes of what was said.

This would result in:

uncontrolled, uncoordinated work
slower progress, confusion with priorities and requirements
business case and business value forgotten

Again inviting that trust, I got to do something unusual for the context at hand: I got to stop using Jira for planning and tracking of work. The first temporary yes was four weeks, and those were four of our clearest, most value delivering weeks. What started off as temporary, turned to *for now*.

Calling it No Jira Experiment is a bit of an overpromise. Our product owner uses Jira on epic level tickets and creates this light illusion of visibility to our work in that. Unlike before, now with just few things he is moving around, the statuses are correctly represented. The epics have a documented acceptance criteria by the time they are accepted. Documentation is the output, not the input.

While there is no tickets of details, the high level is better.

At the same time, we have just completed our most complicated everyone involved feature. We've created light lists of unfinished work, and are just about to delete all the evidence of the 50+ things we needed to list. Because our future is better when we see the documentation we wanted to leave behind, not the documentation that presents what emerged while the work was being discovered. The commit messages were more meaningful representing the work now done, not the work planned some weeks before.

It was not uncontrolled or uncoordinated. It was faster, with clarity of priorities and requirements. And we did not forget business case or business value.

It is far from perfect, or even good enough for longer term, we have a lot to work on. But it is so much better than following a ticket-centric process, faking that we know the shape of the work forcing an ineffective shape because that seems to be expected and asked for.

Tuesday, December 6, 2022

There is such a thing as testing that is not exploratory

The team had been on a core practice of clarifying with tests for a while, and they invited an outsider to join their usual meeting routine.

Looking around, there were 8 people in a call. The one in charge of inviting the meeting shared their screen, for what was about to be their routine of test design sessions. He copied the user story they had been assigned to work on into the Jira ticket he had open, and called the group for ideas of tests.

People started mentioning things that could be tried, and the one in charge wrote things down as the rest of the group watched. For a high level idea, he called for the detailed steps. Every idea that came about was written down and detailed.

After the meeting, the group would split the work to automate it all.

And two weeks later, they had all these tests passing, and a feature that just did not work.

The magic glue they miss is what I call exploratory testing. The sense of caring for results in testing by focusing on learning and recognising that the time most people create test cases like above, it is the time when they know the least.

You can add exploratory testing on top of this.

You can lighten up the details you want to write to start with to leave more room for exploratory testing, where output of your testing is the documentation.

You can seek to optimize for learning, being aware of the time used.

The team that follows the planning pattern did not do exploratory testing. You could argue the next team using their results did exploratory testing through trying to use it, to tell them it is failing.

Working with teams like this is real. It is embarrassing but it is real. And we don't change it by playing with words, but by making the results we expect clear and permissive to excellence.

This story, unfortunately, was brought to you by the "tell no stories that did not happen" and "only stories from last two years allowed" rules. This is real. This is what people still make of testing, and some folks meekly follow the agreed practice.

Monday, December 5, 2022

Jokes and Poetry from a Stochastic Parrot

Tell a joke about a programmer and a tester attempting to enter Heaven and having to confess their sins.

Two programmers were on their way to Heaven and they came to a fork in the road. One path led to Heaven, the other to Hell. The programmer who had been a tester in life said, "I'm not taking any chances. I'm going to test both paths to see which one is safe." The programmer who had been a developer in life said, "You fool! We don't have time for that. We need to take one path and trust that it leads to Heaven." The tester replied, "Trust, but verify."

Another!

A programmer and a tester were standing at the Gates of Heaven, waiting to confess their sins. The programmer went first and said, "I wrote code that I knew was buggy and released it anyway." The tester said, "I found the bugs in that code and didn't report them." The Gates of Heaven opened, and the programmer was welcomed inside. The tester was shocked and asked, "Why was he allowed in and not me?" The gatekeeper replied, "We already have plenty of bugs in Heaven. What we need are more testers."

More.

A programmer and a tester were arguing about whether it was better to be a programmer or a tester. The programmer said, "Programmers are the ones who create the software that makes the world go round. Without us, there would be no progress or innovation." The tester replied, "That may be true, but without testers, all of your buggy code would never be caught and fixed. Testers are the ones who ensure that the software works properly and is of high quality." The programmer thought for a moment and then said, "I suppose you have a point. I guess it's a good thing that programmers and testers need each other to make the world a better place."

Make it more funny.

A programmer and a tester were on their way to Heaven, and they came across a sign that said "Heaven: 10,000 miles." The programmer looked at the sign and said, "I don't believe it. They've made a mistake. Heaven can't be that far away." The tester said, "Of course it is. They've measured the distance with a ruler, and it's accurate to within plus or minus an inch." The programmer rolled his eyes and said, "That's why I always use double-precision floating-point numbers. They're much more precise." The tester replied, "Yes, but they can't measure the distance to Heaven. Only faith can do that."

Ok, just one more.

A programmer and a tester were standing in line at the Gates of Heaven, waiting to confess their sins. The programmer went first and said, "I wrote code that was full of bugs and never tested it." The tester said, "I tested the code and found the bugs, but I never reported them." The Gates of Heaven opened, and the programmer was allowed inside. The tester was shocked and asked, "Why was he allowed in and not me?" The gatekeeper replied, "We have plenty of bugs in Heaven. What we need are more testers to help us find and fix them."

You already told that, I want a new one.

I'm sorry, I don't have access to new information at this time. I am a large language model trained by OpenAI, and my knowledge is limited to what I was trained on. I don't have the ability to browse the internet or access new information, so I can only provide responses based on what I already know. Is there something else I can help you with?

One more joke about testers and programmers, please.

A programmer and a tester were at a restaurant, and the waiter brought them a menu. The programmer looked at the menu and said, "I don't see any bugs on this menu. This must be a high-quality restaurant." The tester replied, "That's what you think. I've already found three bugs on the menu: the prices are incorrect, the descriptions are misleading, and the options are limited. I'm not sure I want to eat here." The programmer laughed and said, "I guess that's why we need both programmers and testers. Without us, the world would be full of bugs and errors."

Give me a poem about exploratory testing.

Thank you, OpenAI ChatGPT. The tools we use are changing the world in ways we don't yet understand. Whose jokes and poems are these, or are these indeed "AI originals"?

On more serious note, you should see what it does with advent of code day 5 on day 5. A separate leaderboard for AI and humans does not quite seem like the solution.