Thursday, June 27, 2024

Well, did AI improve writing test plans?

Over 27 years of testing, I've read my fair share of test plans, and contributed quite a chunk. I've created templates to support people in writing them slightly better, and I've provided examples. And I have confessed that since it's the performance of planning, not the writing of the plan, that matters, some of the formats that support me in doing my best work in planning, like the one-page-master-test-plan, have created some of the worst plans I have seen from others. 

Test plans aren't written, they are socialized. Part of socializing is owning the plan, and another part is bringing everyone along for the ride while we live by the plan. 

Some people need a list of risks we are concerned about. I love the form of planning where we list risks and map them to testing activities that mitigate those risks. And I love making documents so short that people may actually read them. 

In these few years of LLMs being around, I have obviously had to both try generating test plans and watch others ask me to review plans where an LLM has been more or less helpful.

Overall, we are bad at writing plans. We are bad at making plans. We are really bad at updating plans when things change. And they always change. 

I wanted to make a note on whether AI has changed this already.

  • People who were not great at writing plans before are now not great in a different way. When confronted with feedback, they have Pinocchio to blame for it. Someone else did it - the LLM. If the plan is the average of the world, how can I make the point of not being happy with it? And I can be even less happy with the responsibility avoidance. 
  • People who need to write plans that were not really even needed except for the process are now more efficient in producing the documentation. If they did not know to track TFIRPUSS (quality perspectives) in their plans, at least they are not missing that when they ask the tool. The difference still comes from the performance of continuous planning and organizing for the actions, rather than the act of writing the plan. 
  • Detailed test ideas for those with the least ideas are already better in per-feature plans. Those who were not so great are doing slightly better with generated ideas, and those who were great become greater because they compete with Pinocchio. 
I am worried my future - like my past - is in reviewing subpar plans, but now with externalized responsibility. Maintaining ownership matters. Pinocchio is not a real boy, we're just at the start of the story. 


Monday, June 24, 2024

Testing gives me 3x4

You know how you sometimes try to explain testing, and you do some of your better work in explaining it as a response to something that is clearly not the thing. This happened to me, but it has been long enough since, and what time gives me is integration of ideas. So this post is about showing you the three lists that now help me explain testing.

Speaking of testing to developers, I go for the list of what developer-style test automation, especially done test-driven style, gives me. It gives me a specification that describes my intent as a developer. It gives me feedback on whether I am making progress toward my specification. It also gives me a concrete artifact to grow as I am learning, an intersection of specification and feedback. Since I can keep the tests around, their executable nature gives me protection against regression, helping me track my past intent. And when my tests fail, they give me granularity on the exact omission that is causing the failure. At least if I did those well. 
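To make the list concrete, here is a minimal sketch in pytest with invented names (parse_price is mine, not from any particular codebase): the test states my intent (specification), tells me whether I am getting there (feedback), guards that intent when kept around (regression), and points at the exact omission when it fails (granularity).

```python
import pytest


def parse_price(text: str) -> int:
    """Parse a price like '12,50 €' into cents. Grown test-first, one intent at a time."""
    amount = text.replace("€", "").strip().replace(",", ".")
    cents = round(float(amount) * 100)
    if cents < 0:
        raise ValueError("price cannot be negative")
    return cents


def test_parse_price_states_my_intent():
    # Specification and feedback: this is what I mean by parsing a price.
    assert parse_price("12,50 €") == 1250


def test_parse_price_rejects_negatives():
    # Kept around: regression protection. When it fails: granularity on the omission.
    with pytest.raises(ValueError):
        parse_price("-3,00 €")
```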



The list helps me recall why I would want to make space for test-driven development, or for capturing past intent in unit / api level approval tests of behaviors someone else had intent for, even if I am not using energy to track back to those. 

When I look at this list, I think back to a Kevlin Henney classic reminder: 

"A majority of production failures (77%) can be reproduced by a unit test."  
The quote reminds me of two things: 
  1. While the majority can be reproduced, we get to these numbers in hindsight, looking at the escapes we did not find with unit tests before production.
  2. The need for hindsight and the remaining significant portion indicate there is more. 
What we're missing is usually framed not in feedback, but in discovery. I tend to call that discovery exploratory testing; some others tend to call it learning. If I wanted a list of what this style of testing gives me, one that frames testing as performance with the product and components as external imagination, I again have a list of four things. It gives me guidance, like a vector of possible directions, but one that still requires applying judgment. It gives me understanding, seeing how things connect and what they bring in as stakeholder experience. User, customer, support, troubleshooting, developer... All relevant stakeholders. It gives me models of what we have and how it's put together, what is easy and what is hard to get right. And it gives me serendipity, lucky accidents that make me exclaim: "I had no idea these things would not work in combination". 



The two lists are still as helpful to me as they were when I created them, but a third list has since emerged from my work with contemporary exploratory testing. I find that the ideas of spec-feedback-regression-granularity miss out on things we could do on artifact creation in the frame of exploratory testing, and I have been cultivating and repeating a third list. 

In the frame of contemporary exploratory testing, programmatic tests (test automation) give me more. They give me documenting the insights as I am learning them, and they give me that spec-feedback-regression-granularity loop. It's one of the models, one that I can keep around for further benefit. They give me extending reach, be it testing things in scope I already have automated again and again, or be it taking me to repetition I would not do without programmatic tests for new unique scenarios, or even getting me to a starting place from which I can start exploring new routes, making space for that serendipity. They give me alerting to attend when a test fails, alerting me that someone forgot to tell me something so that I can go and learn, or that there's a change that breaks something we care for. That keeps my mind free for deeper, wider, more, trusting the line to tingle red over a past green. And finally, they give me guiding to detail. I know so much more about how things are implemented, how elements are loaded, when they emerge, and what prerequisites I could easily skip if I didn't have the magnifying element programmatic tests bring in. 
Not all of my tests have to stay around in pipelines. Especially when extending reach, I am likely to create programmatic tests that I discard. Ones that collect data for a full day, seeking a particular trend.
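A sketch of such a discardable reach-extender, assuming a made-up health endpoint and a one-minute sampling rhythm - something I would run for a day, read, and then delete:

```python
import csv
import time
from datetime import datetime, timedelta

import requests

URL = "https://test-env.example.com/api/health"  # hypothetical endpoint, not a real service
DEADLINE = datetime.now() + timedelta(hours=24)

with open("latency_samples.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["timestamp", "status", "elapsed_ms"])
    while datetime.now() < DEADLINE:
        started = time.monotonic()
        try:
            status = requests.get(URL, timeout=10).status_code
        except requests.RequestException as error:
            status = f"error: {type(error).__name__}"
        elapsed_ms = round((time.monotonic() - started) * 1000)
        writer.writerow([datetime.now().isoformat(), status, elapsed_ms])
        out.flush()
        time.sleep(60)  # one sample a minute is enough to see a daily trend
```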

Where I previously explained testing with two lists of four, I now need three lists of four. And like my lists were created as a response, you may choose to create your own lists as a response. Whatever helps you - or in this case me - explain that the idea that you don't need manual testing at all relies on framing the creation of your automation as a contemporary exploratory testing process. Writing those programmatic tests is still, even in the genAI era, an awfully manual process of learning how to frame our intent. 

Friday, June 21, 2024

Memory Lane 2005: ISTQB Foundation Syllabus, Principles of Testing

Some people create great visuals. One that I appreciated today turned the 7 principles of testing into a path around the number 7, by Islem Srih. As my eyes were tracking through the image, a sense of familiarity hit me. This is the list of 7 I curated for the ISTQB Foundation Syllabus 1st edition in 2005. The one I still joke about holding copyright to, since I never signed off my rights. ISTQB did ask, I did not respond. 

Also, it is a list that points to other people's work. Back then, as a researcher, I was collating, not reimagining. The things I consider worth my time have changed since, and I would not contribute to ISTQB syllabi anymore.

Taking the trip down memory lane, I had to look at what I started with to build the 7 principles. I recognize editions since have changed labels - the thing I called *pesticide paradox* to honor Boris Beizer's work is now "tests wear out". I'm pretty sure I would have credited Dijkstra on the absence-not-presence principle, as he is the originator of that idea, and I find it impossible to believe I would have penciled in [Kaner 2011] for something that is very obviously [Kaner 2001]. It is incorrect in the latest published syllabus. Back then I knew who said what, and had nothing to say myself as I imagined it was all clear already. Little did I know... 

The 7 principles that made the cut were ones where we did not have much of an argument. 



I have the 2005 originals that I could no longer find online easily. I can confirm I did not mess up the references, because I did not put in the references. Now that I look at this, the agreement was not to put in references unless they were recommended reading to complete the syllabus.
I also went back to look at principles I started working from to get to these 7. The older set of principles is from ISEB Foundation. 

Comparing the two

ISTQB:

  • Presence not absence
  • Exhaustive is impossible
  • Early testing
  • Defects cluster
  • Pesticide paradox
  • Context-driven
  • Absence of errors fallacy
ISEB

  • Exhaustive testing is usually impractical
  • Testing is risk-based and risk must be used to allocate time
  • Removal of faults potentially improves reliability
  • Testing is measurement of quality
  • Requirements may determine the testing performed
  • Difficult to know how much is enough
It was clear we did not agree on the latter. While these may be principles that are much better than what X normally sees, as per one of the comments on the post that inspired me to take this memory lane trip, I am no longer convinced they are truly the principles worth sharing. But they do make the rounds. I am also not convinced the additions and explanations since have improved them, at least in their generalized nature. 

In hindsight, I can see what I did there. I grounded the syllabus a bit more on things in research. Some of that research I don't think we've properly done to this day (early testing, for example - the jury is still out; clustering of defects feels more like folklore than a principle; the pesticide paradox is more an encouragement to explore than tests actually becoming ineffective).

Perhaps revisiting what is truly true would now be good. Now that I no longer think my greatest contribution to testing is to know what everyone else says and not say anything myself. 

Two Years with Selenium Project Leadership Committee

Today I am learning that I am not great at - even uncomfortable with - using the voice of an organization. And for two years, I have held a small part of the voice of an organization, as a member of the Selenium Project Leadership Committee. I have been holding space for and facilitating volunteer organizations for over three decades, and just from my choice of words you can note that I don't say I have been leading or running or managing, even though those would probably be fair words for other people to assign watching the work I do.

Volunteer organizations are special. It is far from obvious that they stay around and active for two decades, like the Selenium project has. It is far from obvious they survive changes of leadership, but for an organization to outlive people's attention span, that is necessary. It is also far from obvious that they manage to navigate the complex landscape of individual and corporate users, their hopes and expectations, and other collaborators solving similar problems, and create just enough but not too much governance to have that voice of an organization. 

If I were comfortable using the voice of an organization, I would not be nervous speaking at Selenium Conference 2024 today with Diego Molina and Puja Jagani on the 'State of the Union' keynote. Similarly, I might find it in me to write a post on the official Selenium blog, since I even have admin access to the Selenium repos on GitHub. But it's just so much easier to avoid all the collaboration and write in my own voice, avoiding the voice of the organization that I always hold in the back of my mind - or my heart. 

At Selenium Conference India 2018 I was invited to keynote, and I chose to speak about the Intersection of Automation and Exploratory Testing. Back then I did not know I would end up combining the paths under the flag of Contemporary Exploratory Testing, and I most definitely did not know that the organizers inviting me to keynote would bring me to an intersection where my path aligned with the Selenium project. 

In 2018 I keynoted. A little later, I volunteered on the program committee reviewing proposals, again invited by the people already in. And in 2022, I ended up on the Selenium Project Leadership Committee (PLC), again on an invite. 20 years of Selenium now in 2024 marks 2 years in the PLC for me. 

Today I am wondering what I have to show for it. And like I always do when I start that line of thinking, I write it down. With my voice. 


Making sense of what the PLC (project leadership committee) is in relation to the TLC (technical leadership committee) wasn't easy to figure out. There's the code, packaging it into releases, and the related documentation and messaging, and all of that is led by the TLC. If there is no software, there is no software project. Software that does not change is dead, and in two years I have reinforced the understanding that Selenium is not dead, or dying. It's alive, well, and taking its next steps in the fairly complex world of cross-browser, standards-based agreements, implementing something that everyone, not just the project's own WebDriver implementations, can use. The Selenium project is a pioneer and collaborator in building up the modern web. 

The first thing I have to show for two years of Selenium PLC is understanding what Selenium is. And that is many things. It is the multiple components of this entire ecosystem. It's working on standards to unite browsers. It's a community of people who find this interesting enough to do as a hobby, volunteering, unpaid. It's where we grow and learn together. It's taking pride in inspiring other solutions in the browser automation space, whether they are built on top of Selenium or make other choices, while real browsers continue on their path of differentiation yet still allow for automation with a unified WebDriver library. 

What have I been doing in the project then, other than making sense of it? 

  • Realizing there are a lot of emails people want to send to "Selenium". There are a lot of requests for privacy and security assessments that organizations using Selenium think they can just ask "the company" for - where there is no company. There are a lot of proposals for collaboration, especially in marketing. And there are donations / sponsoring - because even a volunteer project needs money to pay for its tools and services. 
  • Fighting against becoming the "project assistant". To get people together, someone needs to schedule it. Calendars as self-service aren't a thing everywhere in the busy tech world. After fighting enough personal demons, I did end up putting up a calendar invite and taking up the habit of posting on Slack after every conversation to share what the conversation was about. It seems that bit is something I have enough routine to run with. 
  • Picking and choosing what/how I want to contribute: I wanted to move from face to face conferences and online broadcasts to online conversations, and set up Selenium Open Space Conference for it. I loved the versatility of sessions, and particularly the one person showing up with his own open source project he would walk different people through in various sessions. I also loved learning about how Pallavi Sharma ended up publishing three books on Selenium, and what goes on in the background of authoring books. 
  • Picking and choosing what/how I want to contribute: setting up a micro-sponsorship model and taking the first steps towards full financial transparency. I will need another two years to complete the full financial transparency, but that is something I believe in heavily. I want it to be normal to say that the Selenium project holds $500k at Software Freedom Conservancy. You can dig that info out of their annual reports. But I also want to say that, in addition to what is held there, it would all be visible on Open Collective. We have raised $1,684.49 since we started off with the Selenium community money, and there are many steps to take to move the host of that money to Selenium/BrowserAutomation Inc with every transaction transparently visible. 
  • Fiscal hosting ended up being something I specialize in. We have five people in the PLC, two of whom do double duty with the TLC, and every one of us holds a paying job with relevant levels of responsibility at different companies. Within the two years we also had two more people in the PLC, Bill McGee and Corina Pip, brilliant folks for whom we now hold space to become free from other engagements and rejoin our efforts, after some fiscal hosting / governance stuff is better sorted. Knowing where you are is the start of sorting it out. 
  • Setting up Selenium on Mastodon. We could really use someone taking care of Selenium in social media overall, but I did create a placeholder for content, even if my content production with the voice of the organization is sporadic at best. 
  • Introducing a new class of sponsors, where companies sponsor by allocating people to contribute significant work time to the project and are recognized for it. Not that I did this alone, but working in a user organization rather than a vendor organization, I just happen to be positioned to hold space for something like this. 
  • Dealing with people. Some of that is overwhelming, but at the same time necessary. People have ideas, and sometimes the ideas need aligning. Mostly I listen but I would also step in to mediate. And have been known to do so. 
If there is a future I have an impact on for Selenium,  it will be: 
  1. Diverse Community Centric and focused on Collaboration. 
  2. Fiscally transparent and easy to access for taking stuff forward. 
  3. Worthwhile for users and contributors - individual and corporate. 
Some days - most days - I feel completely insufficient with the amount of time I can volunteer. But it adds up. And it will add up more. The scale at which it matters never ceases to amaze me. 2.5M unique monthly users. Real polyglot support with bindings for Python, Java, .NET, JavaScript, Ruby, PHP... 16M Python downloads. 98.5M Java downloads. 

It takes a village, and I am showing up with you all, inviting you along. 

Thursday, June 20, 2024

Good Enough Quality Is Taking a Dip

A colleague in the community was reporting on her experience teaching tech to a group of kids, expressing how tech makes it hard to love tech sometimes, and her post pushed me to think about it. This is an experience a lot of us have. 

We want to show a newbie group how to run with a lovely new tool, only to realize that to run the new tool, we first have to get around whatever troubles come our way. Some of the troubles come from the differences between our environments. Some from the fact that no one updates and maintains some computers. Some from the choices of features (ads in particular) that get in the way. It's not enough that the thing we're building works; the operating system underneath and even the antivirus running in the background are equally part of the users' experience. 

Those of us teaching in tech have found workarounds for environmental differences, my favorite being to work either on my computer through ensemble programming or, if I want to teach newbies for real, on their computers, making time to solve the real-life problems they will face because things are different. 

Quality, while good enough, is currently not great. And it has not been only getting better in recent years, even if some people speak of AI as being today the worst it will ever be. 

Looking at AI-enabled products, very few of them are really making things better. 

I was purchasing a lot of books from Amazon during my vacation time, and seeing the AI-generated summaries on top of reviews wasn't helping me with my selections. I went for #booktok instead, as AI renders the first recommendation pretty much useless to me - I don't want a summary, I want to find someone like me. 

Trying out the new AI-powered search from Google wasn't producing better results, and I am pretty sure they too know that the AI-powered one is doing worse than the classic one. 

Apple just announced introducing - finally - some AI-powered features with ChatGPT integration, looking like a placeholder to grow worthwhile features in, but other than that giving the appearance of adding little usefulness to the product. 

Everyone seeks uses of AI. Experiments with AI. I do too. I would still report that in the last two years of actively using AI tools, they have not added to my productivity, but they have sometimes added to my quality (I'm really good at not being happy with AI's results) and they have added to my fun and enjoyment. 

Current uses of AI are driving down the experience we have of good enough quality. To include a placeholder for AI to grow into - assuming it is only getting better - we make quite significant compromises in the value our products and services provide. The goal of AI placeholders is not to make things better today. It is to make space to make things better soon, and meanwhile we may take a dip in the experience of relevance for users. 

For years, I have come to note that truly seeking good quality is not common. Our words to explain the level we are targeting or achieving are hidden behind the fuzzy term "quality". Even bad quality is quality. Making things fun and entertaining is quality. 

Our real target may be usefulness and productivity. Being able to do things with tech we weren't able to do before, and being able to do things we really need to do. While we find our way there with GenAI, we're going to experience a dip in good enough quality to allow for large-scale experimentation on us.  

 

Thursday, June 6, 2024

GenAI Pair Testing

This week, I got to revisit my talk on Social Software Testing Approaches. The storyline is pretty much this: 

  • I was feeling lonely as the only tester amongst 20 developers at a job I had 10 years ago. 
  • I had a team of developers who tested but could only produce results expected from testing if they looked at my face sitting next to them, even if I said nothing. I learned about holding space. 
  • I wanted to learn to work better on the same tasks. Ensemble programming became my gateway experience to extensive pair testing beyond people I vibe with, and learning to do strong style pairing was transformative. 
  • People are intimidating, you feel vulnerable, but all of the learning that comes out is worth so much.
As I was going through the preparations for the talk, I realized something has essentially changed since I delivered that talk last time. We've been given an excuse to talk to a faceless, anonymous robot in the form of generative AI chatbots. What I had described as essential to strong-style pairing (expressing intent!) was now the key differentiator in who got more out of the tooling that was widely available. 

I created a single slide to start the conversation on something we had already been doing: pairing with the application, sometimes finding it hard not to humanize a thing that is a stochastic parrot, even if a continuously improving predictive text generator. Realizing that when pairing with genAI, I was always the navigator (brains) of the pair, and the tool was the driver (hands). Both of us would need to attend to our responsibilities, but the principle "an idea from my head must go through someone else's hands" is a helpful framing.


I made a few notes on things I would at least need to remember to tell people about this particular style of pairing. 

External imagination. Like with testing and applications - we do better when we look at the application while we think about what more we would need to do with it - genAI acts as external imagination. We are better at criticizing something when we see it. With a testing mindset, we expect it to be average, less than perfect, and our job is to help it elevate. We are not seeking boring long clear sentences, we are seeking the core of the message. We search for boundaries, arguing with the tool from different stances and assumptions. We recognize insufficiency and fix it with feedback, because the average of the existing base of texts is not *our* goal. We feel free to criticize, as it's not a person with feelings, possibly taking offense when we deliver the message that the baby is ugly. We dare to ask things in ways we wouldn't dare to ask a colleague. We're comfortable wasting our own time, but uncomfortable taking up space in others' days. 

Confidentiality. We need to remember that it does not hold our secrets. Anything and everything we ask is telemetry, and we are not in control of what conclusions someone else will draw from our inputs. When we have things to say or share that we can't share outside our immediate circle, we really can't post them to these genAI pairs. But for things that aren't confidential, it listens to more than your colleague would in making its summaries and conclusions. And there is the option of not using the cloud-based services but hosting a model of your own from Hugging Face, where whatever you say never leaves your own environment. Just be aware. 
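As a sketch of that local option: something like the Hugging Face transformers pipeline keeps the prompt on your own machine. The model named here is just a small example, and you would pick one whose size and license fit your context.

```python
# Minimal local-generation sketch: nothing in the prompt leaves this machine.
# Assumes `pip install transformers torch`; distilgpt2 is only a tiny example model.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

prompt = "List risks to consider when testing a login flow:"
result = generator(prompt, max_new_tokens=60, do_sample=True)
print(result[0]["generated_text"])
```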

Ethical compensations. Using tools like this changes things. The code generation models being trained on open source code change the essential baseline of attribution that enables many people to find the things that give them space to contribute to open source. These tools strip the names of people and the sense of community around the solutions. I strongly believe we need to compensate. At my previous place we made three agreements on compensation: using our work time on open source; using our company money to support open source; and contributing to the body of research on the change these tools bring about. Another ethical dilemma to compensate for is the energy consumption of training models - we need to reuse, not recreate, as one round of training is said to take up the energy equivalent of a car trip to the moon and back. While calling for reuse, I am more inclined to build forward community-based reuse models such as Hugging Face than to centralize information in large commercial bodies whose service conditions give promises about what they will do with our data. And being part of an underrepresented group in tech, there are most definitely compensations needed for the bias embedded in the data to create a world we could be happy with. 

Intent and productivity. Social software testing approaches have given me ample experiences of using tools like this with other people, and of seeing them in action with genAI pairing. People who pair well with people pair better with the tools. People who express intent in test-driven development well and clearly get better code generated from these tools. The world may talk of prompt engineering, but it seems to be about expressing intent. Another note is on the reality of looking at productivity enhancements. People insist they are more productive, but a lot of the deeper investigations show that there is creative bookkeeping involved. Like, fixing a bug takes 10 minutes AFTER you know exactly which line to change and your pipelines just work without attending. You just happen to use a day finding the line to change, and another on taking care of whatever randomness the pipeline produced on your watch. 

These tools don't help me write test cases, write test automation and write bug reports. They help me write something of my intent and choosing, which is rarely a test case or automation; they help me summarize, understand, recall from notes, ideate actions and oracles, scope to today / tomorrow - this environment / the other - the dev / the tester, prioritize and filter, and inspire. They help me more when I think of testing activities not as planning - test case design - execution - reporting, but as intake - survey - setup - analysis - closure. Writing is rarely the problem. Avoiding writing, and discovering information that exists or doesn't, that is more of a problem. Things written are a liability; someone needs to read them for writing them to be useful. 

Testing has always been about sampling, and it has always been risk-based. When we generate to support our microdecisions, we sample to control the quality of outputs. And when things are too important to be left for non-experts, we learn to refer them to experts. If and when we get observability in place, we can explore patterns after the fact, to correct and adjust. 

I still worry about the long term impacts of not having to think, but I believe criticizing and thinking is now more important than ever. The natural tendency of agreeing and going with the flow has existed before too. 

And yes, I too would prefer AI to take care of housekeeping and laundry, and let me do the arts, crafts and creative writing. Meanwhile, artificial intelligence is assisting intelligence. We could use some assisting day to day. 

Friday, May 31, 2024

Taking Testing Courses with Login

When we talk about testing (programming) courses we've taken, I notice a longing feeling in me when I ask about the other person's most recent experience. They usually speak of the tool ("Playwright!", "Robot Framework with Selenium and Requests Library!", "Selenium in Java!"), whereas what I keep hoping they would talk about is what got tested and how well. From that feeling of mine, a question forms: 

What did the course use as target of testing in teaching you? 

For a while now, I have centered my courses around targets of testing, and I have quite a collection. I feel what you learn depends on the target you test. All too many courses leave me unsatisfied on behalf of the students with their certificates of completion, since what they really teach is operating a tool, not testing. Even for operating a tool, the target of testing determines the lessons you will be forced to focus on. 

An overused example I find is a login page. Overused, yet undereducated. 

In its simplest form, it is this idea of two text fields and a button. Username, password, login. Some courses make it simple and have lovely IDs for each of the fields. Some courses start off making the locators on the login page complicated, so clicking them takes a bit of puzzle solving. In the end, you manage to create test automation for successful and unsuccessful login, and enjoy the power of programming at your fingertips - now you can try *all the combinations* you can think of, and write them down once into a list. 

I've watched hundreds of programmatic testing newbies with a shine in their eyes having done this for the first time. It's great, but it is an illustration of the tool; it's not what I would expect you to do when hired to do "testing". 

Sometimes they don't come in the simplest form. On testing course targets, the added stuff screams education. Like this one. 


From a general experience of having seen too many logins, here are things I don't expect to see in a login, and it's missing things that I might expect to see if a login flow gets embellished for real reasons. If your take on automating something like this is that you can automate it, not that it has stuff that should never be there in the first place, you are not the tester I am looking for. 

Let's elaborate on the bugs - or things that should make you #curious, like Elizabeth Zagroba taught in her talk at NewCrafts just recently. You should be curious about: 

  • Why is there a radio button to log in as admin vs. user, and why is Admin the default? There are some, but very few, cases where the user would have to know, and asking that in a login form like this is unusual at best; also, only the minority of users who are both would naturally face a selection like this. For the things where I could stretch my imagination to see this as useful, the default would be User. The judgmental me says this is there to illustrate how to programmatically select a radio button.
  • Why is there a dropdown menu? Is that like a role? While I am inclined to think this too is there to illustrate how to programmatically select from a list, I also defer my judgment to the moment of logging in. Maybe this is relevant. Well, it was not. This is either half of an aspired implementation or there for demo purposes. And it's missing a label explaining it, unlike the other fields. 
  • Why are there terms and conditions to tick? I can already feel the weight of the mild annoyance of having to tick this every single time, with changing conditions hidden in there, and you promising your firstborn child yet another Wednesday some week. The judgmental me says this is here to show the functional problem of not requiring the tick when testing. And the judgmental me is not wrong: login works just fine without what appears to be a compulsory acceptance of terms, this time with the default off to communicate a higher level of commitment when I log in. 
The second-level judgment I pass upon people through this is that testers end up overvaluing being able to click when they should focus on needing to click, and waste everyone's time with that - this is a trap. I could use this to rule out testers, except overcoming this level of shallowness can be taught in such a short time that we shouldn't gatekeep on this level of detail. 

I don't want to have the conversation of not automating this either. Of course we automate this. In the time I have spent writing this, I could already have written a parametrized test that takes username and password as input and then clicks the button, as in the sketch below. However, I'd most likely not care to write that piece of code. 
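Something along these lines - a sketch only, with the URL, element IDs and success signal assumed rather than taken from the course app:

```python
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By

CASES = [
    ("standard_user", "secret_sauce", True),
    ("standard_user", "wrong_password", False),
    ("", "", False),
]


@pytest.mark.parametrize("username,password,should_pass", CASES)
def test_login(username, password, should_pass):
    driver = webdriver.Chrome()
    try:
        driver.get("https://course-app.example.com/login")  # hypothetical URL
        driver.find_element(By.ID, "username").send_keys(username)
        driver.find_element(By.ID, "password").send_keys(password)
        driver.find_element(By.ID, "login-button").click()
        logged_in = "inventory" in driver.current_url  # assumed success signal
        assert logged_in == should_pass
    finally:
        driver.quit()
```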

Login is a concept of having authentication and authorization to do stuff. Login is not interesting in its own right; it is interesting as a way of knowing I have or don't have access to stuff. Think about that for a moment. If your login page redirects you to an application, like this one did, is login successful? I can only hope it was not left at that on the course I did not take but got inspired by to write this. 

I filled in the info, and got redirected to an e-store application. However, with the application URL and another browser, I got to use the very same application without logging in. I let out a deep sigh, worried about the outcome of the course for the students.

Truth be told, before I got to check this I already noted the complete absence of logout functionality. That too hinted that the login may be an app of its own for testing purposes only. Well, it does illustrate combinations you can so easily cover with programmatic tests. What a waste.  

What does work around login look like in real projects? We can hope it looks like taking something like Keycloak (an open source solution in this space), styling a login page to look like your application, and avoiding the thousands of ways you can do login wrong. You'll still face some challenges, but successful and failing login aren't the level you're expected to work on. 

What you would work on with most of your programmatic testing is the idea that nothing in the application should work if you aren't authorized. You would be more likely to automate login by calling an API endpoint that gives you a token you'd carry through the rest of your tests on the actual application functionality. You'd hide your login and roles in fixtures and setups, rather than create login tests. 
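A sketch of what that can look like with pytest and requests, where the token endpoint, payload and response field are assumptions for illustration:

```python
import pytest
import requests

AUTH_URL = "https://test-env.example.com/api/auth/token"  # hypothetical endpoint


@pytest.fixture(scope="session")
def auth_headers():
    # Log in once per test session via the API, not via the UI.
    response = requests.post(
        AUTH_URL,
        json={"username": "test-user", "password": "test-password"},
        timeout=10,
    )
    response.raise_for_status()
    token = response.json()["access_token"]  # field name is an assumption
    return {"Authorization": f"Bearer {token}"}


def test_orders_require_authorization(auth_headers):
    # The interesting question: does anything work without being authorized?
    url = "https://test-env.example.com/api/orders"  # hypothetical resource
    assert requests.get(url, timeout=10).status_code == 401
    assert requests.get(url, headers=auth_headers, timeout=10).status_code == 200
```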

The earlier post I linked above is based on a whole talk I did some years back on the things that were broken in our login beyond login. 

Learn to click programmatically, by all means. You will need it. But don't think that what you were taught on that course was how to test login. Even if they said they did, they did not. I don't know about this particular one, but I have sampled enough to know the level of oversimplification students walk away with. And it leads me to think we really, really need to do better in educating testers.