Thursday, July 11, 2024

Why is it bad practice to write your programmatic tests like your manual tests?

There must be a lot of articles on why it is a bad practice to write your programmatic tests like your manual tests, but finding one I would recommend in the moment just did not happen when the question emerged today. So here's one reference to the basic stuff we keep having a conversation about, with less finesse than the entire conversation around the internets has.

So why not this - you got your manual test case: 

And then you transform it to your programmatic test case: 
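To make the contrast concrete, here is a minimal sketch of what that transcription often looks like - assuming a hypothetical login page at example.com with made-up selectors, written with Python and Selenium:

# A hypothetical manual test case, transcribed step by step into a script.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")                     # Step 1: open the login page
driver.find_element(By.ID, "username").send_keys("tester")  # Step 2: type the username
driver.find_element(By.ID, "password").send_keys("s3cret")  # Step 3: type the password
driver.find_element(By.ID, "login-button").click()          # Step 4: click the login button
assert "Welcome" in driver.page_source                      # Step 5: check the welcome text
driver.quit()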


Here are the usual recommendations:
  • Structure beyond test cases. We would generally expect to see structure in code beyond just test cases. 
    • You should use variables. That would move your URL, your selectors and maybe even your inputs into variables that the script then refers to. 
    • You should capture actions on a higher level of abstraction. That would mean that you'd have a method for login with all these steps, your test case would call login(username, password), and the test itself would probably be called test_login_with_valid_credentials(). 
    • Your variables and actions would probably be in a separate file from your test case, a page object organized to model what you see in that web page or the part of it that you're modeling. When things change, you'd change the page object rather than each test case. (There's a sketch of this after this list.) 
    • Your tests probably have parts that should happen before the test and after the test. Some languages call this setup and teardown, some put these in fixtures. You are supposed to design fixtures so that they make your tests clear and independent. 
    • You will have things you do on many pages, and shared ways of doing those on all pages. Find a place for the commons in your structure. Like you might always make sure there is no text "error" anywhere on pages while you test. 
    • The usual recommendations for structure within a test include patterns like AAA (Arrange - Act - Assert), which suggest you first set up the things you need in your test, then do the thing, and then verify the things you want to verify, and avoid mixing the three up in your order of actions. 
  • Separation of data. We would generally expect that data would be separated from the test case and even from the page object, especially if it is complicated. And while it should be separated, you should still manage to describe your tests so that their name makes sense when a test fails, without looking at the data. 
    • Parametrize. Name the data, input the data, use the same test case to become many test cases. 
  • Naming is modeling. Your manual test cases may not be the model you want for your automation. Usually you decompose things differently to avoid repetition in programmatic tests and you give names to parts to model the repetition. 
  • Focus test on an architecture layer. 
    • API-based login over UI-based login. We would generally expect there to be an authentication endpoint that does not require you to click buttons, and you'd log in over the API for all your non-login tests that need a logged-in user. When things are easier to do without the UI, we'd generally expect you to design steps that are different from your manual test cases that expect you to operate things like a user. 
    • Mocking API calls to control UI. We would generally expect you to test very little with the whole integrated end-to-end system over a longer period of time. Instead of using real APIs, we would expect you to test the UI layer with fake data that you can control to see the texts. Like if you had a weather app that knows night and day, you would want to be able to fake night or nightless day without doing all the setup we do for stuff like that when we test manually. (The second sketch after this list shows the idea.) 
  • Imperative vs. declarative. You should probably learn the difference and know that people feel strongly about both. 
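To show where these recommendations land, here is a minimal sketch of the same hypothetical login test with structure: a page object, a fixture for setup and teardown, AAA inside the test, a shared "no error text" check, and parametrized data. The names (LoginPage, BASE_URL, the selectors, the credentials) are made up for illustration, and the sketch assumes pytest and Selenium:

# A minimal sketch, not a definitive implementation.
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By

BASE_URL = "https://example.com"


class LoginPage:
    """Page object: selectors and actions live here, not in the test."""

    USERNAME = (By.ID, "username")
    PASSWORD = (By.ID, "password")
    LOGIN = (By.ID, "login-button")

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get(f"{BASE_URL}/login")
        return self

    def login(self, username, password):
        self.driver.find_element(*self.USERNAME).send_keys(username)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.LOGIN).click()


@pytest.fixture
def driver():
    # Setup before the test, teardown after it, kept out of the test body.
    driver = webdriver.Chrome()
    yield driver
    driver.quit()


@pytest.mark.parametrize("username,password", [("tester", "s3cret"), ("admin", "adm1n")])
def test_login_with_valid_credentials(driver, username, password):
    # Arrange
    page = LoginPage(driver).open()
    # Act
    page.login(username, password)
    # Assert
    assert "error" not in driver.page_source.lower()  # shared "no error text" check
    assert "Welcome" in driver.page_source

When the login page changes, the selectors and the login action change in one place, and the test names still tell you what failed.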
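And for focusing a test on an architecture layer, here is a minimal sketch of mocking an API call to control the UI, using Playwright's request interception in Python. The weather app, its URL and its API path are made up for illustration; an API-based login follows the same idea of bypassing the UI where the UI is not the thing under test:

# A minimal sketch, assuming a hypothetical weather app and API.
import json
from playwright.sync_api import sync_playwright


def fake_night(route):
    # Answer the weather API call with data we control, instead of the real backend.
    route.fulfill(
        status=200,
        content_type="application/json",
        body=json.dumps({"condition": "clear", "is_night": True}),
    )


with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.route("**/api/weather*", fake_night)           # intercept the API call
    page.goto("https://weather.example.com")            # load the real UI against faked data
    assert page.get_by_text("Good night").is_visible()  # the night-time text we wanted to see
    browser.close()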
There's a lot more that people say about this, but maybe this will get you started, even if the sketches here only hint at how each of these changes the manual and programmatic tests we started from.


Testing is amplifying fail signals

There's a phrase I picked up through Joep Schuurkes that points to a book by Gene Kim et al. that I have not yet read, but I did listen to a podcast episode about it, on a podcast all this made me discover and appreciate more widely.

Amplifying fail signals.

Combine this with sampling a talk on YouTube where the speaker made the point of not being anti-AI and then proceeded to spend 45 minutes amplifying fail signals with examples of problems and of what AI can't do, characterizing it humanly with negatives only, and landing back on something this crowd is known for: two key questions separating what a *tester* does and what a *manager* does. I have not liked this choice of centering testers as information providers instead of active agents who are part of making those decisions before, and I did not like it now with AI. 

Sometimes we amplify fail signals so much that no other signals are left. 

* I am well aware that amplifying fail signals is intended to be for signals that are hard to pick up. Like the fact that when you ask a question leading with "hi guys", not getting a response from me does not mean I would not have the answer or be unwilling to share it with you, just that you made it extra effort for me to get to doing the work by excluding me. There are patterns we don't recognize as failure that require special amplification to be successful. 

On the whole agency in decisions: it would be hard to live in Finland with cases like Vastaamo (a psychotherapy service's security breach with very private information leaked) and not realize that when we fight in courts, the conversation is about responsibilities. Should the IT specialists have 1) created a paper trail that shows the company did not allocate resources to fix the problems, 2) fixed the problems if they truly were on the level of not changing default passwords, and 3) considered that the work assigned to them specifically is where their responsibility lies? Is this kind of a thing the responsibility of the management, or does the expert play a role in the responsibilities too? Whether court-assigned or not, I like to act as if I had responsibilities for things I am aware of, and I can choose to change the password away from the default even when I can't get the hiring budget for someone who specializes in it in addition to myself. But things are easy when you're not being blamed for a mistake in court. Which for me is still pretty much always. 

The talk goes back to one question that has been flying around long enough that it should not need attribution: is there a problem here? This time around, it drops a second question: are you ok with all of this? 

With edits of emphasis on the latter question, it helps me frame things like I want to. Even if we (over)amplified fail signals and asked managers: are you ok with all of this?, we really need to ask ourselves in democratizing data and use of AI: are you ok with all of this?

I can talk as much as the next person on the threats but I tend to park them as a list to make choices on using my time elsewhere. I have had to decide on a personal level what to do with the things that I am not ok with, and decide that the really useful style of amplifying fail signals is to fix some of the fails resulting in different signals, compensate for others, and put a foot down ethically when it really is something I am not ok with, and start movements. 

The listing of threats I work with currently stands with this:
  • Mostly right is not enough when stakes are high so we may want to keep good humans in the loop.
  • Keeping what is secret, secret so we understand that use of AI is sharing information we need to be more intentional on
  • Cultural filtering with encoded cultural norms so that someone else's standards don't lose someone else's history and culture
  • Plagiarism at scale so that we remember what this is ethically even if not legally
  • Move against green code so that we start considering perspectives beyond I can do it
  • Institutionalized bias so that we understand how bias shows up in data
  • Personal bias so that we understand reasons our inputs make us like results better
  • Humanization of technology so that we remember it is only technology
  • Skills atrophy so that we pay attention to necessary social changes
  • Labor protections so that we are not creating conditions of world we don't want to live in
  • AI washing so that we understand value within the buzzwords
  • Energy to refute misinformation so that we pitch in to learning and educating
  • Accidental and Intentional data poisoning so that we remember our actions today impact our future experiences and not everyone is out for good
  • Easy over right solutions to testing problems so that we have the necessary conversations we've been mulling over that allow us to do something smart
So am I ok with all of this? Absolutely not. And because I am not, I will work on figuring out how to use it, not just as a technical tool but in a social context, with guardrails and agreements in place. Since my time is limited, I choose to monitor the threat landscape, yet focus my actions on designing guardrails within testing to get the value we are out for in all this. 

We can try characterizing technology humanly from fail perspective, but that just leads us to seeing how many negatives we can apply on anything if we so choose. 

 

Tuesday, July 9, 2024

AI is automation at scale of benefits

For the longest time, I have spent a small part of my attention on explaining to people that framing test automation as something that only solves the risk-of-regression problem is such a waste. You create some of your worst test automation by taking the scripted manual tests and automating those one by one. And it is not a problem to test the "same things" manually if that manual testing isn't finding problems you then continuously need to retrace back to, beyond the problems we have already accepted. 

I reframed test automation to something that is

  1. Documenting
  2. Extending reach
  3. Alerting to attend
  4. Guiding to detail
That was already before this whole age of AI started. And now, looking at the age of AI, we are struggling with similar issues of framing. We want to overfit a tool to a task and are being overprotective about our idea of classification. 

On a previous project, I had an excellent example of model-based testing. Instead of writing scripts, we had a model of flows driving functionalities modeled into a page object. Not AI. But so much more intelligent, and a significantly better approximation of acting humanly to include in our ways of testing, than the usual scripts alone. 
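To give a flavor of what that looked like, here is a minimal sketch of the idea - not the actual project code: a model of flows (states and allowed actions) drives page-object methods in a random walk, with shared checks on every step. All the names are made up for illustration.

import random

MODEL = {
    # state: list of (page-object action to call, state it leads to)
    "logged_out": [("login", "front_page")],
    "front_page": [("open_reports", "reports"), ("logout", "logged_out")],
    "reports": [("back_to_front", "front_page"), ("logout", "logged_out")],
}


def random_walk(pages, steps=20, seed=None):
    """Walk the flow model, calling the matching page-object action on each step."""
    rng = random.Random(seed)
    state = "logged_out"
    for _ in range(steps):
        action, next_state = rng.choice(MODEL[state])
        getattr(pages, action)()   # e.g. pages.login(), pages.open_reports()
        pages.check_invariants()   # shared checks, like "no 'error' text on the page"
        state = next_state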

Many people qualify only a few categories of solutions as AI, making the point that it is a marketing fad for people in the industry to frame automation at scale as AI. In a narrower definition, we would look for technologies that are new and different: machine learning, large language models, retrieval augmented generation, computer vision and so forth. 

A few years back, reading the book "Artificial Intelligence: A Modern Approach", I picked up one of its definitions of AI:
Acting humanly.

Mundane clicking is acting humanly. Mundane clicking by a UI automation script alone is not acting humanly; it removes some of the emergent intent we overlay as humans when we do mundane clicking. Including randomness and fun to increase chances of serendipity is very much human. So from this frame, of AI targeting acting humanly, we are looking at automation at scale of benefits. 

The book lists six areas of acting humanly: 
  1. Natural language processing
  2. Knowledge representation
  3. Automated reasoning
  4. Machine learning
  5. Computer vision and speech recognition
  6. Robotics
Watching tools bring in AI by renaming money as an "AI unit" and adding RAG-style LLM use as one of the consumables is quite a marketing smoke screen. It is founded on a type of market where people are likely willing to consume paid content to get more targeted responses, so it makes some sense, even if I find it difficult to consider that AI. Creating the most silly monkey to click around in a research project and calling that AI makes some sense too, even if I again find it difficult to consider that AI. 

To make progress and sense, I have the idea that we would need to start caring less about protecting a classification and more about talking about the work we do. 


What would Acting humanly for purposes of testing look like? 

Right now it looks like having a colleague who never tires of your questions and misframings, providing you somewhat average, overconfident answers. 



Then again, I have been a colleague like that on my good days, and a colleague with even more negative traits associated with me on my bad days. The good thing is that the social context corrects our individual shortcomings. 

Would it be possible to just think of AI as a way of saying that our target level of automation just scaled for benefits, and then solve the real problems, in collaboration? 

Obviously I too listed the negatives I park: 

There are things I still balance heavily on the negative side, like my use supporting plagiarism at scale, but instead of my choice being to educate on those or to hope my avoidance of use sends the message, I'm trying out compensations and offsetting impacts with other actions. I prioritize learning to apply things more thoughtfully and being at the table when we decide on follow-up actions.



Fascination of Scale

A lot of people are worried about the future of employment. My little bubble includes people who have been searching for their next employment in IT for a while, and feeling deflated. Similarly, my bubble includes people who gave up on IT, women in particular, and chose to do something else.  

Some days it feels like threats to employment are everywhere, but in particular in the effort we spend on being threatened over finding ways to direct that energy elsewhere. People in my little bubble have three major sources of existential threat:

  1. Testing is moving from testers to developers
  2. Work is moving from local to global
  3. AI will replace us all, or people using AI will replace people without AI
That's a lot for anyone to think about. So today, I wanted to frame my thoughts in a blog post around three simplified numbers.
  • Number of programmers doubles in 5 years
  • 50% of jobs change, 5% of jobs replaced by automation

There's a lot of 5's there, and a comforting message. Keeping up with the changes keeps around 95% of the jobs, and with the rate of growth, more jobs are being created than are going away.  

Out of curiosity, I wanted to dig in a little deeper. In a game of scale, we can either imagine there is always something worthwhile for us, or we can imagine that finding something worthwhile for us is prohibitively difficult. I tend to believe that we have problems with discovery and matchmaking, and we have problems with supporting change and growth, and part of the reason for those is a focus on individuals (entities) over networks. But again, back to digging. I live in a fairly small country, Finland. And understanding anything would start with understanding your own bubble in relation to others. 

Finland has a population of 5.6M people. As per random searching of information online, we have 117k people graduating annually, and 130k people currently employed in ICT, with an average salary of 3,900 €/month. Another random point of comparison I had to dig out is that it seems Selenium usage statistics suggest use by 0.5% of the population, regardless of the fact that most of my friends here block all statistics from being sent. With one of me and 5.6M of others close to me, being worried about running out of work has not become a concern. The experience I live with is that we continuously want to do more than what we have time for, and a significant productivity improvement helping us solve a few more of those persisting issues would be quite welcome. 

To make sense of scale, I selected a country of many of my colleagues in the industry. The scale at which things work in India makes it easy to understand why my socials are full of friends from that corner of the world. India has a population of 1.4B people. I could not even grasp that as a number without doing a bit of area calculation math, and drawing those areas as circles. I saw claims that 1M people graduate annually, and that there are 2.8M people working in ICT in India. The average salary comes to 6,500 €/year. Selenium usage to population is 0.01%. That all is an entirely different scale. 

My search for numbers continued to the global level: 55.3M people in ICT, with 4% annual growth, 22% over 5 years. So if the claim I picked up some years ago at Agile Technical Conference counts *programmers*, it would appear that programmers are growing faster than the industry at large - which would probably make some sense. An ICT-industry segment value of $341B would suggest that the per-person slice of that value for each person employed is about $6k/year. 
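As a back-of-the-envelope check of those two derived numbers:

growth_over_5_years = 1.04 ** 5 - 1      # 4% annual growth, compounded over 5 years
print(f"{growth_over_5_years:.0%}")      # -> 22%

value_per_person = 341e9 / 55.3e6        # $341B spread over 55.3M people
print(f"${value_per_person:,.0f}/year")  # -> $6,166/year, roughly $6k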



With all the numbers, this brings us back from worrying about being without jobs to figuring out the continuous work we do in changing our jobs. Changing our jobs to share the work and value. To evolve the work and value. To move at a pace we can work with. 

There's a change in testing that has been ongoing a while. 



Paying attention not just to "what are my tasks" but to what is the value I'm contributing and evolving that further is a part of the work towards productivity on everyone's plate. 

Wednesday, July 3, 2024

Threads, Sessions and Two Kinds of Tasks

It all started with a call I joined early in the morning. I had been asked to introduce what I do for work to a larger group of people, and the messages started flowing in: 

"The sound is very low". "Something is wrong with your microphone". 

If this had been the first time, I would not have known what was going on. Nothing was essentially different. The same equipment that works fine most of these days now would not. And at an inconvenient time. And it was *on me*, *my omission*. I could have rebooted the machine after the Selenium project leadership committee call I had been on, but I forgot, at a most inconvenient time. 

You see, this bug had been bugging me for months. It had bugged me on two different Mac computers, and I had been aware of its general conditions to reproduce for a few weeks now, but I had parked the thread of doing something about it, other than working around it with a reboot. I was hoping that by parking it long enough, it would go away without me doing anything. But no such luck for the last six months. 

With so many things to test that I was paid for, I parked the investigation and reporting work on this one. 

You see, taking forward the work of getting rid of that bug, it's a thread of work. It's multiple steps. For reasons within my understanding, the companies - let alone open source projects - don't see the problems just because we experience them. There's real work, testing work, to turn a random experience like that into a bug report that could even stand a chance of getting fixed. 

I knew the general conditions. The general conditions consisted of a weekly repetitive pattern in my life. When I would go to Software Freedom Conservancy's installation of BigBlueButton and have a lovely call on topics of advancing Selenium project's governance and support, any calls after that on Teams or Zoom would lose 80% of the volume. 

With the embarrassment of the call fresh on my mind, I made a choice on the next steps in the thread of this work. Insisting it is not my work, I used a minute to complain about it on Mastodon and received some advice. Realizing my minute to complain had turned into a whole conversation of many minutes, and that I had lost control over my use of time for the curiosity of it, I made a choice on opportunity cost. I could use time on explaining why the advice would not make sense for me. Or I could use time to turn the general conditions into specific conditions, and actually report the bug. I don't believe in the good of the world enough to think it will be fixed with just reporting, but at least I am living a little closer to good choices on opportunity cost. 

You may guess it, the thread of work unfolds. What I think is going to be simply writing down the steps turns into more of an investigation. The first version of the report I write without submitting mentions I use Safari, and its precise version, and also includes details of my environment otherwise. It's specific enough that these instructions give me the reproduction of the problem on two different computers. It's adorned with pictures to show I am not just imagining it and expressing frustration about something I did not even investigate. I did, after all, over the last 6 months. 

Looking at the report, I realize that saying it happens on Safari leaves me with questions. What about the other browsers? Is it all browsers, or did I hit a sample of browsers giving a different experience of quality? To my surprise, it does not reproduce on Chrome or Firefox. You guessed it again, the rabbit hole deepens. 

I dash up Python + Playwright, and it's been a few months since I did that. I set up a project and write up a test script to reproduce the problem on the WebKit that ships with Playwright. My idea is twofold. I would not mind reporting with a script that helps see it. But I am mostly curious about whether this bug reproduces on that WebKit browser version that approximates Safari but isn't it. And it does not. It does reproduce with Selenium though, but I can't exactly ship my machine to the project, and their pipeline does not include real browsers on Mac anyway. Well, I assume it doesn't, which may be incorrect. 
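For the curious, the skeleton of that kind of script looks roughly like this - a sketch only, with the meeting URL and the button label as placeholders, since the actual reproduction needs a real BigBlueButton room and a Teams or Zoom call afterwards:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # The bundled WebKit approximates Safari, but is not Safari.
    browser = p.webkit.launch(headless=False)
    page = browser.new_page()
    page.goto("https://bbb.example.org/join/placeholder")   # placeholder meeting URL
    page.get_by_role("button", name="Join audio").click()   # placeholder button label
    # ...then join a Teams or Zoom call and listen for the volume drop,
    # which is the part a script cannot easily assert on.
    browser.close()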

I get the report in, and I have a fascinating case of a bug that might be in BigBlueButton (open source project) or Safari (Apple), and there's no way I am going to storm the latter to figure out ways of reporting there. I cut my losses of time, and revel in my gains of learning. The likelihood of anyone setting up a cross-systems test for a scenario like this is low, so it's not exactly a testament to the importance of cross-browser automation, but it is a nudge and encouragement to keep testing with Safari, because the others don't. 

And a work day later, I am rewarded with another Safari-only bug, this time in our company CRM system where the save button is not in sight in Safari. Lucky me. Another thread to park. 

---

Whether I work with the application I am testing or an application I am using, I have come to learn that explaining the work is riddled with difficulties. *Testing* sounds so simple. The long story above on testing just one kind of scenario is not simple. It's not that it's hard, it's just a flow of doing something and making decisions, and driving towards an idea of completion of a thread of work.

My story there had two threads: 

  • What I do to live with the bug to increase odds of it going away
  • What I learn from the bug since I am investing time and energy into it
This blog post is part of the latter thread. 

So imagine I had a task in Jira or whatever tool of choice your project puts you on. Your task says to make the bug go away. All these things are subtasks, and some of those you discover as you do things. But you don't need this level of explanation in the tools unless you are in a particular environment. 

Your task is a thread. It's actually multiple tasks. And a thread is a way for you to label things that are born and inspired by the thing you set out to do in the first place, so you can talk about the work in a clearer way. And frame it in discovery and learning. 

No matter what focus you put on this task, with the dependencies it has it won't be completed within four hours. I heard someone again suggesting that is the granularity of expected task breakdown in a tool. Quite a documentation investment, and opportunity cost suggests that maybe you could do something else with the time. At least think about it. 

I have been teaching people that there are two kinds of tasks: 
  • expansive - the work continues until it's done
  • time-boxed - the work is done when the time on it is done
Testing can be done in the time-boxed style. It is testing that I test something for 4 hours. It is still testing that I test it for 40 hours. Time-boxed style is the foundation for managing testing with sessions. 

There is a more lightweight way of managing testing than sessions though. That way is managing testing with threads. Tracking threads, their completion, their importance, and the need of using budget on them is how I work with most of my exploratory testing. The sessions and charters are not 1:1, but I have a charter for a thread, and I am careful at modeling learning as new threads. 

You may want to try out visualizing the work you have at hand. Sometimes people see what your head holds better with a picture. 


Learning to track, visualize and explain threads, sessions and the two kinds of tasks (expansive and time-boxed) has been a big part of what has helped me get the trust in testing. Too many dailies go by with 'testing yesterday, and continuing testing today' as the report of work. 

Let's see when I can report that people on Safari won't need a reboot after using BigBlueButton. I changed something already - taking the calls on Chrome now. So we can call it someone else's problem, and the world is full of them. I make them mine for organizations that pay me, it's kind of the work. 

Test Automation - Friend or Foe?

When in 2023 four people including myself got together in a panel to decide if test automation is friend or foe, I chose foe. Panels where we all agree aren't fun, and in search of really understanding something more, we need to find more than the places we agree on. I remembered that one as a particular panel, so I went back to rewatch it for the points we were making. The panelists were Ben Dowen, Beth Marshall, Brijesh Deb and myself. 

What is test automation? 

  • Code doing something useful for testing 
  • Support for exploring and coded checks in testing
  • Not silver bullet 
  • Way to add efficiency through code 
How to decide when to automate a particular test and when to do a manual test instead? 

  • Human led testing could include writing automation 
  • Definition of manual or automation is a myth, it's both 
  • Roadmap to turn manual to automation, not a journey of learning how to do real work for you 
  • Overanalysing is not route to success - try test suite canvas by Ashley Hunsberger 

What if I figure that test case becomes more complicated to automate than I thought originally? 

  • Start with feasibility study 
  • Try doing it and figure out that your skills and knowledge aren't enough? Ask around when you hit a roadblock? Step through and start learning. 
  • Too much studying not enough of real work that stays around, fail to learn a little 
  • Collaboration up front, talk with developers - failing early is always an option 
  • Learning from people around you on the ideas of what to try first is not feasibility study, it's collaboration and whole team work on important problems

What is the biggest challenge to test automation and how to overcome it? 

  • They are about people, culture and communication 
  • Misunderstanding about automation not needing maintenance 
  • Manager or colleagues who say let's automate everything 
  • Getting system into the right state to start the actions has a lot to setup, testability interfaces, knowledge from various teams 
  • Success turning to failure overnight, dualism for the organizations, surviving generations of testers and the move to whole team automation 

Does every tester need to be coder in a future? 

  • More than now, industry doubles and we need different ways of starting 
  • Some testers don't know they are testers, and some of them are paid less than us testers 
  • Future makes every coder a tester, and already more of them are than we give them credit for - code aware and not 100% of anyone's job 
  • Overfocus is exclusionary and stops otherwise fantastic people from starting at this industry - no need to code in binary, not necessary to always code in java 
  • Coding is an entry level job - I want both, and teaching basic programming is simple 
  • Advancing in companies this works well for all kinds of testers, but it's broken in recruitment 
  • We ask for the wrong things in the recruitment, recruiters need to understand testing not just automation
  • Coding in test jobs is the wrong kind of task, like a bubble sort 

Is test automation ultimately a way of cutting costs? 

  • Augments what you're doing - cutting costs in one place (maintenance changes) to have more of it elsewhere (new value) 
  • 50% of jobs changed, only 5% replaced by automation 
  • In isolation it does not save time, cost or improve quality, must be supported by other activities 
  • Operating test automation in a silo away from the product development does not produce greatest results 

How can you balance need for speed and efficiency and maintaining high level of test coverage?

  • Everything is code nowadays, now you see changes somewhere with IaC and I don't need to run a test if there is no change, slow down the time, control the change 
  • Taking control of the world around me as part of automation efforts 
  • If you think test automation is the goal not a tool that adds more effective testing 
  • Testing isn't the goal, it's a goal we use. Even software development is not the goal. It's the value we are creating for customers. 
  • 1250 tests, 20 times a day, learning after having already invested in small scale feedback programmatic tests, keeping us on the right track. 
  • Misselling stream of information as silver bullets of these techniques exists 

If we focus on test automation, are we risking that we don't focus on other things? 

  • Yes, and. Not either or. 
  • Testing plays an important part in improving quality, but it's not the whole picture. 
  • Red-Yellow-Green call. Red led to punishing testers. 
  • Watermelon tests, testing your tests 
  • Had multiple levels of teams of testers. Effort but needing explainer. 

Is test automation friend or foe? 

  • Friend. High maintenance friend. A needy one. 
  • Friend. Work on a mutually beneficial with boundaries or you will fall out. 
  • Foe. Respectful even if you don't like it. Collaborate with the enemy when stakes are high.
  • Friend. Friend who likes to be pampered. Can turn into a foe.

Thursday, June 27, 2024

Well, did AI improve writing test plans?

Over 27 years of testing, I've read my fair share of test plans, and contributed quite a chunk. I've created templates to support people in writing them slightly better, and I've provided examples. And I have confessed that since it's the performance of planning, not the writing of the plan, that matters, some of the formats that support me in doing my best work in planning, like the one-page master test plan, have created some of the worst plans I have seen from others. 

Test plans aren't written, they are socialized. Part of socializing is owning the plan, and another part is bringing everyone along for the ride while we live by the plan. 

Some people need a list of risks we are concerned about. I love the form of planning where we list risks, and map those to testing activities that mitigate them. And I love making documents so short that people may actually read them. 

In these few years of LLMs being around, obviously I have had to both try generating test plans and watch others ask me to review plans where an LLM has been more or less helpful.

Overall, we are bad at writing plans. We are bad at making plans. We are really bad at updating plans when things change. And they always change. 

I wanted to make a note on whether AI has changed this already.

  • People who were not great at writing plans before are now not great in a different way. When confronted with feedback, they now have the Pinocchio to blame for it. Someone else did it - the LLM. If the average of the world does this, how can I make the point of not being happy with it? And I can be less happy with the responsibility avoidance. 
  • People who need to write plans that were not really even needed except for the process are now more efficient in producing the documentation. If they did not know to track TFIRPUSS (quality perspectives) in their plans, at least they are not missing that when they ask the tool. The difference still comes from the performance of continuous planning and organizing for the actions, rather than the act of writing the plan. 
  • Detailed test ideas for those with the fewest ideas are already better in per-feature plans. Both those who were not so great are doing slightly better with generated ideas, and those who were great become greater because they compete with Pinocchio. 
I am worried my future - like my past - is in reviewing subpar plans, but now with externalized responsibility. Maintaining ownership matters. Pinocchio is not a real boy, we're just at the start of the story.