Tuesday, August 1, 2023

A NoJira Health Check

One of my favorite transformation moves these days targets the habits we have gathered around Jira, and I call the move I pull the NoJira experiment. We are in one, and have been for a while. What is fascinating, though, is that the experiment has a lifecycle I am observing live, and I wanted to report some observations on it.

On October 5th, 2022 we had a conversation on a NoJira experiment. We had scheduled a Jira training for the whole team a few days later, which we promptly cancelled once the team decided on the experiment. The observation leading to the experiment was that I, then this team's tester, knew well what we were working on and what was about to be deployed as pull requests merged, but the Jira tasks were never up to date. I found myself cleaning them up to reflect status, and the number of clicks it took to move tasks through the steps was driving me out of my mind. 

To negotiate a change this major, and against the stream, I booked a conversation with product ownership, the manager of the team and the director responsible for the product area. I listened to concerns and wrote down what the group thought:

  • Uncontrolled, uncoordinated work
  • Slower progress, confusion with priorities and requirements
  • Business case value gets forgotten

We have not been using Jira since. Not quite a year for this project, but approaching. 

At first things got better. Now they are kind of messy and not better. So what changed?

First of all, I stopped being the team's tester and became the team's manager, allocating another tester. And I learned I have ideas of "common sense" that are not so common without good conversations to change the well-trained ideas of "just assign me my ticket to work on". 

The first idea that I did not know how to emphasize is that communication happens in 1:1 conversations. I have a learned habit of calling people to demo a problem and stretching to fix it within the call. For quality topics, I held together a network for addressing these things, and writing a small note on "errors on console" was mostly enough for people to know who would fix it without assigning it further. It's not that I wrote bug reports on a Confluence page instead of Jira. I did not just report the bugs, I collaborated on getting them fixed by knowing who to talk to and figuring out how we could talk less in big coordinating meetings. 

The second idea that I did not know to emphasize enough is that we work with a zero-bugs policy and we stop the line when we collect a few bugs. So we finish the change including the unfinished work (my preferred term for bugs) and we rely on fix-and-forget - with test automation being improved while fixing identified problems. 

The third idea I missed in elaborating is that information prioritisation is as important as discovery. If you find 10 things to fix, even with a zero-bug policy, you have a gazillion heuristics for what conversations to have first over throwing a pile of information that suffocates your poor developer colleagues. It matters what you get fixed and how much waste that creates. 

The fourth idea I missed is that the product owner's expectations of scope need to be anchored. If we don't have a baseline of what was in scope and what was not, we are set up for disappointment over how little we got done compared to what the wishes may be. We cannot neglect turning the abstract backlog item on the product owner's external communications list into a more concretely scoped listing. 

The fifth idea, to conclude my listing in reflection, is that you have to understand PULL. Pull is a principle, and if it is a principle you never really worked on, it is kind of hard. A tester can pull fixes towards better quality. Before we pull new work, we must finish current work. I underestimated the amount of built-up habit in thinking that pull means taking tasks someone else built for your imagined silo, over figuring out together how we move the work effectively through our system. 

For the first six months, we were doing ok on these ideas. But it is clear that doubling the size of the team without a good and solid approach to rebuilding, or even communicating, the culture that the past practice and its success were built on does not provide the best of results. And I might be more comfortable than those coming after me in applying my "common sense" when the results don't look right to me in the testing space. 

The original concerns have still not come true - the work is coordinated, and it progresses. It is just that nothing is done, and now we have some bugs to fix that we buried in a Confluence page we started to use like Jira - for assigning, conversing and listing repro steps - over sticking with the ideas of truly working together, not just side by side. 




     

    Saturday, July 29, 2023

    Public Speaking Closure

    A very long time ago, like three decades ago, I applied for a job I wanted in the student union. Most of the work in that space is volunteer-based and not paid, but there are a few key positions that hold the organization together, and the position I aspired to have was one of those. 

    As part of the application process, I had to tell about myself in a large lecture hall, with some tens of people in the audience. I felt physically sick, remember nothing of what came out of my mouth, had my legs so shaky I had a hard time standing, and probably managed to just about say my name from that stage. It became obvious, if I had not given it credit before: my fear of public speaking was going to be a blocker on things I might want to do. 

    I did not get the job, nor do I know whether it was how awful my public speaking was, my absolute lack of ability to tell jokes intentionally, or some of many other good reasons, but I can still remember how I felt on that stage on that day. I can also remember the spite that drove me to take action since then, and that over time gave a sense of purpose to many efforts to get educated and practice the skills related to public speaking. 

    Somewhere along the line, I got over the fear and had done talks and trainings in the hundreds. Then one year's EuroSTAR program chair decided to reflect on why he chose no women to keynote, quoting "no women of merit", and I turned to spite-driven development to become a woman of merit to keynote. I did my second EuroSTAR keynote this June. 


    Over the years of speaking, I learned that the motivations or personal benefits for a speaker getting on those stages are as diverse as the speakers. I was first motivated by a personal development need, getting over a fear that would impact my future. Then I was motivated by learning from people who would come and discuss topics I was speaking about, as I still suffer from a particular brand of social anxiety around small talk and opening conversations with strangers. But as I collected status, I lost a lot of that value, with people thinking they needed something really special to start a conversation with me. I travelled the world experiencing panic attacks and sometimes felt the loneliest in big crowds and conferences, with all those extroverts without my limitations. In recent years, I find I have been speaking because I know my lessons from doing this - not consulting on this - are relevant, and speaking gives me a reflection mechanism for how things are changing as I learn. However, it has been a while since getting on that stage has felt like a worthwhile investment. 

    Recently, I have been speaking on habit and policy. I don't submit talks to calls for proposals, and I have used that policy to illustrate that not having women speakers is the conference organizers' choice, as many people like myself will say yes to an invitation. The wasteful practice of preparing talks when we really should be collaborating is something I feel strongly about, still today. The money from speaking from stages isn't relevant, the money from trainings is. In the last three years with Vaisala, I have had all the teams in the whole company to consult at will and availability, and I have not even wanted to make the time for all of the other organizations even though I still have a side business permission. I just love the work I have there, where I have effectively already had four different positions within one due to the flexibility they built for me. Being away to travel to a conference feels a lot more like a stretch and a significant investment that is at odds with other things I want to do.

    The final straw for changing my policy came from EuroSTAR feedback. In a group of my peers giving anonymous feedback, someone chose to give me feedback beyond the talk I delivered. I am ok with feedback that my talk today was shit. But the feedback did not stop there. It also made a point that all of my talks are shit. And that the feedback giver could not understand why people would let me on a stage. That was hateful, and we call people like this trolls when they hide behind anonymity. However, this troll is a colleague, a fellow test conference goer. 

    Reflecting on my boundaries and my aspirations, I decided: I retire from public speaking, delivering the talks I have already committed to but changing my default response to invitations to no. I have given 9 no responses since the resolution, and I expect to be asked less now that I have announced my unavailability. 

    It frees time for other things. And it tells all of you who would have wanted to learn from me that you need to rein in the trolls sitting in you and amongst you. I sit in my power of choice, and quit with 527 sessions, adding only paid trainings to my list of delivered things for now. 

    I'm very lucky and privileged to be able to choose this as speaking has never been my work. It was always something I did because I felt there are people who would want to learn from what I had to offer from a stage. Now I have more time for 1:1 conversations where we both learn. 

    I will be doing more collaborative learning and benchmarking. More writing. Coding an app and maybe starting a business on it. Writing whenever I feel like it to give access to my experiences. Hanging out with people who are not trolls to me, working to have fewer trolls by removing structures that enable them. Swimming and dancing. Something Selenium related in the community. The list is long, and it's not my loss to not get on those stages - it's a loss for some people in the audiences. I am already a woman of merit, and there are plenty more of us to fill the keynote stages for me to be proud of. 



    Thursday, May 18, 2023

    The Documentation Conundrum

    Writing things down is not hard. My piles of notebooks are a testament to that - I write things down. The act of writing is important to me. It is how I recall things. It is how I turn something abstract like an idea into something I can see in my mind's eye in a physical location. It frees my capacity from the idea of having to remember. Yet it is a complete illusion. 

    I don't read most of the stuff I write down. I rarely need to go back to read what I wrote. And if I read what I wrote, it probably would make less sense than I thought in the moment, and would incur a huge cost in reading effort. 

    Yet, the world of software is riddled with the idea of writing things down and expecting that people would read them. We put new hires through the wringer of throwing them at hundreds of pages of partially outdated text and expect this early investment into bad documentation to save us from having to explain the same things as new people join. Yet the reality is that most of the things we wrote down, we would be better off deleting. 

    We think that writing once means reading at scale, and it might be true for a blog post. To write once in a form that is worth reading at scale either takes a lot of effort from the writer or happens to touch something that is serendipitously useful. Technical documentation should not be serendipitously useful, but it should be worth reading, in a future that is not here yet. It should help much more than hinder. It should be concise and to the point. 

    Over the course of a year, I have been running an experiment on writing down acceptance criteria. I set up the experiment with a few thoughts:

    • the product owner should not write the acceptance criteria, they should review and accept them - writing is a more powerful clarification tool than reading, and we need the most power for clearing up mistakes where they would end up in code
    • acceptance criteria are an output of having developed and delivered - we start writing them as we have conversations, but they are ready when the feature is ready, and I will hold up the discipline of writing down the best knowledge as output
    • a question format for accepting / rejecting feels powerful, and is also something that annoys both the people above me in the org chart who would choose a "shall" requirements format and a product owner who believed it was important to change the format - thus we will
    • acceptance criteria exist on the epic level that matches a feature, the smallest thing we found worth delivering and mentioning - it's bigger than the books recommend, but what is possible today drives what we do today

    So for a year, I was doing the moves. I tried coaching another tester into writing acceptance criteria, someone who was used to getting requirements ready-made, and they escaped back to projects where they weren't expected to pay attention to discovering agreements towards acceptance because it was someone else's job. I tried getting developers to write some, but came to the conclusion that collecting them from conversations was a less painful route. I learned that my best effort at writing acceptance criteria before starting a feature was, fairly consistently, about 80% of the acceptance criteria I would have discovered by being done with the feature. And I noted that I felt very much like I was the only one who, through my testing activities, hit uncertainties about what our criteria had been and what they needed to be to deliver well. I used the epics as anchors for testing of new value, and left behind 139 green ticks. 


    Part of my practice was to also collect 'The NO list' of acceptance criteria that someone expected to be true but that we left out of scope. With the practice, I learned that what was out of scope was more relevant to clarify, and would come back as questions much more than what was in scope. 18 of the items on 'The NO list' ended up being incrementally addressed, leaving 40 still as valid as ever at the time of my one-year check. 

    For a whole year, no one cared for these lists. Yesterday, a new tester-in-training asked for documentation on scope and ended up first surprised that this documentation existed, as apparently I was the only one in the team fully aware of it. They also appeared a little concerned about the epics' incremental nature and the possibility that it was not all valid anymore, so I took a few hours to put a year of them on one page. 

    The acceptance criteria and 'The NO list' created a document the tool estimates takes 12 minutes to read. I read them all, to note they were all still valid - 139 green ticks. Of the 58 items on 'The NO list', 31% were there to remove, as we had since brought those into scope. 

    The upcoming weeks and conversations will show whether the year's work on my part to stay disciplined for this experiment is useful to anyone else. As an artifact, it is not relevant to me - I remember every single thing, having written it down and used it for a year. But 12 minutes of reading time could be worth the investment even for a reader. 

    On the other hand, I have my second project with 5000 traditional test cases written down, estimated at 11 days of reading if I ever wanted to get through them, just to read them once. 

    We need some serious revisiting on how we invest in our writing and reading, and some extra honesty and critical thinking in audiences we write to. 

    You yourself are a worthwhile audience too. 



    Wednesday, April 26, 2023

    Stop Expecting Rules

    Imagine having a web application with a logout functionality. The logout button is in the top LEFT corner. That's a problem, even if your requirements did not state that users would look for the logout functionality in the top RIGHT corner. This is a good example of requirements (being able to log out) and implicit requirements (we expect to find it on the right). There are a lot of things like this where we don't say all the rules out loud. 

    At work, we implemented a new feature on sftp. When sftp changes, I would have tested ftp. And it turns out that whoever was testing it did not have a *test case* that said to test ftp. Ftp was broken in a fairly visible way that was not really about changing sftp, but about recompiling C++ code that made a latent bug from 2009 visible. Now we fixed ftp, and while whoever was testing it now had a test case saying to test sftp and ftp as a pair, I was again unhappy. With the size of the change to ftp, I would not waste my time testing sftp. Instead, I had a list of exactly 22 places where the sonar tool had identified problems exactly like this used-to-be-latent one, and I would have sampled some of those in components we had changed recently. 

    The search for really simple rules fails you. Even if you combine two things for your decision, the number of parameters is still small. 

    In the latter case, the rules are applied for the purpose of optimising opportunity cost. To understand how I make my decisions - essentially exploratory testing - would require balancing the cost of waiting another day for the release, the risks I perceive in the changes going with the release, and the functional and structural connections in a few different layers. The structural connection in this case had both the size and the type of the change driving my decisions on how we would optimize opportunity cost.

    I often find myself explaining to people that there are no rules. In this case, it would hurt timelines by maybe a few hours to test ftp even when those few hours would be better used elsewhere. The concern is not so much about the two hours wasted, but about not considering the options for how those two hours could be invested - in faster delivery or in a better choice of testing that supports our end goal of having an acceptable delivery on the third try. 

    A similar vagueness of rules exists with greeting people. When two people meet, who says hello first? There are so many rules, and rules on what rules apply, that trying to model it all would be next to impossible. We can probably say that in a Finnish military context, rank plays into the rules of who starts, and punishment for not considering the rules during rookie season teaches you rule following. Yet the amount of interpretations we can make of others' intentions when passing someone at the office without them (or us) saying hello - kind of interesting sometimes. 

    We're navigating a complex world. Stop expecting rules. And when you don't expect rules, the test cases you always run will make much less sense than you think they do. 

     

    Saturday, April 15, 2023

    On Test Cases


    25 years ago, I was a new tester working with localisation testing for Microsoft Office products. For reasons beyond my comprehension, my first-ever employer in the IT industry had won a localisation project for four languages, and Greek was one of them. The combination of my Greek language hobby and my Computer Science studies turned me into a tester. 

    The customer provided us test cases, essentially describing tours of functionalities that we needed to discover. I would never have known how certain functionalities of Access and Excel work without those test cases, and I was extremely grateful for the step-by-step instructions on how to test. 

    Well, I was happy with them until I got burned. Microsoft had a testing QA routine where a more senior tester would take exactly the same test cases that I had, not follow the instructions but be inspired by them, and tell me how miserably I had failed at testing. The early feedback for new hires did the trick, and I haven't trusted test cases as actual step-by-step instructions since. Frankly, I think we would have been better off if we had described those as feature manuals over test cases; I might have gotten the idea that they were just a platform a little sooner. 

    In the years to come, I have created tons of all kinds of documentation, and I have been grateful for some of it. I have learned that instead of creating separate test case documentation, I can contribute to user manuals and benefit the end users - and still use those as inspiration and a reminder of what there is in the system. Documentation can also be a distraction, and reading it an effort away from the work that would provide us results, and there is always a balance. 

    When I started learning more about exploratory testing, I learned that a core group figuring that stuff out had decided to try to make space between test cases (and the steps those entail) by moving to the word charter, which as a word communicates the idea that it exists for a while, can repeat over time but isn't designed to be repeated, as the quest for information may require us to frame the journey in new ways. 

    10 years ago, I joined a product development organization where the manager believed no one would enjoy testing and that test cases existed to keep us doing the work we hate. He pushed me very clearly to write test cases, and I very adamantly refused. He hoped for me to write those and tick off at least ten each day to show I was working, and maybe, if I couldn't do all the work alone, the developers could occasionally use the same test cases and do some of this work. I made a pact of writing down session notes for a week, tagging test ideas, notes, bugs, questions and the sort. I used 80% of my work time on writing, and I wrote a lot. With the window into how I thought and the results from the 20% of time I had for testing, I got my message through. In the upcoming 2.5 years I would find and report 2261 bugs that also got fixed, until I learned that pair fixing with developers was a way of not having those bugs created in the first place. 

    Today, I have a fairly solid approach to testing, grounded in an exploratory testing approach. I collect claims, both implicit and explicit, and probably have some sort of a listing of those claims. You would think of this as a feature list, and optimally it's not in test cases but in user-facing documentation that helps us make sense of the story of the software we have been building. To keep things in check, I have programmatic tests - some I have written, many have been written because I have shown ways systems fail - and they are now around to keep the things we have written down in check with changes. 

    I would sometimes take those tests and make changes to data, running order, or timing to explore things that I can explore building on assets we have - only to throw most of those away after. Sometimes I would just use the APIs and GUIs and think in various dimensions, to identify things inspired by the application and change as my external imagination. I would explore alone, with and without automation, but also with people. Customer support folks. Business representatives. Other testers and developers. Even customers. And since I would not have test cases I would be following, I would be finding surprising new information as I grow with the product. 

    Test cases are step by step instructions on how to miss bugs. The sooner we embrace it, the sooner we can start thinking about what really helps our teams collaborate over time and changes of personnel. I can tell you: it is not test cases, especially in their non-automated format. 

    Tale of Two Teams

    Exploratory testing and creating organizational systems encouraging agency and learning in testing have been a center of my curiosity in the testing space for a very long time. I embrace the title of exploratory tester extraordinaire assigned to me by someone whose software I broke in like an hour and a half. I run an exploratory testing academy. And I research - empirically, in project settings at work - exploratory testing.

    Over the years people have told me time and time again that *all testing is exploratory*. But looking at what I have at work, it is very evident this is not true. 

    Not all projects and teams - and particularly managers - encourage exploratory testing. 

    To encourage exploratory testing, your focus of guidance would be on managing the performance of testing, not the artifacts of testing. In the artifacts you would seek structures that invest more when you know the most (after learning) and encourage very different artifacts early on to support that learning. This is the reason the best-known examples of managing exploratory testing focus on a new style of artifacts: charters and session notes over test cases. The same split between testing as performance (exploratory testing) and testing as artifact creation (traditional testing) repeats in both manual-centering and automation-centering conversations. 



    Right now I have two teams, one for each subsystem of the product I am managing. Since I have been managing development teams only since March, I did not build the teams, I inherited them. And my work is not to fix the testing in them, but to enable great software development in them. That is, I am not a test manager, I am an engineering manager. 

    The engineering culture of the teams is essentially very different. Team 1 is what I would dub 'Modern Agile', with emergent architecture and design, no Jira, pairing and ensembling. Team 2 is what I would dub 'Industrial Agile', with extreme focus on Jira tasks and visibility, and separation of roles, focusing on definition of ready and definition of done. 

    Results at the whole-team level are also essentially very different - both in quantity and quality. Team 1 has output increasing in scope, and team 2 struggles to deliver anything with quality in place. Some of the differences can be explained by the fact that team 1 works on new technology delivery and team 2 is on a legacy system. 

    Looking at the testing of the two teams, the value system in place is very different. 

    Team 1 lives with my dominant style of Contemporary Exploratory Testing. It has invested in baselining quality into thousands of programmatic tests run in hundreds of pipelines daily. The definition of a test pass is binary green (or not). Running the programmatic tests is 10% of the effort, maintaining and growing them a relevant percentage more, but in addition we spend time exploring with and without automation, documenting new insights again in programmatic tests on the lowest possible levels. Team 1 first had me as a testing specialist, then decided on no testers, but due to unforeseeable circumstances has again a tester in training participating in the team's work.  

    Team 2 lives in testing I don't think will ever work - Traditional Testing. They write plans and test cases, and execute the same test cases in manual fashion over and over again. When they apply exploratory testing it means they vanish from regular work for a few days, do something they don't understand or explain to learn a bit more about a product area, but they always return to test cases after the exploratory testing task. Team 2 testing finds little to none of the bugs, but gets releases returned as they miss bugs. With feedback about missing something, they add yet another test case to their lists. They have 5000 test cases, run a set of 12 for releases, and by executing the same 12 minimise their chances of being useful. 

    It is clear I want a transformation from Traditional Testing to Contemporary Exploratory Testing, or at least to Exploratory Testing. And my next puzzle at hand is how to do the transformation I have done *many* times over the years as the lead tester, this time as a development manager. 

    At this point I am trying to figure out how to successfully explain the difference. But to solve this, I have a few experiments in mind:
    1. Emphasize time on product with metrics. Spending your days writing documentation is time away from testing. I don't need all that documentation. Figure out how to spend time on the application, to learn it, to understand it, and to know when it does not work.
    2. Ensemble testing. We'll learn how you look at an application in context of learning to use it with all the stakeholders by doing it together. I know how, and we'll figure out how. 


    Wednesday, March 22, 2023

    Memoirs of Bad Testing

    I'm in a peculiar position right now. I am a manager of a manager who manages a test manager managing themselves and a tester. Confused already? I am too.

    The dynamic is clear though. I get to let the tester go if I think their work isn't worth paying for. And that is kind of what I am thinking. But it is not that straightforward. All testers, gather around, this needs a lot more than just throwing it at your face like this.

    What Good Testing Looks Like

    Having been a tester for such a long time, and still being a tester on the side of my other dark-side managerial duties, I know what good testing looks like and I would buy some in a heartbeat. Some examples of the testing I vouch for are embodied in people like Ru Cindrea (and pretty much anyone at Altom I have seen test, including past people from Altom like Alessandra Casapu, pure testing brilliance), Elizabeth Zagroba, Joep Schuurkes and James Thomas. And my idol of all times, Fiona Charles, to whom I attribute a special blend of excellence in orchestrating testing of this kind. 

    If there is a phrase that good testing often comes to me with, it is the words *exploratory testing*. Not the exploratory testing that you put into a single task as a reason to say that you need extra time to think, but the kind that engulfs all your testing, makes you hyper-focused on results and aware of the cost of premature documentation as well as opportunity cost, and positively pushes you to think about risks and results as an unsplittable pair where there is not one without the other. 

    These kinds of testers have a product research mindset, they are always on the lookout for how to best invest their limited time in the context at hand, and they regularly surprise their teams with information we did not have available, regardless of how much automation we have already managed to accrue. 

    Seniority much? 

    While it may sound like I am describing very senior people, I have seen this in 1st year testers, and I have seen lack of this in 10th year testers. It is not about level of experience, it is about level of learning, and attitude to what it means to do testing. 

    What does it look like when the work may not be worth paying for?

    While writing documentation and notes and checklists is good, writing test cases is bad. Generally speaking, of course. So my first smell towards recognising I might be coming to a case of testing not worth paying for is test case obsession. 

    Test Case Obsession is counting test cases. Defining goals of test automation in terms of manual test cases automated. Not understanding that good test automation means decomposing the same testing problems in a different way, not mimicking your manual movements. 

    It's listing test cases and framing the very same test cases into a "regression suite", a "minor functionality suite" and a "major functionality suite", avoiding thinking about why we test today - what is the change and risk - and hoping that following a predetermined pattern would catch all that is important with regards to results. 

    If this is what you are asked to do, you would do it. But if you are asked to follow the risk and adjust your ideas of what you test, and you still refuse, you are dangerously in the territory of testing not worth paying for. When that is enhanced with missing simple basic things that are breaking, because you prioritise the safety and comfort of repeating plans over finding the problems with lack of information your organization hired you to work on, you are squarely on my testing-not-worth-paying-for turf. 

    The Fixing Manoeuvres

    I've done this so many times over the last 25 years that it almost scares me. 

    First you move the testing not worth paying for into a team of its own, and if you can't stop doing that work, you send it somewhere it gets even thinner on results by outsourcing it. And in 6-12 months, you stop having that team. In the process you have often replaced 10 testers with 1 tester and better results. 

    Manager (1) - Manager (2) - Manager (3) - Tester

    Being the Manager to Testers, I can do the Fixing Manoeuvres. But add more layers, and you have a whole different dynamic.

    First of all, being the manager (1), I can ask for Testing Worth Paying For (tm) while manager (2) is requiring testing not worth paying for, and manager (3) and the tester, being great and competent folks, find themselves squeezed from two sides. 

    Do what your manager says and their manager will remove you.

    Do what your manager's manager says and your manager will nag you, because they have been taught that test cases are testing, and know of nothing better. 

    Long Story, and the Moral  

    We have an epidemic of good managers with bad ideas about how to manage testing, and this is why not all testing is exploratory testing. Bad ideas remove a lot of the good from testing. Great testers do good testing even in the presence of bad ideas, but it is so much harder. 

    We need to educate good managers with bad ideas and I have not done my part as I find myself in a Manager - Manager - Manager - Tester chain. I will try again. 

    Firing good people you could grow is the last resort, while also a common manoeuvre. Expecting good testing (with mistakes and learning) should be what we grow people towards. 

    Friday, February 17, 2023

    The Browser Automation Newbie Experience

    Last week I called for Selenium newbies to learn to set up and run tests on their own computer, and to create material I could edit a video out of. I have no idea if I will ever manage to get to editing the video, but having now had a few sessions, writing about it seemed like a good use of an evening. 

    There are two essentially different things I mean by a newbie experience: 

    1. Newbies to it all - testing, programming, browser automation. I love teaching this group and observing how quickly people learn in context of doing. 
    2. Newbies to browser automation library. I love teaching this group too. Usually more about testing (and finding bugs with automation), and observing what comes in the way of taking a new tool into use. 

    I teach both groups in either a pair or an ensemble setting. In pairs, it's about *their* computer we pair on. In an ensemble, it's about one person's computer in the group that we pair on. 

    The very first test experience is slightly different on the two and comprises three parts:
    1. From empty project to tools installed
    2. From tools installed to first test
    3. Assessing the testing this enables

    I have expressed a lot of love for Playwright (to the extent of calling myself a playwright girl), backed with actions of using it in projects at work. But I have also expressed a lot of love for Selenium (to the extent that I volunteer for the Selenium Project Leadership Committee), backed with encouraging its use in projects at work. I'm a firm believer that *our* world is better without competition and war, with admiration and understanding and sustainability instead. 

    So with this said, let's compare Playwright and Selenium *Python* on each of these three. 

    From Empty Project To Tools Installed - Experience Over Time

    Starting with listing dependencies in requirements.txt, I get the two projects set up side by side. 

    Playwright requirements.txt:
    pytest
    playwright
    pytest-playwright

    Selenium requirements.txt:
    pytest
    selenium

    I run pip install -r requirements.txt and ... I have some more to do for Playwright: running playwright install from the command line, and watching it download the superset browsers and node dependencies. 

    If everything goes well, this is it. When updating the dependencies, I will repeat the same steps, and I find the Playwright error message telling me in plain English to update quite helpful. 

    If we tend to lock our dependencies, the need for update actions is in our control. On many of the projects, I find failing forward to the very latest to be a good option, and I take the occasional surprise of an update over the pain of not having updated, almost any day. 

    From Tools Installed to First Test

    At this point, we have done the visible tool installation, and we only see if everything is fine when we use the tools and write a test. Today we're sampling the same basic test on the TodoMVC app, the vanilla JS flavour.

    We have the Playwright-python code
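
    A minimal sketch of what such a test might look like - assuming the pytest-playwright page fixture and the vanilla JS TodoMVC example URL; adjust the URL and locators to the flavour you actually test:

        # Sketch only, not the exact code from the session.
        from playwright.sync_api import Page, expect


        def test_add_todo(page: Page):
            # URL assumed; the TodoMVC example location may differ.
            page.goto("https://todomvc.com/examples/javascript-es6/dist/")
            page.get_by_placeholder("What needs to be done?").fill("write a test")
            page.get_by_placeholder("What needs to be done?").press("Enter")
            expect(page.locator(".todo-list li")).to_have_count(1)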


    With test run on chromium, headed: 
    == 1 passed in 2.71s ==

    And we have the Selenium-python code
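
    A comparable Selenium-python sketch - again my assumption rather than the exact code - written for Selenium 4.6+ where the matching driver is resolved automatically; same URL and locator assumptions as above:

        # Sketch only, not the exact code from the session.
        import pytest
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.common.keys import Keys


        @pytest.fixture
        def driver():
            driver = webdriver.Chrome()  # no driver path; Selenium Manager finds one
            driver.implicitly_wait(5)    # small safety margin for element lookups
            yield driver
            driver.quit()


        def test_add_todo(driver):
            driver.get("https://todomvc.com/examples/javascript-es6/dist/")
            new_todo = driver.find_element(By.CSS_SELECTOR, ".new-todo")
            new_todo.send_keys("write a test", Keys.ENTER)
            assert len(driver.find_elements(By.CSS_SELECTOR, ".todo-list li")) == 1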


    With test run on chrome Version 110.0.5481.100 (Official Build) (arm64)
    == 1 passed in 6.48s ==

    What I probably should stress in explaining this is that while 3 seconds is less than 7 seconds, more relevant for overall time is the design we go for from here - what we test while on the same browser instance. Also, I had run tests for both libraries on my machine earlier today, which is kind of relevant for the time consideration.

    That is, with Playwright we installed the superset browsers from the command line. With Selenium, we did not, because we are using the real browsers and the browser versions on the computer we run the tests on. For that to work, as some of you may know, a browser-specific driver is required. The most recent Selenium versions have built-in driver management (Selenium Manager) that is - if all things are as intended - invisible. 

    I must mention that this is not what I got from reading the Selenium documentation, and since I am sure this vanishes as soon as it gets fixed, I am documenting my experience with a screenshot.


    I know it says "No extra configuration needed", but since I had - outside PATH - an older ChromeDriver, I ended up with an error message that made no sense with the info I had, and after reading the blog post, I downloaded a command line tool I could not figure out how to use. Coming to this from Playwright, I was keen to expect a command line tool instead of an automated built-in dependency maintenance feature, and the blog post just built on that assumption. 

    Assessing the Testing This Enables

    All too often we look at the lines of code, the commands to run, and the minutiae of maintaining the tool, but at least on this level, it's not like the difference is that huge. With Python, both libraries in my scenarios rely on the existence of pytest, and much of the asserting can be done the same way. Playwright has recently (ehh, over a few years, time flies, I have no clue anymore) been introducing more convenience methods for web-style expects like expect(locator).to_have_text() and web-style locators like get_by_placeholder(), yet these feel like syntactic sugar to me. And sugar feels good but sometimes causes trouble for some people in the long run. Happy to admit I have both been one of those people and observed those people. Growing in tech understanding isn't easier for newbie programmers when they get comfortable with avoiding layered learning for that tech understanding. 
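
    To make the sugar comparison concrete, a small sketch of my own (not from a project): the same check written once as a web-style expect that retries until it holds, and once as a plain assert that evaluates a single time. Same URL and locator assumptions as in the earlier sketch.

        from playwright.sync_api import Page, expect


        def test_todo_is_listed(page: Page):
            page.goto("https://todomvc.com/examples/javascript-es6/dist/")
            page.get_by_placeholder("What needs to be done?").fill("compare assertions")
            page.get_by_placeholder("What needs to be done?").press("Enter")

            # Web-style expect: auto-waits and retries until the text matches or times out.
            expect(page.locator(".todo-list li label")).to_have_text("compare assertions")

            # Plain assert on the same data: evaluates once, right now.
            assert page.locator(".todo-list li label").inner_text() == "compare assertions"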

    If it's not time or lines, what is it then? 

    People who have used Selenium are proficient with the Selenium syntax and struggle with the domain vocabulary change. I am currently struggling the other way around with syntax, and with Selenium, the ample sample sets of old API versions - which even GitHub Copilot seems to suggest wrong syntax from - aren't helping me. A longer history means many versions of learning what the API should look like showing up in the samples. 

    That too is time and learning, and a temporary struggle. So not that either. 

    The real consideration can be summed up with *Selenium automates real browsers*. The things your customers use your software on. Selenium runs on the WebDriver protocol, a standard the browser vendors implement to allow programmatic control. There are other ways of controlling browsers too, but using remote debugging protocols (like CDP for Chromium) isn't running the full real browser. Packaging superset browsers isn't working with the real versions the users are using.  

    Does that matter? I have been thinking that for most of my use cases, it does not. Cross-browser testing the layers that sit on top of a frontend framework or on top of a platform (like Salesforce, Odoo or WordPress), there is only so much I want to test the framework or the platform itself. For projects creating those frameworks and platforms: please run on real browsers, or I will have to, as you are risking bugs my customers will suffer from. 

    And this, my friend, this is where experience over time matters. 

    We need multiple browser vendors so that one of them does not use vendor lock-in to do something evil(tm). We need the choice of priorities different browser vendors make as our reason to choose one or another. We need to do something now and change our mind when the world changes. We want our options open. 

    Monoculture is bad for social media (twitter) with change. Monoculture is bad for browsers with change. And monoculture is bad for browser automation. Monoculture is bad for financing open source communities. We need variety. 

    It's browser automation, not test automation. We can do so much more than test with this stuff. And if it wasn't obvious from what I said earlier in this post, I think 18-year-old Selenium has many, many good years ahead of it. 

    Finally, the tool matters less than what you do with it. Do good testing. With all of these tools, we can do much better on the resultful testing. 

    Friday, January 6, 2023

    Contemporary Regression Testing

    One of the first things I remember learning about testing is its repeating nature. Test results are like milk and stay fresh only a limited time, so we keep replenishing our tests. We write code and it stays and does the same (even if wrong) thing until changed, but testing repeats. It's not just the code changing that breaks systems, it's also the dependencies and platform changing, people and expectations changing. 

    An illustration of kawaii box of milk from my time of practicing sketchnoting

    There's corrective, adaptive, perfective and preventive maintenance. There's the project and then there's "maintenance". And maintenance is 80% of a product's lifecycle costs, since maintenance starts the first time you put the system in production. 

    • Corrective maintenance is when we had problems and need to fix them.
    • Adaptive maintenance is when we will have problems if we allow the world around us to change - and we really can't stop it - but we emphasize that everything was FINE before the law changed, the new operating system emerged, or that 3rd party vendor figured out they had a security bug that we have to react to because of a dependency we have.
    • Perfective maintenance is when we add new features while maintaining, because customers learn what they really need when they use systems. 
    • Preventive maintenance is when we foresee adaptive maintenance and change our structures so that we wouldn't always be needing to adapt individually. 

    It's all change, and in a lot of cases it matters that only the first one is defects, implying work you complete without invoicing for it. 

    The thing about change is that it is small development work, and large testing work. This can be true considering the traditional expectations of projects:

    1. Code, components and architecture are spaghetti
    2. Systems are designed, delivered and updated as integrated end-to-end tested monoliths
    3. Infrastructure and dependencies are not version controlled

    With all this, the *repeating nature* becomes central, and we have devised terminology for it. There is re-testing (verifying a fix indeed fixed the problem) and regression testing (verifying that things that used to work still work), and we have made these central concepts in how we discuss testing.

    For some people, it feels like regression testing is all the testing there is. When this is true, it almost makes sense to talk about doing this manually or automated. After all, we are only talking about the part of testing we are replenishing results for. 

    Looking at the traditional expectations, we come to expectations of two ways to think about regression testing. One takes a literal interpretation of "used to work", as in we clicked through exactly this and it worked, and I would call this test-case based regression testing. The other takes a liberal interpretation of "used to work" remembering that with risk-based testing we never looked at it all working but some of it worked even when we did not test it, and thus continuing with risk-based perspective, the new changes drive entirely new tests. I would call this exploratory regression testing. This discrepancy of thinking is a source of a lot of conversation in automated space because the latter would need to actively choose to pick tests as output to leave behind that we consider worthwhile repeating - and it is absolutely not all the tests we currently are leaving behind. 

    So far, we have talked about the traditional expectations. What is the contemporary expectation then?

    The things we believe are true of projects are sometimes changing:
    1. Code is clean, components are microservices, and the architecture is clearly domain-driven, where tech and business concepts meet
    2. Systems are designed, delivered and updated incrementally, but also per service basis
    3. Infrastructure and dependencies are code

    This leads to thinking many things are different. Things mostly break only when we break them with a change. We can see changes. We can review the change as code. We can test the change from a working baseline, instead of a ball of change spaghetti described in vague promises of tickets. 

    Contemporary regression testing can more easily rely on exploratory regression testing with improved change control. Risk-based thinking helps us uncover really surprising side effects of our changes without using major efforts. But also, contemporary exploratory testing relies on teams doing programmatic test-case based regression testing whenever it is hard for developers to hold their past intent in their heads. Which is a lot, with people changing and us needing safety nets. 

    Where with traditional regression testing we could choose one or the other, with contemporary regression testing we can't.   





    Monday, January 2, 2023

    The Three Cultures

    Over the last 25 years, I have been dropped to a lot of projects and organizations. While I gave up on consulting early on and deemed it unsuited for my aspirations, I have been a tester with an entrepreneurial attitude - a consultant / mentor / coach even within the team I deliver as part of. 

    Being dropped into a lot of projects and organizations, I have come to accept that two are rarely the same. Sometimes the drop feels like time travel to the past. Rarely it feels like time travel to the future. I find myself often brought in to help with some sort of trouble, or if there was no trouble, I can surely create some, like with a past employer where we experimented with having no product owner. There was trouble, we just did not recognise it without breaking away from some of our strongly held assumptions. 

    I have come to categorize the culture, the essential belief systems around testing, into three stages:

    1. Manual testing is the label I use for organizations predominantly stuck in test case creation. They may even automate some of those test cases, usually with the idea of speeding up regression testing, but the majority of what they do relies on the idea that testing is predominantly without automation, for various reasons. Exploratory testing is something done on top of everything else. 
    2. Automated testing is the label I use for organizations predominantly stuck in separating manual and automated testing. Automated testing is protected from manual testing (because it includes so much of its own kind of manual testing), and the groups doing automation are usually specialists in the test automation space. The core of automated testing is user interfaces and mostly integrated systems, something a user would use. Exploratory testing is something for the manual testers. 
    3. Programmatic tests is the label I use for whole-team test efforts that center automation as a way of capturing developer intent, user intent and past intent. Exploratory testing is what drives the understanding of intent. 

    The way we talk, and our foundational beliefs in these three different cultures just don't align. 

    These cultures don't map just to testing, but the overall ideas of how we organize for software development. For the first, we test because we can't trust. For the middle, we test because we are supposed to. For the last, we test because not testing threatens value and developer happiness. 

    Just like testing shifts, other things shift too. The kind of problems we solve. The power of business decisions. Testing (in the large) as part of business decisions. The labels we use for our processes in explaining those to the world. 


    This weekend I watched an old talk from Agile India, by Fred George, on 'Programmer Anarchy'. I would not be comfortable taking things to anarchy, but there is a definite shift in where the decision power is held, with everyone caring for business success in programmer-centric ways of working. 

    The gaps are where we need essentially new cultures and beliefs accepted. Working right now with the rightmost cultural gap, the ideas of emergent design are harder to achieve than programmed tests. 

    Documentation is an output and should be created at the times we know the best. Programmed tests are a great way of doing living documentation that, used responsibly, gives us a green on our past intent in the scope we care to document.