Thursday, July 11, 2024

Why is it bad practice to write your programmatic tests like your manual tests?

There must be a lot of articles on why it is bad practice to write your programmatic tests like your manual tests, but finding one I would recommend did not happen in the moment when the question emerged today. So here's one reference to the basic stuff we keep having conversations on, with less finesse than the entire conversation around the internets has. 

So why not this - you got your manual test case: 

And then you transform it to your programmatic test case: 


Here are the usual recommendations:
  • Structure beyond test cases. We would generally expect to see structure in code beyond just test cases. 
    • You should use variables. That would move your URL, your selectors and maybe even your inputs into variables, and the script would reference those. 
    • You should capture actions on a higher level of abstraction. That would mean you'd have a method for login with all these steps, and your test case would call login(username, password) and would probably be called test_login_with_valid_credentials(). 
    • Your variables and actions would probably be in a separate file from your test case, a page object organized to model what you see in that web page or part of it that you're modeling. When things change, you'd change the page object rather than each test case. 
    • Your tests probably have parts that should happen before the test and after the test. Some languages call this setup and teardown, some put these in fixtures. You are supposed to design fixtures so that they make your tests clear and independent. 
    • You will have things you do on many pages, and shared ways of doing those on all pages. Find a place for commons in your structure. Like you might always make sure there is no text "error" anywhere on pages while you test. 
    • The usual recommendations for structure within a test include patterns like AAA (Arrange - Act - Assert), which suggests you focus first on arranging the things you need in your test, then do the thing, then verify what you want to verify, and avoid mixing the three up in your order of actions. 
  • Separation of data. We would generally expect data to be separated from the test case and even from the page object, especially when it is complicated. And while it should be separated, you should still manage to describe your tests so that their names make sense when a test fails, without looking at the data. 
    • Parametrize. Name the data, input the data, and use the same test case to become many test cases. 
  • Naming is modeling. Your manual test cases may not be the model you want for your automation. Usually you decompose things differently to avoid repetition in programmatic tests and you give names to parts to model the repetition. 
  • Focus test on an architecture layer. 
    • API-based login over UI-based login. We would generally expect there to be an authentication endpoint that does not require you to click buttons, and you'd do login with an API for all your non-login tests that require a logged-in user. When things are easier to do without the UI, we'd generally expect you to design steps that are different from your manual test cases, which expect you to operate things like a user. 
    • Mocking API calls to control the UI. We would generally expect you to test very little with the whole integrated end-to-end system over a longer period of time. Instead of using real APIs, we would expect you to test the UI layer with fake data that you can control to see the texts. Like if you had a weather app that knows night and day, you would want to be able to fake night or nightless day without doing all the setup we do for stuff like that when we test manually. 
  • Imperative vs. declarative. You should probably learn the difference and know that people feel strongly about both. 
There's a lot more people say about this, but maybe this will get you started, even if I did not take time to show with examples how each of these changes the manual and programmatic tests we started from. 
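To give at least a flavour anyway, here is one hedged sketch of a few of the recommendations above (variables, a higher-level login action, a page object, a fixture, the AAA order and parametrization) combined, in pytest and Selenium. The URL, locators and credentials are invented for the example, not taken from any real application.

```python
# login_page.py - a page object: URL, locators and actions live here,
# not in the test cases, so a UI change means editing one file.
from selenium.webdriver.common.by import By

BASE_URL = "https://example.test"  # made-up application under test


class LoginPage:
    USERNAME = (By.ID, "username")
    PASSWORD = (By.ID, "password")
    LOGIN_BUTTON = (By.ID, "login")
    ERROR_BANNER = (By.CSS_SELECTOR, ".error")

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get(f"{BASE_URL}/login")

    def login(self, username, password):
        # One named action on a higher level of abstraction
        # instead of three scripted steps in every test case.
        self.driver.find_element(*self.USERNAME).send_keys(username)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.LOGIN_BUTTON).click()

    def error_text(self):
        return self.driver.find_element(*self.ERROR_BANNER).text
```

```python
# test_login.py - tests stay on the level of intent; setup and teardown
# live in a fixture, data is named and parametrized.
import pytest
from selenium import webdriver

from login_page import LoginPage


@pytest.fixture
def login_page():
    driver = webdriver.Chrome()   # setup
    page = LoginPage(driver)
    page.open()
    yield page                    # the test runs here
    driver.quit()                 # teardown


def test_login_with_valid_credentials(login_page):
    login_page.login("maaret", "correct-password")          # act
    assert "/login" not in login_page.driver.current_url    # assert


@pytest.mark.parametrize("username,password", [
    ("maaret", "wrong-password"),
    ("", "correct-password"),
    ("maaret", ""),
])
def test_login_rejects_invalid_credentials(login_page, username, password):
    login_page.login(username, password)
    assert login_page.error_text() != ""
```

An API-based login, in this framing, would replace the fixture body with a request to an authentication endpoint and injecting the resulting token or cookie, leaving UI-driven login to the one test that actually tests logging in.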


Testing is amplifying fail signals

There's a phrase I picked up through Joep Schuurkes that points to a book by Gene Kim et al. that I have not yet read, but I did listen to a podcast episode on it, on a podcast all this made me discover and appreciate more widely.

Amplifying fail signals.

Combine this with sampling a talk on YouTube where the speaker made the point of not being anti-AI and then proceeded to spend 45 minutes amplifying fail signals with examples of problems AI can't solve, characterizing it humanly with negatives only, and landing back on something this crowd is known for: two key questions separating what a *tester* does and what a *manager* does. I have not liked this choice of centering testers as information providers instead of active agents who are part of making those decisions before, and I did not like it now with AI. 

Sometimes we amplify fail signals so much that no other signals are left. 

* I am well aware that amplifying fail signals is intended for signals that are hard to pick up. Like the fact that when you ask a question leading with "hi guys", not getting a response from me does not mean I would not have the answer or be unwilling to share it with you, just that you made it extra effort to get past the exclusion and on to doing the work. There are patterns we don't recognize as failure that require special amplification to be successful. 

On the whole agency in decisions: it would be hard to live in Finland with cases like Vastaamo (a psychotherapy service's security breach with very private information leaked) and not realize that when we fight in courts, the conversation is about responsibilities. Should the IT specialists have 1) created a paper trail that shows the company did not allocate resources to fix the problems, 2) fixed the problems if they truly were on the level of not changing default passwords, and 3) considered that the work assigned to them specifically is where their responsibility lies? Is this kind of thing the responsibility of management, or does the expert play a role in the responsibilities too? Whether a court assigns it or not, I like to act as if I had responsibility for things I am aware of, and I can choose to change the password away from the default even when I can't get the hiring budget for someone who specializes in it in addition to myself. But things are easy when you're not being blamed for a mistake in court. Which for me is still pretty much always. 

The talk goes back to one question that has been flying around long enough that it should not need attribution: is there a problem here? This time around, it drops a second question: are you ok with all of this? 

With edits of emphasis on the latter question, it helps me frame things like I want to. Even if we (over)amplified fail signals and asked managers: are you ok with all of this?, we really need to ask ourselves in democratizing data and use of AI: are you ok with all of this?

I can talk as much as the next person on the threats but I tend to park them as a list to make choices on using my time elsewhere. I have had to decide on a personal level what to do with the things that I am not ok with, and decide that the really useful style of amplifying fail signals is to fix some of the fails resulting in different signals, compensate for others, and put a foot down ethically when it really is something I am not ok with, and start movements. 

The listing of threats I work with currently stands with this:
  • Mostly right is not enough when stakes are high so we may want to keep good humans in the loop.
  • Keeping what is secret, secret so we understand that use of AI is sharing information we need to be more intentional on
  • Cultural filtering with encoded cultural norms so that someone else's standards don't lose someone else's history and culture
  • Plagiarism at scale so that we remember what this is ethically even if not legally
  • Move against green code so that we start considering perspectives beyond I can do it
  • Institutionalized bias so that we understand how bias shows up in data
  • Personal bias so that we understand reasons our inputs make us like results better
  • Humanization of technology so that we remember it is only technology
  • Skills atrophy so that we pay attention to necessary social changes
  • Labor protections so that we are not creating conditions of world we don't want to live in
  • AI washing so that we understand value within the buzzwords
  • Energy to refute misinformation so that we pitch in to learning and educating
  • Accidental and Intentional data poisoning so that we remember our actions today impact our future experiences and not everyone is out for good
  • Easy over right solutions to testing problems so that we have the necessary conversations we've been mulling over that allow us to do something smart
So am I ok with all of this? Absolutely not. And because I am not, I will work on figuring out how to use it, not just as a technical tool but in a social context, with guardrails and agreements in place. Since my time is limited, I choose to monitor the threat landscape, yet focus my actions on designing guardrails within testing to get the value we are after in this all. 

We can try characterizing technology humanly from fail perspective, but that just leads us to seeing how many negatives we can apply on anything if we so choose. 

 

Tuesday, July 9, 2024

AI is automation at scale of benefits

For the longest time, I have spent a small part of my attention on explaining to people that framing test automation as something that only solves the risk-of-regression problem is such a waste. You create some of your worst test automation by taking the scripted manual tests and automating those one by one. And it is not a problem to test the "same things" manually, as long as what you keep finding is not just the problems we have already accepted and then continuously need to retrace. 

I reframed test automation to something that is

  1. Documenting
  2. Extending reach
  3. Alerting to attend
  4. Guiding to detail
That was already before this whole age of AI started. And now, looking at the age of AI, we are struggling with similar issues of framing. We want to overfit the tool to a task and are being overprotective about our idea of classification. 

At a previous project, I had an excellent example of model-based testing. Instead of writing scripts, we had a model of flows driving functionality modeled into page objects. Not AI. But so much more intelligent, and a significantly better approximation of acting humanly to include in our ways of testing than the usual scripts alone. 
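As a hedged, heavily reduced sketch of what a model of flows driving page objects can look like (the states, actions and the fake app here are invented for illustration):

```python
# A tiny flow model: each state maps to the actions allowed in it, and a
# random walk through the model drives page-object actions instead of a
# fixed script.
import random

FLOWS = {
    "logged_out": ["log_in"],
    "front_page": ["search", "open_item", "log_out"],
    "item_page": ["add_to_cart", "back_to_front"],
}


class FakeApp:
    """Stands in for an object wrapping real page objects; every action
    returns the name of the state it lands in."""

    def log_in(self): return "front_page"
    def log_out(self): return "logged_out"
    def search(self): return "front_page"
    def open_item(self): return "item_page"
    def add_to_cart(self): return "item_page"
    def back_to_front(self): return "front_page"

    def check_invariants(self):
        pass  # e.g. assert there is no "error" text on the current page


def random_walk(app, steps=20, seed=None):
    rng = random.Random(seed)
    state = "logged_out"
    for _ in range(steps):
        action = rng.choice(FLOWS[state])
        state = getattr(app, action)()
        app.check_invariants()


random_walk(FakeApp(), steps=50, seed=7)
```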

Many people qualify only a few categories of solutions as AI, making the point that it is a marketing fad for people in the industry to frame automation at scale as AI. In a narrower definition, we would look for technologies that are new and different: machine learning, large language models, retrieval augmented generation, computer vision and so forth. 

A few years back, reading the book "Artificial Intelligence: A Modern Approach", I picked up one of its definitions of AI:
Acting humanly.

Mundane clicking is acting humanly. Mundane clicking by a UI automation script is not acting humanly alone, it removes some of the emergent intent we overlay as humans when we do mundane clicking. Including randomness and fun to increase chances of serendipity are very much human. So from this frame, of AI targeting acting humanly, we are looking at automation at scale of benefits. 

The book lists six areas of acting humanly: 
  1. Natural language processing
  2. Knowledge representation
  3. Automated reasoning
  4. Machine learning
  5. Computer vision and speech recognition
  6. Robotics
Watching tools bring in AI by renaming money as an "AI unit" and adding RAG-style LLM use as one of the consumables is quite a marketing smoke screen. Founded on a market where people are likely willing to consume paid content to get more targeted responses, it makes some sense, even if I find it difficult to consider that AI. Creating the silliest monkey to click around in a research project and calling that AI makes some sense too, even if I again find it difficult to consider that AI. 

To make progress and sense, I have the idea that we would need to start caring less about protecting a classification and talking about the work we do. 


What would Acting humanly for purposes of testing look like? 

Right now it looks like having a colleague who never tires of your questions and misframings, providing you somewhat average, overconfident answers. 



Then again, I have been a colleague like that on my good days, and a colleague with even more negative traits associated with me on my bad days. The good thing is that the social context corrects our individual shortcomings. 

Would it be possible to just think of AI as a way of saying that our target level of automation just scaled for benefits, and then solve the real problems, in collaboration? 

Obviously I too listed the negatives I park: 

There are things I still weigh heavily on the negative side, like my use of these tools lending support to plagiarism at scale, but instead of my choice being educating on those or hoping my avoidance of use sends the message, I'm trying out compensations and offsetting impacts with other actions. I prioritize learning to apply things more thoughtfully and being at the table when we decide on follow-up actions.



Fascination of Scale

A lot of people are worried about the future of employment. My little bubble includes people who have been searching for their next employment in IT for a while, and feeling deflated. Similarly, my bubble includes people who gave up on IT, women in particular, and chose to do something else.  

Some days it feels like threats to employment are everywhere, but in particular in the effort we spend on feeling threatened rather than finding ways to direct that energy elsewhere. People in my little bubble have three major sources of existential threat:

  1. Testing is moving from testers to developers
  2. Work is moving from local to global
  3. AI will replace us all, or people using AI will replace people without AI
That's a lot for anyone to think about. So today, I wanted to frame my thoughts in a blog post around three simplified numbers.
  • Number of programmers doubles in 5 years
  • 50% of jobs change, 5% of jobs replaced by automation

There's a lot of 5's there, and a comforting message. Keeping up with the changes keeps around 95% of the jobs, and with the rate of growth, more jobs are being created than is going away.  

Out of curiosity, I wanted to dig in a little deeper. In a game of scale, we can either imagine there is always something worthwhile for us, or we can imagine that finding something worthwhile for us is prohibitively difficult. I tend to believe that we have problems with discovery and matchmaking, and we have problems with supporting change and growth, and part of the reason for those is a focus on individuals (entities) over networks. But again, back to digging. I live in a fairly small country, Finland. And understanding anything would start with understanding your own bubble in relation to others. 

Finland has a population of 5.6M people. As per random searching of information online, we have 117k people graduating annually, and 130k people currently employed in ICT, with an average salary of 3900 €/month. Another random point of comparison I had to dig out is that usage statistics for Selenium seem to suggest 0.5% use in relation to population, regardless of the fact that most of my friends here block all statistics from being sent. With one of me and 5.6M others close to me, being worried about running out of work has not become a concern. The experience I live with is that we continuously want to do more than what we have time for, and a significant productivity improvement helping us solve a few more of those persisting issues would be quite welcome. 

To make sense of scale, I selected the country of many of my colleagues in the industry. The scale at which things work in India makes it easy to understand why my socials are full of friends from that corner of the world. India has a population of 1.4B people. I could not even grasp that as a number without doing a bit of area calculation math, and drawing those areas as circles. I saw claims that 1M people graduate annually, and that there are 2.8M people working in ICT in India. The average salary comes to 6,500 €/year. Selenium usage to population is 0.01%. That all is an entirely different scale. 

My search for numbers continued to the global level: 55.3M people in ICT, with 4% annual growth, 22% over 5 years. So if the claim I picked up some years ago at Agile Technical Conference counts *programmers*, it would appear that programmers are growing faster than the industry at large - which would probably make some sense. An ICT-industry segment value of $341B would suggest that the per-person slice of that value for each employed is about $6k/year. 
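The back-of-the-envelope arithmetic behind those last two figures, for anyone who wants to check the rounding:

```python
# 4% annual growth compounded over 5 years
print(round((1.04 ** 5 - 1) * 100))   # -> 22 (%)

# industry segment value spread per employed person
print(round(341e9 / 55.3e6))          # -> 6166 ($/year), roughly $6k
```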



With all the numbers, this brings us back from worrying about being without jobs to figuring out the continuous work we do in changing our jobs. Changing our jobs to share the work and value. To evolve the work and value. To move at pace we can work with. 

There's a change in testing that has been ongoing a while. 



Paying attention not just to "what are my tasks" but to what is the value I'm contributing and evolving that further is a part of the work towards productivity on everyone's plate. 

Wednesday, July 3, 2024

Threads, Sessions and Two Kinds of Tasks

It all started with a call I joined early in a morning. I had been asked to introduce what I do for work for a larger group of people, and the messages started flowing in: 

"The sound is very low". "Something is wrong with your microphone". 

If this was the first time, I would not have known what was going on. Nothing was essentially different. The same equipment that works fine most days now would not. And at an inconvenient time. And it was *on me*, *my omission*. I could have rebooted the machine after the Selenium project leadership committee call I had been on, but I forgot, at the most inconvenient time. 

You see, this bug had been bugging me for months. It had bugged me on two different Mac computers, and I had been aware of its general conditions to reproduce for a few weeks now, but I had parked the thread of doing something about it, other than working around it with a reboot. I was hoping that by parking it long enough, it would go away without me doing anything. But no such luck for the last six months. 

With so many things to test, I parked this one in favor of the ones I was paid for, when it came to the investigation and reporting work. 

You see, taking forward the work of getting rid of that bug, it's a thread of work. It's multiple steps. For reasons within my understanding, the companies - let alone open source projects - don't see the problems just because we experience them. There's real work, testing work, to turn a random experience like that into a bug report that could even stand a chance of getting fixed. 

I knew the general conditions. They consisted of a weekly repetitive pattern in my life. When I would go to Software Freedom Conservancy's installation of BigBlueButton and have a lovely call on topics of advancing the Selenium project's governance and support, any calls after that on Teams or Zoom would lose 80% of the volume. 

With the embarrassment of the call fresh on my mind, I made a choice on the next steps in this thread of work. Insisting it is not my work, I used a minute to complain about it on Mastodon and received some advice. Realizing my minute of complaining had turned into a whole conversation of many minutes and I had lost control over my use of time for the curiosity of it, I made a choice on opportunity cost. I could use time on explaining why the advice would not make sense for me. Or I could use time to turn the general conditions into specific conditions, and actually report the bug. I don't believe in the good of the world enough to think it will be fixed with just reporting, but at least I am living a little closer to good choices on opportunity cost. 

You may guess it, the thread of work unfolds. What I think is going to be simply writing down the steps turns into more of an investigation. The first version of the report I write without submitting mentions that I use Safari, and its precise version, and also includes details of my environment otherwise. It's specific enough that these instructions give me a reproduction of the problem on two different computers. It's adorned with pictures to show I am not just imagining it and expressing frustration over something I did not even investigate. I did, after all, over the last 6 months. 

Looking at the report and saying it happens on Safari, I realize I have questions. What about the other browsers? Is it all browsers, or did I hit a sample of browsers giving a different experience of quality? To my surprise, it does not reproduce on Chrome or Firefox. You guessed it again, the rabbit hole deepens. 

I dash up python + playwright, and it's been a few months since I last did that. I set up a project and write a test script to reproduce the problem on the webkit that ships with Playwright. My idea is twofold. I would not mind reporting with a script that helps see it. But I am mostly curious on whether this bug reproduces on that webkit browser version that approximates Safari but isn't it. And it does not. It does reproduce on Selenium though, but I can't exactly ship my machine to the project, and their pipeline does not include real browsers on Mac anyway. Well, I assume it doesn't, which may be incorrect. 
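For the curious, the skeleton of that kind of cross-browser repro script is small. This is a hedged sketch with a placeholder URL and an elided middle, not the script attached to the actual report:

```python
# Run the same steps on webkit, chromium and firefox to see whether the
# behaviour is browser-specific.
from playwright.sync_api import sync_playwright

ROOM_URL = "https://bbb.example.org/room"  # placeholder, not the real room


def reproduce(browser_name: str) -> None:
    with sync_playwright() as p:
        browser = getattr(p, browser_name).launch(headless=False)
        page = browser.new_page()
        page.goto(ROOM_URL)
        # ... join the call with audio, leave, then open the next
        # conferencing tool and listen for the volume drop ...
        browser.close()


for name in ["webkit", "chromium", "firefox"]:
    reproduce(name)
```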

I get the report in, and I have a fascinating case of a bug that might be in BigBlueButton (open source project) or Safari (Apple), and there's no way I am going to storm the latter to figure out ways of reporting there. I cut my losses of time, and revel in my gains of learning. The likelihood of anyone setting up a cross-systems test for a scenario like this is low, so it's not exactly a testament to the importance of cross-browser automation, but it is a nudge and encouragement to keep testing with Safari, because the other browsers don't show what it shows. 

And a work day later, I am rewarded with another Safari-only bug, this time in our company CRM system where the save button is not in sight in Safari. Lucky me. Another thread to park. 

---

Whether I work with the application I am testing or an application I am using, I have come to learn that explaining the work is riddled with difficulties. *Testing* sounds so simple. The long story above on testing just one kind of scenario is not simple. It's not that it's hard, it's just a flow of doing something and making decisions, and driving towards an idea of completion of a thread of work.

My story there had two threads: 

  • What I do to live with the bug to increase odds of it going away
  • What I learn from the bug since I am investing time and energy into it
This blog post is part of the latter thread. 

So imagine I had a task in Jira or whatever tool of choice your project puts you on. Your task says to make the bug go away. All these things are subtasks, and some of those you discover as you do things. But you don't need this level of explanation in the tools unless you are in a particular environment. 

Your task is a thread. It's actually multiple tasks. And a thread is a way for you to label things that are born inspired by the thing you set out to do in the first place, to talk about the work in a clearer way. And to frame it in discovery and learning. 

No matter what focus you put on this task, with the dependencies it has it won't be completed within four hours. I heard someone again suggesting that is the granularity of expected task breakdown in a tool. Quite a documentation investment, and opportunity cost suggests that maybe you could do something else with the time. At least think about it. 

I have been teaching people that there are two kinds of tasks: 
  • expansive - the work continues until it's done
  • time-boxed - the work is done when the time on it is done
Testing can be done in the time-boxed style. It is testing that I test something for 4 hours. It is still testing that I test it for 40 hours. Time-boxed style is the foundation for managing testing with sessions. 

There is a more lightweight way of managing testing than sessions though. That way is managing testing with threads. Tracking threads, their completion, their importance, and the need of using budget on them is how I work with most of my exploratory testing. The sessions and charters are not 1:1, but I have a charter for a thread, and I am careful to model learning as new threads. 

You may want to try out visualizing the work you have at hand. Sometimes people see what your head holds better with a picture. 


Learning to track, visualize and explain threads, sessions and the two kinds of tasks (expansive and time-boxed) has been a big part of what has helped me earn trust in testing. Too many dailies go by with 'testing yesterday, and continuing testing today' as the report of work. 

Let's see when I can report that people on Safari won't need a reboot after using BigBlueButton. I changed something already - taking the calls on Chrome now. So we can call it someone else's problem, and the world is full of them. I make them mine for organizations that pay me, it's kind of the work. 

Test Automation - Friend or Foe?

When in 2023 four people including myself got together in a panel to decide if test automation is friend or foe, I chose foe. Panels where we all agree aren't fun, and in search of really understanding something more, we need to find more than the places we agree on. I remembered that one as a particularly good panel, so I went back to rewatch it for the points we were making. The panelists were Ben Dowen, Beth Marshall, Brijesh Deb and myself. 

What is test automation? 

  • Code doing something useful for testing 
  • Support for exploring and coded checks in testing
  • Not silver bullet 
  • Way to add efficiency through code 
How to decide when to automate a particular test and when to do manual test instead? 

  • Human led testing could include writing automation 
  • Definition of manual or automation is a myth, it's both 
  • Roadmap to turn manual to automation, not a journey of learning how to do real work for you 
  • Overanalysing is not route to success - try test suite canvas by Ashley Hunsberger 

What if I figure that test case becomes more complicated to automate than I thought originally? 

  • Start with feasibility study 
  • Try doing it and figure out that your skills and knowledge aren't enough? Ask around when you hit a roadblock? Step through and start learning. 
  • Too much studying not enough of real work that stays around, fail to learn a little 
  • Collaboration up front, talk with developers - failing early is always an option 
  • Learning from people around you on the ideas of what to try first is not feasibility study, it's collaboration and whole team work on important problems

What is the biggest challenge to test automation and how to overcome it? 

  • They are about people, culture and communication 
  • Misunderstanding about automation not needing maintenance 
  • Manager or colleagues who say let's automate everything 
  • Getting system into the right state to start the actions has a lot to setup, testability interfaces, knowledge from various teams 
  • Success turning to failure overnight, dualism for the organizations, surviving generations of testers and the move to whole team automation 

Does every tester need to be coder in a future? 

  • More than now, industry doubles and we need different ways of starting 
  • Some testers don't know they are testers, and some of them are paid less than us testers 
  • Future makes every coder a tester, and already more of them are than we give them credit for - code aware and not 100% of anyone's job 
  • Overfocus is exclusionary and stops otherwise fantastic people from starting at this industry - no need to code in binary, not necessary to always code in java 
  • Coding is an entry level job - I want both, and teaching basic programming is simple 
  • Advancing in companies this works well for all kinds of testers, but it's broken in recruitment 
  • We ask for the wrong things in the recruitment, recruiters need to understand testing not just automation
  • Coding in test jobs is wrong kind of tasks, like a bubble sort 

Is test automation ultimately a way of cutting costs? 

  • Augments what you're doing - cutting costs in one place (maintenance changes) to have more of it elsewhere (new value) 
  • 50% of jobs changed, only 5% replaced by automation 
  • In isolation it does not save time, cost or improve quality, must be supported by other activities 
  • Operating test automation in a silo away from the product development does not produce greatest results 

How can you balance need for speed and efficiency and maintaining high level of test coverage?

  • Everything is code nowadays, now you see changes somewhere with IaC and I don't need to run a test if there is no change, slow down the time, control the change 
  • Taking control of the world around me as part of automation efforts 
  • If you think test automation is the goal not a tool that adds more effective testing 
  • Testing isn't the goal, it's a goal we use. Even software development is not the goal. It's the value we are creating for customers. 
  • 1250 tests, 20 times a day, learning after having already invested in small scale feedback programmatic tests, keeping us on the right track. 
  • Misselling stream of information as silver bullets of these techniques exists 

If we focus on test automation, are we risking that we don't focus on other things? 

  • Yes, and. Not either or. 
  • Testing plays an important part in improving quality, but it's not the whole picture. 
  • Red-Yellow-Green call. Red led to punishing testers. 
  • Watermelon tests, testing your tests 
  • Had multiple levels of teams of testers. Effort but needing explainer. 

Is test automation friend or foe? 

  • Friend. High maintenance friend. A needy one. 
  • Friend. Work on a mutually beneficial with boundaries or you will fall out. 
  • Foe. Respectful even if you don't like it. Collaborate with the enemy when stakes are high.
  • Friend. Friend who likes to be pampered. Can turn into a foe.

Thursday, June 27, 2024

Well, did AI improve writing test plans?

Over 27 years of testing, I've read my fair share of test plans, and contributed quite a chunk. I've created templates to support people in writing them slightly better, and I've provided examples. And I have confessed that since it's the performance of planning, not the writing of the plan, that matters, some of the formats that support me in doing my best work in planning, like the one-page master test plan, have created some of the worst plans I have seen from others. 

Test plans aren't written, they are socialized. Part of socializing is owning the plan, and another part is bringing everyone along for the ride while we live by the plan. 

Some people need a list of risks we are concerned about. I love the form of planning where we list risks, and map them to testing activities that mitigate those risks. And I love making documents so short that people may actually read them. 

In these few years of LLMs being around, I have obviously had to both try generating test plans and watch others ask me to review plans where an LLM has been more or less helpful.

Overall, we are bad at writing plans. We are bad at making plans. We are really bad at updating plans when things change. And they always change. 

I wanted to make a note on whether AI has changed this already.

  • People who were not great at writing plans before are still not great, in a different way. When confronted with feedback, they now have Pinocchio to blame for it. Someone else did it - the LLM. If the average of the world does this, how can I make the point of not being happy with it? And I can be even less happy with the responsibility avoidance. 
  • People who need to write plans that were not really even needed except for the process are now more efficient in producing the documentation. If they did not know to track TFIRPUSS (quality perspectives) in their plans, at least they are not missing that when they ask the tool. The difference still comes from the performance of continuous planning and organizing for the actions, rather than the act of writing the plan. 
  • Detailed test ideas for those with the least ideas are already better in per-feature plans. Those who were not so great are doing slightly better with generated ideas, and those who were great become greater because they compete with Pinocchio. 
I am worried my future - like my past - is in reviewing subpar plans, but now with externalized responsibility. Maintaining ownership matters. Pinocchio is not a real boy, we're just at the start of the story. 


Monday, June 24, 2024

Testing gives me 3x4

You know how you sometimes try to explain testing, and you do some of your better work when explaining it as a response to something that is clearly not the thing. This happened to me, but it has been long enough since, and what time gives me is integration of ideas. So this post is about showing you the three lists that now help me explain testing.

Speaking of testing to developers, I go for the list of what developer-style test automation, especially done test-driven style, gives me. It gives me a specification that describes my intent as developer. It gives me feedback if I am making progress to my specification. Well, it also gives me a concrete artifact to grow as I am learning, an intersection of specification and feedback. Since I can keep the tests around, their executable nature gives me protection for regression, helping me track my past intent. And when my tests fail, they give me granularity on the exact omission that is causing the fail. At least if I did those well. 
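A throwaway illustration of those four, with an invented function and behaviour: the test states my intent (specification), fails until the code meets it (feedback), stays around to catch changes (regression), and when it fails, the assertion points at the exact omission (granularity).

```python
def normalize_name(name: str) -> str:
    # Production code written to satisfy the intent stated in the test.
    return " ".join(name.split()).title()


def test_normalize_name_trims_spaces_and_capitalizes():
    # Executable specification of intent; when the behaviour regresses,
    # the failing assertion shows the exact difference.
    assert normalize_name("  ada   lovelace ") == "Ada Lovelace"
```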



The list helps me recall why I would want to make space for test-driven development, or for capturing past intent in unit / api level approval tests of behaviors someone else had intent for, even if I am not using energy to track back to those. 

When I look at this list, I think back to a Kevlin Henney classic reminder: 

"A majority of production failures (77%) can be reproduced by a unit test."  
The quote reminds me of two things: 
  1. While majority can be reproduced, we get to these numbers in hindsight looking at the escapes we did not find with unit tests before production.
  2. The need of hindsight and the remaining significant portion indicate there is more. 
What we're missing is usually framed not in feedback, but in discovery. I tend to call that discovery exploratory testing; some others tend to call it learning. If I wanted a list of what this style of testing gives me, one that frames testing as performance with the product and components as external imagination, I again have a list of four things. It gives me guidance, like a vector of possible directions, but one that still requires applying judgment. It gives me understanding, seeing how things connect and what they bring in as stakeholder experience. User, customer, support, troubleshooting, developer... All relevant stakeholders. It gives me models of what we have and how it's put together, what is easy and what is hard to get right. And it gives me serendipity, lucky accidents that make me exclaim: "I had no idea these things would not work in combination". 



The two lists are still as helpful to me as they were when I created them, but a third list has emerged for me since from my work with contemporary exploratory testing. I find that the ideas of spec-feedback-regression-granularity miss out on things we could do on artifact creation in frame of exploratory testing, and have been cultivating and repeating a third list. 

In the frame of contemporary exploratory testing, programmatic tests (test automation) give me more. They give me documenting the insights as I am learning them, along with that spec-feedback-regression-granularity loop. It's one of the models, one that I can keep around for further benefit. They give me extending reach, be it testing things in scope I have already automated again and again, or be it taking me to repetition I would not do without programmatic tests for new unique scenarios, or even getting me to a starting place where I can start exploring for new routes, making space for that serendipity. They give me alerting to attend when a test fails, alerting me that someone forgot to tell me something so that I can go and learn, or that there's a change that breaks something we care for. That keeps my mind free for deeper, wider, more, trusting the tingly red line over a past green. And finally, they give me guiding to detail. I know so much more about how things are implemented, how elements are loaded, when they emerge, and what the prerequisites are that I could easily skip if I didn't have the magnifying element programmatic tests bring in. 
All of my tests don't have to stay around in pipelines. Especially when extending reach, I am likely to create programmatic tests that I discard. Ones that collect data for a full day, seeking a particular trend.

Where I previously explained testing with two lists of four, I now need three lists of four. And like my lists were created as a response, you may choose to create your own lists as a response. Whatever helps you - or in this case me - explain that the idea that you don't need manual testing at all relies on the idea that creating your automation is framed as a contemporary exploratory testing process. Writing those programmatic tests is still, even in the genAI era, an awfully manual process of learning how to frame our intent. 

Friday, June 21, 2024

Memory Lane 2005: ISTQB Foundation Syllabus, Principles of Testing

Some people create great visuals. One of those that I appreciated today was to turn 7 principles of testing into a path around number 7, by Islem Srih. As my eyes were tracking through the image, a sense of familiarity hit me. This is the list of 7 I curated for ISTQB Foundation Syllabus 1st edition in 2005. The one I still joke about holding copyright to, since I never signed off my rights. ISTQB did ask, I did not respond. 

Also, it is a list that points to other people's work. Back then, as a researcher, I was collating, not reimagining. Things I consider worth my time have changed since, and I would not contribute to ISTQB syllabi anymore.

Taking the trip back down memory lane, I had to look at what I started with to build the 7 principles. I recognize editions since have changed labels - the thing I called *pesticide paradox* to honor Boris Beizer's work is now "tests wear out". I'm pretty sure I would have credited Dijkstra with the absence-not-presence principle, as he is the originator of that idea, and I find it impossible to believe I would have penciled in [Kaner 2011] for something that is very obviously [Kaner 2001]. It is incorrect in the latest published syllabus. Back then I knew who said what, and had nothing to say myself, as I imagined it was all clear already. Little did I know... 

The 7 principles that made the cut were ones where we did not have much of an argument. 



I have the 2005 originals that I was no longer able to find online easily. I can confirm I did not mess up the references, because I did not put in the references. Now that I look at this, the agreement was not to put in references unless they were recommended reading to complete the syllabus.
I also went back to look at the principles I started working from to get to these 7. The older set of principles is from the ISEB Foundation. 

Comparing the two

ISTQB:

  • Presence not absence
  • Exhaustive is impossible
  • Early testing
  • Defects cluster
  • Pesticide paradox
  • Context-driven
  • Absence of errors fallacy
ISEB

  • Exhaustive testing is usually impractical
  • Testing is risk-based and risk must be used to allocate time
  • Removal of faults potentially improves reliability
  • Testing is measurement of quality
  • Requirements may determine the testing performed
  • Difficult to know how much is enough
It was clear we did not agree on the latter. While these may be principles that are much better than what X normally sees, as per one of the comments on the post that inspired me to take this memory lane trip, I am no longer convinced they are truly the principles worth sharing. But they do make the rounds. I am also not convinced the additions and explanations since have improved them, at least in their generalized nature. 

In hindsight, I can see what I did there. I grounded the syllabus a bit more on things in research. Some of which I don't think we've properly done to this day (early testing, for example - the jury is still out; clustering of defects feels more like folklore than a principle; the pesticide paradox is more of an encouragement to explore than tests actually becoming ineffective).

Perhaps revisiting what is truly true would now be good. Now that I no longer think my greatest contribution to testing is to know what everyone else says and not say anything myself. 

Two Years with Selenium Project Leadership Committee

Today I am learning that I am not great at - even uncomfortable with - using the voice of an organization. And for two years, I have held a small part of the voice of an organization, being a member of the Selenium Project Leadership Committee. I have been holding space and facilitating volunteer organizations for over three decades, and just from my choice of words you note that I don't say I have been leading or running or managing, even though those would probably be fair words for other people to assign watching the work I do.

Volunteer organizations are special. It is far from obvious that they stay around and active for two decades, like the Selenium project has. It is far from obvious they survive changes of leadership, but for an organization to outlive people's attention span, it is necessary. It is also far from obvious that they manage to navigate the complex landscape of individual and corporate users, their hopes and expectations, other collaborators solving similar problems, and create just enough but not too much governance to have that voice of an organization. 

If I was comfortable using the voice of an organization, I would not be nervous speaking at Selenium Conference 2024 today with Diego Molina and Puja Jagani on the 'State of Union' keynote. Similarly, I might find it in me to write a post on the official Selenium Blog, since I even have admin access to Selenium repos on GitHub. But it's just so much easier to avoid all the collaboration and write in my own voice, avoiding the voice of an organization that I always hold in the back of my mind - or my heart. 

In 2018 Selenium Conference India I was invited to keynote, and I chose to speak about Intersection of Automation and Exploratory Testing. Back then I did not know I would end up combining the paths under the flag of Contemporary Exploratory Testing and I most definitely did not know that the organizers inviting me to keynote would bring me to an intersection where my path aligned with the Selenium Project. 

In 2018 I keynoted. A little later, I volunteered with the program committee reviewing proposals, again invited by the people already in. And in 2022, I ended up on the Selenium Project Leadership Committee (PLC), again on an invite. 20 years of Selenium now in 2024 marks 2 years in the PLC for me. 

Today I am wondering what do I have to show for it. And like I always do when I start that line of thinking, I write it down. With my voice. 


Making sense of what the PLC (project leadership committee) is in relation to the TLC (technical leadership committee) wasn't easy to figure out. There's the code, packaging that into releases, and the related documentation and messaging, and all of that is led by the TLC. If there is no software, there is no software project. Software that does not change is dead, and in two years I have reinforced the understanding that Selenium is not dead, or dying. It's alive, well, and taking its next steps in the fairly complex world of cross-browser standards-based agreements, implementing something that everyone uses, not just the project's immediate web driver implementations. The Selenium project is a pioneer and collaborator in building up the modern web. 

The first thing I have to show for two years of Selenium PLC is understanding what Selenium is. That is many things. It is the multiple components of this entire ecosystem. It's working for standards to unite browsers. It's a community of people who find this interesting enough to do as a hobby, volunteering, unpaid. It's where we grow and learn together. It's taking pride in inspiring other solutions in the browser automation space, whether they are built on top of Selenium or whether they are other choices, while real browsers continue on their path of differentiation but still allow automating real browsers with a unified web driver library. 

What have I been doing in the project then, other than making sense of it? 

  • Realizing there are a lot of emails people want to send to "Selenium". There's a lot of requests for privacy and security assessments that organizations using Selenium think they can just ask "the company" for - where there is no company. There's a lot of proposals for collaboration, especially in marketing. And there's donations / sponsoring - because even a volunteer project needs money to pay for its tools and services. 
  • Fighting against becoming the "project assistant". To get people together, someone needs to schedule it. Calendars as self-service aren't a thing everywhere in the busy tech world. After fighting enough personal demons, I did end up putting up a calendar invite and taking up the habit of posting on Slack after every conversation to share what the conversation was about. Seems that bit is something I have enough routine to run with. 
  • Picking and choosing what/how I want to contribute: I wanted to move from face to face conferences and online broadcasts to online conversations, and set up Selenium Open Space Conference for it. I loved the versatility of sessions, and particularly the one person showing up with his own open source project he would walk different people through in various sessions. I also loved learning about how Pallavi Sharma ended up publishing three books on Selenium, and what goes on in the background of authoring books. 
  • Picking and choosing what/how I want to contribute: setting up a micro-sponsorship model and starting steps towards full financial transparency. I will need another two years to complete the full financial transparency, but that is something I believe in heavily. I want it to be normal to say that the Selenium project holds $500k at Software Freedom Conservancy. You can dig out that info from their annual reports. But I also want to say that in addition to what is held there, it would all be visible on Open Collective. We have raised $1,684.49 since we started off with the Selenium community money, and there are many steps to take to move the host of that money to be Selenium/BrowserAutomation Inc with every transaction transparently visible. 
  • Fiscal hosting ended up being something I specialize in. We have five people in the PLC, two of whom do double duty with the TLC, and every one of us holds a paying job with relevant levels of responsibility in different companies. Within the two years we also had two more people in the PLC, Bill McGee and Corina Pip, brilliant folks we now hold space for to become free from other engagements and rejoin our efforts, after some fiscal hosting / governance stuff is better sorted. Knowing where you are is the start of sorting it out. 
  • Setting up Selenium on Mastodon. We could really use someone taking care of Selenium in social media overall, but I did create a placeholder for content, even if my content production with the voice of the organization is sporadic at best. 
  • Introducing a new class of sponsors, where companies sponsor by allocating people to contribute significant work time to the project and are recognized for it. Not like I did this alone, but working in a user organization rather than a vendor organization, I just happen to be able to hold space for something like this by positioning. 
  • Dealing with people. Some of that is overwhelming, but at the same time necessary. People have ideas, and sometimes the ideas need aligning. Mostly I listen but I would also step in to mediate. And have been known to do so. 
If there is a future I have an impact on for Selenium,  it will be: 
  1. Diverse Community Centric and focused on Collaboration. 
  2. Fiscally transparent and easy to access for taking stuff forward. 
  3. Worthwhile for users and contributors - individual and corporate. 
Some days - most days - I feel completely insufficient with the amount of time I can volunteer. But it adds up. And it will add up more. The scale at which it matters never ceases to amaze me. 2.5M unique monthly users. Real polyglot support with bindings for Python, Java, .NET, JavaScript, Ruby, PHP... 16M Python downloads. 98.5M Java downloads. 

It takes a village, and I am showing up with you all, inviting you along. 

Thursday, June 20, 2024

Good Enough Quality Is Taking a Dip

A colleague in the community was reporting on her experience teaching tech to a group of kids, and expressing how tech makes it hard to love tech sometimes, and her post pushed me to think about it. This is an experience a lot of us have. 

We want to show a newbie group how to run with a lovely new tool, only to realize that to run the new tool, we first have to get around whatever troubles come our way. Some of the troubles come from the differences between our environments. Some from the fact that no one updates and maintains some computers. Some from the choices of features (ads in particular) that get in the way. It's not enough if the thing we're building works; the operating system underneath and even the antivirus running in the background are equally part of the users' experience. 

Those of us teaching in tech have found workarounds for environmental differences, my favorite being to work either on my computer through ensemble programming, or, if I want to teach newbies for real, on their computers, making time to solve the real-life problems they will face from things being different. 

Quality, while good enough, is currently not great. And it has not only been getting better in recent years, even if some people speak of AI in terms of it being today the worst it will ever be. 

Looking at AI-enabled products, very few of them are really making things better. 

I was purchasing a lot of books during my vacation time from Amazon, and seeing the AI-generated summary posts on top of reviews wasn't helping me with my selections. I went for #booktok, as the AI renders the first recommendation pretty much useless to me - I don't want a summary, I want to find someone like me. 

Trying out the new AI-powered search from Google wasn't producing better results, and I am pretty sure they too know that the AI-powered one is doing worse than the classic one. 

Apple just announced introducing - finally - some AI-powered features with chatGPT integration, looking like a placeholder to grow worthwhile features in, but other than that giving the appearance of adding little usefulness to the product. 

Everyone seeks uses of AI. Experiments with AI. I do too. I would still report that in the last two years of actively using AI tools, they have not added to my productivity, but they have sometimes added to my quality (I'm really good at not being happy with AI's results) and they have added to my fun and enjoyment. 

Current uses of AI are driving down the experience we have of good enough quality. To include a placeholder for AI to grow into - assuming it is only getting better - we make quite significant compromises in the value our products and services provide. The goal of AI placeholders is not to make things better today. It is to make space to make things better soon, and meanwhile we may take a dip in the experience of relevance for users. 

For years, I have come to note that truly seeking good quality is not common. Our words to explain the level we are targeting or achieving are hidden behind the fuzzy term "quality". Even bad quality is quality. Making things fun and entertaining is quality. 

Our real target may be usefulness and productivity. Being able to do things with tech we weren't able to do before, and being able to do things we really need to do. While we find our way there with GenAI, we're going to experience a dip in good enough quality to allow for large-scale experimentation on us.  

 

Thursday, June 6, 2024

GenAI Pair Testing

This week, I got to revisit my talk on Social Software Testing Approaches. The storyline is pretty much this: 

  • I was feeling lonely as the only tester amongst 20 developers at a job I had 10 years ago. 
  • I had a team of developers who tested but could only produce results expected from testing if they looked at my face sitting next to them, even if I said nothing. I learned about holding space. 
  • I wanted to learn to work better on the same tasks. Ensemble programming became my gateway experience to extensive pair testing beyond people I vibe with, and learning to do strong style pairing was transformative. 
  • People are intimidating, you feel vulnerable, but all of the learning that comes out is worth so much.
As I was going through the preparations for the talk, I realized something had essentially changed since I delivered that talk last time. We've been given an excuse to talk to a faceless, anonymous robot in the form of generative AI chatbots. The success I had described as essential to strong-style pairing (expressing intent!) was now the key differentiator in who got more out of the tooling that is widely available. 

I created a single slide to start the conversation on something that we had been doing: pairing with the application, sometimes finding it hard not to humanize a thing that is a stochastic parrot, even if a continuously improving predictive text generator. Realizing that when pairing with genAI, I was always the navigator (brains) of the pair, and the tool was the driver (hands). Both of us would need to attend to our responsibilities, but the principle "an idea from my head must go through someone else's hands" is a helpful framing.


I made a few notes on things I would at least need to remember to tell people about this particular style of pairing. 

External imagination. As with testing and applications, we do better when we look at the application while we think about what more we would need to do with it; genAI acts as external imagination. We are better at criticizing something when we see it. With a testing mindset, we expect it to be average, less than perfect, and our job is to help it elevate. We are not seeking boring long clear sentences, we are seeking the core of the message. We search boundaries, arguing with the tool from different stances and assumptions. We recognize insufficiency and fix it with feedback, because the average of the existing base of texts is not *our* goal. We feel free to criticize, as it's not a person with feelings, possibly taking offense when we are delivering the message of the baby being ugly. We dare to ask things in ways we wouldn't dare to ask from a colleague. We're comfortable wasting our own time, but uncomfortable taking up space in others' days. 

Confidentiality. We need to remember that it does not hold our secrets. Anything and everything we ask is telemetry, and we are not in control of what conclusions someone else will draw from our inputs. When we have things we would need to say or share that we can't share outside our immediate circle, we really can't post all that to these genAI pairs. But for things that aren't confidential, it listens to more than your colleague would in making its summaries and conclusions. And there is the option of not using the cloud-based services but hosting a model of your own from Hugging Face, where whatever you say never leaves your own environment. Just be aware. 

Ethical compensations. Using tools like this changes things. The code generation models being trained on open source code change the essential baseline of attribution that enables many people to find the things that allow them the space of contributing to open source. These tools strip names of people and the sense of community around the solutions. I strongly believe we need to compensate. At my previous place we made three agreements on compensation: using our work time on open source; using our company money to support open source; and contributing to the body of research on the change these tools bring about. Another ethical dilemma to compensate for is the energy consumption of training models - we need to reuse, not recreate, as one round of training is said to take up the energy equivalent of a car trip to the moon and back. While calling for reuse, I am more inclined to build forward the community-based reuse models such as Hugging Face over centralizing information to large commercial bodies with service conditions giving promises of what they will do with our data. And being part of an underrepresented group in tech, there are most definitely compensations needed for the bias embedded in the data, to create a world we could be happy with. 

Intent and productivity. Social software testing approaches have given me ample experience of using tools like this with other people, and of seeing people in action with genAI pairing. People who pair well with people pair better with the tools. People who express intent well and clearly in test-driven development get better code generated from these tools. The world may talk of prompt engineering, but it seems to come down to expressing intent. Another note is on the reality of looking at productivity enhancements. People insist they are more productive, but a lot of the deeper investigations show that there is creative bookkeeping involved. Like fixing a bug takes 10 minutes AFTER you know exactly which line to change and your pipelines just work without attention. You just happen to use a day finding out which line to change, and another caring for the pipeline over whatever randomness happened on your watch. 

These tools don't help me write test cases, write test automation and write bug reports. They help me write something of my intent and choosing, which is rarely a test case or automation; they help me summarize, understand, recall from notes, ideate actions and oracles, scope to today / tomorrow - this environment / the other - the dev / the tester, prioritize and filter, and inspire. They help me more when I think of testing activities not as planning - test case design - execution - reporting, but as intake - survey - setup - analysis - closure. Writing is rarely the problem. Avoiding writing, and discovering information that exists or doesn't - that is more of a problem. Things written are a liability: someone needs to read them for writing them to be useful. 

Testing has always been about sampling, and it has always been risk-based. When we generate to support our microdecisions, we sample to control the quality of outputs. And when things are too important to be left for non-experts, we learn to refer them to experts. If and when we get observability in place, we can explore patterns after the fact, to correct and adjust. 

I still worry about the long term impacts of not having to think, but I believe criticizing and thinking is now more important than ever. The natural tendency of agreeing and going with the flow has existed before too. 

And yes, I too would prefer AI to take care of housekeeping and laundry, and let me do the arts, crafts and creative writing. Meanwhile, artificial intelligence is assisting intelligence. We could use some assisting day to day. 

Friday, May 31, 2024

Taking Testing Courses with Login

When we talk about testing (programming) courses we've taken, I notice a longing feeling in me when I ask about the other's most recent experience. They usually speak of the tool ("Playwright!", "Robot Framework with Selenium and Requests Library!", "Selenium in Java!") whereas what I keep hoping they would talk about is what got tested and how well. From that feeling of mine, a question forms: 

What did the course use as target of testing in teaching you? 

For a while now, I have centered my courses around targets of testing, and I have quite a collection. I feel what you learn depends on the target you test. And all too many courses leave me unsatisfied with what the students walk away with alongside their certificates of completion, since what they really teach is operating a tool, not testing. Even for operating a tool, the target of testing determines the lessons you will be forced to focus on. 

An overused example I find is a login page. Overused, yet undereducated. 

In its simplest form, it is this idea of two text fields and a button. Username, password, login. Some courses make it simple and have lovely IDs for each of the fields. Some courses start off by making locators on the login page complicated, so clicking them takes a bit of puzzle solving. In the end, you manage to create test automation for successful and unsuccessful login, and enjoy the power of programming at your fingertips - now you can try *all the combinations* you can think of, and write them down once into a list. 

I've watched hundreds of programmed-testing newbies with a shine in their eyes having done this for the first time. It's great, but it is an illustration of the tool; it's not what I would expect you to do when hired to do "testing". 

Sometimes the targets don't come in the simplest form. On testing course targets, the added stuff screams education. Like this one. 


From a general experience of having seen too many logins, here are things I don't expect to see in a login, and it's missing things that I might expect to see if a login flow gets embellished for real reasons. If your take on automating something like this is that you can automate it, not that it has stuff that should never be there in the first place, you are not the tester I am looking for. 

Let's elaborate the bugs - or things that should make you #curious, like Elizabeth Zagroba taught in her talk at NewCrafts just recently. You should be curious about: 

  • Why is there a radio button to log in as admin vs. user, and why is Admin the default? There are some but very few cases where the user would have to know, and asking that in a login form like this is unusual at best; besides, only the minority of users who are both would naturally face a selection like this. For things where I could stretch my imagination to see this as useful, the default would be User. The judgmental me says this is there to illustrate how to programmatically select a radio button.
  • Why is there a dropdown menu? Is that like a role? While I'm inclined to think this too is there to illustrate how to programmatically select from a list, I also defer my judgement to the moment of logging in. Maybe this is relevant. Well, it was not. This is either half of an aspired implementation or there for demo purposes. And it's missing a label explaining it, unlike the other fields. 
  • Why are there terms and conditions to tick? I can already feel the weight of the mild annoyance of having to tick this every single time, with changing conditions hidden in there, and promising your firstborn child yet another Wednesday some week. The judgmental me says this is here to show the functional problem of not requiring the tick when testing. And the judgmental me is not wrong: login works just fine without what appears to be compulsory acceptance of terms, this time with the default off to communicate a higher level of commitment when I log in. 
The second-level judgement I pass on people through this is that testers end up overvaluing being able to click when they should focus on whether clicking is needed, and wasting everyone's time on that is a trap. I could use this to rule out testers, except overcoming this level of shallowness can be taught in such a short time that we shouldn't gatekeep on this level of detail. 

I don't want to have the conversation about not automating this either. Of course we automate this. In the time I've spent writing this, I could already have written a parametrized test with username and password as input that then clicks the button. However, I'd most likely not care to write that piece of code. 
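For concreteness, that piece of code is roughly the sketch below, assuming pytest and Playwright's Python API - the URL, the element IDs and the expected messages are made up for illustration, not taken from the course's application.

    # A sketch of the parametrized login test I mean. The URL, the IDs and the
    # expected texts are made up; swap in whatever the application under test uses.
    import pytest
    from playwright.sync_api import expect, sync_playwright

    CASES = [
        ("standard_user", "right_password", "Welcome"),
        ("standard_user", "wrong_password", "Invalid credentials"),
        ("", "", "Username is required"),
    ]

    @pytest.mark.parametrize("username,password,expected", CASES)
    def test_login(username, password, expected):
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("https://example.test/login")   # made-up target
            page.fill("#username", username)
            page.fill("#password", password)
            page.click("#login-button")
            expect(page.get_by_text(expected)).to_be_visible()  # crude oracle on purpose
            browser.close()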

Login is a concept of having authentication and authorization to do stuff. Login is not interesting in its own right; it is interesting as a way of knowing I have or don't have access to stuff. Think about that for a moment. If your login page redirects you to an application like this one did, is login successful? I can only hope that was not the conclusion on the course I did not take but got inspired by to write this. 

I filled in the info, and got redirected to an e-store application. However, with the application URL and another browser, I got to use the very same application without logging in. I let out a deep sigh, worried for the outcome of the course for the students.

Truth be told, before I got to check this I already noted the complete absence of logout functionality. That too hinted that the login may be an app of its own for testing purposes only. Well, it does illustrate combinations you can so easily cover with programmatic tests. What a waste.  

What does work around login look like in real projects? We can hope it looks like taking something like Keycloak (an open source solution in this space), styling a login page to look like your application, and avoiding the thousands of ways you can do login wrong. You'll still face some challenges, but successful and failing login aren't the level you're expected to work on. 

What you would work on with most of your programmatic testing is the idea that nothing in the application should work if you aren't authorized. You would be more likely to automate login by calling an API endpoint that gives you a token you'd carry through the rest of your tests on the actual application functionality. You'd hide your login and roles in fixtures and setups, rather than create login tests. 
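A minimal sketch of what that tends to look like with pytest and requests - the endpoint, payload and response shape here are assumptions for illustration, not a real service:

    # Sketch of hiding login in a fixture: authenticate once over an (assumed)
    # API endpoint, and let the actual tests spend their time on functionality.
    import pytest
    import requests

    BASE_URL = "https://example.test"   # made-up target

    @pytest.fixture(scope="session")
    def auth_token():
        response = requests.post(
            f"{BASE_URL}/api/auth/login",                      # assumed endpoint
            json={"username": "tester", "password": "secret"},
        )
        response.raise_for_status()
        return response.json()["token"]                        # assumed response shape

    @pytest.fixture
    def api(auth_token):
        session = requests.Session()
        session.headers["Authorization"] = f"Bearer {auth_token}"
        return session

    def test_orders_require_authorization(api):
        # The interesting assertions are about access to functionality,
        # not about the login form itself.
        assert api.get(f"{BASE_URL}/api/orders").status_code == 200
        assert requests.get(f"{BASE_URL}/api/orders").status_code in (401, 403)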

The earlier post I linked above is based on a whole talk I did some years back on the things that were broken in our login beyond login. 

Learn to click programmatically, by all means. You will need it. But don't think that what you were taught on that course was how to test login. Even if they said it was, it was not. I don't know about this particular one, but I have sampled enough to know the level of oversimplification students walk away with. And it leads me to think we really, really need to do better in educating testers. 

Tuesday, May 28, 2024

Compliance is Chore, not Core

It's my Summer of Compliance. Well, that's what I call the work I kicked off and expect to support even as I switch jobs in the middle of it. I have held a role of doing chores in my team, and of driving towards automating some of those chores. Compliance is a chore, and we'd love it if it was minimized to invisible while still producing the necessary results. 

There are, well, for me, three compliance chores. 

There's compliance with the company process, process meaning what is required of development in this company. Let's leave that compliance for another summer. 

Then there are the two I am on for this summer: open source license compliance and security vulnerability handling. Since a lot of the latter comes from managing the supply chain of other people's software, these two kind of go hand in hand. 

You write a few lines of code of your own. You call a library someone else created. You combine it with an operating system image and other necessary dependencies. And all of a sudden, you realize you are distributing 2000 pieces from someone else. 

Dealing with the things others created for compliance is really a chore, not core to the work you're trying to get done. But licenses need attending to, and using them requires something of you. And even if you didn't write it, you are responsible for distributing other people's security holes. 

Making things better starts with understanding what you got. And I got three kinds of things: 

1. Application-like dependencies. These are things that aren't ours but essentially make up the bones of what is our application. It's great to know that if your application gets distributed as multiple images, each image is a licensing boundary. So within each image with an application layer, you want to group things with copyleft (infectious license) awareness. 

2. Middleware-like dependencies. These are things that your system relies on, but that are applications of their own. In my case, things like RabbitMQ or Keycloak. Use 'em, configure them, but that's it. We do distribute and deploy them though, so the compliance need exists. 

3. Operating system -like dependencies. Nothing runs without a layer in between. We have agreements on the licensing boundary between this and whatever sits on top. 

So that gives us boundaries horizontally, but also, to a more limited degree, vertically. 

Figuring this out, we can describe our compliance landscape.


The only formats in which this group in particular redistributes software are executables (the olden way) and images (the new way). Understanding how these get built up was part of the work to do. We identified the inputs, the outputs we would need, and the impacts we seek from having to create those outputs. 

I use the landscape picture to color code our options. Our current one, "scripts", takes source code and dependencies as input, ignores base images, and builds a license check list and license.txt - with a few manual compliance checks that rely on knowing what you seek in the patterns of license change. It's not hard work, but it's tedious. It fails for us on two of the impacts: we do chore work to create and maintain the scripts, unable to focus on core, and it requires manual work every single time. 
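For the Python slice of a dependency tree, the kind of script I mean is not much more than the sketch below; the real chore is repeating the same for every ecosystem and image layer, and then doing the manual checks on top.

    # Sketch of the license.txt chore: list installed Python distributions with
    # whatever license metadata they declare. Real compliance needs more than this:
    # transitive sources, image layers, and manual checks on copyleft patterns.
    from importlib.metadata import distributions

    def license_of(dist):
        meta = dist.metadata
        return meta.get("License") or meta.get("License-Expression") or "UNKNOWN"

    def main():
        rows = sorted(
            (dist.metadata["Name"], dist.version, license_of(dist))
            for dist in distributions()
        )
        with open("license.txt", "w") as out:
            for name, version, lic in rows:
                out.write(f"{name} {version}: {lic}\n")
        unknowns = [name for name, _, lic in rows if lic == "UNKNOWN"]
        print(f"{len(rows)} components listed, {len(unknowns)} need a manual look")

    if __name__ == "__main__":
        main()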

We're toying with two open source options. ORT (OSS Review Toolkit) shows promise to replace our scripts, and possibly to extend to image-based scans, but as an open source project it does not really come wrapped as a service. Syft+Grype+GoTemplates seems to do some of the tricks, but leaves things open in the outputs realm. 

And then we're toying with a service offering for open source compliance, where money does buy you a solution: FOSSA. 

I use the word toying as a noncommittal way of discussing a problem I have come to understand in recent weeks. 

Running a compliance scan for base images, there are significant differences in the numbers of components identified with Syft vs. Docker SBOM vs. Docker Scout vs. what is self-proclaimed at the source. There's a quality assessment tool for SBOMs that checks many other things but not correctness, showing significant other differences. And that is just the quality of the SBOM piece of the puzzle. 
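Seeing the differences is the mechanical part - a sketch below, assuming two CycloneDX JSON SBOMs produced for the same image by different tools; deciding what the differences mean is where the discomfort lives.

    # Sketch: diff the component lists of two SBOMs for the same image, assuming
    # both are CycloneDX JSON with a "components" array of "name"/"version" entries.
    import json
    import sys

    def components(path):
        with open(path) as f:
            doc = json.load(f)
        return {(c.get("name") or "", c.get("version") or "") for c in doc.get("components", [])}

    def main(sbom_a, sbom_b):
        a, b = components(sbom_a), components(sbom_b)
        print(f"{sbom_a}: {len(a)} components, {sbom_b}: {len(b)} components")
        for name, version in sorted(a - b):
            print(f"only in {sbom_a}: {name} {version}")
        for name, version in sorted(b - a):
            print(f"only in {sbom_b}: {name} {version}")

    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])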

We started off with the wrong question. It is no longer a question of whether we are "able to generate SBOMs"; instead we are asking: 
  • should we care that different tools provide different listings for the same inputs - as in, are we *really* responsible for quality or just for a good faith effort 
  • how we should make those available next to our releases 
  • how we scale this in an organization where compliance is chore, not core 
This 'summer of compliance' is forcing me to know more of this than I am comfortable with. When quality matters, it becomes more difficult. *If* it matters.