Tuesday, December 6, 2022

There is such a thing as testing that is not exploratory

The team had been on a core practice of clarifying with tests for a while, and they invited an outsider to join their usual meeting routine.

Looking around, there were 8 people in a call. The one in charge of inviting the meeting shared their screen, for what was about to be their routine of test design sessions. He copied the user story they had been assigned to work on into the Jira ticket he had open, and called the group for ideas of tests. 

People started mentioning things that could be tried, and the one in charge wrote things down as the rest of the group watched. For a high level idea, he called for the detailed steps. Every idea that came about was written down and detailed. 

After the meeting, the group would split the work to automate it all. 

And two weeks later, they had all these tests passing, and a feature that just did not work.

The magic glue they miss is what I call exploratory testing. The sense of caring for results in testing by focusing on learning and recognising that the time most people create test cases like above, it is the time when they know the least.

You can add exploratory testing on top of this.

You can lighten up the details you want to write to start with to leave more room for exploratory testing, where output of your testing is the documentation. 

You can seek to optimize for learning, being aware of the time used.

The team that follows the planning pattern did not do exploratory testing. You could argue the next team using their results did exploratory testing through trying to use it, to tell them it is failing. 

Working with teams like this is real. It is embarrassing but it is real. And we don't change it by playing with words, but by making the results we expect clear and permissive to excellence. 

This story, unfortunately, was brought to you by the "tell no stories that did not happen" and "only stories from last two years allowed" rules. This is real. This is what people still make of testing, and some folks meekly follow the agreed practice.  

Monday, December 5, 2022

Jokes and Poetry from a Stochastic Parrot

Tell a joke about a programmer and a tester attempting to enter Heaven and having to confess their sins.

Two programmers were on their way to Heaven and they came to a fork in the road. One path led to Heaven, the other to Hell. The programmer who had been a tester in life said, "I'm not taking any chances. I'm going to test both paths to see which one is safe." The programmer who had been a developer in life said, "You fool! We don't have time for that. We need to take one path and trust that it leads to Heaven." The tester replied, "Trust, but verify."


A programmer and a tester were standing at the Gates of Heaven, waiting to confess their sins. The programmer went first and said, "I wrote code that I knew was buggy and released it anyway." The tester said, "I found the bugs in that code and didn't report them." The Gates of Heaven opened, and the programmer was welcomed inside. The tester was shocked and asked, "Why was he allowed in and not me?" The gatekeeper replied, "We already have plenty of bugs in Heaven. What we need are more testers."


A programmer and a tester were arguing about whether it was better to be a programmer or a tester. The programmer said, "Programmers are the ones who create the software that makes the world go round. Without us, there would be no progress or innovation." The tester replied, "That may be true, but without testers, all of your buggy code would never be caught and fixed. Testers are the ones who ensure that the software works properly and is of high quality." The programmer thought for a moment and then said, "I suppose you have a point. I guess it's a good thing that programmers and testers need each other to make the world a better place."

Make it more funny.

A programmer and a tester were on their way to Heaven, and they came across a sign that said "Heaven: 10,000 miles." The programmer looked at the sign and said, "I don't believe it. They've made a mistake. Heaven can't be that far away." The tester said, "Of course it is. They've measured the distance with a ruler, and it's accurate to within plus or minus an inch." The programmer rolled his eyes and said, "That's why I always use double-precision floating-point numbers. They're much more precise." The tester replied, "Yes, but they can't measure the distance to Heaven. Only faith can do that."

Ok, just one more.  

A programmer and a tester were standing in line at the Gates of Heaven, waiting to confess their sins. The programmer went first and said, "I wrote code that was full of bugs and never tested it." The tester said, "I tested the code and found the bugs, but I never reported them." The Gates of Heaven opened, and the programmer was allowed inside. The tester was shocked and asked, "Why was he allowed in and not me?" The gatekeeper replied, "We have plenty of bugs in Heaven. What we need are more testers to help us find and fix them."

You already told that, I want a new one.

I'm sorry, I don't have access to new information at this time. I am a large language model trained by OpenAI, and my knowledge is limited to what I was trained on. I don't have the ability to browse the internet or access new information, so I can only provide responses based on what I already know. Is there something else I can help you with?

One more joke about testers and programmers, please. 

A programmer and a tester were at a restaurant, and the waiter brought them a menu. The programmer looked at the menu and said, "I don't see any bugs on this menu. This must be a high-quality restaurant." The tester replied, "That's what you think. I've already found three bugs on the menu: the prices are incorrect, the descriptions are misleading, and the options are limited. I'm not sure I want to eat here." The programmer laughed and said, "I guess that's why we need both programmers and testers. Without us, the world would be full of bugs and errors."

 Give me a poem about exploratory testing.  

Thank you, OpenAI ChatGPT.  The tools we use are changing the world in ways we don't yet understand. Whose jokes and poems are these, or are these indeed "AI originals"? 

On more serious note, you should see what it does with advent of code day 5 on day 5. A separate leaderboard for AI and humans does not quite seem like the solution. 

Friday, November 18, 2022

1-2-3 model to test coverage

This afternoon I jumped on a call with a colleague from the community at large. This one had sent me a LinkedIn message asking to talk about test coverage, and our previous correspondence was limited. And like I sometimes do, I said yes to a discussion. After the call, I am grateful. For realizing there is a 1-2-3 model of how I explain test coverage, but also for the conversation channel that helps me steer to understanding, starting from where ever whoever is. 

The 1-2-3 model suggests there is one true measure of test coverage. Since that is unattainable, we have two we commonly use as starting point. And since the two are so bad, we need to remember three more to be able to explain further to people who may not understand the dimensions of testing. 

The One

There is really one true measure of coverage, and it is that of risk/results coverage. Imagine a list of all relevant and currently true information about the product that we should have a conversation on listed on a paper - that is what you are seeking to cover. The trouble is, the paper when given to you is empty. There is no good way of creating a listing of all the relevant risks and results. But we should be having conversation on this coverage, here is how. 

If you are lucky and work in a team where developers truly test and care for quality, the level of coverage in this perspective is around the middle line in the illustration below. That is a level of quality information produced by a Good Team (tm). The measure determining if we indeed are with a Good Team (tm) is sending someone Really Good at testing after them. That Really Good could be a tester, but I find that most testers find themselves out of jobs with good teams - the challenge level is that much higher. Or that Really Good could be all your users combined over time, with an unfortunate delay in feedback and higher risk of the feedback being lost in translation. 

I call the difference between the output for a Good Team (tm) and the quality where our stakeholders are really happy, even delighted the primary Results Gap. There are plenty of organizations who are not seeking to do anything with this results gap themselves but leave it to their users. That is possible, since the nature of the problems people find within the primary results gap is a surprise. 

I recognise if I am working with a team in this space by being surprised with problems. Sometimes I even exclaim: "this bug is so interesting that no one could have created this on purpose!". Consider yourself lucky if you get to work with a team like this that remains this way over time. After all, location on this map is dynamic with regards to consistently doing a good work across different kinds of changes. 

There is a secondary Results Gap too. Sometimes the level to which teams of developers get to is Less than Good Team's Output. We usually see this level with organizations where managers hire testers to do testing, even when they place the tester in the same team. Testing is too important to be left just for testers, and should be shared variably between different team members. Sometimes working as tester in these teams feels like your job is to point out that there are pizza boxes in the middle of the living room floor and remind that we should pick them up. Personally when I recognise the secondary results gap, I find the best solution is to take away the tester, reorganize quality responsibilities on the remaining developers. The job of a tester in a team like this is move the team to the primary results gap, not deal with the pizza boxes except for temporarily as protection of the reputation of the organization. 

A long explanation on the one true measure of coverage - risks/results. Everything else is an approximation subject to this. It helps to understand if we are operating with a team on the secondary results gap or with a team on the primary results gap, and the lower we start, the less likely we are ever to get to address all of the gap. 

The Two

The two measures of coverage we commonly use and thus everyone needs to understand are code coverage and requirements/spec coverage. These are both test coverage, but very different by their nature. 

Code coverage can only give us information of what is in the code and whether the tests we have touch it. If we have functionality we promised to implement, that users expect to be included but we are missing out on, that perspective will not emerge with code coverage. Code coverage focuses on the chances of seeing what is there. 

Cem Kaner has an older article of 101 different criteria in the space of code coverage, so let's remember it is not one thing. There are many ways we can look at the code and discuss having seen it in action. Touching each line is one, taking every direction at every crossroad is one, and paying attention to the complex criteria of the crossroads is one. Tools are only capable of the simpler ways of assessing code coverage. 

Seeing a high percentage does not mean "well tested". It means "well touched". Whether we looked at the right things, and verified the right details is another question. Driving up code coverage does not usually mean good testing. Whereas being code coverage aware, not wanting code coverage to go down from where it has been even when adding new functionality, and taking time for thoughtful testing based on code coverage seem to support good teams in being good. 

Requirement/spec coverage is about covering claims in authoritative documents. Sometimes requirements need to be rewritten as claims, sometimes we go about spending time with each claim we find, and sometimes we diligently link each requirement to one or more tests, but some form of this tends to exist. 

With requirements/spec coverage, we need to be aware that there are things the spec won't say and we still need to test for. We can never believe any material alone is authoritative, testing is about also discovering omissions. Omissions can be code that spec promises, or details spec fails at promising but users and customers would consider particularly problematic. 

Having one test for a claim is rarely sufficient. There is no set number of tests we need for each claim. So I prefer thinking in none / one / enough. Enough is about risk/results. And it changes from project to project, and requires us to be aware of what we are testing to do a good job testing. 

The Three

By this time, you may be a little exasperated with the One and the Two, and there is still the Three. These three are dimensions of coverage I find I need to explain again and again to help address the risks. 

Environment coverage starts with the idea that users environments are different and testing in one may not represent them all. We could talk for hours on what makes environments essentially different, but for purposes of coverage, take my word for it: sometimes they are and sometimes they are not essentially different. So for the 10 functionalities to cover with tests with one test for each functionality, if we have three environments, we could have 30 tests to run. 

Easy example is browsers. Firefox on Linux is separate from Firefox on Mac and Firefox on Windows. Safari on Mac or Edge on Windows is only available there. Chrome is available on Mac, Windows and Linux. That small listing alone gives us 8 environments. The amount of testing - should we want to do it regularly - could easily explode. We may address this with various strategies from having different people on different environments, changing environments on a round-robin fashion to cross-browser automation. Whether we care to depends on risks, and risks depends on the nature of the thing we are building. 

Data coverage starts with the idea that each functionality processing data is covered with one data, but that may be far from sufficient. Like with embedded devices over the last three years, I find it surprising how often covering such simple thing as positive and negative temperature is necessary with the registry manipulation technologies. For this coverage, we would heavily rely on sampling, and it is part of every requirement test making it flexible to consider what percentage we are getting. Well, at least enough to note that percentages are generally useless measures in space of coverage.

Parafunctional coverage would be reminding on other dimensions that positive outputs. Security would be to have functionality that does something that can be used in wrong hands for bad. Performance would be considerations of fast and resource effective, particularly now in era of green code considerations. Reliability would be to run same things over longer period of time. And so on. 

Plus One

Today's call concluded with us then discussing automation coverage. Usually what we end up putting in our automation is a subset of all the things we do, a subset we want to keep on repeating. Great automation isn't created from listing test cases and implementing them, for good automation we tend to decompose the feedback needed differently where sum of the whole is similar. 

Automation coverage is ratio of what we have automated to something we care for. Some people care for documented test cases but I don't. If and when I care about this, I talk about automation coverage in terms of plans of growing it, and I avoid the conversation a lot. 

In one project we did test automation coverage by assigning zero, one, enough values for requirements by tagging all automation with requirements identifiers. A lot of work, some good communications included on planning for what more we need (first), but the percentage was very much the same as I could estimate off cuff. 

You may not have the half an hour it took for us to discuss 1-2-3 on the call, but knowing how to ground conversations of coverage is invaluable skill. If you spend time with testing, you are likely to get as many chances of practicing this conversation that I have by now. 

Wednesday, November 9, 2022

Why Have You Not Added More Test Automation?

We sat down at a weekly meeting, looking at two graphs created to illustrate the progress with system level  test automation. Given the idea that we want to run some tests in the end-user like environment, and end-user like composition, installing the software like end-users would, the pictures illustrated an idea of how much work there is to a first milestone we had set for ourselves. 

The first picture showed a plan of how things could be added incrementally, normalised with story points (a practice I very much recommend against) and spread over time in what had seemed like a realistic plan of many months. In addition to the linear projection, it showed progress achieved, and slope projected for what was reality was showing we were dragging behind. 

The second picture showed a plan of use of time on test automation. Or more precisely, only visualized time used, there was no fixed plan which was a problem in itself. But you could see that in addition to having fairly little time for the test automation work in general, the fluctuation in focus time was significant. 

It was not hard to explain the relation of the two. No time to do means no progress on the work. Or so I thought. Why were we then again answering the age-old question:

Why have you not added more test automation? 

The test automation effort was taking place within a team that had a fixed number of people and multiple conflicting priorities. 

  • They were expected to address a bug backlog (by fixing) so that there would numbers of bugs were down from hundreds to tens. This would be a significant effort of testing to confirm - development to fix - testing to search for side effects. For every fix, the testers had two tasks for one of the developer. 
  • They were expected to make multiple releases from branches the team did not continuously develop on. This would be a significant effort to verify right fixes and no side effects from two baselines that differed from what they would use when testing the changes. 
  • They were expected to learn a new system and document everything they learned so that when they would be moved to new team at latest one year from joining, the software factory they were working on could run forward with them gone.
  • With the fixed budget and requests to spend time on concept work of a new feature, they were expected to get by with two people where there previously had been three. Starting something new was deemed important. 
We were asking the wrong question. We should have been asking why we under-allocated something we thought we need to take forward, and still thought there would be progress. Why did we not understand the basic premise of how adding more work makes things more late? Why did we not understand that while ideas are cheap and we can juggle many at a time, turning those ideas to reality is a pipe of learning and doing that just won't happen without investing the time? 

While I routinely explain these things, I can't help to wonder the epidemic in management with thinking asking something, or rather requiring something, can be done without considering the frame in which there are chances of success with time available to do the work. 

If you ask the same people to do *everything else* and this one drops, how can you even imagine asking anyone but your own mirror image the reasons of why your choices produce these results? 

Monday, October 24, 2022

How to Maximize the Need for Testers

Back when I was growing up as a tester, one conversation was particularly common: the ratio of developers in our teams. A particularly influential writing was from Cem Kaner et al, on Fall 2000 Software Test Managers Roundtable on 'Managing the Proportion of Testers to (Other) Developers".  

The industry followed the changes in the proportion from having less testers than developers, to peaking at the famous 1:1 tester developer ratio that Microsoft popularized, to again having less testers than developers to an extend where it was considered good to have no developers with testing emphasis (testers), but have everyone share the tester role. 

If anything, the whole trend of looking for particular kinds of developers as test system responsibles added to the confusion of what do we count as testers, especially when people are keen to give up on the title when salary levels associated for the same job end up essentially different - and not in favor of the tester title. 

The ratios - or task analysis of what tasks and skills we have in the team that we should next hire for a human-shaped unique individual - are still kind of core to managing team composition. Some go with the ratio to have at least ONE tester in each team. Others go with looking of tasks and results, and bring in a tester to coach on recognising what we can target in the testing space. Others have it built into the past experiences of the developers they've hired. It is not uncommon to have developers who started off with testing, and later changed focus from specializing into creating feedback systems to creating customer-oriented general purpose systems - test systems included. 

As I was watching some good testing unfold in a team where testing happens by everyone not only by the resident tester, I felt the need of a wry smile on how invisible testing I as the tester would do. Having ensured that no developer is expected to work alone and making space for it, I could tick off yet another problem I was suspecting I might have to test for to find the problem, but now instead I could most likely be enjoying that it works - others pointed out the problem. 

To appreciate how little structural changes can make my work more invisible and harder to point at, I collected the *sarcastic* list of how to maximise the need of testers by ensuring there will be visible for for you. Here's my to-avoid list that makes the testing I end up doing more straightforward, need of reporting bugs very infrequent, and allows me to focus more of my tester energies in telling the positive stories of how well things work out in the team. 

  1. Feature for Every Developer
    Make sure to support the optimising managers and directors who are seeking for a single name for each feature. Surely we get more done when everyone works on a different feature. With 8 people in the team, we can take forward 8 things, right? That must be efficient. Except we should not optimize for direct translation of requirements to code, but for learning when allocating developers features. Pairing them up, or even better *single piece flow* of one feature for the whole team would make them cross-test while building. Remember, we want to maximise need of testers, and having them do some of that isn't optimising for it. The developers fix problems before we get our hands on them, and we are again down a bug on reporting! So make sure two developers work together as little as possible, review while busy running with their own and the only true second pair of eyes available is a tester. 

  2. Detail, and externalised responsibility
    Lets write detailed specifications: Build exactly this [to a predetermined specification]. All other mandates of thinking belong with those who don't code, because developers are expensive. That leaves figuring out all higher mandate levels to testers and we can point out how we built the wrong thing (but as specified). When developer assumptions end up in implementation, let's make sure they have as many assumptions strongly hold with an appearance of great answers in detail. *this model is from John Cutler (@johncutlefish)
    There's so much fun work to find out how they went off the expected rails when you work on higher mandate level. Wider mandate for testers but let's not defend the developers access to user research, learning and seeing the bigger picture. That could take a bug away from us testers! Ensure developers hold their assumptions that could end up in production, and then a tester to the rescue. Starting a fire just to put it out, why not. 

  3. Overwhelm with walls of text
    It is known that some people don't do so well with reading essential parts of text, so let's have a lot of text. Maybe we could even try an appearance of structure, with links and links, endless links to more information - where some of the links contain essential key pieces of that information. Distribute information so that only the patient survives. And if testers by profession are something, we are patient. And we survive to find all the details you missed. And with our overlining pens of finding every possible claim that may not be true, we are doing well when exceptional patience of reading and analyzing is required. That's what "test cases" are for - writing the bad documentation into claims with a step by step plan. And those must be needed because concise shared documentation could make us less needed. 

  4. Smaller tasks for devs, and count the tasks all the time
    Visibility and continuously testing, so let's make sure the developers have to give very detailed plans of what they do and how long that will take. Also, make sure they feel bad when they use even an hour longer than they thought - that will train them to cut quality to fit the task into the time they allocated. Never leave time between tasks to look at more effective ways of doing same things, learning new tech or better use of current tech. Make sure the tasks focus on what needs *programming* because it's not like they need to know about how accessibility requirements come from most recent legislation, or how supply chain security attacks have been in their tech, or expectations of common UX heuristics, more to the testers to point out that they missed! Given any freedom, developers would be gold-plating anyway so better not give room for interpretation there. 

  5. Tell them developers they are too valuable test and we need to avoid overlapping work
    Don't lose out on the opportunities to tell developers how they were hired to develop and you were hired to test. Remember to mention that if they test it, you will test it anyway, and ensure the tone makes sure they don't really even try. You could also use your managers to create a no talking zone between a development team and testing team, and a very clear outline of everything the testing team will do that makes it clear that the development team does not need to do. Make sure every change comes through you, and that you can regularly say it was not good enough. You will be needed the more the less your developers opt in to test. Don't care that the time consuming part is not avoiding testing work overlap, but avoiding delayed fixing - testing could be quite fast if and when everything works. But that wouldn't maximise need of testers, so make sure the narrative makes them expect you pick up the trash. 

  6. Belittle feedback and triage it all
    Make sure it takes a proper fight to get any fixes on the developers task lists. A great way of doing this is making management very very concerned over change management and triaging bugs before, developers get only well-chewed clear instructions. No mentions of bugs in passing so that they might be fixed without anyone noticing! And absolutely no ensemble programming where you would mention a bug as it is about to be created, use that for collecting private notes to show later how you know where the bugs are. You may get to as far as getting managers telling developers they are not allowed to fix bugs without manager consent. That is a great ticket to all the work of those triage meetings. Nothing is important anyway, make the case for it. 

  7. Branching policy to test in branch and for release
    Make sure to require every feature to be fully tested in isolation on a branch, manually since automation is limited. Keep things in branches until they are tested. But also be sure to insist on a process where the same things get tested integrated with other recent changes, at latest release time. Testing twice is more testing than testing once, and types of testing requiring patience are cut out for testers. Maximize the effect by making sure this branch testing cannot be done by anyone other than a tester or the gate leaks bad quality. Gatekeep. And count how many changes you hold at the gates to scale up the testers. 

  8. Don't talk to people
    Never share your ideas what might fail when you have them. They are less likely to find a problem when you use them if someone else got there first. It might also be good to not use PRs but also not talk about changes. Rely on interface documentation over any conversation, and when documentation is off, write a jira ticket. Remember to make the ticket clear and perfect with full info, that is what testers do after all. A winning strategy is making sure people end up working neighbour changes that don't really like each other, the developers not talking bodes ill for software not talking either. Incentivising people to not work together is really easy to do through management. 

Sadly, each of these are behaviors I still keep seeing in teams. 

In a world set up for failing in the usual ways, we need to put special attention to doing the right thing.

It's not about maximising the need of testers. The world will take care bigger and harder systems. The time will take care of us growing to work with the changing landscapes in project's expectation on what the best value from us is, today.

There is still a gap in results of testing. It requires focused work. Make space for the hard work. 

Saturday, October 22, 2022

Being a Part of Solution

A software development team integrated a scanning tool that provides two lists: one about licenses in use, and another one about supply chain vulnerabilities in all of the components the project relies on. So now we know. We know to drop one component for licenses list to follow an established list of what we can use. And we know we have some vulnerabilities at hand. 

The team thinks of the most natural way of going forward, updating the components to their latest. Being realistic, they scan again, to realize the numbers are changing and while totals are down some, the list is far from empty. List is, in fact, relevant enough that there is a good chance there is not new more relevant vulnerabilities on the list. 

Seeking guidance, team talks to security experts. The sentiment is clear: the team has a problem and the team owns the solution. Experts reiterate the importance of the problem the team is well aware of. But what about the solution? How do we go about solving this? 

I find this same thing - saying fixing bugs is important - is what testers do too. We list all the ways the software can fail, old and new, and at best we help remind that some of the things we are now mentioning are old but their priority is changing. All too much, we work on the problem space, and we shy away from the solutions.

To fix that listing that security scanners provide, you need to make good choices. If you haven't made some bad choices and some better choices, you may not have the necessary information of experimenting into even better choices. Proposals on certainly effective choices are invaluable. 

To address those bugs, the context of use - acting as a proxy for the users to drive most important fixes first - is important. 

Testers are not only information providers, but also information enrichers, and part of teams making the better choices on what we react on. 

Security experts are not just holders of the truth that security is important, but also people who help teams make better choices so that the people spending more time on specializing aren't only specializing on knowing the problem, but also possible solutions.

How we come across matters. Not knowing it all is a reality, but stepping away from sharing that responsibility of doing something about it is necessary. 

Monday, October 17, 2022

Test Automation with Python, an Ecosystem Perspective

Earlier this year, we taught 'Python for Testing' internally at our company. We framed it as four half-day sessions, ensemble testing on everyone's computers to move along on the same exercise keeping everyone along for the ride. We started with installing and configuring vscode and git, python and pytest, and worked our way through tests that look for patterns on logs, to tests on REST apis, to tests on MQTT protocol, to tests on webUI. Each new type of test would be just importing libraries, and we could have continued the list for a very long time. 

Incrementally, very deliberately growing the use of libraries works great when working with new language and new people. We imported pytest for decorating with fixtures and parametrised tests. We imported assertpy for soft assertions. We imported approval tests for push-results-to-file type of comparisons. We imported pytest-allure for prettier reports. We imported requests for REST API calls. We imported paho-mqtt for dealing with mqtt-messages. And finally, we imported selenium to drive webUIs. 

On side of selenium, we built the very same tests importing playwright to driver webUIs, to have concrete conversations on the fact that while there are differences on the exact lines of code you need to write, we can do very much the same things. The only reason we ended up with this is the ten years of one of our pair of teachers on selenium, and the two years from another one of our pair of teachers on playwright.  We taught on selenium. 

You could say, we built our own framework. That is, we introduced the necessary fixtures, agreed on file structures and naming, selected runners and reports - all for the purpose of what we needed to level up our participants. And they learned many things, even the ones with years of experience in the python and testing space. 

Libraries and Frameworks

Library is something you call. Framework is something that sits on top, like a toolset you build within. 

Selenium library on python is for automating the browsers. Like the project web page says, what you do with that power is entirely up to you. 

If you need other selections made on the side of choosing to drive webUIs with selenium (or playwright for that matter), you make those choices. It would be quite safe bet to say that the most popular python framework for selenium is proprietary - each making their own choices. 

But what if you don't want to be making that many choices? You seem to have three general purpose selenium-included test frameworks to consider: 
  • Robot Framework with 7.2k stars on github
  • Helium with 3.1k stars on GitHub (and less active maintainer on new requests) 
  • SeleniumBase with 2.9k stars on GitHub
Making sense of what these include is a whole another story. Robot Framework centers around its own language you can extend with python. Helium and SeleniumBase collect together python ecosystem tools, and use conventions to streamline the getting started perspective. All three are a dependency that set the frame for your other dependencies. If the framework does not include (yet) support for Selenium 4.5, then you won't be using Selenium 4.5. 

Many testers who use frameworks may not be aware what exactly they are using. Especially with Robot Framework. Also, Robot Framework is actively driving people from selenium library to RF into a newer browser library to RF, which includes playwright. 

I made a small comparison of framework features, comparing generally available choices to choices we have ended up with in our proprietary frameworks. 

Frameworks give you features you'd have to build yourself, and centralise and share maintenance of those features and dependencies. They also bind you to those choices. For new people they offer early productivity, sometimes at the expense of later understanding. 

The later understanding, particularly with Robot Framework being popular in Finland may not be visible, and in some circles, has become a common way of recognising people stuck in an automation box we want to get out of.