Saturday, March 2, 2024

Fooled by Microservices, APIs and Common Components

These days, writing software is not the problem. Reading software is the problem. And reading is a big part of the real problem, which is owning software. The last two years have been a particularly challenging experience in owning software, and in navigating changes in owning software. I have not cracked it, and I am not sure I ever will, but I have learned a lot.

To set the stage for my experience: imagine coming to a company with a product created over a period of 20 years. There's a lot of documentation, none of it particularly useful except the code. While the shape of the existing product is invisible, you join a team dedicated to modernisation. And the team has already chosen a rewrite approach.

For the first year, the invisible is not a priority. After all, it will be replaced, and figuring out the new thing is enough work as it is with a new product. With what feels like heroic effort, you complete the goal of the year with managed compromises. Instead of a full rewrite, it's a full rewrite of selected pieces. The release is called "Proof of Concept" and it does not survive the first customer contact.

The second year has its goals set: add more functionality on top of the first year's. Customer feedback derails the goals, leading to an entire redesign of the user interface, addressing 9 of the 19 individually listed pieces of feedback. Again with what feels like heroic effort, you complete the goal of the year with managed compromises, but now on an upward rather than downward trend.

The second year starts to give a bit of shape to the existing invisible product, with deliberate actions. You learn you own something with 852k lines of code. It has 6.2% duplication and 16.4% unit test code coverage. The new thing you've been focusing on has grown into 34k lines of code, with 5% duplication and unit test code coverage of 70%, plus a great set of programmatic tests that don't show up in the unit test coverage numbers.

Meanwhile, you start seeing other trends:

  • Management around you casually drops expectations of microservices, APIs that make it easy to build other products than this one, and common components with expectations of organizational reuse.
  • You and your team struggle to explain that the compromises you made may not have changed it all into what some people now seem to be expecting.
  • You realize that what you now live with is four different generations of technological choices that no one told you the invisible mass would bring - time adds to your understanding.
In one particularly difficult discussion, someone throws microservices at you as the key thing, and you realize that the two sides of the table aren't talking about the same thing. 

One party is describing Scripts with flat file integration: 
  • Promised by shadow R&D without promises of support, usually done on a scale of days
  • Automate a specific task
  • Deployment is a file drop on the filesystem, and it is not available if not dropped
  • Can break with product changes
  • Output: file or database, usually a file
  • Expected to not have dependencies
  • Needs monitoring extended separately
  • Using files comes with inherent synchronisation problems
  • Needs improving if scale is insufficient
  • Can be modified by a user
  • Batch work, will not be real time
  • File based processing steps quickly increase complexity
The other party is describing microservices with API integration: 
  • Promised by R&D assuming it will drive the product perspective forward, usually on a scale of months once deployment risks are addressed
  • Develop apps that scale
  • Deployment is with product and feature can be on or off
  • Protected by design as product capability with product changes
  • Output: well defined API
  • Deployed independently / separately with dependencies - API, DB, logic, UI
  • Follows a pattern that allows for common monitoring
  • Uses HTTP and message queues as communication protocols
  • Can be scaled independently
  • Can't be modified by a user
  • Can be close to real-time data synchronization
  • Plug in extra processing steps
One party thinks micro means "fast additions of features" and the other thinks it is a way of creating common designs.
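As a rough illustration of the gap (my own sketch, not anything from the actual product - the file formats, the endpoint and the use of FastAPI are all invented for the example), the same "hand data to the next consumer" task looks very different in the two worlds:

```python
# Illustration only: the same "export measurements" task in the two worlds above.
# All names here are invented for the sketch; nothing is from a real product.

# 1) Script with flat file integration: batch, file drop, synchronisation by convention,
#    breaks quietly when the product or the file format drifts.
import csv
import json
from pathlib import Path

def export_measurements_to_file(source: Path, target: Path) -> None:
    """Read a CSV dropped on the filesystem and write a JSON file for the next step."""
    with source.open() as f:
        rows = list(csv.DictReader(f))
    target.write_text(json.dumps(rows))  # the consumer polls this file and hopes it is fresh

# 2) Microservice with API integration: deployed with the product, versioned, monitorable.
from fastapi import FastAPI

app = FastAPI()

@app.get("/api/v1/measurements")
def get_measurements() -> list[dict]:
    """A well-defined, documented endpoint the rest of the organization can build on."""
    return [{"sensor": "pressure", "value": 1013.25, "unit": "hPa"}]
```

The first is done in days and fails silently when the format drifts; the second carries the deployment, versioning and monitoring weight that the months-long estimate comes from.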

You start paying more attention to what people are saying and expecting, and realize the pattern is everywhere. Living with so many generations of expectations and lessons makes owning software particularly tricky.

We often come to organizations at a single point in their history. We learn as we move along. And if we are lucky, we learn sooner rather than later. Meanwhile, we are fooled by the unknown unknowns, and confused by the promises of the future in relation to the realities of today.

The best you can do is ground conversations in what exists now, and manage towards better from there.


Friday, March 1, 2024

How AI changes Software Testing?

This week Wednesday, two things happened. 

I received an email from Tieturi, a Finnish training company, asking me to respond to the question "How AI changes software testing?".

I went to the Finnish Testing Meetup group for a session themed on AI & Testing.

These two events made me want to write two pieces into a single post:

  • My answer to the question
  • My thinking behind answering the way I do 

My Answer to the Question: How AI changes Software Testing

I know the question is asking me to speculate on the future, but the future is already here, it's just not evenly distributed - repurposing the quote from a sci-fi author.

Five years ago, AI changed *my software testing* by becoming a core part of it. I tested systems with machine learning. I networked with people creating and testing systems with machine learning. Machine learning was a test target, and it was a tool applied on testing problems for me, in a research project I set up at the employer I was with at that time. 

Five years ago, I learned that AI -- effectively applications of machine learning -- shows up as components in our systems, "algorithms" generated from data. I learned that treating systems with these kinds of components as if the entire system were "AI" is not the right way to think about testing, and AI changed my software testing with the reality that it is more important than ever to decompose our systems into their pieces. These pieces serve a purpose in the overall flow, and there is a lot of other stuff around them.

Once I understood that AI components are probabilistic and not hand-written, I also understood that the problem is not testing them, but fixing them. We used to have a world where we could fix bugs. With AI, we no longer had that. We had the possibility of experimenting with parameters and data in case those produced a fix. We had the possibility of filtering out the worst results. But the control we had would never again be the same.

For five years, I have had the privilege of working to support teams with such systems. I was very close to focusing solely on one such team, but felt there was another purpose to serve.

Two years ago, AI changed *my software testing* by giving me GitHub Copilot. I got it early on, and used it for hobby and teaching projects. I created a talk and a workshop on it based on a Roman Numerals example, and paired and ensembled on its use with some hundred people. I learned to make choices between what my IDE was capable of doing without it and with it, and reinforced my learning of intent in programming. If you have clarity of intent, you reject stupid proposals and let it fill in some text. I learned that my skills of exploratory testing (and intent in that) made me someone who would jump to identify bugs in talks showing Copilot-generated code.

These two years culminated 6 months ago in me and my whole team starting to use Copilot on our production code, after making agreements on accommodations for ethical considerations. I believe erasing attribution for open source programmers may not be a direct violation of copyright, but it ethically shifts the power balance in ways I don't support. We agreed on the accommodations: using work time to contribute to open source projects and using direct money to support open source projects.

One year ago, AI changed *my software testing* through access to ChatGPT. I was on it from its first week, suffering through the scaling issues. I had my Testing Dozen mentoring group testing it as soon as it was out, and I learned that the thing I had learned in 5 years of AI about decomposing systems was something newbies were lost on. From watching that group, and then teaching ensembles that scaled to about 50 people including professional testers in the community, I realized the big change was that testers would need to skill up in their thinking. Noticing it has gender bias is too low a bar. Knowing how you would fix gender bias in the data used for teaching it would now be required. Saying there's a problem would not suffice for more than adding the biggest blunders to filtering rules. Smart people at scale would fill social media with examples of how your data and filtering fixes were failing.

One year ago, I also learned the problems of stupid testing -- test case writing would scale to unprecedented heights with this kind of genAI. I would be stuck in a perpetual loop of someone writing text not worth reading. Instead of inheriting 5000 (manual) test cases a human wrote and throwing them away after calculating it would take me 11 full working days to read them at one minute each, I would now have this problem at scale, with humans babysitting computers creating materials we should not create in testing in the first place.

Or creating code that is just flat out wrong, even if the programmer, lacking intent, does not notice.


AI would change testing to be potentially stuck in a perpetual loop of copy-pasting mistakes at scale and pointing out the same ones in system after system. We would be reviewing code not thought through algorithmically. And this testing would be part of our programmer lives, because testing this without looking at the code would be nonsensical.

They ask how AI changes Software Testing - it already changed it. Next we ask how people change software testing, understanding what they have at hand. 

I have laughed with AI, worked with tricky bugs that made me feel a sense of powerlessness like never before, and learned tons with great people around AI and its use. I have welcomed it as a tool, and worried about what it does when people struggle with asking for help, asking for help from a source such as this without the skills to understand whether what is given is good. I've concluded that faster writing of code or text is not the problem - reading is the problem. Some things are worth reading for a laugh.


AI changed software testing. Like all technology changes software testing. The most recent change is that we use the word "AI" to talk about automating things to get computers acting humanly:
  • natural language processing to communicate successfully in a human language.
  • knowledge representation to store what it knows or hears.
  • automated reasoning to answer questions and to draw new conclusions.
  • machine learning to adapt to new circumstances and to detect and extrapolate patterns.
  • computer vision and speech recognition to perceive the world. 
  • robotics to manipulate objects and move about.
I feel like adding specific acting-humanly use cases like 'parroting nonsense' or 'mansplaining as a service', to fill in the very human space of claims and stories that could be categorised as fake news or fake certainty.

What we really need to work on is problems (in testing) worth applying this to. Maybe it is the popular "routes a human would click" or the "changing locators" problems. Maybe it is the research-inspiring examples of combining bug reports from users with automated repro steps. Maybe it's the choice of not testing everything for every change. We should fill the space more with decomposed problems than with discussion about "AI".

My thinking behind answering the way I do 

This week the people on stage at the meetup said they are interested yet not experienced in this space. I was sharing some of my actual experience from the audience, as I am retired from public speaking. There is a chance I may have to unretire with a change of job I am considering, but until then I hold space for conversations as chair of events such as AI & Testing in a few weeks, or as a loud audience member at events such as the Finnish Testing Meetup this week. I don't speak from the stage, but I occasionally write, and I always have meaningful 1:1 conversations with my peers over video, the modern global face to face.

I collaborate with a lot of different parties in the industry as part of my work-like hobbies. It's kind of a win-win for me to do my thing and write a blog post, and for someone else to make business out of intertwining my content with their ads. I have said yes to many such requests this last month, one of them allowing the Finnish training company Tieturi to nominate me for a competition for the title of "Tester of the Year 2023 in Finland". This award has been handed out 16 times before, and I have been nominated every single year for 16 years - except that I actively opted out and asked not to be included after someone had nominated me for 4 or 5 years.

The criteria for this award, which I have never been considered worthy to win, are:

  • inspiring colleagues or other organizations to better testing
  • bringing testing thoughts and trends from the world to Finnish testing
  • positively influencing the testing culture in their own organization
  • positively influencing the resultfulness of testing (coverage, found bugs, etc.) in their own organization or community
  • creating testing innovations, rationalising improvements or new kinds of ways to do testing
  • influencing the establishment of the testing profession in Finland
  • positively influencing Finnish testing culture and the development of the tester profession
  • OR in other ways improving the possibilities to do testing

I guess my 26 years, 529 talks, 848 blog posts on this blog, or the thousands of people I have taught testing to don't match these criteria. It was really hard to keep going at the 10-year mark of this award, and I worked hard to move past it.

So asking me to freely contribute "How AI changes Software Testing?" as marketing material may have made me a little edgy. But I hope the edginess resulted in something useful for you to read. Getting it out of my system at least helped me. 




Tuesday, February 27, 2024

Contemporary Bug Advocacy

A few weeks back on Mastodon, Bret Pettichord dusted off a conversation about something we talked about a lot in the testing field years ago: bug advocacy. Bug advocacy was something Cem Kaner discussed a lot, and a term that I struggled to translate to Finnish. It is a brilliant concept in English but it does not translate. It just doesn't.

Bug advocacy is the idea that there is work we must do to get the results of testing wrapped in their most useful package. A lot of the great stuff on the BBST Bug Advocacy course leads one to think it is a bug reporting course, but no, it is a bug research course. A brilliant one at that. Bug advocacy is the idea that just saying it did not work for you does little. It can actually have more of a negative impact. Do your research. Report the easiest route to the bug. Include the necessary logs. Make the case for someone wanting to invest time in reading the bug report, even under constraints of time and stress.

Bug advocacy was foundational for me as a learning tester 25 years ago. It was essential at the time of publishing Lessons Learned in Software Testing. It was essential when Cem Kaner created the BBST training materials. And it is sometimes like a lost art these days. In other words, it is something people like me learned and practiced for so long that we don't remember to teach it forward.

At the same time, I find myself broadcasting public notes on Mastodon and LinkedIn on a theme that I would call Contemporary Bug Advocacy - an essential part of Contemporary Exploratory Testing. Like the quote from Kaner's and Pettichord's book says, we kind of need to do better than fridge lights that do all their work while no one is using the results.

Inspired by the last two weeks of testing with my team, I collected a listing of tactics I have employed in contemporary bug advocacy. 

  • Drive by Data. I "reported" a bug by creating data in a shared test environment that made the bug visible. The bug vanished in a day. No report, no conversation. Totally intentional.
  • Power of Crowd. I organized an ensemble testing session. Bugs we all experienced vanished by the end of the day. I have used this technique in the past to surface complete API redesign needs through smart use of the right crowd.
  • Pull request with a fix. I fixed a bug, and sent the fix for review to the developer. Unsurprisingly, a fix can be more welcome than a task to fix something.
  • Silent fix. I just fix it so that we don't have to talk about it. People notice changes through their routines of looking at what is going on in the code.
  • Pairing on fix. I refused to report, and asked to pair on the fix for me to learn. Well, for me to teach too. It has been a brilliant way of ramping up knowledge of problems, dealing with root causes rather than repeated symptoms.
  • Holding space for fix to happen. A colleague sat next to me while I had not yet done a simple thing, making it clear they were available to help me but would not push me to pairing.
  • Timely one liner in Jira. I wrote a title-only bug report in Jira. That was all the developer needed to realize they could fix something, and the magic was that this all happened within the day the bug was created, while they were still in context.
  • Whisper reporting. I mentioned a bug without reporting it. Developers look great when they don't have bug reports on them. I like the idea of the best work winning when we care about the work over credit. Getting things fixed is work; claiming credit with a report is sometimes necessary but often a smell.
  • Failing test. Add a failing test to report a bug, and shift the work from there. Great for practicing very precise reporting (see the sketch below).
  • Actual bug report. Writing a great summary, minimal steps to repro, and making your expected and actual results clear. Trust it comes around, or enhance your odds with other tactics.
  • Discuss with product owner. Your bug is more important when layered with other people's status. I apply this preferably before the report, for less friction.
  • Discuss with developer. Showing you care about colleagues' priorities and needs enhances collaboration.
  • Praise and accolades. Focus your messaging on the progress of bugs vanishing, not emerging. Show the great work your developer colleagues are doing. So much effort and so little recognition - use some of your energy to balance towards the good.
  • Sharing your testing story. Fast-forward your learning and make it common learning. A story of struggles and insights is good. A shared experience is even better. 
  • Time. Know when to report and how. Timing matters. Prioritise their time. 
All of these tactics are ways to consider how to reduce friction for the developer. Advocate. Enable. Help. Do better than shine light when the door is closed. 
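To make the "failing test" tactic concrete, here is a minimal sketch of what such a report can look like in pytest. The function name, module and expectation are invented for illustration (echoing the Roman Numerals example mentioned earlier); the real test would of course target your actual code:

```python
# A failing test as a bug report: the test name is the summary, the assert is the
# minimal repro, and the diff shows expected vs actual results.
# convert_to_roman and its module are hypothetical names for this sketch.
from converter import convert_to_roman


def test_convert_to_roman_handles_four_as_iv_not_iiii():
    # Expected: modern subtractive notation. Actual (the bug): "IIII".
    assert convert_to_roman(4) == "IV"
```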

I call this contemporary because writing a bug report is simple. It is basic. But adding layers of tactics to it, that is far from simple. It is not a recipe but a pack of recipes. And you need to figure out what to apply when. 

I found nine problems yesterday. I applied four different tactics on those nine problems. And I do that because I care about the results. The results of testing are information we have acted upon. Getting the right things fixed, and getting a sense of accomplishment and pride for our shared work in building a product.


Wednesday, February 21, 2024

Everyone can test but their intent is off

Over my 8 years of ensemble and pair testing as primary means of teaching testers, I have come to a sad conclusion. Many people who are professionally hired as testers don't know how to test. Well, they know how to test, but their testing leaves a gaping results gap. An invisible one. One they don't manage or direct. And the sad part is that they think it is normal.

If you were hired to do 'testing' and you spent all your days doing 'testing', how dare I show up to say your testing is off?!? I look at results, and the only way to look at results you provide is to test after you. 

My (contemporary) exploratory testing foundations course starts from the premise of giving you a tiny opportunity to assess your own results, because the course comes with a tool for turning the invisible ink visible: a listing of problems that I (and ensembles with me) have found, across some hundreds of people. I used to call it 'catch-22', but as usual with results of testing, more work on doing better has grown the list to 26.

Everyone can test, like everyone can sing. We can do some slice of the work that resembles doing the work. We may not produce good enough results. We may not produce professional results that lead to someone paying for that work. But we can do something. Doing something is often better than doing nothing. So the bar can be low.

An experience at work leads me to think that some testers can test but their intent is so far off that it is tempting to say they did not test. But they did test, just the wrong thing. 

Let me explain the example. 

The feature being tested is one of sending pressure measurement values from one subsystem to another, to be displayed there. We used to have a design where the first subsystem's value - the one used on its user interfaces after various processing steps - was sent forward to the second subsystem. Then we mangled that value, because it followed no modern principles of programmatically processable data, so that we could reliably show it. We wanted to shift the mangling of the data to the source of the data, along with information about the data, in a beautiful modern wrapper of a consistent data model.

Pressure measurement was the first in the line of beautiful modern wrappers. The assignment for the tester was to test this. Full access to conversations with the developer was available. And some days after the developer said "done and tested", the tester also came back with "tested!". I started asking questions. 

I asked if one of the things they tested was to change configurations that impact pressure values, so that they could see the difference between pressure at sea level (a global comparison point) and at the measurement location. I got an affirmative response. Yet I had a nagging feeling, and built on the yes by inviting the whole team to create a demo script for this pressure feature end to end. Long story short, not only did the tester not know how to test it, it was not working. So whatever they called testing was not what I was calling testing. The ensemble testing session also showed that NO ONE in the team knew the feature end to end. Everyone conveniently looks at one component or, at most, one subsystem. So we all learned a thing or two.

Equipped with the information from the ensemble testing experience, the tester said they would take more time testing before coming back with "tested!". They did, and today they came back - 7 working days later. I am well aware that this was not the only thing on their agenda - even if their agenda is in their full control - and we repeated the dance. I started asking questions.

They told me they had updated the numbers in the model we created for the ensemble testing session. I was confused - what numbers? That model sketched early ideas of how three height parameters would impact the measurements in the end, but it was a quick sketch from an hour of work, not a fill-in-the-blanks model. So I asked for a demo to understand.

I was shown how they changed three height parameters. Restarted the subsystem. Basic operations of the subsystem they have been testing for over a year. The interesting conversation was about what they then tested. It turns out they made the exact same moves on a build without the changes and on a build with the changes. They concluded that since the difference they see is in sending the data forward, but the data is the same, it must be right. A regression oracle. But very much a partial oracle, and partial intent.

In the next minutes, I pulled up a 3rd-party reference and entered the parameters to get comparable values. We learned that they had the parameters wrong, because if the values aren't affected by the latest change, then the change of configuration is likely incorrect. They did not explore the values for plausibility, and they were way off.
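For illustration, a plausibility check of this kind can be as small as reducing station pressure to sea level with the generic standard-atmosphere approximation and comparing against an independent reference. This is a sketch with invented numbers and a generic formula, not our product's actual processing chain:

```python
# A rough plausibility oracle: reduce station pressure to sea level with the
# standard-atmosphere (ISA) approximation and compare against a 3rd-party reference.
# Constants and tolerance are generic; the station values below are invented.

def sea_level_pressure(station_hpa: float, height_m: float) -> float:
    """ISA-based approximation of sea-level pressure from station pressure and height."""
    return station_hpa * (1 - 0.0065 * height_m / 288.15) ** -5.255

station_pressure = 1001.0   # hPa, read from the subsystem under test (invented)
station_height = 100.0      # m, the configured measurement height (invented)
reference_sea_level = 1013.0  # hPa, from an independent 3rd-party source (invented)

reduced = sea_level_pressure(station_pressure, station_height)
assert abs(reduced - reference_sea_level) < 2.0, f"implausible: {reduced:.1f} vs {reference_sea_level}"
```

A check like this would have flagged both the wrong configuration and the wrong comparison value long before anyone said "tested!".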

I asked them to show what values they compared, and learned they had chosen an internal value. I asked them to pull up the first subsystem's user interface for the comparable values. It turns out the value they compared is likely missing multiple steps of processing to become the right value.

For junior testers such as this one, I expect I will coach them by having these conversations. I have been delighted with how they pick up information as new information comes, and I see a trend of not having to cover the same ground all the time. I understand how this blindness to results comes about: in most cases of testing a legacy system, the regression oracle keeps them on the right path. In this case, it led them onto the wrong path. They have only taken an ISTQB course, which does not teach you testing. I am the first person to teach them this stuff. But it is exhausting. Especially since there are courses like mine or like BBST that they could learn from, and that provide the results for the right intent. Learn to control their intent. Learn to *explore*.

At the same time, I am thinking that all too often juniors to testing - regardless of their years of experience - could learn slightly more before costing the same as developers. This level of thinking would not fly for a junior developer.

Testers get by because they can be merely without value. Teamwork may make them learn, or turn them into negative value. But developers don't get by, because the lack of value in their work gets flagged sooner.

 

Wednesday, February 14, 2024

Making Releases Routine

Last year I experienced something I had not experienced for a while: a four-month stabilisation period. A core part of the testing-related transformations I had been doing with three different organizations was bringing release timeframes down, from these months-long ones to less than an hour. Needless to say, I considered the four-month stabilisation period a personal failure.

Just so that you don't think you need to explain to me that failing is ok: I am quite comfortable with failing. I like to think back to a phrase popularised by Bezos, 'working on bigger failures right now' - a reminder that playing it too safe means you won't find space to innovate. Failing is an opportunity for learning, and inevitable when experimenting, in proportion to successes.

In a retrospective session with the team, we inspected our ways and concluded that taking many steps away from a known good baseline, with insufficient and untimely testing, is how you end up there. This would best be fixed by making releases routine.

There is a fairly simple recipe to that:

  1. Start from a known good baseline
  2. Make changes that allow for the change you want for your users
  3. Test the changes in a timely fashion
  4. Release a new known good baseline
The simple recipe is far from easy. Change is not easy to understand. It is particularly difficult if you only see the change at the small scale (a code line) and not in the system (dependencies). And it is particularly difficult if you only see the system but not the small scale. In many teams, developers have been pushed too small, and testers have not been pushed small enough. This leads to delayed feedback, because what the timely testing misses turns into testing that starts to lag behind the changes.

In addition to the results gap - between the information we need and the information we have, and its time dimension - the recipe continues with release steps. Some include the entire results gap in release testing, because testing hasn't learned to be timely, muddying the waters of how long it takes to do a release. But even when feature and release testing are properly separated, there can be many steps.

In our team's efforts to make releases routine, I just randomly decided this morning that today is a good day for a release. We have a common agreement that we do a release AT LEAST once a month, but if practice is what we need, more is better. For various reasons, I had been feature testing changes as they came. I have two dedicated testers who were already two weeks behind on testing, and if I learned something from last year's failing, it's that junior testers have a less developed sense of the timing of feedback, partially rooted in the fact that skills in action need rehearsing at a slower pace. Kind of like learning to drive a car: slowing down while turning the wheel and looking around are hard to do at the same time! I was less than an hour away from completing feature testing at the time of deciding on the release.

I completed testing - reviewed the last day of changes, planned tests I wanted, and tested those. All that was remaining then was the release.

It took me two hours more to get the release wrapped up. I listed the work I needed to do: 
  • Write release notes - 26 individual changes to message worth saying
  • Create release checklist - while I know it by heart, others may find it useful to tick off what needs doing to say it's done
  • Select / design title level tests for test execution (evidence in addition to TA - test automation)
  • Split epics into this release / other releases so that epics reflect completed scope over aspirational scope, and can be closed for the release
  • Document per-epic acceptance criteria, esp. out-of-scope things - documentation is an output, not an input, but if I was testing, it was a daily output, not something to catch up on at release time
  • Add Jira tasks into epics to match changes - this is totally unnecessary but I do that to keep a manager at bay, close them routinely since you already tested them at pull request stage
  • Link title level tests to epics - again something normally done daily as testing progresses, but this time was left outside the daily routine
  • Verify traceability matrix of epics ('requirements') to tests ('evidence') shows right status 
  • Execute any tests in test execution - optimally one we call release testing and would take 15 minutes on the staging environment
  • Open Source license check - run license tool, compare to accepted OSS licenses and update licenses.txt to be compliant with attribution style licenses
  • Lock release version - Select release commit hash and lock exact version with a pull request
  • Review Artifactory Xray statistics for docker image licenses and vulnerabilities
  • Review TA (test automation) statistics to see it's staying and growing
  • Press Release-button in Jira so that issues get tagged - or work around reasons why you couldn't do just that 
  • Run promotion that makes the release and confirm the package
  • Install to staging environment - this ranges from a 3-minute pipeline run to 30 minutes of doing it like a customer does it
  • Announce the release - letting others know is usually useful
  • Change version for next release in configs
This took me about 2 hours. I skipped the install to staging, though. And I have a significant routine in all these tasks. What I do in a few hours takes, experience shows, about a week when handed to someone else, plus about a day of me answering questions. Not a great process.

There are things that could be done to start a new release, in conjunction with Change version for next release: 
  • Create release checklist
There are things that should become continuous on that list: 
  • Select / design title level tests 
  • Split epics
  • Document per epic acceptance criteria
  • Add Jira tasks into epics to match changes 
  • Link title level tests to epics
  • Verify traceability matrix
  • Execute any tests in test execution
There are things that could happen on a cadence that has nothing to do with releases:
  • Review Artifactory Xray statistics 
  • Review TA (test automation) statistics
And if we made these changes, the list would look a lot more reasonable:
  • Write release notes
  • Execute ONE test in test execution 
  • Open Source license check 
  • Lock release version
  • Press Release-button in Jira
  • Run promotion that makes the release
  • Install to staging environment
  • Announce the release
  • Change version for next release
And finally, looking at that list - there is no reason why all but the meaningful release notes message can't happen on ONE BUTTON. 
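As a sketch of what that ONE BUTTON could chain together - every command, script name and path below is a placeholder for whatever your pipeline actually provides, not our real tooling - the remaining steps are just a sequence that stops on the first failure:

```python
# release.py - a one-button release sketch. The human keeps the release notes,
# the machine keeps the chores. All script names below are placeholders.
import subprocess
import sys

version = sys.argv[1]  # e.g. python release.py 2.4.0

RELEASE_STEPS = [
    ["python", "check_licenses.py"],                 # open source license check (placeholder)
    ["git", "tag", version],                         # lock release version to a commit
    ["python", "run_promotion.py", version],         # build and promote the release package (placeholder)
    ["python", "deploy_staging.py", version],        # install to the staging environment (placeholder)
    ["python", "announce_release.py", version],      # let others know (placeholder)
    ["python", "bump_next_version.py"],              # change version for the next release (placeholder)
]

for step in RELEASE_STEPS:
    subprocess.run(step, check=True)  # stop the release on the first failing step
```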

I like to think of this type of task analysis visually: 

This all leaves us with two challenges: extending the release promotion pipeline to all of these chores, and the real challenge: timely, resultful testing by people who aren't me. Understanding that delta has proven a real challenge now that I am not a tester (with 26 years of experience coined into contemporary exploratory testing) and a significant chunk of my time goes to my other role: being a manager.

Friday, January 19, 2024

The Power of Framing

Sometimes we write on topics we have not researched, but still have things to say about. This is how I frame this post: I am not an expert in framing. There are admirable levels of eloquence and excellent teaching materials I have seen, but my practice of this is that of a learner.

Me setting the stage for the post is framing. You put a perspective around a thing that allows you to see the thing. It might be that you are framing to see things in a similar light, or you might use framing to change the narrative on a topic. Today I had two examples in mind that I wanted to make a public note of.

Example 1: "I'm a bad direct report" --> "I'm an employee with entrepreneurial touch" 

In a conversation about managing up - managing your manager - we ended up talking about understanding what is important to you, and how what you seek may differ from what others seek. I expressed the importance of agency: the sense that I am at the controls of my work, and that expectations for my work are things I negotiate rather than take as givens. When someone violates my agency, I react strongly.

This led to someone else sharing how they consider themselves a "bad direct report" and have chosen entrepreneurship, where traits like questioning established structures and rules rather than obeying without the why, asking questions for deeper understanding, and a built-in drive for the better are helpful and welcome.

I recognised the sentiment and the similarity to how I think, yet noted my choices have turned out very different. I have chosen to join organizations and swim against the stream. As a reaction to the story, I realized I frame the same story as "I am an employee with an entrepreneurial touch", a "rebel at work", who can move mountains for organizations if they appreciate the likes of me.

I did not even think of this as framing; I just shared how I have placed a frame on something that I could, very realistically, frame as being many bosses' nightmare. I don't obey; I seek goals, and I motivate routines through playing fairly with others, as I don't think I am entitled to break flows other people rely on without taking them along for the change ride.

Example 2: "This company has not given me training for 2 years" --> "Learning matters"

Another example of framing was inspired by noticing the frame used to describe a true experience: "This company has not given me training for 2 years". To see the frame, we need to note the definition of training.

Training in this case is not taking the compulsory e-learning courses where even a manager is required to check you have completed the training. 

Training is not taking online courses that have an immediate impact on the work ongoing. 

Training is money out of the company's pocket to send me somewhere I could not go on my own salary. It is a salary surplus.

To frame this for myself, I drew a picture 2 months ago, that I then shared on social media. 


If I reframe training as learning, I can see a variety of options. I can see that while the visible money on my learning may not match my expectations, my experience of being allowed to use invisible money (doing work slower) has definitely been an investment in my learning.

The picture includes a few "Done in 2023" ticks with visible money, as I got to go to EuroSTAR and HUSTEF (as a keynote speaker, with the company paying my daily allowance). That's not where I learned, though. I learned the most in 2023 from push benchmarks, meaning I shared how we work / what our problem is, and got guidance on things other people have tried - the community approach.

The same learning framing affords me agency. Instead of "no training given", I can assess "learning I made space for, with support/roadblocks". 

Framing changes how I feel about the same true experience. And how I feel about it changes what I can do. 

Friday, December 8, 2023

Model-Based Testing in Python

One of the best things about being a tester but also a manager is that I have more direct control over some of the choices we make. A year ago I took part in deciding that my main team would have no testers, while my secondary team had two testers. Then I learned the secondary team did some test automation not worth having around, and fixed that by hiring Ru Cindrea, co-founder of Altom, to help the secondary team. She refactored all of the secondary team's automation (with great success, I might add). She took over testing an embedded product the secondary team had ownership of (again with great success). And finally, I moved her to my primary team to contribute to a new concept I had wanted since I joined Vaisala 3.5 years ago: applying model-based testing. This has all happened in the timeframe of one year.

Some weeks back, I asked her to create a presentation on what we have on model-based testing and what it is about, and she did that. There is nothing secret about this type of testing, and I was thinking readers of this blog might also benefit from a glimpse into the concept and the experience.

The team has plenty of other automated tests, traditional hand-crafted scripts at various scopes of the test target. These tests, though, are at the scope of the whole system. The whole system means two different products with a dependency. The programmatic interfaces include a web user interface, a remote control interface to a simulator tool, and an interface to the operating system services the system runs on. This basically means a few Python libraries for the purposes of this testing, one for each interface. And a plan to add more interfaces for control and visibility at different points of observation of the system.

In explaining the why of model-based testing, Ru used an example of a process: how we give control and support to people observing weather conditions with the product so they can create standard reports. In the context we work in, there are reports that someone creates on a cadence, and these reports then get distributed globally to be part of a system of weather information. Since there's an elaborate network that relies on the cadence, the product has features to remind about the different things that need to happen.


Choosing some states as screenshots of what a user sees helps show why model-based testing became a go-to approach for us. Traditionally we would analyse and draw these arrows, and then hand-craft test scenarios that would go through each arrow, and maybe even combinations of a few of the most important ones. Often designing those scripts was tedious work, and not all of us represented them visually. We may even have relied on luck for what ended up as the scripts, noticing some of the paths we were missing through the exploratory testing sessions where we let go of the preconceived idea of the paths.

The illustration of screenshots is just a lead-in to the model. The model itself is a finite state model with vertices and edges, and we used AltWalker (which uses GraphWalker) to create the model.

The model is just a representation of paths, and while visual, it is also available as a structured text file. Tests are generated from the model to cover paths through it, or even to walk through it indefinitely.

For the application to respond, we needed some code. For the webUI part, we use Selenium, but instead of building tests, we build something akin to a POM (page object model), with functions that map to the actions (arrows) and checks (boxes) in the model.
We have three main reasons for using Selenium. One is that Playwright, which we have otherwise used for UI tests in this product, does not play well with AltWalker (for now; I am sure that is pending change) due to some dependency conflicts. The second, which I quite prefer, is that Selenium WebDriver is the only driver for real browsers at the level of what our users use. It gives us access to real Safari and real, specific versions of Chrome and Firefox, instead of the approximation that works nicely for most of the application-level testing we do with Playwright. The third reason is that while Ru learned Playwright in a few days to no longer find it new, she has a decade on Selenium and writes this stuff by heart. The same few days of effort suffice for the others, who know Playwright, to learn enough Selenium to work on these tests.
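To give a flavour of the shape of this code - a minimal sketch with invented model element names, locators and URL, not our actual models - AltWalker, as I recall its conventions, calls a Python class whose methods match the vertex and edge names in the GraphWalker model file, so edge methods perform Selenium actions and vertex methods assert on what the user should see:

```python
# A minimal sketch of an AltWalker adapter class. The model file (the structured
# text mentioned above) defines vertices (states to check) and edges (actions to
# take); AltWalker walks a generated path and calls the method matching each
# element's name. All names and locators below are invented for illustration.
from selenium import webdriver
from selenium.webdriver.common.by import By


class ReportModel:
    """Maps the graph elements of a hypothetical 'ReportModel' to Selenium code."""

    def setUpModel(self):
        self.driver = webdriver.Chrome()
        self.driver.get("http://localhost:8080")  # placeholder for the product's web UI

    def tearDownModel(self):
        self.driver.quit()

    # Edges: the arrows in the model, i.e. the actions a user takes.
    def e_open_report(self):
        self.driver.find_element(By.ID, "open-report").click()

    def e_submit_report(self):
        self.driver.find_element(By.ID, "submit-report").click()

    # Vertices: the boxes in the model, i.e. the states we check.
    def v_report_open(self):
        assert self.driver.find_element(By.ID, "report-editor").is_displayed()

    def v_report_submitted(self):
        assert "Submitted" in self.driver.find_element(By.ID, "status").text
```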

We have other Python libraries too, but obviously I am more comfortable sharing simple web page actions than showing details of how we do remote control of the kind of complex sensor networks an airport might have.

At the end of Ru's presentation, she included a video of a test run, with the visual showing the coverage ramping up. We watched boxes turning from grey to green and our application doing things as per the model, and I was thinking about how I sometimes do exploratory testing by watching the application run, allowing myself to pick up cues from what I see. For the purposes of this article, I did not include the video, but a screenshot from it.
At the end of the short illustration, Ru summed up what we have gained with model-based testing.

It has been significantly useful in the reliability testing we have needed to do, pinpointing performance-oriented reliability issues. The reliability of the tests themselves has been a non-issue.

A system like this runs for years without reboots, and we need a way of testing that includes production-like behaviors with sufficient randomness. A model can run until it's covered. Or a model can run indefinitely until it's stopped. 
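In GraphWalker terms, that is just a choice of path generator and stop condition. As far as I recall the syntax, the options look roughly like this (illustrative only; check the GraphWalker documentation for the exact forms):

```
random(edge_coverage(100))   # random walk until every edge has been visited at least once
random(time_duration(3600))  # random walk for a fixed duration, e.g. an hour
random(never)                # random walk indefinitely, until stopped from the outside
```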

The obvious question from people is whether we can generate the model from the code. Theoretically, perhaps, in a not-too-distant future. But would we rather draw and think, or try to understand whether a ready-made drawing really represents what we wanted it to represent? And if we don't create a separate model, aren't we at risk of accepting all the mistakes we made, and thus not serving the purpose we seek from testing it?

We hope the approach will enable us to get a second organization, the one acceptance testing our systems, to consume the test automation through its visual approach, to avoid duplication in the organizational handoff, and to increase trust without having to rebuild testing knowledge to the same level an iteratively working software development team would have.

While the sample is one model, we already have two. We will have more. And we will have combinations of feature models creating even more paths we no longer need to manually design for. 

I'm optimistic. Ru can tell you what it is like to create this, and she still speaks at conferences while I do not. I'm happy to introduce her in case you can't find her directly.