Saturday, February 8, 2025

Evolving scoring criteria for To Do App for recruiting

I have used various applications for assessing people's (exploratory) testing skills in recruiting. While it's already a high-stress scenario, I feel there needs to be a way to *test something* if that is the core of the job you'd get hired for. I may believe this because I, too, was tested with a test of testing 28 years ago.

Back when I was tested, the application was two versions of Notepad: one the original English, the other a localized version with seeded bugs. My setting for doing the test was one hour in front of a computer in a classroom, being observed. There were 8-12 of us in total for each scheduled session; we were all nervous, did not know each other, and most likely never even introduced ourselves. We did our thing, reported the discrepancies we identified as clearly as we could, and got a call back or not. I remember they ran these tests at scale. The testing teams we had weren't small, and we were all total newbies.

This week I assigned the To Do app as the thing to test. For newbies, I make it a take-home assignment and recommend spending under two hours on it. For experienced testers, I spend half an hour face to face out of the max two hours of interviewing time we allocate. Their work is not back yet, but my own work of looking at the application got done.

The most recent form of this take-home assignment is one where I give two implementations of the To Do app, and explain they are what developers use to show off their best practices for applying front-end frameworks.
  1. Elm: https://todomvc.com/examples/elm/
  2. Angular: https://todolist.james.am/#/

I ask for the following outputs:

  • A clearly catalogued listing of identified issues you’d like to give as feedback to whoever authored the version
  • Listing of features you recognize while you test
  • Description of how you ended up doing the assignment
  • (optional) example of test automation in a language and framework of your choice
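
For the optional part, even one small, honest check communicates a lot. A minimal sketch of a starting point in TypeScript with Playwright, where the .new-todo and .todo-list class names are assumptions based on the common TodoMVC markup, not verified against either implementation:

import { test, expect } from '@playwright/test';

// Minimal smoke check: an added item shows up in the list.
// Selector assumption: common TodoMVC markup (.new-todo, .todo-list).
test('added item appears in the list', async ({ page }) => {
  await page.goto('https://todomvc.com/examples/elm/');
  await page.locator('.new-todo').fill('buy milk');
  await page.locator('.new-todo').press('Enter');
  await expect(page.locator('.todo-list li')).toHaveText(['buy milk']);
});
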
The previous post listed the issues I am aware of, and today I created a scoring grid for the homework, in case you have your own version or would like to talk about mine. I am still hoping for the day when some of the people doing the assignment *surprise me* by reading about how to approach these exercises from my blog.

I can't expect complete newbies to get to all of it, but what worries me the most is that too many seasoned testers don't even scratch the surface. We still expect testers to learn testing by testing, often without training or feedback.

To Do Application - Assessment Grid

ESSENTIAL INSIGHTS
Architecture: frontend only
Same spec for both implementations
Material online to reuse
Reading the room, clarifying assumptions
Optional is chance to show more
Presenting your work is not just description of doing

ESSENTIAL ACTIONS
Research: find the spec as it is online
Research: ask questions
Meta: explain what and why you do
Learning: showing something changed in knowledge while testing
Bias to action: balance explaining and results
Modeling function, data, environments
Recognizing tools of environment
Choosing a constraint to control perspective
Stopping criteria: time or coverage
Classifying and prioritizing
Clarity of reporting issues
Reporting per implementation and common for both
TL;DR - expect lazy readers
Using and explaining a heuristic
Awareness of classes of data (e.g. naughty strings - sketched after the grid)
Surprise me (e.g. screenshot to genAI)

RESULTS
Functional problems (e.g. off by one count, select all, tooltip)
Functional problems, browser dimension (e.g. persistence, icon corruption)
Usability problems (e.g. light colors, lack of instructions)
Implementation problems visible on the console (e.g. messages in code, errors in the console)
Data-related problems: creating empty items
Data-related problems: trim whitespace
Data-related problems: special characters
Missing features (e.g. order items)
Typos
In-app consistency (e.g. always visible functionality that does not always work)

AUTOMATION
Working with selectors
Reading error messages
Scenario selection
Locators
Structure and naming
Describing choices
Readme for getting tests to run

MISTAKES THAT NEED EXPLAINING
Overfocus on locators while the application is unknown and automation is not in play
Wanting to input SQL injection strings
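
To make one of the grid items concrete: awareness of classes of data can show up as a small data-driven check. A sketch in the spirit of the Big List of Naughty Strings, with a hand-picked sample (the real list is far longer, and the selectors are again the common TodoMVC markup, an assumption for each implementation):

import { test, expect } from '@playwright/test';

// A hand-picked sample in the spirit of the Big List of Naughty Strings.
const naughtyStrings = [
  '<script>alert(1)</script>', // markup should show as text, not execute
  '🤔🤔🤔',                     // emoji outside the basic multilingual plane
  'Ω≈ç√∫˜µ≤≥÷',                // special characters
];

for (const input of naughtyStrings) {
  test(`item survives input: ${input}`, async ({ page }) => {
    await page.goto('https://todomvc.com/examples/elm/');
    await page.locator('.new-todo').fill(input);
    await page.locator('.new-todo').press('Enter');
    await expect(page.locator('.todo-list li')).toHaveText([input]);
  });
}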

I ended up cleaning this all and making it available at GitHub: https://github.com/exploratory-testing-academy/todoapp-solution 

Monday, February 3, 2025

That Pesky ToDo app

While I am generally of the opinion that we don't need injected problems on applications that are already target-rich as is, today I went for three versions of a well-known test target, namely the ToDo MVC app.

Theoretically this is where a group of developers show how great they are at using modern JavaScript frameworks. There is a spec defining the scope, and the scope includes a requirement for this to work on modern browsers (latest Chrome, Firefox, Opera, Safari, IE11/Edge).

So I randomly sampled one today - the Elm version, https://todomvc.com/examples/elm/

I took that one since it looked similar in styles to what Playwright uses as their demo, https://demo.playwright.dev/todomvc/, while the latest React version already has the extra-light styles updated to something you are more likely to be able to read.

I also took that one since it looked similar to the version flying around as a target of testing with intentionally injected bugs, https://todolist.james.am/.

My idea was simple: 

  • start with the app, to explore the features
  • loop to documenting with test automation
  • switch over implementations to see if the automation is portable across versions of the app (sketched below)
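
For that third step, the idea was to parameterize the same checks over the implementations; whether the shared TodoMVC class names hold across versions is exactly what is under test. A sketch:

import { test, expect } from '@playwright/test';

// The same check pointed at each implementation in turn.
const implementations = {
  elm: 'https://todomvc.com/examples/elm/',
  seeded: 'https://todolist.james.am/#/',
  playwright: 'https://demo.playwright.dev/todomvc/',
};

for (const [name, url] of Object.entries(implementations)) {
  test(`[${name}] completed items can be cleared`, async ({ page }) => {
    await page.goto(url);
    await page.locator('.new-todo').fill('done and gone');
    await page.locator('.new-todo').press('Enter');
    await page.locator('.todo-list li .toggle').check();
    await page.locator('.clear-completed').click();
    await expect(page.locator('.todo-list li')).toHaveCount(0);
  });
}
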
I had no idea of the rabbit hole I was about to fall into. 

The good-elm-version was less good than I expected:
  1. Select all does not work
  2. Edit mode cannot be escaped with Esc
  3. An unsaved new item is not removed on refresh
  4. Edit to empty leaves the item while it should be removed
  5. Edit to empty messes up the layout, which I should not even see since 4) should hold
So I looked at the good-latest-react version, only to learn persistence is not implemented. 
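
Persistence is the kind of claim automation pins down nicely. A sketch, again assuming the common markup, that a version without persistence will fail:

import { test, expect } from '@playwright/test';

// The spec asks for items to survive a reload.
test('items persist over a page reload', async ({ page }) => {
  await page.goto('https://todomvc.com/examples/elm/');
  await page.locator('.new-todo').fill('survive a refresh');
  await page.locator('.new-todo').press('Enter');
  await page.reload();
  await expect(page.locator('.todo-list li')).toHaveText(['survive a refresh']);
});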

And that is where the rabbit hole went deep. I researched the project materials a bit, and explored the UI to come up with an updated list of claims. The list contains 40 claims. With four of them failing, that would let me know the good-elm-version was 90% good, 10% not good.

Looking at the bugs-seeded version, there's plenty more to complain about:

  1. Typos, so many typos: "need's" in the placeholder, "active" uncapitalized, "toodo" in the instructions
  2. "Clear" is visible even when there are no completed items to clear
  3. "Clear" does not clear, because it is really "Clear completed"
  4. Counter has an off-by-one error (pinned down in a sketch after this list)
  5. Placeholder text vanishes as you add an item, but returns on refresh
  6. A sideways "a" as the icon for "mark all as complete" is not the visual I would expect, nor is the "A" with a tilde on top for deleting - this happened on Chrome, after using it enough, but the state normalized on a forced refresh.
  7. Select all does not unselect all on second click
  8. Whitespace trim is not in place if one edits items to contain whitespace, only when items are shown
  9. <!-- STUPID APP --> in comments is probably intentionally added for fun
  10. The "ToDo: Remove this eventually" tooltip is probably added for fun as well
  11. Errors about missing resources on the console are probably added for fun too
  12. "Clear" is missing the counter after it the spec asks for
  13. Usability of clear completed: since its functionality only works on the all and completed filters, does it really need to be visible on the active filter?
  14. The URL does not follow the technology pattern you would expect for the demo apps.
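
The counter bug is the kind of thing a check documents exactly. A sketch against the seeded version, assuming it keeps the usual .todo-count element:

import { test, expect } from '@playwright/test';

// With exactly one active item, the counter should read "1 item left";
// the seeded version is off by one here.
test('counter matches the number of active items', async ({ page }) => {
  await page.goto('https://todolist.james.am/#/');
  await page.locator('.new-todo').fill('one item');
  await page.locator('.new-todo').press('Enter');
  await expect(page.locator('.todo-count')).toContainText('1 item');
});
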
In the statistics of the feature listing, though, the pretty list of capabilities is hard to map to the messiness of the issues:

✓ should show placeholder text
✓ should allow to clear the completion state of all items
✓ should trim entered text
✓ should display the current number of todo items
✓ should display the number of completed items
✓ should be hidden when there are no items that are completed
✓ should allow for route #!/
 
7/40 (17.5%) does not feel essentially worse, but then again, there are many types of problems that the list of functional capabilities does not lead one to.

There is also the usability-improvement-conversation type of feedback that is true for both versions:
  1. The annoyingly light colors make seeing the UI and instructions hard
  2. None of these allow reordering items, and it feels like an omission even if intentional
  3. None of these support word wrapping
  4. Usability of the concepts "active" and "completed" for to-do items is a conversation: are there better words that everyone would understand more clearly?
  5. Usability with a mouse: there's no adding with a mouse, even if that feels by design
  6. Usability of the whole router/filter design can be confusing, as you may have a filter active that does not show the item you add
  7. The stacked shadow effect at the bottom makes it seem like there are multiple layers, which does not connect well with the filters/routing functionality
  8. Delete, edit and select all options take some discovering.

You could also compare to what you get from a nicely set-up demo screenshot of the bugged version.

The pesky realization remains: seeding bugs is unfortunately unnecessary. While I got "lucky" with the elm version's four bugs, I also got lucky with the refactored react version that is missing the implementation of persistence.

There's also an idea that keeps coming up with experienced testers, one we really need to stop throwing around at random: SQL injection. For a frontend-only application without a database, it makes very little sense, unless you can continue your story with an imagined future feature where the local storage JSON gets saved and used with an integration. Separating what is true now from risks for the future is rather relevant in communicating your results.
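
One way to ground that conversation is to show where the data actually lives. A sketch that feeds in a classic injection string and then dumps the browser's local storage instead of imagining a database (the storage key name, like todos-elm, is the project's convention and an assumption here):

import { test } from '@playwright/test';

// No server, no SQL: the string just lands as JSON in local storage.
test('the only persistence is local storage JSON', async ({ page }) => {
  await page.goto('https://todomvc.com/examples/elm/');
  await page.locator('.new-todo').fill("Robert'); DROP TABLE todos;--");
  await page.locator('.new-todo').press('Enter');
  const storage = await page.evaluate(() => ({ ...localStorage }));
  console.log(storage); // e.g. a todos-elm key holding a JSON array of items
});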

Playing more with the automation is left for another day. The nine tests of today were just scratching the surface, even if they pass 100% on the Playwright practice version and don't on any of the others.