Tuesday, February 25, 2025

Habit of AI in Quality Engineering

I gave a talk today and published the slides. I encouraged actively seeking uses to learn from, rather than seeking the one tool that does the thing for you. 

https://speakerdeck.com/maaretp/habit-of-ai-in-quality-engineering

Someone in my audience provided an AI-generated example of exactly that one-up attitude to use of time: speech to text, text to summary, summarizing my messages with a basic approved genAI chatbot. 

Overlaid in red are the places where either I or the AI missed the mark on the message. 

  • Introduction to AI in Quality Engineering:
    • The document begins with an introduction to the speaker's experiences with AI in quality engineering, highlighting the relevance of AI tools in improving testing processes.
    • The speaker shares their initial encounter with GitHub Copilot in 2021, emphasizing its potential to assist in coding and testing tasks.
  • Job Interview Experience:
    • The speaker recounts a job interview where they used GitHub Copilot during a pair programming session. This experience demonstrated how AI could assist in testing and coding tasks.
    • The interview involved converting integers to English words, and the speaker used GitHub Copilot to generate multiple solutions, showcasing the tool's capabilities and limitations.
  • Evolution of GitHub Copilot:
    • The document details the evolution of GitHub Copilot from its early access days to its current state in 2025. Initially, the tool had limitations, but it has since seen significant improvements and new features.
    • The speaker highlights the introduction of features like multi-file context, chat interfaces, and agent-type functionalities, which have enhanced the tool's usability and effectiveness.
  • Role of AI in Testing:
    • AI is portrayed as a valuable pair programmer and testing assistant, capable of generating code and test cases. However, human oversight and decision-making remain crucial to ensure quality and accuracy.
    • The speaker emphasizes the importance of evaluating AI-generated outputs and making informed decisions based on the context and requirements of the task at hand.
  • Building AI Habits:
    • The document discusses the importance of integrating AI tools into daily tasks to build effective habits. The speaker encourages using AI tools like GitHub Copilot and Edge Copilot to enhance productivity and learning.
    • Practical advice is given on how to start using AI tools in everyday work, such as seeking second opinions, refining prompts, and continuously learning from AI interactions.
  • Challenges and Considerations:
    • The speaker addresses the challenges of balancing automation with manual testing, noting that while AI can assist in many tasks, it is essential to maintain a human touch to ensure comprehensive testing.
    • Considerations are given to the quality and accuracy of AI-generated outputs, emphasizing the need for thorough evaluation and validation.
  • Future of AI in Quality Engineering:
    • The document concludes with a forward-looking perspective on the potential of AI to transform testing practices. The speaker envisions a future where AI tools are seamlessly integrated into quality engineering processes.
    • Continuous learning and adaptation are highlighted as key factors in leveraging AI effectively, with an emphasis on staying updated with new features and advancements in AI technology.

The summary layers on a tone that isn't mine: missing are the tone of *focusing on today*, the view of learning as a journey, and the conclusion that in anything and everything we do, we are either learning or contributing. It's that learning that saves percentages of time to allow us to do more, not the autogeneration of contributions. Learning is curation of AI contributions, deciding to hold the reins. 

Friday, February 21, 2025

Remembering job interviews

As I delivered a talk today on 'Lessons Learned from Landing a Job Offer with GenAI', someone from the audience wrote a comment on the impressive learning journey that a serendipitous chain of events started for me. The comment kept dwelling in the back of my mind, bringing out the realization that I have quite a collection of stories about what job interviews can be like for a senior. Some of the jobs I ended up taking, others not, as interviews are two-sided explorations of needs and aspirations. I wanted to draft together a view into some of these. 

The interview when GitHub Copilot was new

I had gone through a few rounds of conversations, and the last step of the process was a pair programming interview. It was my first pair programming interview, and I approached the interview with concerns: 

  • Testing ME on programming skill? I'm a tester, and while I write code and pair on all kinds of tasks, writing code is not where I shine. Layering feedback on top of code as it's being written or after it has been written, that's my ballpark. 
  • Pairing in an interview? Watching me work without working with me is not pairing. That alone was enough to make me wary. 
The instructions were to come with an IDE and a setup of my own, and we'd work from there. 

The serendipitous event that made this a story worth a stage was that GitHub Copilot had just been released to the public. October 29th, 2021 was the release date, and December 10th, 2021 was my interview date. I was also lucky to get access early on. 

While I could have been preparing for the interview without generative AI, I just could not resist using it and taking it along with me. I also had an idea: if I was to show what I do as a *tester* on code, having a programmer who would not refuse to pair with me would be a good idea. 

I practiced with the Roman Numerals kata. I learned a lot about how it could be tested by doing the usual moves I do for exploratory testing. I read up on some "specs". I generated a selection of outputs programmatically for a range of inputs. I compared the generated outputs against the outputs of other programs, particularly a web application praised for how good it was with roman numerals, and Excel. 
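
As a rough illustration of that move, and not the code from the actual preparation, a minimal sketch in TypeScript could generate outputs for the whole input range and diff them against a reference captured from another program. The toRoman implementation and the reference file name here are hypothetical stand-ins.

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical implementation under test; in the real exercise this would be
// whatever code (or generated suggestion) is being explored.
function toRoman(n: number): string {
  const pairs: [number, string][] = [
    [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
    [100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
    [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"],
  ];
  let result = "";
  for (const [value, numeral] of pairs) {
    while (n >= value) {
      result += numeral;
      n -= value;
    }
  }
  return result;
}

// Generate outputs programmatically for the supported range 1..3999.
const generated = Array.from({ length: 3999 }, (_, i) => `${i + 1};${toRoman(i + 1)}`);
writeFileSync("generated-romans.txt", generated.join("\n"));

// Compare against outputs captured from another program, for example Excel's
// ROMAN function exported to a semicolon-separated text file (hypothetical name).
const reference = readFileSync("reference-romans.txt", "utf-8").trim().split("\n");
const mismatches = generated.filter((line, i) => line !== reference[i]);
console.log(`${mismatches.length} mismatching inputs`, mismatches.slice(0, 10));
```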

While I did not become the Roman Numerals expert I am now, I got started. I am the expert I am now because I turned that into an exercise I have done with some hundred people, crowdsourcing learnings and collating those into a talk known as 'Let's Do a Thing and Call It Foo'. 

Showing up to the interview, I found it a little funny how the exercise I was expected to work on was not Roman numerals (1 --> I) but numbers to text (1 --> One). These are very similar problems. 

While I recognized that my pair's idea was to get me to write examples TDD-style and grow the application, we did only a little bit of that. We ended up writing a line of comment, selecting our chosen implementation from GitHub Copilot's list of 10 options, and focusing on writing example tests and approval tests, and talking about my choices of those. 
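
To give a flavor of that split between example tests and approval tests, here is a minimal sketch with Vitest-style tests. It is not the interview code; numberToWords and its expected output format are assumptions standing in for whichever generated implementation we picked.

```typescript
import { describe, expect, test } from "vitest";
// Hypothetical module holding the implementation selected from the suggestions.
import { numberToWords } from "./numberToWords";

describe("integers to English words", () => {
  // Example tests: a few named, intention-revealing cases chosen by a human.
  test("single digits", () => {
    expect(numberToWords(1)).toBe("One");
    expect(numberToWords(7)).toBe("Seven");
  });

  test("teens and tens have their own words", () => {
    expect(numberToWords(14)).toBe("Fourteen");
    expect(numberToWords(40)).toBe("Forty");
  });

  // Approval-style test: capture output over a range as one reviewable artifact
  // and approve it; future diffs show exactly how the behavior changed.
  test("approve outputs for 0..120", () => {
    const received = Array.from({ length: 121 }, (_, n) => `${n} -> ${numberToWords(n)}`);
    expect(received.join("\n")).toMatchSnapshot();
  });
});
```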

Fairly soon after, I let the company know I would not be joining. Being the only tester and the only woman, and expecting my life to include bringing that perspective in, did not feel like the right personal choice. They would not have understood the extra load that places on me. 

The interview where we tested together for the whole day

I had again gone through multiple layers of conversations, and even a full-day psychological evaluation to qualify me as a possible leader for the organization. I had given them the option of seeing me in action: a training course concept on exploratory testing where I could teach my future colleagues (and meet them), with us testing their application. We set that up. 

The course is really fairly standard, and I have run it for a lot of different companies, so I knew what I was up against. The experience was also fairly standard: with my facilitation, we found significant bugs in the latest version of their software that they considered important and had not found on their own. I learned my future colleagues would have been lovely, and they would have welcomed me to the organization. 

I ended up not accepting the position, because I felt they should have offered to compensate me for the day I taught them, and I did not like the way one of the hiring managers was challenging testing when they should have shown support. 

The interview that was a workshop on creating my own job description

This interview was for a company that I knew already I wanted to work at. That is a lovely starting point, and the whole interview experience was really built around allowing me a good start at the right place of the company. 

The two interview sessions set up for me were with colleagues, whom I interviewed on what they would like from someone like me, so that I could incorporate it into my job description. I wrote my own job description, which then became part of the offer I accepted. It was a great way of landing me, with support, into a fairly big and complicated, even siloed, organization. 

Exceptional, and I worked there for multiple years. 

The interview that was a psych evaluation telling me I am not fit for my career

This one was me applying for a position with a standard approach. I went through interviews that I don't remember in particular, but the half-day testing at an evaluation center, that one I remember. This was my first experience of those, and I had significantly more than 10 years of tester career behind me by then. Unlike the other psych evaluation that was assessing my strengths and weaknesses as a manager, this one was testing my intellectual abilities and creating a profile of my preferences with a questionnaire of some sort. 

I will always remember how hilarious I found it that I got a paper telling me I am not likely to be successful in a tester career. Not only was I successful then, I have continued to be since, but at least now I have it on paper that no one should let me test. Apparently I am not cut out for it. Or, more likely, they don't know who would be cut out to do a good job on testing.

The organization said they have a policy of not hiring without this service provider's recommendation. I did not get the job, and I am not convinced I would have taken it even if it had been offered to me. 

The interview where we went through an improvement plan with texts written by me (CC licensed)

On this one, the recruiting manager came in with a TPI (test process improvement) assessment report, to discuss my approach to helping them improve testing while doing it. The conversation was lovely, but the report made it memorable. I had written most of the texts in that report. Not that the recruiting manager knew that before. 

It soon became clear that I knew the structure of the report and the likely conclusions, and could enhance them in the moment. And correct some of the mistakes in my public Creative Commons materials that had helped write that report. 

I took the job, and loved my time there. 

The interview where they made me test a text field

This one was a fun one. It was my second time joining the same organization, and a result of people I had worked with before inviting me to interview with them. While I had been gone, things had changed. In the interview I had an architect who wanted me to show I know how to test because that is apparently how you test testers. 

They asked me to test a text field. And I told them I was around when that assessment exercise was created and had talked about it on conference stages since, enhancing it to actually having a text field we could test, with real context to it. 

They then asked me to test a chair, or to tell them how. I refused to play along, politely. I did not consider that a worthwhile test of my skills, more of a humorous conversation. 

I got the job, took the job and absolutely loved my time there. More than anywhere else, even if I have loved working wherever I have been. 

The interview where they made me test notepad

This one was my foundational interview, for the first job I ever had in testing. I had no clue what testing was. They invited me to a classroom setup with many other people, sat me in front of a computer and told me to report discrepancies between the English and Finnish versions of Notepad. Some bugs had been seeded into the Finnish version, and I was expected to systematically report them. 

This is how I became a tester. 

The interview where we talked about meaningful work

To conclude this, I need to talk about my last job interview, the one that landed me the position I am in now. It was a pleasant experience of meeting people twice, to talk about my aspirations, my search for meaningful work and meaningful systems, and their organization. 

It felt painless, collaborative and appreciative. Then again, the people interviewing me were aware of me and my work, even if they did not know me. 

I'm very happy I accepted the position, and I am even more happy that they made the position something I could not have known to ask for. Carving the right shape for me is what I appreciate the most.

Others I don't remember specifically

I'm sure there are others. After all, I have been around a while. I have been loyal to my employer for the time I am there, and open about my ideas of what I want to spend my limited time on next. Knowing I will commit a minimum of two years and work to leave my places of work in a better state than they were has generally been helpful. 

Are your stories as varied as mine? 


Saturday, February 8, 2025

Evolving scoring criteria for To Do App for recruiting

I have used various applications for assessing people's (exploratory) testing skills in recruiting. While it's already a high-stress scenario, I feel there needs to be a way to *test something* if that is the core of the job you'd get hired for. I may believe this because I, too, was tested with a test of testing 28 years ago. 

Back when I was tested, the application was two versions of Notepad, one the original English and the other a localized version with seeded bugs. My setting for doing the test was one hour in front of a computer in a classroom, being observed. There were 8-12 of us in total for each scheduled session; we were all nervous, did not know each other and most likely never even introduced ourselves. We did our thing, reported the discrepancies we identified as clearly as we could, and got a call back or not. I remember they did these tests at scale. The testing teams we had weren't small, and we were all total newbies. 

This week I assigned the To Do app as the thing to test. For newbies, I make it a take-home assignment and recommend spending under two hours. For experienced testers, I spend half an hour face to face out of the maximum two hours of interviewing time we allocate. The candidates' work is not back yet, but my own work of looking at the application got done. 

The most recent form of this take-home assignment is one where I give two implementations of the To Do app, and explain that they are what developers use to show off their best practices for applying front-end frameworks.
  1. Elm: https://todomvc.com/examples/elm/
  2. Angular: https://todolist.james.am/#/

I ask for outputs:

  • A clearly catalogued listing of identified issues you’d like to give as feedback to whoever authored the version
  • Listing of features you recognize while you test
  • Description of how you ended up doing the assignment
  • (optional) example of test automation in language and framework of your choice
The previous post listed the issues I am aware of, and today I created a scoring grid for the homework, in case you have your own version or would like to talk about mine. I am still hoping for the day when some of the people doing the assignment *surprise me* by reading about how to approach these exercises from my blog. 

I can't expect complete newbies to get to all of it, but what worries me the most is that too many of the seasoned testers don't even scratch the surface. We still expect testers to learn testing by testing, often without training or feedback. 

To Do Application -Assessment Grid

ESSENTIAL INSIGHTS
Architecture: frontend only
Same spec for both implementations
Material online to reuse
Reading the room, clarifying assumptions
Optional is chance to show more
Presenting your work is not just description of doing

ESSENTIAL ACTIONS
Research: find the spec as it is online
Research: ask questions
Meta: explain what and why you do
Learning: showing something changed in knowledge while testing
Bias to action: balance explaining and results
Modeling function, data, environments
Recognizing tools of environment
Choosing a constraint to control perspective
Stopping criteria: time or coverage
Classifying and prioritizing
Clarity of reporting issues
Reporting per implementation and common for both
TL;DR - expect lazy readers
Using and explaining a heuristic
Awareness of classes of data (e.g. naughty strings)
Surprise me (e.g. screenshot to genAI)

RESULTS
Functional problems (e.g. off by one count, select all, tooltip)
Functional problem, browser dimension (e.g. persistence, icon corruption)
Usability problems (e.g. light colors, lack of instructions)
Implementation problems (on console), e.g. messages in code and errors in console
Data-related problems: creating empty items
Data-related problems: trim whitespace
Data-related problems: special characters
Missing features (e.g. order items)
Typos
In-app consistency (e.g. always visible functionality that does not always work)

AUTOMATION
Working with selectors
Reading error messages
Scenario selection
Locators
Structure and naming
Describing choices
Readme for getting tests to run

MISTAKES THAT NEED EXPLAINING
Overfocus on locators while application is unknown and automation is not in play
Wanting to input SQL injection string
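
To make the AUTOMATION rows a little more concrete, here is a minimal sketch of the kind of optional automation example I could imagine receiving. It uses Playwright Test against the Elm implementation linked above, assumes the standard TodoMVC markup (.todo-list, .todo-count), and the scenario and locator choices are an illustration rather than a model answer.

```typescript
import { test, expect } from "@playwright/test";

// Scenario selection: adding items and completing one is the core flow of the app.
test.describe("To Do app - core flow", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto("https://todomvc.com/examples/elm/");
  });

  test("added items are listed and counted", async ({ page }) => {
    // Locator choice: the visible placeholder text is a stable, readable handle.
    const newTodo = page.getByPlaceholder("What needs to be done?");
    await newTodo.fill("buy milk");
    await newTodo.press("Enter");
    await newTodo.fill("walk the dog");
    await newTodo.press("Enter");

    await expect(page.locator(".todo-list li")).toHaveCount(2);
    await expect(page.locator(".todo-count")).toContainText("2");
  });

  test("completing an item removes it from the active filter", async ({ page }) => {
    const newTodo = page.getByPlaceholder("What needs to be done?");
    await newTodo.fill("buy milk");
    await newTodo.press("Enter");

    await page.locator(".todo-list li").first().getByRole("checkbox").check();
    await page.getByRole("link", { name: "Active" }).click();
    await expect(page.locator(".todo-list li")).toHaveCount(0);
  });
});
```

A readme on getting the tests to run and a few sentences on why these scenarios and locators were chosen would cover most of the rows above.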

I ended up cleaning this all and making it available at GitHub: https://github.com/exploratory-testing-academy/todoapp-solution 

Monday, February 3, 2025

That Pesky ToDo app

While I am generally of the opinion that we don't need injected problems in applications that are already target-rich as is, today I went for three versions of a well-known test target, namely the ToDo MVC app.

Theoretically this is where a group of developers show how great they are at using modern JavaScript frameworks. There is a spec defining the scope, and the scope includes a requirement for this to work on modern browsers (latest Chrome, Firefox, Opera, Safari, IE11/Edge). 

So I randomly sampled one today - the Elm version, https://todomvc.com/examples/elm/

I took that one since it looked similar in styles to what Playwright uses as their demo, https://demo.playwright.dev/todomvc/, while the latest React version already has the extra-light styles updated to something that you are more likely to be able to read. 

I also took that one since it looked similar to the version flying around as a target of testing with intentionally injected bugs, https://todolist.james.am/.

My idea was simple: 

  • start with the app, to explore the features
  • loop to documenting with test automation
  • switch over implementations to see if the automation is portable over various versions of the app
I had no idea of the rabbit hole I was about to fall into. 

The good-elm-version was less good than I expected: 
  1. Select all does not work
  2. edit mode cannot be escaped with esc
  3. unsaved new item not removed on refresh
  4. edit to empty leaves the item while it should be removed
  5. edit to empty messes the layout and I should not see it since 4) should be true
So I looked at the good-latest-react version, only to learn persistence is not implemented. 

And that is where the rabbit hole went deep. I researched the project materials a bit, and explored the UI to come up with an updated list of claims. The list contains 40 claims. That would let me know that good-elm-version was 90% good, 10% not good. 


Looking at the bug-seeded version, there's plenty more to complain about: 

  1. Typos, so many typos: need's in placeholder, active uncapitalized, toodo in instructions
  2. "Clear" is visible even when there are no completed items to clear
  3. "Clear" does not clear, because it is really "Clear completed"
  4. Counter has off by one error
  5. Placeholder text vanishes as you add an item, but returns on refresh
  6. Sideways 'a' as the icon for "mark all as complete" is not the visual I would expect, nor is the 'A' with ~ on top for deleting - on Chrome, after using it enough, but the state normalized on a forced refresh. 
  7. Select all does not unselect all on second click
  8. Whitespace trim is not in place if one edits items with whitespace, only when items are shown
  9. <!-- STUPID APP --> in comments is probably intentionally added for fun
  10. ToDo: Remove this eventually tooltip is probably added for fun as well
  11. Errors on missing resources on console are probably added for fun too
  12. "Clear" is missing the counter after it the spec asks for
  13. usability with clear completed, since its functionality only works on filters all and completed, does it really need to be visible on the active filter
  14. URL does not follow the technology pattern you would expect for the demo apps. 
In statistics against the listing of features though, the pretty list of capabilities is hard to map to the messiness of the issues: 

✓ should show placeholder text
✓ should allow to clear the completion state of all items
✓ should trim entered text
✓ should display the current number of todo items
✓ should display the number of completed items
✓ should be hidden when there are no items that are completed
✓ should allow for route #!/
 
7/40 (17.5%) does not feel essentially worse, but then again, there are many types of problems that the list of functional capabilities does not lead one to. 

There is also the usability-improvement-conversation type of feedback that holds true for both versions. 
  1. The annoyingly light colors where seeing the UI and instructions is hard
  2. None of these allow for reordering items and it feels like an omission even if intentional
  3. None of these support word wrapping
  4. Usability of the concepts "active" and "completed" for to do items is a conversation: are there better words that everyone would understand more clearly? 
  5. Usability with a mouse: there's no adding with a mouse, even if that feels by design
  6. Usability of the whole design of the router / filter concept can be confusing, as you may have a filter that does not show the item you add
  7. Stacked shadow effect in the bottom makes it seem like there are multiple layers. This does not connect with the filters / routing functionality well. 
  8. Delete, edit and select all options take some discovering. 

You could also compare to what you get from a nicely set up demo screenshot of the bugged version. 





The pesky realization remains: seeding bugs is unfortunately unnecessary. While I got "lucky" with elm-version's four bugs, I also got lucky with the refactored react version that is missing implementation of persistence. 

There's also an idea that keeps coming up with experienced testers, one that we really need to stop throwing in at random: SQL injections. For a frontend-only application without a database, it makes very little sense, unless you can continue your story with an imagined future feature where the local storage JSON gets saved and used through an integration. Separating what is true now from risks for the future is rather relevant in communicating your results. 

Playing more with the automation is left for another day. The 9 tests of today were just scratching the surface, even if they 100% pass on the Playwright practice version and don't on any of the others. 
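
As a sketch of the portability idea from the start of this post, assuming Playwright Test and the mostly shared TodoMVC markup, the same claim-based test can be looped over the implementations. The selector fallbacks are there exactly because the implementations do not share identical markup.

```typescript
import { test, expect } from "@playwright/test";

// Same spec, different code bases: run identical claims against each version.
const implementations: Record<string, string> = {
  "elm": "https://todomvc.com/examples/elm/",
  "playwright-demo": "https://demo.playwright.dev/todomvc/",
  "bug-seeded": "https://todolist.james.am/#/",
};

for (const [name, url] of Object.entries(implementations)) {
  test.describe(`${name} implementation`, () => {
    test("counter displays the number of active items", async ({ page }) => {
      await page.goto(url);
      // Older and newer TodoMVC versions name the input field differently.
      const input = page.locator(".new-todo, #new-todo").first();
      await input.fill("one");
      await input.press("Enter");
      await input.fill("two");
      await input.press("Enter");

      // Claim from the spec: the counter shows how many active items remain.
      await expect(page.locator(".todo-count, #todo-count").first()).toContainText("2");
    });
  });
}
```

Given the off-by-one counter noted above, this is exactly the kind of spec-derived claim where the bug-seeded version diverges from the others.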


Saturday, January 4, 2025

Framing pain to gratefulness

Today, I shed a few tears for feelings I needed to let out. Processing those feelings today, giving them the time box they needed, was not sufficient, and I felt the need to write about them. 

I was in my feelings of pain because I made the final calls choosing the four lucky people who get awarded SeleniumConf Valencia 2025 full scholarships, including international travel, accommodation and conference tickets. I know I should be happy for the four, but today I am feeling the pain for the 269 who got listed but received a no. 

I volunteer with the Selenium Leadership Committee. This year I was trying very hard not to volunteer with SeleniumConf which is our flagship event, but some things are just too important to not show up for, especially if absence risks them. These scholarships are one of those things for me. 

I would like to see a world where all conferences set up a few free places, with or without travel included, to change the face of the industry at conferences. It does not happen just from a great and worthwhile idea; it needs someone doing the work. My tears today were part of that work, because the work is not easy. Not doing it is so much easier. 

Selenium is a community that has been founded and built with a leadership that understands diversity needs work. I joined the PLC because I saw that in action before my time. Talent is distributed equally, opportunity is not. And opportunities can and should be created to balance that. 

The scholarships have been a part of the SeleniumConf concept for some time now, and we do them to bring participants in for free, in hopes of building them up to be speakers and contributors in the world. Last summer we also started another form of scholarship, one for speaking about Selenium with underrepresented voices at conferences other than SeleniumConf. 

I feel the pain of organizing 273 brilliant professionals in need of an opportunity, reducing the selection in the end to disabled, black, gay and women. I feel the joy for the lucky four that wouldn't exist without my pain. And working through that pain, I remember again to frame this with gratefulness. 

In other communities, I would have first needed to fight for such a budget to exist. The Selenium project already knows this is necessary. I'm grateful this is a routine we go through. 

The work for opening doors needs to be distributed. I am grateful I am in positions where holding the door is possible for me. 

With that said, I am looking forward to meeting the four great people that ended up on top of the shortlist. They will make my conference time just a little bit more joyful, and you all joining will be able to meet them as participants just like everyone else. Introducing them to everyone else participating, without the association to underrepresentation that opened the door, will be part of what makes my socially awkward form of extroversion in conferences a little easier. 

The change I want to see requires work. If it is a change you want to see, please volunteer for the work. I am happy to pass on the torch in a community that already holds space for it. 


Thursday, January 2, 2025

Socializing requirements

There's an article making the rounds in the Finnish software communities about the greatness of low-code platforms. As the story goes, a public servant learned to create a system he had domain expertise in on the side of his job, and the public organization saves a lot of money. 

The public servant probably would not have learned higher-code tooling to do this, but did manage to build a LAMP-stack application with one of the low-code tools. 

The conversation around this notes the risks of maintenance - whether anyone else can or will take the system forward - but also the insight that a huge part of building software is communicating domain expectations between people with different sets of knowledge. The public servant explaining what the system should be like to someone who could use tools to build something like this would probably have been a project effort on its own scale. 

The fewer people we need to complete the full circle of sufficient knowledge to build something in a relevant timeframe, the easier it is. Some decades in the domain, and intricate knowledge of where the pains and benefits lie, most likely helped with the success. 

There are days when I wish I could just stop communicating with others, trying to explain what the problem we are solving is, and just solve and learn it myself. Those are days when I refer to myself as a #RegretfulManager. Because working at a contained scale with fewer people in the bubble is easier, progress feels faster, and it's really easy to work under an illusion that value to me is value for everyone, and that I did not miss out on anything in security, maintenance or impacts to people who aren't like me. 

---

Another article making the rounds in the Finnish software communities is about delivering a system with some hundreds of requirements, and then having a disagreement on who is responsible when a lot of things turn out to be missing or incorrect per the interpreted expectation. The conversation around that one makes the point that when there is room for interpretation, the more complete interpretation of a requirement is the requirement. 

The conversation about interpretations continues in court, even if it currently dwells in the articles. We'll eventually learn of agreements constraining the parties in making their interpretations, and by being in court, everyone is already failing. 

---

Over the years of working with requirements from a testing perspective, I have come to learn a few things that these articles making the rounds nicely illustrate: 

  • Just like test plans aren't written but socialized, requirements are not written but socialized. Interpreting is socializing. And examples, describing what is (descriptive), are complete requirements, describing what should be (prescriptive). 
  • Features are well defined when we have a description of what is now and a description of what will be after, that is, what should be. The journey needs stepwise agreements. 
  • No matter how well you prescribe and describe, you are bound to learn things someone did not expect. It's better to discuss those regularly rather than at the end of the project with 'acceptance testing'. Let your testing include starting those conversations.