Wednesday, November 30, 2011

Manual Testing Radiator

Two days ago the Finnish Association for Software Testing - which I just happen to chair - had an Exploratory Testing Dinner. The idea is to eat and talk testing, with some people volunteering to talk on a topic, and then letting others join in.

One of the three talks this time was by Petri Kuikka from F-Secure. He's my testing idol, someone I respect and admire, having worked with him for a few years at F-Secure. He shared a story on a manual testing radiator that I just need to pass on to others, even before they manage to open-source the system as they've planned.

In the years I worked at F-Secure, I brought in the borrowed idea of a testing dashboard, adapted from James Bach's materials. I introduced the idea of area-architecture-based reporting on two dimensions, Quality and Coverage. We used traffic lights for the first and numbers 0-3 for the latter. In some projects, I used a third element, a best-before commitment, which was basically an arrow showing whether we'd need to say "I'll change my mind about quality due to changes" by tomorrow / in a little longer / relatively stable. To collect the values, I used to gather people into a room for thumbs up / down voting, making each other's opinions visible and learning to report feelings-based data in a consistent way. I learned that the power of a crowd outweighs any metrics-based reporting in delivering the message.
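
For concreteness, here is a minimal sketch of what one row of such a dashboard could look like as data. The field names, the value sets and the Python form are my own shorthand for illustration, not an artifact from James Bach's materials or from F-Secure.

    from dataclasses import dataclass
    from typing import List, Literal

    # One row of the area-architecture dashboard as I used it: quality as a
    # traffic light, coverage on a 0-3 scale, and an optional "best before"
    # hint for how soon the assessment is likely to change.
    @dataclass
    class AreaReport:
        area: str
        quality: Literal["green", "yellow", "red"]   # the crowd's thumbs up / down feel
        coverage: int                                # 0 = not touched ... 3 = as deep as we intend to go
        best_before: Literal["tomorrow", "a little longer", "relatively stable"] = "relatively stable"

    # Example dashboard; the area names are made up.
    dashboard: List[AreaReport] = [
        AreaReport("Installation", quality="green", coverage=2),
        AreaReport("Update engine", quality="yellow", coverage=1, best_before="tomorrow"),
    ]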

The problem that I never tackled - but Petri and other current F-Secure employees have - is keeping this style of reporting up to date when you have continuous integration, with changes introduced by many teams, to areas both separate and common.

For the continuous integration build system, they have a status radiator for automated tests that comes directly from the tool of their choice. If a build fails, the visual radiator shows which part is failing, and when all is well (as far as test automation is able to notice), the whole picture of parts remains blue.

The manual testing radiator does the same, but with Facebook-like thumb-voting icons. If a tester clicks thumbs up, that's counted as a positive vote. Positive votes remain valid only for a limited timeframe: with continuous changes, if you did not test an area for a while, it's likely that you really know little about the area anymore. When several people vote thumbs up, the counts are summed and shown. Similarly, there's the possibility to cast two kinds of negative votes. With the thumbs down icon, the tester gets to choose whether the status is yellow ("talk before proceeding, concerns, bug report ID") or red ("significant problem, here's the bug report ID"). If a negative vote is cast, even a single one, the positive vote count goes back to zero. All the votes are saved in a database; the radiator is just the visible part of it.
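
To make the voting rules concrete, here is a minimal sketch of how I imagine the logic behind one area tile could work. The names, the length of the expiry window, the "blue when all is well" colour and the Python form are my assumptions for illustration, not F-Secure's actual implementation.

    from datetime import datetime, timedelta

    # Assumed expiry window for positive votes; the real value was not part of the talk.
    POSITIVE_VOTE_LIFETIME = timedelta(days=7)

    class RadiatorArea:
        """One area tile on the manual testing radiator."""

        def __init__(self, name):
            self.name = name
            # (timestamp, kind, bug_id) where kind is "up", "yellow" or "red";
            # in the real system the votes live in a database.
            self.votes = []

        def vote_up(self, when=None):
            self.votes.append((when or datetime.now(), "up", None))

        def vote_down(self, kind, bug_id, when=None):
            # kind: "yellow" = talk before proceeding / concern, "red" = significant problem
            self.votes.append((when or datetime.now(), kind, bug_id))

        def status(self, now=None):
            """Return (colour, positive_count) as shown on the radiator."""
            now = now or datetime.now()
            negatives = [(t, k, b) for t, k, b in self.votes if k in ("yellow", "red")]
            if negatives:
                # even a single negative vote resets the positive count to zero;
                # only thumbs-ups cast after the latest negative are counted again.
                # How a negative status clears after the bug is fixed was not covered in the talk.
                last_negative = max(t for t, _, _ in negatives)
                colour = "red" if any(k == "red" for _, k, _ in negatives) else "yellow"
                ups = [t for t, k, _ in self.votes
                       if k == "up" and t > last_negative
                       and now - t <= POSITIVE_VOTE_LIFETIME]
                return colour, len(ups)
            # positive votes only count while they are fresh enough
            ups = [t for t, k, _ in self.votes
                   if k == "up" and now - t <= POSITIVE_VOTE_LIFETIME]
            return "blue", len(ups)

The expiry is the part I find most interesting: confidence from manual testing goes stale on its own, just like the product changes underneath it.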

Another type of radiator they use shows progress on statuses at regular intervals. Like in a game of battleship, they mark down where they feel they are with product releases and end-of-sprint versions. This is to show that not tested does not mean not working - it means we really don't know. With scrum and automation, the amount of exploratory testing is still significant and guided by time considerations.

Petri talked about having the automated and manual testing radiators side by side, in visible locations at the office. I would imagine that creates positive buzz around the results that exploratory testing can deliver and that are hard for automation - assuming there's stuff that exploration finds in areas that automation claims are working fine.



Friday, November 18, 2011

Falling into a trap - talking of Exploratory Testing

I attended the public defense of Juha Itkonen's dissertation today. His doctoral dissertation, titled "Empirical studies on exploratory software testing", is on a topic particularly close to my heart, since it was the topic I started with but failed to keep striving for in the academic sense. Juha started in the same research group after me, took up the relevant topic and made all the effort to research and publish what he could in the limited timeframe (of nearly 10 years).

When Juha's opponent started, he went first for the definition of exploratory testing and for distinguishing exploratory testing from non-exploratory testing. This, I find, is a trap, and an easy one to fall into.

Juha explained some of the basic aspects of what he has summarized about exploratory testing, but struggled to draw the line between exploratory and non-exploratory. He pointed out that learning is essential in exploratory testing, and so is changing direction based on learning. And the opponent pointed out that typically people who test based on test cases also learn and add new tests as needed based on what they learned. Juha went on explaining that if you automate a script and remove the human aspect, that is clearly not exploratory. And the opponent pointed out that a human could look at the results and learn.

What Juha did not address in his defense, or in the papers I browsed through, is the non-linear order of the tasks. He still works on the older definition of simultaneous activities, and thus, in my opinion, misses the point of the tester being in control of what happens first, what next, what again, and how long each of the activities lasts to achieve a goal.

A relevant challenge (to me) is that all testing is exploratory. There's always a degree of freedom, of tester control and of learning when you do testing. Trying to make an experienced tester follow a script, even if such a thing existed, is not something that happens - they add, remove, repeat and do whatever needs to be done, including hiding the fact that they explore. All real-world approaches are exploratory testing to a degree.

I still have a problem to solve: if all testing is exploratory, what words should I use to describe the wasteful, document-oriented approaches that take a lot of effort while creating little value? There are a lot of real-world examples where we plan too much, and what we plan is the wrong thing, since we do the planning at the time we know the least. What's even worse, due to our belief in planning, we reserve too little time for the hands-on testing work and the related learning, and under schedule pressure we fail to learn even when the opportunity supposedly is there.

Low-quality exploratory testing is still exploratory testing. Getting to the value comes from skill. And with skill, you are likely to get better results with some planning and enough exploration while testing than by skipping the planning. And yet, even with skill, you could spend too much time on planning and preparing, due to the logistics of how the software arrives to you.

Ending with an idea from the opponent of today's dissertation: the great explorers (Amundsen - South Pole / an example coming from a Norwegian professor) planned a lot. What made the difference between those who were successful and those who were not was not whether they planned, but that they planned holistically, thinking of all kinds of aspects and keeping their minds open for learning while at it - making better choices for the situation at hand and not sticking to the plan, like carrying back stone samples at the cost of one's life.




Saturday, November 5, 2011

Management: 80 % feels better than 50 %

I never feel too good going back to Quality Center for the tests we just did, but this time we managed to use it so that it did not cause us too much trouble.

Our "test cases" in the test plan, are types of data. A typical test case on average seems to have 5 subdata attached to it. All in all, a lot of our planning was related to understanding what kinds of data we handle in production, what is essentially different and how we get our hands on that kind of subset.

We had two types of template tests in use for the steps:
  • Longer, specific process flow descriptions: basically we rewrote the changes that were introduced in the specification, in detail, into a minimum number of workflows, and ended up with about 40 steps and 5 main flows.
  • Short reminders of common process flow parts: these had 5 steps and basically just mention the main parts that the system needs to go through, with little advice for the testers.
Our idea was that for the first tests we run, we support the newly joined testers with the longer flows. And when they get familiar (they have the business background, so it doesn't take that much), we enable them to do things in an exploratory fashion, with the feeling of being in a service role towards their own colleagues - it's an internal system.

Before we started testing, I took a look at the documentation, the numbers and the allocated time, and made a prediction: we'd get to 50 % measured against the plan in the timeframe. After the first of our five weeks, we were at 4 %, and I was sure we'd stay far behind my pessimistic prediction.

We also needed to split the tests in the Test Plan into 5 instances of each test in the Test Lab, to make sure we would not end up showing no progress or being unable to share the work within our team, as a single thing to test was potentially a week's worth of work. We added prioritization at this point, and split things so that one of the original test cases could actually be started in week one and completed in week 5 - if it was ever completed.

We did our fine-tuning while testing: we moved from the detailed cases (creating more detailed documentation on what and how we were checking things) to the general ones, and with face-to-face time spent talking about risks, we cut down each actual test run in a different way, basically building a shared understanding of the types of things we had already seen and where we could accept the risk that something may not work but we won't bother with it, since there's stuff that is more relevant to know about right now.

For notetaking, I encouraged the testers to write down the stuff they need, but taught them that blank means there was nothing relevant to add. They ran the tests in the QC Lab and made their notes there in the steps. The notes we write and need are nowhere near the examples that circulate around session-based test management.

The funny part came from talking with some of the non-testing managers, when they compared my "50 % is where we will get" to the end result of 80 % as the metric was pulled out of Quality Center. It was (and still is) difficult to explain that we actually did end up pretty much at the level I had assumed and needed to cut out about half of what we had planned for - we had just decided to cut out varying amounts of each test, rather than something they decided to assume was comparable.

Yet another great exploratory testing frame with the right amount of preparation and notetaking. And yet, with all the explaining of what really happened, management wants to think there was a detailed plan, that we executed as planned, and that the metric of coverage against plan is relevant.

Friday, November 4, 2011

Ending thoughts of an acceptance test

I spent over a year in my current organization getting to the point where my team completed a project's acceptance test for the first time. Now, at a little over two years in, a second project's acceptance test is ending.

I'm a test manager in quite a number of projects that are long enough that I feel most of my time goes into waiting to actually get into testing. Two more will complete in the upcoming months, and another, longer project is just getting started.

I wanted to share a few bits on the now-about-to-be-finished acceptance testing.

Having reached this point, I can reflect back a month and admit, that I believed our contractor would not be able to find the relevant problems. I had reviewed some of their results, and noted that
  • system testing defects were mostly minor and there were not that many of them
  • it took about 4 days to plan & execute a test on average, 1 day to fix a bug when found during system testing
  • it took about 2 days to plan & execute a test on average, 0,7 days to fix during integration testing
When setting up the project, I had tried convincing the steering group to let the customer-side testers participate in the contractor's testing phases in a testing role, but was refused. We wanted to try out how things go when we allocate that responsibility over to the contractor side. So I wasn't convinced there would not be a number of "change requests" - bugs that the contractor can't or doesn't care to find, since they are not things you can read directly from the specifications.

Now that we're almost done with our testing, I can outline our results. We found a small yet relevant portion of problems during the project. I had estimated that if I could count the number of bugs to fix on the fingers of one hand, we'd be able to do this in the allocated one-month timeframe, and that's where we ended up. Another batch was must-have change requests - bugs by just another name. And about 80 % of the issues we logged reflected either bugs in current production (and mostly those we don't fix, although it's not so straightforward) or problems in setting up the test scenario for the chain of synchronized data. I just counted that we used, on average, 4,6 days to find an issue. If I take out the ones that did not result in changes, 20 days per relevant bug. I have no right to complain about the contractor's efficiency. And I had better do them justice, and make our management clearly aware of this too.

So, I guess I have to admit it: the contractor succeeded, even though I remained sceptical up until the very end. I had a really good working relationship with the customer-side test manager, and at this point I'm convinced she is a key element in this experience of success and in other ones still going on that are over schedule, with relevant concerns remaining on whether the testing we do on the customer side will ever be enough.

I just wanted to share, amongst all the not-so-fortunate stories I have, that this one went really well. The only glitch during the project was at the time when our own project manager needed planning support to take the testing through. After that, she was also brilliant and allowed me to work on multiple projects in a less managerial role, more as a contributor to the actual testing.

Saturday, August 20, 2011

Tester to testing is NOT like surgeon to surgery

I wrote a post earlier (quite some time ago). Today I realized I had comments on that post and other ones that I did not know of.

The main point I was trying to make is that some people may be right in saying you don't need testers - as in full-time test specialist team members - in scrum teams, you just need testing - the skills in team members who don't identify as testers.

James Bach made a good point in saying he has not met any / many people who would make serious commitments towards learning the skills needed in testing other than those who identify themselves as testers. I've met some, but too few.

However, this post is about arguing against the point he made that I placed in the title:
"It could also be said that you don't need surgeons -- only surgery"
or the milder version of the same that Ru Cindrea, a respected colleague within Finland, made: that you don't need developers either, only development.

I would argue that a tester's relation to testing is not quite the same as a surgeon's relation to surgery. I mean that in the sense that surgeons are not in the information-providing business the way testers are towards developers and stakeholders; they are more self-contained. Perhaps there would be a second surgeon in a surgery who looks out to provide a service to the other, but without knowing much about a surgeon's life, I would still guess that person is called a surgeon too.

If there was no development, there would be little testing related to the development that was never done.

We need people trained in the skills of testing. Well trained. Both those who identify as developers and those who identify as testers. The essential difference to me is that it is hard to keep up with the skills of testing in the limited hours available, let alone if I also had to use half - or more - of my time on development skills.

I just don't want to go with the assumption that developers can't do testing, when in sheer numbers I've witnessed more testers who can't test than developers who can't. It may not matter what you're called.

Wednesday, August 17, 2011

Testing Micromanagement

I spent last week at CAST 2011 in Seattle, talking with testing people and listening to the most interesting-sounding tracks I could find. One that left me thinking for a long time was by Carsten Feilberg, on managing testing based on sessions.

After his quite interesting talk, I had to ask whether he felt his style of SBTM included too much micromanagement, and naturally he did not feel that way. I wrote in my notes that a manager's responsibility is not to manage but to make sure things are managed, and I have been pondering my strong reaction to what I felt was micromanagement.

We briefly talked about the contextual differences, but way too shallowly to actually know the determining factors yet. One of the differences we identified is how we felt responsibility is allocated in our organizations, e.g. is the test manager responsible for coverage, or can that responsibility lie with the team members. My team members are subject-matter experts and I would not dream of controlling their work in chunks of less than two hours; I just teach them to do that themselves. Thus I have sessions that are "private" and sessions that are "shared", and I do heavy sampling to guide the team and check whether they are on the right track as I understand it.

I went to Wikipedia to check what micromanagement is: a management style where a manager closely observes or controls the work of his or her subordinates or employees.

Having thought about it for a week, I still feel SBTM as it has been described originally is a form of micromanagement. It's less granular than detailed test cases, but it was intended for high accountability - that comes with a cost. I most often don't feel I need that.

In a local discussion on whether SBTM is micromanagement or not, a lesson I picked up in agile circles several years back came to mind. In some material I read, there was a typical quadrants picture with two dimensions of building a self-organized team. One axis was Willing vs. Unwilling: whether there was attitude-building to do with the individuals in the team. The other was Capable vs. Incapable: whether they had deep enough skills to do the work that needed doing. For now, I think assumptions about where my team is on these scales are significant in deciding how often you'd need to check in to keep the notes good enough.

I feel lucky to have a team that is willing and mostly capable. And that my capabilities complement those that the team already has.

Saturday, March 5, 2011

Timing is essential!

In recent projects, I've had the pleasure of working within an organized frame for testing. The frame includes very traditional elements: first you design your tests and write them out for subject-matter experts to review, to receive comments saying that, based on your documentation, they are not sure whether what you do is enough and how you could improve. After most of the comments have been ignored for various reasons, the test execution phase starts, in which you're supposed to report which tests now pass, which fail, and which you have even tried out.

I work within an organization that provides content to this frame of testing. We have testers who supposedly could do the testing blindfolded, or with the help of other subject-matter experts. The frame is just for showing whether we've done what we were supposed to, and whether the schedule can hold up to its promises.

I'm having a small-scale rebellion within the frame.

For a particular area to test, I just did not manage to motivate myself into writing the tests as requested in the planning phase. It was a prioritization decision: with too many things to do, I dropped something that would most likely be of little value. For testing in a different way, I have significant support within my own organization. The organization guiding the testing work is external, but my good fortune is that my position as somewhat of an expert in testing allows me more flexibility than others get.

We skipped the "test planning phase". When "test execution phase" started, I trusted the 3 month timeframe would be enough for what we had thought and discussed on a very high level, checking a sample of production data, some hundreds of items. I assumed the people I work with knew how this particular thing is supposed to work, at least in some scale as they had participated in defining what we'd want.

We started executing tests as the version became available. We ran tests on a selected sample of production data, chosen against business-oriented criteria of commonality and the essential differences we'd run into. We had selected a corner to start from, so this was just the first step - we were not sure what other steps would be required, but there was an idea that perhaps about 4 different rounds of attacking the software would be needed. The first round included 61 samples of data - 61 SOAP messages where the input data was the only variable.

Half of the response messages included wrong answers. We dug deeper in our analysis and identified 13 separate issues that we reported. The same problems would occur in a lot of our samples, or at least that was our assumption. Now, about a month later, I know that 5 out of our 13 issues have resulted in a code change to get the result we expected. One is in the queue. The others were due to us setting up some of our data incorrectly - a typical problem in our domain, looking for the stuff in the wrong environment. With every step of our testing, we learned together and adapted what we'd want our next steps to be. This was possible and easy, as we had not yet written the test specification that would have fixed the contents of our tests.

However, as expected, the organized frame made its comeback. I received polite reminders about delivering our test specification, up until the point where I felt the "last responsible moment" had arrived. I reviewed examples of what the test spec should be like, and decided not to take the advice of the examples, which would have required me to write 61 test cases for the first round we did - documentation that no one really needs. Instead, I wrote four test cases, and no one can argue those are not test cases. One just happens to handle multiple data samples.

At this point, we know from running the first round that there's another round just like this one, with a tweak to one of the most essential variables in addition to the data. And we're safer in assuming, knowing our skills and understanding the application better, that four sets of different changes to the variables would cover the ground we need to cover to be safe with respect to the most significant risks. Thus, four test cases.

Yesterday evening, two days after having delivered the test specification for review, I ran the second batch of tests. There are other people to do some of the more detailed analyses, but already as I was running the tests, I looked at all the results in a matrix to spot trends and increase my understanding of what it would take to test this as far as we want to go. I realized that if I had run my tests one by one, as management wished for reporting purposes, there were problems I could not have spotted. They were obvious in a larger sample, but would not have been noticed in single samples.
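
To illustrate the idea rather than the actual system or tools we used (the function, field and file names below are hypothetical), here is a minimal sketch of running one request over a batch of data samples and laying the results out as a matrix, so that trends across samples stand out in a way single-sample reporting would hide.

    import csv
    from collections import defaultdict

    def run_sample(sample_id):
        """Hypothetical stand-in for sending one SOAP request for a data sample
        and pulling the interesting fields out of the response."""
        # Simulated response so the sketch runs on its own; replace with the real call.
        return {"status": "OK", "amount": "100,00", "category": "A"}

    def run_batch(sample_ids, fields):
        """Run every data sample and pivot the results into a field-by-sample matrix."""
        matrix = defaultdict(dict)
        for sample_id in sample_ids:
            response = run_sample(sample_id)
            for field in fields:
                matrix[field][sample_id] = response.get(field, "")
        return matrix

    def write_matrix(matrix, sample_ids, path="results_matrix.csv"):
        """One row per response field, one column per data sample, so a whole
        batch can be eyeballed for trends at a glance."""
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["field"] + list(sample_ids))
            for field, values in matrix.items():
                writer.writerow([field] + [values.get(s, "") for s in sample_ids])

    if __name__ == "__main__":
        samples = ["sample_%02d" % i for i in range(1, 62)]   # 61 samples, as in our first round
        results = run_batch(samples, fields=["status", "amount", "category"])
        write_matrix(results, samples)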

As I assumed, I also received the "how do we put these tests into our reporting" question, with a kind request to write the tests as they had envisioned for reporting purposes. I suggested two options:

1) four test cases, where one has been started but its state is "fail" until the bugs are fixed (which hides the detail of our progress, but serves as a rough way of seeing it)

2) add categorizing numbers to the test cases after we've run them, as we don't know the exact contents before we run them. Metrics-wise they'd be the same as the others after the tests have been run, but not before. We just don't want to do extra work that does not create value, and their being able to follow our progress in detail provides little value, if any.

Looking forward to seeing how this turns out, and I will write a better experience report later. I need these experiences to change the status quo and allow good people to do good testing in our segment. There should be more consideration of timing - when to do what - and of interleaving test design and execution. To allow my colleagues to work efficiently with better results than before, I need to help in creating a better frame. The time is right for that.