
Saturday, 4 February 2017

Beware tests that don't test what you think they do

While migrating tests from one framework to another, we came across a test suite that taught us a valuable lesson: treat your automation with a level of distrust unless you have valid reasons, or proof, that it is doing what you expect.

This test suite was seemingly perfect: it was reliable, ran against all the supported releases and had a run time of 5 minutes.  Its name was simply the name of the technology that it tested.  The only problem was that if it was going to be used as part of our CI pipeline, it would need to be migrated to our new test framework.

Most of the components of the test were easily reusable; all that needed to be migrated was the code that provisioned the test system and invoked the test programs.  This took a few days of effort and I was left to review the newly migrated test suite.  Everything looked fine.  The only issue I had was that the test method names were a little undescriptive.  I spoke to the engineers and asked them to run the test in debug mode to understand exactly what it did.  I didn't want them to do any static analysis of the source, but to see what the test did at runtime.

A day later I met with the engineers to see how they were progressing; they were hitting problems.  As far as they could see, the test never invoked any of the APIs they would expect given the name of the test suite.  I asked them to show me and I had to concur: it did appear that the test didn't use the technology we were expecting.

I gave the engineers a list of diagnostics and further tests to double-check this finding.  After this was done, it was clear the test was simply not testing the function we thought it did.

The problem with this test is clear.  There had been an assumption that the test 'did what it said on the cover', and since it always passed it was considered a great test asset.  In fact it was probably the worst test we had: it built a false level of confidence in the team and could have let regressions into the field.

Naturally we have deleted all evidence of this test suite from all source code repositories and test archives.  However, even this action was not without complaints.  A lot of the team felt that, even though it was clear the test didn't do what we expected, it did do 'something' and so we should keep it.  This is the wrong thing to do!

I agree that the test did execute some of the SuT function.  However, the code it did execute was only exercised, not tested.  If that code regressed, would the test actually report the problem?
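To make the exercise-versus-test distinction concrete, here is a contrived pytest-style sketch (nothing to do with the original suite; every name is invented). The first test only exercises the code and would pass even if transfer() silently did nothing; the second asserts on the outcome, so a regression would actually be reported.

```python
# Contrived example: a tiny in-memory account store and two tests against it.
class Accounts:
    def __init__(self):
        self.balances = {"A": 100, "B": 0}

    def transfer(self, src, dst, amount):
        self.balances[src] -= amount
        self.balances[dst] += amount


def test_transfer_exercise_only():
    accounts = Accounts()
    accounts.transfer("A", "B", 50)   # runs the code, checks nothing


def test_transfer_checks_outcome():
    accounts = Accounts()
    accounts.transfer("A", "B", 50)
    # The assertions are what turn execution into testing.
    assert accounts.balances["A"] == 50
    assert accounts.balances["B"] == 50
```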

Because the test always passed, it wasn't until we decided to migrate it that this problem was uncovered.  Tests that fail regularly (whether due to a regression or a test defect) at least get eyeballs on them, and that scrutiny implicitly validates that the test does something useful.

So what did we learn from this?

  • Code coverage is great at ensuring that a test is at least executing the code you think it is (a minimal sketch of such a check follows this list).
  • If a test executes the code you expect and it regularly passes, it is worth checking that it is testing the code and not just exercising it.
  • We needed more tests in this area - we have now added them and ensured that they actually do what we expect.
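As a rough illustration of the first point, this is the kind of coverage check we mean, assuming coverage.py and pytest are available; the paths myproduct/widget_api.py and tests/ are made up for the example.

```python
# Measure whether the suite actually executes the module it is named after.
import coverage
import pytest

cov = coverage.Coverage(include=["myproduct/widget_api.py"])
cov.start()
pytest.main(["tests/"])          # run the suite under measurement
cov.stop()
cov.save()

# If this reports 0% coverage, the suite never touches the module it is
# named after, which is exactly the situation described above.
cov.report()
```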

Friday, 24 April 2015

Difference between load and stress - using a metaphor

Load testing and stress testing a component are two different test techniques that often get confused.  Here is an analogy, which I have adapted from a conversation I had with James O'Grady.

A load test is driving the car for 874 miles at an average speed of 60mph, in 5th gear, while using the air conditioning, cruise control and CD player: using lots of the capabilities of the car at their expected limits for a length of time.  During and at the end of the journey we would expect the car to still be operational and all the dials on the dashboard to be reading nominal values.  A stress test is a completely different type of test.

In a stress test we want to push the system beyond its limits.  Often the limits will not be clear, so the test becomes exploratory or iterative in nature as the tester pushes the system towards them.  If we reuse the driving analogy, we might start the same journey but now drive at 70mph in 3rd gear.  Initially we think this might be enough to stress the car.  After 60 minutes we increase the stress by removing some of the car's oil and deflating the tyres.  Now some of the dashboard lights are showing us that some of the car's components are stressed.  We then remove some of the coolant fluid and remove a spark plug.  Now the car is seriously under stress.  All the lights are on and eventually the car gracefully stops operating and we are forced to steer to the hard shoulder.  Once safe, we refill all the fluids and oil, re-inflate the tyres and refit the spark plug.  Now we are able to restart the car and resume our journey driving properly.

A stress test pushes the system beyond the limits it is designed to run at, either by restricting resources or by increasing the workload (or often both).  This is done until the system either gracefully shuts down or restricts further input until it is no longer under stress.
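As a toy sketch of that distinction in code, assuming a hypothetical client object with send_request() and healthy() calls (both invented for the example):

```python
import time


def load_test(client, rate_per_sec=60, duration_sec=3600):
    """Drive the system at its expected rate for a sustained period and
    check it stays healthy throughout (the 60mph-in-5th-gear journey)."""
    end = time.time() + duration_sec
    while time.time() < end:
        client.send_request()
        assert client.healthy(), "system degraded under expected load"
        time.sleep(1 / rate_per_sec)


def stress_test(client, start_rate=60, step=20):
    """Keep increasing the workload until the system degrades or sheds load,
    then back off and confirm it recovers once the pressure is removed."""
    rate = start_rate
    while client.healthy():
        rate += step                      # push past the design limit
        for _ in range(rate):
            client.send_request()
        time.sleep(1)
    time.sleep(30)                        # remove the stress and wait
    assert client.healthy(), "system did not recover after the stress was removed"
```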

Both tests are heavily contextual, as they rely on a deep understanding of how the software will be used in the wild.  Will a customer use the software for a long period of time under load, or do they just use it in short bursts?  This question becomes more important when you consider software built for the cloud.

If your software is built in the cloud and you are re-deploying every two weeks, then your view of load and stress testing will be different from that for an on-prem application, because the operational realities of using the software are contextually different.


Friday, 25 April 2014

Why No Test Cases

The test team I lead does not use test cases.  We believe that testing is an active, changing exercise and that rigid test cases do not support this rate of change.  I will give my arguments for not using test cases before explaining how we do plan and track our testing.

What is a test case?

A test case defines a process that will show whether the software you are testing displays a particular desirable attribute. It consists of three main parts:
  1. Starting condition
  2. Method
  3. End condition
It may also have other parameters:
  1. Priority
  2. Owner
  3. Status (open, in progress, blocked, closed)
  4. Weight
A low-level test plan is a collection of these test cases for a particular module or capability of the software being tested.  The test plan is reviewed and approved by stakeholders as a whole, rather than each test case being reviewed individually.
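As a sketch, the test case shape described above might look something like this in code; the field names and defaults are illustrative, not the schema of any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    IN_PROGRESS = "in progress"
    BLOCKED = "blocked"
    CLOSED = "closed"


@dataclass
class TestCase:
    starting_condition: str
    method: str
    end_condition: str
    priority: int = 3
    owner: str = ""
    status: Status = Status.OPEN
    weight: int = 1


# A low-level test plan is then just a collection of these.
low_level_plan: list[TestCase] = []
```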

Reasons for change

Not all test cases are created equal

To paraphrase George Orwell, "all test cases are created equal, but some test cases are more equal than others".   Each test case differs from the others in terms of its:
  • Risk
  • Complexity
  • Duration
Tests, when executed, reduce the risk of faulty software being shipped to the customer.  From the customer's perspective there are different levels of risk, some acceptable and some not.  A small input-validation error in a utility program might be merely annoying, but other errors could cause a business to lose money, or even cause loss of life in aviation or medical software.  Different tests mitigate different amounts of risk.

The complexity of a test case can also vary.  Modern software is rarely executed on its own and will likely have many integration points.  Configuring, executing and monitoring the software during the test can be a complex task, while other test cases are far simpler.

Tests not only take time to execute; they can also take considerable time to prepare, from building the required test configurations to developing the test harness.  On top of the tester's time for these tasks, longevity or workload tests can take an extended period of elapsed time to run, and validating the execution data once the test has finished adds further to the duration of the test case.

Some tests have a natural ordering between them.  Quite often a test builds on top of previously run tests, and requires those earlier tests to pass before it can be run at all.

The above points show how varied a set of tests can be.  Forcing these many and varied tests into a single unified 'test case' template hides the differences between them.  Once in this form they are counted as if they were all identical in duration, risk and complexity.  Boiling this complexity down into a single template just hurts the testing.

Preparatory work

Any test case will require some degree of preparatory work before the 'actual test' can get underway.  This might be configuring automation, defining workloads or developing a test harness to run the test.  Is this work part of the testing process, or just a necessary delay before the real work starts?  Fellow testers seem to fall into one of two main schools of thought.  One group sees prep work as a necessary evil, work that must be done but isn't part of the actual testing.  The other group thinks that as soon as you start to interact with the new parts of the software, the process of testing has begun.  Either way, how do you classify this work?  It doesn't fit into the test case template at all, and yet the test cases are dependent on it being done.  When a team has 10 test cases remaining, you don't know what work needs to be done to execute those tests; they might need three days of prep work first.

Once the prep work is completed you are left with a set of test cases and a period of time to complete them in.  This turns the test case into a metric and ...

Someone always wants to track them

There are x test cases left to do and y amount of time left to do them in.  Simplistically, this is modelled as a straight-line graph.  The model assumes that each test can be executed in the same amount of time.
[Figure: a straight-line burn-down of test cases against time]
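The naive model in code, with invented numbers, just to show what the straight line assumes:

```python
# Invented numbers: the straight-line model assumes a constant execution rate,
# ignoring prep work, defect investigation and the varying size of each test.
remaining_cases = 120
cases_per_day = 8                               # the flawed constant-rate assumption
days_left = remaining_cases / cases_per_day     # 15.0 days, whatever the tests contain
print(f"Projected completion in {days_left} days")
```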

However, as we have said before, not all test cases are equal, so assuming that we can execute them in a linear fashion is short-sighted.  Also, as test work progresses, defects are raised which require recreation and verification, and this takes time that cannot then be spent executing tests.  Responding to this, the chart is often re-drawn as an s-curve like this one:

[Figure: an s-curve burn-down of test cases against time]

Progress is charted against the graph and teams are either congratulated or receive 'management focus' depending on the state of the chart.  This misses the point, and it's largely the fault of the test cases.  Progress through a list of predefined tests means nothing on its own.  For the progress to be meaningful you need to assume that the quality of the tests is high and that they mitigate the majority of the risk for the customer.  Even if you can make that assumption, counting completed test cases isn't enough to judge progress; you also need to consider:
  • How many defects testers are finding.
  • The spread of defects across the product.
  • Which tests have actually been done so far (have we just done the easy ones?).
  • What if we have completed 90% of the test cases and found no defects?
Without this additional information the complexity of testing is abandoned in favour of an overly simplified metric which is too narrow to be of any real use.

They don't react well to change

Testing is an exploratory sport: an experienced tester working through any set of test cases will question their approach based upon progress through the test cases, prior defects and often just a gut feeling.  Regardless of how much thought and effort is applied to formulating the test plan, the planned testing and the actual testing will differ.  This is because the tester is learning about the software, and how it was implemented, as they execute the tests.  A single test case might expose a bug that is part of a bug cluster within a piece of code.  To fully examine the area, new test cases will be created and executed on the fly.  This ad-hoc, off-piste or exploratory testing should result in further test cases being written and added to the plan.  Adding extra tests often infuriates managers trying to track progress, yet the new tests are driving further quality into the product; their value isn't in the test case being written but in it being executed and the defects it finds.

Test approaches such as scenario-based, use-case or exploratory testing do not lend themselves to a formal test case report.  They often do not have a particular method or starting condition, but instead tend to rely on experienced, confident testers examining the product in the same way the intended customer would.  These approaches are better tracked using a time box or a complexity rating like duration or story points.

Test cases seem like a good idea.  However, in reality they are too simple a model to use in practice.  A single test case can define the intent of a test, but the model breaks down when used to suggest progress or the quality of the product.

I explained this to a junior member of my team and drew parallels between some exploratory testing he wanted to achieve and a prototyping story that development were planning.  Neither had a clear plan of implementation, but both had allotted time and a focus on learning how to achieve their aim by 'working with the software'.

This was when an alternative became clear.  This is just software engineering.  Can't we measure software development and test work in the same way?  Wouldn't that make things clearer?

What's the alternative?

Test work is just engineering work

If you stop thinking about testing as the execution of test cases, and instead as another type of software engineering work, then an alternative approach becomes obvious.  Testers still write code, examine specifications and solve problems in the same way as a developer.  We don't track developers by the number of changesets they produce, so why not treat testers in the same way and use the same tracking artefacts that are used in development?

We now use test stories for our system test work.  No test cases in sight.

A test story is similar to a development story.  It outlines the objective of the work, its priority, owner and complexity.  The difference is that in a test story the objective is more deconstructive than that of a development story: where a dev story is constructing something new, a test story is deconstructing it to reveal defects.

Test stories can be planned for an iteration and split into multiple child tasks that describe each individual unit of work.  The test story is given an estimate of the time it might take, a priority reflecting how important it is that this testing is executed, and a story-point rating to describe how complex it is to implement and execute.
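A sketch of what such a test story might look like as a record; the field names are invented, and a real team would map them onto whatever work-item tracker development already uses.

```python
from dataclasses import dataclass, field


@dataclass
class TestStory:
    objective: str            # what we are trying to break or learn
    priority: str             # how important it is that this testing happens
    owner: str
    estimate_hours: float     # elapsed-time estimate
    story_points: int         # complexity of implementing and running it
    child_tasks: list[str] = field(default_factory=list)
    linked_defects: list[str] = field(default_factory=list)


story = TestStory(
    objective="Explore failover behaviour under sustained workload",
    priority="high",
    owner="tester-a",
    estimate_hours=16,
    story_points=5,
    child_tasks=["build workload driver", "run 8h soak", "analyse logs"],
)
```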

Handling defects


Defects found during story execution are linked to the work item to show the 'fruit' of that story.  Development and test stories can be associated to show the testing that will be applied to new capability.

Defects that are found will often need recreating to provide further diagnostics, and will need verification once a fix is supplied.  This work has to be planned and committed to an iteration just like any other piece of work.  We have created queries to total the amount of time spent 'working' these defects, which helps to explain delays in test progress when testers are busy on defects.
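The queries themselves are tool-specific, but as a rough sketch of the idea, over a hypothetical export of work items with invented field names:

```python
def hours_spent_on_defects(work_items):
    """Total the time booked against defect recreation and verification,
    to explain where test-execution time went in the iteration."""
    return sum(
        item["hours_logged"]
        for item in work_items
        if item["type"] == "defect" and item["activity"] in ("recreate", "verify")
    )


# Example with made-up data:
items = [
    {"type": "defect", "activity": "recreate", "hours_logged": 4},
    {"type": "defect", "activity": "verify", "hours_logged": 2},
    {"type": "test", "activity": "execute", "hours_logged": 6},
]
print(hours_spent_on_defects(items))   # 6
```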

Showing progress
We can now show our progress by explaining the work we have committed to in an iteration and our burn-down throughout the iteration.  Dropping the simple metric of test cases prompts people to ask more interesting questions to determine the quality of the project:
  • What types of testing have we done?
  • What defects have we found?
  • How much time do we need to complete the high-priority work?
All more valid and useful than a single percentage of test case completion.

We have been using test stories as a replacement for test cases in the system test team for a few releases.  Using them has given us the following benefits:
  • We can see the total amount of test work that needs to be done, including prep, defect recreation and verification as well as actual testing, so we can plan our iterations a lot better.
  • Tracking is a lot easier, as we can understand the duration and complexity of the work rather than relying on simple counting measures.
  • We are starting to ask better questions about the state of the test project.

We are still not perfect.  We are still refining our process to estimate tasks better and to ensure we don't fall into the trap of just counting test stories.  Overall we have seen an improvement in our efficiency.  Who needs test cases?  Not us!

Saturday, 17 December 2011

Testing in the open

Over the last part of the year I have been looking at ways of sharing our internal test process with our customers. Some of our managers have been a little reluctant to spill the beans on how we test such a quality product, but here are my reasons why I think this is a good idea.

Confidence building - if you know how much time, effort and ingenuity go into the test process, that will build your confidence that the new release is ready for production use.

Customer relationships - extensive testing of the product shows that the manufacturer really cares about the product, and thus about the customers who buy it.

Sharing of ideas - if I tell you what scenarios I have run to test a particular component, it might spark potential usage scenarios for your business or application.

Now, I am not suggesting that we throw the doors open wide and allow every customer to peruse our test infrastructure. IBM is still in the business of protecting its intellectual property, so we do need to ensure that we protect any new ideas through publication or patent before talking about them.

A good solution would be to share our test plans with customers who are within our beta programme. If you are on the CICS beta programme, would you be interested to see how we are planning to test the new function to be delivered?

Tuesday, 11 May 2010

I found a bug (but one I wasn't meant to find)

The biggest issue with being a tester is that it is habit-forming. After a while you find yourself looking for bugs, and when things do fail you find yourself trying to recreate them so you can narrow down the exact set of circumstances that cause the bug.

I have found a bug on my blackberry.

My phone has a cool feature called bedside mode: in this mode all the radios are turned off, making it impossible for calls, email, SMS and so on to ruin a good night's sleep. The phone also has a feature that turns the radios off when battery power gets too low. Once the phone is being charged, the radios are turned back on automatically.

The above two functions work well in isolation but when they are tested together there is a bug. Here is my bug report:
Abstract: battery charging causes phone to turn radio on even in bedside mode

Description:
1) Let the battery drain so the radios are powered off automatically.
2) Plug the phone into the mains.
3) Turn bedside mode on.
4) Once battery power returns to normal, the radios turn on again (even though bedside mode is on) and your sleep is disturbed by incoming messages.

So what do I do now? If this were a project I was testing, there would be a well-defined process for me to raise this issue with the development team. But I am just a customer. I had a search on RIM's website and couldn't see a way to raise a defect or even contact the support team.

Tuesday, 20 April 2010

Quis custodiet ipsos custodes?

"Quis custodiet ipsos custodes?" or "Who will guard the guardians?" or in my case as a software tester - who tests the testers?

As a system tester I am the final link between software being 'in development' and it being live for real users to use. We are often seen as the guardians of the users, ensuring that, before it is made generally available, the software is as bug-free as we can possibly make it.

But whose job is it to test the system testers? Who are we accountable to? Or, to paraphrase, who "invites me for tea, biccies and a little chat" when things go wrong?

During my day job at IBM I provide the development organisation with high-level plans of what testing we plan to inflict upon their software. This plan is reviewed by both them and my test colleagues. During test execution we report how well we are doing against the plan.

So our intentions are reviewed and approved by the development stakeholders. However, testing is not an exhaustive activity: we can never reach 100% completion and state that the product is bug-free!

The people that get hit are the customers of the product. Using some configuration, use case or pattern that we didn't think of first, a customer finds a bug. Personally I hate it when a bug is found in a product I was testing. I ask myself: why didn't I find that? Why didn't I try that configuration?

Apart from the moral pain it may cause the tester, for the customer it may mean delays to their work schedule as they work around the problem or wait for a fix. But there is another organisation that is also impacted.

Most (if not all) software houses will have a support organisation whose job it is to fix bugs that customers have found in the product. They bear the brunt of a customer complaint when the software does not behave as expected. This group of people, more than test or development, understands what customers are doing when the software fails. They understand the usage patterns in play when a customer notices a bug in the software.

It is the service team that 'pays the price', along with the customer, when things go wrong. Perhaps this position means they should be allowed to guard the guardians, scrutinising test plans closely to ensure that the tester has thought of the common usage patterns a customer may follow.

I'm not suggesting that this approach will mean we ship 100% bug-free software, but perhaps it will mean that customers will not hit bugs when they are doing something that 'is normal for them', and that has to be a good thing.


Wednesday, 7 April 2010

Function Vs System Tests

I'm often asked "what is the difference between a functional and a system test - is a system test just a larger more complex functional test?"

A functional verification test (FVT) proves that a particular isolated function of a system works as designed. The function is given a particular input and the response from the system is checked against the 'known good' response. These tests also cover what happens when incorrect data is used as input, and input data that is on the edge of 'good' and 'bad'. I suppose the key point is that an FV test is quantitative: the input and output data can be described precisely, mainly because the function under test has been isolated from the rest of the system and is simple enough to be tested in this way.
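As a minimal FVT-style sketch, against a deliberately simple, made-up function: a known-good input, a clearly bad input, and values on the edge of 'good' and 'bad', each checked against an exact expected result.

```python
import pytest


def parse_quantity(raw: str) -> int:
    """Imagined function under test: accepts whole numbers from 0 to 999."""
    value = int(raw)                      # raises ValueError for bad input
    if not 0 <= value <= 999:
        raise ValueError("quantity out of range")
    return value


def test_valid_input():
    assert parse_quantity("42") == 42


def test_invalid_input_is_rejected():
    with pytest.raises(ValueError):
        parse_quantity("forty-two")


@pytest.mark.parametrize("raw,expected", [("0", 0), ("999", 999)])
def test_boundary_values(raw, expected):
    # The edge of 'good' and 'bad': minimum and maximum accepted quantities.
    assert parse_quantity(raw) == expected
```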

System verification testing (SVT) is often seen as 'bigger, better FV testing'. People often see SVT like this because a lot of effort has gone into developing the FV tests, and running them again, but harder and faster, seems like an efficient use of an existing resource. However, such a concept misses the real point: customers do not use the functions of a system in isolation. You can see this in domestic applications as well as large-scale enterprise software. Take a word processor, for example. A user does not use the save function in isolation; they use it alongside a document that contains pictures, is in a different format, or is actually stored on a network drive. Just testing the function in isolation is not what your customer will do.

The real aim of SVT is to test the system as a whole in the way a customer will use it. This means using all useful combinations of the functions of the system to perform a particular task. When doing this we are looking to see if the system behaves in a 'reasonable' way. We can't use a specific input and expect a specific output, as in a good system test there are just too many options and circumstances for this to be an efficient method.
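As a sketch of that system-test mindset, using the word-processor example above; word_processor and network_drive are hypothetical stand-ins for a real harness, and the point is the combined workflow plus the 'reasonable' (rather than byte-exact) checks at the end.

```python
def check_save_with_pictures_to_network_drive(word_processor, network_drive):
    """Drive a realistic combined workflow rather than one isolated function."""
    doc = word_processor.open("report.doc")        # a non-native format
    doc.insert_image("diagram.png")                # combined with embedded pictures
    remote_path = network_drive.path("report.doc")
    doc.save_as(remote_path)                       # combined with remote storage

    reopened = word_processor.open(remote_path)
    # 'Reasonable' behaviour rather than one exact expected output:
    assert reopened.text() == doc.text(), "text content lost on save"
    assert reopened.image_count() == 1, "embedded picture lost on save"
```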