The Testing Pyramid

If you are a good developer/tester, you probably ask yourself these questions 🙂

  • How many tests of each type should I have?
  • Do I need to write more unit tests?
  • Do I need to have fewer e2e tests?

Luckily, the test pyramid below is a rough guideline for how many tests of each type you should have. It is fairly self-explanatory: have a lot of unit tests and very few e2e tests. Why, you ask?

[Image: test pyramid]

Unit tests are cheap in terms of power, CPU usage, processes, etc., and e2e tests are the opposite. Unit tests give you fast feedback and tell you exactly where an issue is if one occurs. With an e2e test this is much more difficult. Because an e2e test is most likely using all the clients and services (or at least two or three), you have to dig further into the code to figure out what is going on.

The test pyramid also highlights a testing strategy. This is called ‘Bottom Up’, i.e.

  • Test the domain
  • Tests closer to the code
  • Integrate early
  • Use mocks or stubs
  • Visualise test coverage

What is testing?

Someone who is not in the technology world quite rightly asked me what exactly testing is. So, time to go back to basics.

Coming up with one strict definition of testing is a bit difficult but, put simply, testing is ensuring the quality of a product or application before it is released to customers. It is very unlikely that you will have a piece of software that is completely bug free. But it is the tester’s responsibility to ensure that the bugs that are critical and of high importance (i.e. consumer facing, making the customer’s experience worse) are dealt with and fixed as soon as possible. It is also the tester’s job to find bugs. Whether this is through manual testing or automated tests doesn’t matter. Find the flaws in the code and fix them (or get the developers to fix them!).

How do we test?

We test as the new code is developed or the old code is being refactored/changed. We write automated tests that can run as part of our integration tools. That way we don’t have to keep going back and re-testing the same things over and over again. Regression testing (the kind of testing where you check that previous functionality has not been broken) should ideally all be automated. So when a new piece of functionality has been developed, you will have automated tests for it, do some manual testing (to check the standard flows) and then run the regression tests. Then you are done and can sign off for a release 🙂

One other thing I do is change control. Because we release so frequently, I find it necessary to track the changes that are being made from one week to another. What I keep track of:

  1. The version number
  2. Details of changes
  3. Has the build passed?
  4. Is this the release version?
  5. When is it scheduled for release to production?
  6. Was the release successful or rolled back?
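
If you wanted to keep this list somewhere more structured than a spreadsheet, one row of it could be sketched roughly like this. The class name, field names and example values are all made up for illustration; this is not a description of any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReleaseRecord:
    """One row of a change-control log (all names here are illustrative)."""
    version: str                    # 1. the version number
    changes: str                    # 2. details of changes
    build_passed: bool              # 3. has the build passed?
    is_release_version: bool        # 4. is this the release version?
    scheduled_for: Optional[date]   # 5. scheduled production release date, if known
    outcome: Optional[str] = None   # 6. "successful" or "rolled back"; None until released

    def ready_to_release(self) -> bool:
        # Only a passing build that is marked as the release version may go out.
        return self.build_passed and self.is_release_version


# Example values are invented purely to show the shape of a record.
record = ReleaseRecord(
    version="1.4.2",
    changes="Add card type to transaction logs",
    build_passed=True,
    is_release_version=True,
    scheduled_for=date(2024, 1, 15),
)
print(record.ready_to_release())
```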


So if all is good and well, you have had your changes approved and you can release (yay!). But is it really yay? What if something goes wrong and you need to be able to fix it quickly… This is where monitoring comes in.

At the moment we are using Splunk and it is a really great log aggregator. You can take all your log files and get some meaningful information about what your users are doing. How many transactions are successful/failing? What cards are customers using? When are peak times? And in the case of errors, a graph can show you exactly which service is returning the error. What is great about seeing your logs returning useful information live is that they can also tell you that in some cases you are not doing great logging. And so, you can go and add better logging 🙂

A note about Splunk: the search mechanism it uses is all based on filters and field extractors. For example, let’s say you want to see transaction amounts against card type. You have to extract both of these fields from the logs, then run a search query based on these field extractors.
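
To make the idea of field extraction concrete, here is a rough Python analogue. It is not Splunk’s query language, and the log layout (key=value pairs such as card_type and amount) is an assumption made up for this sketch. It simply shows the two steps: pull both fields out of each log line, then aggregate one against the other.

```python
import re
from collections import defaultdict

# Hypothetical log lines; the key=value layout is assumed for illustration only.
LOG_LINES = [
    "INFO payment status=success card_type=visa amount=12.50",
    "INFO payment status=failed card_type=amex amount=40.00",
    "INFO payment status=success card_type=visa amount=7.50",
    "INFO healthcheck status=ok",
]

# One "field extractor" per field we care about.
CARD_TYPE = re.compile(r"card_type=(\w+)")
AMOUNT = re.compile(r"amount=(\d+(?:\.\d+)?)")


def amounts_by_card_type(lines):
    """Sum transaction amounts per card type, skipping lines missing either field."""
    totals = defaultdict(float)
    for line in lines:
        card = CARD_TYPE.search(line)
        amount = AMOUNT.search(line)
        if card and amount:
            totals[card.group(1)] += float(amount.group(1))
    return dict(totals)


print(amounts_by_card_type(LOG_LINES))
```

Notice that the extraction only works because every payment line logs the same fields in the same way.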

The key to using these kinds of tools usefully and successfully is to have meaningful logs in the first place. You have to have done that work. This is a place where you want to know instantly whether everything is okay or not.


Unit Tests

What is a unit test?

A unit test takes a very small piece of testable code and determines whether it behaves as expected. The size of the unit under test is not strictly defined; however, unit tests are typically written at the class level or around a small group of related classes. The smaller the unit under test, the easier it is to express its behaviour using a unit test, since the branch complexity of the unit is lower.

If it is difficult to write a unit test, this can highlight when a module should be broken down into independent, more coherent pieces and tested individually.

Unit testing is a powerful design tool in terms of code and implementation, especially when combined with TDD.

What can/should you unit test?

  1. Test the behaviour of modules by observing changes in their state. This treats the unit as a black box tested entirely through its interface.
  2. Look at the interactions and collaborations between an object and its dependencies. The dependencies are replaced by mocks, stubs, etc., so that these interactions can be observed.
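
A minimal sketch of both styles, using Python’s built-in unittest and unittest.mock. The Basket class and its payment gateway are hypothetical and exist only to give the tests something to exercise.

```python
import unittest
from unittest.mock import Mock


class Basket:
    """Hypothetical unit under test: a basket that pays through a gateway."""

    def __init__(self, payment_gateway):
        self._gateway = payment_gateway
        self._items = []

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)

    def checkout(self):
        # Delegates the actual charge to the dependency.
        return self._gateway.charge(self.total())


class BasketTests(unittest.TestCase):
    def test_total_reflects_added_items(self):
        # Style 1: black box, observe state only through the public interface.
        basket = Basket(payment_gateway=Mock())
        basket.add("book", 10)
        basket.add("pen", 2)
        self.assertEqual(basket.total(), 12)

    def test_checkout_charges_the_gateway_once(self):
        # Style 2: replace the dependency with a mock and check the interaction.
        gateway = Mock()
        gateway.charge.return_value = "ok"
        basket = Basket(gateway)
        basket.add("book", 10)
        self.assertEqual(basket.checkout(), "ok")
        gateway.charge.assert_called_once_with(10)
```

Saved in a file, these can be run with `python -m unittest`. Both tests run in milliseconds and, if one fails, it points at a single method.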

Purpose of unit tests

  • Constrain the behaviour of the unit
  • Fast Feedback


Failing tests in the pipeline

That situation where there are failing tests in the pipeline, you ask someone about it, and the response you get is ‘Oh, these tests are failing because so and so service is not running, so it’s fine; these tests can fail’. Sound familiar? I really dislike this response for three reasons:

  1. Why did we write these tests if we are going to be fine with them failing?
  2. Surely if they fail, this should be a flag that something is wrong (e.g. an unreliable, flaky service)?
  3. If these tests do not give valuable feedback and are useless, just get rid of them. A failing test build should mean that there is no release. If you release with a failing test build and something goes wrong in production then what are you going to do? The tests highlighted that something was wrong and we chose to risk it.

If tests need certain services running to pass, have those services running. If a service stops running for random reasons then how can you be confident that it won’t be the same on production?

If tests need certain data to pass, have that data. That data would be in live, right?

A test environment should simulate the live environment as closely as possible. We can test in it all we want, but the environment configuration should be almost identical.

Automated tests should always have a meaning and purpose, otherwise there is just no point having them.

Testing in an agile environment

Here at Sagepay we work in weekly iterations, i.e. the aim at the end of every weekly iteration is to deliver something new to the user, whether that is a developer, a customer or the manager of a business.

To help with this, we use Jenkins as our continuous integration tool. Once development is complete (we work on branches), the new code is merged to master. This will trigger builds on the pipeline that will deploy to the development environment and run the e2e tests (also on dev). The deployment to the QA environment is manual; I do this when one particular story is complete and is ‘releasable’. I run some basic manual tests on QA to make sure basic flows are working as expected (I have a checklist!).

What I have learnt from working in this way is that you have to have 100% confidence in the automated tests that are on the pipeline and the coverage that these tests have. Otherwise, it would not be possible to release weekly, and QA would almost always block a release if it were manual.

This leads on to pair programming. When we have decided which stories we are going to work on in a sprint, I pair with a developer and start discussing what tests we should have for this particular story. This usually starts off by looking at the acceptance criteria as a baseline. We write some basic tests (e2e tests are the easiest to write first) and make them fail. Unit tests/integration tests follow. I will be writing a separate blog post about the different types of testing and the purpose of each level of testing. Once these new tests have passed and regression tests have passed, I am confident that the new code we will release will not break anything.

One thing to note: my focus is not to test everything and anything. It is to focus on what matters to the user and what they will do. How are they using the application? That way you ensure you test what is necessary and what will give you the most useful feedback.


TDD (test driven development) and BDD (behaviour driven development) are usually used hand in hand in an agile environment. However, the terms are sometimes used as if they mean the same thing, which is not true. TDD/BDD, what’s the difference, eh? Well, to me, here it is:

TDD – the focus on writing tests and making them pass during the implementation phase makes the development process faster and more efficient, because the feedback is quicker and more useful. Always write tests first when you can (sometimes it is difficult to write tests or it may not make sense to test something so small – I will write another blog post about this).
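
As a small sketch of what "tests first" looks like in practice: the tests below would be written first and fail, and the function underneath is then the least code needed to make them pass. The function mask_card_number and its behaviour are invented for this example.

```python
import unittest


# Step 1 (red): these tests are written first. Before mask_card_number
# exists, running them fails with a NameError.
class MaskCardNumberTests(unittest.TestCase):
    def test_keeps_only_the_last_four_digits(self):
        self.assertEqual(mask_card_number("4111111111111111"), "*" * 12 + "1111")

    def test_short_input_is_fully_masked(self):
        self.assertEqual(mask_card_number("123"), "***")


# Step 2 (green): the simplest implementation that makes both tests pass.
def mask_card_number(number: str) -> str:
    """Hypothetical helper: hide all but the last four characters."""
    if len(number) <= 4:
        return "*" * len(number)
    return "*" * (len(number) - 4) + number[-4:]
```

Step 3 would be refactoring with the tests still green. Run the tests with `python -m unittest` after each step.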

BDD – real user behaviour and interactions drive tests and development. You are focused on what the user does, whoever the user may be. You will consider the standard user on one end and the extreme user on the other (i.e. someone who will click every button on your site or do strange things like doing a GET on a URL that should only accept a POST).
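
Here is a rough sketch of how those two users might turn into tests, written as plain Given/When/Then comments rather than with any particular BDD framework. The handle_payment function is a hypothetical stand-in for an endpoint that should only accept a POST; the status codes are standard HTTP ones.

```python
from typing import Optional, Tuple


def handle_payment(method: str, body: Optional[dict] = None) -> Tuple[int, str]:
    """Hypothetical stand-in for an endpoint that should only accept POST."""
    if method != "POST":
        return 405, "Method Not Allowed"
    if not body or "amount" not in body:
        return 400, "Bad Request"
    return 201, "Created"


def test_standard_user_submits_a_payment():
    # Given a user who has filled in a valid payment
    body = {"amount": 12.50}
    # When they submit it
    status, _ = handle_payment("POST", body)
    # Then the payment is created
    assert status == 201


def test_extreme_user_does_a_get_on_the_post_only_url():
    # Given a user poking at the payment URL directly
    # When they do a GET instead of a POST
    status, reason = handle_payment("GET")
    # Then they are told the method is not allowed, and nothing is created
    assert status == 405
    assert reason == "Method Not Allowed"
```

The test names describe what the user does, not how the code is built, which is the main difference from the TDD-style tests.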