January 25, 2016

Behaviour testing with Gherkin

Tóth Ákos
Cloud Engineer

We live in a world where web services are increasingly complex architectural wonders, spanning many sets of components and precisely performing a wide variety of tasks for millions of users with an incredible, nearly 100% availability. Providing this kind of precision and availability consistently, however, requires several layers of reliability on the development side, including a reliable testing infrastructure. This blog post explores the basics of one of these layers: behaviour tests.


Gherkin - a quick overview

Gherkin is a metalanguage that describes expectations for any kind of application in an easily readable format. This enables anyone, even people without access to or understanding of the source code, to write tests. Gherkin-based tests define the expected behaviours of the application, but are not semantically limited in doing so: the tested application may be an API that returns a status code and relevant data in an HTTP response, or a custom application that runs locally and responds to data sent over UNIX sockets. Ultimately, it is the simplest Swiss Army knife when it comes to behaviour testing. An example of Gherkin-based test scenarios for a simple REST API:

Feature: Example REST API
    In order to use the REST API
    As a customer
    I want to ensure I receive correct responses.

    Background: For these tests, use signature version v2
        Given that I am using signature version 2.0

    @tag @other_tag
    Scenario: Test a GET request
        Given the API at the URL "http://localhost:1024"
        When I send a signed GET request to the "/article/1" path
        Then a 200 status code should be returned
         And the request body should contain the JSON representation of the "article" type

    Scenario Outline: Test retrieving different types
        Given the API at the URL "http://localhost:1025"
        When I send a signed GET request to the "/<type>/1" path
        Then a 200 status code should be returned
         And the request body should contain the <format> representation of the "<type>" type

        Examples: The tested types
        | type        | format     |
        | article     | JSON       |
        | rss_entry   | XML        |

The three primary keywords to note are Given, When and Then; these preface the lines that define the steps to be executed. Each test scenario is composed of a strictly ordered combination of these instructions: given some predefined state, when some actions are taken, then an expected set of results should be observable. Scenarios are grouped into a feature, which is effectively nothing more than a namespace, and a way for the reader to easily identify what sort of tests reside in the specific file. A feature may house a Background, which is a common set of Given instructions prepended to each scenario. The keywords And and But may be used in place of Given, When or Then; lines beginning with these keywords take on the behaviour of the preceding keyword.

Features may optionally have three strictly ordered comment lines, in the form of 'In order to (do something)', 'As a (role)', 'I want to (make something happen)'. These lines provide a story-like introduction to the feature and serve no other purpose than business value and explanation for the reader. Implementations of Gherkin do not interpret these lines as steps.

Gherkin also allows you to tag your features and scenarios. Tags are keywords prefixed with @ and separated by spaces. In addition to providing keyword-based information about the scenario, tags may also be used in most implementations to filter which scenarios should be run.

Scenario outlines allow the definition of test templates. A scenario is generated from each row of the supplied Examples table, except the first row, which defines the column names. The values from each row are substituted wherever the column name appears in angle brackets, such as <type> and <format> in the above example. The above outline generates the following scenarios:

Scenario: Test retrieving different types@1
	Given the API at the URL "http://localhost:1025"
	When I send a signed GET request to the "/article/1" path
	Then a 200 status code should be returned
	 And the request body should contain the JSON representation of the "article" type

Scenario: Test retrieving different types@2
	Given the API at the URL "http://localhost:1025"
	When I send a signed GET request to the "/rss_entry/1" path
	Then a 200 status code should be returned
	 And the request body should contain the XML representation of the "rss_entry" type
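The substitution that turns an outline into concrete scenarios can be sketched in a few lines of Python. This is only an illustration of the idea, not the code any Gherkin implementation actually uses; the function name expand_outline and its arguments are made up for this example.

```python
import re


def expand_outline(step_templates, header, rows):
    """Return one list of concrete steps per examples row."""
    scenarios = []
    for row in rows:
        values = dict(zip(header, row))
        # Replace each <name> placeholder with the value from the current row.
        steps = [
            re.sub(r"<(\w+)>", lambda m: values[m.group(1)], template)
            for template in step_templates
        ]
        scenarios.append(steps)
    return scenarios


templates = [
    'When I send a signed GET request to the "/<type>/1" path',
    'And the request body should contain the <format> representation of the "<type>" type',
]
header = ["type", "format"]  # the first row of the Examples table
rows = [["article", "JSON"], ["rss_entry", "XML"]]

for number, steps in enumerate(expand_outline(templates, header, rows), start=1):
    print("Scenario %d" % number)
    for step in steps:
        print("    " + step)
```

Each data row yields one scenario, which is why the two-row table above produces exactly two scenarios.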

Gherkin implementations

Another great advantage of Gherkin is that a testing suite exists for most popular languages - allowing teams to implement their behaviour tests in the same language as the code for the application itself. In order to assign semantics to each of these steps, they must be defined in a manner that is understood by a tool of choice.
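Conceptually, every one of these tools keeps a registry that maps step patterns to functions and looks up each step line in it. The following Python sketch shows that mechanism under simplifying assumptions: the names STEP_REGISTRY, step and run_step are hypothetical, the Given/When/Then keyword is assumed to have been stripped already, and the shared state is a plain dictionary.

```python
import re

STEP_REGISTRY = []  # list of (compiled pattern, function) pairs


def step(pattern):
    """Register the decorated function as the implementation for `pattern`."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(state, text):
    """Call the first registered definition whose pattern matches `text`."""
    for pattern, func in STEP_REGISTRY:
        match = pattern.match(text)
        if match:
            # Captured groups become positional arguments of the step function.
            return func(state, *match.groups())
    raise LookupError("Undefined step: " + text)


@step(r'^the API at the URL "(.*?)"$')
def set_url(state, url):
    state["url"] = url


@step(r'^a ([0-9]+) status code should be returned$')
def check_status(state, code):
    assert state["status"] == int(code)


state = {"status": 200}  # pretend an earlier step stored the response status
run_step(state, 'the API at the URL "http://localhost:1024"')
run_step(state, "a 200 status code should be returned")
print(state["url"])
```

The real tools add reporting, hooks and richer argument handling on top, but the matching step is essentially this lookup.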


Cucumber is the Ruby implementation for Gherkin. It makes full use of Ruby's unique syntax to make step definitions as smoothly readable as the tests themselves. To install the Cucumber tool, you simply need to install the gem itself:

 $ gem install cucumber

Installing the tool provides the cucumber binary, which examines all feature files in the working directory's features subdirectory, and attempts to match the steps within them with step definitions. Cucumber includes step definitions from all Ruby sources matching the pattern *_steps.rb within the features/step_definitions directory. To run your tests, you simply need to invoke the tool:

$ cucumber

The tool has a large number of options that let you tweak how the tests are run; they are not detailed in this post. An example file system structure that Cucumber would understand:

features                   # the base directory for Gherkin tests
|---- api.feature          # a feature file that contains a single feature, with any number of scenarios
|---- env.rb               # an environment file, included by Cucumber when launched
|---- step_definitions     # the subdirectory which contains all step definition sources
      |---- all_steps.rb   # a Ruby source file which contains Cucumber-style step definitions

An example Cucumber implementation for some of the steps in the above example:

# You can use regular expressions to gain full control over how the step definition is matched and tokenized.
# Each captured group is automatically assigned as a positional argument.
Given /^the API at the URL "(.*?)"$/ do |url|
    @url = url
end

# You can also use strings to define steps. Within strings, variables are denoted with a dollar ($) sign.
# Behind the scenes, the strings are converted to regular expressions and variables are substituted with (.*)
# send_signed_request is a helper assumed to be defined elsewhere (for example in env.rb).
When "I send a signed $method request to the \"$path\" path" do |method, path|
    @response = send_signed_request(method, "#{@url}#{path}")
end

# You can execute different steps from within steps. This effectively rewrites the scenario.
# This rewrite affects the output of the test run.
# The text passed to step omits the Given/When/Then keyword.
When "I send an article listing request" do
    step "I send a signed GET request to the \"/article\" path"
end

# You can use RSpec expectations to define what the results should be.
# Captured groups are strings, so both sides are converted to integers before comparing.
Then /^a ([0-9]+) status code should be returned$/ do |code|
    expect(@response.code.to_i).to eq(code.to_i)
end

# Cucumber allows the definition of several hooks.
# Before hooks run before the first step of each scenario.
Before do |scenario|
    puts 'Running scenario'
end

# After hooks run after the last executed step of each scenario.
# Execution of a scenario halts after the first failed step, or if there are no more steps to execute.
After do |scenario|
    puts 'Finished running scenario'
end

# Around hooks wrap scenarios. Cucumber passes a handle that you can use to execute the scenario itself.
Around do |scenario, block|
    block.call  # executes the scenario
end

# AfterStep hooks run after each step within a scenario.
AfterStep do |scenario|
    puts 'Finished executing a step'
end

# The execution of these hooks may be restricted to specific tags with the following syntax:
Before('@tag1', '@tag2') do |scenario|
    puts 'Executing a tagged scenario.'
end
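The conversion of string step definitions into regular expressions, mentioned in the comments above, can be illustrated with a short Python sketch. It only demonstrates the idea of replacing each $variable with a (.*) group; it is not Cucumber's actual implementation, and the function name string_to_pattern is invented for this example.

```python
import re


def string_to_pattern(step_string):
    """Turn 'I send a $method request' into an anchored regular expression."""
    # Split the string at every $variable, keeping only the literal text.
    parts = re.split(r"\$\w+", step_string)
    # Escape the literal text and join it with one capture group per variable.
    body = "(.*)".join(re.escape(part) for part in parts)
    return re.compile("^" + body + "$")


pattern = string_to_pattern('I send a signed $method request to the "$path" path')
match = pattern.match('I send a signed GET request to the "/article/1" path')
print(match.groups())  # prints ('GET', '/article/1')
```

The captured values are then handed to the step block in the order in which the variables appear in the string.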



Behave is noteworthy as a solid implementation of Gherkin in Python. Python, along with Ruby, is a popular language for heavy-lifting scripts, so Behave will likely be your implementation of choice if your application or its associated scripts are written in Python as well. Behave also comes as a tool that can be installed through Python's package manager, pip. To get started, simply type:

 $ sudo pip install behave

Be mindful of the Python version you would like to use, as the behave binary will be linked to the same Python version as the pip that was used to install it. If you would like to use Behave with Python 3, it is recommended to use pip3 to install it instead.

The Behave tool includes step definitions from the features/steps directory. The only naming restriction for the step definition files is that they must have a .py extension. Running the tests is as simple as it is with Cucumber:

 $ behave 

Behave sports a set of command line configurations comparable to Cucumber's in magnitude. These options, however, will not be detailed in this guide. An example file system structure that Behave would understand would be:

features                   # the base directory for Gherkin tests
|---- api.feature          # a feature file that contains a single feature, with any number of scenarios
|---- environment.py       # an environment file, included by Behave when launched
|---- steps                # the subdirectory which contains all step definition sources
      |---- common.py      # a Python source file which contains Behave-style step definitions

Behave uses a concise and easily readable Pythonic syntax for defining steps, making full use of Python's decorators. Each step definition is prefaced with a @given, @when or @then decorator, whose parameter is the step text itself. Behave is less flexible than Cucumber about mixing regular expressions and parsed strings: the matcher must be selected before the step definitions that rely on it are declared. The tool defaults to parsed strings. To switch to regular expressions instead, call:

import behave
behave.use_step_matcher('re')

Behave expects regular expressions to use named capture groups that correspond to the parameters expected to be passed to the step implementation function. An example Behave implementation for some of the steps in the above example:

from behave import given, when, then, use_step_matcher

# The default step matcher for behave is 'parse', which parses strings like the one shown here.
# No regular expression processing is done in parse mode. To switch back to parse mode, call use_step_matcher('parse').
# Each name in curly braces - {example} - is parsed from the step string and passed as a named argument.
# Python raises an error if the function expects an argument that is not present in the parsed string.
# A context is passed to each step definition. The context is an object shared by the steps of a scenario;
# attributes set in before_all or before_feature hooks stay available for the whole run or feature.
# You may freely add, modify and delete the attributes of the context.
@given('the API at the URL "{url}"')
def step_impl(context, url):
    context.url = url

# You can then retrieve variables from the context in later steps.
# send_signed_request is a helper assumed to be defined elsewhere.
@when('I send a signed {method} request to the "{path}" path')
def step_impl(context, method, path):
    context.response = send_signed_request(method, "{0}{1}".format(context.url, path))

# The context object also allows you to call different steps than the one currently being executed.
# Unlike with Cucumber, doing this does not change the output text displayed when running the test.
# execute_steps expects a unicode string, hence the u prefix for Python 2.
@when('I send an article listing request')
def step_impl(context):
    context.execute_steps(u'When I send a signed GET request to the "/article" path')

# If you opt to use the 're' step matcher instead, you gain full control over how the step is parsed.
# Remember that named groups must be used for regular expressions.
# Behave considers a step failed if the step raises any exception. The simplest way to verify data is therefore to use assertions.
use_step_matcher('re')

@then('^a (?P<status>[0-9]+) status code should be returned$')
def step_impl(context, status):
    assert context.response.status_code == int(status)

# Behave allows you to define hooks which run before or after any element of the test.
# Hooks are defined in features/environment.py rather than in the step files.
# The hooks are functions with the name before_<resource> or after_<resource>.
# These functions receive the context and an object corresponding to the resource as arguments.
# The resource may be: step, scenario, feature, tag or all. Tag hooks run before or after tagged sections.
# The "all" hooks run once at the beginning or end of the entire test run.
def before_all(context):
    print('Running test')

def after_scenario(context, scenario):
    print('Finished running a scenario')

# Behave passes the tag name without the leading @.
def before_tag(context, tag):
    if tag.startswith('tag'):
        print('Running a section tagged with @tag')
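The rule that a step fails when it raises an exception, and that the rest of the scenario is then not executed, can be sketched with the standard library alone. This is a simplified model rather than Behave's own runner: run_scenario and the step functions are hypothetical, and the HTTP call is replaced by a stand-in that pretends the server answered with a 404.

```python
from types import SimpleNamespace


def run_scenario(steps):
    """Run (name, function) pairs in order and return a status per step."""
    context = SimpleNamespace()  # shared object, playing the role of the context
    results = []
    failed = False
    for name, func in steps:
        if failed:
            # Steps after the first failure are not executed.
            results.append((name, "skipped"))
            continue
        try:
            func(context)
            results.append((name, "passed"))
        except Exception:
            # Any exception, including AssertionError, fails the step.
            failed = True
            results.append((name, "failed"))
    return results


def given_url(context):
    context.url = "http://localhost:1024"


def when_request(context):
    # Stand-in for a real HTTP call: pretend the server answered 404.
    context.status = 404


def then_status_ok(context):
    assert context.status == 200


def then_later_step(context):
    context.reached = True


results = run_scenario([
    ("Given the API URL", given_url),
    ("When a request is sent", when_request),
    ("Then a 200 status code should be returned", then_status_ok),
    ("And a later step", then_later_step),
])
for name, status in results:
    print(status, "-", name)
```

Because the third step's assertion fails, the fourth step is reported as skipped without being run.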



Behat is the PHP implementation of Gherkin and it should hit the closest to home for a Drupal developer. It uses a more bulky and verbose object-oriented PHP syntax, which should be a comfortable fit for those familiar with Drupal 8. The simplest way to install Behat is through Composer. You can insert the following lines into your project's composer.json to always get a version of Behat compatible with what is detailed below:

"require-dev": {
    "behat/behat": "^2.5"
}

You can then install Behat by invoking the Composer executable in the same directory as the composer.json file:

$ composer install

For details on how to install Composer, check its official documentation. Once Behat is installed, and assuming Composer's bin-dir is set to ./bin, you can invoke Behat by simply running the binary:

$ bin/behat

Like Cucumber and Behave, Behat also allows the usage of a respectable set of command line options. Behat scans for all .php files in features/bootstrap. Within these files, (by default) the class FeatureContext determines the way steps are run. An example directory structure:

features                        # the base directory for Gherkin tests
|---- api.feature               # a feature file that contains a single feature, with any amount of scenarios
|---- bootstrap                 # the subdirectory for the PHP sources
      |---- FeatureContext.php  # the PHP source file for the FeatureContext class

In Behat, each step definition is a method of the context class, FeatureContext. The role of each method is determined by the annotations in its docblock. The example implementation below illustrates this better than a description could:

<?php

use Behat\Behat\Context\BehatContext;

require_once 'PHPUnit/Autoload.php';
require_once 'PHPUnit/Framework/Assert/Functions.php';

class FeatureContext extends BehatContext
{
    public function __construct(array $parameters)
    {
        // Subcontexts are objects which alter the behaviour of the feature context by implementing new steps and hooks.
        // The use of subcontexts is optional.
        // Each subcontext is another class that extends BehatContext similarly to this one.
        // To use a subcontext within this context, call the following function:
        // $this->useContext('subcontext_alias', new SubContext());
        // where SubContext is the name of the class implementing the subcontext.
    }

    // By assigning the BeforeFeature annotation, this method becomes a pre-feature hook.
    // It will be called each time a new feature is being executed.
    // Note that the name of the method does not matter - the annotation determines what this method will do.
    // Feature-level hooks must be declared static.
    /** @BeforeFeature */
    public static function beforeFeatureHook() {}

    // Behat uses regular expressions to match step definitions. This is the same Given implementation as for the
    // Behave and Cucumber examples above.
    /** @Given /^the API at the URL "(.*?)"$/ */
    public function givenAPI($url) {
        $this->url = $url;
    }

    // Each captured group is passed as an argument, so both the method and the path are captured.
    // send_http_request is a helper assumed to be defined elsewhere.
    /** @When /^I send a signed (.*?) request to the "(.*?)" path$/ */
    public function whenSendRequest($method, $path) {
        $this->response = send_http_request($method, $this->url . $path);
    }

    // Like in Ruby and Python, the steps may be failed by throwing an exception.
    // Captured groups are strings, so both values are cast to integers before comparing.
    /** @Then /^a ([0-9]+) status code should be returned$/ */
    public function thenStatusCode($code) {
        if ((int) $this->response->code !== (int) $code) {
            throw new Exception("Expected code $code.");
        }
    }
}
?>

So which one should I choose?

There is no good or bad choice among these tools. Each of these behaviour testing suites performs the testing tasks in a comparable, adequate way, so your choice should be based solely on what makes sense for the project. A heavily PHP-oriented project should be tested using Behat, while a Go application is better suited to Python- or Ruby-based testing, depending on which language your related scripts use, if any.

Structuring your features - management vs. engineering

When writing tests in Gherkin, one consideration is the granularity with which scenarios are split into features. Each feature may contain an arbitrary number of scenarios, and your suite may contain an arbitrary number of features. But what are features exactly?

From a project management standpoint, each feature is a small compartment of tasks that may be performed using your product. For example, considering Drupal 7 crudely, one feature is that you can perform CRUD operations on nodes, and another is that you can perform CRUD operations on users. From an engineering standpoint, Gherkin features are more likely to be a set of scenarios with a common background: a shared set of circumstances which must be present for the testing.

For an ideally maintainable set of tests, both standpoints should be balanced, and the scenarios should be gathered into slightly larger features that encompass tasks performed under similar circumstances. The reason is that Gherkin provides the Background keyword: a list of givens which are set in stone for each scenario of the feature. Only one Background may be present in each feature, and it significantly reduces the number of givens that must be explicitly stated in each scenario. This should be the primary consideration when grouping scenarios into a feature.

Sticking with the Drupal example, suppose we want to test that, within a default installation of Drupal 7, an anonymous user, a registered user and an administrator see the expected results when performing CRUD on nodes and users. One way to organize this into features would be to group by entity type. This makes sense at first glance, as it yields one feature that tests the node subsystem and one that tests the user subsystem. Each feature would have three scenarios, one for each role.

However, the module being tested (node/user) is not actually a circumstance of the test, whereas the role of the browsing user is. Therefore this approach, while technically correct, does not minimize the number of givens written. Instead, we should consider common circumstances: out of the six scenarios, each pair shares the same given role. By grouping the scenarios into features by role (anonymous, registered, and administrator), we can place the role in the Background section of the appropriate feature, thereby halving the number of role givens. From a maintainability standpoint, this is the better choice.
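The arithmetic behind that claim can be checked with a small Python sketch. It models the six scenarios of the Drupal example and counts how often the role Given has to be written under each grouping; the function role_givens and its counting rule (a role shared by every scenario of a feature moves into the Background and is written once) are assumptions made for this illustration.

```python
from itertools import product

ROLES = ["anonymous", "registered", "administrator"]
ENTITIES = ["node", "user"]
scenarios = list(product(ROLES, ENTITIES))  # six (role, entity) scenarios


def role_givens(group_key):
    """Count how many times the role Given must be written for a grouping."""
    features = {}
    for role, entity in scenarios:
        key = role if group_key == "role" else entity
        features.setdefault(key, []).append(role)
    total = 0
    for roles_in_feature in features.values():
        if len(set(roles_in_feature)) == 1:
            total += 1  # one shared role: state it once in the Background
        else:
            total += len(roles_in_feature)  # roles differ: repeat it per scenario
    return total


print("grouped by entity:", role_givens("entity"))
print("grouped by role:", role_givens("role"))
```

Grouping by entity type requires the role to be stated in all six scenarios, while grouping by role needs only three Background entries.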


Automated testing is a very important part of developing a reliable product, and Gherkin is a tool that significantly simplifies the maintenance of complicated test processes and allows test scenarios to be written using an intuitive, natural syntax. This post contains almost all of the information you need to get started with it. In further posts in this series, we'll explain how to integrate your behaviour tests seamlessly with CI systems such as Travis or Jenkins. Is there anything you feel that we missed or should explain further? Let us know in the comments!
