Sunday, October 12, 2014

Saving screenshot after each step in py.test framework that runs Selenium

Testing websites with Selenium is fun. Py.Test framework is really cool.
Joining those two and running tests inside of PyCharm IDE makes you wanna cry with joy. That is if you previously worked with iMacros :)

But sometimes testing takes hours, is run inside of crontab and you only need to check the results. The TimeoutException or NoSuchElementException tell you nothing if you haven't looked what the browser state was. The screenshot would really be helpful.

Acutally screenshot made after each test, regardless of the result would be helpful too - for example to detect layout errors or errors not covered by asserts.
I found a way to do this in pretty simple way. We will need a fixture that

  • is used automaticaly with every test step
  • calls the selenium driver to make a screenshot

I found no way yet to learn test outcome and make use of it when doing screenshot (like naming file with ``FAIL`` prefix or so on).

The key feature here is that our screenshot-making fixture uses the selenium driver fixture. And it is the same fixture that is provided to test functions.

Say we have selenium driver fixture that is module-wise parametrized with list of countries. That is a new browser session is delivered to module functions in a loop of countries. Eeach test module gets a browser and is run in a loop of countries given in params=:

__author__ = 'sigviper'

import os
import pytest
import datetime

@pytest.fixture(scope="module", params=["pl", "de"])
def driver(request):
    """Selenium driver that creates a loop by countries"""
    class FirefoxImprovedBySigviper(object):
        def screenshot(self, request_for_test_function, session_test_timestamp):
            country = request.param     # module-wise request (each param creates new)
            out_dir = "/tmp/shutter-{timestamp}/{country}/".format(timestamp=session_test_timestamp, country=country)
            except OSError:

            fname = "{out_dir}/{country}_{module}_{function}_{timestamp}.log".format(
            f = open(fname, "ab")
            f.write(repr(self) + "\n")

    return FirefoxImprovedBySigviper()

@pytest.fixture(scope="function", autouse=True)
def shutter(request, driver, session_test_timestamp):
    """Screenshot-making fixture"""
    def fin():
        driver.screenshot(request, session_test_timestamp)


def get_timestamp():
    """Provide formatted current timestamp"""

@pytest.fixture(scope="session", autouse=True)
def session_test_timestamp(request):
    """Provide timestamp of test start time - fixed for a session"""
    return get_timestamp()

Now the driver must implement screenshot logic. Please take a carefull look on usage of request and request_for_test_function variables - they are not the same. The request is a attribute of driver fixture that is injected into FirefoxImprovedBySigviper class. The request_for_test_function parameter is given to screenshot function by shutter fixture that is ran before each test function call. The scope in test session of those variables is different. The module-wise request knows nothing about current function. On the other hand, request_for_test_function function-wise knows nothing about current param for the loop in driver. Consequently we need to use both.

Order of execution:

create session-wide timestamp and provide to all modules and functions
for each module create a Selenium driver:
    for each parameter in driver run all test functions in a module:
       save screenshot after each function call

Let's say we have two test modules: TestCatching and TestCopycatch each with three similarly named functions: *_ok, *_failed and *_raises
The test session in PyCharm looks like this:

The disk output with screenshot (here .log files, because I needed simple and quick sandbox for testing) looks like this:

viper@OptiPlex780:/tmp$ ls -lrctR shutter-2014-10-12_16\:16\:07/
razem 8
drwxrwxr-x 2 viper viper 4096 paź 12 16:16 pl
drwxrwxr-x 2 viper viper 4096 paź 12 16:16 de

razem 24
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 pl_test_catching_test_1ok_2014-10-12_16:16:08.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 pl_test_catching_test_2assert_failed_2014-10-12_16:16:09.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 pl_test_catching_test_3raises_2014-10-12_16:16:10.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 pl_test_copycatch_test_1copy_ok_2014-10-12_16:16:16.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 pl_test_copycatch_test_2copy_assert_failed_2014-10-12_16:16:18.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 pl_test_copycatch_test_3copy_raises_2014-10-12_16:16:19.log

razem 24
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 de_test_catching_test_1ok_2014-10-12_16:16:11.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 de_test_catching_test_2assert_failed_2014-10-12_16:16:12.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 de_test_catching_test_3raises_2014-10-12_16:16:13.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 de_test_copycatch_test_1copy_ok_2014-10-12_16:16:22.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 de_test_copycatch_test_2copy_assert_failed_2014-10-12_16:16:24.log
-rw-rw-r-- 1 viper viper 57 paź 12 16:16 de_test_copycatch_test_3copy_raises_2014-10-12_16:16:25.log

All screenshots saved. Easy to find and sort. Names match country and test name. In case of error, timestamp is saved (sometimes helps developers narrow down periods in logs).

I like it. I'll just copy&paste it into my working code monday morning! :)

Thursday, July 31, 2014

Quality of software vs. quality of trucks

When company is to choose between two trucks (cheaper and more expensive) to buy, the schema is pretty straightforward. One is cheaper, of lower quality - the other offers quality and reliability. Some companies will choose the cheaper, most will probably go with reliability.

This is easy decision, basic competence of any company owner or manager - ask them about such a choice and not even a second and they start talking.

This is not the case with software

Most people ordering software cannot even understand what a quality and reliability of software is. With trucks you have publicly available reliability reports.
Trucks quality evaluation is a common knowledge. Evaluating reliability of software is area clouded with misconceptions and poor measure ideas.

Site availability in % per month is one of greatest tools to evaluate quality. But it does not say whether a user could even log in into system for the whole time.

You then go to measuring process availability in % per month which this is difficult and needs special agreements between companies.

The deeper you go into trying to understand quality and reliability the bigger obstacles you face.

This is why we stick with known errors or problems

Simple accounting tells us some fixes are too expensive to be made. Too expensive means: it's cheaper to constantly fix bugs due to that problem than fix the problem.

Simple accounting cannot account for software reliability, final user experience and so on.

Possible measures

By investing into Real User Monitoring we could really reach some level of confidence in software quality. Ammount of error statuses per hour gives you pretty good picture about whether processes work or not.

Other ideas:

  1. RUM: ammount of errorpages per hour for logged-in users
  2. window.onerror -> log problem via ajax for reporting and analysis
  3. ammount of errors in system logs (assuming java here, you need to count stacktraces)

Processes for quality

No logging or reporting will improve software if you do nothing with gained knowledge. Logs need to be inspected and bugs need to be reported.
Then those bugs need to be fixed and your client will have to agree to pay for this.

Information about found and fixed bugs need to be avaliable and transparent for both developing company and it's clients.

Third-party testing raids and code inspections should be as common and obvious as asking other doctor for opinion or visiting independent mechanic workshop.
Are we ready for this?

Paying more for better, more reliable software is a difficult choice. Let's not blame managers for their inability to understand our point of view.

Yes, quality is expensive as commonly said. Let's forge this slogan into real procedures and measures.

Maybe one day simple accounting will tell manager: you should pay more, you'll earn more with better software.

Tuesday, February 11, 2014

Retest and close what you reported

Tester's work is to find bugs, right?
Not exactly in my opinion: our job is to get rid of bugs and promote software quality.

Our job is to make sure bugs were fixed properly, with no new bugs introduced.
We also should make sure the fix is not lost, but:
  • deployed to production environments
  • pushed to the right branch


I've been often mislead by resolution:Resolved and "Fix version" field in JIRA. The bug might have status "fixed", and:

  • .. is not fixed at all or not good enough. Retest and Reopen!
  • .. is fixed, but not deployed (not commited/pushed ets). Retest and Reopen!
  • .. is fixed by changing system or deleting functionality. Retest and consult system architects.
  • .. is really fixed. Retest and Close!

Please keep in mind that the issue is fixed, when it's fixed on production and development of further system versions include that fix. What good is a fix if end users never see it, or next system version does not include the fix (i.e. reintroduces the bug)?

How do we get rid of bugs then?

  1. We retest the fix on the test/staging environment and Close the issue in bugtracking.
  2. We make sure that proper repository branches include our fix. It might not if:
    1. Client requests that only selected fixes from branch deployed to test are to be deployed on production.
    2. The fix is commited on a branch that is never merged with branch that the next system version is made of.
  3. We periodicaly retest important fixes to make sure no regression occurred
  4. We need to understand the cause of the error and expand test cases based on that knowledge - talk to programmers, and demand analysis/cause description in issue comments. The "done" comment is not good enough.

Periodic retesting?

With 10 bugs every month, we have a whooping 120 test cases for each year of project. That's quite a heap of things to retest, right?

To manage such a pile of imporant knowledge, we should document test cases that need to be done regularly.
Some of them should enter our automatic regression testing scenarios.

Automation of the regression testing should be based on analysys of causes of errors in the system. For example if search engine has trouble when filtering is employed - we should add such a tests and run it often.

If we don't do this - nobody will

Because only we know in details what was broken - only we are able to make sure it's fixed.
No programmer, chief programmer, project manager or developer is able to take up the task of getting rid of detailed system problems.