The Siren Call of Automated Browser Testing

Perhaps you've heard of the Sirens of Greek myth: seductive, deadly creatures who lured passing ships to certain doom on the rocky coast with an irresistible song. They were utterly appealing, yet to pursue them was fatal. Whenever I hear mention of automated browser tests (ABTs), I can't help but think of these creatures. The reasons for using ABTs are so appealing and seem so obvious, but it is extremely easy to get burned.

Don't get me wrong, I am all for testing and automation; both are indispensable to making good software. The issue is that ABTs are rarely done well, and the downsides only become apparent once you've run into them the hard way.

Music to Management’s Ears

The reasons to use ABTs are plentiful and to management they seem like no-brainers.

In many automated browser testing platforms, tests can be written once and then run against any number of browsers. The team can finish developing the code and writing ABTs, then move on to the next effort. All the while, the ABTs can run daily without supervision, detecting when the latest Firefox/IE/Safari update breaks your site's features. Then you can sleep soundly at night without having to burn QA resources on mind-numbingly repetitive manual regression tests each time a new browser update is released, which in the case of Firefox is every six weeks!
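
On paper it really is that simple. Here is a minimal sketch, assuming Python, pytest, and the Selenium WebDriver bindings (none of which the platforms above are necessarily built on), of one placeholder test parametrized across two browsers:

```python
# Minimal sketch: one test, several browsers. Assumes pytest and the Selenium
# Python bindings are installed and the browser drivers are on the PATH.
import pytest
from selenium import webdriver


@pytest.fixture(params=["chrome", "firefox"])
def driver(request):
    # Start whichever browser this run of the test is parametrized with.
    browser = webdriver.Chrome() if request.param == "chrome" else webdriver.Firefox()
    yield browser
    browser.quit()


def test_homepage_loads(driver):
    # The URL and expected title are placeholders for your own application.
    driver.get("https://example.com")
    assert "Example Domain" in driver.title
```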

Be careful: the first time management learns about the concept of ABTs, they'll think it's the greatest thing since sliced bread. They will want to introduce ABTs into the process as soon as possible, which for new efforts means the moment work on the UI begins. Managers will remember ruefully how previous products were plagued by browser changes and hard-to-reproduce JavaScript bugs that only occur on that one customer's machine. "Not this time!" they'll think. Unfortunately, the advantages I have just outlined are half-truths at best, and oftentimes downright wishful thinking.

Good at Detecting Change, Bad at Dealing With It

Automated browser tests are very good at detecting changes in your UI. In fact, they’re so good that sometimes they’ll even throw a fit when practically nothing has changed at all (more on this in a bit). What ABTs aren’t good at is dealing with changes to your UI.

The problem lies in the fact that ABTs are tightly coupled to the UI's HTML/DOM. ABTs range from fairly brittle to extremely brittle. The less brittle ones typically use some sort of DOM traversal, such as XPath or CSS-style selectors, and can tolerate changes like new colors and minor layout tweaks, as long as the changes are made via CSS only and don't dramatically affect the DOM. The more brittle tests are the recorded, point-and-click-driven ones, where even the slightest change to the UI is liable to render large portions of the tests obsolete, since everything is spatially dependent. How resilient a test is to change is also largely driven by the skill and foresight of the person writing it.
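
To make the difference concrete, here is a rough sketch (Python/Selenium, assuming an already-created driver and a made-up checkout form) contrasting a recorded-style absolute XPath with a selector tied to more meaningful attributes:

```python
from selenium.webdriver.common.by import By

# Brittle: an absolute, spatially dependent XPath of the kind record-and-playback
# tools produce. Any new wrapper element or reordered sibling breaks it.
submit = driver.find_element(By.XPATH, "/html/body/div[2]/div[1]/form/div[3]/button[1]")

# Less brittle: anchored to semantic attributes, so it survives color and minor
# layout changes as long as the surrounding DOM isn't restructured too much.
submit = driver.find_element(By.CSS_SELECTOR, "form#checkout button[type='submit']")
```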

A test author with foresight can try to write the tests in the most generic and reusable manner possible; this, of course, assumes they're using a scripting-based approach rather than recorded point-and-click. Doing so requires a skill set closer to a developer's, which only some QA people possess. All too often, less developer-esque QA engineers go off and spend weeks or months writing tests that become nearly worthless after a few major UI changes, because the tests were not written with change in mind and have to be completely redone. It is important to stress flexibility and reusability with the test authors early on.
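
One common way to write tests "with change in mind" is the Page Object pattern: keep the selectors and page mechanics in one place, and let the tests describe intent. A sketch of the idea, assuming Python/Selenium and a hypothetical login page with made-up element IDs:

```python
from selenium.webdriver.common.by import By


class LoginPage:
    """Hides the login screen's selectors so individual tests never touch them."""

    def __init__(self, driver, base_url):
        self.driver = driver
        self.base_url = base_url

    def open(self):
        self.driver.get(self.base_url + "/login")
        return self

    def log_in(self, username, password):
        self.driver.find_element(By.ID, "username").send_keys(username)
        self.driver.find_element(By.ID, "password").send_keys(password)
        self.driver.find_element(By.ID, "login-button").click()


# When the login markup changes, only LoginPage changes; every test that calls
# LoginPage(driver, "https://example.com").open().log_in("alice", "secret")
# stays untouched.
```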

Since ABTs don't cope well with change, they don't mix well with Agile. Software development is by its nature full of change. Between constantly evolving requirements and capricious product owners and designers, it's a pretty safe bet that the way your UI looks today will be quite different a few sprints from now. Introducing automated browser tests early in the development cycle doesn't mean that test writing will finish earlier, in lockstep with development. Rather, QA will waste time writing automated tests that have to be rewritten again and again until the UI settles down. This is rarely understood by management, and it becomes a point of frustration when progress is slowed by constant fire-fighting over broken tests.

Clearly there are issues when UI changes are made. But remember how I told you that ABTs can throw a fit when seemingly nothing has changed? This is particularly the case in single-page JavaScript applications, specifically the ones written in frameworks that are an unholy mix of logic and DOM (think AngularJS, et al.). Having behavior coupled with the DOM means that subtly changing the behavior or appearance of something via your framework may unexpectedly add, remove, or modify the DOM surrounding the component in question. Added an extra Angular directive to your page? Guess what: now everything has an extra <div> tag around it, and all your tests' selectors have to be updated. It gets worse. Sometimes changing the browser type or even version can mean that these JavaScript frameworks output a different DOM. So the idea of writing tests using Firefox and then expecting them to run against IE11 may not always pan out. Soon, your tests will have special conditions for the different browser types scattered everywhere.
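
One partial mitigation, assuming you control the templates, is to pin tests to dedicated test attributes rather than to whatever structure the framework happens to generate around your components. A sketch with made-up markup:

```python
from selenium.webdriver.common.by import By

# Fragile: depends on the exact nesting the framework happens to emit today.
total = driver.find_element(By.CSS_SELECTOR, "div.cart > div > div > span.total")

# Sturdier: a dedicated hook added to the template (here, data-test="cart-total"),
# which still matches even if an extra wrapping <div> appears around the element.
total = driver.find_element(By.CSS_SELECTOR, "[data-test='cart-total']")
```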

Man Versus Machine

Another problem with automated browser tests is that they're not as valuable as you think. Yes, you may have a test for every single feature in your web application, but your app isn't used by robots; it's used by people. The funny thing about people is that they can be rather unpredictable. They don't steadily move the mouse in straight lines; they type very slowly or very quickly; they get impatient and click the back button if something takes more than three seconds to load; they hit submit five times instead of once. Representing all of these behaviors through automated test cases is impossible. There just isn't enough time in the world to write that many tests. But a living, breathing tester could do all of those things effortlessly in no time at all.

The fallacy is that decision makers think they can leave it to ABTs to tell them when the software is broken. Do this and you will be embarrassed when, time and time again, subtle bugs are discovered by irate users while the automated tests run cleanly. With today's modern web apps, single-page, event-driven, and chock-full of JavaScript, there can be countless subtle bugs that only show themselves when things are clicked within a certain time window. Automated tests either click everything as fast as possible or at some very controlled interval. Even if you introduce some sort of fuzzy timing and randomness into your automated tests, you would have to run them infinitely many times to cover all of the scenarios that occur when real people use the system.
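
For what it's worth, "fuzzy timing" usually amounts to something like the sketch below (Python; the commented-out test steps are hypothetical). It adds a little realism, but as argued above, no realistic number of runs will cover what actual humans do:

```python
import random
import time


def humanish_pause(min_seconds=0.1, max_seconds=2.5):
    # Sleep for a random interval to loosely mimic human hesitation between steps.
    time.sleep(random.uniform(min_seconds, max_seconds))


# Example use between hypothetical test steps:
# fill_in_checkout_form(driver)
# humanish_pause()
# click_submit(driver)
```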

The bottom line is that ABTs make a good general smoke test, but they should never be the sole pillar upon which you gauge your software's quality. It is still a good idea to have ABTs and to run them regularly, but nothing replaces due diligence and some manual testing done by a good old-fashioned human.

Ugly Surprises

Even if you manage to be proactive and write your automated browser tests in a flexible, reusable manner, you're still likely to run into a few unpleasant surprises along the way. You'll be surprised to find just how long ABTs take to run. Running the full suite can take hours for more complicated applications, and then you multiply that by the number of browsers you need to support. So it is not inconceivable that in order to have your test results every morning at 8 AM, you may have to kick off your tests around midnight.

The time it takes to run the tests is often exacerbated by explicit delay statements scattered through the test cases. Particularly for single-page, JavaScript-heavy applications, it is difficult to write tests that wait for a page element to reach the desired state, so test writers often write something along the lines of "Click button X, wait 2 seconds, verify element Y is red." Although a change may take less than a second in practice, when the application is driven through a testing platform everything in the browser can slow down significantly. This is likely due to the way the testing platform hooks into the browser in order to control it; that control usually comes at the price of performance.
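
Where the platform supports it, the usual fix is an explicit wait on the element's state instead of a fixed sleep. A sketch using Selenium's wait API (Python; the element IDs and the class check are made up, and `driver` is assumed to exist):

```python
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Fragile and slow: always burns the full two seconds, yet still fails on the
# run where the UI happens to take 2.1 seconds.
driver.find_element(By.ID, "button-x").click()
time.sleep(2)
assert "red" in (driver.find_element(By.ID, "element-y").get_attribute("class") or "")

# Better: poll for the condition and move on as soon as it holds, up to a timeout.
driver.find_element(By.ID, "button-x").click()
WebDriverWait(driver, 10).until(
    lambda d: "red" in (d.find_element(By.ID, "element-y").get_attribute("class") or ""))
```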

Another unpleasant surprise is the amount of hardware you may need. Most testing platforms only let you run one browser at a time, so to speed up the lengthy test runs you'll have to run multiple machines at once. Browser tests can also be unexpectedly hard on CPU and memory, likely due to inefficient testing-platform code controlling the browser. Yet another performance issue comes from heavily exercising your web app in the same browser window for extended periods. Once again, single-page JavaScript apps built in the popular flavor-of-the-week framework can be leaky and cause browsers to eat up memory, so it is wise to tell the testing platform to close and reopen the browser after every so many test cases.
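
If your platform allows it, one way to do that is a fixture that recycles the browser after a set number of tests. A rough sketch, assuming pytest and Selenium; the threshold is arbitrary:

```python
import pytest
from selenium import webdriver

MAX_TESTS_PER_BROWSER = 50  # arbitrary; tune to how quickly your app leaks
_state = {"driver": None, "count": 0}


@pytest.fixture
def driver():
    # Reuse one browser across tests, but close and reopen it after every
    # MAX_TESTS_PER_BROWSER tests to reclaim memory leaked by the app under test.
    if _state["driver"] is None or _state["count"] >= MAX_TESTS_PER_BROWSER:
        if _state["driver"] is not None:
            _state["driver"].quit()
        _state["driver"] = webdriver.Chrome()
        _state["count"] = 0
    _state["count"] += 1
    yield _state["driver"]
```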

By the way, if you are running your test suite on multiple machines, you will probably need a license for each one, depending on which testing platform you've purchased.

The last wart you'll find is that the set of browsers supported by ABT platforms is not as broad as it seemed. A test platform may claim to support testing on, say, Safari, when in reality it supports testing on Safari for Windows, which is by no stretch the same thing as testing Safari on a Mac! Most testing platforms are limited to Windows only, although this is likely to become less common in the near future. Typically, these platforms use DLLs or some sort of plug-in to manipulate the different browser types during test runs. This means that when a new browser version is released, you may have to wait for the vendor to ship the necessary DLL/plug-in before you can start running automated tests against the new browser. Depending on your vendor, this may be a significant delay.

It’s Not All Bad

I've beaten up quite a bit on automated browser testing, but it's not all bad. If tests are written in a more developer-esque manner, by which I mean flexible and reusable, and if automated browser testing isn't introduced until the UI has solidified quite a bit, then ABTs can be genuinely useful in identifying changes to your application.

Since they are automatic, they can and should be run ad nauseam. To get around potentially long run times and to reduce the number of tests that need updating when the UI changes, ABTs are best used in moderation. Rather than testing every last bit of minutiae with ABTs, it is often wiser to cover only the core functionality of the application. This way, you can quickly run and re-run the tests, and you carry a lower cost of ownership when they need to be updated to reflect changes or new features in your application.

ABTs can have a positive impact when they are taken with a grain of salt. They are not the be-all and end-all; they cannot and should not be the only means of detecting bugs or changes in your software. They are a useful tool when used in conjunction with manual testing. In fact, they should alleviate manual testing, not replace it: the most repetitive and time-consuming manual steps are the ones that should be automated.

Keep these thoughts in mind next time you’re asked whether or not your team should start using automated browser tests. Or perhaps you’ve already integrated automated testing and are wondering how you’ve arrived at your current state. Hopefully you can glean some insight from this advice and improve your lot. Whenever someone mentions automated browser testing, remember the Sirens.

Comments on "The Siren Call of Automated Browser Testing"

  1. Given that ABT isn't a magic bullet and requires good testers to write good tests, from a company standpoint it's a resource you have to hire.

    On a small team, what’s the best approach to quality control, considering any combination of these options along with any others that may be missed?
    – Hire a non-development QA tester and have them manually test the site via the UI
    – Hire a developer-experienced QA tester who is actually writing good ABT hitting the UI
    – Developers held accountable to write automated integration tests but not the UI
    – No dedicated effort to UI testing. Handle big issues as they pop up

    Bear in mind, option 2 (the development-minded QA tester) may sound like the obvious choice, but such people are extremely rare (I've never seen a suitable one). Factor in how realistic it is to actually build a team from these parts, or any others.

  2. Good read and reference for people considering ABTs!

    Everything above aligns perfectly with my experience but I was a bit struck and made curious by this statement:

    “Different test platforms may claim to support testing on things like Safari, but in reality they support testing on Safari for Windows, which by no stretch is the same thing as testing Safari on a Mac!”

    Since Safari was discontinued on Windows in 2012, is this just old info that still drives at the point that ABTs are fraught with challenges, or is there a different point?

    FWIW, in my experience trying to use Selenium to test a complex app (which we expended great effort on before eventually abandoning), Safari on Mac was definitely a pain point for us. One specific issue that caused a lot of frustration, and caused us to descope which parts of the app could be tested, is that the API Safari exposes doesn't let you do any work inside an iframe (maybe this was true in IE also, but I forget now). At the time we embedded a fundamental part of our app in an iframe (luckily we don't anymore), and we still load certain modules in iframes, forcing us to choose between not testing that part of the app at all or loading the iframed part of the app outside an iframe for the purpose of testing, giving up some of the value of the typically full-stack approach to ABTs. Safari's API also has other warts; see https://code.google.com/p/selenium/issues/detail?id=7937#c13. [And I think that should have been closed as wontfix, not fixed, but maybe I'm wrong. Either way, I stopped caring, for these and most of the reasons you mention above.]

    Again great piece, thanks!

  3. I'd say ABT is working really well for us. We are using Behat for all our ABTs, and I have to say it's a pleasure to use when it does what I expect. There's a bit of a learning curve, and ultimately you will need to learn how the full stack works (Mink, etc.) to debug tests. Sometimes tests pass in one browser and not another, and it's for reasons such as micro time delays or how clicks are handled; e.g., Chrome complains if you have a clickable element inside another clickable element, Firefox does not.
