
Best Practices of Automated Visual Testing

Getting it Right for the Best UX

Addie: Hi, everyone. Thanks for joining us for today’s event, Best Practices of Automated Visual Testing: Getting it Right for the Best UX. My name is Addie and I’m the event’s moderator. Today’s speakers are Manish Mathuria, CEO of Infostretch, and Adam Carmi, co-founder and CTO of Applitools. Before we kick off, a few important things to keep in mind. Feel free to ask questions during the webinar. You can do that in the GoToWebinar control panel on your right. We will try to answer as many questions as possible in the Q&A session at the end. And any unanswered questions will be answered via email later, tomorrow, or next week.

This webinar is recorded. A link to the recording and the slide deck will be emailed to you tomorrow. For optimal viewing experience, I recommend you close off any unnecessary apps running in the background. And if you happen to encounter any technical issues, please send me a private message in the chat room, also in the control panel to your right, and I will try to take care of it as soon as possible. So now I think we can kick off. Manish, the stage is yours.

Manish: Thank you, Addie. Let me get started here. So this is broadly what we’re going to go through today. We will introduce visual testing to you. We’ll also go through some core requirements that testers want in a visual testing tool. Furthermore, we will talk about what kinds of apps lend themselves very well to visual testing. The business criticality of visual testing is prominent for certain apps, and we’ll talk about that.

From that point, Adam will take over and talk about the overall visual testing process and some critical best practices. That is fundamentally what today’s webinar is about. We’ll describe some of these best practices and we will intersperse demos of them throughout the presentation. So we hope we have put together a good presentation for you today. Let me get started.

I hope all of you have a good understanding of this testing pyramid that we are showing here towards the left. The main message here is that testing should be highly focused around unit tests, component tests, API tests and integration tests. What that lends itself to is higher control and higher manageability towards changes in the applications, and it allows you to shift quality left, earlier in the testing life cycle.

Then, on the top, we have automated GUI tests and manual tests, or visual tests. The point here is that, as we work with a lot of customers, what we see in practice is that this pyramid ends up being more like a cylinder. What that means is that in the industry, in the marketplace, there is still a heavy emphasis on doing automated GUI testing. When we talk about automated GUI testing – and I’ll introduce visual testing alongside it, because visual testing is a critical component of automated GUI testing – there are a few properties of this.

One is that it happens really late in the life cycle. As you build an app, what happens is you build the GUI towards the end. And when it is available to be tested or automated, it is pretty late in the game. It is also a very manually intensive process. In order to exercise a test through the GUI in an automated way, you have to pretty much teach the software to click on specific widgets on the screen and validate what you are seeing on the screen. It is also indispensable, in the sense that automated GUI testing is what your users see. So you can’t not do it. And therefore, the cylinder phenomenon is there, where most companies still put a pretty heavy emphasis on automated GUI testing.

And finally, the most important point, which is actually relevant to today’s conversation, is that automated GUI tests, in most cases, are not very good at capturing visual deformities. They are very good at testing functional deformities. So if you are testing an app and it is a calculator, then it will very well be able to tell if one plus one is two, but it will not be able to tell if the enter button or equals button is not aligned correctly.

Let’s go and understand some of these concepts a little better with a few examples. Here are the J. Crew website and their mobile app, and you can see that there are visual deformities on this page: where the images are supposed to be, they are not. So this is a responsive website and a responsive mobile app, and in both cases you can see there are visual deformities. These are actual examples where our GUI functional tests did not catch these deformities.

And needless to say, this particular brand got a black eye from releasing an app with these visual deformities because this is the first thing your user sees. And in this particular case, they actually could not complete the transaction. So this visual deformity came in the way of functional flow.

Here is another example, from Amazon. In this particular case, I’m sure a lot of us have seen such deformities where the CSS does not load. And as a result, your page cannot show what it is intended to show. Again, there is nothing wrong with the page. It is just that in that particular scenario, CSS did not load and the page did not display it correctly. So hopefully this gives you an example of where such visual deformities can hit you, and it obviously comes in the way of workflows of how you use the app, as well as it comes in the way of the reputation of your brand.

So moving on, let’s understand what testers want when it comes to testing for these visual deformities. As I explained, the process of GUI testing or GUI automation is already a very manual, labor-intensive process. And the last thing you want is the incorporation of yet another tool and yet another methodology in which you have to invest. So what you want is an incremental, least-intrusive way of somehow interspersing your visual tests within your GUI tests. Something that magically comes in and also validates your GUI tests visually. That is what you want.

When we talk about visual testing, with a lot of open source tools and a lot of home-grown approaches you will find there are a lot of false positives, which means that the test is telling you it failed, but when you visually inspect the result or run it manually, you find that that failure is acceptable. You don’t want such false positives because they slow down the process of testing.

You also want very effective error reporting, to the extent that it is integrated and ingrained in your normal test reporting so that you don’t have to go to two places. And it should give you drill-down and decision-making capabilities around what the failure is, and quickly point it out. Furthermore, there are certain technical capabilities that one would want, such as the capability to ignore dynamic content. You don’t want your tests coming back and telling you that because today’s data is different than when the test was written, it has failed.

So there’s a lot of this dynamic content that needs to be handled. You want APIs that integrate well into your test automation framework. You want the capability to validate the layout, in addition to validating unforeseen visual deformities. And whatever test you do – obviously, given the need for responsive design – has to be independent of screen resolution. A lot of times what we find is that we wrote the test for the original display on the Mac, and the test breaks when we run it in a normal virtual machine because the test was too dependent on the screen resolution.
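The “ignore dynamic content” requirement is usually implemented by masking known-volatile regions before comparing. A minimal sketch of the idea, in plain Python with images as 2D lists of pixel values (the region tuples and fill value are illustrative, not any tool’s real API):

```python
def mask_regions(image, regions, fill=0):
    """Return a copy of `image` with each (top, left, height, width)
    region overwritten by a constant fill value, so volatile content
    (dates, ads, counters) cannot trigger a difference."""
    out = [row[:] for row in image]
    for top, left, height, width in regions:
        for y in range(top, min(top + height, len(out))):
            for x in range(left, min(left + width, len(out[y]))):
                out[y][x] = fill
    return out

def images_match(baseline, current, ignore_regions=()):
    """Compare the two images after masking the regions we don't care about."""
    return (mask_regions(baseline, ignore_regions)
            == mask_regions(current, ignore_regions))
```

With this, two screenshots that differ only inside an ignored region (say, a timestamp widget) still compare as equal, while a change anywhere else is reported.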

Fundamentally, the message here is that the tool we choose has to fit seamlessly into the process of automated GUI testing and be least intrusive to that process.

Moving on: even though visual testing applies to pretty much anything you are looking to test, however transactional or however visually appealing it is, when we talk about actually testing apps for the enterprise, there are apps for which visual testing is more prominent, and apps where visual deformities have less impact. Some of these circles on the left represent classic cases where these kinds of apps make very high-potential candidates for visual testing.

Anything you’re doing around responsive design naturally means that you need to test for various form factors and various devices. And you need to take into account the device fragmentation that is happening with Android and other smartphone platforms. So it is fundamental, and easy to understand, that these kinds of apps become very natural candidates for visual testing.

Furthermore, anything that is content rich – for example your corporate websites, or software that passes through some kind of content management system such as Adobe Experience Manager or Sitecore. Those implementations themselves – as in, when you are configuring your Adobe Experience Manager – can be tested visually, so that you know the templates you are implementing in your content management system are validated in place.

Furthermore, anything that is highly consumer driven, such as retail websites, travel websites, pretty much anything that your consumers touch is, again, very much a ripe candidate for doing visual testing because it will directly impact your brand.

Alright? So I hope this was a good introduction to what visual testing is. Let me transfer control now to Adam, who will take a deeper dive into what is the process of doing visual testing and take you through some demos. Adam?

Adam: Thank you, Manish. Let me share my screen. So thank you very much, again, for the introduction. For those of you listening who are not familiar with visual testing, I’ll start by describing the overall workflow of a visual test automation tool. The workflow is very simple. It consists of four steps. In the first step, you drive the application under test and take screenshots. In the second, the tool takes those screenshots and compares them with baseline images.

These baseline images define the expected appearance of the application. And in the majority of cases, these are simply screenshots that were taken in previous test runs and were approved by a tester. In the third step, the tool takes the results of these image comparisons and generates a report that includes all the screenshots, the baseline images, and any differences that were found.

And in the fourth step, a tester has to look at the reports and decide for each difference, if there were any, whether it’s a bug, in which case he opens a bug. Or if it is a valid change to the application, he approves the new screenshot to be used as a baseline for subsequent test runs. Now of course, the first time you run a test, you still don’t have a baseline image or baseline images for it. So the first run sets the baseline images. And starting from the second run onwards, you always have baseline images to compare against and screenshots to approve.
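The four-step loop described above can be sketched as a simple baseline store: the first run of a test records its screenshots as the baseline; later runs compare against it and queue any differences for a human to approve or reject. This is a plain-Python illustration of the workflow, where `compare` is a stand-in for the real image-matching algorithm:

```python
class BaselineStore:
    """Steps 2-4 of the workflow: compare screenshots against stored
    baselines; the first run of a checkpoint simply sets its baseline."""

    def __init__(self, compare=lambda a, b: a == b):
        self.baselines = {}   # (test, step) -> baseline image
        self.pending = []     # differences awaiting tester review
        self.compare = compare

    def check(self, test, step, screenshot):
        key = (test, step)
        if key not in self.baselines:           # first run: set baseline
            self.baselines[key] = screenshot
            return "new"
        if self.compare(self.baselines[key], screenshot):
            return "pass"
        self.pending.append((key, screenshot))  # step 3: report the diff
        return "diff"

    def approve(self, key, screenshot):
        """Step 4: a tester accepts a valid change as the new baseline."""
        self.baselines[key] = screenshot
```

The important property is the one Adam calls out: from the second run onwards there is always a baseline to compare against, and approving a screenshot makes it the baseline for subsequent runs.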

The way that we are going to continue this talk is that we are going to drill down into each of these steps and discuss them more thoroughly. I will also address some of the issues that Manish raised earlier in the talk. And we’ll show various demos of how Applitools Eyes implements those steps, and we’ll share best practices on how to best accomplish them.

And we’ll start with comparing screenshots with baseline images. The reason I’m starting with this step is that it is basically the most important step to get right. If you’re using a visual test automation tool that produces false positives for your application, you will never be able to scale up your tests. You will always have a lot of maintenance on your hands. So this is the most important tip I can give you.

Now, there are many reasons for these false positives to occur. By the way, a false positive, as Manish mentioned in the beginning, with respect to visual testing, is a case where the tool tells you that there is a difference, but it’s wrong: it could be a difference you cannot see, or one too negligible for you to care about. And of course, it is very annoying to bother with false positives like that.

So there are many reasons for these false positives to happen. One of the primary reasons is that when you are rendering the same page of your application on different computers, those computers have different setups, different graphics cards, different device drivers, different operating systems or other settings. And so the actual page, although it is the same, will get rendered with different pixels by the rendering engine.

So here’s just one example to show you what I’m talking about. We have here a navigation bar at the bottom, and we have the playlist tab that you can see magnified here. This is how it is rendered on one machine, and here you can see how it is rendered on another. And you can see, if I hover between the two, how the pixels can be completely different when we look at them up close. But here at the bottom, you can see that they look exactly the same. The reason these pixels are different, by the way, is due to an image processing effect called anti-aliasing.

The reason these pixels are different is just that a different computer with a different implementation of the anti-aliasing algorithm produces different pixels. Still, to the human eye this is completely invisible. But if you just take these two images and compare them pixel by pixel, then many pixels are different, and the comparison will show you a lot of difference although you cannot see it. So you need to pick a tool that is smart enough to detect these issues, understand that a human being cannot see them, and just ignore them and not fail your test for these reasons.

Now, this brings us to Applitools. Basically, we’ve been developing these image processing algorithms for years now, and they can handle all these types of causes for false positives very well. If it’s anti-aliasing, pixel offsets, color similarities, differences in image scaling – all of these are handled perfectly. Now, the most important thing about these algorithms, which distinguishes them, is that they don’t rely on any error ratio or any configuration of thresholds that needs to be specified. The only thing that matters is whether a human being can see the differences, and whether he would consider them differences or just minor artifacts. And this happens out of the box.

For example, if you have a huge web page and a comma changes to a period, that could be just three pixels that are different on a huge page. Still, this is something that is important and visible to a human being, and because of that, the tool will highlight it as a difference. On the other hand, if on the same web page a table column became one pixel wider, that would shift the entire image one pixel to the side and cause a 70 percent pixel difference. Still, this is something that a human cannot see and doesn’t care about, and because of that, the tool will not highlight it as a difference, although the majority of pixels between the images are different.
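The comma-versus-column example is exactly why fixed error ratios fail. In the sketch below (tiny one-dimensional “images” as strings of pixels), a one-pixel shift of the whole page produces a far higher raw pixel-difference ratio than a real, visible content change, so no single threshold can separate the two cases:

```python
def pixel_diff_ratio(a, b):
    """Fraction of positions at which two equal-length 'images' differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

page = "the quick brown fox, jumps over the lazy dog"
content_change = page.replace(",", ".")  # small but meaningful edit
shifted = " " + page[:-1]                # whole page shifted one pixel

# The meaningful edit barely registers; the invisible shift looks catastrophic.
assert pixel_diff_ratio(page, content_change) < 0.05
assert pixel_diff_ratio(page, shifted) > 0.5
```

An algorithm that models human perception flags the first case and ignores the second, which a raw ratio threshold cannot do.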

The other important capability of the algorithms is the ability to analyze the structure of the page and basically compare the layout of the pages that appear in the different screenshots. And this is very useful, and this is our first demo. I’ll show you how that works.

So in this first example, you can see the PAYCHEX homepage. On the left hand side, you can see the baseline image. On the right hand side, you can see the screenshot that we’re validating. The baseline was taken on a Chrome browser and the screenshot we’re validating was taken on IE. Now, if I toggle between these two images, you can see how differently the two browsers render the same page. You can see that the font is a slightly different size, the positions of elements are slightly different, and here in this paragraph, the text wraps differently because of the differences in font size, etc.

But again, in terms of structure, these pages are consistent. And because of that, layout matching doesn’t consider all these things to be different. If we click the rater button over here, you can see that it does highlight a change here at the bottom of the page. And if we zoom into it, you can see that we do have a missing element on IE. So layout matching is a very powerful way to do cross-device and cross-browser testing, and still get a lot of coverage out of it.

So let’s take a look at another example, this time with a mobile app. We’ll look at Twitter. As you can see, we have a baseline with a Samsung S4 and a current image with a Samsung S5. If we click the rater button, you can see that we have a few differences here. First of all, you can see that the baseline image dictates that the first tweet should be aligned to the right of the image, and this is being violated over here.

Second, you can see that the last tweet is expected to have an image, but it is missing over here. And again, the rater button correctly highlights those two differences. But if I toggle between the two images, the baseline and the current image, and we look at the two tweets in the middle – although they have different texts and images, they are still structurally equivalent, and because of that, the layout algorithm does not highlight them as different.

So this brings us to the second important use of layout matching, which is the ability to validate dynamic applications and even monitor applications in production. And the last example that I want to show you on this subject is the Yahoo website, with a baseline and a current image that were taken 24 hours apart. And you can see that although the images are different and the articles are different – it’s a very dynamic website – structurally these pages are consistent. And because of that, the test passes.

Now, if I change the matching algorithm to do a more strict comparison, rather than layout, then as you would expect, all the dynamic parts are now highlighted as different, and all the static ones aren’t. But still, the strict algorithm is a very advanced image matching algorithm. If I change this to exact, pixel-to-pixel matching, you can see that actually everything is different between these images. It’s just that the strict algorithm was smart enough to ignore all those things that are invisible to the human eye.
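Conceptually, the match levels differ in what they compare. Here is a toy sketch of the distinction, with pages represented as lists of elements rather than pixels (the level names mirror those in the demo, but the element/box representation is purely illustrative):

```python
def pages_match(baseline, current, level="strict"):
    """Toy match levels: 'strict' compares content as well;
    'layout' compares only the structure (element kinds and boxes)."""
    if level == "layout":
        skeleton = lambda page: [(e["kind"], e["box"]) for e in page]
        return skeleton(baseline) == skeleton(current)
    return baseline == current  # strict/exact stand-in

baseline = [{"kind": "headline", "box": (0, 0, 100, 20), "text": "Old news"},
            {"kind": "image",    "box": (0, 30, 100, 60), "text": "cat.jpg"}]
current  = [{"kind": "headline", "box": (0, 0, 100, 20), "text": "New news"},
            {"kind": "image",    "box": (0, 30, 100, 60), "text": "dog.jpg"}]

assert pages_match(baseline, current, level="layout")      # dynamic content OK
assert not pages_match(baseline, current, level="strict")  # content changed
```

This is why the very dynamic Yahoo homepage passes under layout matching but lights up under the stricter levels: the articles change, the structure does not.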

Okay, so now that we know how we can compare screenshots with baseline images in a very robust and stable way that allows us to scale our tests and reduce maintenance, let’s talk about the first step of the workflow, which is driving the application under test and taking screenshots. With Applitools Eyes you can accomplish that by using the various SDKs that we provide. We provide almost 30 different SDKs that allow you to plug visual validation into just about any test automation framework out there, both for mobile and for web. We support, of course, all the different language bindings for Selenium and Appium. We support Coded UI and UFT and LeanFT, as well as Espresso and XCUITest and others.

Now, let’s take a look at how we can build a fully automated visual test for a mobile website – a website on a mobile device. We’ll use GitHub.com. We’ll run it on a mobile device, use Appium to control the browser, and Applitools Eyes to perform the visual testing.

So what we’re going to do in our demo is launch the Chrome browser on our device, which is emulated on a Genymotion emulator. We’re going to navigate to github.com. Then we’ll validate the entire page. We’ll then click the navigation button over here, and then click the open source nav item, which will lead us to the open source page of github.com, and we’ll validate that entire page as well.

The example I’m going to show you is in C#, but it should be very easy for anyone who’s familiar with Java to follow along, so I’m not worried about that. Okay, so let’s start with just the Appium code – the Selenium code – that is needed to simulate the user interactions with the github.com site. We don’t do any validations at this point.

So first of all, we define the desired capabilities that will allow us to control the Chrome browser. We specify Android as the platform name. We point to the device that we want to test. Then we specify that we’re working with the Chrome browser on that device. Next, we create the driver as a remote web driver and point it to the Selenium server that is running on my machine. In our test, we’ll start by navigating to github.com. Then we’ll find the navigation button using its class name. And once we find it, we’ll just click it, and this will open up the navigation menu.

Then we’ll locate the open source navigation item, again by its class name. And once we find that one, we’ll click it, and this will lead us to the open source page; very easy and simple. Now let’s add visual validation with Applitools Eyes. Basically, we’ll be using the Applitools Eyes SDK for Selenium with the C# language bindings. Similar SDKs exist for all the other language bindings. We start by creating an instance of the Eyes object. We proceed to set the API key that identifies us as a customer. In this case, I’m getting the API key from an environment variable called Demo API Key.

Next, I’m requesting the SDK to force a full-page screenshot. This means that, for instance on Chrome, when I’m validating the page, I want to make sure that I get the entire web page and not just the part of the page that is visible through the mobile device’s viewport. So the SDK will actually scroll down the page, take multiple screenshots, stitch them together, and simulate a full-page screenshot for us.
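The scroll-and-stitch behaviour can be sketched as: capture a viewport-sized strip at each scroll offset until the full page height is covered, then concatenate the strips. A plain-Python illustration, with an “image” as a list of rows (the real SDK of course works on pixel buffers and handles overlap and fixed headers):

```python
def full_page_screenshot(page_rows, viewport_height):
    """Simulate stitching: scroll one viewport at a time, grab the
    visible strip, and append it to the composite image."""
    captured = []
    offset = 0
    while offset < len(page_rows):
        strip = page_rows[offset:offset + viewport_height]  # one screenshot
        captured.extend(strip)
        offset += viewport_height
    return captured

page = [f"row-{i}" for i in range(10)]        # a page 10 rows tall
assert full_page_screenshot(page, 4) == page  # viewport shows 4 rows at a time
```

The last strip is naturally shorter than the viewport when the page height isn’t an exact multiple, which the slice handles for free here.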

Lastly, in order for my report to look nice, I’m specifying that the environment where I’m running my test right now is actually a Google Nexus 4 device. Now let’s see how we add those validations. At the beginning of the test, I call Eyes Open, I pass in the web driver, and I specify a name for the application and a name for the test. This is how they will appear in the eventual report.

After I visit github.com, I call this single line – Eyes Check Window – which captures a full-page screenshot of the current page, which is the homepage, github.com, and performs visual validation of that entire image, meaning that we get, with a single line of code, 100 percent coverage for that entire page. So basically this one line of code saves us from writing hundreds of equivalent lines of validation code, and of course from maintaining them as the application changes.

Next, we click on the menu button, and when it’s open, we do another check window. The strings that you see here are just descriptive strings that will appear at the checkpoint in the report, to make it easier for someone reading the report to understand where we are in the test. You can specify any string you want here, or not specify one at all.

Next, after we click the open source navigation item and move to the open source page, we validate it as well. So as you can see, with just a few lines of code we entirely covered three UI states of our application and get 100 percent coverage for them. This saves a lot of effort. It is a fraction of the effort to build such testing code, and of course to maintain it moving forward.
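In outline, the C# test just described has this shape. Below is a hedged Python sketch with stand-in driver and eyes objects: the method names only mirror the calls Adam describes, and nothing here imports the real Appium or Applitools SDKs.

```python
class FakeDriver:
    """Stand-in for the Appium remote web driver: records actions."""
    def __init__(self): self.actions = []
    def get(self, url): self.actions.append(("navigate", url))
    def click(self, locator): self.actions.append(("click", locator))

class FakeEyes:
    """Stand-in for the Eyes SDK: records validation points."""
    def __init__(self): self.checkpoints = []
    def open(self, driver, app, test): self.checkpoints.append(("open", app, test))
    def check_window(self, tag): self.checkpoints.append(("check", tag))
    def close(self): self.checkpoints.append(("close",))

def github_mobile_test(driver, eyes):
    """Three UI states, each covered by one full-page checkpoint."""
    eyes.open(driver, app="GitHub", test="mobile nav")
    driver.get("https://github.com")
    eyes.check_window("Home page")       # one line: full-page validation
    driver.click("nav-button")           # locator names are illustrative
    eyes.check_window("Menu open")
    driver.click("open-source-item")
    eyes.check_window("Open source page")
    eyes.close()
```

The point of the shape is visible even with stand-ins: the simulation code (navigate, click) is exactly what a functional test needs anyway, and each check-window line replaces what would otherwise be many element-level assertions.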

So let’s run this test and see how it goes. Okay. You can see the Chrome browser starting and navigating to github.com. Now you can see the SDK scrolling the page down, taking screenshots as it goes and building that full-page screenshot for us. Clicking the navigation button, scrolling down and capturing the entire screen again with the navigation menu open. Clicking the open source navigation item, moving to the open source page, and capturing the full-page screenshot.

And as you can see, our test has passed. If we move into the Applitools test manager, you can see that we have a successful run of the mobile GitHub test. We can see the metadata of the test; we can see that it passed. We can see the thumbnails of the three checkpoints that we had. And we can also drill down and see the test up close. We can see that we have this timeline at the bottom that shows us the different verification points.

We can actually play back the test and see everything that happened with the web driver: the click on the menu, the click on the open source button, moving between those visual tests as we validated them. And if we look at each validation point, we can see the side-by-side representation, just like we saw in the previous demo, where we have the baseline image on a Google Nexus 4 and the current image, again on a Google Nexus 4. The images match, but if anything were different – any of the multiple details that we have here – it would be captured. So we basically have 100 percent coverage for that page.

Now, what we are going to do next is switch the device. We are going to simulate a situation where – let’s say that we already have a baseline for the Nexus 4 and it’s running as part of our CI. Now we’ve got a new device and we want to validate it as well, but we don’t have a baseline for it yet. So I start the Nexus 5 device. And while it’s starting, let’s continue to talk. What we want to do is run the test on the new device, our Nexus 5 device.

But we want to compare it with the Nexus 4 baseline. So we don’t start from scratch. We want to at least make sure that in terms of layout, they are consistent – the application is consistent in terms of layout between these two devices although they have a different form factor. So let’s take a look at the test and see how we accomplish that.

It only requires a few simple changes. First of all, we indicate that the new environment we’re running on is a Google Nexus 5. We indicate to the SDK that it shouldn’t use a new baseline for the Google Nexus 5, but should actually test against the baseline of the Google Nexus 4. The third thing we need to do is specify the match level as layout, because if we used strict matching, of course everything would be different – the different devices would render the page at different sizes and it wouldn’t match. But with layout, we can do this cross-browser, cross-device test.

Now, the other thing we’ll do before running it is inject a visual bug into this example. So I will run on the Nexus 5 device and compare it with the Nexus 4 baseline, but I will modify the website just before the last validation point on the open source page: I will move the GitHub logo to the right. Okay, we’re getting the GitHub logo over here and adding a margin to its left. So it will move, and we want this to be captured as a layout bug. Okay, we need some time to let the device start, so let’s continue with the presentation, and once it finishes, we can run it again.

Okay, so a few tips about how to construct your tests. The first tip is to always prefer full-page validation where you can. The reason is that you get more coverage because you’re testing the entire page, which means you can catch unexpected bugs. If you only focus on specific components, you can have issues between those components that you won’t catch – unexpected issues. But if you are matching the entire page, nothing will slip by. Any unexpected issue that comes along, you will capture. Which of course is extremely important and a great value that you can get out of your automation.

The next thing is that it allows you to maintain your tests without touching the code. It means that whenever you have a full-page validation and something there changes, everyone on your team who knows the application can do the maintenance. They can look at the images, see the difference, and decide if they want to open a bug or accept a new baseline. You don’t need the one person who knows the test code and knows how to read the logs to go in, find the time, start analyzing the failure, and understand what happened. Everyone on the team can do it.

Another important thing is that they can do it immediately. They see the issue and can just choose to accept or reject it. They don’t need to wait and open a ticket for someone to finish this test and look at it – and maybe he’s on vacation. So the benefit here is huge. And of course, you avoid maintaining all those element locators that are required to pinpoint the specific components you would otherwise have to test. These element locators, as you well know, tend to break when your UI changes. So instead of having to go and fix those element locators in code when the UI changes, you can just approve, or open bugs for, the current screenshot that was taken.

Let’s see what goes on with our device. Okay, it’s still starting. Let’s give it another second to run. Okay, let’s continue to the next thing. When it comes to implementing those visual tests, there are basically three options that different teams choose, and I’ll go over these three options and describe the pros and cons of each. So we have our Nexus device up and running. I’ll just run the test so we don’t waste any more time. I run this test on our Nexus 5 device, and let’s see what we get.

You can see that the Chrome browser is starting. Yep, we’ve started. Okay, let’s let this run in the background and we’ll continue with the slides. So the first option is to write dedicated visual tests. This is the most recommended way to go because it provides you full control over what you want to validate. You can write those tests to reach every corner of your application. You can decide between full-page screenshots and checking individual components. And of course, it allows you to do all the tricks you want by manipulating the application under test.

So let’s say that you have a certain UI state that you have a back door to get to as a shortcut. You can just build a test that uses that shortcut, gets there, and gets the screenshot validated, so you can get the most coverage in the most optimized way. Of course, the downside is that you need to write those tests from scratch. You have to invest some time to build the tests that will cover your application.

Now, it’s not as difficult as writing common, traditional functional tests because you don’t need to write the validation code. It’s just a simulation code, and it’s a fraction of the effort. But still, there is some effort especially if your application is complex and has a lot of UI.

Now, the second option, and many, many teams go for this one, is to add visual checkpoints to your existing functional tests. You already have tests that walk through your application and test functionality. You can just add those check window instructions within them at the places where you want to validate the page, just like we did in the demo before.

This means that you don’t have all the flexibility to get to every state of the UI; you’re basically limited to whatever your tests are doing right now. And it means that you might not be able to take shortcuts, because the functional tests will usually go through the main use cases of how users use your application. But still, it allows you to leverage your existing investment in automation and quickly add visual validation to it.

Another thing that you’ll start seeing is that there will be a lot of overlap between the assertions that you have in code and the visual differences that you’ll see in the test report. For many teams, this eventually leads them to stop doing those assertions in code and to rely more on the visual validation, unless of course the assertions relate to things that are not visible in the UI.

The third option, which also works well for many teams but does not fit all situations, is to add implicit visual validations in the test framework itself. So you don’t touch the tests, but if you have a test framework that all the clicks, mouse movements and keystrokes go through, you can add in-place visual validations as a response to certain triggers.

So for example, after navigating to a URL you can take a checkpoint, or before clicking a button. So if you filled out a form and you’re about to click the submit button, it would validate that form. The upside of that is that it’s very trivial to implement. Just a few lines of code and all your tests become visual tests at once. There are a few downsides, though. First of all, you can only do full-page validation, because you’re doing a generic validation and cannot address specific components in the framework.

And because of that, you are restricted in what you can validate. So if there are pages that don’t work well with full-page validation because they are very dynamic, or because you want to validate only parts of them, you need to find a way to exclude them in your framework.
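This third, framework-level option can be sketched as a thin wrapper around whatever driver your tests already go through. All the names below (`VisualDriver`, `FakeDriver`, the checkpoint callable) are hypothetical stand-ins to show the trigger pattern, not a real API:

```python
# Illustrative sketch of implicit visual validation: wrap the driver that
# every test action goes through and fire a full-page checkpoint on chosen
# triggers (after navigating to a URL, before clicking a submit button).

class VisualDriver:
    def __init__(self, driver, checkpoint):
        self._driver = driver
        self._checkpoint = checkpoint  # callable(tag, screenshot)

    def get(self, url):
        self._driver.get(url)
        # Trigger 1: validate every page right after navigation.
        self._checkpoint(f"after get {url}", self._driver.screenshot())

    def click(self, element):
        if element.endswith("submit"):
            # Trigger 2: validate a filled form just before submitting it.
            self._checkpoint(f"before click {element}", self._driver.screenshot())
        self._driver.click(element)


class FakeDriver:
    def __init__(self):
        self.page = None

    def get(self, url):
        self.page = url

    def click(self, element):
        pass

    def screenshot(self):
        return f"<pixels of {self.page}>"


taken = []
driver = VisualDriver(FakeDriver(), lambda tag, img: taken.append(tag))
driver.get("/login")
driver.click("#login-submit")
driver.click("#menu")  # no checkpoint: not a submit button
print(taken)  # ['after get /login', 'before click #login-submit']
```

Because the tests themselves never change, a few lines like this turn every existing test into a visual test at once, with exactly the full-page-only limitation described above.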

The second downside is that you get duplicate validation points. So if all your functional tests, for instance, go through a login page, then all of them will validate that login page. So you’ll have multiple validation points validating the same screen, and that is a downside unless you figure out nice ways to avoid that screen, and maybe just write a dedicated test for the login screen that is shared among all the others.

And of course if you have certain tests that are heavily parameterized in data, let’s say for example you have a form where you have input and output, and you have a table on the side with 100 pairs of inputs and outputs. And you run through all of these to make sure that the logic works correctly, then you don’t want to have visual tests on top of this test because you’ll have 100 visual validation points validating the same page, which is of course useless.
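One simple way to blunt both of these downsides (the shared login page, and the 100-row parameterized form) is to checkpoint each distinct screen at most once per run. This is only a toy sketch of that idea, with invented names:

```python
# Illustrative sketch: deduplicate checkpoints so a screen that many tests
# (or many data rows of one test) pass through is only validated once.

def make_deduping_checkpoint(checkpoint):
    seen = set()

    def deduped(tag, screenshot):
        if tag in seen:
            return  # this screen was already validated in this run
        seen.add(tag)
        checkpoint(tag, screenshot)

    return deduped


validated = []
check = make_deduping_checkpoint(lambda tag, img: validated.append(tag))

# Every data-driven iteration goes through the same page...
for _ in range(100):
    check("login page", "<pixels>")
check("settings page", "<pixels>")

print(validated)  # ['login page', 'settings page'] -- each screen once
```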

But all of these three options are valid. They are widely used and you should see what works best for you, and also in the short term versus the long term.

So our test has completed. Let’s go back to the dashboard and see what we got. Okay, so we can see that we have a difference here; the test has failed. We have a difference here at the top of the page. If we drill down to it and zoom up close, we can see that indeed we have a baseline image with a Nexus 4, a current image taken from a Nexus 5 device, and it’s a layout test, but it did detect that the Google logo has moved, just like we hoped it would. But you should also know that the devices have a different form factor.

You can see that actually the width of the device is different, and the height is different as a result. And because of that, everything is laid out differently. But still, structurally the pages are consistent, and because of that, no differences are highlighted. Now, of course, if I would change this to strict matching, then everything would be different, because nothing is the same at all, right?

So this demonstrated how, when I have a new device, I can bridge the baseline. If I now save and accept this baseline, I can start doing strict tests on the Nexus 5 device against itself. But I did have a bridge that allowed me, instead of starting from scratch with a new baseline, to compare it against another one. And in a similar way you can do cross-device testing. If you know that a certain device looks right, you can make sure that it looks right, at least at the layout level, on all other devices, in addition to your regression tests.

Now we’ve come to the last two steps of our workflow, which are viewing the differences and updating baselines. And again, I’ll show you a demo of how this is done with Applitools Eyes. So let’s go back to our GitHub example, but say that we had a little more time and had written more tests. We could have a test suite that would cover many more pages of github.com, on different layouts and on different devices and browsers. So in this case, we have a test suite that could be GitHub Jobs, for instance.

It had 19 tests; all of them failed. They covered four different environments – desktop and mobile, in four different form factors – to capture different layout modes of the application. And all in all, we found 76 mismatches. We can see a thumbnail of these mismatches per test, and we can also just look at them for the entire suite, regardless of the test they originated in.

Now, it’s quite easy to see that the difference here is at the top, at the logo, and we can immediately accept or reject from those thumbnails. And we can zoom in on the images very quickly and see what happened – in this case, the Google logo went away, and let’s say that was intended, so we can approve it. Still, if we have like thousands of tests in a test run, and we now have a common change that appears on many of those pages because we are doing full-page comparison, it could be a bit overwhelming to go over all these images, approving them one by one.

So what we can do is ask the tool to analyze those differences and only show us the unique differences. So with all these dozens of differences, we only have two unique ones. If we zoom in on the first one, we can see that in this case the Google logo disappeared, which is what we wanted. So we can just approve it and with that, we approve all of these changes across all of these steps. If we look at the second set of similar changes, we can see that in this case the Google logo didn’t go away but actually turned green.

So this is probably a CSS bug. It’s not what we intended, so we can just reject it. And if we save, we’ve just updated the entire baselines for these 19 separate tests. So with just a few clicks, we can maintain hundreds and thousands of images and differences. We have customers today that are running thousands of tests every day. And with features like this, and many others that make maintenance easy at [inaudible] [00:49:08] scale, they can scale up their tests to these levels without really increasing the amount of overhead they have to spend on maintaining those baselines.
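The grouping idea behind "only show unique differences" can be sketched very simply: bucket the diffs from all checkpoints by a signature of the changed region, so one accept/reject decision applies to every occurrence at once. This toy uses plain strings as signatures; a real tool would compute them from the images:

```python
# Illustrative sketch of grouping many mismatches into unique differences,
# so one decision resolves a whole group of checkpoints.

from collections import defaultdict

def group_unique_diffs(diffs):
    """diffs: list of (test_name, region_signature). Returns a mapping from
    each unique signature to all the checkpoints it appears in."""
    groups = defaultdict(list)
    for test_name, signature in diffs:
        groups[signature].append(test_name)
    return dict(groups)


diffs = [(f"test-{i}", "logo removed") for i in range(18)]
diffs.append(("test-18", "logo turned green"))

groups = group_unique_diffs(diffs)
print(len(diffs), "mismatches,", len(groups), "unique")  # 19 mismatches, 2 unique

# Approving the "logo removed" group resolves 18 checkpoints in one click;
# the "logo turned green" group can be rejected as a CSS bug.
```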

Now, with this I conclude my part of the demo. Before we move into the Q&A part, there are a few special offers I want to let you know about. Infostretch is offering a free, four-hour assessment of your visual testing strategy. You just go to this link and sign up if you’re interested. And Applitools invites you to join our free Visual Testing Master Class, where you can learn more about the advanced features and in-depth capabilities of Applitools Eyes. Just go ahead to this link and sign up for the next master class. During the Q&A session we’ll leave that slide on, so you have the time to look it up and write it down if you need to.

And with this, Addie, I hand it over to you for the Q&A session.

Addie: Great. Okay, so we have quite a few questions. We’ll try to get through as many as we can. The first one: is Applitools a standalone tool, or does it require additional tools, for example Selenium WebDriver, in order to work properly?

Adam: Excellent question. So Applitools is not a standalone tool. As I mentioned, we have many, many SDKs, so you can easily integrate visual testing within any test framework that you’re using. I can also say that in about a week or two, we’ll release a version of Selenium Builder from the Selenium project that has visual testing capabilities, so you can actually record and play back visual tests very easily without writing code. And this ability is also available in the products of some of our partners that have tools with record/playback capabilities, which can also integrate visual testing.

Manish: So Adam, if I may add here, Infostretch also has an open source framework called QMetry Automation Framework. It’s on GitHub. And Applitools is integrated there, too, just via a property setting. So all you have to do is write your test case, turn on a property, and you basically start getting all your test cases validated visually as well. That is all available for free to anybody, basically.

Addie: Okay, great, thank you. Next question. Could we use Applitools Eyes for localization testing – for example, a test display that might be in any of five languages – and pass those five, ignoring irrelevant differences in screen locations, but not pass any other text?

Adam: Yes, definitely. There are many usages for visual testing that we see customers doing; localization testing is one of them. Similar ones, just to open up your minds, are testing PDF forms, testing accessibility strings, etc. All of this can be done. With respect to localization testing, there are two things that you can do. First of all, you can do regression testing on each language. Let’s say that you have your site in German and no one in the office knows German. If there was a typo there, no one would notice.

But if you’re doing regression testing on the German version, if anything changed there, it would immediately be picked up and highlighted automatically. You can then take that link and send it to someone who knows German, a translator, who can tell you if it’s okay or not. So it saves you the effort of isolating what has changed, and makes it easy to share it and get feedback on it. And of course, if it’s okay, it’s just a click of a button to accept.

The second thing that you can do is using layout as a bridge to introduce new languages. So let’s say that you have your UI in English and now you are adding a Bulgarian version for it. So you can use layout matching to at least verify that when you change languages, there were no layout defects in the Bulgarian version. Of course, this does not substitute the initial effort of making sure there are no typos and all the proper translations were made.

But still, once you’ve made the effort, you can start running your strict regression tests on those environments. And you can find out automatically if there are any layout issues on that new language that you’re introducing.

Addie: Great. Next question. For the baseline images, do you need a baseline for each browser or each different device?

Adam: It’s a good question. By default, when you’re running your test, each unique environment gets its own baseline. So if I’m running a test on Chrome, it will compare with a baseline that is specific to Chrome. If I’m running on IE, it would compare against a baseline for IE. If there is no baseline, a new baseline will be created, etc. Now, as I’ve shown you in the demo, you can actually configure the SDK not to do this default behavior, but to compare against another baseline. In our case, we ran a test on a Nexus 5 and we specified the Nexus 4 as the baseline.

But note that we also had to change the match level to do layout comparison because these screens would be very different because they have different form factors and so it doesn’t make sense to test them in a strict way.
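The per-environment default plus the bridging override can be modeled as a small lookup, which may make the behavior easier to see. The `BaselineStore` class and its method names are invented for this sketch; they are not the real server’s API:

```python
# Illustrative sketch of per-environment baselines with an optional
# override: by default each (app, test, environment) triple has its own
# baseline; a new device can "bridge" from another device's baseline
# (and should then use layout matching, as described above).

class BaselineStore:
    def __init__(self):
        self._baselines = {}

    def resolve(self, app, test, env, baseline_env=None):
        """Return (key, image): the key under which results will be saved,
        and the image to compare against (possibly borrowed)."""
        key = (app, test, env)
        if key in self._baselines:
            return key, self._baselines[key]
        if baseline_env is not None:
            borrowed = self._baselines.get((app, test, baseline_env))
            if borrowed is not None:
                return key, borrowed  # bridge from the other environment
        return key, None              # first run: no baseline yet

    def save(self, key, image):
        self._baselines[key] = image


store = BaselineStore()
store.save(("app", "home", "Nexus 4"), "<nexus4 pixels>")

# Chrome run: no baseline of its own, nothing borrowed -> first run.
print(store.resolve("app", "home", "Chrome")[1])  # None

# Nexus 5 run bridging from the Nexus 4 baseline.
print(store.resolve("app", "home", "Nexus 5", baseline_env="Nexus 4")[1])
```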

Addie: Great. First of all, I wanted to say that there appears to be some problem with the link to the Visual Testing Master Class, so I will post an updated link in the chat room in a second. And in the meanwhile, the next question for you, Adam, is: how does the capture work with page elements that are not loaded directly and need some time before they are shown on the screen?

Adam: Excellent. So first of all, the Applitools Eyes SDKs have a sophisticated mechanism that waits for the page to load. So when you do a checkpoint, it doesn’t just take the screenshot, send it, and move along with the test. Because the Eyes server has the baseline, it knows how the page is expected to look. There is a global timeout defined, and during that timeout the SDK will take multiple screenshots and try them until a match is found or until the timeout is exceeded. You can also specify specific timeouts for each check window.

So if you know there is a problematic page and you don’t want to bother with stabilizing the test – waiting for some element to appear or whatever – you can just increase the timeout, and the SDK will poll again and again, trying the screenshot until a match is found or until the timeout is exceeded, in which case there will be a failure. The nice thing about it is that if the page stabilizes after a second, then you only wait a second. The downside of it is that if there is a difference, you will have to wait for the full timeout until the difference is actually determined and the test moves on.
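The retry mechanism just described is essentially a poll-until-match loop with a deadline. Here is a minimal, self-contained sketch of that logic (the clock is injected so the example is testable; none of these names come from the real SDK):

```python
# Illustrative sketch of checkpoint retry: keep taking screenshots until
# one matches the baseline or the timeout is exceeded. A stable page
# succeeds early; a real difference pays the full timeout.

def check_with_retry(capture, matches_baseline, timeout, clock, interval=0.5):
    deadline = clock.now() + timeout
    while True:
        screenshot = capture()
        if matches_baseline(screenshot):
            return True   # page stabilized: succeed as soon as it matches
        if clock.now() >= deadline:
            return False  # timeout exceeded: report a difference
        clock.sleep(interval)


class FakeClock:
    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, s):
        self.t += s


# A page whose loading spinner disappears after one second:
clock = FakeClock()
capture = lambda: "loaded" if clock.now() >= 1.0 else "spinner"
ok = check_with_retry(capture, lambda s: s == "loaded", timeout=10.0, clock=clock)
print(ok, clock.now())  # True 1.0 -- matched after ~1s, not the full 10s
```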

So the bottom line is, you can rely on the Eyes SDK to stabilize. But like any other testing effort, you’ll get the best results if you, as someone who knows your specific app, the specifics of your pages, and the exact way to figure out that a page has finished loading, improve the performance of your test by actually waiting for the page to load in your Selenium code, for instance, before you get to the Applitools Eyes checkpoint. But then again, it’s fine to work with the timeout. It saves a lot of effort and headaches, and many, many teams use it.

Addie: Okay. And since we are out of time, one last question. How does full-page screenshot capture work with pages that are endless, for example news streams?

Adam: Okay. Basically, there are two ways – okay, I don’t want to get too technical about it. But what I can say is that we know how to scroll the page in a way that doesn’t trigger more data to be loaded. So whatever is loaded at that specific moment – the stuff that is below the fold – will get captured. But it won’t trigger any other events that would otherwise occur if a user would actually scroll the page down. So it works fine.

Addie: Okay, great. Unfortunately, this is all the time we have for today. I want to thank Manish and Adam for this very in-depth presentation. And I would like to thank everyone that joined us today, and I hope to see you at our next event.

Adam: Thank you, everyone.


I hope all of you have a good understanding of the testing pyramid that we are showing here towards the left. The main message here is that testing should be focused heavily on unit tests, component tests, API tests and integration tests. What that lends itself to is higher control and higher manageability of changes in the application, and it allows you to shift quality left, earlier in the testing life cycle.

Then, at the top, we have automated GUI tests and manual or visual tests. The point here is that, as we work with a lot of customers, what we see in practice is that this pyramid ends up being more like a cylinder. What that means is that in the industry, in the marketplace, there is still a heavy emphasis on automated GUI testing. I’ll introduce visual testing alongside automated GUI testing, because it is a critical component of it. There are a few properties of this kind of testing.

One is that it happens really late in the life cycle. As you build an app, you build the GUI towards the end. And when it is available to be tested or automated, it is pretty late in the game. It is also a very manually intensive process. In order to exercise a test through the GUI in an automated way, you have to pretty much teach the software to click on specific widgets on the screen and validate what you are seeing on the screen. It is also indispensable, in the sense that the GUI is what your users see, so you can’t not test it. And therefore the cylinder phenomenon is there, where most companies still put a pretty heavy emphasis on automated GUI testing.

And finally, the most important point that is actually relevant to today’s conversation: automated GUI tests, in most cases, are not very good at capturing visual deformities. They are very good at testing functional deformities. So if the app you are testing is a calculator, the tests will very well be able to tell if one plus one is two, but they will not be able to tell if the enter button or the equals button is not aligned correctly.

Let’s go and understand some of these concepts a little better with a few examples. Here are the J. Crew website and their mobile app, and you can see that there are visual deformities on this page: where the images are supposed to be, they are not. This is a responsive website and a responsive mobile app, and in both cases you can see visual deformities. These are actual examples where our GUI functional tests did not catch these deformities.

And needless to say, this particular brand got a black eye from releasing an app with these visual deformities because this is the first thing your user sees. And in this particular case, they actually could not complete the transaction. So this visual deformity came in the way of functional flow.

Here is another example, from Amazon. In this particular case, I’m sure a lot of us have seen such deformities where the CSS does not load. And as a result, your page cannot show what it is intended to show. Again, there is nothing wrong with the page. It is just that in that particular scenario, CSS did not load and the page did not display it correctly. So hopefully this gives you an example of where such visual deformities can hit you, and it obviously comes in the way of workflows of how you use the app, as well as it comes in the way of the reputation of your brand.

So moving on, let’s understand what testers want when it comes to testing for these visual deformities. As I explained, the process of GUI testing or GUI automation is already a very manual, labor-intensive process. And the last thing you want is to incorporate yet another tool and yet another methodology to invest in. What you want is the least intrusive way of somehow interspersing your visual tests within your GUI tests – something that magically comes in and validates your GUI tests visually as well. That is what you want.

When we talk about visual testing, with a lot of open source tools and a lot of home-grown approaches, you will find there are a lot of false positives, which means that the test is telling you it failed, but when you visually inspect it or run it manually, you find that the failure is acceptable. You don’t want such false positives, because they slow down the process of testing.

You also want very effective error reporting, to the extent that it is integrated and ingrained in your normal test reporting, so that you don’t have to go to two places. It should give you drill-down and decision-making capabilities around what the failure is, and point it out quickly. And furthermore, there are certain technical capabilities that one would want, such as the capability to ignore dynamic content. You don’t want your tests coming back and telling you that the test failed just because today’s data is different than when the test was written.

So there’s a lot of this dynamic content that needs to be handled. You want APIs that integrate well into your test automation framework. You want the capability to validate the layout, in addition to validating unforeseen visual deformities. And whatever tests you run, given the need for responsive design, have to be independent of screen resolution. A lot of times what we find is that we wrote the test for the original display on a Mac, and the test breaks when we run it in a normal virtual machine, because the test was too dependent on the screen resolution.
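The "ignore dynamic content" capability boils down to masking declared regions before comparing. This toy models images as lists of row strings and masks whole rows; real tools mask arbitrary pixel regions, and every name here is invented for the sketch:

```python
# Illustrative sketch of ignoring dynamic content: mask out declared
# regions (a date, a news ticker) before comparing, so only the parts
# that should be stable can fail the test.

def compare_with_ignore(baseline, current, ignore_rows):
    """Images as lists of row strings; ignore_rows are indices of rows
    whose content is dynamic and must not fail the comparison."""
    if len(baseline) != len(current):
        return False
    for i, (b_row, c_row) in enumerate(zip(baseline, current)):
        if i in ignore_rows:
            continue  # masked: dynamic content lives here
        if b_row != c_row:
            return False
    return True


baseline = ["header", "date: 2015-06-01", "footer"]
today    = ["header", "date: 2015-06-02", "footer"]

print(compare_with_ignore(baseline, today, ignore_rows={1}))   # True
print(compare_with_ignore(baseline, today, ignore_rows=set())) # False
```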

Fundamentally, the message here is that the tool that we choose has to seamlessly enter into a process of automated GUI testing and be least intrusive to that process.

Moving on: even though visual testing applies to pretty much anything that you are looking to test, however transactional or however visually appealing it is, when we talk about testing apps for the enterprise, there are apps that are more prominent candidates for visual testing, and there are apps where visual deformities have less impact. The circles on the left represent classic cases of apps that are very high-potential candidates for visual testing.

Anything you’re doing around responsive design naturally means that you need to test for various form factors and various devices. And you need to take into account the extensive device fragmentation that is happening with Android and the other smartphone platforms. So it is fundamental, and easy to understand, that these kinds of apps become very natural candidates for visual testing.

Furthermore, anything that is content-rich – for example your corporate websites, or software that passes through some kind of content management system such as Adobe Experience Manager or Sitecore. Those implementations themselves – as in, when you’re configuring your Adobe Experience Manager – can be tested visually, so that you know that the templates you’re actually implementing in your content management system are validated in place.

Furthermore, anything that is highly consumer driven, such as retail websites, travel websites, pretty much anything that your consumers touch is, again, very much a ripe candidate for doing visual testing because it will directly impact your brand.

Alright? So I hope this was a good introduction to what visual testing is. Let me transfer control now to Adam, who will take a deeper dive into what is the process of doing visual testing and take you through some demos. Adam?

Adam: Thank you, Manish. Let me share my screen. So thank you very much, again, for the introduction. For those of you listening who are not familiar with visual testing, I’ll start by describing the overall workflow of a visual test automation tool. The workflow is very simple. It consists of four steps. In the first step, you drive the application in the test and take screenshots. In the second, the tool takes those screenshots and compares them with baseline images.

These baseline images define the expected appearance of the application. In the majority of cases, these are simply screenshots that were taken in previous test runs and were approved by a tester. In the third step, the tool takes the results of these image comparisons and generates a report that includes all the screenshots, the baseline images, and any differences that were found.

And in the fourth step, a tester has to look at the reports and decide for each difference, if there were any, whether it’s a bug, in which case he opens a bug. Or if it is a valid change to the application, he approves the new screenshot to be used as a baseline for subsequent test runs. Now of course, the first time you run a test, you still don’t have a baseline image or baseline images for it. So the first run sets the baseline images. And starting from the second run onwards, you always have baseline images to compare against and screenshots to approve.
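The four-step workflow, including the first-run behavior, can be sketched in a few lines. The function and its return strings are invented for this illustration; a real tool would of course do image comparison rather than string equality:

```python
# Illustrative sketch of the visual testing workflow: capture, compare
# against the baseline (the first run simply records one), report, and
# let a tester approve or reject.

def run_visual_test(name, screenshot, baselines, approve=False):
    if name not in baselines:
        baselines[name] = screenshot  # first run sets the baseline
        return "new baseline"
    if screenshot == baselines[name]:
        return "passed"
    if approve:                       # tester accepted the change
        baselines[name] = screenshot
        return "baseline updated"
    return "difference found"         # tester triages: bug, or valid change?


baselines = {}
print(run_visual_test("home", "v1", baselines))                # new baseline
print(run_visual_test("home", "v1", baselines))                # passed
print(run_visual_test("home", "v2", baselines))                # difference found
print(run_visual_test("home", "v2", baselines, approve=True))  # baseline updated
```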

The way that we are going to continue this talk is that we are going to drill down into each of these steps and discuss them more thoroughly. I will also address some of the issues that Manish raised earlier in the talk. And we’ll show various demos of how Applitools Eyes implements those steps, and we’ll share best practices on how to best accomplish them.

And we’ll start with comparing screenshots with baseline images. The reason I’m starting with this step is because, basically, it’s the most important step to get right. If you’re using a visual test automation tool that produces false positives for your application, you will never be able to scale up your tests. You will always have a lot of maintenance on your hands. So this is the most important tip I can give you.

Now, there are many reasons for these false positives to occur. By the way, a false positive, as Manish mentioned in the beginning, with respect to visual testing is a case where the tool tells you that there is a difference, but it’s wrong. It actually could be a difference you cannot see, or it is too negligible for you to care about. And of course, it is very annoying to bother with false positives like that.

So there are many reasons for these false positives to happen. One of the primary reasons is that when you are rendering the same page of your application on different computers, different computers have different setups, different graphics cards, different device drivers, different operating systems or other settings. And so the actual page, although it is the same, will get rendered with different pixels by the rendering engine.

So here’s just one example to show you what I’m talking about. We have here a navigation bar at the bottom, with the playlist tab that you can see magnified here. And this is how it is rendered on one machine. But here you can see how it is rendered on another. And you can see, if I hover between the two, how the pixels can be completely different when we look at them up close. But here at the bottom, you can see that they look exactly the same. The reason these pixels are different – by the way, this is due to an image processing effect called anti-aliasing.

The reason these pixels are different is just that a different computer, with a different implementation of the anti-aliasing algorithm, produces different pixels. Still, to the human eye this is completely invisible. But if you just take these two images and compare them pixel by pixel, then many pixels are different, and the comparison will show you that there is a lot of difference, although you cannot see it. So you need to pick a tool that is smart enough to detect these issues, understand that a human being cannot see them, and just ignore them and not fail your test for these reasons.
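The principle can be shown with a deliberately tiny toy: a naive pixel-by-pixel diff flags anti-aliasing noise, while a tolerant compare ignores isolated single-pixel differences but still flags a contiguous changed region. Real perceptual algorithms are far more sophisticated; this only demonstrates the idea:

```python
# Toy comparison: treat lone differing pixels as rendering noise (e.g. a
# different anti-aliasing implementation), but keep clustered differences.

def pixel_diff(a, b):
    """Images as 2D lists of ints; return coordinates of differing pixels."""
    return [(y, x) for y, row in enumerate(a)
            for x, p in enumerate(row) if p != b[y][x]]

def is_isolated(diffs, y, x):
    """True if no other differing pixel touches (y, x)."""
    return not any((dy, dx) != (y, x) and abs(dy - y) <= 1 and abs(dx - x) <= 1
                   for dy, dx in diffs)

def perceptual_mismatch(a, b):
    diffs = pixel_diff(a, b)
    return [d for d in diffs if not is_isolated(diffs, *d)]


base    = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
noisy   = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 9]]   # one stray pixel: anti-aliasing-like artifact
changed = [[0, 0, 0, 0],
           [0, 9, 9, 0],   # a two-pixel region actually changed
           [0, 0, 0, 0]]

print(pixel_diff(base, noisy))            # [(2, 3)] -- naive diff fails here
print(perceptual_mismatch(base, noisy))   # []       -- tolerant compare passes
print(perceptual_mismatch(base, changed)) # [(1, 1), (1, 2)]
```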

Now, this brings us to Applitools. We’ve been developing these image processing algorithms for years now, and they can handle all these types of causes for false positives very well. Whether it’s anti-aliasing, pixel offsets, color similarities or differences in image scaling, all of these are handled. Now, the most important thing that distinguishes these algorithms is, first of all, that they don’t rely on any error ratio or any configuration of thresholds that needs to be specified. The only thing that matters is whether a human being can see the differences, and whether he would consider them differences or just minor artifacts. And this happens out of the box.

For example, if you have a huge web page and a comma changed to a period, this could be just three pixels that are different on a huge page. Still, this is something that is important and visible to a human being, and because of that, the tool will highlight it as a difference. On the other hand, if on the same web page a table column would get one pixel wider, shifting the entire image one pixel to the side, it could make 70 percent of the pixels different. Still, this is something that a human cannot see and doesn’t care about, and because of that, the tool will not highlight it as a difference, although the majority of pixels between the images are different.
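That asymmetry – a three-pixel change matters, a one-pixel shift of everything does not – can be caricatured in a few lines. This sketch only checks for a uniform one-column shift; it is not how a real perceptual algorithm works, just a way to see why raw pixel counts are the wrong metric:

```python
# Toy: a tiny local change is meaningful, while a whole-image 1-pixel
# shift (which changes most pixels) is not.

def differs_meaningfully(a, b):
    if a == b:
        return False
    # Whole image shifted one column? Visually the same content.
    shifted = [row[-1:] + row[:-1] for row in a]
    if shifted == b:
        return False
    return True


page      = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
shifted   = [[4, 1, 2, 3],
             [8, 5, 6, 7]]   # every pixel moved: humans see the same page
one_pixel = [[1, 2, 3, 4],
             [5, 6, 9, 8]]   # tiny local change: a comma became a period

print(differs_meaningfully(page, shifted))    # False -- huge pixel diff, ignored
print(differs_meaningfully(page, one_pixel))  # True  -- tiny diff, reported
```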

The other important capability of the algorithms is the ability to analyze the structure of the page and basically compare the layout of the pages that appear in the different screenshots. And this is very useful, and this is our first demo. I’ll show you how that works.

So in this first example, you can see the PAYCHEX homepage. On the left-hand side, you can see the baseline image. On the right-hand side, you can see the screenshot that we’re validating. The baseline was taken on a Chrome browser, and the screenshot that we’re validating was taken on IE. Now, if I toggle between these two images, you can see how differently the two browsers render the same page. You can see that the font is of slightly different size, the positions of elements are slightly different, here in this paragraph the text wraps differently because of the differences in font size, etc.

But again, in terms of structure these pages are consistent, and because of that, layout matching actually doesn’t consider all these things to be different. If we click the rater button over here, you can see that it does highlight a change here at the bottom of the page. And if we zoom into it, you can see that we do have a missing element on IE. So layout matching is a very powerful way to do cross-device and cross-browser testing, and still get a lot of coverage out of it.

So let’s take a look at another example, this time with a mobile app. We’ll look at Twitter. As you can see, we have a baseline with a Samsung S4 and a current image with a Samsung S5. If we click the rater button, you can see that we have a few differences here. First of all, you can see that the baseline image dictates that the first tweet should be aligned to the right of the image, and this is being violated over here.

Second, you can see that the last tweet is expected to have an image, but this is missing over here. And again, the rater button will correctly highlight those two differences. But if I toggle between the two images – the baseline and the current image – and we look at the two tweets in the middle, although they have different texts and images, they are still structurally equivalent, and because of that, the layout algorithm does not highlight them as different.

So this brings us to the second important use of layout matching, which is the ability to validate dynamic applications and even monitor applications in production. And the last example that I want to show you on this subject is the Yahoo website, with a baseline and a current image that were taken 24 hours apart. You can see that although the images are different and the articles are different – it’s a very dynamic website – structurally these pages are still consistent, and because of that the test passes.

Now, if I change the matching algorithm to do a more strict comparison rather than layout, then as you would expect, all the dynamic parts are now highlighted as different, and all the static ones aren’t. But still, the strict algorithm is a very advanced image matching algorithm. If I changed this to exact pixel-to-pixel matching, you can see that actually everything is different between these images. It’s just that the strict algorithm was smart enough to ignore all those things that are invisible to the human eye.

Okay, so now that we know how we can compare screenshots with baseline images in a very robust and stable way that allows us to scale our tests and reduce maintenance, let’s talk about the first step of the workflow, which is driving the application under test and taking screenshots. With Applitools Eyes, you can accomplish that by using the various SDKs that we provide. We provide almost 30 different SDKs that allow you to plug visual validation into just about any test automation framework out there, both for mobile and for web. We support, of course, all the different language bindings for Selenium and Appium. We support Coded UI, UFT and LeanFT, as well as Espresso, XCUITest, and others.

Now, let’s take a look at how we can build a fully automated visual test for a mobile website – a website on a mobile device. We’ll use github.com, run it on a mobile device, use Appium to control the browser, and Applitools Eyes to perform the visual testing.

So what we’re going to do in our demo is launch the Chrome browser on a device that is emulated on a Genymotion emulator. We’re going to navigate to github.com. Then we’ll validate the entire page. We’ll then click the navigation button over here, and then click the open source nav item, which will lead us to the open source page of github.com, and we’ll validate that entire page, as well.

The example I’m going to show you is in C#, but it should be very easy for anyone who’s familiar with Java to follow along, so I’m not worried about that. Okay, let’s start by looking at just the Appium code – the Selenium code – that is needed to simulate the user interactions with the github.com site. We don’t do any validations at this point.

So first of all, we define the desired capabilities that will allow us to control the Chrome browser. We specify Android as the platform name. We point to the device that we want to test. Then we specify that we’re working with the Chrome browser on that device. Next, we create the driver as a remote web driver and point it to the Selenium server that is running on my machine. In our test, we start by navigating to github.com. Then we find the navigation button using its class name, and once we find it, we click it, and this opens up the navigation menu.

Then we locate the open source navigation item, again by its class name, and once we find it, we click it, and this leads us to the open source page; very easy and simple. Now let’s add visual validation with Applitools Eyes. Basically, we’ll be using the Applitools Eyes SDK for Selenium with the C# language bindings. Similar SDKs exist for all the other language bindings. We start by creating an instance of the Eyes object. We proceed to set the API key that identifies us as a customer. In this case, I’m getting the API key from an environment variable called Demo API Key.

Next, I’m requesting the SDK to force a full-page screenshot. This means that, for instance on Chrome, when I’m validating the page, I want to make sure that I get the entire web page and not just the part of the page that is visible through the mobile device’s viewport. So the SDK will actually scroll down the page, take multiple screenshots, stitch them together, and produce a full-page screenshot for us.

Lastly, in order for my report to look nice, I’m specifying that the environment where I’m running my test right now is actually a Google Nexus 4 device. Now let’s see how we add those validations. At the beginning of the test, I call Eyes Open, I pass in the web driver, I specify a name for the application, and a name for the test. This is how they will appear in the eventual report.

After I visit github.com, I call this single line – Eyes Check Window – which basically captures a full-page screenshot of the current page, the github.com homepage, and performs visual validation of that entire image, meaning that with a single line of code we get 100 percent coverage for that entire page. So basically this one line of code saves us from writing hundreds of equivalent lines of validation code, and of course from maintaining them as the application changes.

Next, we click on the menu button and when it’s open, we do another check window. The strings that you see here are just descriptive strings that will appear with the checkpoint in the report, making it easier for someone reading the report to understand where we are in the test. You can specify any string you want here, or not specify one at all.

Next, after we click the open source navigation item and move to the open source page, we validate it, as well. So as you can see, with just a few lines of code we entirely covered three UI states of our application, and get 100 percent coverage for them. This saves a lot, a lot, a lot of effort. It is a fraction of the effort to build such testing code, and of course to maintain it moving forward.
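The demo above is in C#, but the same flow can be sketched with the Appium Python client and the Applitools eyes-selenium package. Everything here – the Appium server URL, device name, element class names, and the APPLITOOLS_API_KEY environment variable – is a placeholder for your own setup, not the exact demo code.

```python
import os

# Desired capabilities (older dict style) telling Appium which device
# and browser to drive; values are placeholders.
caps = {
    "platformName": "Android",
    "deviceName": "Google Nexus 5",
    "browserName": "Chrome",
}

def run_visual_test():
    # Imports deferred so the sketch can be read without the packages
    # installed; assumes Appium-Python-Client and eyes-selenium.
    from appium import webdriver
    from selenium.webdriver.common.by import By
    from applitools.selenium import Eyes

    driver = webdriver.Remote("http://localhost:4723/wd/hub", caps)
    eyes = Eyes()
    eyes.api_key = os.environ["APPLITOOLS_API_KEY"]
    eyes.force_full_page_screenshot = True  # scroll, stitch, full page
    try:
        eyes.open(driver, "GitHub", "Mobile GitHub test")
        driver.get("https://github.com")
        eyes.check_window("Home page")  # one line, whole-page coverage
        driver.find_element(By.CLASS_NAME, "nav-button").click()       # placeholder locator
        eyes.check_window("Navigation menu open")
        driver.find_element(By.CLASS_NAME, "open-source-nav").click()  # placeholder locator
        eyes.check_window("Open source page")
        eyes.close()  # fails the test if any checkpoint mismatched
    finally:
        eyes.abort_if_not_closed()
        driver.quit()
```

Each `check_window` call replaces what would otherwise be many lines of hand-written assertion code for that UI state.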

So let’s run this test and see how it goes. Okay. You can see the Chrome browser starting and navigating to github.com. Now you can see the SDK scrolling the page down, taking screenshots as it goes and building that full-page screenshot for us. Clicking the navigation button, scrolling down and capturing the entire screen again with the navigation menu open. Clicking the open source navigation item, moving to the open source page, and capturing the full-page screenshot.

And as you can see, our test has passed. If we move into the Applitools test manager, you can see that we have a successful run of the mobile GitHub test. We can see the metadata of the test; we can see that it passed. We can see the thumbnails of the different checkpoints that we had, the three of them. And we can also drill down and see the test up close. We can see that we have this timeline at the bottom that shows us the different verification points.

We can actually play back the test and see everything that happened with the web driver: the click on the menu, the click on the open source button, moving between those visual validations. And if we look at each validation point, we can see the side-by-side representation, just like we saw in the previous demo, where we have the baseline image on a Google Nexus 4 and the current image, again on a Google Nexus 4. The images are matching, but if anything were different – any of the multiple details that we have here – it would be captured. So we basically have 100 percent coverage for that page.

Now, what we’re going to do next is switch the device. We’re going to simulate a situation where – let’s say that we already have a baseline for the Nexus 4 and it’s running as part of our CI. Now we’ve got a new device and we want to validate it, as well, but we don’t have a baseline for it yet. So I start the Nexus 5 device. And while it’s starting, let’s continue to talk. What we want to do is run the test on the new device, our Nexus 5 device.

But we want to compare it with the Nexus 4 baseline, so we don’t start from scratch. We want to at least make sure that the application is consistent in terms of layout between these two devices, although they have different form factors. So let’s take a look at the test and see how we accomplish that.

It only requires a few simple changes. First of all, we indicate that the new environment that we’re running on is a Google Nexus 5. Second, we indicate to the SDK that it shouldn’t use a new baseline for the Google Nexus 5 but should actually test against the baseline of the Google Nexus 4. The third thing that we need to do is specify the match level as layout, because if we used strict matching, of course everything would be different – the different devices would render the page at different sizes and nothing would match. But with layout, we can do this cross-browser, cross-device test.

Now, the other thing that we’ll do before running it is inject a visual bug into this example. So I will run on a Nexus 5 device, and I will compare it with the Nexus 4 baseline, but I will modify the website just before taking the last screen capture – before the last validation point on the open source page – where I will move the GitHub logo to the right. Okay, we’re taking the GitHub logo over here and adding a margin to its left, so it will move, and we want this to be captured as a layout bug. So let’s see that – okay, we need some time to let the device start. So let’s continue with the presentation, and once it finishes, we can run it again.

Okay, so a few tips about how to construct your tests. The first tip is to always prefer full-page validation where you can. The reason is that, on one hand, you get more coverage because you’re testing the entire page, which means you can catch unexpected bugs. If you only focus on specific components, you can have issues between those components that you won’t catch – unexpected issues. But if you are matching the entire page, nothing will slip by. Any unexpected issue that comes along, you will capture it, which of course is extremely important and a great value that you can get out of your automation.

The second thing is that it allows you to actually maintain your tests without touching the code. It means that whenever you have a full-page validation and something there changes, everyone in your team that knows the application can do the maintenance. They can look at the images, see the difference, and decide if they want to open up a bug or accept a new baseline. You don’t need the guy that knows the test code and knows how to read the logs to go in, find the time, start analyzing the failure, and understand what happened. Everyone in the team can do it.

Just as important, they can do it immediately. They see the issue and can just choose to accept or reject it. They don’t need to wait, or open a ticket for someone to finish this test and look at it – and maybe he’s on vacation. So the benefit here is huge. And of course, you avoid maintaining all those element locators that are required to pinpoint the specific components that you would otherwise have to test. These element locators, as you well know, tend to break when your UI changes. So instead of having to go and fix those element locators in code when the UI changes, you can just approve or open bugs for the current screenshot that was taken.

Let’s see what goes on with our device. Okay, it’s still starting. Let’s give it another second to run. Okay, let’s continue to the next thing. When it comes to implementing those visual tests, there are basically three options that different teams choose, and I’ll go over these three options and describe the pros and cons of each. So we have our Nexus device up and running. I’ll just run the test so we don’t waste any more time. I’ll run this test on our Nexus 5 device, and let’s see what we get.

You can see that the Chrome browser is starting. Yep, we’ve started. Okay, let’s let this run in the background and we’ll continue with the slides. So the first option is to write dedicated visual tests. This is the most recommended way to go because it provides you full control over what you want to validate. You can write those tests to reach every corner of your application. You can decide between full-page screenshots and checking individual components. And of course it allows you to do all the tricks you want by manipulating the application and the test.

So let’s say that you have a certain UI state that you have a back door to, as a shortcut. You can just build a test that uses that shortcut, gets there, and gets the screenshot validated, so you get the most coverage in the most optimized way. Of course, the downside is that you need to write those tests from scratch. You have to invest some time to build the tests that will cover your application.

Now, it’s not as difficult as writing common, traditional functional tests because you don’t need to write the validation code. It’s just simulation code, and it’s a fraction of the effort. But still, there is some effort, especially if your application is complex and has a lot of UI.

Now, the second option – and many, many teams go for this – is to add visual checkpoints to your existing functional tests. You already have tests that walk through your application and test functionality. You can just add those check window instructions within them at the places where you want to validate the page, just like we did in the demo before.

This means that you don’t have all the flexibility to get to every state of the UI; you’re basically limited to whatever your tests are doing right now. And it means that you might not be able to take shortcuts, because functional tests will usually go through the main use cases of how users use your application. But still, it allows you to leverage your existing investment and quickly add visual validation to it.

Another thing that you’ll start seeing is a lot of overlap between the assertions that you have in code and the visual differences that you’ll see in the test report. Eventually, for many teams, this leads to dropping the assertions in code and relying more on the visual validation – unless, of course, the assertions relate to things that are not visible in the UI.

The third option, which also works well for many teams but does not fit all situations, is to add implicit visual validations in the test framework itself. So you don’t touch the tests, but if you have a test framework that all the clicks, mouse movements, and keyboard strokes go through, you can add in-place visual validations in response to certain triggers.

For example, you can take a checkpoint after navigating to a URL, or before clicking a button – so if you filled out a form and you’re about to click the submit button, it would validate that form. The upside is that it’s very trivial to implement: just a few lines of code and all your tests become visual tests at once. There are a few downsides, though. First of all, you can only do full-page validation, because you’re doing a generic validation and cannot address specific components from the framework.

And because of that, your validation is restricted. If there are pages that don’t work well with full-page validation because they are very dynamic, or pages where you only want to validate certain parts, you need to find a way to exclude them in your framework.
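To make this third option concrete, here is a minimal Python sketch of a framework-level wrapper that adds an implicit full-page checkpoint after every navigation and before every click, so existing tests become visual tests without modification. The StubDriver and StubEyes classes are stand-ins for a real web driver and the Eyes SDK.

```python
class StubDriver:
    """Stands in for a real Selenium/Appium driver."""
    def get(self, url): pass
    def click(self, locator): pass

class StubEyes:
    """Stands in for the Eyes SDK; records checkpoint tags."""
    def __init__(self):
        self.checkpoints = []
    def check_window(self, tag):
        self.checkpoints.append(tag)

class VisualDriver:
    """Wraps a driver so every interaction triggers an implicit checkpoint."""
    def __init__(self, driver, eyes):
        self._driver = driver
        self._eyes = eyes

    def get(self, url):
        self._driver.get(url)
        # checkpoint after navigation completes
        self._eyes.check_window(f"after navigating to {url}")

    def click(self, locator):
        # checkpoint just before the click, e.g. a filled-in form
        self._eyes.check_window(f"before clicking {locator}")
        self._driver.click(locator)

eyes = StubEyes()
driver = VisualDriver(StubDriver(), eyes)
driver.get("https://example.com/login")
driver.click("submit-button")
print(eyes.checkpoints)
```

Because the wrapper cannot know anything about individual pages, it can only take generic full-page checkpoints, which is exactly the limitation described above.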

The second downside is that you get duplicate validation points. If all your functional tests, for instance, go through a log-in page, then all of them will validate that log-in page. So you’ll have multiple validation points validating the same screen, and that is a downside – unless you figure out a nice way to avoid that screen, maybe by writing a dedicated test for the log-in screen that is shared among all the others.

And of course, if you have certain tests that are heavily parameterized with data – let’s say, for example, you have a form with an input and an output, and you have a table on the side with 100 pairs of inputs and outputs, and you run through all of these to make sure the logic works correctly – then you don’t want visual checkpoints on top of this test, because you’ll have 100 visual validation points validating the same page, which is of course useless.

But all three of these options are valid. They are widely used, and you should see what works best for you, both in the short term and in the long term.

So our test has completed. Let’s go back to the dashboard and see what we got. Okay, so we can see that we have a difference here; the test has failed. We have a difference here at the top of the page. If we drill down and zoom in up close, we can see that indeed we have a baseline image from the Nexus 4, a current image taken from the Nexus 5 device, and although it’s a layout test, it did detect that the GitHub logo has moved, just like we hoped it would. But notice that the devices have different form factors.

You can see that the width of the device is different, and the height is different as a result. Because of that, everything is laid out differently. But structurally, the pages are still consistent, and so no other differences are highlighted. Now, of course, if I changed this to strict matching, then everything would be different, because nothing is the same at all, right?

So this demonstrated how I can bridge the baseline when I have a new device. If I now save and accept this baseline, I can start doing strict tests on the Nexus 5 device against itself. But I had a bridge that allowed me, instead of starting from scratch with a new baseline, to compare against an existing one. In a similar way, you can do cross-device testing: if you know that a certain device looks right, you can make sure that it looks right, at least at the layout level, on all other devices, in addition to your regression tests.

Now we’ve come to the last two steps of our workflow, which are viewing the differences and updating baselines. And again, I’ll show you a demo of how this is done with Applitools Eyes. Let’s go back to our GitHub example, but say that we had a little more time and wrote more tests. We could have a test suite that covers many more pages and layouts of github.com, on different devices and browsers. So in this case, we have a test suite – it could be GitHub jobs, for instance.

It had 19 tests; all of them failed. They covered four different environments – desktop and mobile, with four different form factors – to capture the different layout modes of the application. And all in all, we found 76 mismatches. We can see a thumbnail of these mismatches per test, and we can also just look at them for the entire suite, regardless of the test they originated in.

Now, it’s quite easy to see that the difference here is at the top, at the logo, and we can immediately accept or reject from those thumbnails. We can also zoom in on the images very quickly and see what happened – in this case, the GitHub logo went away, and let’s say that’s what we wanted, so we can approve it. Still, if we have thousands of tests in a test run, and a common change appears on many of those pages because we are doing full-page comparison, it could be a bit overwhelming to go over all these images, approving them one by one.

So what we can do is ask the tool to analyze those differences and only show us the unique ones. With all these dozens of differences, we only have two unique ones. If we zoom in on the first one, we can see that in this case the GitHub logo disappeared, which is what we wanted, so we can just approve it – and with that, we approve all of these changes across all of these steps. If we look at the second set of similar changes, we can see that in this case the GitHub logo didn’t go away but actually turned green.

So this is probably a CSS bug; it’s not what we intended, so we can just reject that. And if we save, we’ve just updated the baselines for all 19 separate tests. So with just a few clicks, we can maintain hundreds and thousands of images and differences. We have customers today that are running thousands of tests every day, and features like this – and there are many others – make it very easy to maintain baselines at [inaudible] [00:49:08] scale. They can scale up their tests to these levels without really increasing the amount of overhead they have to spend on maintaining those baselines.
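The "unique differences" workflow can be sketched as a simple grouping step. The signature strings here are a stand-in for the real perceptual fingerprint the tool computes for each diff region, and the counts mirror the 76 mismatches in the example.

```python
from collections import defaultdict

# 76 hypothetical mismatches across the suite, each carrying a signature
# describing what kind of visual change it contains.
mismatches = [
    {"test": f"test-{i}", "signature": "logo-removed"} for i in range(40)
] + [
    {"test": f"test-{i}", "signature": "logo-green"} for i in range(36)
]

def group_by_signature(mismatches):
    """Collapse many mismatches into groups of identical-looking diffs."""
    groups = defaultdict(list)
    for m in mismatches:
        groups[m["signature"]].append(m)
    return groups

def triage(groups, decisions):
    """Apply one accept/reject decision to every mismatch in its group."""
    return {
        sig: [(m["test"], decisions[sig]) for m in ms]
        for sig, ms in groups.items()
    }

groups = group_by_signature(mismatches)
print(len(mismatches), "mismatches,", len(groups), "unique differences")

# One decision per unique difference updates every affected baseline.
result = triage(groups, {"logo-removed": "accept", "logo-green": "reject"})
```

So a reviewer makes two decisions instead of 76, which is how a few clicks can maintain baselines across thousands of tests.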

Now, with this I conclude my part of the demo. Before we move into the Q&A part, there are a few special offers I want to let you know about. Infostretch is offering a free, four-hour assessment of your visual testing strategy. Just go to this link and sign up if you’re interested. And Applitools invites you to join our free Visual Testing Master Class, where you can learn more about the advanced features and in-depth capabilities of Applitools Eyes. Just go ahead to this link and sign up for the next master class. During the Q&A session, we’ll leave this slide on so you have time to look it up and write it down if you need to.

And with this, Addie, I hand it over to you for the Q&A session.

Addie: Great. Okay, so we have quite a few questions. We’ll try to get through as many as we can. The first one: is Applitools a standalone tool, or does it require additional tools – for example, Selenium WebDriver – in order to work properly?

Adam: Excellent question. Applitools is not a standalone tool. We have, as I mentioned, many, many SDKs, so you can easily integrate visual testing within any test framework that you’re using. I can also say that in about a week or two, we’ll release a version of Selenium Builder from the Selenium project that has visual testing capabilities, so you can actually record and play back visual tests very easily without writing code. And this ability is also available in the products of some of our partners that have tools with record/playback capabilities, which can also integrate visual testing.

Manish: Adam, if I may add here: Infostretch also has an open source framework called the QMetry Automation Framework. It’s on GitHub, and Applitools is integrated there via a property setting. So all you have to do is write your test case, and by turning on a property you can basically start getting all your test cases validated visually, as well. That is all available for free to anybody, basically.

Addie: Okay, great, thank you. Next question. Could we use Applitools Eyes for localization testing – for example, a test display that might be in any of five languages – and pass those five, ignoring irrelevant differences in screen locations, while not passing any other text?

Adam: Yes, definitely. There are many uses for visual testing that we see customers doing, and localization testing is one of them. Similar ones, just to open up your minds: testing PDF forms, testing accessibility strings, etc. All of this can be done. With respect to localization testing, there are two things that you can do. First of all, you can do regression testing on each language. Let’s say that you have your site in German and no one in the office knows German; if there was a typo there, no one would notice.

But if you’re doing regression testing on the German version, anything that changed there would immediately be picked up automatically and highlighted. You can then take that link and send it to someone who knows German – a translator – and he can tell you if it’s okay or not. So it saves you the effort of isolating where a thing has changed, and makes it easy to share it and get feedback. And of course, if it’s okay, it’s just a click of a button to accept.

The second thing that you can do is use layout matching as a bridge to introduce new languages. Let’s say that you have your UI in English and now you are adding a Bulgarian version. You can use layout matching to at least verify that when you change languages, there are no layout defects in the Bulgarian version. Of course, this does not substitute for the initial effort of making sure there are no typos and all the proper translations were made.

But still, once you’ve made that effort, you can start running your strict regression tests on those environments, and you can find out automatically if there are any layout issues in the new language that you’re introducing.

Addie: Great. Next question. For the baseline images, do you need a baseline for each browser or each different device?

Adam: It’s a good question. By default, when you’re running your tests, each unique environment gets its own baseline. So if I’m running a test on Chrome, it will compare with a baseline that is specific to Chrome. If I’m running on IE, it will compare against a baseline specific to IE. If there is no baseline, a new baseline will be created, etc. Now, as I’ve shown you in the demo, you can actually configure the SDK not to follow this default behavior but to compare against another baseline. In our case, we ran a test on the Nexus 5 and specified the Nexus 4 as the baseline.

But note that we also had to change the match level to layout comparison, because these screens would be very different – they have different form factors – and so it doesn’t make sense to test them in a strict way.

Addie: Great. First of all, I wanted to say that there appears to be some problem with the link to the Visual Testing Master Class, so I will post an updated link in the chat room in a second. In the meanwhile, the next question for you, Adam, is: how does the capture work with page elements that are not directly loaded and need some time before they are shown on the screen?

Adam: Excellent. So first of all, the Applitools Eyes SDKs have a sophisticated mechanism that waits for the page to load. When you do a checkpoint, it doesn’t just grab the screenshot, send it, and move along with the test. Because the Eyes server has the baseline, it knows how the page is expected to look. There is a global timeout defined, and during that timeout the SDK will take multiple screenshots and try them until a match is found or until the timeout is exceeded. You can also specify specific timeouts for each check window.

So if you know there is a problematic page and you don’t want to bother with stabilizing the test by waiting for some element to appear or whatever, you can just increase the timeout, and the SDK will poll again and again, retrying the screenshot until a match is found or the timeout is exceeded, in which case there will be a failure. The nice thing about it is that if the page stabilizes after a second, then you only wait a second. The downside is that if there is a real difference, you will have to wait for the full timeout until the difference is actually determined and the test moves on.
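The polling behaviour described here can be sketched as a retry loop; `capture` and `matches_baseline` are stand-ins for the SDK’s internal screenshot and matching steps, not its real API.

```python
import time

def check_window(capture, matches_baseline, timeout=2.0, interval=0.1):
    """Retry screenshots until one matches the baseline or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        screenshot = capture()
        if matches_baseline(screenshot):
            return True   # page stabilized: succeed early, no full wait
        if time.monotonic() >= deadline:
            return False  # genuine mismatch: fail only after the full timeout
        time.sleep(interval)

# Simulate a page that finishes loading after three capture attempts.
attempts = {"n": 0}
def capture():
    attempts["n"] += 1
    return "loaded" if attempts["n"] >= 3 else "loading"

print(check_window(capture, lambda s: s == "loaded"))  # True, well before the timeout
```

This shows both properties Adam mentions: a page that stabilizes quickly only costs a fraction of the timeout, while a real difference costs the whole timeout before it is reported.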

So the bottom line is that you can rely on the Eyes SDK to stabilize the checkpoint. But like any other testing effort, you’ll get the best results if you – as someone who knows your specific app, the specifics of your page, and the exact way to determine that the page has finished loading – improve the performance of your test by actually waiting for the page to load in your Selenium code, for instance, before you reach the Applitools Eyes checkpoint. But then again, it’s fine to work with the timeout. It saves a lot of effort and headaches, and many, many teams use it that way.

Addie: Okay. And since we are out of time, one last question: how does the full-page screenshot work with pages that are endless, for example news streams?

Adam: Okay. Basically, there are two ways – I don’t want to get too technical about it. But what I can say is that we know how to scroll the page in a way that doesn’t trigger more data to be loaded. So whatever is loaded at that specific moment – the stuff that is below the fold – will get captured, but it won’t trigger any other events that would otherwise occur if a user actually scrolled the page down. So it works fine.

Addie: Okay, great. Unfortunately, this is all the time we have for today. I want to thank Manish and Adam for this very in-depth presentation. And I would like to thank everyone that joined us today, and I hope to see you at our next event.

Adam: Thank you, everyone.
