What are the disadvantages of using the ":xpath" attribute to identify an object using Watir?

I frequently use the :xpath attribute to identify elements in my Watir automation scripts, and I have found it really useful. It seems to be the attribute that changes least, so the scripts need less maintenance. Of course, I mean this for elements that can't easily be identified through the :id, :name, or :value attributes.
Before I build a large number of automated scripts around :xpath, I would like some expert advice.
What are the disadvantages of using :xpath to identify an object in Watir?
Will the :xpath value of an element be the same in IE, Chrome and FF?
Is there anything else important I should be aware of when using :xpath?

The xpath should always be the same in all browsers.
The problem with using xpath is that it is the easiest locator to break, because the locator only keeps working as long as nothing else along that path changes. For example, if you locate a results table on a page using an xpath and at a later date another table gets added above it, the xpath will be broken and your tests will fail until you update it. If that table was located using an id, adding the second table wouldn't break anything, as the new table would have a different id.
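To make the failure mode concrete, here is a minimal sketch in Python (standard library only, not Watir) that runs a positional path and an id-based path against a page before and after a second table is added. The page markup, the ids and the cell texts are all made up for illustration, and ElementTree only supports a subset of XPath, so treat it as a demonstration of the idea rather than of any browser driver.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before and after a second table is added above the results.
PAGE_BEFORE = (
    "<html><body>"
    "<table id='results'><tr><td>Original results</td></tr></table>"
    "</body></html>"
)
PAGE_AFTER = (
    "<html><body>"
    "<table id='summary'><tr><td>New summary</td></tr></table>"
    "<table id='results'><tr><td>Original results</td></tr></table>"
    "</body></html>"
)

POSITIONAL = "./body/table[1]"      # "the first table in the body"
BY_ID = ".//table[@id='results']"   # "the table whose id is 'results'"


def first_cell(page, path):
    """Return the text of the first cell of the table matched by `path`."""
    table = ET.fromstring(page).find(path)
    return table.find("./tr/td").text


positional_before = first_cell(PAGE_BEFORE, POSITIONAL)
positional_after = first_cell(PAGE_AFTER, POSITIONAL)
by_id_before = first_cell(PAGE_BEFORE, BY_ID)
by_id_after = first_cell(PAGE_AFTER, BY_ID)

print(positional_before, "|", positional_after)
print(by_id_before, "|", by_id_after)
```

The positional path silently starts returning the new table, while the id-based path keeps pointing at the original one.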
If the pages you're working on don't have ids and it isn't an option to add some or ask for some to be added, then remember that in Watir you can use multiple locators, e.g.
browser.table(class: 'results_table', text: /Original results table/)
This is a silly example but hopefully it illustrates the point. If there are cases when using multiple locators still won't work for any reason, then I would look into using css selectors instead of xpath as you should be able to achieve the same things but it will be less brittle.
The issue of how often tests break isn't too important in a small test suite, especially if you're the only one working on the tests. However, a couple of years from now, when you have hundreds of tests to maintain and two or three people sharing the codebase, you can end up spending longer fixing old tests than you spend writing new ones. It's worth doing anything you reasonably can to minimise this as you go along, as doing a rewrite later will always take longer.
Hopefully some of this helps!


Pulling Data from HTML Tables

So I understand how to pull data from tables on a single web page. What I cannot find anywhere on the web is a single tutorial about how to do the same with div elements, and no one seems to talk about it at all. Can someone please give me an example or something? Either Excel or Google Spreadsheets.
I'm trying to teach myself how to do this, using the website https://newworldstatus.com/regions/us-east for a small project I want to do.
Thank you in advance.
This is not a comprehensive answer, just intended to show you how some very basic concepts work, first in VBA and then in Sheets. But let me preface all of this by saying that while your test URL seems simple enough, you will not be able to do any of this for that specific URL. They are either actively trying to stop scraping or they just have it set up in a way that happens to make it difficult to scrape.
If you make a web request directly to that URL, you will get back the JS code that actually handles the data load-in and not the data itself, so any kind of parsing you try to do will fail, because what you see in the page isn't what comes back on the initial page request. All the html that will be in the page is enough to show this:
You would need to either try to read through the code and figure out what they're doing, or do some tinkering in the javascript console, and probably some fairly high-level tinkering. So for a first project, or just to learn some basics, I think I would pick a different test case.
First, in VBA. It's both complicated and not all that complicated at the same time. If you know how web technologies work non-language specifically, then it all works pretty much the same way in VBA. First, you'll need to make a web request. You can do that with the winHTTP library or the msXML library. I usually use winHTTP, but unless what you're doing is complex, either one is fine.
You'll need to instantiate a request object. You can do that by either adding a reference to the library (tools->references-> and pick the library out of the list) or you can use late binding. I prefer to add the reference, because you get intellisense that way. Here are both:
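Dim req As New WinHttp.WinHttpRequest 'option 1: early binding, needs the reference
Dim req As Object 'option 2: late binding, no reference needed
Set req = CreateObject("WinHttp.WinHttpRequest.5.1")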
Then you open the request. I'm going to assume this is a straight GET. POST requests get a little more complicated:
req.Open "GET", url, TRUE
If you have the reference added and created the req with Dim, then you'll get the intellisense: as you type, the arguments will pop up, and you can use that to refer to the documentation if you have questions. TRUE here is to send it asynchronously, which I would do. If you don't, it will block up the interface. This is the Open method, which you can find in the documentation.
Then use
req.Send
req.WaitForResponse
source = req.responseText
to send the request. WaitForResponse is needed only if you send the request asynchronously. The last line gets the responseText into a variable.
Then you'll need to do some stuff with the MSHTML library, so add a reference to that. You can also late bind, but I would not, because it will be very helpful to you to have the prompts in intellisense.
First, set up a document and write the source you just fetched to it:
Dim doc as new MSHTML.HTMLdocument
doc.write source
Now you have a document object you can manipulate. The trick is to get a reference to the element you want. There are two methods that will return a single element, getElementById and querySelector:
If you are lucky, the element you are looking for will have a unique ID and you can just get it. If not so lucky, you can use a selector that identifies it uniquely. In either case, you will set up an IHTMLElement to return to:
Dim el as MSHTML.IHTMLElement
set el = doc.getElementById("uniqueID") 'whatever the unique ID is
Once you have that, you can use the methods and properties of the element to return information about it:
There are more specific interfaces, like
You can use the generic IHTMLElement, but sometimes there are advantages to using a specific element type, for instance, the properties that are available to it.
Sometimes you will have to set up an IHTMLElementCollection and iterate it to find the specific element you are looking for. There are four methods that return collections:
getElementsByClassName is sometimes problematic, so forewarned is forearmed.
If you need to do that, set up an IHTMLElementCollection and return the list to that:
dim els as MSHTML.IHTMLElementCollection
set els = doc.getElementsByTagName("tagName") 'for instance a for anchors, div for divs
That is about it. There is obviously more to it, but a comprehensive answer would be very long. This is mostly intended to point you in the right direction and give you more stuff to google.
I will say that you should test out some of these methods in the browser first. They exist in many languages, and all major browsers have developer tools. For Chrome, for instance, press Ctrl+Shift+I to bring up the dev tools, and then in the console window type something like:
and you should get the node. or
document.getElementsByClassName("test") // where test is the name of the class, without a leading dot
document.querySelectorAll("div") // where you pass a valid CSS selector
and you will get the node list.
It will be quicker to experiment there than to try to set it up and debug in VBA. Once you have a good handle on how it works, try to transfer that knowledge to a VBA implementation.
Here is a basic overview of .querySelector to get you started on understanding how those work, although they can get very complicated. In fact, querySelector is my go to method for finding elements.
Now, Google Sheets:
You don't really want to use IMPORTHTML, even though it seems counterintuitive. That function (AFAIK) only supports tables and lists, and it's index based, too, which means you give it a number n and it returns the nth table or list in the page. That means that if they ever change the layout, or the layouts are dynamic in any way, then you won't be able to rely on an index to accurately identify what you want. Also, as you noted, people don't really use tables much anymore, and when they say list I'm pretty sure they mean ol and ul elements, which is also not going to be that useful to you. Here's the docs:
But you can use IMPORTXML. Even though it says XML, you can still use it to parse HTML (for reasons and with limitations that are out of scope for this answer). IMPORTXML takes a URL and an xpath selector. In this way it's similar to the document.querySelector and querySelectorAll methods. Here is some information on xpath in tutorial form from w3schools.
And if you want to test selectors in Chrome you can use $x("selector") in the javascript console in the dev tools. I believe Firefox also supports this, but I am not sure if other browsers do. If not, you can use document.evaluate:
Even though you can't actually use this in sheets against the URL you've given, let's take a look at a couple of xpath selectors in that context. Hit Ctrl+Shift+I to bring up the dev tools (hopefully you are using Chrome), and then go to the elements tab. If you don't have the javascript console showing in the bottom pane, hit Esc. You should see something like this:
Use the arrow icon in the top left of the dev tools to search the elements, and just click on the first row in the table:
so that you can see the structure of the elements, and figure out how to parse out what you want from it. You'll notice that the cell that's highlighted is contained in a div with a role of "row" and an attribute of row-id. I think that's where I would start. So an xpath to that container would look something like this:
where we are fetching all elements (//) that match div and have an attribute (@) of row-id = 1.
If you want to get the children of that container, you just add another level to the path
where we want to get all children (/) that are divs.
And I notice that they all have a col-id attribute, so if you wanted to fetch the "set" information you'd just specify divs that have an attribute of col-id = 'set':
and to get the text out of that:
since it looks like the second node is the one that has the team name in it. Again, you can see how this WOULD work in the dev tools, but you won't actually be able to use this for your URL.
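Since the selectors can't be tried against that URL from Sheets, here is a rough sketch of the same three steps (the row container, its child divs, then the cell with col-id = 'set') in Python with the standard library. The markup is a made-up stand-in for the grid described above: the row-id and col-id attribute names come from the page, but the other col-id value and all cell texts are assumptions. ElementTree paths start with .// where a browser xpath would start with //.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup loosely modelled on the grid described above.
# Only the attribute names row-id / col-id and the value 'set' come from
# the page; the 'name' column and all texts are invented.
GRID = """
<div role="grid">
  <div role="row" row-id="0">
    <div col-id="name">Alpha</div>
    <div col-id="set">First</div>
  </div>
  <div role="row" row-id="1">
    <div col-id="name">Beta</div>
    <div col-id="set">Second</div>
  </div>
</div>
"""

root = ET.fromstring(GRID)

# All div elements that have the attribute row-id = 1.
rows = root.findall(".//div[@row-id='1']")

# One more level: the div children of that container.
cells = root.findall(".//div[@row-id='1']/div")

# Only the child whose col-id attribute is 'set', and then its text.
set_cell = root.find(".//div[@row-id='1']/div[@col-id='set']")
set_text = set_cell.text

print(len(rows), len(cells), set_text)
```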
I'm not going to spend a lot of time here. As already stated, you won't be able to use this method on your specific URL. If you can figure out the actual URL that your URL wraps around, then perhaps. Also, since there's only one argument, the selector, then there's not much more to expound on. If you needed something more complex, like the ability to iterate over a set of matching nodes, you could probably do it in Scripts, but I would probably just switch to Excel if it started getting that complicated. The only exception would be if the data was JSON formatted, in which case Scripts will be able to handle that better than VBA, although I would probably switch to a different language entirely in that case.
Since your URL is probably not good for testing, I'm going to point you to this tutorial from Geckoboard, which has a few different examples from sites like Wikipedia and Pinterest.
So google around, experiment, and let me know if you need any help. And this was all off the top of my head, so let me know if any of this stuff throws errors so I can edit the answer.
Also, be aware that Excel is not always the right tool for dealing with this. Very often, while the page might have the elements you are looking for, they will be loaded in with JSON and both php and javascript can natively handle JSON objects, while VBA doesn't. If the data is JSON formatted, it is much easier to parse it out of that than trying to parse it out of the DOM structure (DOM = document object model, another thing to google). Also, in many cases, if the data is loaded in with AJAX, it won't be returned with your winHTTP call, because that doesn't execute any javascript that might be in the page.
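As a rough illustration of why JSON is the easier route when it is available, here is a short Python sketch. The payload and its field names (servers, name, status) are entirely hypothetical, not what any particular site returns.

```python
import json

# Hypothetical payload; the structure and field names are assumptions.
payload = """
{"servers": [
    {"name": "Alpha", "status": "online"},
    {"name": "Beta", "status": "offline"}
]}
"""

data = json.loads(payload)

# Once parsed, the data is plain lists and dicts, so no DOM walking is needed.
online = [server["name"] for server in data["servers"] if server["status"] == "online"]
print(online)
```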
Further, in many cases you will need to set headers or cookies in the winHTTP call to get the data (calls without the right settings might return an error or a redirect). That is also not addressed in my answer, although you can set headers and cookies in winHTTP. You would need to sniff the calls, either with Fiddler or similar or with the network tab in dev tools, to find out the right combination of information to pass with your request.

Find elements with dynamic ids and XPaths

I apologize if this is a duplicate of a question already but nothing I read seemed to do the trick.
I am trying to automate the process of adding my hours for my job. This entails using selenium to mimic the process I do to enter the hours for me.
The problem is that, as I navigate through the process, I have run into an instance where one of the elements has a dynamic id and xpath (and maybe other things; I am not very proficient in HTML).
I need to select the "Day" button on the "View" drop down. The highlighted HTML corresponds to that button. I have already checked and both the ID and Xpath change every time I create a new session. I usually do the following to find my elements:
elem = driver.find_element_by_xpath('xpath')
Below is the xpath I currently see:
To further complicate things, the xpath for the "Week" selection is the following:
I tried to figure out how to use "contains" with the xpath, but even so, the two are not different enough to tell apart by using "@id". The only thing I see each time that is both constant and different is that the
is present on the day element and
is present on the week element.
Does anyone have any experience finding elements when a problem like this occurs? I am working in Python 3.6 on a Windows 7 computer.
Again, I apologize if this is a duplicate but I tried very hard to find an answer before coming here for help.
Thanks in advance!
You can use either of the two possible selectors below
When you choose an identifier, your main focus should be on finding something that is unique to that object. It really doesn't matter whether that is the name, the id or anything else; use whatever you think will work best. Here, the data-automation-label attribute is clearly meant for exactly that purpose.
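A minimal sketch of the idea, using only the Python standard library rather than Selenium so it can be run on its own. The markup below is hypothetical: the data-automation-label attribute name comes from the question, but the values "Day" and "Week", the li elements and the ids are assumptions based on the button names.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu: the ids change every session, the label attribute does not.
MENU = """
<ul>
  <li id="dyn_8123" data-automation-label="Day">Day</li>
  <li id="dyn_8124" data-automation-label="Week">Week</li>
</ul>
"""


def label_xpath(label):
    """Build a path that matches any element carrying the given label."""
    return f".//*[@data-automation-label='{label}']"


root = ET.fromstring(MENU)
day = root.find(label_xpath("Day"))
week = root.find(label_xpath("Week"))

print(day.get("id"), week.get("id"))
```

With a real driver the same expression would normally start with // instead of .// and be passed to the driver's xpath lookup; the point is only that the path depends on the stable attribute and not on the generated id.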

Search Algorithm for a web application that needs to look for a specific value

I'm developing a webapp that will need to download the html from a website and then iterate through the code to find a specific but ever-changing value (in our case, the price of the product).
For this, I was thinking about asking the user (upon installation and setup) to provide the system with a few lines of html from the page (that has the price) and then from then on, every time we need to fetch the price we would try to search for those lines and find the price.
Now, I believe this is a horrible and slow way of doing this and since there are no rules and the html can be totally different from one website to another (even the same website might change) I couldn't find a better way.
One improvement that I thought about was to iterate through the first time and record the line at which we find the code. Once found, the subsequent times we would then start from a few lines before the expected location and start the search. Any Thoughts on how I can improve on this?
I posted this question on https://cstheory.stackexchange.com/ but they commented that it's not on topic and that I should post it here.
I have the code for the above and if needed I can post it, I'm simply thinking that there must be a better, faster way of doing this.
This is actually something I tried for a project recently (using BeautifulSoup and Python). The solution that worked for me was to work out CSS selectors (which can map to jQuery selectors) that targeted the elements containing the values I was looking for. In my case I was able to narrow the full document down to just the elements that contained what I was looking for, but if you couldn't get exactly what you were after, you could combine this with some extra logic, like a test to see whether the text looks like a price (via regex) or a test of what it is next to.
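Here is a rough standard-library sketch of that combination, not the BeautifulSoup code from my project: first narrow the document to elements with a given class, then keep only the texts that look like a price. The class name "price", the sample page and the price pattern are all assumptions, and the depth tracking is deliberately naive (unclosed void tags such as a bare br inside a matching element would throw it off).

```python
import re
from html.parser import HTMLParser

# Assumed price shape: a currency symbol, digits with optional thousands
# separators, and optional cents. Adjust for the sites you care about.
PRICE_RE = re.compile(r"^\s*[$€£]\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?\s*$")


class ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0    # greater than 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1           # nested tag inside a match
        elif self.wanted_class in classes:
            self.depth = 1            # entering a matching element
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def find_prices(html, wanted_class):
    """Return the texts of matching elements that also look like a price."""
    parser = ClassTextCollector(wanted_class)
    parser.feed(html)
    return [text.strip() for text in parser.texts if PRICE_RE.match(text)]


PAGE = """
<div class="product">
  <span class="label price">Price:</span>
  <span class="price">$1,299.00</span>
</div>
"""

prices = find_prices(PAGE, "price")
print(prices)
```

The class filter plays the role of the CSS selector, and the regex weeds out elements that share the class but don't hold a price, such as the label.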

Using class names in Watir

So our QA guy came by today to get me to put ids on items in our html so he could automate stuff using Watir.
I don't know much about it, so I tried to see if we could use the class names instead, but that's a total crapshow.
I was just wondering why something like
link(:item, :id => 'save-btn')
works when you set it up in watir, but you can't do something like
links(:item, :class => 'save-btn')[0]
I also tried using the browser.links calls, but we would consistently get "element not visible" errors.
I was just wondering why this was so difficult, to where using ids on everything seems to be the recommended way to go with everything? Is there a way to use class names with watir or is that just the way things are done?
CSS class attributes are a quite normal way of accessing elements with Watir, as is id.
For class attributes, however, you usually have to specify the context you're searching in, because there might be more than one element with the same class, as opposed to the id attribute, which has to be unique for the whole page.
I'm not sure what framework you're using in your examples, but this is how I would do it in plain old Watir if the save button is in some container element:
browser.div(id: "container").span(class: "save-btn")
The code above will find the first span with class "save-btn" in the container element, as expected.
Also, do not use xpath or css locators ever as suggested by another answer in here. Why? Because they are really fragile and make your tests too hard to read/maintain.
Short answer: Adding class names to the source code is the right direction and should be considered good practice. However, after that, the :class locator alone isn't good enough in most cases; try using :xpath or :css. So you, as a developer, should go ahead and add the class names, but you need to make sure your QA people know how to use Watir and don't simply use :id or :class for all locators.
Long answer: If the site is simple enough, adding IDs would be easiest and the best one. However, nowadays, many JavaScript frameworks like ExtJS, create dynamic IDs, so in that case adding class names to source code would be better.
After adding the class names, using the :class locator on its own, as in your case, is a bad choice. It can be even worse than :id, since IDs are at least supposed to be unique. On complex pages a bare :class locator is pretty much useless, because it will find unwanted elements.
Here, your error message means you probably have more than one element with the class save-btn, and the first one isn't visible, so it can't be interacted with.
Selenium WebDriver and Watir WebDriver both support XPath and CSS selectors, so you should use :xpath or :css when necessary, instead of only :id, :class etc.
For example, something like:
links(:item, :css => ".save-btn:not([style*='display:none'])")[0]
Jarmo Pertman suggests favoring ID/class over XPath/CSS selectors, which is not completely correct. ID and class locators are just subsets of XPath/CSS selectors. If it's easy enough, use ID/class; however, unnecessary chaining is not good practice. In the example he gave,
browser.div(id: "container").span(class: "save-btn")
is equivalent to the CSS selector div#container span.save-btn. The CSS selector is therefore no more fragile or harder to maintain, because the two are identical.
XPath/CSS selectors are powerful tools that everyone should learn, but only use them when really needed. Bad XPath/CSS selectors are fragile, so try to find good ones. (Bear in mind that XPath is slow and should be considered a last option.)

Cannot locate a text_field with dynamic id

<div id="temp_1333021214801">
<input type="text"/>
</div>
I am getting the error "unable to locate element", because the ID changes dynamically.
Please help me to set the text in the text field.
It seems like your dynamic id always starts with temp_, so given the information above this should do it:
browser.div(:id, /temp_\d+/).text_field.set 'something'
The issue with my solution is that it assumes the id will always be temp_ followed by a run of consecutive digits, which seems to be the case in your sample above. It also assumes there is no other div on the page whose id matches /temp_\d+/, which most likely should not be an issue.
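To see what the pattern does and doesn't match, here is a small stand-alone sketch in Python (standard library, not Watir). The page markup apart from the temp_ div, and the name attributes, are made up for illustration. Note that it uses fullmatch, which is stricter than the Watir regexp above: the Watir version would also match an id that merely contains temp_ followed by digits.

```python
import re
import xml.etree.ElementTree as ET

DYNAMIC_ID = re.compile(r"temp_\d+")

# Hypothetical page; only the temp_ div comes from the question.
PAGE = """
<body>
  <div id="header"><input type="text" name="search"/></div>
  <div id="temp_1333021214801"><input type="text" name="target"/></div>
</body>
"""


def find_input_in_dynamic_div(page):
    """Return the input inside the first div whose id is temp_ plus digits."""
    root = ET.fromstring(page)
    for div in root.iter("div"):
        if DYNAMIC_ID.fullmatch(div.get("id", "")):
            return div.find("./input")
    return None


field = find_input_in_dynamic_div(PAGE)
print(field.get("name"))
```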
If you have dynamic IDs I can suggest the following:
Code to object counts. For example
$browser.text_field(:index => 2)
gives the third text_field on the page.
Code to what is around the thing you're trying to find.
$browser.div(:name => 'mydiv').text_field(:index=>2)
gives the third text field in the div called 'mydiv'.
If your front-end is less than testable in this way, I highly suggest you put time into thinking over your commitment to automated testing in the first place. Any minor change to the software is going to have you working until 9pm, pulling your hair out and rocking back and forth as you update all your scripts, so unless code maintenance is your weekend hobby, think about semi-automation, exploratory testing or manual scripts. Talk to development (whoever that might be; it might be you!) or the higher-ups (unless that's you too) to see if it can be made more testable. Also, don't use xpaths unless you take some deviant pleasure in it.
Hope that was helpful, I can't do anything specific without the source HTML.
