getElementById inner dt id values - Excel

I am extracting data from HTML using VBScript. This is the HTML code from which I am trying to extract the data:
<dl id="overview">
<dt id="overview-summary-current-title" class="summary-current" style="display:block">
Current
</dt>
<dd class="summary-current" style="display:block">
<ul class="current">
<li>
Software Engineer
<span class="at">at </span>
<a class="company-profile-public" href="/company/ABC Systems?trk=ppro_cprof">
<span class="org summary">ABC Systems</span></a>
</li>
</ul>
</dd>
In my previous question, "Excel getElementById extract the span class information", I asked about a similar problem.
However, in that case I wanted to extract the information corresponding to the dl id, and it also had a span id. In this case, I need to extract the information corresponding to the dt id.
In my VBScript, I tried something like this:
Dim openedpage as String
openedpage = iedoc1.getElementById("overview").getElementById("overview-summary-current-title").innerHTML
However, I am getting no output.
I want the output to be "Software Engineer at ABC Systems".
Kindly help me out.

The object returned by getElementById() doesn't have a method .getElementById(), so the following line fails:
.getElementById("overview").getElementById("overview-summary-current-title")
If you don't get any output, not even an error message, you probably have On Error Resume Next somewhere in your script. Please don't use that unless you know exactly what you're doing and have sensible error handling code in place.
Also, the element with the ID "overview-summary-current-title" is this:
<dt id="overview-summary-current-title" class="summary-current" style="display:block">
Current
</dt>
So you couldn't possibly extract the text "Software Engineer at ABC systems" from that element.
Try selecting the first <ul> tag from the element with the ID "overview", and then use the innerText property instead of the innerHTML property:
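Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate "..."
' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
While ie.Busy Or ie.ReadyState <> 4 : WScript.Sleep 100 : Wend
' Get the <dl id="overview"> element, then the first <ul> inside it
Set e1 = ie.document.getElementById("overview")
Set e2 = e1.getElementsByTagName("ul")(0)
' innerText returns the rendered text without the HTML markup
WScript.Echo e2.innerText
ie.Quit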
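The same steps (find the element with id "overview", take the first <ul> inside it, read its text) can be checked outside Internet Explorer with a small Python sketch based on the standard html.parser module. FirstListText is a hypothetical helper: it does not track where the container element ends (it assumes the <ul> sits inside it), and its whitespace collapsing only approximates what innerText does.

```python
from html.parser import HTMLParser


class FirstListText(HTMLParser):
    """Collect the text of the first <ul> that follows the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.seen_container = False  # set once the id="..." element has been opened
        self.ul_depth = 0            # > 0 while inside the first <ul>
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.container_id:
            self.seen_container = True
        elif tag == "ul" and self.seen_container and not self.done:
            self.ul_depth += 1

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1
            if self.ul_depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.ul_depth and not self.done:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace, roughly as innerText does for this markup
        return " ".join("".join(self.parts).split())


SAMPLE = """<dl id="overview">
<dt id="overview-summary-current-title" class="summary-current" style="display:block">
Current
</dt>
<dd class="summary-current" style="display:block">
<ul class="current">
<li>
Software Engineer
<span class="at">at </span>
<a class="company-profile-public" href="/company/ABC Systems?trk=ppro_cprof">
<span class="org summary">ABC Systems</span></a>
</li>
</ul>
</dd>
</dl>"""

parser = FirstListText("overview")
parser.feed(SAMPLE)
parser.close()
print(parser.text())  # Software Engineer at ABC Systems
```

Note that the <dt> with id "overview-summary-current-title" contributes nothing here: only the text of the <ul> is collected.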

Related

Getting text() element in <p> with VBA/Selenium

Using Excel 2019 VBA, I am trying to get data from a paragraph on a web page with this structure.
<p>
<strong>Release Date:</strong>
" May 30th 2022"
<br>
<strong>From:</strong>
<a href=URL>Title</a>
<br>
<strong>Performers:</strong>
<a href=URL1>Name1</a>,
<a href=URL2>Name2</a>,
<a href=URL3>Name3</a>
</p>
This is the XPath for the paragraph:
/html/body/div[11]/div/div/div[1]/div[1]/div/div/p[1]
To get the individual elements ("Release Date", "From" and "Performers"), I have to parse the entire paragraph with InStr calls or regular expressions.
Is there a way to reference these elements directly with XPath?
For example, the "Release Date" XPath is:
/html/body/div[11]/div/div/div[1]/div[1]/div/div/p[1]/text()[1]
I have tried to get this directly with the following, but none of them work:
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]/text()")(1) - Invalid Selector
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]").Attribute("text")(1) - returns nothing
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]")(1).Attribute("text") - returns nothing
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]").text(1) - invalid procedure call
webdriver.FindElementsByXPath("//div[11]/div/div/div[1]/div[1]/div/div/p[1]")(1).text - returns entire paragraph
Any advice would be greatly appreciated.
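Since FindElementsByXPath hands back elements only (the text() selector above was rejected as an Invalid Selector), one option is to parse the paragraph's HTML in code instead of chaining InStr calls. The sketch below assumes the paragraph's HTML has already been retrieved as a string (for example its innerHTML; the retrieval step is not shown) and pairs each <strong> label with the text and link text that follow it. It is written in Python, and LabelledFields is a hypothetical helper; the same walk over the nodes could be ported to VBA.

```python
from html.parser import HTMLParser


class LabelledFields(HTMLParser):
    """Turn '<strong>Label:</strong> value ...' runs inside one <p> into a dict."""

    def __init__(self):
        super().__init__()
        self.in_label = False
        self.label = None  # label of the field currently being filled
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_label = True
            self.label = ""

    def handle_endtag(self, tag):
        if tag == "strong" and self.in_label:
            self.in_label = False
            self.label = self.label.strip().rstrip(":")
            self.fields[self.label] = ""
        elif tag == "p":
            self.label = None  # ignore anything after the paragraph

    def handle_data(self, data):
        if self.in_label:
            self.label += data
        elif self.label is not None:
            # Plain text nodes and link text both belong to the current field
            self.fields[self.label] += data

    def result(self):
        # Normalise whitespace inside each value
        return {k: " ".join(v.split()) for k, v in self.fields.items()}


PARAGRAPH = """<p>
<strong>Release Date:</strong>
 May 30th 2022
<br>
<strong>From:</strong>
<a href="URL">Title</a>
<br>
<strong>Performers:</strong>
<a href="URL1">Name1</a>,
<a href="URL2">Name2</a>,
<a href="URL3">Name3</a>
</p>"""

parser = LabelledFields()
parser.feed(PARAGRAPH)
parser.close()
fields = parser.result()
print(fields)
# {'Release Date': 'May 30th 2022', 'From': 'Title', 'Performers': 'Name1, Name2, Name3'}
```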

Generate a selector from source code for Scrapy

I am trying to create a CSS selector from the source code of a dynamic web page. I have tried the following, with no results:
response.css('seller-info#region *::text').get()
response.css('seller-info > region *::text').get()
response.css('.seller-info#region ::text').get()
response.css('seller-info#region ::text').get()
response.css('seller-info > region ::text').get()
response.css('seller-info:contains("to extract")::text').get()
response.css('.seller-info:contains("to extract")::text').get()
response.css('.seller-info:contains("to extract") *::text').get()
response.css('seller-info:contains("to extract") *::text').get()
Each of these returns None.
I need the text: "to extract"
*The region name also appears in other subtrees of the page.
Source code:
<seller-info
username='glorious'
ispro='true'
region="to extract"
phoneurl='/pg/0.gif"'
storeurl=""
seniority=''
category="1220"
phonevisible='true'
>
<div slot="avatar">
<div class="seller-info__header--icon-container">
<i class="icon-yapo icon-briefcase "></i>
</div>
</div>
</seller-info>
The data you are trying to extract is a tag attribute value, not tag text, so select the attribute:
region = response.css("seller-info[region]::attr(region)").get()
or:
region = response.css("seller-info::attr(region)").get()
Selectors like tagname::text extract the text between the opening and closing tags, as in <tagname> text to extract </tagname>.
Your <seller-info> tag has no text of its own (it only contains a <div>); it stores its data in its attributes.
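The difference between ::text and ::attr(region) can be reproduced with Python's standard html.parser, which reports a tag's attributes separately from the text inside it. This is a sketch only: Scrapy is not used, and AttributeCollector is a hypothetical helper that looks for one attribute on one tag name, so a region value elsewhere in the page is ignored unless it sits on a <seller-info> tag.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect one attribute from every <tag>, plus any text found inside it."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.inside = False
        self.values = []  # attribute values, in document order
        self.text = []    # text nodes inside the tag (including nested elements)

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases names and passes attrs as (name, value) pairs
        if tag == self.tag:
            self.inside = True
            value = dict(attrs).get(self.attr)
            if value is not None:
                self.values.append(value)

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text.append(data)


SAMPLE = """<seller-info
username='glorious'
ispro='true'
region="to extract"
phoneurl='/pg/0.gif"'
storeurl=""
seniority=''
category="1220"
phonevisible='true'
>
<div slot="avatar">
<div class="seller-info__header--icon-container">
<i class="icon-yapo icon-briefcase "></i>
</div>
</div>
</seller-info>"""

collector = AttributeCollector("seller-info", "region")
collector.feed(SAMPLE)
collector.close()
print(collector.values)                        # ['to extract']
print(repr("".join(collector.text).strip()))   # ''
```

The only text inside <seller-info> is whitespace between the nested tags, which is why text-based selectors cannot return "to extract".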

How to click on Web check box using Excel VBA?

How do I check the table checkbox?
I tried the following:
ie.Document.getElementsByClassName("x-grid3-hd-checker").Checked = True
<div class="x-grid3-hd-inner x-grid3-hd-checker x-grid3-hd-checker-on" unselectable="on" style="">
<a class="x-grid3-hd-btn" href="#"></a>
<div class="x-grid3-hd-checker"> </div>
<img class="x-grid3-sort-icon" src="/javascript/extjs/resources/images/default/s.gif">
</div>
I can't see a checkbox in the HTML code, but you are using getElementsByClassName() incorrectly for your case. getElementsByClassName() returns a node collection, so if you need a specific node, you must get it by its index in the collection. The first element has index 0.
Please note that the div tag with class="x-grid3-hd-inner x-grid3-hd-checker x-grid3-hd-checker-on" is also included in the node collection, because x-grid3-hd-checker is one of its space-separated class names. The position of that name in the class attribute does not matter, but the whole name must match: a class such as x-grid3-hd-checker-on on its own would not match.
If you want to check this:
<div class="x-grid3-hd-checker"> </div>
Your code needs the second item in the node collection (index 1):
' A <div> has no Checked property, so trigger its click handler instead
ie.Document.getElementsByClassName("x-grid3-hd-checker")(1).Click
But if other elements with the class name "x-grid3-hd-checker" appear earlier in the page, index 1 will point at the wrong element. I can't say more until you post more of the HTML and VBA code; a link to the site would be best.
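getElementsByClassName() matches whole, space-separated class names and returns the elements in document order. A rough Python imitation (hypothetical helpers ClassNameCollector and get_elements_by_class_name, using the standard html.parser) makes it easy to check which index a given element will get; unlike the browser method, it only accepts a single class name per call.

```python
from html.parser import HTMLParser


class ClassNameCollector(HTMLParser):
    """Rough imitation of getElementsByClassName for a single class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []  # (tag, class attribute) in document order

    def handle_starttag(self, tag, attrs):
        class_attr = dict(attrs).get("class") or ""
        # A match needs the whole class name as one of the space-separated tokens
        if self.class_name in class_attr.split():
            self.matches.append((tag, class_attr))


def get_elements_by_class_name(html, class_name):
    collector = ClassNameCollector(class_name)
    collector.feed(html)
    collector.close()
    return collector.matches


SAMPLE = """<div class="x-grid3-hd-inner x-grid3-hd-checker x-grid3-hd-checker-on" unselectable="on" style="">
<a class="x-grid3-hd-btn" href="#"></a>
<div class="x-grid3-hd-checker"> </div>
<img class="x-grid3-sort-icon" src="/javascript/extjs/resources/images/default/s.gif">
</div>"""

for index, match in enumerate(get_elements_by_class_name(SAMPLE, "x-grid3-hd-checker")):
    print(index, match)
# 0 ('div', 'x-grid3-hd-inner x-grid3-hd-checker x-grid3-hd-checker-on')
# 1 ('div', 'x-grid3-hd-checker')
```

Searching for a prefix such as "x-grid3-hd" returns nothing, because no element has exactly that class name.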

Search entire text for images

I have a problem with a project.
I need to search a string for images.
I want to get the source of the image and modify the html form of the img tag.
For example the image form is:
and I want to change it to:
<div class="col-md-3">
<hr class="visible-sm visible-xs tall" />
<a class="img-thumbnail lightbox pull-left" href="upload/uploader/up_164.jpg" data-plugin-options='{"type":"image"}' title="Image title">
<img class="img-responsive" width="215" src="upload/uploader/up_164.jpg"><span class="zoom"><i class="fa fa-search"></i>
</span></a>
I have done part of this.
I can find the image and change its HTML form, but I cannot loop this for all images found in the string.
My code goes like this.
Using the following function, I get the substring between two strings:
// Get the substring of $pool between $var1 and $var2
function GetBetween($var1, $var2, $pool) {
    $temp1 = strpos($pool, $var1) + strlen($var1);
    $result = substr($pool, $temp1);
    $dd = strpos($result, $var2);
    // strpos() returns false (not 0) when $var2 is not found
    if ($dd === false) {
        $dd = strlen($result);
    }
    return substr($result, 0, $dd);
}
Then I get the image tag from the string:
$imageFile = GetBetween("img","/>",$newText);
Next, I extract the source of the image:
$imageSource = GetBetween('src="', '"', $imageFile);
Finally, I call str_replace to do the replacement:
$newText = str_replace('oldform', 'newform', $newText);
The problem is that when there is more than one image, I cannot loop this process.
Thank you in advance.
The best, simplest and safest way to read an HTML or XML document is to use a parser rather than string functions.
I think it will save you a lot of time.
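As a sketch of the parser approach, the example below lets Python's standard html.parser report every <img> tag (with its exact source text and src value) and then rewrites the string left to right, so all images are handled in one loop. It is written in Python rather than PHP, so treat it as an outline of the approach; TEMPLATE is a hypothetical, simplified version of the target markup from the question (the outer <div class="col-md-3"> and <hr> are left out), and ImageTagFinder and rewrite_images are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, simplified version of the target markup from the question
TEMPLATE = (
    '<a class="img-thumbnail lightbox pull-left" href="{src}" '
    'data-plugin-options=\'{{"type":"image"}}\' title="Image title">'
    '<img class="img-responsive" width="215" src="{src}">'
    '<span class="zoom"><i class="fa fa-search"></i></span></a>'
)


class ImageTagFinder(HTMLParser):
    """Record the exact source text and src of every <img> tag, in order."""

    def __init__(self):
        super().__init__()
        self.images = []  # (raw tag text, src value)

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also lands here: the default
        # handle_startendtag() calls handle_starttag().
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append((self.get_starttag_text(), src))


def rewrite_images(text):
    finder = ImageTagFinder()
    finder.feed(text)
    finder.close()
    out, rest = [], text
    # Tags were found in document order, so consume the text left to right
    for raw, src in finder.images:
        before, _, rest = rest.partition(raw)
        out.append(before + TEMPLATE.format(src=escape(src)))
    out.append(rest)
    return "".join(out)


sample = '<p>A <img src="upload/a.jpg"/> and <img alt="b" src="upload/b.jpg"></p>'
print(rewrite_images(sample))
```

Consuming the string from left to right (instead of calling a global replace once per image) keeps each replacement at the position where the parser found the tag.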

How can I access elements via a non-standard HTML property?

I'm trying to implement test automation with watir-webdriver. By the way, I am new to watir-webdriver, Ruby and co.
All our HTML elements have a unique HTML attribute named "wicketpath". It is possible to access an element by "name", "id" and so on, but not by the attribute "wicketpath". I also tried XPath, but without success.
Can anybody help me with a code snippet showing how to access an element via the "wicketpath" attribute?
Thanks in advance.
R.
You should be able to use xpath.
For example, consider the following HTML:
<ul class="ui-autocomplete" role="listbox">
<li class="ui-menu-item" role="menuitem" wicketpath="false">Value 1</li>
<li class="ui-menu-item" role="menuitem" wicketpath="false">Value 2</li>
<li class="ui-menu-item" role="menuitem" wicketpath="true">Value 3</li>
</ul>
The following XPath will give the text of the li that has wicketpath = true:
puts browser.li(:xpath, "//li[@wicketpath='true']").text
#=>Value 3
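The @ in the predicate is what selects by attribute. A quick way to check such an XPath against sample markup outside the browser is Python's xml.etree.ElementTree, as in the sketch below. This is an assumption-laden shortcut: ElementTree supports only a subset of XPath and needs well-formed markup, which the sample above happens to be.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ul class="ui-autocomplete" role="listbox">
<li class="ui-menu-item" role="menuitem" wicketpath="false">Value 1</li>
<li class="ui-menu-item" role="menuitem" wicketpath="false">Value 2</li>
<li class="ui-menu-item" role="menuitem" wicketpath="true">Value 3</li>
</ul>"""

root = ET.fromstring(SAMPLE)
# ElementTree's limited XPath supports [@attr='value'] predicates
matches = root.findall(".//li[@wicketpath='true']")
print([li.text for li in matches])  # ['Value 3']
```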
Update - Alternative solution - Adding To Locators:
If you use wicketpath a lot, you could add it to the locators.
After you require watir-webdriver, add this:
# This allows using :wicketpath in locators
Watir::HTMLElement.attributes << :wicketpath
# This allows accessing the wicketpath attribute
class Watir::Element
attribute(String, :wicketpath, 'wicketpath')
end
This will let you use 'wicketpath' as a locator:
p browser.li(:wicketpath, 'true').text
#=> "Value 3"
p browser.li(:text, 'Value 3').wicketpath
#=> "true"
Try this:
puts browser.li(:css, ".ui-autocomplete > .ui-menu-item[wicketpath='true']").text
Please let me know whether the above script works.
