Groovy string replace add new line - groovy

I have a Groovy script that pulls some text from a SOAP connection, and I am trying to add a new line before each bullet point. Here is the code I have. It does not work, and it may never work, but I thought I would ask.
td (it.@detail.toString().replaceAll('&gt;', '>').replaceAll("•", "\n •"))

That should work.
To see it working in the console output, try:
println it.@detail.toString().replaceAll('&gt;', '>').replaceAll("•", "\n •")
I guess you're viewing this in HTML with a browser?
Newlines don't appear in HTML normally, so you'd need to wrap the text in a <pre> tag.
Assuming this is with StreamingMarkupBuilder or similar, try:
td {
  pre( it.@detail.toString().replaceAll('&gt;', '>').replaceAll("•", "\n •") )
}
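To make the newline-versus-HTML point concrete outside of Groovy, here is a minimal Python sketch of the same idea: put a newline before each bullet, then wrap the result in a <pre> element so a browser keeps the line breaks. The function name bullets_to_pre and the sample text are made up for illustration; this is not the Groovy builder code itself.

```python
from html import escape


def bullets_to_pre(detail):
    """Insert a newline before each bullet and wrap the text in <pre>."""
    # Same replacement as in the Groovy snippet: "•" becomes "\n •".
    with_breaks = detail.replace("•", "\n •")
    # Escape &, < and > so the text is safe to place inside an HTML element.
    return "<pre>" + escape(with_breaks) + "</pre>"


if __name__ == "__main__":
    print(bullets_to_pre("Items:•one•two"))
```

Without the <pre> wrapper (or an equivalent CSS white-space rule), the browser would collapse those newlines into ordinary spaces.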

Related

How to get only reply content (not including quoted content) using Selenium

I want to know how to get some content without including the quoted content.
https://forumd.hkgolden.com/view.aspx?type=BW&message=7219211
The following picture is the example
I want to get only "唔提冇咩人記得", but the following code returns both the quoted content and the reply:
content = driver_blank.find_element_by_xpath('/html/body/form/div[5]/div/div/div[2]/div[1]/div[5]/table[24]/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[1]/td/div')
print(content.text)
The following is the HTML I want to capture content from:
<div class="ContentGrid">
<blockquote><div style="color: #0000A0;"><blockquote><div style="color: #0000A0;">腦魔都俾你地bam咗啦<img data-icons=":~(" src="/faces/cry.gif" alt=":~("></div></blockquote><br>珠。。。。。</div></blockquote><br>唔提冇咩人記得
<br><br><br>
</div>
Can anyone help me? Thanks.
Can this be solved with a not(starts-with(...)) expression?
Use the code below to extract only the text node content:
element = driver.find_element_by_css_selector('div.ContentGrid')
text = driver.execute_script("return arguments[0].childNodes[3].textContent", element)
print(text)
Selenium won't let you locate a text node directly, but you can use some JavaScript to get at it.
Code Explanation:
arguments[0].childNodes[3] refers to the fourth child node (the index is zero-based) of your context node, which is div.ContentGrid. In the HTML as you shared it, the first three child nodes are a whitespace-only text node, the blockquote, and a br, which is why index 3 is used.
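If you would rather not depend on a fixed child index, a rough alternative is to parse the HTML and keep only the text that sits directly inside div.ContentGrid. The sketch below uses Python's standard html.parser instead of Selenium; the DirectTextCollector class and the VOID_TAGS set are made-up names for illustration, and the sample HTML is the snippet from the question.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class DirectTextCollector(HTMLParser):
    """Collect text that is a direct child of <div class="ContentGrid">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 = outside the target div, 1 = directly inside it
        self.chunks = []  # non-blank direct text nodes, stripped

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif tag == "div" and ("class", "ContentGrid") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Quoted text lives deeper than depth 1 (inside blockquote), so it is skipped.
        if self.depth == 1 and data.strip():
            self.chunks.append(data.strip())


SAMPLE = (
    '<div class="ContentGrid">\n'
    '<blockquote><div style="color: #0000A0;"><blockquote>'
    '<div style="color: #0000A0;">腦魔都俾你地bam咗啦'
    '<img data-icons=":~(" src="/faces/cry.gif" alt=":~("></div></blockquote>'
    '<br>珠。。。。。</div></blockquote><br>唔提冇咩人記得\n'
    '<br><br><br>\n'
    '</div>'
)

collector = DirectTextCollector()
collector.feed(SAMPLE)
reply_text = collector.chunks
print(reply_text)
```

With Selenium you could feed the collector driver.page_source instead of SAMPLE. Note that this sketch assumes the class attribute is exactly "ContentGrid" and would collect from every such div on the page.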

Selenium - find element by link text

I am using selenium webdriver on Chrome; python 3 on Windows 10.
I want to scrape some reports from a database. I search with a company ID and a year, the results are a list of links formatted in a specific way: something like year_companyID_seeminglyRandomDateAndDoctype.extension, e.g. 2018_2330_20020713F04.pdf. I want to get all pdfs of a certain doctype. I can grab all links for a certain doctype using webdriver.find_elements_by_partial_link_text('F04') or all of that extension with '.pdf' instead of 'F04', but I cannot figure out a way to check for both at once. First I tried something like
links = webdriver.find_elements_by_partial_link_text('F04')
for l in links:
    if l.find('.pdf') == -1:
        continue
    else:
        #do some stuff
But unfortunately, the links are WebElements:
print(links[0])
>> <selenium.webdriver.remote.webelement.WebElement (session="78494f3527260607202e68f6d93668fe", element="0.8703868381417961-1")>
print(links[0].get_attribute('href'))
>> javascript:readfile2("F","2330","2015_2330_20160607F04.pdf")
so the conditional in the for loop above fails.
I see that I could probably access the necessary information in whatever that object is, but I would prefer to do the checks first when getting the links. Is there any way to check multiple conditions in the webdriver.find_elements_by_* methods?
You can try the code below:
links = [link.get_attribute('href') for link in webdriver.find_elements_by_partial_link_text('F04') if link.get_attribute('href').endswith('.pdf")')]
You can also try XPath, as below:
links = webdriver.find_elements_by_xpath('//a[contains(., "F04") and contains(@href, ".pdf")]')
Andersson's approach seems to work with a slight correction:
if link.get_attribute('href').endswith('.pdf')] rather than if link.get_attribute('href').endswith('.pdf")')], i.e. delete the ").
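Since the href values look like javascript:readfile2("F","2330","2015_2330_20160607F04.pdf"), another option is to pull the file name out of each href first and then test both conditions on the plain string. This is only a sketch on hard-coded strings: the helper name pdf_names, the regular expression, and the second and third sample hrefs are assumptions for illustration, not something taken from the site.

```python
import re

# Captures the last quoted argument when it ends in .pdf, e.g. "....F04.pdf")
FILENAME_RE = re.compile(r'"([^"]+\.pdf)"\)\s*$')


def pdf_names(hrefs, doctype):
    """Return the PDF file names in hrefs whose name contains doctype."""
    names = []
    for href in hrefs:
        match = FILENAME_RE.search(href)
        if match and doctype in match.group(1):
            names.append(match.group(1))
    return names


sample_hrefs = [
    'javascript:readfile2("F","2330","2015_2330_20160607F04.pdf")',
    'javascript:readfile2("F","2330","2015_2330_20160607F04.doc")',
    'javascript:readfile2("F","2330","2015_2330_20160607F05.pdf")',
]

print(pdf_names(sample_hrefs, "F04"))
```

In a real session the list would come from something like [link.get_attribute('href') for link in links], as in the answers above.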

Why is scrapy printing \t\n\n where I expect there to be text?

I am a beginner with scrapy, but learning. I have been parsing this page and am attempting to scrape the address off of it.
I have done this in the scrapy shell, so I start by:
scrapy shell https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952
Which works fine. Then I attempt to parse the address with:
response.xpath('//li[@class="address"]/text()').extract()
But my output is the following:
['\n\t\t', '\n\t\t\n\t\t']
Why am I not able to see the address as it appears on the page:
BELFAST ABBEY CENTRE, 1 Old Glenmount Road Newtonabbey, Newton Abbey, BT36 7DN
How would I go about getting this address out?
I appreciate anyone that takes the time to reply.
There are a couple of errors in how you are approaching this issue:
When using scrapy shell, you have to surround the URL with double quotes. Otherwise the shell treats each & in the URL as a command separator, runs the part before it in the background, and scrapy never receives the full URL:
scrapy shell "https://www.marksandspencer.com/MSStoreDetailsView?storeId=10151&langId=-24&SAPStoreId=6952"
Your XPath is not correct: /text() returns only the text nodes that are direct children of that li, and that li doesn't directly contain the information you want. The text is inside the children of that li, so you could use:
response.xpath('//li[@class="address"]//text()').extract()
or
response.xpath('//li[@class="address"]/p/text()').extract()
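The difference between direct text nodes and descendant text can be reproduced without scrapy. The sketch below uses Python's standard xml.etree.ElementTree on a small made-up snippet; the markup in SNIPPET is an assumption about the general shape of the page (an li whose text sits inside p children), not a copy of the real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page: the address text lives in <p> children.
SNIPPET = (
    '<ul><li class="address">\n\t\t'
    '<p>BELFAST ABBEY CENTRE</p>\n\t\t'
    '<p>BT36 7DN</p>\n\t\t'
    '</li></ul>'
)

root = ET.fromstring(SNIPPET)
li = root.find(".//li[@class='address']")

# Roughly what li/text() selects: text nodes directly inside the li.
direct = [li.text] + [child.tail for child in li]

# Roughly what li//text() selects: every text node under the li, blanks dropped here.
descendant = [t.strip() for t in li.itertext() if t.strip()]

print(direct)
print(descendant)
```

The direct list holds only whitespace, which matches the kind of output shown in the question, while the descendant list holds the address parts.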

input is self closing and should not have content

When I load my Express webpage I'm getting the following error:
Express
500 Error: /app/views/index.jade:114
112| td 2
113| td 4 years
> 114|
input is self closing and should not have content.
at Object.Compiler.visitTag (/app/node_modules/jade/lib/compiler.js:434:15)
at Object.Compiler.visitNode (/app/node_modules/jade/lib/compiler.js:210:37)
at Object.Compiler.visit (/app/node_modules/jade/lib/compiler.js:197:10)
at Object.Compiler.visitBlock (/app/node_modules/jade/lib/compiler.js:278:12)
at Object.Compiler.visitNode (/app/node_modules/jade/lib/compiler.js:210:37)
at Object.Compiler.visit (/app/node_modules/jade/lib/compiler.js:197:10)
at Object.Compiler.visitTag (/app/node_modules/jade/lib/compiler.js:443:12)
at Object.Compiler.visitNode (/app/node_modules/jade/lib/compiler.js:210:37)
at Object.Compiler.visit (/app/node_modules/jade/lib/compiler.js:197:10)
at Object.Compiler.visitBlock (/app/node_modules/jade/lib/compiler.js:278:12)
This doesn't show up when run locally with foreman start, only when it's on the server.
Looks like you've got content inside your input tags. In HTML, input tags can't have content, therefore you should delete any whitespace or characters following input tags in your jade file.
Ex:
input(type="text",name="whatever") something
should be input(type="text",name="whatever",value="something")
Sometimes the answer is a little trickier than just some content after the tag on the same line (such as a few spaces). Watch out for the line following the input tag being indented by mistake!
After running into the same error, I checked the line of the jade template marked in the error report. It did contain an input definition, but that definition was fine: no whitespace or printable content followed it. The following line was indented less (two levels up, starting another row of the form), so the input defined on the marked line definitely had no content.
However, there was another input a few lines further down the template, and that input did have subordinate content. Removing the content there fixed the somewhat false positive reported for the marked line.
I had a similar problem I solved with this:
div
  +inputWithTextContent('whatever', 'something')
mixin inputWithTextContent(name, message)
  !='<input type="text" name="'+name+'">'+message+'</input>'
Another solution is to create a label after the input and then display it inline. This will sit the label alongside the control. This is how I solved the issue with a checkbox input in jade.
JADE (Bootstrap):
.checkbox
label
input(type='checkbox', value='remember-me',)
label.inlineLabel Remember me
SASS:
label.inlineLabel
  display: inline

How do I get the HTML in an element using Capybara?

I’m writing a cucumber test where I want to get the HTML in an element.
For example:
within 'table' do
  # this works
  find('//tr[2]//td[7]').text.should == "these are the comments"
  # I want something like this (there is no "html" method)
  find('//tr[2]//td[7]').html.should == "these are the <b>comments</b>"
end
Anyone know how to do this?
You can call HTML DOM innerHTML Property:
find('//tr[2]//td[7]')['innerHTML']
Should work for any browser or driver.
You can check all available properties on w3schools
This post is old, but I think I found a way if you still need this.
To access the Nokogiri node from the Capybara element (using Capybara 1.0.0beta1, Nokogiri 1.4.4) try this:
elem = find('//tr[2]//td[10]')
node = elem.native
#This will give you a Nokogiri XML element
node.children[1].attributes["href"].value.should == "these are the <b>comments</b>"
The last part may vary for you, but you should be able to find the HTML somewhere in that node variable
In my environment, find returns a Capybara::Element - that responds to the :native method as Eric Hu mentioned above, which returns a Selenium::WebDriver::Element (for me). Then :text gets the contents, so it could be as simple as:
results = find(:xpath, "//td[@id='#{cell_id}']")
contents = results.native.text
if you're looking for the contents of a table cell. There's no content, inner_html, inner_text, or node methods on a Capybara::Element. Assuming people aren't just making things up, perhaps you get something different back from find depending on what else you have loaded with Capybara.
Looks like you can do (node).native.inner_html to get the HTML content, for example with HTML like this:
<div><strong>Some</strong> HTML</div>
You could do the following:
find('div').native.inner_html
=> '<strong>Some</strong> HTML'
I ran into the same issue as Cyril Duchon-Doris, and per https://github.com/teampoltergeist/poltergeist/issues/629 the way to access the HTML of a Capybara::Poltergeist::Node is via the outerHTML property, e.g.:
find('//tr[2]//td[7]')['outerHTML']
Most of the other answers work only in RackTest (as they use RackTest-specific features).
If your driver supports JavaScript evaluation (like Selenium), you can use innerHTML:
html = page.evaluate_script("document.getElementById('my_id').innerHTML")
If you're using the Poltergeist driver, these methods will allow you to inspect what matches:
http://www.rubydoc.info/gems/poltergeist/1.5.1/Capybara/Poltergeist/Node
For example:
page.find('[name="form-field"]').native.value == 'something'
Try calling find('//tr[2]//td[10]').node to get at the actual Nokogiri object.
Well, Capybara uses Nokogiri to parse, so this page might be appropriate:
http://nokogiri.org/Nokogiri/XML/Node.html
I believe content is the method you are looking for.
You could also switch to capybara-ui and do the following:
# define your widget, in this case in your role
class User < Capybara::UI::Role
  widget :seventh_cell, [:xpath, '//tr[2]//td[7]']
end
# then in your tests
role = User.new
expect(role.widget(:seventh_cell).html).to eq('<h1>My html</h1>')

Resources