Regex for href value in Groovy

What's the best way in Groovy to capture a link's href value with regex?
example: some link
I want to capture just what's in bold.

It depends on where the anchor tag comes from, but if you have already isolated it to what you have shown, the following should work:
def link = 'some "link"'
// Lazily capture the text between the first pair of double quotes.
def url = (link =~ /\"(.*?)\"/)[0][1]
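
For comparison, here is a minimal sketch of the same lazy-capture idea in Python rather than Groovy. The anchor tag and its URL are hypothetical, since the example markup in the question was lost, and the pattern assumes the href value is wrapped in double quotes.

```python
import re

# Hypothetical anchor tag; the markup in the original question is not available.
html = '<a href="http://example.com/page">some link</a>'

# href="(.*?)" lazily captures everything up to the next double quote.
match = re.search(r'href="(.*?)"', html)
url = match.group(1) if match else None
print(url)
```

A regex like this is only reasonable when the tag has already been isolated; for whole documents an HTML parser is usually the safer choice.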

Related

How to use re.sub in Python

Please help me replace a particular string with re.sub()
'<a href="/abc-10063/" target="_blank">'
needs to be
'<a href="./abc-10063.html" target="_blank">'
I wrote the script below:
import re
test = '<a href="/abcd-10063/" target="_blank">'
print(re.sub(r'/abcd-[0-9]','./abcd-[0-9].html', test))
which returns
<a href="./abcd-[0-9].html0063/" target="_blank">

First of all, your regular expression is incorrect: it matches only /abcd-1.
Change the regex to /abcd-[0-9]+. The + matches one or more digits, so the whole number is consumed. To match the trailing / as well, add it to the regex.
So the regex becomes /abcd-[0-9]+/.
To reuse the matched text in the substitution, you need a group in the regex. Since we want to reuse only /abcd-[0-9]+ and not the trailing /, put that part in a group, like this: (/abcd-[0-9]+)/.
Now \1 in the replacement refers to the matched group, where 1 is the group number. To use a second group, you would write \2.
Note also that the replacement string is not a pattern, which is why [0-9] appeared literally in your output.
So your final code will be:
import re
test = '<a href="/abcd-10063/" target="_blank">'
print(re.sub(r'(/abcd-[0-9]+)/', r'.\1.html', test))
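
As a hedged sketch, the same group-and-backreference approach can be wrapped in a small helper that handles any slug of the form letters-digits. The helper name `relativize` and the slug shape `[a-z]+-[0-9]+` are assumptions made for illustration; they are not part of the original question.

```python
import re

# Assumed slug shape: lowercase letters, a hyphen, then digits (e.g. abc-10063).
HREF_PATTERN = re.compile(r'href="/([a-z]+-[0-9]+)/"')

def relativize(html):
    """Rewrite href="/slug/" as href="./slug.html" using backreference \\1."""
    return HREF_PATTERN.sub(r'href="./\1.html"', html)

print(relativize('<a href="/abc-10063/" target="_blank">'))
```

Anchoring the pattern on href="..." keeps the substitution from touching similar-looking text elsewhere in the string.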

Can't access dynamic element on webpage

I can't access a text box on a web page; it's a dynamic element. I've tried filtering on many attributes in the XPath, but the number that changes in the id and name seems to be the only unique part of the element's XPath. Every filter I try matches at least 3 elements. I've been trying for 2 days and really need some help here.
from selenium import webdriver
driver = webdriver.Chrome()  # assumed setup; not shown in the original snippet
def click_btn(submit_xpath):
    # Clicks a button.
    submit_box = driver.find_element_by_xpath(submit_xpath)
    submit_box.click()
    driver.implicitly_wait(7)
def send_text_to_box(box_xpath, text):
    # Sends text to a text box.
    box = driver.find_element_by_xpath(box_xpath)
    box.send_keys(text)
    driver.implicitly_wait(3)
descr = "Can't send this text"
# The number here is the part of the XPath that changes.
send_text_to_box('//*[@id="textfield-1285-inputEl"]', descr)
Edit: it works now with the following XPath: //input[contains(@id, 'textfield') and contains(@aria-readonly, 'false') and contains(@class, 'x-form-invalid-field-default')]. Fortunately, I found something specific to this element.

You can match on part of the id instead of the exact value. That is, in place of
send_text_to_box('//*[@id="textfield-1285-inputEl"]', descr) please try send_text_to_box('//*[contains(@id,"inputEl")]', descr)
If multiple elements have the string 'inputEl' in their id, look for some other property that stays constant. Otherwise, find a different element first and then traverse from it to the required input.
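
The partial-match idea above can be sketched as a small helper that assembles a contains()-based XPath from the attribute fragments that stay constant. The function name `xpath_contains` is hypothetical, and the snippet only builds the expression string; it does not start a browser.

```python
def xpath_contains(tag, fragments):
    """Build an XPath matching `tag` elements whose attributes contain the given fragments.

    `fragments` maps attribute names to the stable substring expected in each value.
    """
    conditions = " and ".join(
        f"contains(@{attr}, '{value}')" for attr, value in fragments.items()
    )
    return f"//{tag}[{conditions}]"

# Stable parts of the attributes, taken from the XPath in the question's edit.
xpath = xpath_contains("input", {"id": "textfield", "aria-readonly": "false"})
print(xpath)
```

The resulting string could then be passed to a locator call such as the send_text_to_box function from the question.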

How to get content from div class using Selenium - Python?

I want to extract the contents on the left side using the div class <table__9d458b97>
I don't want to use xpath to do the job because some contents don't sit in the same position.
driver2 = webdriver.Chrome(r'XXXX\chromedriver.exe')
driver2.get("https://www.bloomberg.com/profiles/people/15103277-mark-elliot-zuckerberg")
Here is my code using the xpath (how can I use the class?):
boardmembership_table = driver2.find_elements_by_xpath('//*[@id="root"]/div/section/div[5]')[0]
boardmembership_table.text
Thanks for the help!

You could use a CSS selector.
You can use the following code:
from selenium.webdriver import Chrome
driver2 = Chrome()
driver2.get("https://www.bloomberg.com/profiles/people/15103277-mark-elliot-zuckerberg")
# Select by class and role instead of by position in the page.
els = driver2.find_elements_by_css_selector('.table__9d458b97[role="table"]')
for el in els:
    print(el.text)
driver2.close()
Note that find_elements_by_css_selector returns a list of elements, or an empty list if none are found.
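
To illustrate selecting by class rather than by position without a browser, here is a minimal sketch that uses the standard-library xml.etree.ElementTree instead of Selenium. The markup is a hypothetical, simplified fragment (the real page structure is not shown in this thread), and ElementTree only handles well-formed XML, so this is an analogy for the selector rather than a replacement for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is not reproduced here.
markup = """
<div id="root">
  <div class="table__9d458b97" role="table">
    <div role="row">First row</div>
    <div role="row">Second row</div>
  </div>
  <div class="other">Unrelated</div>
</div>
"""

root = ET.fromstring(markup)
# Match on class and role attributes, whatever the element's position in the tree.
tables = root.findall(".//div[@class='table__9d458b97'][@role='table']")
# Collect each table's text and normalise the whitespace.
texts = [" ".join("".join(t.itertext()).split()) for t in tables]
print(texts)
```

As with find_elements_by_css_selector, findall returns an empty list when nothing matches.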

You can use the XPath below if you want to access the Board Memberships table.
//*[@id="root"]/div/section/div[h2[.='Board Memberships']]

You can also use the following-sibling axis to get the div next to the title 'Board Membership',
like this:
'//h2[contains(.,"Board Membership")]//following-sibling::div'

Using BeautifulSoup to separate the hrefs and the anchor text

I'm using Python 3 with Beautiful Soup 4 to separate hrefs from the link text. For example:
LINK
I want to (1) extract and print yoursite.com, and then (2) get LINK.
If anyone could help me, that would be great!

Locate the a element by, say, class name; use dictionary-like access to attributes; .get_text() to get the link text:
a = soup.find("a", class_="sample-class") # or soup.select_one("a.sample-class")
print(a["href"])
print(a.get_text())
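
For readers without Beautiful Soup installed, the same separation of href and link text can be sketched with the standard-library html.parser module. This swaps in a different technique from the answer above; the class name `LinkCollector` and the exact anchor markup are assumptions for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._href = None      # href of the <a> currently open
        self._text = []        # text chunks seen inside that <a>
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text).strip()))
            self._in_link = False

collector = LinkCollector()
# Assumed markup; the question's original example was lost.
collector.feed('<a href="http://yoursite.com">LINK</a>')
collector.close()
print(collector.links)
```

Beautiful Soup remains the more convenient option for real pages; this sketch only shows that the href is an attribute while the link text is character data inside the element.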

A tag may have any number of attributes. The tag <b class="boldest"> has an attribute "class" whose value is "boldest". You can access a tag's attributes by treating the tag like a dictionary:
tag['class']
# u'boldest'
A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text:
tag.string
# u'Extremely bold'
You can find this in the Beautiful Soup documentation.

How to get the data in a URL using Groovy code?

I want to get the defect id from a URL using Groovy code (to build custom code in Tasktop).
For example: I will have a dynamically generated URL, say www.xyz.com/abc/defect_123/, and I want to retrieve the part that always starts at the 17th position and return it as a string.
Please help.
Thanks in advance.

Here are two possibilities. Please note that the "substring" option is very strict and will always start at index 16, the 17th character (what happens if the domain changes from www.xyz.com to www.xyzw.com?)
def str = 'www.xyz.com/abc/defect_123/'
def pieces = str.tokenize('/')   // ['www.xyz.com', 'abc', 'defect_123']
def from16 = str.substring(16)   // everything from index 16 onward
println from16          // prints defect_123/
println pieces.last()   // prints defect_123
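
The tokenize approach does not depend on the length of the domain. A minimal Python sketch of the same idea follows, for comparison only (the thread itself is about Groovy); the function name `last_path_segment` is hypothetical.

```python
def last_path_segment(url):
    """Return the last non-empty segment of a slash-separated URL, or None."""
    segments = [part for part in url.split("/") if part]
    return segments[-1] if segments else None

# Same result even when the domain length changes.
print(last_path_segment("www.xyz.com/abc/defect_123/"))
print(last_path_segment("www.xyzw.com/abc/defect_123/"))
```

Dropping empty parts mirrors Groovy's tokenize, which also skips the empty string after the trailing slash.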

You should define this as a dynamic URL in the UrlMappings.groovy file:
"www.xyz.com/abc/$defect_id" (controller: 'YourController', action: 'method_name')
You can then access the defect_id variable in YourController via params.defect_id.
