This is the Skyscanner website. After selecting a flight, I want to click every 'Select' button (for example, the one for $687) and go to the next page. On the next page I want to click the down arrow for the outbound flight. Then I want to click the next 'Select' button ($715) and the outbound flight arrow again, and so on in a loop for every search result on every page.
This is what the downward arrow on the next page looks like.
Here is the code I have written so far:
driver = webdriver.Chrome( )
driver.get('https://www.skyscanner.com/transport/flights/nyca/lax/170717/170718/airfares-from-new-york-to-los-angeles-international-in-july-2017.html?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#results')
while True:
    items=driver.find_element_by_class_name("fss-bpk-button expand-cba select-action")
    for i, item in enumerate(items):
        driver.find_element_by_class_name("fss-bpk-button expand-cba select-action").click()
        for j, item2 in item
            driver.find_element_by_class_name("leg-summary-container clearfix").click()
I also tried the following, but it did not work either:
links = [link.get_attribute('href') for link in driver.find_element_by_class_name("fss-bpk-button expand-cba select-action")]
for link in links:
    driver.get(link)
Your second snippet should work after some changes:
links = [link.get_attribute('href') for link in driver.find_elements_by_css_selector("a.fss-bpk-button.expand-cba.select-action")]
Note that:
To get a list of elements, use find_elements_...() instead of find_element_...().
Compound class names are not allowed in a search by class name, so to match on several classes at once, search by CSS selector or XPath instead.
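As a small illustration of the second point, a compound class attribute can be turned into a CSS selector mechanically. This is only a sketch: the helper name css_from_classes is made up for this example, and the driver call in the trailing comment assumes a Selenium session that is not started here.

```python
def css_from_classes(tag, class_attr):
    """Build a CSS selector matching `tag` elements that carry every class in `class_attr`."""
    classes = class_attr.split()
    # Each class becomes a ".name" suffix chained onto the tag name.
    return tag + "".join("." + name for name in classes)


selector = css_from_classes("a", "fss-bpk-button expand-cba select-action")
print(selector)  # a.fss-bpk-button.expand-cba.select-action

# With a live Selenium session (not started here) the selector would be used as:
# buttons = driver.find_elements_by_css_selector(selector)
```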
I can scroll a whole web page up or down, but I am having trouble scrolling a particular div element within the page.
When I open a page such as WhatsApp Web, the messages of a chat live in one particular div element. I want to scrape all the messages from the beginning of the chat, but I can only scrape the ones currently in view (the last few messages). So I want to scroll that div to the top of the chat in order to scrape all the messages.
Can someone help me do this in Python?
Thank you.
Yes, there are several ways to scroll to a particular element.
Using move_to_element():
from selenium.webdriver.common.action_chains import ActionChains
element = driver.find_element_by_xpath("_element_")
action = ActionChains(driver)
action.move_to_element(element).perform()
Using scrollIntoView():
element = driver.find_element_by_xpath("_element_")
driver.execute_script("arguments[0].scrollIntoView(true);", element)
For reference check here
It is possible.
Let's say the element you need to scroll inside is div_el and its locator is xpath_locator; the code will look like this:
div_locator = "xpath_locator"
div_el = driver.find_element_by_xpath(div_locator)
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", div_el)
See more here
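Setting scrollTop to scrollHeight moves the div to its bottom; to reach the top of a chat, scrollTop can be set to 0 instead. The sketch below repeats that until the div stops growing. It assumes the page loads older messages whenever the div reaches its top (not verified for any particular site), and the function name and its pause and max_rounds parameters are made up for this example.

```python
import time


def scroll_div_to_top(driver, div_el, pause=1.0, max_rounds=50):
    """Scroll `div_el` to its top repeatedly until no more content is loaded.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return arguments[0].scrollHeight", div_el)
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("arguments[0].scrollTop = 0", div_el)
        time.sleep(pause)  # give the page time to load older content
        new_height = driver.execute_script("return arguments[0].scrollHeight", div_el)
        if new_height == last_height:
            return rounds  # height unchanged: nothing new was loaded
        last_height = new_height
    return max_rounds  # gave up after max_rounds; more content may remain
```

It would be called as scroll_div_to_top(driver, div_el) with the driver and div_el from the answer above, before collecting the message elements.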
I am using headless Firefox on Selenium and XPath Helper to identify insanely long paths to elements.
When the page initially loads, I can use XPath Helper to find the xpath of any element of interest, and selenium can find the element when given the xpath.
However, several buttons that I need to interact with on the page open menus when pressed that are either small or take up the whole "screen". No matter their size, these containers are overlaid on the original page, and although I can find their xpaths using XPath Helper, when I try to use those xpaths to find the elements using selenium, they can't be found.
I've checked, and there's no iframe funny business happening. I'm a bit stumped as to what could be happening. My guess is that the page's source code is being dynamically changed after I press the buttons that open the menu containers and when I call find_element_by_xpath on new elements in the containers, the original source is being searched, instead of the new source. Could that be it?
Any other ideas?
As a workaround, I can get around this issue by sending keystrokes to the body of the page, but I feel this solution is rather brittle and likely to fail. It would be much more robust to locate each element explicitly.
EDIT:
With selenium I can find the export button, but not the menu it opens.
Here is the code for the export button itself:
The element of interest for me is "Customize Export" which I have not been able to find using selenium. Here is the code for this element:
Notice the very top line of this last image (cdk-overlay-container)
Now, when I refresh the page and do NOT click the export button, the cdk-overlay-container section of the code is empty:
This suggests that my hypothesis is correct: when the page loads initially, the "Customize Export" button is nowhere in the source code and appears only after "Export" is clicked, and Selenium is using only the original source code, not the dynamically generated code that appears after clicking "Export", to find elements.
Selenium could find the dynamic content after doing
driver.execute_script("return document.body.innerHTML")
WebDriverWait is what you need: it waits for a given condition on an element to hold. Here is an example that waits up to 5 seconds for each element to become clickable before clicking it:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
wait = WebDriverWait(driver, 5)
button = wait.until(EC.element_to_be_clickable((By.XPATH, 'button xpath')))
button.click()
wait.until(EC.element_to_be_clickable((By.XPATH, 'menu xpath'))).click()
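The idea behind an explicit wait is plain polling: the condition is checked again and again until it holds or the timeout expires, which is why it copes with content that is added to the page after a click. The following is a conceptual sketch only, not Selenium's implementation; wait_until and menu_present are hypothetical names, and the counter stands in for a menu that appears on the third check.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for a menu that only exists from the third check onwards.
attempts = {"n": 0}


def menu_present():
    attempts["n"] += 1
    return "menu" if attempts["n"] >= 3 else None


print(wait_until(menu_present, timeout=2.0, poll=0.01))  # menu
```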
"identify insanely long paths" is an anti-pattern. Try not to rely on XPath Helper; write the XPath or CSS selector yourself.
Update:
wait = WebDriverWait(driver, 10)
export_buttons = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//button[contains(@class, "mat-menu-trigger") and contains(., "Export")]')))
print("Export button count: ", len(export_buttons))
export_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//button[contains(@class, "mat-menu-trigger") and contains(., "Export")]')))
export_button.click()
cus_export_buttons = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//button[contains(@class, "mat-menu-item") and contains(., "Customize Export")]')))
print("Customize Export button count: ", len(cus_export_buttons))
I'm trying to run this code:
for j in range(1,13):
    driver.find_element_by_xpath('//*[@id="gateway-page"]/body/table/tbody/tr[3]/td[2]/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table/tbody/tr[2]/td/div/div[2]/ul/li['+str(j)+']').click()
    time.sleep(3)
to click every matching element on this website. But it skips some elements every time, even though clicking them one at a time outside the for loop works. Any idea why this happens?
The problem seems to be with /ul/li['+str(j)+']: you are clicking the <li> tag, while the actual link resides inside it. Because the link is wrapped in the <li> tag, it sometimes does not receive the click, and no error is raised.
Try locating the actual link tag instead, as in the code below. I have tested it on my system. Hope this helps.
driver.get('http://catalog.sps.cuny.edu/preview_program.php?catoid=2&poid=607')
driver.implicitly_wait(10)
links = driver.find_elements_by_xpath("//div//h2[contains(.,'Electives')]/..//ul/li//span/a")
for link in links:
    link.click()
    time.sleep(3)
From the XPath, it looks like you are trying to click each elective on that page, building the locator for every list item from the loop index.
I suggest another approach: collect all the electives in a list, then iterate over the elements and click each one, e.g.
elements = driver.find_elements_by_xpath('//*[@id="gateway-page"]/body/table/tbody/tr[3]/td[2]/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table/tbody/tr[2]/td/div/div[2]/ul/li')
for element in elements:
    element.click()
    time.sleep(3)
Probable problems in your solution:
The loop index is concatenated into a long absolute XPath by hand. Any typo, or any change in the page layout, makes the XPath invalid.
The loop bounds are hard-coded. range(1, 13) yields 1 through 12, which matches XPath's 1-based positions, but it will miss items or raise an error if the list does not contain exactly 12 electives.
After each click the elective expands, so you may also need to scroll if an element is not found.
Suggestion:
Also, use relative xpaths instead of absolute. Relative xpaths are more stable.
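One way to combine these suggestions is to count the matches of a relative XPath once and then look each element up again by its 1-based position just before clicking, so a reference found before the page changed is never reused. This is a sketch under assumptions: click_each is a made-up helper, it assumes the number and order of matches stay the same while clicking, and it uses the generic find_elements(by, value) call with the "xpath" strategy string (the value of By.XPATH).

```python
import time


def click_each(driver, xpath, pause=0.0):
    """Click every element matching `xpath`, looking each one up again just before clicking.

    Returns the number of elements clicked.
    """
    count = len(driver.find_elements("xpath", xpath))
    for position in range(1, count + 1):  # XPath positions are 1-based
        indexed = "(%s)[%d]" % (xpath, position)
        # Raises IndexError if the element has disappeared in the meantime.
        driver.find_elements("xpath", indexed)[0].click()
        time.sleep(pause)
    return count
```

With the page from the answer above, a call might look like click_each(driver, "//div//h2[contains(.,'Electives')]/..//ul/li//span/a", pause=3).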
Happy Coding~
My Dash app table does not make its links clickable. The titles look like clickable links when I hover the mouse over them, but nothing happens when I click them.
The following is the function I am using to build the table:
def build_rows(images,titles,author,category,price,link):
    """
    builds a table body. All parameters expect a list type
    """
    rows = []
    for x in range(len(titles)):
        rows.append(html.Tr([html.Img(src=images[x], height="100px"),
                             html.Td(dcc.Link(titles[x], href=link[x])), # <--- This the problem
                             html.Td(author[x]),
                             html.Td(category[x]),
                             html.Td(price[x]),
                             html.Td(link[x])]))
    table_body = [html.Tbody(rows)]
    return table_body
Thank you for your help.
NOTE: I always upvote and select a correct answer when applicable
If you just want a regular anchor tag with an href, use html.A (i.e. import dash_html_components as html) rather than dcc.Link.
The dcc.Link component is meant to be used with the dcc.Location component. Together they let you create single-page applications by hooking a callback function up to the value of the current URL and returning different layout fragments into a specific container element. If this is what you're trying to do, see the docs on how to use the Location component.
I am trying to scrape content of a page.
Let's say this is the page:
http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL
I know I need to use Selenium to get the data I want.
I found this example on Stack Overflow that shows how to do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("http://finance.yahoo.com/quote/AAPL/profile?p=AAPL")
# wait for the Full Time Employees to be visible
wait = WebDriverWait(driver, 10)
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
print(employees.text)
driver.close()
My question is this:
In the above example to find Full Time Employees the code that has been used is:
employees = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[. = 'Full Time Employees']/following-sibling::strong")))
How did the author work out that s/he needs to use:
"//span[. = 'Full Time Employees']/following-sibling::strong"
To find the number of employees.
For my example page: http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL how can I find for example Trailing P/E?
Can you please tell me the steps you took to find this? I do right click and choose Inspect, but then what shall I do?
A picture is worth a thousand words.
In the browser's developer tools (F12), follow these steps:
Choose the Elements tab.
Press the Element Selector button.
With that button pressed, click an element in the main browser window.
In the DOM elements pane, right-click the highlighted element.
A context menu appears; choose Copy.
Choose Copy XPath in the submenu. The element's XPath is now on the clipboard.
NOTE!
The browser composes an element's XPath using its own algorithm. The result might not be what you expect or what suits your code, so you need to understand XPath itself and be comfortable with it.
See what XPath the Chrome browser generated for Trailing P/E:
//*[@id="main-0-Quote-Proxy"]/section/div[2]/section/div/section/div[2]/div[1]/div[1]/div/table/tbody/tr[3]/td[1]/span
A shorter, hand-written alternative would be:
'//h3[contains(., "Valuation Measures")]/following-sibling::div[1]//tr[3]'
Here is an answer that should clear up your confusion.
It is best to go through some XPath tutorials and practice on your own; then you will be able to decide what to use.
There are many sites. You can start Here or Here
Now, coming to your query:
Suppose I am using the following XPath to locate the element:
//h3/span[text()='Financial Highlights']/../preceding-sibling::div//tr[3]/td/span
To find Trailing P/E on your page, you will definitely want a unique XPath that won't change. If you try to find it using FirePath, it shows a lengthy XPath.
So look for an alternative: find another element (maybe a sibling, child, or ancestor of your element) and locate your element relative to it.
In my case, I first find the Financial Highlights text, which I can locate using //h3/span[text()='Financial Highlights']
Then I move to its parent tag, which is h3, using /..
The Trailing P/E element is in the node just before the current one, so I move to that node using /preceding-sibling::div
Finally, I find the element inside that <div> using //tr[3]/td/span
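The four steps above can be written out as string pieces that are joined into the final locator. This is just a sketch to make the composition visible; the variable names are made up, and the commented-out wait line assumes the imports and driver from the question's example.

```python
anchor = "//h3/span[text()='Financial Highlights']"  # step 1: a stable, unique element
parent = "/.."                                        # step 2: up to the enclosing <h3>
sibling = "/preceding-sibling::div"                   # step 3: the <div> before that <h3>
target = "//tr[3]/td/span"                            # step 4: the cell inside that <div>

xpath = anchor + parent + sibling + target
print(xpath)
# //h3/span[text()='Financial Highlights']/../preceding-sibling::div//tr[3]/td/span

# With the question's setup (driver, wait, EC, By) it could be used as:
# cell = wait.until(EC.visibility_of_element_located((By.XPATH, xpath)))
```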