How to recursively scrape a table from multiple pages using Python Selenium - python-3.x

I am new to Python and I'm trying to scrape a table from a website that has multiple pages. How should I use .click(), and where should that code go, so the table is scraped dynamically across pages?
The website I'm working with is https://free-proxy-list.net/ and I'm able to get the table from the first page. I'm trying to get all the pages and put them into a pandas DataFrame. I have already put the info from the table into a dictionary and tried to load the dict into a DataFrame, but only the first page ends up in the DataFrame. I need the data from the other pages as well.

Initialize an empty list for each column.
Use a while loop with a condition that checks the max_page count and iterate over the pages.
Append to the lists on each page iteration.
Load the lists into a pandas DataFrame.
Export the entire data set to a CSV file.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome()
driver.get('https://free-proxy-list.net/')
page = 1
max_page = 15
IP = []
Port = []
Code = []
Country = []
Anonymity = []
Google = []
Https = []
LastCheck = []
while page <= max_page:
    rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='proxylisttable']/tbody//tr")))
    for row in rows:
        IP.append(row.find_element_by_xpath('./td[1]').text)
        Port.append(row.find_element_by_xpath('./td[2]').text)
        Code.append(row.find_element_by_xpath('./td[3]').text)
        Country.append(row.find_element_by_xpath('./td[4]').get_attribute('textContent'))
        Anonymity.append(row.find_element_by_xpath('./td[5]').text)
        Google.append(row.find_element_by_xpath('./td[6]').get_attribute('textContent'))
        Https.append(row.find_element_by_xpath('./td[7]').text)
        LastCheck.append(row.find_element_by_xpath('./td[8]').get_attribute('textContent'))
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@aria-controls='proxylisttable' and text()='Next']"))).click()
    page = page + 1
    print('navigate to page: ' + str(page))
driver.close()
df = pd.DataFrame({"IP": IP, "Port": Port, "Code": Code, "Country": Country, "Anonymity": Anonymity, "Google": Google, "Https": Https, "Last_Checked": LastCheck})
print(df)
df.to_csv('output_IP.csv', index=False)
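If you don't want to hard-code max_page, here is a minimal sketch of an alternative loop that stops when the paginator's "Next" control is disabled. The "disabled" class on the Next button's parent li is an assumption about this site's DataTables-style pagination, not something verified here:
# Sketch: loop until the "Next" control reports it is disabled instead of counting pages.
while True:
    rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located(
        (By.XPATH, "//table[@id='proxylisttable']/tbody//tr")))
    for row in rows:
        IP.append(row.find_element_by_xpath('./td[1]').text)
        # ... append the remaining seven columns exactly as above ...
    # Assumption: the <li> wrapping the "Next" link gets a "disabled" class on the last page.
    next_li = driver.find_element_by_xpath("//li[a[@aria-controls='proxylisttable' and text()='Next']]")
    if 'disabled' in (next_li.get_attribute('class') or ''):
        break
    next_li.find_element_by_xpath('./a').click()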

Related

ActionChains is not updating the data within the table

I am trying to scrape data from this website using Selenium.
There are three features in the data, "Value", "Net change" and "Percent change", with net and percentage changes available for 1, 3, 6, and 12 months. I want to fetch the 1-month net change and percent change. For that, I need to click on the checkboxes and then click the update button.
I performed these actions using Selenium's find-element-by-XPath method, but for the percent change I needed to use ActionChains, as I was getting an "Element not clickable" error.
When I execute the code, all three features should appear in the downloaded CSV, but that's not happening. I am only able to fetch "Value" and "1 Month Net change". Does anyone know why the table is not getting updated, or how to fix it? Thanks.
My code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
from bs4 import BeautifulSoup

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', chrome_options=chrome_options)
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
soup = BeautifulSoup(driver.page_source, "html.parser", from_encoding='utf-8')
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[2]/fieldset/div[1]/table/tbody/tr[1]/td[1]/label/input').click()  # 1 month net change
element = WebDriverWait(driver, 60).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="percent_monthly_changes_div"]/table/tbody/tr[1]/td[1]/label/input')))
ActionChains(driver).move_to_element(element).click().perform()  # 1 month percent change
driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[4]/input').click()  # update button
driver.find_element(By.XPATH, '//*[@id="csvclickCU"]').click()  # download csv button
The website is showing N/A in the 1 Month Net Change column.
If you are still not getting the 1-month % change value, you can do:
driver.execute_script('document.querySelector("#percent_monthly_changes_div > table > tbody > tr:nth-child(1) > td:nth-child(1) > label > input").click()')
instead of:
element = WebDriverWait(driver, 60).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="percent_monthly_changes_div"]/table/tbody/tr[1]/td[1]/label/input')))
ActionChains(driver).move_to_element(element).click().perform()
This might not be the optimal solution, but it works fine.
As for the 1-month net change value, it is not provided by the website itself.
Using ActionChains to click on the elements with the text 1-Month Net Change and 1-Month % Change is an overhead that you can easily avoid.
Ideally, you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[value='1N']"))).click()
driver.find_element(By.CSS_SELECTOR, "input[value='1P']").click()
Using XPATH:
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@value='1N']"))).click()
driver.find_element(By.XPATH, "//input[@value='1P']").click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
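For completeness, a minimal end-to-end sketch that combines these waits with the Update and CSV-download clicks; the Update and CSV locators below are copied from the question's own XPaths and are assumptions about the page, not verified here:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SA0")
wait = WebDriverWait(driver, 20)
# Tick both checkboxes through their value attributes.
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[value='1N']"))).click()  # 1-Month Net Change
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[value='1P']"))).click()  # 1-Month % Change
# Update and CSV-download locators copied from the question (assumptions about the page).
wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[2]/div/div/div[4]/div/div[1]/form/div[4]/input"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='csvclickCU']"))).click()
driver.quit()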

Can't locate elements from a website using selenium

I'm trying to scrape data from a business directory but I keep getting an error saying the data was not found:
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
# Results in: IndexError: list index out of range
So I tried to use WebDriverWait to make the code wait for the data to load, but it doesn't find the elements even though the data does get loaded on the website.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
url='https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b'
driver = webdriver.Firefox()
driver.get(url)
wait=WebDriverWait(driver,50)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME,'searched-list ng-scope')))
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
print(name)
driver.switch_to.frame(driver.find_element_by_css_selector("#pym-0 iframe"))
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.searched-list.ng-scope')))
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
It's inside an iframe; to interact with an element inside an iframe, switch to it first. Here the iframe doesn't have any unique identifier, so we used the parent div, which has a unique id, as a reference and found the child iframe from it.
Now if you want to interact with elements outside the iframe, use:
driver.switch_to.default_content()
<iframe src="https://dmcc.secure.force.com/Business_directory_Page?initialWidth=987&childId=pym-0&parentTitle=List%20of%20Companies%20Registered%20in%20Dubai%2C%20DMCC%20Free%20Zone&parentUrl=https%3A%2F%2Fwww.dmcc.ae%2Fbusiness-search%3Fdirectory%3D1%26submissionGuid%3D2c8df029-a92e-4b5d-a014-7ef9948e664b" width="100%" scrolling="no" marginheight="0" frameborder="0" height="3657px"></iframe>
Switch to the iframe and handle the accept button.
driver.get('https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b')
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#hs-eu-confirmation-button"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, '#pym-0 > iframe')))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.searched-list.ng-scope')))
name = driver.find_elements_by_xpath('//*[@id="directory_list"]/div/div/div/div[1]/h4')[0]
print(name.text)
Output:
1 BOXOFFICE DMCC
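To collect every company name on the page rather than just the first one, you could iterate over all matched elements. A minimal sketch reusing the same locator (still inside the iframe):
# Grab the text of every result heading instead of only index 0.
names = [el.text for el in driver.find_elements_by_xpath('//*[@id="directory_list"]/div/div/div/div[1]/h4')]
for company in names:
    print(company)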

Extracting data tables from HTML source after scraping using Selenium & Python

I am trying to scrape data from this link. I've researched questions that have already been asked and I've successfully done some scraping, but I have a few issues with the results that are generated. Following is the code I've used to scrape.
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from datetime import datetime
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(chrome_options=options)
driver.get('http://www.scstrade.com/MarketStatistics/MS_HistoricalIndices.aspx')
inputElement_index = driver.find_element_by_id("txtSearch")
inputElement_index.send_keys('KSE ALL')
inputElement_date = driver.find_element_by_id("date1")
inputElement_date.send_keys('03/12/2019')
inputElement_date_end = driver.find_element_by_id("date2")
inputElement_date_end.send_keys('03/12/2020')
inputElement_viewprice = driver.find_element_by_id("btn1")
inputElement_viewprice.send_keys(Keys.ENTER)
tabel = driver.find_elements_by_css_selector('table > tbody')[0]
The aim is to extract data from the link for dates between 12th Mar 2020 and 3rd Mar 2020, with the KSE ALL index. The above code works, but when it runs for the first time the table object on the last line is blank; if I re-run that last line it gives the table (as a string) for the first page. Why don't I get the table when the code runs for the first time, and how can I get a pandas DataFrame from the table object that comes back as a string?
I tried the following code to get the first page's data into a pandas DataFrame, but the table object turns out to be 'NoneType'.
htmlSource = driver.page_source
soup = BeautifulSoup(htmlSource, 'html.parser')
table = soup.find('table', class_='tbody')
Second, I want to extract the entire data set, not just the data on the first page, and the number of pages is dynamic: it changes as the date range changes. To move to the next page I tried the following piece of code:
driver.find_element_by_id("next_pager").click()
I got the following error.
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <td id="next_pager" class="ui-pg-button" title="Next Page">...</td> is not clickable at point (790, 95). Other element would receive the click: <div class="loading row" id="load_list" style="display: block;">...</div>
I looked up how this issue can be resolved and wrote the code below to add some waiting time, but got the same error as above.
wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[title="Next Page"]'))).click()
How can I move to subsequent pages, extract data from all of them (the number of pages is dynamic depending on the date range set), and append it to the data extracted from the previous pages?
I would rather prefer the API approach in this case; it is faster and easier to get the data, and you don't have to page through the table at all.
Below is the API code to get the response (I just changed the date range so you will see multiple pages' worth of data in one request call):
import requests
url = "http://www.scstrade.com/MarketStatistics/MS_HistoricalIndices.aspx/chart"
payload = "{\"par\": \"KSE All\", \"date1\": \"01/03/2020\",\"date2\": \"03/12/2020\"}"
headers = {
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data = payload)
print(response.text.encode('utf8'))
The only thing is that you have to convert the date format in the response.
result:
b'{"d":[{"kse_index_id":13362,"kse_index_type_id":1,"kse_index_date":"\\/Date(1577991600000)\\/","kse_index_open":30046.67,"kse_index_high":30053.64,"kse_index_low":29665.65,"kse_index_close":29774.00,"kse_index_value":322398592,"kse_index_change":-98.97,"kse_index_changep":-0.33},{"kse_index_id":13366,"kse_index_type_id":1,"kse_index_date":"\\/Date(1578250800000)\\/","kse_index_open":29547.06,"kse_index_high":29774.00,"kse_index_low":29101.65,"kse_index_close":29145.52,"kse_index_value":266525664,"kse_index_change":-628.48,"kse_index_changep":-2.11},{"kse_index_id":13370,"kse_index_type_id":1,"kse_index_date":"\\/Date(1578337200000)\\/","kse_index_open":29209.91,"kse_index_high":29393.74,"kse_index_low":29072.69,"kse_index_close":29375.75,"kse_index_value":206397936,"kse_index_change":230.23,"kse_index_changep":0.79},{"kse_index_id":13374,"kse_index_type_id":1,"kse_index_date":"\\/Date(1578423600000)\\/","kse_index_open":29157.77,"kse_index_high":29375.75,"kse_index_low":28882.75,"kse_index_close":29010.85,"kse_index_value":279807072,"kse_index_change":-364.90,"kse_index_changep":-1.24},{"kse_index_id":13378,"kse_index_type_id":1,"kse_index_date":"\\/Date(1578510000000)\\/","kse_index_open":29319.08,"kse_index_high":29667.92,"kse_index_low":29010.85,"kse_index_close":29654.66,"kse_index_value":361992128,"kse_index_change":643.81,"kse_index_changep":2.22},{"kse_index_id":13382,"kse_index_type_id":1,"kse_index_date":"\\/Date(1578596400000)\\/","kse_index_open":29732.02,"kse_index_high":30070.99,"kse_index_low":29654.66,"kse_index_close":30058.45,"kse_index_value":400051936,"kse_index_change":403.79,"kse_index_changep":1.36},{"kse_index_id":13386,"kse_index_type_id":1,"kse_index_date":"\\/Date(1578855600000)\\/","kse_index_open":30109.26,"kse_index_high":30194.74,"kse_index_low":29901.75,"kse_index_close":30020.98,"kse_index_value":365810592,"kse_index_change":-37.47,"kse_index_changep":-0.13},{"kse_index_id":13390,"kse_index_type_id":1,"kse_index_date":"\\/Date(1578942000000)\\/","kse_index_open":30059.23,"kse_index_high":30150.96,"kse_index_low":29932.22,"kse_index_close":29973.44,"kse_index_value":249556960,"kse_index_change":-47.54,"kse_index_changep":-0.16},{"kse_index_id":13394,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579028400000)\\/","kse_index_open":29986.93,"kse_index_high":29999.17,"kse_index_low":29799.04,"kse_index_close":29892.79,"kse_index_value":171127728,"kse_index_change":-80.65,"kse_index_changep":-0.27},{"kse_index_id":13398,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579114800000)\\/","kse_index_open":29913.22,"kse_index_high":30007.53,"kse_index_low":29779.46,"kse_index_close":29914.47,"kse_index_value":229585632,"kse_index_change":21.68,"kse_index_changep":0.07},{"kse_index_id":13402,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579201200000)\\/","kse_index_open":29929.81,"kse_index_high":30037.83,"kse_index_low":29914.46,"kse_index_close":29998.45,"kse_index_value":211220464,"kse_index_change":83.98,"kse_index_changep":0.28},{"kse_index_id":13406,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579460400000)\\/","kse_index_open":30043.65,"kse_index_high":30089.73,"kse_index_low":29734.95,"kse_index_close":29808.60,"kse_index_value":173774336,"kse_index_change":-189.85,"kse_index_changep":-0.63},{"kse_index_id":13410,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579546800000)\\/","kse_index_open":29856.28,"kse_index_high":29928.72,"kse_index_low":29621.78,"kse_index_close":29735.95,"kse_index_value":177421264,"kse_index_change":-72.65,"kse_index_chang
ep":-0.24},{"kse_index_id":13414,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579633200000)\\/","kse_index_open":29746.05,"kse_index_high":29754.25,"kse_index_low":29308.76,"kse_index_close":29561.63,"kse_index_value":177486256,"kse_index_change":-174.32,"kse_index_changep":-0.59},{"kse_index_id":13418,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579719600000)\\/","kse_index_open":29621.60,"kse_index_high":29759.68,"kse_index_low":29409.24,"kse_index_close":29456.52,"kse_index_value":230561152,"kse_index_change":-105.11,"kse_index_changep":-0.36},{"kse_index_id":13422,"kse_index_type_id":1,"kse_index_date":"\\/Date(1579806000000)\\/","kse_index_open":29440.00,"kse_index_high":29585.39,"kse_index_low":29318.90,"kse_index_close":29529.89,"kse_index_value":172677024,"kse_index_change":73.37,"kse_index_changep":0.25},{"kse_index_id":13426,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580065200000)\\/","kse_index_open":29533.27,"kse_index_high":29594.55,"kse_index_low":29431.95,"kse_index_close":29462.60,"kse_index_value":198224992,"kse_index_change":-67.29,"kse_index_changep":-0.23},{"kse_index_id":13430,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580151600000)\\/","kse_index_open":29457.47,"kse_index_high":29462.59,"kse_index_low":29230.53,"kse_index_close":29345.90,"kse_index_value":188781760,"kse_index_change":-116.70,"kse_index_changep":-0.40},{"kse_index_id":13434,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580238000000)\\/","kse_index_open":29354.64,"kse_index_high":29446.90,"kse_index_low":29083.61,"kse_index_close":29135.35,"kse_index_value":197011200,"kse_index_change":-210.55,"kse_index_changep":-0.72},{"kse_index_id":13438,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580324400000)\\/","kse_index_open":29132.60,"kse_index_high":29181.59,"kse_index_low":28969.60,"kse_index_close":29123.53,"kse_index_value":162120016,"kse_index_change":-11.82,"kse_index_changep":-0.04},{"kse_index_id":13442,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580410800000)\\/","kse_index_open":29166.18,"kse_index_high":29257.79,"kse_index_low":28945.19,"kse_index_close":29067.54,"kse_index_value":193415040,"kse_index_change":-55.99,"kse_index_changep":-0.19},{"kse_index_id":13446,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580670000000)\\/","kse_index_open":28941.02,"kse_index_high":29067.54,"kse_index_low":28246.97,"kse_index_close":28315.61,"kse_index_value":202691712,"kse_index_change":-751.93,"kse_index_changep":-2.59},{"kse_index_id":13450,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580756400000)\\/","kse_index_open":28356.76,"kse_index_high":28506.86,"kse_index_low":28245.23,"kse_index_close":28493.84,"kse_index_value":145986304,"kse_index_change":178.23,"kse_index_changep":0.63},{"kse_index_id":13454,"kse_index_type_id":1,"kse_index_date":"\\/Date(1580929200000)\\/","kse_index_open":28577.12,"kse_index_high":28633.74,"kse_index_low":28375.60,"kse_index_close":28398.38,"kse_index_value":127719744,"kse_index_change":-95.46,"kse_index_changep":-0.34},{"kse_index_id":13458,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581015600000)\\/","kse_index_open":28458.74,"kse_index_high":28458.75,"kse_index_low":27983.62,"kse_index_close":28042.82,"kse_index_value":193151648,"kse_index_change":-355.56,"kse_index_changep":-1.25},{"kse_index_id":13462,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581274800000)\\/","kse_index_open":28043.58,"kse_index_high":28053.71,"kse_index_low":27470.38,"kse_index_close":27520.35,"kse_index_value":180630816,"kse_index_change":-522.47,"kse_
index_changep":-1.86},{"kse_index_id":13466,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581361200000)\\/","kse_index_open":27601.00,"kse_index_high":28017.17,"kse_index_low":27492.28,"kse_index_close":27865.16,"kse_index_value":161458304,"kse_index_change":344.81,"kse_index_changep":1.25},{"kse_index_id":13470,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581447600000)\\/","kse_index_open":27959.20,"kse_index_high":28384.45,"kse_index_low":27865.16,"kse_index_close":28309.35,"kse_index_value":179861264,"kse_index_change":444.19,"kse_index_changep":1.59},{"kse_index_id":13474,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581534000000)\\/","kse_index_open":28380.58,"kse_index_high":28468.96,"kse_index_low":28191.97,"kse_index_close":28256.09,"kse_index_value":197307008,"kse_index_change":-53.26,"kse_index_changep":-0.19},{"kse_index_id":13478,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581620400000)\\/","kse_index_open":28327.55,"kse_index_high":28330.57,"kse_index_low":27917.81,"kse_index_close":28015.75,"kse_index_value":117521904,"kse_index_change":-240.34,"kse_index_changep":-0.85},{"kse_index_id":13482,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581879600000)\\/","kse_index_open":28023.74,"kse_index_high":28130.89,"kse_index_low":27900.27,"kse_index_close":28002.69,"kse_index_value":99813272,"kse_index_change":-13.06,"kse_index_changep":-0.05},{"kse_index_id":13486,"kse_index_type_id":1,"kse_index_date":"\\/Date(1581966000000)\\/","kse_index_open":28036.95,"kse_index_high":28141.44,"kse_index_low":27758.54,"kse_index_close":27807.10,"kse_index_value":91269288,"kse_index_change":-195.59,"kse_index_changep":-0.70},{"kse_index_id":13490,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582052400000)\\/","kse_index_open":27843.99,"kse_index_high":28108.02,"kse_index_low":27807.11,"kse_index_close":28063.85,"kse_index_value":142765888,"kse_index_change":256.75,"kse_index_changep":0.92},{"kse_index_id":13494,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582138800000)\\/","kse_index_open":28122.04,"kse_index_high":28132.98,"kse_index_low":27989.14,"kse_index_close":28018.02,"kse_index_value":111998784,"kse_index_change":-45.83,"kse_index_changep":-0.16},{"kse_index_id":13498,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582225200000)\\/","kse_index_open":28028.61,"kse_index_high":28039.38,"kse_index_low":27856.26,"kse_index_close":27895.15,"kse_index_value":85454400,"kse_index_change":-122.87,"kse_index_changep":-0.44},{"kse_index_id":13502,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582484400000)\\/","kse_index_open":27880.35,"kse_index_high":27895.15,"kse_index_low":27200.92,"kse_index_close":27248.30,"kse_index_value":144128160,"kse_index_change":-646.85,"kse_index_changep":-2.32},{"kse_index_id":13506,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582570800000)\\/","kse_index_open":27206.95,"kse_index_high":27321.33,"kse_index_low":26851.06,"kse_index_close":27018.98,"kse_index_value":124276016,"kse_index_change":-229.32,"kse_index_changep":-0.84},{"kse_index_id":13510,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582657200000)\\/","kse_index_open":27058.85,"kse_index_high":27070.75,"kse_index_low":26560.92,"kse_index_close":26687.95,"kse_index_value":147798160,"kse_index_change":-331.03,"kse_index_changep":-1.23},{"kse_index_id":13514,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582743600000)\\/","kse_index_open":26355.50,"kse_index_high":26687.95,"kse_index_low":25780.38,"kse_index_close":26396.96,"kse_index_value":248988672,"kse_index_change":-290.
99,"kse_index_changep":-1.09},{"kse_index_id":13518,"kse_index_type_id":1,"kse_index_date":"\\/Date(1582830000000)\\/","kse_index_open":26302.05,"kse_index_high":26519.47,"kse_index_low":26181.00,"kse_index_close":26289.38,"kse_index_value":201662240,"kse_index_change":-107.58,"kse_index_changep":-0.41},{"kse_index_id":13522,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583089200000)\\/","kse_index_open":26342.71,"kse_index_high":27096.59,"kse_index_low":26289.38,"kse_index_close":27059.34,"kse_index_value":215058320,"kse_index_change":769.96,"kse_index_changep":2.93},{"kse_index_id":13526,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583175600000)\\/","kse_index_open":27200.11,"kse_index_high":27385.30,"kse_index_low":26854.16,"kse_index_close":27054.89,"kse_index_value":225222304,"kse_index_change":-4.45,"kse_index_changep":-0.02},{"kse_index_id":13530,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583262000000)\\/","kse_index_open":27070.16,"kse_index_high":27069.35,"kse_index_low":26797.32,"kse_index_close":26919.79,"kse_index_value":186877760,"kse_index_change":-135.10,"kse_index_changep":-0.50},{"kse_index_id":13534,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583348400000)\\/","kse_index_open":26961.15,"kse_index_high":27369.98,"kse_index_low":26919.79,"kse_index_close":27228.79,"kse_index_value":340043072,"kse_index_change":309.00,"kse_index_changep":1.15},{"kse_index_id":13538,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583434800000)\\/","kse_index_open":27126.48,"kse_index_high":27228.79,"kse_index_low":26517.64,"kse_index_close":26557.85,"kse_index_value":244063824,"kse_index_change":-670.94,"kse_index_changep":-2.46},{"kse_index_id":13542,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583694000000)\\/","kse_index_open":25878.94,"kse_index_high":26557.85,"kse_index_low":25304.60,"kse_index_close":25875.06,"kse_index_value":307753952,"kse_index_change":-682.79,"kse_index_changep":-2.57},{"kse_index_id":13546,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583780400000)\\/","kse_index_open":25758.62,"kse_index_high":26210.06,"kse_index_low":25719.55,"kse_index_close":26184.13,"kse_index_value":274065504,"kse_index_change":309.07,"kse_index_changep":1.19},{"kse_index_id":13550,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583866800000)\\/","kse_index_open":26331.02,"kse_index_high":26562.31,"kse_index_low":26061.81,"kse_index_close":26127.67,"kse_index_value":217595296,"kse_index_change":-56.46,"kse_index_changep":-0.22},{"kse_index_id":13554,"kse_index_type_id":1,"kse_index_date":"\\/Date(1583953200000)\\/","kse_index_open":26002.00,"kse_index_high":26127.67,"kse_index_low":25245.98,"kse_index_close":25310.97,"kse_index_value":230028032,"kse_index_change":-816.70,"kse_index_changep":-3.13}]}'

Find aria-label in an HTML page using soup (Python)

I have HTML pages with this code:
<span itemprop="title" data-andiallelmwithtext="15" aria-current="page" aria-label="you in page
number 452">page 452</span>
I want to find the element by its aria-label, so I have tried this:
is_452 = soup.find("span", {"aria-label": "you in page number 452"})
print(is_452)
I want to get this result:
is_452 = page 452
But I'm getting this result:
is_452 = None
How can I do it?
It has line breaks in it, so it doesn't match by exact text. Try the following:
from simplified_scrapy.simplified_doc import SimplifiedDoc
html='''<span itemprop="title" data-andiallelmwithtext="15" aria-current="page" aria-label="you in page
number 452">page 452</span>'''
doc = SimplifiedDoc(html)
is_452 = doc.getElementByReg('aria-label="you in page[\s]*number 452"',tag="span")
print (is_452.text)
Possibly the desired element is a dynamic element. In that case you can use Selenium to extract the value of the aria-label attribute, inducing WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section#header a.cart-heading[href='/cart']"))).get_attribute("aria-label"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//section[@id='header']//a[@class='cart-heading' and @href='/cart']"))).get_attribute("aria-label"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
The reason soup fails here is the line break. I have a simpler solution which doesn't use any separate library, just BeautifulSoup. I know this question is old, but it has 1k views, so clearly many people search for this question.
You can use triple-quote strings to take into account the newline.
This:
is_452 = soup.find("span", {"aria-label": "you in page number 452"})
print(is_452)
Would become:
search_label = """you in page
number 452"""
is_452 = soup.find("span", {"aria-label": search_label})
print(is_452)
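As a further variation (my own suggestion, not from the answers above), a compiled regular expression also matches the attribute no matter how the whitespace is wrapped:
import re
from bs4 import BeautifulSoup

html = '''<span itemprop="title" data-andiallelmwithtext="15" aria-current="page" aria-label="you in page
number 452">page 452</span>'''

soup = BeautifulSoup(html, "html.parser")
# \s+ in the pattern matches the line break inside the attribute value.
is_452 = soup.find("span", {"aria-label": re.compile(r"you in page\s+number 452")})
print(is_452.text)  # page 452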

How to input Values in Google Maps using Python/Selenium

I cannot use send_keys correctly to input values.
I would like to be able to insert text into the text box.
I tried two different methods:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument("--test-type")
driver = webdriver.Chrome('/Users/.../Documents/chromedriver')
driver.get('http://codepad.org/')
text_area = driver.find_element_by_id('textarea')
text_area.send_keys("This text is send using Python code.")
from selenium import webdriver
driver = webdriver.Chrome('/Users/.../Documents/chromedriver')
driver.get('https://www.google.com/maps/dir///@36.0667234,-115.1059052,15z')
driver.find_element_by_xpath("//*[@placeholder='Choose starting point, or click on the map...']").click()
driver.find_element_by_xpath("//*[@placeholder='Choose starting point, or click on the map...']").clear()
driver.find_element_by_xpath("//*[@placeholder='Choose starting point, or click on the map...']").send_keys("New York")
I just want to put a value into the fields I am trying to fill.
Here is the code that you can use, which will wait for the element to be present and then set the value in the input box.
WebDriverWait(driver,30).until(EC.visibility_of_element_located((By.XPATH, "(//input[#class='tactile-searchbox-input'])[1]"))).send_keys("new york")
BTW you need below imports in order to work with explicit wait used in the above code.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
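Putting it together, a minimal usage sketch; the tactile-searchbox-input class comes from the answer above and is assumed to still match the Google Maps page:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.google.com/maps/dir///@36.0667234,-115.1059052,15z')
# Wait for the first (starting point) search box to become visible, then type the origin.
WebDriverWait(driver, 30).until(EC.visibility_of_element_located(
    (By.XPATH, "(//input[@class='tactile-searchbox-input'])[1]"))).send_keys("new york")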
