Can't get multiple span class text with selenium python - python-3.x

I'm getting error when I try to scrape a flashscore match summary. Example:
flashscore
I want to get, for example, all the results on that page, but doing driver.find_element_by_class_name("h2h__result") only takes the first result (put inside a for loop, obviously).
If I try to do driver.find_elements_by_class_name I get an error and I can't understand why.
Code example:
driver.get("https://www.flashscore.com/match/Qs85KCdA/#h2h/overall")
time.sleep(2)
h2h = driver.find_elements_by_class_name("rows")
for x in h2h:
    p = driver.find_element_by_css_selector("span.h2h__regularTimeResult")
    print(p.text)
Can someone help me understand where I'm going wrong? Thank you a lot, guys.

The class name rows highlights the whole table. Use the class name h2h__row so that each individual row is targeted and you can extract the details from that particular row.
Try the XPaths below to get the elements:
from selenium.webdriver.common.by import By

driver.get("https://www.flashscore.com/match/Qs85KCdA/#h2h/overall")
rows = driver.find_elements(By.XPATH, "//div[@class='h2h__row']")
for row in rows:
    # Use a dot in the XPath to find elements within an element
    results = row.find_element(By.XPATH, ".//span[@class='h2h__regularTimeResult']")
    print(results.text)
You can also use below CSS_SELECTOR to get the elements directly.
regularTimeResult = driver.find_elements(By.CSS_SELECTOR, "div.h2h__row span.h2h__regularTimeResult")
for item in regularTimeResult:
    print(item.text)
Update:
rows = driver.find_elements(By.XPATH, "//div[@class='h2h__row']")
for row in rows:
    # Use a dot in the XPath to find elements within an element
    results = row.find_element(By.XPATH, ".//span[@class='h2h__regularTimeResult']")
    if "0 : 0" not in results.text:
        print(results.text)
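The "0 : 0" filter in the update can be tried on plain strings, independent of Selenium; the score values below are made up:

```python
# Stand-in for the scraped result texts (hypothetical values)
results = ["2 : 1", "0 : 0", "1 : 3", "0 : 0", "2 : 2"]

# Keep only the results that are not "0 : 0", as in the update above
played = [r for r in results if "0 : 0" not in r]
print(played)  # ['2 : 1', '1 : 3', '2 : 2']
```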

Related

I'm trying to remove certain words from a column on each row in my dataframe

I'm still trying to understand how pandas works, so please bear with me. In this exercise, I'm trying to access a particular column ['Without Stop Words'] on each row, which holds a list of words. I wish to remove certain words from each row of that column. The words to be removed are specified in a dictionary called stop_words_dict. Here's my code, but the dataframe seems to be unchanged after running it.
def stop_words_remover(df):
    # your code here
    df['Without Stop Words'] = df['Tweets'].str.lower().str.split()
    for i, r in df.iterrows():
        for word in df['Without Stop Words']:
            if word in stop_words_dict.items():
                df['Without Stop Words'][i] = df['Without Stop Words'].str.remove(word)
    return df
This is what the input looks like:
INPUT
EXPECTED OUTPUT
In Pandas, it's generally a bad idea to loop over your dataframe row by row to change it. Instead, try using methods like .apply().
An example for stop words, together with a list comprehension (note that membership in a dict checks its keys, so test against stop_words_dict itself, not stop_words_dict.items()):
test['Without Stop Words'].apply(lambda x: [item for item in x if item not in stop_words_dict])
See https://stackoverflow.com/a/29523440/12904151 for more context.
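The same per-row filtering can be sketched in plain Python, without pandas; stop_words_dict and the tweets below are made-up stand-ins:

```python
# Hypothetical stop-word dict (membership in a dict checks its keys)
stop_words_dict = {"the": 1, "a": 1, "is": 1}
tweets = ["the cat is here", "a dog barks"]

# One filtered word list per tweet, mirroring the .apply() idea
without_stop_words = [
    [w for w in t.lower().split() if w not in stop_words_dict]
    for t in tweets
]
print(without_stop_words)  # [['cat', 'here'], ['dog', 'barks']]
```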

python selenium find_elements incorrect output

I am new to Python + Selenium. I am trying to return some stuff with the find_elements* functions - see the code below. When I check the length of the list, the number of items is correct; however, when I print the contents of the elements, each element contains the same values when they should contain different values.
elems = browser.find_elements_by_xpath("//div[@class='some_class_name']")
print(len(elems)) # returns correct number of items
for elem in elems:
    print(elem.find_element_by_xpath("//span[@class='another_class_name']").text)
    print(elem.find_element_by_xpath("//div[starts-with(@href, 'https://some_web_page_name.com/')]").get_attribute('data'))
    print(elem.find_element_by_xpath("//div[@class='other_class_name']").text)
You should use . to reduce the scope of the XPath to the current element's children/grandchildren. In your case, each XPath points to the first matching element on the page rather than searching under the current elem.
Change the code as shown below:
for elem in elems:
    print(elem.find_element_by_xpath(".//span[@class='another_class_name']").text)
    print(elem.find_element_by_xpath(".//div[starts-with(@href, 'https://some_web_page_name.com/')]").get_attribute('data'))
    print(elem.find_element_by_xpath(".//div[@class='other_class_name']").text)
While looping, or when using find_element inside another element, you have to add "." at the start of the selector; otherwise it will return the same element multiple times.
first_element = browser.find_elements_by_xpath("//div[@class='some_class_name']")[0].find_element_by_xpath(".//span[@class='another_class_name']").text
In your case, use the following code:
elems = browser.find_elements_by_xpath("//div[@class='some_class_name']")
print(len(elems)) # returns correct number of items
for elem in elems:
    print(elem.find_element_by_xpath(".//span[@class='another_class_name']").text)
    print(elem.find_element_by_xpath(".//div[starts-with(@href, 'https://some_web_page_name.com/')]").get_attribute('data'))
    print(elem.find_element_by_xpath(".//div[@class='other_class_name']").text)
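The scoping difference can be reproduced without a browser using xml.etree.ElementTree, which supports the same relative .// syntax; the markup below is a made-up stand-in for the page:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page structure
html = """
<root>
  <div class="some_class_name"><span class="another_class_name">first</span></div>
  <div class="some_class_name"><span class="another_class_name">second</span></div>
  <div class="some_class_name"><span class="another_class_name">third</span></div>
</root>
"""
root = ET.fromstring(html)
rows = root.findall(".//div[@class='some_class_name']")

# Searching from the document root inside the loop repeats the first match
wrong = [root.find(".//span[@class='another_class_name']").text for _ in rows]
# Searching from each row element scopes the query to that row
right = [row.find(".//span[@class='another_class_name']").text for row in rows]
print(wrong)  # ['first', 'first', 'first']
print(right)  # ['first', 'second', 'third']
```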

Saving ord(characters) from different lines(one string) in different lists

I just can't figure it out.
I have a string with some lines:
qual = "abcdefg\nabcedfg\nabcdefg"
I want to convert the characters to their ASCII values and save those values in a separate list for each line:
value = [[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
But my codes saves them all in one list.
values=[1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6]
First of all my code:
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
I also tried to split() the string, but the result is still the same:
qual = line#[:-100]
qually = qual.split()
for list in qually:
    for element in list:
        qs = ord(element)
        quality.append(qs)
My next attempt was:
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
for position in range(0, len(quality_code)):
    qual_liste[position].append(quality_code[position])
With this code an IndexError (list index out of range) occurs.
There is probably a way with try and except, but I don't get it:
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
for position in range(0, len(quality_code)):
    try:
        qual_liste[position].append(quality_code[position])
    except IndexError:
        pass
With this code qual_liste stays empty, probably because of the pass, but I don't know what to insert instead of pass.
Thanks a lot for the help. I hope my bad English is excusable :D
Here you go, this should do the trick:
qual="abcdefg\nabcedfg\nabcdefg"
print([[ord(ii) for ii in i] for i in qual.split('\n')])
List comprehension is always the answer.
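For a beginner it may help to see the explicit nested loop that the one-liner is shorthand for:

```python
qual = "abcdefg\nabcedfg\nabcdefg"

values = []
for line in qual.split("\n"):   # one sub-list per line
    codes = []
    for ch in line:
        codes.append(ord(ch))   # code point of each character
    values.append(codes)
print(values)  # each sub-list starts [97, 98, 99, ...]
```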

Loop json results

I'm totally new to python. I have this code:
import requests

won = 'https://api.pipedrive.com/v1/deals?status=won&start=0&api_token=xxxx'
json_data = requests.get(won).json()
deal_name = json_data['data'][0]['title']
print(deal_name)
It prints the first title for me, but I would like it to loop through all titles in the json. But I can't figure out how. Can anyone guide me in the right direction?
You want to read up on dictionaries and lists. It seems like your json_data["data"] contains a list, so:
Seeing you wrote this:
deal_name = json_data['data'][0]['title']
print(deal_name)
What you are looking for is:
for i in range(len(json_data["data"])):
    print(json_data["data"][i]["title"])
Print it with a for loop
1. for item in json_data['data']: will take each element in the list json_data['data']
2. Then we print the title property of the object using the line print(item['title'])
Code:
import requests

won = 'https://api.pipedrive.com/v1/deals?status=won&start=0&api_token=xxxx'
json_data = requests.get(won).json()
for item in json_data['data']:
    print(item['title'])
If you are ok with printing the titles as a list you can use List Comprehensions, Please refer the link in references to learn more.
print([x['title'] for x in json_data['data']])
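Both versions can be tried offline with a stand-in for the response shape; the titles here are made up:

```python
# Hypothetical stand-in for the Pipedrive JSON response
json_data = {"data": [{"title": "Deal A"}, {"title": "Deal B"}]}

titles = [deal["title"] for deal in json_data["data"]]
print(titles)  # ['Deal A', 'Deal B']
```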
References:
Python Loops
Python Lists
Python Comprehensions

Converting String of Ints and Floats to Individual Ints and Floats in a CSV

I'm using scrapy to scrape stock premarket data. Here is the code being used to scrape the website:
def parse(self, response):
    for sel in response.xpath('//body'):
        item = PremarketItem()
        item['volume'] = sel.xpath('//td[@class="tdVolume"]/text()').extract()
        item['last_price'] = sel.xpath('//div[@class="lastPrice"]/text()')[:30].extract()
        item['percent_change'] = sel.xpath('//div[@class="chgUp"]/text()')[:15].extract() + sel.xpath('//div[@class="chgDown"]/text()')[:15].extract()
        item['ticker'] = sel.xpath('//a[@class="symbol"]/text()')[:30].extract()
        yield item
The output of the following code into the .csv file is something along the lines of this:
ticker,percent_change,last_price,volume
"HTGM,SNCR,SAEX,IMMU,OLED,DAIO","27.43%,20.39%,17.28%,17.19%,15.69%","5,298350,700,1090000,76320,27190,13010",etc
As you can see, the values are separated correctly, but they're all stuck in massive strings. I've tried multiple for loops, but nothing has worked, and I can't find anything. Thank you for the help!
Instead of splitting the massive strings afterwards, you can fix the scrapy code so that the values are separated in the first place.
Your item XPaths start with //, which selects all matching elements on the whole page and thus dumps everything into one (massive) item. Presumably your target website has some structure for the target items, e.g. table rows.
You then need to figure out an XPath expression that matches the rows and loop over those rows, parsing one item per row. See the following pseudo code:
def parse(self, response):
    # Loop over table rows ...
    for sel in response.xpath('//table/tr'):
        item = PremarketItem()
        # Use an XPath relative to the table row: start it with a dot
        item['volume'] = sel.xpath('./td[@class="tdVolume"]/text()').extract()
        # ... other fields ...
        yield item
See scrapy documentation for examples of relative XPath expressions.
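The row-by-row idea can be simulated without scrapy using xml.etree.ElementTree; the table below is a made-up stand-in for the premarket page:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with one row per stock
table = ET.fromstring(
    "<table>"
    "<tr><td class='tdVolume'>298350</td><a class='symbol'>HTGM</a></tr>"
    "<tr><td class='tdVolume'>700</td><a class='symbol'>SNCR</a></tr>"
    "</table>"
)

items = []
for row in table.findall("./tr"):  # one item per row
    items.append({
        "volume": row.find("./td[@class='tdVolume']").text,
        "ticker": row.find(".//a[@class='symbol']").text,
    })
print(items)
```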
