How to extract particular lines from the list? - python-3.x

I have a list and want to extract a particular line from it. Below is my list.
I want to extract the 'src' link from the above list.
example:
(src="https://r-cf.bstatic.com/xdata/images/hotel/square600/244245064.webp?k=8699eb2006da453ae8fe257eee2dcc242e70667ef29845ed85f70dbb9f61726a&o="). My final aim is to extract only the link. I have 20 records in the list, hence the need to extract 20 links from it.
My code (I stored the list in 'aas')
links = []
for i in aas:
    link = re.search('CONCLUSION: (.*?)([A-Z]{2,})', i).group(1)
    links.append(link)
I am getting an error: "expected string or bytes-like object"
Any suggestions?

As per the Beautiful Soup documentation, you can access a tag’s attributes by treating the tag like a dictionary, like so:
for img in img_list:
    print(img["src"])

Related

Python, Evaluating parsed data against a predefined list

I have a large set of data which is being parsed by feedparser and is enumerated as a string. For instance:
import feedparser
d = feedparser.parse('somefeed.xml')
will give elements such as d.entries['title'], d.entries['url'], and other d.entries which are strings.
I want to compare these elements against a list I have defined to see if there is a match, but I am not quite thinking it through correctly.
Below is what I tried, but I got no output; any help is appreciated.
for i in d.entries:
    my_list = ["Title One", "Title Two", "Etc"]
    if my_list in i['title'].split("-"):
        print(i)
If the parsed data matches an element of my list, I want to print that element.
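The original condition checks whether the entire my_list is one of the split pieces, which can never be true. A minimal sketch of one way the comparison could work, keeping the names from the question (somefeed.xml is a placeholder):
import feedparser

d = feedparser.parse('somefeed.xml')     # placeholder feed from the question
my_list = ["Title One", "Title Two", "Etc"]

for i in d.entries:
    # split the title once, then test each candidate against the pieces
    parts = [p.strip() for p in i['title'].split("-")]
    if any(candidate in parts for candidate in my_list):
        print(i['title'])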

Adding a dynamic number of tags in LXML

I have the following code snippet:
from lxml import etree

dataset = etree.Element("trace_data")
# create children of dataset
pinfo = etree.SubElement(dataset, "pinfo")
pinfo.text = processinfo
traces = etree.SubElement(dataset, "traces")
# enumerate over all traces, creating a subelement of the traces element with an index for every element
for index, trace in enumerate(traces):
    trace_xml = etree.SubElement(traces, str(index))
    trace_xml.text = trace
Which, to my confusion, does not fill the subelements of traces, but generates an XML like:
<trace_data><pinfo>1</pinfo><traces/></trace_data>
Even though I meant it to iterate over a list of strings, called trace, and then add a tag for each of the elements in the list:
<trace_data><pinfo>1</pinfo><traces><trace1>"test"</trace1><trace2>"test2"</trace2></traces></trace_data>
and so on
I suspect that this may come from the way I am trying to create subelements for the traces tag.
What are some ways to create a subelement of the tag traces for each element in the list?
Thanks in advance.
Found the error:
processinfo always contains a numeric value, e.g. 1
LXML doesn't allow tags to start with a number and therefore throws an error.
I couldn't see this error due to poor exception handling.
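A minimal sketch of the fixed pattern, assuming the traces really live in a plain Python list (the name trace_list and the "trace" prefix are illustrative, not from the original post):
from lxml import etree

processinfo = "1"                        # numeric content is fine as element text
trace_list = ["test", "test2"]           # illustrative list of trace strings

dataset = etree.Element("trace_data")
pinfo = etree.SubElement(dataset, "pinfo")
pinfo.text = processinfo
traces = etree.SubElement(dataset, "traces")

for index, trace in enumerate(trace_list, start=1):
    # tag names must not start with a digit, so prefix them
    trace_xml = etree.SubElement(traces, "trace{}".format(index))
    trace_xml.text = trace

print(etree.tostring(dataset))
# b'<trace_data><pinfo>1</pinfo><traces><trace1>test</trace1><trace2>test2</trace2></traces></trace_data>'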

I want to get an element by text in Beautiful Soup

elem = browser.find_element_by_partial_link_text("WEBSITE")
The above code finds an element whose link text contains WEBSITE, but I don't want to use Selenium here; I want to find the element by text using bs4. I tried the following code, but got no results:
elem = soup.find(text=re.compile('WEBSITE'))
Per the documentation provided here, you can do something like below.
ele = soup.find('tag', string='text_for_search')
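A small self-contained sketch of both exact and partial text matching, assuming the target text sits inside an a tag (the HTML is invented for the example):
import re
from bs4 import BeautifulSoup

html = '<div><a href="/about">WEBSITE LINK</a><a href="/contact">Contact</a></div>'
soup = BeautifulSoup(html, "html.parser")

# exact match: string= must equal the tag's full text
exact = soup.find('a', string='WEBSITE LINK')

# partial match: pass a compiled regex instead of a plain string
partial = soup.find('a', string=re.compile('WEBSITE'))

print(exact['href'], partial['href'])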

Why am I getting an error trying to loop through href in Selenium Python?

The code I used for looping is given below. I tried to take the hrefs that are relevant to me by slicing event_links, the list to which all the hrefs are saved, but I'm getting the error:
---> 35 print(event_links[16,:])
36
37 #for link in event_links[16,:]:
TypeError: list indices must be integers or slices, not tuple
Can I change the code in any way so that I can slice the list, and how is it that the list indices are tuples?
event_links = []
for link in driver.find_elements_by_xpath("//a[@href]"):
    url = link.get_attribute('href')
    event_links.append(url)
print(event_links[16,:])
Tanzeem Alam:
First, change [16,:] to [16:], as slices are written with a colon ':', not a comma ','.
If you want to use tuple indices, you can store the data in matrix form instead of a flat list, or use Python's numpy library to manipulate the array.
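For illustration, a short sketch of the difference between the two kinds of indexing (the data is made up):
import numpy as np

event_links = ["link{}".format(i) for i in range(20)]   # stand-ins for the scraped hrefs

# plain Python list: slice with a colon
print(event_links[16:])        # elements from index 16 to the end

# numpy array reshaped to (rows, cols): tuple indexing is allowed
grid = np.array(event_links).reshape(4, 5)
print(grid[2, :])              # third row, all columns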

Converting String of Ints and Floats to Individual Ints and Floats in a CSV

I'm using scrapy to scrape stock premarket data. Here is the code being used to scrape the website:
def parse(self, response):
    for sel in response.xpath('//body'):
        item = PremarketItem()
        item['volume'] = sel.xpath('//td[@class="tdVolume"]/text()').extract()
        item['last_price'] = sel.xpath('//div[@class="lastPrice"]/text()')[:30].extract()
        item['percent_change'] = sel.xpath(
            '//div[@class="chgUp"]/text()')[:15].extract() + sel.xpath('//div[@class="chgDown"]/text()')[:15].extract()
        item['ticker'] = sel.xpath('//a[@class="symbol"]/text()')[:30].extract()
        yield item
The output of the above code into the .csv file is something along these lines:
ticker,percent_change,last_price,volume
"HTGM,SNCR,SAEX,IMMU,OLED,DAIO","27.43%,20.39%,17.28%,17.19%,15.69%","5,298350,700,1090000,76320,27190,13010",etc
As you can see, the values are separated correctly, but they're all stuck in massive strings. I've tried multiple for loops, but nothing has worked, and I can't find anything. Thank you for the help!
Instead of splitting the massive strings, you can fix the Scrapy code so that the values are separated in the first place.
Your item XPaths start with //, which selects all elements matching the expression and thus puts every match into one (massive) item. I suppose your target website has some structure around the target items, e.g. table rows.
You then need to figure out an XPath expression that matches the rows and loop over those rows, parsing one item per row. See the following pseudo code:
def parse(self, response):
    # Loop over table rows ...
    for sel in response.xpath('//table/tr'):
        item = PremarketItem()
        # Use XPath relative to the table row: note the dot at the beginning
        item['volume'] = sel.xpath('./td[@class="tdVolume"]/text()').extract()
        # ... other fields ...
        yield item
See the Scrapy documentation for examples of relative XPath expressions.
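A tiny self-contained illustration of the relative-vs-absolute XPath point, running Scrapy's Selector directly on some invented HTML (the table layout is made up; the real site's structure may differ):
from scrapy.selector import Selector

html = """
<table>
  <tr><td class="tdVolume">298350</td><td class="symbol">HTGM</td></tr>
  <tr><td class="tdVolume">700</td><td class="symbol">SNCR</td></tr>
</table>
"""
sel = Selector(text=html)

for row in sel.xpath('//table//tr'):
    # '//td[...]' would match every row's cells on each pass;
    # the leading dot restricts the query to the current row
    volume = row.xpath('./td[@class="tdVolume"]/text()').get()
    ticker = row.xpath('./td[@class="symbol"]/text()').get()
    print(ticker, volume)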
