So, I encountered something really interesting. I split a string with .splitlines().
When I print that list, it works just fine and outputs values in that form: ['Freitag 24.11', '08:00'].
But if I try to access any values of that list by an index, e.g. list[0], it should give me the first value of that list. In this case 'Freitag 24.11'.
splittedParameters = tag.text.splitlines()
print(splittedParameters[0])
So as explained, if I don't use the index [0] it works fine and outputs the whole list. But with the index it says: "IndexError: list index out of range"
the whole code:
from requests_html import HTMLSession
startDate = None
endDate = None
summary = None
date = ''
splittedDate = ''
url = 'ANY_URL'
session = HTMLSession()
r = session.get(url)
r.html.render()
aTags = r.html.find("a.ui-link")
for tag in aTags:
    splittedParameters = tag.text.splitlines()
    print(splittedParameters[0])
splittedParameters is returning nothing, as in it has been given no value.
there's no text in aTags
print(aTags)
>>> [<Element 'a' class=('logo', 'ui-link') href='index.php'>]
After reading your comment I tried:
for tag in aTags:
    print(tag.text)
    splittedParameters = tag.text.splitlines()
I got nothing printed
I found the error.
The problem was that in the first run of the for loop the list "splittedParameters" is empty.
if len(splittedParameters) > 0:
    print(splittedParameters[0])
this will work.
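The same guard can be wrapped in a small helper. This is only a sketch: `first_line` is a hypothetical name, not part of requests_html, and it relies on the fact that `"".splitlines()` returns an empty list.

```python
def first_line(text):
    """Return the first line of text, or None if there are no lines.

    Hypothetical helper for illustration only.
    """
    lines = text.splitlines()
    # An empty string yields an empty list, so indexing [0] would fail.
    return lines[0] if lines else None


print(first_line("Freitag 24.11\n08:00"))
print(first_line(""))
```

Inside the loop, a tag without text can then be skipped by checking whether the helper returned None.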
Related
I'm trying to display values from appended list of items that are scraped with bs4. Currently, my code only returns a whole set of data, but I'd like to have separate values displayed from the data. Now, all I get is:
NameError: value is not defined.
How to do it?
data = []
for e in soup.select('div:has(> div > a h3)'):
    data.append({
        'title':e.h3.text,
        'url':e.a.get('href'),
        'desc':e.next_sibling.text,
        'email': re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text).group(0) if
                 re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text) else None
    })
data
title = print(title) # name error
desc = print(desc) # name error
email = print(email) # name error
The main issue is that you reference the keys directly without taking into account that data is a list of dicts.
So you have to pick a dict from data by index if you want to print a specific one:
print(data[0]['title'])
print(data[0]['desc'])
print(data[0]['email'])
Alternatively, just iterate over data and print/operate on the values of each dict:
for d in data:
    print(d['title'])
    print(d['desc'])
    print(d['email'])
or
for d in data:
    title = d['title']
    desc = d['desc']
    email = d['email']
    print(f'print title only: {title}')
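A minimal, self-contained sketch of the same idea, using made-up sample records in place of the scraped results (the titles, descriptions, and address below are placeholders, not real data):

```python
# Hypothetical sample records standing in for the scraped results.
data = [
    {"title": "First", "desc": "First description", "email": "a@example.com"},
    {"title": "Second", "desc": "Second description", "email": None},
]

# Pick one dict by index, then read a value by key.
first_title = data[0]["title"]

# Or collect one field from every dict in the list.
titles = [d["title"] for d in data]

print(first_title)
print(titles)
```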
You can do it this way (note there are no trailing commas after the assignments, otherwise each value becomes a one-element tuple):
for e in soup.select('div:has(> div > a h3)'):
    title = e.h3.text
    url = e.a.get('href')
    desc = e.next_sibling.text
    email = re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text).group(0) if re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text) else None
    print(title)
    print(desc)
    print(email)
    print(url)
I'm generating a common list of IDs by comparing two sets of IDs (the ID sets are from a dictionary, {ID: XML "RECORD" element}). Once I have the common list, I want to iterate over it and retrieve the value corresponding to the ID from a dictionary (which I'll write to disc).
When I compute the common ID list using my diff_comm_checker function, I'm unable to retrieve the dict value the ID corresponds to. It doesn't however fail with a KeyError. I can also print the ID out.
When I hard code the ID in as the common_id value, I can retrieve the dict value.
I.e.
common_ids = diff_comm_checker( list_1, list_2, "text")
# does nothing - no failures
common_ids = ['0603599998140032MB']
#gives me:
0603599998140032MB {'R': '0603599998140032MB'} <Element 'RECORD' at 0x04ACE788>
0603599998140032MB {'R': '0603599998140032MB'} <Element 'RECORD' at 0x04ACE3E0>
So I suspected there was some difference between the strings. I checked both the function output and compared it against the hard-coded values using:
print [(_id, type(_id), repr(_id)) for _id in common_ids][0]
I get exactly the same for both:
>>> ('0603599998140032MB', <type 'str'>, "'0603599998140032MB'")
I have also followed the advice of another question and used difflib.ndiff:
common_ids1 = diff_comm_checker( [x.keys() for x in to_write[0]][0], [x.keys() for x in to_write[1]][0], "text")
common_ids = ['0603599998140032MB']
print "\n".join(difflib.ndiff(common_ids1, common_ids))
>>> 0603599998140032MB
So again, doesn't appear that there's any difference between the two.
Here's a full, working example of the code:
from StringIO import StringIO
import xml.etree.cElementTree as ET
from itertools import chain, islice

def diff_comm_checker(list_1, list_2, text):
    """Checks 2 lists. If no difference, pass. Else return common set between two lists"""
    symm_diff = set(list_1).symmetric_difference(list_2)
    if not symm_diff:
        pass
    else:
        mismatches_in1_not2 = set(list_1).difference( set(list_2) )
        mismatches_in2_not1 = set(list_2).difference( set(list_1) )
        if mismatches_in1_not2:
            mismatch_logger(
                mismatches_in1_not2,"{}\n1: {}\n2: {}".format(text, list_1, list_2), 1, 2)
        if mismatches_in2_not1:
            mismatch_logger(
                mismatches_in2_not1,"{}\n2: {}\n1: {}".format(text, list_1, list_2), 2, 1)
    set_common = set(list_1).intersection( set(list_2) )
    if set_common:
        return sorted(set_common)
    else:
        return "no common set: {}\n".format(text)

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))

def get_elements_iteratively(file):
    """Create unique ID out of image number and case number, return it along with corresponding xml element"""
    tag = "RECORD"
    tree = ET.iterparse(StringIO(file), events=("start","end"))
    context = iter(tree)
    _, root = next(context)
    for event, record in context:
        if event == 'end' and record.tag == tag:
            xml_element_2 = ''
            xml_element_1 = ''
            for child in record.getchildren():
                if child.tag == "IMAGE_NUMBER":
                    xml_element_1 = child.text
                if child.tag == "CASE_NUM":
                    xml_element_2 = child.text
            r_id = "{}{}".format(xml_element_1, xml_element_2)
            record.set("R", r_id)
            yield (r_id, record)
            root.clear()

def get_chunks(file, chunk_size):
    """Breaks XML into chunks, yields dict containing unique IDs and corresponding xml elements"""
    iterable = get_elements_iteratively(file)
    for chunk in chunks(iterable, chunk_size):
        ids_records = {}
        for k in chunk:
            ids_records[k[0]]=k[1]
        yield ids_records

def create_new_xml(xml_list):
    chunk = 5000
    chunk_rec_ids_1 = get_chunks(xml_list[0], chunk)
    chunk_rec_ids_2 = get_chunks(xml_list[1], chunk)
    to_write = [chunk_rec_ids_1, chunk_rec_ids_2]
    ######################################################################################
    ### WHAT'S GOING HERE ??? WHAT'S THE DIFFERENCE BETWEEN THE OUTPUTS OF THESE TWO ? ###
    common_ids = diff_comm_checker( [x.keys() for x in to_write[0]][0], [x.keys() for x in to_write[1]][0], "create_new_xml - large - common_ids")
    #common_ids = ['0603599998140032MB']
    ######################################################################################
    for _id in common_ids:
        print _id
        for gen_obj in to_write:
            for kv_pair in gen_obj:
                if kv_pair[_id]:
                    print _id, kv_pair[_id].attrib, kv_pair[_id]

if __name__ == '__main__':
    xml_1 = """<?xml version="1.0"?><RECORDSET><RECORD><CASE_NUM>140032MB</CASE_NUM><IMAGE_NUMBER>0603599998</IMAGE_NUMBER></RECORD></RECORDSET>"""
    xml_2 = """<?xml version="1.0"?><RECORDSET><RECORD><CASE_NUM>140032MB</CASE_NUM><IMAGE_NUMBER>0603599998</IMAGE_NUMBER></RECORD></RECORDSET>"""
    create_new_xml([xml_1, xml_2])
The problem is not in the type or value of common_ids returned from diff_comm_checker. The problem is that calling diff_comm_checker, or more precisely constructing its arguments, destroys the values of to_write.
If you try this you will see what I mean
common_ids = ['0603599998140032MB']
diff_comm_checker( [x.keys() for x in to_write[0]][0], [x.keys() for x in to_write[1]][0], "create_new_xml - large - common_ids")
This will give the erroneous behavior without using the return value from diff_comm_checker()
This is because to_write holds generators, and building the arguments for the call to diff_comm_checker exhausts them. The generators are then finished/empty when used in the if-statement in the loop. You can create a list from a generator by using list:
chunk_rec_ids_1 = list(get_chunks(xml_list[0], chunk))
chunk_rec_ids_2 = list(get_chunks(xml_list[1], chunk))
But this may have other implications (memory usage...)
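The effect can be reproduced in isolation. The sketch below is written for Python 3 and uses a hypothetical two-chunk generator in place of get_chunks from the question; only the exhaustion behaviour is the point.

```python
def get_chunks():
    """Hypothetical stand-in for the chunk generator in the question."""
    yield {"id-1": "record 1"}
    yield {"id-2": "record 2"}


gen = get_chunks()
first_pass = [sorted(chunk) for chunk in gen]   # consumes the generator
second_pass = [sorted(chunk) for chunk in gen]  # nothing left to yield

# Materialising the generator into a list allows repeated iteration,
# at the cost of holding every chunk in memory at once.
chunks = list(get_chunks())
first_list_pass = [sorted(chunk) for chunk in chunks]
second_list_pass = [sorted(chunk) for chunk in chunks]

print(first_pass)
print(second_pass)
print(first_list_pass == second_list_pass)
```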
Also, what is the intention of this construct in diff_comm_checker?
if not symm_diff:
    pass
In my opinion nothing will happen in that branch regardless of whether symm_diff is empty or not.
data = ['{"osc":{"version":"1.0"}}']
or
data = ['{"device":{"network":{"ipv4_dante":{"auto":"testing"}}}}']
From the code above, I only get random outputs, but I need to get the last value i.e "1.0" or "testing" and so on.
I always need to get the last value. How can I do it using python?
Dictionaries have no "last" element. Assuming your dictionary doesn't branch and you want the "deepest" element, this should work:
import json
data = ['{"device":{"network":{"ipv4_dante":{"auto":"testing"}}}}']
obj = json.loads(data[0])
while isinstance(obj, dict):
    obj = obj[list(obj.keys())[0]]
print(obj)
This should work -
import ast
x = ast.literal_eval(data[0])
while type(x) == dict:
    # list(...) is needed on Python 3, where keys() returns a view that cannot be indexed
    key = list(x.keys())[0]
    x = x.get(key)
print(x)
I just can't figure it out.
I have a string with some lines.
qual=[abcdefg\nabcedfg\nabcdefg]
I want to convert my characters to their ASCII values and save those values in a separate list for each line.
value=[[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
But my code saves them all in one list.
values=[1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6]
First of all my code:
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
I also tried to split() the string but the result is still the same
qual=line#[:-100]
qually=qual.split()
for list in qually:
    for element in list:
        qs = ord(element)
        quality.append(qs)
My next attempt was:
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
for position in range(0, len(quality_code)):
    qual_liste[position].append(quality_code[position])
With this code an IndexError (list index out of range) occurs.
There is probably a way with try and except, but I don't get it.
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
for position in range(0, len(quality_code)):
    try:
        qual_liste[position].append(quality_code[position])
    except IndexError:
        pass
With this code qual_liste stays empty, probably because of the pass,
but I don't know what to insert instead of pass.
Thanks a lot for the help. I hope my bad English is excusable :D
Here you go, this should do the trick:
qual="abcdefg\nabcedfg\nabcdefg"
print([[ord(ii) for ii in i] for i in qual.split('\n')])
List comprehension is always the answer.
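For comparison, here is a sketch of the same result written as explicit loops, which may make clearer why the earlier attempts ended up with one flat list: a fresh inner list has to be started for every line. The name `codes_per_line` is chosen here for illustration.

```python
qual = "abcdefg\nabcedfg\nabcdefg"

codes_per_line = []
for line in qual.splitlines():
    line_codes = []              # start a new inner list for each line
    for char in line:
        line_codes.append(ord(char))
    codes_per_line.append(line_codes)

print(codes_per_line)
```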
It's my first time using Python, and I'm still learning. I have a problem when I try to use an index: it raises the error "IndexError: list index out of range". Instead of stopping, I want to skip it or handle that case with a condition. For example:
links = "http://www.website_name.com/"
content_ig = BeautifulSoup(send.content, 'html.parser')
script = content_ig.find_all("script")[3].get_text()
script = script.split('openData = ')[1][:-1]
if not script:
    # This is the condition I want to run when the index is out of range
    pass
else:
    print("Works")
I mean that when the index is out of range, I want to move on to another value, not just stop and show the error "IndexError: list index out of range".
To address your question, you can wrap your line of code inside a try-except block:
try:
    script = script.split('openData = ')[1][:-1]
    print("Works")
except IndexError:
    ... # code to run if the value is out of index
Quick demo:
In [1739]: x = [0]
In [1740]: try:
      ...:     print(x[1]) # only 1 element in x, so this is invalid
      ...: except IndexError:
      ...:     print("List out of range!")
      ...:
List out of range!
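Applied to the split from the question, the same pattern might look like the sketch below. `extract_open_data` and its `marker` parameter are hypothetical names introduced for illustration, and the sample strings are made up; the function returns None when the marker is missing so the caller can move on to the next value.

```python
def extract_open_data(script_text, marker="openData = "):
    """Return the text after marker without its last character, or None.

    Hypothetical helper for illustration only.
    """
    try:
        # split() returns a single-element list when marker is absent,
        # so index 1 raises IndexError in that case.
        return script_text.split(marker)[1][:-1]
    except IndexError:
        return None


print(extract_open_data('var openData = {"a": 1};'))
print(extract_open_data("no marker here"))
```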