Accessing list values by index in Python (strings)

So, I encountered something really interesting. I split a string with .splitlines().
When I print that list, it works just fine and outputs the values in this form: ['Freitag 24.11', '08:00'].
But if I access a value of that list by index, e.g. list[0], it should give me the first value of that list, in this case 'Freitag 24.11'.
splittedParameters = tag.text.splitlines()
print(splittedParameters[0])
As explained, if I don't use the index [0], it works fine and outputs the whole list. With the index, however, it fails with "IndexError: list index out of range".
The whole code:
from requests_html import HTMLSession
startDate = None
endDate = None
summary = None
date = ''
splittedDate = ''
url = 'ANY_URL'
session = HTMLSession()
r = session.get(url)
r.html.render()
aTags = r.html.find("a.ui-link")
for tag in aTags:
    splittedParameters = tag.text.splitlines()
    print(splittedParameters[0])

splittedParameters is returning nothing, as in it has been given no value.
There's no text in aTags:
print(aTags)
>>> [<Element 'a' class=('logo', 'ui-link') href='index.php'>]
After reading your comment I tried:
for tag in aTags:
    print(tag.text)
    splittedParameters = tag.text.splitlines()
Nothing was printed.

I found the error.
The problem was that on the first iteration of the for loop the list splittedParameters is empty, so index 0 does not exist. Checking the length first avoids the IndexError:
if len(splittedParameters) > 0:
    print(splittedParameters[0])
This works.
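The same guard can be sketched without the network call. This is only an illustration: the helper name `first_line` is hypothetical, and plain strings stand in for `tag.text`.

```python
def first_line(text):
    """Return the first line of text, or None when text has no lines."""
    lines = text.splitlines()
    # "".splitlines() returns [], so indexing [0] would raise IndexError.
    return lines[0] if lines else None


# Stand-ins for tag.text: the first link has no text, like the logo link above.
for text in ["", "Freitag 24.11\n08:00"]:
    line = first_line(text)
    if line is not None:
        print(line)
```

An empty list is falsy, so `if lines` is equivalent to the `len(...) > 0` check used above.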

Related

How to show separate values from appended list

I'm trying to display values from appended list of items that are scraped with bs4. Currently, my code only returns a whole set of data, but I'd like to have separate values displayed from the data. Now, all I get is:
NameError: value is not defined.
How to do it?
data = []
for e in soup.select('div:has(> div > a h3)'):
    data.append({
        'title':e.h3.text,
        'url':e.a.get('href'),
        'desc':e.next_sibling.text,
        'email': re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text).group(0) if
                 re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text) else None
    })
data
title = print(title) # name error
desc = print(desc) # name error
email = print(email) # name error
The main issue is that you reference the keys on their own, without taking into account that data is a list of dicts.
So you have to pick a dict from data by index if you want to print a specific one:
print(data[0]['title'])
print(data[0]['desc'])
print(data[0]['email'])
Alternatively, just iterate over data and print/operate on the values of each dict:
for d in data:
    print(d['title'])
    print(d['desc'])
    print(d['email'])
or
for d in data:
    title = d['title']
    desc = d['desc']
    email = d['email']
    print(f'print title only: {title}')
You can also do it this way:
for e in soup.select('div:has(> div > a h3)'):
    # no trailing commas here, otherwise each name would hold a one-element tuple
    title = e.h3.text
    url = e.a.get('href')
    desc = e.next_sibling.text
    email = re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text).group(0) if re.search(r'[\w.+-]+#[\w-]+\.[\w.-]+', e.parent.text) else None
    print(title)
    print(desc)
    print(email)
    print(url)
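A self-contained sketch of the list-of-dicts approach, runnable without bs4. Everything here is an assumption for illustration: the `rows` data is made up, `find_email` is a hypothetical helper, and the pattern is a simplified email matcher using "@" as the separator. It also runs the regex only once per element instead of twice.

```python
import re

# Assumed pattern: a simplified email matcher with "@" as the separator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def find_email(text):
    """Return the first email-like substring in text, or None."""
    match = EMAIL_RE.search(text)  # search once, reuse the match object
    return match.group(0) if match else None


# Made-up rows standing in for the scraped elements.
rows = [
    ("First title", "Contact: info@example.com"),
    ("Second title", "No address here"),
]

data = [{"title": title, "email": find_email(text)} for title, text in rows]

# title and email exist only as dict keys, so they are reached through data.
for d in data:
    print(d["title"], d["email"])
```

Keeping the values in `data` rather than in loop variables means every scraped item stays available after the loop ends.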

Retrieving dict value via hardcoded key, works. Retrieving via computed key doesn't. Why?

I'm generating a common list of IDs by comparing two sets of IDs (the ID sets are from a dictionary, {ID: XML "RECORD" element}). Once I have the common list, I want to iterate over it and retrieve the value corresponding to the ID from a dictionary (which I'll write to disc).
When I compute the common ID list using my diff_comm_checker function, I'm unable to retrieve the dict value the ID corresponds to. It doesn't however fail with a KeyError. I can also print the ID out.
When I hard code the ID in as the common_id value, I can retrieve the dict value.
I.e.
common_ids = diff_comm_checker( list_1, list_2, "text")
# does nothing - no failures
common_ids = ['0603599998140032MB']
#gives me:
0603599998140032MB {'R': '0603599998140032MB'} <Element 'RECORD' at 0x04ACE788>
0603599998140032MB {'R': '0603599998140032MB'} <Element 'RECORD' at 0x04ACE3E0>
So I suspected there was some difference between the strings. I checked both the function output and compared it against the hard-coded values using:
print [(_id, type(_id), repr(_id)) for _id in common_ids][0]
I get exactly the same for both:
>>> ('0603599998140032MB', <type 'str'>, "'0603599998140032MB'")
I have also followed the advice of another question and used difflib.ndiff:
common_ids1 = diff_comm_checker( [x.keys() for x in to_write[0]][0], [x.keys() for x in to_write[1]][0], "text")
common_ids = ['0603599998140032MB']
print "\n".join(difflib.ndiff(common_ids1, common_ids))
>>> 0603599998140032MB
So again, doesn't appear that there's any difference between the two.
Here's a full, working example of the code:
from StringIO import StringIO
import xml.etree.cElementTree as ET
from itertools import chain, islice
def diff_comm_checker(list_1, list_2, text):
    """Checks 2 lists. If no difference, pass. Else return common set between two lists"""
    symm_diff = set(list_1).symmetric_difference(list_2)
    if not symm_diff:
        pass
    else:
        mismatches_in1_not2 = set(list_1).difference( set(list_2) )
        mismatches_in2_not1 = set(list_2).difference( set(list_1) )
        if mismatches_in1_not2:
            mismatch_logger(
                mismatches_in1_not2,"{}\n1: {}\n2: {}".format(text, list_1, list_2), 1, 2)
        if mismatches_in2_not1:
            mismatch_logger(
                mismatches_in2_not1,"{}\n2: {}\n1: {}".format(text, list_1, list_2), 2, 1)
    set_common = set(list_1).intersection( set(list_2) )
    if set_common:
        return sorted(set_common)
    else:
        return "no common set: {}\n".format(text)
def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))
def get_elements_iteratively(file):
    """Create unique ID out of image number and case number, return it along with corresponding xml element"""
    tag = "RECORD"
    tree = ET.iterparse(StringIO(file), events=("start","end"))
    context = iter(tree)
    _, root = next(context)
    for event, record in context:
        if event == 'end' and record.tag == tag:
            xml_element_2 = ''
            xml_element_1 = ''
            for child in record.getchildren():
                if child.tag == "IMAGE_NUMBER":
                    xml_element_1 = child.text
                if child.tag == "CASE_NUM":
                    xml_element_2 = child.text
            r_id = "{}{}".format(xml_element_1, xml_element_2)
            record.set("R", r_id)
            yield (r_id, record)
            root.clear()
def get_chunks(file, chunk_size):
    """Breaks XML into chunks, yields dict containing unique IDs and corresponding xml elements"""
    iterable = get_elements_iteratively(file)
    for chunk in chunks(iterable, chunk_size):
        ids_records = {}
        for k in chunk:
            ids_records[k[0]]=k[1]
        yield ids_records
def create_new_xml(xml_list):
    chunk = 5000
    chunk_rec_ids_1 = get_chunks(xml_list[0], chunk)
    chunk_rec_ids_2 = get_chunks(xml_list[1], chunk)
    to_write = [chunk_rec_ids_1, chunk_rec_ids_2]
    ######################################################################################
    ### WHAT'S GOING HERE ??? WHAT'S THE DIFFERENCE BETWEEN THE OUTPUTS OF THESE TWO ? ###
    common_ids = diff_comm_checker( [x.keys() for x in to_write[0]][0], [x.keys() for x in to_write[1]][0], "create_new_xml - large - common_ids")
    #common_ids = ['0603599998140032MB']
    ######################################################################################
    for _id in common_ids:
        print _id
        for gen_obj in to_write:
            for kv_pair in gen_obj:
                if kv_pair[_id]:
                    print _id, kv_pair[_id].attrib, kv_pair[_id]
if __name__ == '__main__':
    xml_1 = """<?xml version="1.0"?><RECORDSET><RECORD><CASE_NUM>140032MB</CASE_NUM><IMAGE_NUMBER>0603599998</IMAGE_NUMBER></RECORD></RECORDSET>"""
    xml_2 = """<?xml version="1.0"?><RECORDSET><RECORD><CASE_NUM>140032MB</CASE_NUM><IMAGE_NUMBER>0603599998</IMAGE_NUMBER></RECORD></RECORDSET>"""
    create_new_xml([xml_1, xml_2])
The problem is not the type or value of common_ids returned from diff_comm_checker. The problem is that calling diff_comm_checker, or rather constructing its arguments, destroys the values of to_write.
If you try this you will see what I mean:
common_ids = ['0603599998140032MB']
diff_comm_checker( [x.keys() for x in to_write[0]][0], [x.keys() for x in to_write[1]][0], "create_new_xml - large - common_ids")
This gives the erroneous behavior without using the return value from diff_comm_checker().
This is because to_write holds generators, and the list comprehensions that build the arguments to diff_comm_checker exhaust them. Each generator is then finished/empty when it is used in the if statement in the loop. You can create a list from a generator by using list:
chunk_rec_ids_1 = list(get_chunks(xml_list[0], chunk))
chunk_rec_ids_2 = list(get_chunks(xml_list[1], chunk))
But this may have other implications (memory usage...)
Also, what is the intention of this construct in diff_comm_checker?
if not symm_diff:
    pass
In my opinion, nothing will happen regardless of whether symm_diff is empty or not.
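The exhaustion effect can be reproduced in a few lines. This is a minimal sketch in Python 3, not the question's code: this `get_chunks` is a simplified stand-in, and the record value is made up.

```python
def get_chunks(records):
    """Yield one dict per record (a simplified stand-in for the question's get_chunks)."""
    for key, value in records:
        yield {key: value}


records = [("0603599998140032MB", "record-1")]

gen = get_chunks(records)
first_pass = [list(chunk.keys()) for chunk in gen]   # consumes the generator
second_pass = [list(chunk.keys()) for chunk in gen]  # nothing left to yield

print(first_pass)
print(second_pass)

# Materialising the generator once allows it to be iterated repeatedly.
chunks_list = list(get_chunks(records))
third_pass = [list(chunk.keys()) for chunk in chunks_list]
fourth_pass = [list(chunk.keys()) for chunk in chunks_list]
```

The second pass is empty for the same reason the loop in create_new_xml printed nothing: iterating a finished generator raises no error, it simply yields no items.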

How to get the specific value from the data using python?

data = ['{"osc":{"version":"1.0"}}']
or
data = ['{"device":{"network":{"ipv4_dante":{"auto":"testing"}}}}']
From the code above, I only get random outputs, but I need to get the last value i.e "1.0" or "testing" and so on.
I always need to get the last value. How can I do it using python?
Dictionaries have no "last" element. Assuming your dictionary doesn't branch and you want the "deepest" element, this should work:
import json
data = ['{"device":{"network":{"ipv4_dante":{"auto":"testing"}}}}']
obj = json.loads(data[0])
while isinstance(obj, dict):
    obj = obj[list(obj.keys())[0]]
print(obj)
This should also work:
import ast
x = ast.literal_eval(data[0])
while isinstance(x, dict):
    key = list(x.keys())[0]  # dict views are not subscriptable in Python 3
    x = x.get(key)
print(x)
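Both answers can be wrapped in a small reusable function. This is a sketch under the same assumption as above, namely that each nested dict has a single key (for a branching dict it follows only the first key). The name `deepest_value` is hypothetical.

```python
import json


def deepest_value(obj):
    """Follow the first value of each nested dict until a non-dict value is reached."""
    # The "and obj" check stops on an empty dict instead of raising StopIteration.
    while isinstance(obj, dict) and obj:
        obj = next(iter(obj.values()))
    return obj


samples = [
    '{"osc":{"version":"1.0"}}',
    '{"device":{"network":{"ipv4_dante":{"auto":"testing"}}}}',
]
for raw in samples:
    print(deepest_value(json.loads(raw)))
```

`json.loads` is used here because the strings are JSON; `ast.literal_eval` would only be needed for Python literals that are not valid JSON.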

Saving ord(characters) from different lines(one string) in different lists

I just can't figure it out.
I have a string with several lines.
qual=[abcdefg\nabcedfg\nabcdefg]
I want to convert the characters to their ASCII values and save those values in a separate list for each line.
value=[[1,2,3,4,5,6],[1,2,3,4,5,6],[1,2,3,4,5,6]]
But my code saves them all in one list.
values=[1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6]
First of all my code:
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
I also tried to split() the string, but the result is still the same:
qual=line#[:-100]
qually=qual.split()
for list in qually:
    for element in list:
        qs = ord(element)
        quality.append(qs)
My next attempt was:
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
for position in range(0, len(quality_code)):
    qual_liste[position].append(quality_code[position])
With this code an IndexError (list index out of range) occurs.
There is probably a way with try and except, but I don't get it.
for element in qual:
    qs = ord(element)
    quality_code.append(qs)
for position in range(0, len(quality_code)):
    try:
        qual_liste[position].append(quality_code[position])
    except IndexError:
        pass
With this code qual_liste stays empty, probably because of the pass,
but I don't know what to insert instead of pass.
Thanks a lot for your help. I hope my bad English is excusable :D
Here you go, this should do the trick:
qual="abcdefg\nabcedfg\nabcdefg"
print([[ord(ii) for ii in i] for i in qual.split('\n')])
List comprehension is always the answer.
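For comparison, here is the same result written as explicit loops, as a sketch of why the earlier attempts produced one flat list: they appended every code to a single list, whereas here a new inner list is started for each line.

```python
qual = "abcdefg\nabcedfg\nabcdefg"

values = []
for line in qual.splitlines():
    codes = []                    # a fresh inner list for every line
    for char in line:
        codes.append(ord(char))
    values.append(codes)          # one inner list per line

print(values)
```

`splitlines()` is used instead of `split('\n')` so that a trailing newline does not produce an extra empty list.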

How to continue to the next condition when the error is "index out of range"

It's my first time using Python, and I'm still learning. I have a problem when I try to use an index: it raises "IndexError: list index out of range". I want to skip past the error or handle it with a condition instead. For example:
links = "http://www.website_name.com/"
content_ig = BeautifulSoup(send.content, 'html.parser')
script = content_ig.find_all("script")[3].get_text()
script = script.split('openData = ')[1][:-1]
if not script:
    pass  # This is the condition I want to use to move on if the index is out of range
else:
    print("Works")
I mean that when the index is out of range, I want a condition that moves on to another value instead of stopping with "IndexError: list index out of range".
To address your question, you can wrap your line of code in a try-except block:
try:
    script = script.split('openData = ')[1][:-1]
    print("Works")
except IndexError:
    ...  # code to run if the value is out of index
Quick demo:
In [1739]: x = [0]
In [1740]: try:
      ...:     print(x[1])  # only 1 element in x, so this is invalid
      ...: except IndexError:
      ...:     print("List out of range!")
      ...:
List out of range!
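An alternative to catching the exception is to check the length before indexing. This is a sketch only: the helper name `after_marker` and the sample strings are made up, and only the `'openData = '` marker and the `[:-1]` slice come from the question.

```python
def after_marker(text, marker="openData = "):
    """Return the text after marker minus its last character, or None if marker is absent."""
    parts = text.split(marker)
    if len(parts) < 2:       # marker not found, so parts[1] would raise IndexError
        return None
    return parts[1][:-1]


# Made-up script contents: one with the marker, one without.
for script in ['var openData = {"a": 1};', "no marker here"]:
    payload = after_marker(script)
    if payload is None:
        continue             # move on to the next value instead of stopping
    print(payload)
```

Returning None lets the caller decide what "move on" means, while try-except keeps the indexing line unchanged; either approach avoids the crash.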
