Struggling to reassemble jsonl from stream - python-3.x

I am trying to process JSON Lines from an API, and I am running into an issue where requests' iter_lines() does not deliver lines promptly. So I am now trying to use iter_content(chunk_size=1024*1024) instead. I am working through the logic needed to take an incomplete JSON line [1] at the end of one chunk and attach it to the start of the next chunk so that it forms a complete line.
My current attempt runs a series of if statements to detect an undesirable state [2], then rebuilds the line and continues processing. However, I'm failing to reassemble it in all the various states this could end up in. Does someone have an example of a well-thought-out solution to this problem?
[1]
Example:
Last item from first chunk:
{'test1': 'value1', 'test2': 'valu
First item from second chunk:
e2', 'test3': 'value3'}
[2]
def incomplete_processor(main_chunk):
    # main_chunk is a list of lines; check its first and last entries
    if not main_chunk[0].startswith('{') and not main_chunk[-1].endswith('\n'):
        first_line = str(main_chunk[0])
        last_line = str(main_chunk[-1])
        main_chunk.pop(0)
        main_chunk.pop(-1)
        return first_line, last_line
    if not main_chunk[0].startswith('{') and main_chunk[-1].endswith('\n'):
        first_line = str(main_chunk[0])
        main_chunk.pop(0)
        return first_line
    if main_chunk[0].startswith('{') and not main_chunk[-1].endswith('\n'):
        last_line = str(main_chunk[-1])
        main_chunk.pop(-1)
        return last_line

I solved this problem by converting the output of my original rsplit('\n') into a deque and then catching any ValueError raised by an incomplete JSON line. I stored the first fragment that errors out, waited for the next fragment to error out, and then combined them.
while True:
    try:
        jsonline = main_chunk_deque.popleft()
        jsonline = json.loads(jsonline)
    except ValueError:
        # json.loads failed: this is one half of a line split across two chunks
        if not jsonline.endswith('}'):
            next_line = jsonline  # head fragment from the end of one chunk
        elif not jsonline.startswith('{'):
            first_line = jsonline  # tail fragment from the start of the next chunk
            jsonline = json.loads(next_line + first_line)
            # ... process the reassembled record here ...
        continue
    except IndexError:
        break  # popleft() on an empty deque: this chunk is done

Related

Read out .csv and hand results to a dictionary

I am learning some coding, and I am stuck with an error I can't explain. Basically I want to read out a .csv file with birth statistics from the US to figure out the most popular name in the time recorded.
My code looks like this:
# 0:Id, 1: Name, 2: Year, 3: Gender, 4: State, 5: Count
names = {} # initialise dict names
maximum = 0 # store for maximum
l = []
with open("Filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        try:
            name = l[1]
            if name in names:
                names[name] = int(names[name]) + int(l(5))
            else:
                names[name] = int(l(5))
        except:
            continue
print(names)
max(names)
def max(values):
    for i in values:
        if names[i] > maximum:
            names[i] = maximum
        else:
            continue
    return(maximum)
print(maximum)
It seems like the dictionary does not take any values at all, since the print command does not output anything. Where did I go wrong? (Incidentally, the filepath is correct; it takes a while to get the result since the .csv is quite big. So my assumption is that I somehow made a mistake writing into the dictionary, but I have been staring at the code for a while now and I don't see it!)
A few suggestions to improve your code:
names = {}  # initialise dict names
with open("Filepath", "r") as file:
    next(file)  # skip the header row (assumes the file has one)
    for line in file:
        l = line.strip().split(",")
        name = l[1]
        names[name] = names.get(name, 0) + int(l[5])
maximum = [(v, k) for k, v in names.items()]
maximum.sort(reverse=True)
print(maximum[0])
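You will want to look into Python dictionaries and learn about get. It lets you build your names dictionary in fewer lines of code, which is more Pythonic.
Also, l(5) tries to call the list l as a function. The TypeError this raises is swallowed by your bare except, so nothing is ever stored in names. Index the list with l[5] instead. Finally, max(names) runs before your def statement, so it calls the built-in max, and the function you defined is never called.
I propose the shorter code above. Ask if you have questions!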
Figured it out.
I think there were a few flow issues. I called a function before defining it... is that an issue, or is Python okay with that?
Also, I used max as the name of my function, but there is a built-in function with the same name. I guess that might cause an issue?!
This is my final code:
names = {}  # initialise dict names
l = []
def maxval(val):
    # return the (name, count) pair with the largest count
    maxname = max(val.items(), key=lambda x: x[1])
    return maxname
with open("filepath", "r") as file:
    for line in file:
        l = line.strip().split(",")
        try:
            name = l[1]
            names[name] = names.get(name, 0) + int(l[5])
        except (IndexError, ValueError):
            continue  # skip the header row and malformed lines
        #print(str(l))
        #print(names)
print(maxval(names))
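As an alternative that neither post uses, `collections.Counter` can do the summing, and its `most_common` method can find the maximum. The sketch below assumes the column layout from the question (1: Name, 5: Count) and a single header row. It uses the `csv` module instead of a manual `split(",")`. `most_popular_name` is a hypothetical helper name.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def most_popular_name(lines: Iterable[str]) -> Tuple[str, int]:
    """Return (name, total_count) for the name with the largest summed Count.

    Assumes the layout 0: Id, 1: Name, 2: Year, 3: Gender, 4: State, 5: Count,
    with a header row first.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    totals = Counter()
    for row in reader:
        if len(row) < 6:
            continue  # ignore blank or truncated lines
        totals[row[1]] += int(row[5])
    # most_common(1) returns a one-item list of (name, total)
    return totals.most_common(1)[0]
```

Usage: `with open("filepath", newline="") as file: print(most_popular_name(file))`. An empty file makes `most_common(1)` return an empty list, so the final indexing would raise IndexError in that case.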

Checking for a word in the text and printing a result once

Using these commands I get the three sentences.
AnyText = driver.find_elements_by_xpath('AnyXpath')
for AnyText1 in AnyText:
    print(AnyText1.text)
In the console, I get something like this:
1) Hello my name is John
2) Hello my name is Mark
3) Hello my name is Alex..
How can I check that all three sentences contain the word "name",
and print("OK") if the word is in the sentence (element) and print("ERROR") if not?
Here is what I tried:
AnyText = driver.find_elements_by_xpath('AnyXpath')
Text = 'name'
if all(Text in AnyText1 for two in AnyText1):
    print('OK')
else:
    print('ERROR')
but this method only checks the first element (first sentence). I also tried something like this
AnyText = driver.find_elements_by_xpath('AnyXpath')
Text = 'name'
for AnyText1 in AnyText:
    if all(Text in AnyText1):
        print('OK')
    else:
        print('ERROR')
but then OK or ERROR is printed many times.
UPDATE:
With your help, I figured out the question about the text. Now I want to do the same with numbers.
I have a loop that checks whether each number is greater or less than the previous one. If it is greater, the loop prints ERROR; if it is less, it prints OK.
sort_month=driver.find_element_by_xpath('/html/body/div[6]/div[2]/div/div[1]/div/div[13]/table/thead/tr/th[3]/a[4]').click()
month2=driver.find_element_by_xpath('//*[starts-with(@id, "td_")]/td[3]/span[3]')
month2=month2.text.replace("'", "").replace(" ", "")
buffer = 0
if int(month2) > buffer:
    print()
    buffer = int(month2)
month1=driver.find_elements_by_xpath('//*[starts-with(@id, "td_")]/td[3]/span[3]')
for spisok_month in month1:
    spisok_month = spisok_month.text.replace("'", "").replace(" ", "")
    if int(spisok_month) > buffer:
        print('ERROR')
    elif int(spisok_month) < buffer:
        print('OK')
    else:
        print('==')
    buffer = int(spisok_month)
Here I would also like to see OK or ERROR only once.
Any ideas?
The problem is the generator expression (the short-form for loop) in your first snippet. It should look like this:
AnyText = driver.find_elements_by_xpath('AnyXpath')
Text = 'name'
if all(Text in AnyText1.text for AnyText1 in AnyText):
    print('OK')
else:
    print('ERROR')
UPDATE:
On the updated part of your question: this needs a different implementation, because the condition has to be updated in each iteration. For readability, it probably makes sense to keep the loop expanded:
outcome = 'OK'
for spisok_month in month1:
    spisok_month = spisok_month.text.replace("'", "").replace(" ", "")
    if int(spisok_month) > buffer:
        outcome = 'ERROR'
    elif outcome == 'OK' and int(spisok_month) == buffer:
        outcome = '=='
    buffer = int(spisok_month)
print(outcome)
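If you only need a single OK/ERROR for the whole column, you can also write the check without a running buffer by comparing each number with the next one through `zip`. This sketch assumes the cell texts have already been read (for example `[el.text for el in month1]`) and treats equal neighbours as OK. Adjust the comparison if `==` needs to be reported separately. `check_not_increasing` is a hypothetical helper name.

```python
from typing import Iterable


def check_not_increasing(texts: Iterable[str]) -> str:
    """Return 'OK' if every number is <= the one before it, else 'ERROR'."""
    # Strip quotes and spaces the same way the question does before int().
    numbers = [int(t.replace("'", "").replace(" ", "")) for t in texts]
    # zip(numbers, numbers[1:]) pairs each value with the one after it.
    if all(prev >= cur for prev, cur in zip(numbers, numbers[1:])):
        return "OK"
    return "ERROR"
```

Because `all()` stops at the first failing pair and the function returns one string, the result is printed exactly once: `print(check_not_increasing(el.text for el in month1))`.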
Note: the update is almost a separate question. This means that either your first question was not representative of the actual problem, or you should ask it in a separate post.
In your code, AnyText1 is a WebElement, not a string. Use AnyText1.text to get its text, and then it will work:
AnyText = driver.find_elements_by_xpath('AnyXpath')
Text = 'name'
# AnyText1 is a WebElement and you should get text
if all(Text in AnyText1.text for AnyText1 in AnyText):
    print('OK')
else:
    print('ERROR')
Please also check Python's coding conventions (PEP 8) to improve the code style, for example by using snake_case variable names.

python v3.7 help on dictionary

I'm stuck on this problem: when I run the test cases, I keep getting a KeyError. Is there another way to fix it?
All of the files are in the shared google drive.
https://drive.google.com/drive/folders/1OqrHxY42Cka9_H9pfA9VLQOkIuqoSQKN?usp=sharing
Code:
import csv
def read_votes(filename):
    rows = []
    columns = []
    try:
        with open(filename, 'r') as file:
            csvreader = csv.reader(file)
            column = next(csvreader)
            for row in csvreader:
                row.append(row)
        dict{}
        vote_dbase = {}
        for row in rows:
            state = row[0]
            candidate = (row[1], row[2], row[3], row[4])
            if int(row[3]) > 0:
                if state in vote_dbase:
                    flag = 0
                    for i in range(len(vote_dbase[state])):
                        if row[1] < vote_dbase[state][i][0]:
                            vote_dbase[state].insert(i, candidate)
                            flag = 1
                            break
                    if flag == 0:
                        vote_dbase[state].append(candidate)
                else:
                    vote_dbase[state] = [candidate]
        return vote_dbase
    except:
        return False
Fail case with KeyError
It's not clear from your code what behaviour you want to occur, as we can't see the tests where the error occurs.
That said, doing some basic debugging, it seems you're not processing the input values properly. The problem is at
for row in csvreader:
    row.append(row)
You are appending row to itself rather than your list rows. I think you want
for row in csvreader:
    rows.append(row)
I would also recommend against putting everything in a try block and not doing anything with the exception. This means you can hit an error and you won't get the error message which would help you debug your code. Either don't use the try block or do something like this for the exception block:
except Exception as exception_instance:
    print(exception_instance)
    return False
There's also a stray dict{} in your code. It is a syntax error and should be removed.
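Once `rows.append(row)` is fixed, you can also shorten the ordered-insert loop with `flag`. `dict.setdefault` creates each state's list on first use, and a stable sort by candidate name gives the same order as the insertion loop, because ties stay in file order. This sketch keeps the column positions implied by the question's code (0: state, 1: candidate name, 3: vote count). The actual CSV layout is not shown. Like the original `next(csvreader)`, it skips a header row, and it lets exceptions propagate instead of returning `False`. `build_vote_dbase` is a hypothetical helper name.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Candidate = Tuple[str, str, str, str]


def build_vote_dbase(rows: Iterable[List[str]]) -> Dict[str, List[Candidate]]:
    """Group rows by state, keep rows whose column 3 is > 0, sort by column 1."""
    vote_dbase: Dict[str, List[Candidate]] = {}
    for row in rows:
        state = row[0]
        candidate = (row[1], row[2], row[3], row[4])
        if int(row[3]) > 0:
            # setdefault creates the list the first time a state is seen,
            # replacing the `if state in vote_dbase` branch.
            vote_dbase.setdefault(state, []).append(candidate)
    for candidates in vote_dbase.values():
        # list.sort is stable, so equal names keep their file order,
        # matching the original insertion loop.
        candidates.sort(key=lambda c: c[0])
    return vote_dbase


def read_votes(filename: str) -> Dict[str, List[Candidate]]:
    with open(filename, newline="") as file:
        reader = csv.reader(file)
        next(reader, None)  # skip the header row
        return build_vote_dbase(reader)
```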

Python 3.6.1: Code does not execute after a for loop

I've been learning Python and I wanted to write a script to count the number of characters in a text and calculate their relative frequencies. But first, I wanted to know the length of the file. My intention is that, while the script goes from line to line counting all the characters, it would print the current line and the total number of lines, so I could know how much it is going to take.
I executed a simple for loop to count the number of lines, and then another for loop to count the characters and put them in a dictionary. However, when I run the script with the first for loop, it stops early. It doesn't even go into the second for loop as far as I know. If I remove this loop, the rest of the code goes on fine. What is causing this?
Excuse my code. It's rudimentary, but I'm proud of it.
My code:
import string
fname = input ('Enter a file name: ')
try:
    fhand = open(fname)
except:
    print ('Cannot open file.')
    quit()
#Problematic bit. If this part is present, the script ends abruptly.
#filelength = 0
#for lines in fhand:
#    filelength = filelength + 1
counts = dict()
currentline = 1
for line in fhand:
    if len(line) == 0: continue
    line = line.translate(str.maketrans('','',string.punctuation))
    line = line.translate(str.maketrans('','',string.digits))
    line = line.translate(str.maketrans('','',string.whitespace))
    line = line.translate(str.maketrans('','',""" '"’‘“” """))
    line = line.lower()
    index = 0
    while index < len(line):
        if line[index] not in counts:
            counts[line[index]] = 1
        else:
            counts[line[index]] += 1
        index += 1
    print('Currently at line: ', currentline, 'of', filelength)
    currentline += 1
listtosort = list()
totalcount = 0
for (char, number) in list(counts.items()):
    listtosort.append((number,char))
    totalcount = totalcount + number
listtosort.sort(reverse=True)
for (number, char) in listtosort:
    frequency = number/totalcount*100
    print ('Character: %s, count: %d, Frequency: %g' % (char, number, frequency))
It looks fine the way you are doing it. However, to reproduce your problem, I downloaded and saved a Gutenberg text book, and it turned out to be a Unicode issue. There are two ways to resolve it: open the file in binary mode, or specify the encoding. As it's text, I'd go with the utf-8 option.
I'd also suggest structuring the code differently. Below is a basic structure that closes the file after opening it.
filename = "GutenbergBook.txt"
try:
    #fhand = open(filename, 'rb')
    # open read-only with utf-8 encoding
    fhand = open(filename, 'r', encoding='utf-8')
except IOError:
    print("couldn't find the file")
else:
    try:
        for line in fhand:
            # put your code here
            print(line)
    except Exception as err:
        # show the actual error instead of hiding it
        print("Error reading the file:", err)
    finally:
        fhand.close()
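There is a separate point about the question's script. A file object is an iterator, so the line-counting loop leaves `fhand` at end-of-file. The character-counting `for line in fhand` that follows then runs zero times. Calling `fhand.seek(0)` between the two passes, or reopening the file, fixes that. Below is a minimal sketch that leaves out the question's character filtering. `count_lines_then_chars` is a hypothetical helper name.

```python
from typing import Dict, TextIO, Tuple


def count_lines_then_chars(fhand: TextIO) -> Tuple[int, Dict[str, int]]:
    """Count the lines, rewind, then count characters in a second pass."""
    # First pass: this consumes the file iterator up to end-of-file.
    filelength = sum(1 for _ in fhand)
    # Rewind to the start; without this the loop below runs zero times.
    fhand.seek(0)
    counts: Dict[str, int] = {}
    for line in fhand:
        for char in line:
            counts[char] = counts.get(char, 0) + 1
    return filelength, counts
```

Usage: `with open(fname, encoding="utf-8") as fhand: filelength, counts = count_lines_then_chars(fhand)`.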
For the OP, this is a specific case. For other visitors: if the code after your for statement does not execute, the problem is not in Python itself. The most likely cause is exception handling in a calling function.
Suppose your loop is inside a function that a caller invokes inside a try/except block. Any error raised during the loop is then caught there, and the rest of your function is silently skipped.
This issue can be hard to find, especially when you are dealing with an intricate architecture.
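Here is a minimal illustration of the pattern described above, using hypothetical function names. The error raised inside `process` stops the loop, the code after the loop never runs, and the caller's `except` decides whether anyone finds out. Printing the traceback, or logging it, instead of silently passing makes the cause visible.

```python
import traceback
from typing import Iterable


def process(lines: Iterable[str]) -> None:
    total = 0
    for line in lines:
        total += int(line)  # raises ValueError on a non-numeric line
    print("after the loop:", total)  # never reached if the loop raises


def caller() -> bool:
    try:
        process(["1", "two", "3"])
    except Exception:
        # With `pass` or a bare `return False` here, the failure would be
        # invisible; print_exc() writes the full traceback to stderr.
        traceback.print_exc()
        return False
    return True
```

Calling `caller()` returns `False`. The printed traceback points at the `int(line)` line and ends with `ValueError: invalid literal for int() with base 10: 'two'`.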

How do you report duplicates in a .txt file?

In our class, we were given the task of creating a program that re-enacts last year's US election. One of the extra challenges is that if you enter an ID number that is already in the file, the program should show an error and stop. However, when I run this code, I get
ValueError: I/O operation on closed file.
This is the code I've done so far...
ID = input("Please input ID code ")
if(len(ID)) == 6:
    print("ID length: Valid")
    N += 1
else:
    print("ID Code: Error")
    sys.exit()
with open('ID.txt', 'a') as idc:
    idc.write(ID + ' ')
already_seen = set()
for line in idc:
    if line not in already_seen:
        print("Valid")
    else:
        print("Error")
        sys.exit()
Thanks
You should know the difference between
with open('ID.txt', 'a') as idc:
    ...  # do something
and
idc = open('ID.txt', 'a')
In the first case, idc.__exit__() is called as soon as the "do something" block finishes, and it closes the file object.
I advise you to use the second form shown above. If you are new to Python, this blog will help you understand the detailed reasons.
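Keeping the file open would not be enough on its own. A file opened with mode 'a' is write-only, so `for line in idc` would then raise `io.UnsupportedOperation: not readable`. The sketch below reads the existing IDs first and only then appends. It assumes IDs are stored space-separated, as the question's `idc.write(ID + ' ')` does. `register_id` is a hypothetical name, and it returns a bool instead of calling `sys.exit()`, so the caller can decide whether to stop.

```python
import os


def register_id(path: str, new_id: str) -> bool:
    """Append new_id to the file unless it is invalid or already present."""
    if len(new_id) != 6:
        return False  # same length rule as the question
    existing = set()
    if os.path.exists(path):
        # Read mode: a file opened with 'a' alone cannot be read.
        with open(path, "r") as idc:
            existing = set(idc.read().split())
    if new_id in existing:
        return False  # duplicate ID
    with open(path, "a") as idc:
        idc.write(new_id + " ")
    return True
```

Usage: `if not register_id("ID.txt", ID): print("Error"); sys.exit()`.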
