Convert Dictionary to String and back in Python

I am writing a dictionary to a file to save the data stored in it. When I read the file and try to convert it back, it converts to a list. I added print(type()) to see what type it is going into the file and what type it is coming out as.
import ast
f = open("testfile.txt", "a+")
print (type(dic1))
f.write(str(dic1.items()) + "\n")
f.close()
This is how I write it to the file.
([('people', '1'), ('date', '01/01/1970'), ('t0', 'epoch'), ('time', '0'), ('p0', 'Tim Berners-Lee'), ('memory', 'This is the day time was created')])
This is what it looks like in the written file.
# x is a line read back from the file
loadDict = ast.literal_eval(x)
print(type(loadDict))
This is the code for trying to convert it back to a dictionary.

Try using pickle; it is the preferred way to store and load Python objects:
To store (pickle needs the file opened in binary mode):
import pickle
with open("testfile.txt", "wb") as f:
    pickle.dump(dic1, f)
To load:
import pickle
with open("testfile.txt", "rb") as f:
    dic1 = pickle.load(f)
If you want to save multiple objects, you can keep them in a list: save the list to the file, load it back later, append what you want to add, and save it again (see the sketch below).
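A minimal sketch of that load-append-save pattern, assuming the file holds a single pickled list (the filename and sample dictionary below are placeholders):

import os
import pickle

path = "saved_dicts.pkl"  # placeholder filename

# Load the existing list, or start a new one if the file doesn't exist yet.
if os.path.exists(path):
    with open(path, "rb") as f:
        items = pickle.load(f)
else:
    items = []

# Append the new dictionary and write the whole list back.
items.append({"people": "1", "date": "01/01/1970"})
with open(path, "wb") as f:
    pickle.dump(items, f)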

Related

How to write to a csv file on each iteration

How do I write one line to a csv file on each iteration?
I would like to have this kind of behaviour.
import time
import csv

path = 'C:/Blender_Scripts/test.csv'
for i in range(0, 100):
    time.sleep(1)
    with open(path, 'a+', newline='') as Pt_file:
        Pt_writer = csv.writer(Pt_file)
        Pt_writer.writerow([i])
Is there a way to do this in a performance-friendly way?
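A minimal sketch of one more performance-friendly approach: open the file once outside the loop, keep the writer around, and flush after each row so every iteration still lands on disk.

import time
import csv

path = 'C:/Blender_Scripts/test.csv'
with open(path, 'a+', newline='') as Pt_file:
    Pt_writer = csv.writer(Pt_file)
    for i in range(0, 100):
        time.sleep(1)
        Pt_writer.writerow([i])
        Pt_file.flush()  # push the row to disk on every iteration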

Unit test for reading an excel file with pandas

I need to write a unit test case for the below code:
def read_data(self, data):
    """Read data from excel file.

    :param data: str, path to the excel file
    :return: DataFrame, data after reading excel file
    """
    try:
        read_data = pd.read_excel(data)
        return read_data
    except Exception as e:
        logger.info("Not able to read data. Error :- {}".format(e))
        raise e
I am reading an excel file in the above code, which gives me data like this:
Refer to the screenshot (not reproduced here).
So, how can I store the above data, after reading it from the excel sheet, as dummy data so that I can assert it against my original data?
Thanks
Necroposting this because I had the same need.
This answer can point you in the right direction:
See also Saving the Dataframe output to a string in the XlsxWriter docs.
From the example you can build something like this:
import pandas as pd
import io
# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
output = io.BytesIO()
# Use the BytesIO object as the filehandle.
writer = pd.ExcelWriter(output, engine='xlsxwriter')
# Write the data frame to the BytesIO object.
df.to_excel(writer, sheet_name='Sheet1', index=False)
writer.save()  # on newer pandas (1.5+) use writer.close() instead
# Read the BytesIO object back to a data frame - here you should use your method
xlsx_data = pd.read_excel(output)
# Assert that the data frame is the same as the original
pd.testing.assert_frame_equal(xlsx_data, df)
Basically you flip the problem around: you build a data frame with some known data in it, save it to an in-memory file-like object, pass that object to your method, and then assert that the data you read back is the same as the data you created.
NOTE: It needs pandas 0.17+
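A rough sketch of how the actual test could look, assuming the method lives on a class called ExcelReader (that class name is a placeholder for whatever your code actually uses):

import io
import unittest

import pandas as pd


class TestReadData(unittest.TestCase):
    def test_read_data_round_trip(self):
        # Build a known data frame and write it to an in-memory xlsx file.
        expected = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
        buffer = io.BytesIO()
        writer = pd.ExcelWriter(buffer, engine='xlsxwriter')
        expected.to_excel(writer, sheet_name='Sheet1', index=False)
        writer.close()
        buffer.seek(0)

        # ExcelReader is a placeholder for the class that defines read_data().
        result = ExcelReader().read_data(buffer)

        # The frame read back should match the one we created.
        pd.testing.assert_frame_equal(result, expected)


if __name__ == '__main__':
    unittest.main()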

Compress a CSV file written to a StringIO Buffer in Python3

I'm parsing text from pdf files into rows of ordered char metadata. I need to serialize these files to cloud storage, which is all working fine; however, due to their size I'd also like to gzip these files, but I've run into some issues there.
Here is my code:
import io
import csv
import zlib

# This data file is sent over Flask
page_position_data = pdf_parse_page_layouts(data_file)
field_order = ['char', 'position', 'page']
output_buffer = io.StringIO()
writer = csv.DictWriter(output_buffer, field_order)
writer.writeheader()
for page, rows in page_position_data.items():
    for text_char_data_row in rows:
        writer.writerow(text_char_data_row)
stored_format = zlib.compress(output_buffer)
This reads each row into the io.StringIO buffer successfully, but gzip/zlib seem to only work with bytes-like objects like io.BytesIO, so the last line errors. I can't write the csv into a BytesIO buffer either, because DictWriter/writer error unless io.StringIO() is used.
Thank you for your help!
I figured this out and wanted to show my answer for anyone who runs into this:
The issue is that zlib.compress expects a bytes-like object; that doesn't mean either StringIO or BytesIO, as both of these are "file-like" objects which implement read(), like your normal unix file handles.
All you have to do to fix this is write the csv to a StringIO(), then get the string out of the StringIO() object and encode it into a bytestring; it can then be compressed by zlib.
import io
import csv
import zlib

# This data file is sent over Flask
page_position_data = pdf_parse_page_layouts(data_file)
field_order = ['char', 'position', 'page']
output_buffer = io.StringIO()
writer = csv.DictWriter(output_buffer, field_order)
writer.writeheader()
for page, rows in page_position_data.items():
    for text_char_data_row in rows:
        writer.writerow(text_char_data_row)
encoded = output_buffer.getvalue().encode()
stored_format = zlib.compress(encoded)
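To get the csv text back later, the round trip is just the reverse (a small sketch; stored_format is the compressed bytes from above and the default utf-8 encoding is assumed):

import zlib

# Decompress and decode to recover the original csv text.
csv_text = zlib.decompress(stored_format).decode()
print(csv_text.splitlines()[0])  # header row: char,position,page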
I have an alternative answer for anyone interested, which should use less intermediate space; it needs Python 3.3 and above for the getbuffer() method:
from io import BytesIO, TextIOWrapper
import csv
import zlib

def compress_csv(series):
    byte_buf = BytesIO()
    fp = TextIOWrapper(byte_buf, newline='', encoding='utf-8')
    writer = csv.writer(fp)
    for row in series:
        writer.writerow(row)
    fp.flush()  # make sure buffered text reaches the underlying BytesIO
    compressed = zlib.compress(byte_buf.getbuffer())
    fp.close()
    byte_buf.close()
    return compressed
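A quick usage sketch for that helper (the rows are made-up sample data), showing both compression and how to get the csv text back:

import zlib

rows = [['char', 'position', 'page'],
        ['a', '12', '1'],
        ['b', '13', '1']]

blob = compress_csv(rows)
print(len(blob), "compressed bytes")

# Round-trip check: decompress and decode to recover the csv text.
print(zlib.decompress(blob).decode('utf-8'))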

Python - Storing float values in CSV file

I am trying to compute the positive and negative scores of statements in a text file, and I want to store the scores in a csv file. I have implemented the code given below:
import openpyxl
from nltk.tokenize import sent_tokenize
import csv
from senti_classifier import senti_classifier
from nltk.corpus import wordnet

file_content = open('amazon_kindle.txt')
for lines in file_content:
    sentence = sent_tokenize(lines)
    pos_score, neg_score = senti_classifier.polarity_scores(sentence)
    with open('target.csv', 'w') as f:
        writer = csv.writer(f, lineterminator='\n', delimiter=',')
        for val in range(pos_score):
            writer.writerow(float(s) for s in val[0])
        f.close()
But the code gives me the following error in the for loop.
Traceback (most recent call last):
  File "C:\Users\pc\AppData\Local\Programs\Python\Python36-32\classifier.py", line 21
    for val in pos_score:
TypeError: 'float' object is not iterable
You have several errors with your code:
Your code and error do not correspond with each other.
for val in pos_score: # traceback
for val in range(pos_score): #code
pos_score is a float, so both are errors: range() takes an int, and for val takes an iterable. Where do you expect to get your list of values from?
And from the usage it looks like you are expecting a list of lists of values, because you are also using a generator expression in your writerow:
writer.writerow(float(s) for s in val[0])
Perhaps you are only expecting a list of values so you can get rid of the for loop and just use:
writer.writerow(float(val) for val in <list_of_values>)
Using:
with open('target.csv','w') as f:
means you no longer need to call f.close(); with closes the file at the end of the with block. This also means the writerow() needs to be inside the with block:
with open('target.csv', 'w') as f:
    writer = csv.writer(f, lineterminator='\n', delimiter=',')
    writer.writerow(float(val) for val in <list_of_values>)
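Putting those pieces together, a rough sketch of what the whole script could look like, assuming you want one row of (pos_score, neg_score) per line of the input file (senti_classifier's API is taken from the question, not verified here):

import csv
from nltk.tokenize import sent_tokenize
from senti_classifier import senti_classifier

with open('amazon_kindle.txt') as file_content, \
     open('target.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for line in file_content:
        sentences = sent_tokenize(line)
        pos_score, neg_score = senti_classifier.polarity_scores(sentences)
        # One row per input line: its positive and negative scores.
        writer.writerow([float(pos_score), float(neg_score)])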

Attaching python objects (dictionaries) to an existing pickle file

I'm new to python and I'm trying to use pickle to store a few python objects into a file. I know that while adding new objects to an existing pickle file I can load the existing objects and concatenate the new one:
# l is a list of existing dictionaries stored in the file:
l = pickle.load(open('existing_file.p', 'rb'))
new_dict = {'a': 1, 'b':2}
l = l + [new_dict]
# overwriting old file with the new content
pickle.dump(l, open('existing_file.p', 'wb'))
I wanted to check if there is any better way of attaching an object like a dictionary to an existing pickled file without overwriting the whole content.
Any hint or suggestion will be appreciated.
pickle knows the length of its serialized objects, so you can just keep appending new pickled objects to the end of the file and read them back one at a time later. After creating some pickled objects by appending to my pickle file,
>>> import pickle
>>> with open('test.pickle', 'ab') as out:
...     pickle.dump((1, 2, 3), out)
...
>>> with open('test.pickle', 'ab') as out:
...     pickle.dump((4, 5, 6), out)
...
I can read them back until I get an EOFError to know I'm done
>>> my_objects = []
>>> try:
...     with open('test.pickle', 'rb') as infile:
...         while True:
...             my_objects.append(pickle.load(infile))
... except EOFError:
...     pass
...
>>> my_objects
[(1, 2, 3), (4, 5, 6)]
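Wrapped up as a small helper (a sketch of the same EOFError loop, just packaged as a generator so you can iterate lazily over everything in the file):

import pickle

def load_all(path):
    """Yield every object pickled into the file at path, in order."""
    with open(path, 'rb') as infile:
        while True:
            try:
                yield pickle.load(infile)
            except EOFError:
                return

# Usage: list(load_all('test.pickle')) -> [(1, 2, 3), (4, 5, 6)]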
