Attaching python objects (dictionaries) to an existing pickle file - python-3.x

I'm new to python and I'm trying to use pickle to store a few python objects into a file. I know that while adding new objects to an existing pickle file I can load the existing objects and concatenate the new one:
# l is the list of existing dictionaries stored in the file:
l = pickle.load(open('existing_file.p', 'rb'))
new_dict = {'a': 1, 'b': 2}
l = l + [new_dict]
# overwriting the old file with the new content; note the argument order
# (object first, then file) and the 'wb' mode - 'rw' is not a valid mode
pickle.dump(l, open('existing_file.p', 'wb'))
I wanted to check if there is any better way of attaching an object like a dictionary to an existing pickled file without overwriting the whole content.
Any hint or suggestion will be appreciated.

pickle knows the length of its serialized objects, so you can just keep appending new pickled objects to the end of the file and read them back one at a time later. After creating some pickled objects by appending to my pickle file,
>>> with open('test.pickle', 'ab') as out:
...     pickle.dump((1, 2, 3), out)
...
>>> with open('test.pickle', 'ab') as out:
...     pickle.dump((4, 5, 6), out)
...
I can read them back until I get an EOFError, which tells me I'm done:
>>> my_objects = []
>>> try:
...     with open('test.pickle', 'rb') as infile:
...         while True:
...             my_objects.append(pickle.load(infile))
... except EOFError:
...     pass
...
>>> my_objects
[(1, 2, 3), (4, 5, 6)]
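If you read appended pickles often, the EOFError loop above can be wrapped in a small generator. This is just a sketch; the function name read_pickles is mine, not from the answer, and it uses a scratch file rather than test.pickle:

```python
import os
import pickle
import tempfile

def read_pickles(path):
    """Yield each object that was pickled, one after another, into `path`."""
    with open(path, 'rb') as infile:
        while True:
            try:
                yield pickle.load(infile)
            except EOFError:
                return

# Usage: append two objects to a scratch file, then stream them back.
path = os.path.join(tempfile.mkdtemp(), 'test.pickle')
for obj in [(1, 2, 3), (4, 5, 6)]:
    with open(path, 'ab') as out:
        pickle.dump(obj, out)

assert list(read_pickles(path)) == [(1, 2, 3), (4, 5, 6)]
```

Because it is a generator, it never holds more than one unpickled object in memory at a time, which matters if the file has grown large.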

Related

Adding data from a list to csv columns that already have data

I am new to Python, so I would like to ask:
I have a csv with two columns A and B:
A  B
1  testa
2  testb
What I want to do is to add data to this CSV. I have the data which I want to add in a list in Python.
This is mydata_list:
[[3, 'testd'], [4, 'teste'], [5, 'testf'], [6, 'testg']]
How do I add this mydata_list to my csv columns, which already have data?
I have been trying with something like this, but it doesn't work.
with open(filename, 'w') as file:
    writer = csv.DictWriter(file, fieldnames=["A", "B"])
    if row['A'] == 4:
        for e in mydata_list:
            writer.writerow(e)
You can use the csv library for this. Refer below:
# importing the library
import csv

# data to be written row-wise into the csv file
data = [[3, 'testd'], [4, 'teste'], [5, 'testf'], [6, 'testg']]

# opening the csv file in append mode and writing the data into it
with open(filename, 'a', newline='') as file:
    write = csv.writer(file)
    write.writerows(data)
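To convince yourself the rows really were appended rather than overwritten, you can read the file back with csv.reader. A self-contained sketch (the filename is a scratch file here, and the seed rows are the A/B data from the question):

```python
import csv
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), 'data.csv')

# Seed the file with the header and the two existing rows from the question.
with open(filename, 'w', newline='') as file:
    csv.writer(file).writerows([['A', 'B'], [1, 'testa'], [2, 'testb']])

# Append mydata_list in 'a' mode, exactly as in the answer above.
mydata_list = [[3, 'testd'], [4, 'teste'], [5, 'testf'], [6, 'testg']]
with open(filename, 'a', newline='') as file:
    csv.writer(file).writerows(mydata_list)

# Read everything back: the header plus all six data rows should be present.
with open(filename, newline='') as file:
    rows = list(csv.reader(file))

assert rows[0] == ['A', 'B']
assert rows[-1] == ['6', 'testg']  # csv.reader returns strings
assert len(rows) == 7
```

Note that everything comes back as strings; csv does not preserve the int type of the first column.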

How to plot multiple graphs in one using 3 different files and another file to sort them?

I have 2 CSV files. One of them has the sorted data and the other the unsorted data. Example data is as shown below.
What I am trying to do is take the unsorted data and sort it according to the index numbers from the sorted data. Ex: in the sorted data, index number "1" corresponds to "name001.a.a". So, since its index number is "1", I want "name001.a.a,0001" to be the first item in the sorted output. The number after the comma in the unsorted file is a 4-digit number which plays no role in the sorting but stays attached to the names.
One more sample: index "2" is for "name002.a.a", so after sorting, the new file would have "name002.a.a,0002" as the second item in the list.
unsorted.csv:
name002.a.a,0002
name001.a.a,0001
name005.a.a,0025
hostnum.csv (sorted):
"1 name001.a.a"
"2 name002.a.a"
"3 name005.a.a"
I need help to figure out where I have coded wrong and if possible, need help with completing it.
EDIT- CODE:
After changing the name csv_list to csv_file, I am receiving the following error
from matplotlib import pyplot as plt
import numpy as np
import csv
csv_file = []
with open('hostnum.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        csv_file.append(line)

us_csv_file = []
with open('unsorted.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)

us_csv_file.sort(key=lambda x: csv_file.index(x[1]))

plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csvfile)), [item[0] for item in csvfile])
plt.show()
ERROR:
Traceback (most recent call last):
  File "C:/..../TEST_ALL.py", line 16, in <module>
    us_csv_file.sort(key=lambda x: csv_file.index(x[1]))
  File "C:/..../TEST_ALL.py", line 16, in <lambda>
    us_csv_file.sort(key=lambda x: csv_file.index(x[1]))
ValueError: '0002' is not in list
Well, you haven't defined csv_list in your code. Looking quickly through your code, I'd guess changing us_csv_file.sort(key=lambda x: csv_list.index(x[1])) to us_csv_file.sort(key=lambda x: csv_file.index(x[1])) (i.e. using the correct variable name, which is csv_file and not csv_list), might just solve the problem.
Here's a new attempt. This one tries to extract the numbers from the second column of hostnum.csv and put them into a separate list, which it then uses to sort the items. When I run this code, I get ValueError: '025' is not in list, but I assume that's because you haven't given us the entire files; there is indeed no line containing name025.a.a in the snippet of hostnum.csv you gave us. Because your sorted file contains one more zero than the unsorted file, I also added a [1:] to the sorting statement.
If this doesn't work, try removing that [1:] and changing csv_file_numbers.append(csv_file[-1][1][4:].split('.')[0]) to csv_file_numbers.append(csv_file[-1][1][4:].split('.')[0].zfill(4)). string.zfill(4) pads the beginning of a string with zeros until its length is at least 4.
from matplotlib import pyplot as plt
import numpy as np
import csv

csv_file = []
csv_file_numbers = []
##with open('hostnum.csv', 'r') as f:
##    csvreader = csv.reader(f, dialect="excel-tab")
##    for line in csvreader:
##        csv_file.append(line)
##        csv_file_numbers.append(line[-1][4:].split('.')[0])
with open('hostnum.csv', 'r') as f:
    sorted_raw = f.read()
for line in sorted_raw.splitlines():
    csv_file.append(line.split('\t'))
    csv_file_numbers.append(csv_file[-1][1][4:].split('.')[0])

us_csv_file = []
with open('unsorted.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)

us_csv_file.sort(key=lambda x: csv_file_numbers.index(x[1][1:]))

plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csvfile)), [item[0] for item in csvfile])
plt.show()
This one worked on my computer:
from matplotlib import pyplot as plt
import numpy as np
import csv

csv_file = []
csv_file_dict = {}
##with open('hostnum.csv', 'r') as f:
##    csvreader = csv.reader(f, dialect="excel-tab")
##    for line in csvreader:
##        csv_file.append(line)
##        csv_file_numbers.append(line[-1][4:].split('.')[0])
with open('hostnum.csv', 'r') as f:
    sorted_raw = f.read()
for line in sorted_raw.splitlines():
    csv_file.append(line.split('\t'))
    csv_file_dict[csv_file[-1][-1][:-1]] = int(csv_file[-1][0][1:])

us_csv_file = []
with open('unsorted.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)

us_csv_file.sort(key=lambda x: csv_file_dict[x[0]])

plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csv_file)), [item[0] for item in csv_file])
plt.show()
So now I created a dict which stores the index values as values and the names found in both files as keys. I also removed the quotation marks manually, since for some reason csv.reader didn't seem to handle them (or the tabs) in the desired way. As I wrote in one of my comments, I don't know why for sure; I'd guess it's because the quotations are not closed within the cells in the file. Anyway, I decided to split each line manually with string.split('\t').
Also, you had missed the underscore in the variable name csv_file in a couple of places at the end, so I added it.
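The core idea of the working version can be boiled down to a few lines. A sketch with the two sample files inlined as strings (assuming, as the answer does, that hostnum.csv is tab-separated inside the quotes, which are stripped manually):

```python
import csv
import io

# Inline stand-ins for the two files from the question.
unsorted_text = "name002.a.a,0002\nname001.a.a,0001\nname005.a.a,0025\n"
hostnum_text = '"1\tname001.a.a"\n"2\tname002.a.a"\n"3\tname005.a.a"\n'

# Build a name -> index mapping from the sorted file, stripping the quotes.
order = {}
for line in hostnum_text.splitlines():
    idx, name = line.strip('"').split('\t')
    order[name] = int(idx)

# Sort the unsorted rows by the index assigned to their name.
rows = list(csv.reader(io.StringIO(unsorted_text)))
rows.sort(key=lambda row: order[row[0]])

assert [r[0] for r in rows] == ['name001.a.a', 'name002.a.a', 'name005.a.a']
assert [r[1] for r in rows] == ['0001', '0002', '0025']
```

A dict lookup per row avoids the repeated list.index() scans of the earlier attempts, and raises a clear KeyError if a name is missing from the sorted file.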

Unit test for reading an excel file with pandas

I need to write a unit test case for the below code :
def read_data(self, data):
    """Read data from excel file.

    :param data: str, data in file
    :return: str, data after reading excel file
    """
    try:
        read_data = pd.read_excel(data)
        return read_data
    except Exception as e:
        logger.info("Not able to read data. Error :- {}".format(e))
        raise e
I am reading an excel file in the above code, which gives me data like this (refer to the screenshot in the original post).
So, How to store the above data after reading from excel sheet as dummy data so that I can assert it to my original data?
Thanks
Necroposting this because I had the same need.
this answer can point you in the right direction:
See also Saving the Dataframe output to a string in the XlsxWriter docs.
From the example you can build something like this:
import pandas as pd
import io

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

# Use the BytesIO object as the filehandle.
output = io.BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')

# Write the data frame to the BytesIO object.
df.to_excel(writer, sheet_name='Sheet1', index=False)
writer.save()

# Rewind the buffer, then read the BytesIO object back into a data frame -
# here you should use your own method instead of pd.read_excel
output.seek(0)
xlsx_data = pd.read_excel(output)

# Assert that the data frame is the same as the original
pd.testing.assert_frame_equal(xlsx_data, df)
Basically you flip the problem around: you build a data frame with some data in it, save it in a temporary file-like object, pass that object to your method, and then assert that the data is the same as the one you created.
NOTE: It needs pandas 0.17+
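Wrapped up as an actual unit test, the round trip might look like this. This is a sketch, not the poster's code: it assumes pytest-style test discovery, that an Excel engine (openpyxl or xlsxwriter) is installed, and it calls pd.read_excel directly where you would substitute your own read_data method:

```python
import io

import pandas as pd

def test_read_data_round_trip():
    # Build a known data frame and write it to an in-memory Excel file.
    df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
    output = io.BytesIO()
    with pd.ExcelWriter(output) as writer:
        df.to_excel(writer, sheet_name='Sheet1', index=False)
    output.seek(0)

    # Read it back (replace pd.read_excel with your read_data method)
    # and assert the result matches the original.
    result = pd.read_excel(output)
    pd.testing.assert_frame_equal(result, df)
```

Using the `with pd.ExcelWriter(...)` context manager instead of writer.save() keeps the test working on newer pandas versions, where save() has been removed.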

Convert Dictionary to String and back in python

I am writing a dictionary to a file to save the data stored in it. When I read the file and try to convert it back, it converts to a list. I added print(type()) calls to see what type it is going into the file as and what type it is coming out as.
import ast
f = open("testfile.txt", "a+")
print (type(dic1))
f.write(str(dic1.items()) + "\n")
f.close()
This is me writing it to the file.
([('people', '1'), ('date', '01/01/1970'), ('t0', 'epoch'), ('time', '0'), ('p0', 'Tim Berners-Lee'), ('memory', 'This is the day time was created')])
this is what it looks like in the written file.
loadDict = ast.literal_eval(x)
print (type(loadDict))
this is the code when trying to convert back to a dictionary
Try using pickle; it is the preferred way to store and load python objects.
to store:
import pickle

# pickle writes bytes, so the file must be opened in binary mode in Python 3
with open("testfile.txt", "wb") as f:
    pickle.dump(dic1, f)
to load:
import pickle

with open("testfile.txt", "rb") as f:
    dic1 = pickle.load(f)
If you want to save multiple objects, you can save a list to the file, then later load the list from the file, add what you want to it, and save it again.
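Putting the two halves together, a complete round trip looks like this. A sketch using a scratch file and the dictionary data shown in the question (note the binary file, rather than a .txt, since pickle output is not text):

```python
import os
import pickle
import tempfile

dic1 = {'people': '1', 'date': '01/01/1970', 't0': 'epoch',
        'time': '0', 'p0': 'Tim Berners-Lee',
        'memory': 'This is the day time was created'}

path = os.path.join(tempfile.mkdtemp(), 'testfile.pkl')

# Store: pickle needs a binary-mode file handle in Python 3.
with open(path, 'wb') as f:
    pickle.dump(dic1, f)

# Load: the object comes back as a dict, not a list or a string.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

assert loaded == dic1
assert type(loaded) is dict
```

Unlike the str()/ast.literal_eval approach in the question, the type survives the round trip unchanged, which is exactly the problem the asker hit.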

Saving Numpy Structure Array to *.mat file

I am using numpy.loadtext to generate a structured Numpy array from a CSV data file that I would like to save to a MAT file for colleagues who are more familiar with MATLAB than Python.
Sample case:
import numpy as np
import scipy.io
mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])
scipy.io.savemat('test.mat', mydata)
When I attempt to use scipy.io.savemat on this array, the following error is thrown:
Traceback (most recent call last):
  File "C:/Project Data/General Python/test.py", line 6, in <module>
    scipy.io.savemat('test.mat', mydata)
  File "C:\python35\lib\site-packages\scipy\io\matlab\mio.py", line 210, in savemat
    MW.put_variables(mdict)
  File "C:\python35\lib\site-packages\scipy\io\matlab\mio5.py", line 831, in put_variables
    for name, var in mdict.items():
AttributeError: 'numpy.ndarray' object has no attribute 'items'
I'm a Python novice (at best), but I'm assuming this is because savemat is set up to handle dicts and the structure of Numpy's structured arrays is not compatible.
I can get around this error by pulling my data into a dict:
tmp = {}
for varname in mydata.dtype.names:
    tmp[varname] = mydata[varname]
scipy.io.savemat('test.mat', tmp)
Which loads into MATLAB fine:
>> mydata = load('test.mat')
mydata =
    foo: [1 2]
    bar: [1 2]
But this seems like a very inefficient method since I'm duplicating the data in memory. Is there a smarter way to accomplish this?
You can do scipy.io.savemat('test.mat', {'mydata': mydata}).
This creates a struct mydata with fields foo and bar in the file.
Alternatively, you can pack your loop in a dict comprehension:
tmp = {varname: mydata[varname] for varname in mydata.dtype.names}
I don't think creating a temporary dictionary duplicates the data in memory, because Python generally only stores references, and numpy in particular tries to create views into the original data whenever possible.
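The no-copy claim can be checked with numpy itself: field access on a structured array returns a view, and np.shares_memory confirms that each dict value still points into the original array's buffer. A short sketch:

```python
import numpy as np

mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])

# The dict comprehension from the answer above.
tmp = {varname: mydata[varname] for varname in mydata.dtype.names}

# Every value in tmp is a view sharing memory with the original array,
# so building the dict does not duplicate the data.
assert all(np.shares_memory(mydata, v) for v in tmp.values())
assert set(tmp) == {'foo', 'bar'}
```

So the only real cost of the dict approach is a few references, not a second copy of the data.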
