Saving NumPy structured array to a *.mat file

I am using numpy.loadtxt to generate a structured NumPy array from a CSV data file that I would like to save to a MAT file for colleagues who are more familiar with MATLAB than Python.
Sample case:
import numpy as np
import scipy.io
mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])
scipy.io.savemat('test.mat', mydata)
When I attempt to use scipy.io.savemat on this array, the following error is thrown:
Traceback (most recent call last):
  File "C:/Project Data/General Python/test.py", line 6, in <module>
    scipy.io.savemat('test.mat', mydata)
  File "C:\python35\lib\site-packages\scipy\io\matlab\mio.py", line 210, in savemat
    MW.put_variables(mdict)
  File "C:\python35\lib\site-packages\scipy\io\matlab\mio5.py", line 831, in put_variables
    for name, var in mdict.items():
AttributeError: 'numpy.ndarray' object has no attribute 'items'
I'm a Python novice (at best), but I'm assuming this is because savemat is set up to handle dicts, and the structure of NumPy's structured arrays is not compatible.
I can get around this error by pulling my data into a dict:
tmp = {}
for varname in mydata.dtype.names:
    tmp[varname] = mydata[varname]
scipy.io.savemat('test.mat', tmp)
Which loads into MATLAB fine:
>> mydata = load('test.mat')
mydata =
foo: [1 2]
bar: [1 2]
But this seems like a very inefficient method since I'm duplicating the data in memory. Is there a smarter way to accomplish this?

You can do scipy.io.savemat('test.mat', {'mydata': mydata}).
This creates a struct mydata with fields foo and bar in the file.
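For instance, a quick roundtrip check from the Python side (a minimal sketch; the key 'mydata' is just the variable name chosen above):
import numpy as np
import scipy.io

mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])

# each dict key becomes a MATLAB variable; a structured array is
# written as a struct whose fields are the dtype's field names
scipy.io.savemat('test.mat', {'mydata': mydata})

# reading it back shows the field names MATLAB will see
check = scipy.io.loadmat('test.mat')
print(check['mydata'].dtype.names)  # ('foo', 'bar')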
Alternatively, you can pack your loop in a dict comprehension:
tmp = {varname: mydata[varname] for varname in mydata.dtype.names}
I don't think creating a temporary dictionary duplicates the data in memory, because Python generally only stores references, and NumPy in particular creates views into the original data whenever possible.
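A quick way to convince yourself of that (a minimal sketch; np.shares_memory is just one way to check):
import numpy as np

mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])
tmp = {name: mydata[name] for name in mydata.dtype.names}

# field access on a structured array returns a view, not a copy,
# so the dict values share memory with the original array
print(np.shares_memory(mydata, tmp['foo']))  # True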

Related

Changing CSV row values

This is my code:
import pandas as pd
import re
# reading the csv file
patients = pd.read_csv("partial.csv")
# updating the column value/data
for patient in patients.iterrows():
    cip=patient['VALOR_ID']
    new_cip = re.sub('^(\w+|)',r'FIXED_REPLACED_STRING',cip)
    patient['VALOR_ID'] = new_cip
# writing into the file
df.to_csv("partial-writer.csv", index=False)
print(df)
I'm getting this message:
Traceback (most recent call last):
  File "/home/jeusdi/projects/workarea/salut/load-testing/load.py", line 28, in <module>
    cip=patient['VALOR_ID']
TypeError: tuple indices must be integers or slices, not str
EDIT
From the code above you might think I need to set the same fixed value for all rows.
I actually need to loop over the rows, generate a random string, and set a different value on each row.
The code above would become:
for patient in patients.iterrows():
    new_cip = generate_cip()
    patient['VALOR_ID'] = new_cip
Use Series.str.replace, though I'm not sure about the | in your regex; it may need to be removed:
df = pd.read_csv("partial.csv")
df['VALOR_ID'] = df['VALOR_ID'].str.replace(r'^(\w+|)', 'FIXED_REPLACED_STRING')
# if the function returns a scalar per value
df['VALOR_ID'] = df['VALOR_ID'].apply(generate_cip)
df.to_csv("partial-writer.csv", index=False)
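If generate_cip takes no arguments, a lambda that ignores the current value works instead. Here is a minimal sketch, where generate_cip is a hypothetical stand-in for your own random-string generator:
import random
import string

import pandas as pd

def generate_cip(length=8):
    # hypothetical helper: build a random alphanumeric string
    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=length))

df = pd.read_csv("partial.csv")
# call the generator once per row, ignoring the old value
df['VALOR_ID'] = df['VALOR_ID'].apply(lambda _: generate_cip())
df.to_csv("partial-writer.csv", index=False)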

Enquiry on uniformly distributed random numbers using Python

Please, how can I randomly generate 5,000 integers uniformly distributed in [1, 100] and find their mean using Python? I tried np.random.randint(100, size=5000), but I got the error below while trying to get the mean.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'numpy.ndarray' object is not callable
You can use np.random.randint. Note that its upper bound is exclusive, so this samples integers from [0, 99]; for [1, 100] you would use np.random.randint(1, 101, 5000):
import numpy as np
r = np.random.randint(0, 100, 5000)
Then use mean to find the mean of that:
>>> np.mean(r)
49.4686
You can also use the array's mean() method:
>>> r.mean()
49.4686
You can also do it in one line (again with 101, because the upper bound is exclusive):
np.random.randint(1, 101, size=5000).mean()
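For what it's worth, newer NumPy code often uses the Generator API instead of the legacy np.random functions; a minimal sketch:
import numpy as np

rng = np.random.default_rng()
# integers() also excludes the upper bound by default, hence 101
r = rng.integers(1, 101, size=5000)
print(r.mean())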

How to calculate values in an array imported from csv.reader?

I have this csv file
Germany,1,5,10,20
UK,0,2,4,10
Hungary,6,11,22,44
France,8,22,33,55
and this script,
I would like to perform some arithmetic operations on the values in the 2D array (data).
For example, print the value data[1][3] increased by 10.
It seems that I need some conversion to integer, right?
What is the best solution?
import csv
datafile = open('sample.csv', 'r')
datareader = csv.reader(datafile, delimiter=',')
data = []
for row in datareader:
    data.append(row)
print ((data[1][3])+10)
I got this error
/python$ python3 read6.py
Traceback (most recent call last):
  File "read6.py", line 8, in <module>
    print ((data[1][3])+10)
TypeError: must be str, not int
You'll have to manually convert to integers as you suspected:
import csv
datafile = open('sample.csv', 'r')
datareader = csv.reader(datafile, delimiter=',')
data = []
for row in datareader:
    data.append([row[0]] + list(map(int, row[1:])))
print ((data[1][3])+10)
Specifically this modification on line 7 of your code:
data.append([row[0]] + list(map(int, row[1:])))
The csv package docs mention that
No automatic data type conversion is performed unless the QUOTE_NONNUMERIC format option is specified (in which case unquoted fields are transformed into floats).
Since the strings in your CSV are not quoted (i.e. Germany rather than "Germany"), this isn't useful in your case, so converting manually is the way to go.
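For contrast, here is a minimal sketch of what QUOTE_NONNUMERIC would do if the string fields were quoted (sample_quoted.csv is a hypothetical variant of your file):
import csv

# sample_quoted.csv is assumed to quote its string fields, e.g.:
# "Germany",1,5,10,20
with open('sample_quoted.csv', 'r') as datafile:
    datareader = csv.reader(datafile, quoting=csv.QUOTE_NONNUMERIC)
    data = list(datareader)

# unquoted fields were converted to floats automatically
print(data[0])  # ['Germany', 1.0, 5.0, 10.0, 20.0]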

Python - Storing float values in CSV file

I am trying to compute the positive and negative scores of the statements in a text file and store the scores in a CSV file. I have implemented the code below:
import openpyxl
from nltk.tokenize import sent_tokenize
import csv
from senti_classifier import senti_classifier
from nltk.corpus import wordnet
file_content = open('amazon_kindle.txt')
for lines in file_content:
    sentence = sent_tokenize(lines)
    pos_score,neg_score = senti_classifier.polarity_scores(sentence)
with open('target.csv','w') as f:
    writer = csv.writer(f,lineterminator='\n',delimiter=',')
for val in range(pos_score):
    writer.writerow(float(s) for s in val[0])
f.close()
But the code gives me the following error in the for loop:
Traceback (most recent call last):
  File "C:\Users\pc\AppData\Local\Programs\Python\Python36-32\classifier.py", line 21, in <module>
    for val in pos_score:
TypeError: 'float' object is not iterable
You have several errors in your code.
Your code and the error do not correspond with each other:
for val in pos_score: # traceback
for val in range(pos_score): # code
pos_score is a float, so both are errors: range() takes an int, and for val in ... needs an iterable. Where do you expect to get your list of values from?
From the usage it also looks like you are expecting a list of lists of values, because you use a generator expression in your writerow:
writer.writerow(float(s) for s in val[0])
Perhaps you are only expecting a list of values, in which case you can get rid of the for loop and just use:
writer.writerow(float(val) for val in <list_of_values>)
Using:
with open('target.csv','w') as f:
means you no longer need to call f.close(); the with block closes the file at its end. It also means the writerow() call needs to be inside the with block:
with open('target.csv','w') as f:
    writer = csv.writer(f,lineterminator='\n',delimiter=',')
    writer.writerow(float(val) for val in <list_of_values>)
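Putting it together, here is a minimal sketch of what the corrected script might look like, assuming polarity_scores returns one (pos, neg) pair per call and that you want one row of scores per line of the input file:
import csv
from nltk.tokenize import sent_tokenize
from senti_classifier import senti_classifier

rows = []
with open('amazon_kindle.txt') as file_content:
    for line in file_content:
        sentences = sent_tokenize(line)
        pos_score, neg_score = senti_classifier.polarity_scores(sentences)
        rows.append((float(pos_score), float(neg_score)))

with open('target.csv', 'w') as f:
    writer = csv.writer(f, lineterminator='\n', delimiter=',')
    writer.writerows(rows)  # one (pos, neg) row per input line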

Attaching Python objects (dictionaries) to an existing pickle file

I'm new to Python and I'm trying to use pickle to store a few Python objects in a file. I know that when adding new objects to an existing pickle file I can load the existing objects and concatenate the new one:
import pickle

# l is a list of existing dictionaries stored in the file:
l = pickle.load(open('existing_file.p', 'rb'))
new_dict = {'a': 1, 'b': 2}
l = l + [new_dict]
# overwriting the old file with the new content
pickle.dump(l, open('existing_file.p', 'wb'))
I wanted to check whether there is a better way of appending an object like a dictionary to an existing pickle file without rewriting the whole content.
Any hint or suggestion would be appreciated.
pickle knows the length of its serialized objects, so you can just keep appending new pickled objects to the end of the file and read them back one at a time later. After creating some pickled objects by appending to my pickle file,
>>> with open('test.pickle', 'ab') as out:
... pickle.dump((1,2,3), out)
...
>>> with open('test.pickle', 'ab') as out:
... pickle.dump((4,5,6), out)
I can read them back until I get an EOFError, which tells me I'm done:
>>> my_objects = []
>>> try:
... with open('test.pickle', 'rb') as infile:
... while True:
... my_objects.append(pickle.load(infile))
... except EOFError:
... pass
...
>>> my_objects
[(1, 2, 3), (4, 5, 6)]
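The read-until-EOFError pattern also wraps nicely in a small helper generator (a sketch; read_pickles is a hypothetical name):
import pickle

def read_pickles(path):
    # yield every object pickled back-to-back in the file at path
    with open(path, 'rb') as infile:
        while True:
            try:
                yield pickle.load(infile)
            except EOFError:
                return

# usage: list(read_pickles('test.pickle')) -> [(1, 2, 3), (4, 5, 6)]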
