np.save is converting floats to weird characters - python-3.x

I am attempting to append results to an ongoing csv file. Each result comes out as an nd.array:
[IN]: Print(savearray)
[OUT]: [[ 0.55219001 0.39838119]]
Initially I tried
np.savetxt('flux_ratios.csv', savearray,delimiter=",")
But this overwrites the old data every time I save, so instead I am attempting to append the data like this:
f = open('flux_ratios.csv', 'ab')
np.save(f, 'a',savearray)
f.close()
This is (in a sense) appending, however it is saving the numerical data as weird characters, as can be seen in this screenshot:
I have no idea why or how this is happening so any help would be greatly appreciated!

First off, np.save does not write text whereas np.savetxt does. You are trying to combine binary with text, which is why you get the odd characters when you try to read the file.
You could just change np.save(f, 'a', savearray) to np.savetxt(f, savearray, delimiter=',').
Otherwise you could also consider using pandas.to_csv in append mode.

Related

Why does page 323 from Automate The boring Stuff generate an int of 21?

I'm going back through the book "Automate the boring stuff" (which has been a great book btw)as I need to brush up on CSV parsing for a project and I'm trying to understand why each output is generated. Why does this code from page 323 create an output of '21', when it's four words, 16 characters, and three commas. Not to mention that I'm entering strings and it outputs numbers.
#%%
import csv
outputFile = open('output.csv', 'w', newline='')
outputWriter = csv.writer(outputFile)
outputWriter.writerow(['spam', 'eggs', 'bacon', 'ham'])
First I thought ok it's the number of characters, but that adds up to 16. Then I thought ok each word maybe has a space plus one at the beginning and end of the CSV file? Which does technically maybe explain but nothing explicit, it's more "oh it's obvious because " but it's not explicitly stated. I'm not seeing a reference to the addition of or how that number is created.
There seems like a plausible explanation but I don't understand why it's 21.
I've tried breakpoint or pdb but I'm still learning how to use those, to get the following breakdown which I don't see containing anything that answers it. No counting or summation that I can see.
The docs state that csv.csvwriter.csvwriterow returns "the return value of the call to the write method of the underlying file object.".
In your example
outputFile = open('output.csv', 'w', newline='')
is your underlying file object which you then hand to csv.writer().
If we look a bit deeper we can find the type of outputFile with print(type(outputFile)).
<class '_io.TextIOWrapper'>
While the docs don't explicitly define the write method for TextIOWrapper, it does state that it inherits from TextIOBase, which defines it's write() method as "Write the string s to the stream and return the number of characters written.".
If we look at the text file written:
spam,eggs,bacon,ham
We see that it indeed has 21 characters.

Read .csv that contains commas

I have a .csv file that contains multiple columns with texts in it. These texts contain commas, which makes things messy when I try to read the file into Python.
When I tried:
import pandas as pd
directory = 'some directory'
dataset = pd.read_csv(directory)
I got the following error:
ParserError: Error tokenizing data. C error: Expected 3 fields in line 42, saw 5
After doing some research, I found the clevercsv package.
So, I ran:
import clevercsv as csv
dataset = csv.read_csv(directory)
Running this, I got the error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 4359705: character maps to <undefined>
To overcome this, I tried:
dataset = csv.read_csv(directory, encoding="utf8")
However, 10 hours later my computer was still working on reading it. So I expect that something went wrong there.
Furthermore, when I open the file in Excel, it does split cells well. Therefore, What I tried was to save the .csv file as a .xlsx and then save it as .csv in Python with an uncommon delimiter ('~'). However, when I save my .csv file as a .xlsx file, the size of the file gets smaller, which indicates that only a part of the file is saved and that is not what I want.
Lastly, I have tried the solutions given here and here. But neither seem to work for my problem.
Given that Excel reads in the file without problems, I do expect that it should be possible to read it into Python as well. Who can help me with this?
UPDATE:
When using dataset = pd.read_csv(directory, sep = ',', error_bad_lines=False)I manage to read in the .csv. But many lines are skipped. Is there a better way for this?
panda should be work https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Dou you tried somthing like dataset = pd.read_csv(directory, sep = ',', header = None)
Regards

How to remove special usual characters from a pandas dataframe using Python

I have a file some crazy stuff in it. It looks like this:
I attempted to get rid of it using this:
df['firstname'] = map(lambda x: x.decode('utf-8','ignore'), df['firstname'])
But I wound up with this in my dataframe: <map object at 0x0000022141F637F0>
I got that example from another question and this seems to be the Python3 method for doing this but I'm not sure what I'm doing wrong.
Edit: For some odd reason someone thinks that this has something to do with getting a map to return a list. The central issue is getting rid of non UTF-8 characters. Whether or not I'm even doing that correctly has yet to be established.
As I understand it, I have to apply an operation to every character in a column of the dataframe. Is there another technique or is map the correct way and if it is, why am I getting the output I've indicated?
Edit2: For some reason, my machine wouldn't let me create an example. I can now. This is what i'm dealing with. All those weird characters need to go.
import pandas as pd
data = [['🦎Ale','Αλέξανδρα'],['��Grain','Girl🌾'],['Đỗ Vũ','ên Anh'],['Don','Johnson']]
df = pd.DataFrame(data,columns=['firstname','lastname'])
print(df)
Edit 3: I tired doing this using a reg ex and for some reason, it still didn't work.
df['firstname'] = df['firstname'].replace('[^a-zA-z\s]',' ')
This regex works FINE in another process, but here, it still leaves the ugly characters.
Edit 4: It turns out that it's image data that we're looking at.

read login data from text file into dictionary error

Using the answer on Stack Overflow shown on this link: https://stackoverflow.com/a/4804039, I have attempted to read in the file contents into a dictionary. There is an error that I cannot seem to fix.
Code
def login():
print("====Login====")
userinfo={}
with open("userinfo.txt","r") as f:
for line in f:
(key,val)=line.split()
userinfo[key]=val
print(userinfo)
File Contents
{'user1': 'pass'}
{'user2': 'foo'}
{'user3': 'boo'}
Error:
(key,val)=line.split()
ValueError: not enough values to unpack (expected 2, got 0)
I have a question to which I would very much appreciate a two fold answer
What is the best and most efficient way to read in file contents, as shown, into a dictionary, noting that it has already been stored in dictionary format.
Is there a way to WRITE to a dictionary to make this "reading" easier? My code for writing to the userinfo.txt file in the first place is shown below
Write code
with open("userinfo.txt","a",newline="")as fo:
writer=csv.writer(fo)
writer.writerow([{username:password}])
Could any answers please attempt the following
Provide a solution to the error using the original code
Suggest the best method to do the same thing (simplest for teaching purposes) Note, that I do not wish to use pickle, json or anything other than very basic file handling (so only reading from a text file or csv reader/writer tools). For instance, would it be best to read the file contents into a list and then convert the list into a dictionary? Or is there any other way?
Is there a method of writing a dictionary to a text file using csv reader or other basic txt file handling, so that the reading of the file contents into a dictionary could be done more effectively on the other end.
Update:
Blank line removed, and the code works but produces the erroneous output:
{"{"Vjr':": "'open123'}", "{'mvj':": "'mvv123'}"}
I think I need to understand the split and strip commands and how to use them in this context to produce the desired result (reading the contents into a dictionary userinfo)
Well let's start with the basics first. The error message:
ValueError: not enough values to unpack (expected 2, got 0)
means a line was empty, so do you have a blank line in the file?
Yes, there are other options on saving your dictionary out and bringing it back, but first you should understand this, and may work just fine for you. :-) The split() is acting on the string you read from the file, and by default will split on the space, so that is what you are seeing. You could format your text file like 'username:pass' instead and then use split(':").
File Contents
user1:pass
user2:foo
user3:boo
Code
def login():
print("====Login====")
userinfo={}
with open("userinfo.txt","r") as f:
for line in f:
(key,val)=line.split(':')
userinfo[key]=val.strip()
print(userinfo)
if __name__ == '__main__':
login()
This simple format may be best if you want to be able to edit the text file by hand, and I like to keep it simple as possible. ;-)

Overwriting specific lines in Python

I have a simple program that manipulates some stored data on some text files. However I have to store the name and the password on different files for python to read.
I was wondering if I could get these two words (The name and the password) on two separate lines on one file and get python to overwrite just one of the lines based on what I choose to overwrite (either the password or the name).
I can get python to read specific lines with:
linenumber=linecache.getline("example.txt",4)
Ideally id like something like this:
linenumber=linecache.writeline("example.txt","Hello",4)
So this would just write "Hello" in "example.txt" only on line 4.
But unfortunately it doesn't seem to be as simple as that, I can get the words to be stored on separate files but overall doing this on a larger scale, I'm going to have a lot of text files all named differently and with different words on them.
If anyone would be able to help, it would be much appreciated!
Thanks, James.
You can try with built in open() function:
def overwrite(filename,newline,linenumber):
try:
with open(filename,'r') as reading:
lines = reading.readlines()
lines[linenumber]=newline+'\n'
with open(filename,'w') as writing:
for i in lines:
writing.write(i)
return 0
except:
return 1 #when reading/writing gone wrong, eg. no such a file
Be careful! It is writing all the lines all over again in a loop and when it comes to exception example.txt may already be blank. You may want to store all the lines in list all the time to write them back to file in exception. Or keep backup of your old files.

Resources