Read .csv that contains commas - python-3.x

I have a .csv file that contains multiple columns with texts in it. These texts contain commas, which makes things messy when I try to read the file into Python.
When I tried:
import pandas as pd
directory = 'some directory'
dataset = pd.read_csv(directory)
I got the following error:
ParserError: Error tokenizing data. C error: Expected 3 fields in line 42, saw 5
After doing some research, I found the clevercsv package.
So, I ran:
import clevercsv as csv
dataset = csv.read_csv(directory)
Running this, I got the error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 4359705: character maps to <undefined>
To overcome this, I tried:
dataset = csv.read_csv(directory, encoding="utf8")
However, 10 hours later my computer was still working on reading it. So I expect that something went wrong there.
Furthermore, when I open the file in Excel, it does split cells well. Therefore, What I tried was to save the .csv file as a .xlsx and then save it as .csv in Python with an uncommon delimiter ('~'). However, when I save my .csv file as a .xlsx file, the size of the file gets smaller, which indicates that only a part of the file is saved and that is not what I want.
Lastly, I have tried the solutions given here and here. But neither seem to work for my problem.
Given that Excel reads in the file without problems, I do expect that it should be possible to read it into Python as well. Who can help me with this?
UPDATE:
When using dataset = pd.read_csv(directory, sep = ',', error_bad_lines=False)I manage to read in the .csv. But many lines are skipped. Is there a better way for this?

panda should be work https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Dou you tried somthing like dataset = pd.read_csv(directory, sep = ',', header = None)
Regards

Related

python 3 coding issue

I create a JSON file with VSC (or Notepad++). This contains an array of strings. One of the strings is "GRÜN". Then I read the file with Python 3.
with codecs.open(file,'r',"iso-8859-15") as infile:
dictionarry = json.load(infile)
If I print the array (inside in "dictionary") to the console, I see: "GRÃ\x9cN"
How can I convert "GRÃ\x9cN" to "GRÜN" again?
I try to read the JSON file with codec "iso-8859-1" too, but the issue still occurred.

Removing double quotes from all values in CSV file before upload in S3 [duplicate]

I have a process where a CSV file can be downloaded, edited then uploaded again. On the download, the CSV file is in the correct format, with no wrapping double quotes
1, someval, someval2
When I open the CSV in a spreadsheet, edit and save, it adds double quotes around the strings
1, "someEditVal", "someval2"
I figured this was just the action of the spreadsheet (in this case, openoffice). I want my upload script to remove the wrapping double quotes. I cannot remove all quotes, just incase the body contains them, and I also dont want to just check first and last characters for double quotes.
Im almost sure that the CSV library in python would know how to handle this, but not sure how to use it...
EDIT
When I use the values within a dictionary, they turn out as follows
{'header':'"value"'}
Thanks
For you example, the following works:
import csv
writer = csv.writer(open("out.csv", "wb"), quoting=csv.QUOTE_NONE)
reader = csv.reader(open("in.csv", "rb"), skipinitialspace=True)
writer.writerows(reader)
You might need to play with the dialect options of the CSV reader and writer -- see the documentation of the csv module.
Thanks to everyone who was trying to help me, but I figured it out. When specifying the reader, you can define the quotechar
csv.reader(upload_file, delimiter=',', quotechar='"')
This handles the wrapping quotes of strings.
For Python 3:
import csv
writer = csv.writer(open("query_result.csv", "wt"), quoting=csv.QUOTE_NONE, escapechar='\\')
reader = csv.reader(open("out.txt", "rt"), skipinitialspace=True)
writer.writerows(reader)
The original answer gives this error under Python 3. Also See this SO for detail: csv.Error: iterator should return strings, not bytes
Traceback (most recent call last):
File "remove_quotes.py", line 11, in
writer.writerows(reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

result shows file full of some symbols rather than text when I loop files

I was looping some files to copy the content of somes file to a new file but after I run the code, the result shows lot of symbols in the new file, not the text content of the files I looped.
first, when I ran the code without putting the 'encoding' attribute in open file line, it showed an error message like,
UnicodeEncodeError: 'charmap' codec can't encode character '\x8b' in position 12: character maps to .
I tried various encodings like utf-8,latin1 but nothing worked and when i put 'errors=ignore' in the open file line, then the result showed like I described above.
import os
import glob
folder = os.path.join('R:', os.sep, 'Files')
def notes():
for doc in glob.glob(folder + r'\*'):
if doc.endswith('.pdf'):
with open(doc,'r') as f:
x = f.readlines()
with open('doc1.text', 'w+') as f1:
for line in x:
f1.write(line)
notes()
If I understand your example correctly and you’re trying to read PDF files, your problem is not one of encoding but of file format. PDF files don’t just to store your text in coding materials are unique format that you need to be able to read in order to extract the text. There are a couple of python libraries that can read PDF files (such as Py2PDF), please refer to this thread for more information: How to extract text from a PDF file?

np.save is converting floats to weird characters

I am attempting to append results to an ongoing csv file. Each result comes out as an nd.array:
[IN]: Print(savearray)
[OUT]: [[ 0.55219001 0.39838119]]
Initially I tried
np.savetxt('flux_ratios.csv', savearray,delimiter=",")
But this overwrites the old data every time I save, so instead I am attempting to append the data like this:
f = open('flux_ratios.csv', 'ab')
np.save(f, 'a',savearray)
f.close()
This is (in a sense) appending, however it is saving the numerical data as weird characters, as can be seen in this screenshot:
I have no idea why or how this is happening so any help would be greatly appreciated!
First off, np.save does not write text whereas np.savetxt does. You are trying to combine binary with text, which is why you get the odd characters when you try to read the file.
You could just change np.save(f, 'a', savearray) to np.savetxt(f, savearray, delimiter=',').
Otherwise you could also consider using pandas.to_csv in append mode.

read login data from text file into dictionary error

Using the answer on Stack Overflow shown on this link: https://stackoverflow.com/a/4804039, I have attempted to read in the file contents into a dictionary. There is an error that I cannot seem to fix.
Code
def login():
print("====Login====")
userinfo={}
with open("userinfo.txt","r") as f:
for line in f:
(key,val)=line.split()
userinfo[key]=val
print(userinfo)
File Contents
{'user1': 'pass'}
{'user2': 'foo'}
{'user3': 'boo'}
Error:
(key,val)=line.split()
ValueError: not enough values to unpack (expected 2, got 0)
I have a question to which I would very much appreciate a two fold answer
What is the best and most efficient way to read in file contents, as shown, into a dictionary, noting that it has already been stored in dictionary format.
Is there a way to WRITE to a dictionary to make this "reading" easier? My code for writing to the userinfo.txt file in the first place is shown below
Write code
with open("userinfo.txt","a",newline="")as fo:
writer=csv.writer(fo)
writer.writerow([{username:password}])
Could any answers please attempt the following
Provide a solution to the error using the original code
Suggest the best method to do the same thing (simplest for teaching purposes) Note, that I do not wish to use pickle, json or anything other than very basic file handling (so only reading from a text file or csv reader/writer tools). For instance, would it be best to read the file contents into a list and then convert the list into a dictionary? Or is there any other way?
Is there a method of writing a dictionary to a text file using csv reader or other basic txt file handling, so that the reading of the file contents into a dictionary could be done more effectively on the other end.
Update:
Blank line removed, and the code works but produces the erroneous output:
{"{"Vjr':": "'open123'}", "{'mvj':": "'mvv123'}"}
I think I need to understand the split and strip commands and how to use them in this context to produce the desired result (reading the contents into a dictionary userinfo)
Well let's start with the basics first. The error message:
ValueError: not enough values to unpack (expected 2, got 0)
means a line was empty, so do you have a blank line in the file?
Yes, there are other options on saving your dictionary out and bringing it back, but first you should understand this, and may work just fine for you. :-) The split() is acting on the string you read from the file, and by default will split on the space, so that is what you are seeing. You could format your text file like 'username:pass' instead and then use split(':").
File Contents
user1:pass
user2:foo
user3:boo
Code
def login():
print("====Login====")
userinfo={}
with open("userinfo.txt","r") as f:
for line in f:
(key,val)=line.split(':')
userinfo[key]=val.strip()
print(userinfo)
if __name__ == '__main__':
login()
This simple format may be best if you want to be able to edit the text file by hand, and I like to keep it simple as possible. ;-)

Resources