Why is csv.reader for Python starting out fine, then producing NULL bytes? (Excel)

OK, so I am reading an Excel workbook. For a while I read the file as a .csv. After some debugging and other changes (in code below the part I am showing you), it changed to a .xlsx, and I started getting IOError: No such file or directory. I figured out why, changed FFA.csv to FFA.xlsx, and it worked without errors. Then I went on doing other things and debugging. When I got up this morning, I got the following error: line contains NULL byte. This is weird, because the code started out fine, and now it can't read the file. I added print repr() to debug, and it does in fact print NULL bytes. So how do I fix this and prevent it in the future? Here are the first 200 bytes:
PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00b\xee\x9dh^\x01\x00\x00\x90\x04\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
import csv
def readFile():
    count = 0
    print repr(open("FFA.xlsx", "rb").read(200)) #dump 1st 200 bytes
    with open("FFA.xlsx","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low = row[9]
            risk = row[10]
            swing = row[11]
readFile()

The code you have posted has a small but dangerous mistake: you are leaking a file handle by opening the file twice.
1) You open the file and read 200 bytes from it, but never close it.
2) You then open the file again the proper way, via a context manager, and read from it a second time.
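The sketch below is a Python 3 version of the question's debugging line that avoids the leak: the with-block closes the handle as soon as the bytes are read. The helper name inspect_header and the NUL check are illustrative additions. The check matters because the "line contains NULL byte" error in the question is what csv.reader raises when its input contains a \x00 byte.

```python
def inspect_header(path, n=200):
    """Return the first n bytes of path and whether they contain a NUL byte.

    The with-block closes the file as soon as the read is done, unlike the
    bare open(...).read(200) call in the question, which leaks the handle.
    """
    with open(path, "rb") as fh:
        header = fh.read(n)
    return header, b"\x00" in header


# Python 3 equivalent of the question's debugging line:
# header, has_nul = inspect_header("FFA.xlsx")
# print(repr(header), has_nul)
```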
Some questions that may help you debug the problem:
Is the file you are opening stored on a networked resource (CIFS, NFS, etc.)?
Have you checked that the file is not open in another process? lsof can help you check that.
Is this running on Windows or Linux? If it happens on Windows, can you test it under Linux, and vice versa?
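I forgot to mention that you should not use the csv module for anything related to Excel, even when the file seems to be CSV data-wise. Use the xlrd module (https://pypi.python.org/pypi/xlrd). It's cross-platform, and since version 0.8 it reads both XLS and XLSX files perfectly fine. (Note that xlrd 2.0 and later dropped .xlsx support and read only .xls files, so for .xlsx you need an xlrd release below 2.0 or a library such as openpyxl.)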
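The byte dump in the question already points to the cause. It begins with PK\x03\x04, the signature of a ZIP local file header, and the first member name shown is [Content_Types].xml, which every Office Open XML package contains. So FFA.xlsx is a genuine .xlsx workbook, meaning a ZIP archive of XML parts, and csv.reader is being fed compressed binary data full of NUL bytes.

A minimal stdlib sketch for checking this before parsing follows. It uses zipfile.is_zipfile; the function name and its return labels are illustrative, not part of any library.

```python
import zipfile


def describe_file(path):
    """Classify path as an Office Open XML package, another ZIP, or neither.

    .xlsx files are ZIP archives (they start with the bytes PK\\x03\\x04) that
    contain a [Content_Types].xml member; csv.reader cannot parse them.
    """
    if not zipfile.is_zipfile(path):
        return "not-zip"          # plausibly plain text, such as a real CSV
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    if "[Content_Types].xml" in names:
        return "office-zip"       # .xlsx, .docx, .pptx and similar
    return "zip"
```

For example, a script could refuse to hand the file to csv.reader when describe_file returns "office-zip", and point to an Excel reader instead.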
This little piece of code will show you how to open the workbook and parse it in a basic manner:
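import xlrd

def open_excel():
    # the Book object works as a context manager and releases its resources on exit
    with xlrd.open_workbook('FFA.xlsx') as wb:
        sh = wb.sheet_by_name('Sheet1')
        for rownum in range(sh.nrows):
            row = sh.row_values(rownum)  # list of the cell values in this row
            print(row)  # do whatever you need with the row here

open_excel()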

I agree with Marc. I did a training exercise importing an Excel file, and I think the pandas library would help in this case: import pandas as pd and call pd.read_excel(file_name) inside a data-processing function such as read_file().

So this is what I did. But I am interested in learning the xlrd method; I have the module but no documentation. This version works with no error messages. I'm still not sure why the file changed from .csv to .xlsx, but it's working now. What does the script look like with xlrd?
import csv
def readFile():
    count = 0
    #print repr(open("FFA.csv", "rb").read(200)) #dump 1st 200 bytes check if null values produced.
    with open("FFA.csv","rb") as csvfile:
        FFAreader = csv.reader(csvfile, delimiter=",")
        for row in FFAreader:
            idd = row[0]
            name = row[1]
            pos = row[2]
            team = row[3]
            pts = row[4]
            oecr = row[5]
            oR = row[6]
            posR = row[7]
            up = row[8]
            low = row[9]
            risk = row[10]
            swing = row[11]
readFile()
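The code above is Python 2, as the print statement and the "rb" mode for csv show. The sketch below is a Python 3 version of the same read. It assumes FFA.csv is plain comma-separated text with at least 12 columns per data row.

* In Python 3 the csv documentation asks for text mode with newline="".
* Zipping each row with the question's variable names replaces the twelve separate index assignments.
* FIELDS and read_ffa are illustrative names.
* If the file has a header row, it comes back as the first dict.

```python
import csv

# Column names taken from the variables in the question.
FIELDS = ["idd", "name", "pos", "team", "pts", "oecr",
          "oR", "posR", "up", "low", "risk", "swing"]


def read_ffa(path):
    """Return one dict per data row, keyed by the 12 column names above."""
    rows = []
    with open(path, newline="") as csvfile:  # Python 3: text mode, newline=""
        for row in csv.reader(csvfile, delimiter=","):
            if len(row) < len(FIELDS):
                continue  # skip blank or short lines instead of raising IndexError
            rows.append(dict(zip(FIELDS, row)))
    return rows
```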

Related

Python 3.9: For loop is not producing output files even though no errors are displayed

Hi everyone, I am fairly new to using Python for data analysis, so apologies for silly questions:
IDE: PyCharm
What I have: a massive .xyz file (with 4 columns) that combines several datasets. Each dataset can be identified by the third column of the file, which runs from 10,000 to -10,000 (passing through 0) in steps of 100 and then repeats, so every 201 rows form one dataset.
What I want to do: split the massive file into its individual datasets (201 rows each) and save each one under a different name.
What I have done so far:
# Import packages
import os
import pandas as pd
import numpy as np  # For next steps
import math  # For next steps

# Check and change directory
path = 'C:/Clayton/lines/profiles_aufmod'
os.chdir(path)
print(os.getcwd())  # Correct path is printed

# Split the xyz file into different files for each profile
main_xyz = 'bathy_SPO_1984_50x50_profile.xyz'
number_lines = sum(1 for row in (open(main_xyz)))
print(number_lines)  # 10854 is the output
rowsize = 201
for i in range(number_lines, rowsize):
    profile_raw_df = pd.read_csv(main_xyz, delimiter=',', header=None, nrows=rowsize,
                                 skiprows=i)
    out_xyz = 'Profile' + str(i) + '.xyz'
    profile_raw_df.to_csv(out_xyz, index=False,
                          header=False, mode='a')
Problems I am facing:
At first the for loop produced output files, but now it does not produce any output, and it does not rewrite the previous files either. The other mystery is that I am not getting an error: the code executes without any error.
What I tried to fix the issue:
I updated all the packages and restarted PyCharm.
I ran each line of code one by one, and everything works until the for loop.
While counting the number of rows in
number_lines = sum(1 for row in (open(main_xyz)))
you have exhausted the iterator that loops over the lines of the file, but you never close the file. That should not, however, prevent pandas from reading the same file.
A better idiom would be
with open(main_xyz) as fh:
    number_lines = sum(1 for row in fh)
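Your for loop as it stands does not do what you probably want. range(number_lines, rowsize) is range(10854, 201), which is empty because its start already lies past its stop, so the loop body never runs and no error is raised. I guess you want: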
for i in range(0, number_lines, rowsize):
so that rowsize is the step size instead of the end value of the for loop.
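Here is a quick check of what the two range calls produce, using the line count (10854) and rowsize (201) from the question. Since 10854 / 201 is exactly 54, the corrected range yields one start index per dataset.

```python
number_lines = 10854   # line count printed in the question
rowsize = 201          # rows per dataset

original = range(number_lines, rowsize)   # start 10854 > stop 201: empty range
fixed = range(0, number_lines, rowsize)   # start, stop, step

print(len(original))     # 0, so the loop body never runs
print(len(fixed))        # 54 datasets
print(list(fixed)[:3])   # [0, 201, 402]
```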
If you want to number the output files by dataset, keep a count of the datasets, like this:
data_set = 0
for i in range(0, number_lines, rowsize):
    data_set += 1
    ...
    out_xyz = f"Profile{data_set}.xyz"
    ...
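If pandas isn't needed for this step, the split can be done in one pass with the standard library. This is a swapped-in technique rather than the answer's code. It reads the file once instead of re-reading it with skiprows on every iteration. It assumes the .xyz file is plain text with no header and exactly 201 lines per dataset. The name split_xyz and its parameters are hypothetical.

Each output file is opened with "w". By contrast, the question's to_csv(..., mode='a') would append duplicate rows to existing ProfileN.xyz files on every re-run.

```python
import os
from itertools import islice


def split_xyz(path, rows_per_set=201, out_dir=".", prefix="Profile"):
    """Write each consecutive block of rows_per_set lines to its own file.

    Returns the list of file names written. Output files are opened with
    "w", so re-running the script overwrites them instead of appending.
    """
    written = []
    data_set = 0
    with open(path) as src:
        while True:
            block = list(islice(src, rows_per_set))  # next rows_per_set lines
            if not block:
                break  # end of input
            data_set += 1
            out_name = os.path.join(out_dir, f"{prefix}{data_set}.xyz")
            with open(out_name, "w") as dst:
                dst.writelines(block)
            written.append(out_name)
    return written
```

With the question's 10854-line file, split_xyz(main_xyz) should write Profile1.xyz through Profile54.xyz into the current directory.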

Update pandas data into existing csv

I have a CSV that I'm creating from a pandas DataFrame.
But as soon as I append to it, it throws: OSError: [Errno 95] Operation not supported
for single_date in [d for d in (start_date + timedelta(n) for n in range(day_count)) if d <= end_date]:
    currentDate = datetime.strftime(single_date, "%Y-%m-%d")
    # Send request for one day to the API and store it in a daily csv file
    response = requests.get(endpoint + f"?startDate={currentDate}&endDate={currentDate}", headers=headers)
    rawData = pd.read_csv(io.StringIO(response.content.decode('utf-8')))
    outFileName = 'test1.csv'
    outdir = '/dbfs/mnt/project/test2/'
    if not os.path.exists(outdir):
        os.mkdir(outdir)
    fullname = os.path.join(outdir, outFileName)
    pdf = pd.DataFrame(rawData)
    if not os.path.isfile(fullname):
        pdf.to_csv(fullname, header=True, index=False)
    else:  # else it exists, so append without writing the header
        with open(fullname, 'a') as f:  # This part gives the error. If I use 'w' as the mode, it overwrites and works fine.
            pdf.to_csv(f, header=False, index=False, mode='a')
I am guessing it's because you opened the file in append mode and then passed mode='a' again in your call to to_csv. Can you simply try this instead?
pdf = pd.DataFrame(rawData)
if not os.path.isfile(fullname):
    pdf.to_csv(fullname, header=True, index=False)
else:  # else it exists so append without writing the header
    pdf.to_csv(fullname, header=False, index=False, mode='a')
Appending didn't work out, so I created Parquet files and then read them as DataFrames.
I had a similar issue, and the root cause was that Databricks Runtime > 6 does not support append or random-write operations on files stored in DBFS. It worked fine for me until I updated my runtime from 5.5 to 6, which they suggested because they were no longer supporting runtimes below 6 at that time.
I followed this workaround: read the file in code, append the data, and overwrite the file.
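Here is a minimal pandas sketch of that read/append/overwrite workaround. It assumes the existing CSV fits in memory and has the same columns as the new DataFrame. It never opens the file in append mode; it only uses to_csv's default write mode, which the question reports works on that mount. append_by_rewrite is an illustrative name.

```python
import os

import pandas as pd


def append_by_rewrite(pdf, fullname):
    """Append pdf to the CSV at fullname by rewriting the whole file.

    Intended for storage that rejects append-mode writes: the file is only
    ever opened for writing from scratch (to_csv's default mode "w").
    """
    if os.path.isfile(fullname):
        existing = pd.read_csv(fullname)
        combined = pd.concat([existing, pdf], ignore_index=True)
    else:
        combined = pdf
    combined.to_csv(fullname, header=True, index=False)
    return combined
```

The trade-off is that the whole file is read and rewritten on every call, so this gets slower as the file grows. That cost is one reason the Parquet route mentioned above may scale better.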

add new row to numpy using realtime reading

I am using a microstacknode accelerometer and intend to save its readings into a CSV file.
while True:
    numpy.loadtxt('foo.csv', delimiter=",")
    raw = accelerometer.get_xyz(raw=True)
    g = accelerometer.get_xyz()
    ms = accelerometer.get_xyz_ms2()
    a = numpy.asarray([[raw['x'], raw['y'], raw['z']]])
    numpy.savetxt("foo.csv", a, delimiter=",", newline="\n")
However, only one line is ever saved. Any help? I'm still quite a newbie at Python.
NumPy is not the best solution for this type of things.
This should do what you intend:
while True:
    raw = accelerometer.get_xyz(raw=True)
    fobj = open('foo.csv', 'a')
    fobj.write('{},{},{}\n'.format(raw['x'], raw['y'], raw['z']))
    fobj.close()
Here fobj = open('foo.csv', 'a') opens the file in append mode, so if the file already exists, the next write goes to the end of the file and the existing data is kept.
Let's have a look at your code. This line:
numpy.loadtxt('foo.csv', delimiter=",")
reads the whole file but does not do anything with the data it read, because you don't assign it to a variable. You would need to do something like this:
data = numpy.loadtxt('foo.csv', delimiter=",")
This line:
numpy.savetxt("foo.csv",a,delimiter=",",newline="\n")
creates a new file named foo.csv, overwriting the existing one. Therefore you see only one line: the last one written.
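If you would rather keep numpy, numpy.savetxt also accepts an already-open file object. Opening the file in append mode therefore adds rows instead of recreating the file. The sketch below works under that approach. append_rows is a hypothetical helper, and fmt="%g" is chosen only to keep the numbers short (the default format is "%.18e").

```python
import numpy as np


def append_rows(path, rows):
    """Append one or more rows to a CSV file using numpy.savetxt.

    savetxt accepts an open file object; opening it with "a" makes every
    call add lines at the end instead of recreating the file.
    """
    with open(path, "a") as fh:
        # atleast_2d turns a single reading like [x, y, z] into one row
        np.savetxt(fh, np.atleast_2d(rows), delimiter=",", fmt="%g")
```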
This should do the same, but it does not open and close the file every time:
with open('foo.csv', 'a') as fobj:
    while True:
        raw = accelerometer.get_xyz(raw=True)
        fobj.write('{},{},{}\n'.format(raw['x'], raw['y'], raw['z']))
The with open() statement opens the file with the promise to close it even if an exception occurs, for example when you break out of the while True loop with Ctrl-C.
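Below is a variant of the answer's approach using csv.writer, written as a testable sketch.

* read_xyz is a hypothetical stand-in for accelerometer.get_xyz(raw=True).
* A fixed n_samples replaces while True so the function terminates.
* newline="" is what the csv module documentation recommends for files it writes.
* flush() hands each row to the operating system immediately, so fewer rows sit in Python's buffer if the script is killed.

```python
import csv


def log_readings(path, read_xyz, n_samples):
    """Append n_samples readings from read_xyz() to path as CSV rows.

    read_xyz stands in for accelerometer.get_xyz(raw=True): any callable
    returning a dict with 'x', 'y' and 'z' keys. The file is opened once.
    """
    with open(path, "a", newline="") as fobj:
        writer = csv.writer(fobj)
        for _ in range(n_samples):
            raw = read_xyz()
            writer.writerow([raw["x"], raw["y"], raw["z"]])
            fobj.flush()  # hand each row to the OS right away


# Hypothetical usage with the sensor from the question:
# log_readings("foo.csv", lambda: accelerometer.get_xyz(raw=True), 1000)
```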

How to check if a value in a variable is in a list held in another variable

import csv
samsung = ['samsung','s1','s2','s3','s4','s5','s6','s7','galaxy']
iphone = ['iphone']
problemTypes = []
solution = []
instruction = []
def solve():
    def foundProblem():
        for queryPart in whatProblem:
            for problem in problemTypes:
                if queryPart == problem:
                    return True
    solved = False
    readCSV = csv.reader(csvfile, delimiter = ',')
    for row in readCSV:
        solution = row[0]
        problemTypes = row[1].split()
        instruction = row[2]
        if foundProblem():
            print('The solution is to '+solution)
            print(instruction)
            solved = True
    if solved == False:
        print('Solution not found.\nPlease contact your supplier')
whatProblem = str(input('What seems to be the issue with your smartphone?\n')).lower().split()
version = input('What type of phone do you have?\n').lower().split()
if version == iphone:
    with open('iPhone.csv') as csvfile:
        solve()
elif version in samsung:
    with open('samsung.csv') as csvfile:
        solve()
else:
    print('Phone not supported')
This is an attempt at creating a troubleshooter using multiple CSV files, but I have a problem with the Samsung part: it does not recognize that the input is part of the samsung list. I am new here, so if I have formatted this wrong, please let me know. If the solution is extremely simple, please bear in mind that I am new to coding.
Try at least to change this line:
version = input('What type of phone do you have?\n').lower().split()
into:
version = input('What type of phone do you have?\n').lower().split()[0]
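With this change version is a string rather than a list, so the iPhone test should become version in iphone as well. Also, matching the input this way forces the user to type one exact keyword such as 'samsung', which is not the most accessible approach. Keep on learning and trying, and it will work out fine!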
It's not entirely clear which bit you're having a problem with, but this extract looks wrong:
with open('iPhone.csv') as csvfile:
    solve()
You probably intend to use csvfile within the block:
with open('iPhone.csv') as csvfile:
    solve(csvfile)
and to change the implementation of solve to accept csvfile as an argument. As it is, it looks like you're trying (and failing) to communicate via a global variable; even if that did work, it's a poor practice that leads to unmaintainable code!
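The sketch below combines the answers above. The CSV file is passed to solve() as an argument instead of being picked up as a global. The phone check tests each typed word against the lists, instead of comparing a whole list with a list. The CSV layout (solution, space-separated problem keywords, instruction) is inferred from row[0], row[1].split() and row[2] in the question. pick_csv, SAMSUNG and IPHONE are illustrative names.

```python
import csv

SAMSUNG = ['samsung', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 'galaxy']
IPHONE = ['iphone']


def pick_csv(version_words):
    """Return the CSV file name for the phone the user typed, or None."""
    if any(word in IPHONE for word in version_words):
        return 'iPhone.csv'
    if any(word in SAMSUNG for word in version_words):
        return 'samsung.csv'
    return None


def solve(csvfile, query_words):
    """Return (solution, instruction) pairs whose keywords match the query."""
    matches = []
    for row in csv.reader(csvfile):
        solution, problem_types, instruction = row[0], row[1].split(), row[2]
        if any(word in problem_types for word in query_words):
            matches.append((solution, instruction))
    return matches


# Usage with the question's prompts:
# whatProblem = input('What seems to be the issue with your smartphone?\n').lower().split()
# version = input('What type of phone do you have?\n').lower().split()
# csv_name = pick_csv(version)
# if csv_name is None:
#     print('Phone not supported')
# else:
#     with open(csv_name) as csvfile:
#         matches = solve(csvfile, whatProblem)
```

Returning the matches instead of printing inside solve() makes the "Solution not found" case a simple check for an empty list.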
I'm not sure exactly what your problem is either, but maybe try this statement instead of your current elif statement:
elif any(version in s for s in samsung):

Editing an .odt file using Python

First off, I must say I am VERY new to programming (less than a week of experience in total). I set out to write a program that generates a series of documents from an .odt template. I want to use a template with specific keywords, let's say "X1234X" and so on, which will then be replaced by values generated by the program. Each document is a little different, and the values are entered and calculated via a prompt (dates and other things).
I have written most of the code so far, but I have been stuck on this problem for 2 days. I used the ezodf module to generate a new document (with a different filename) from a template, but I am stuck on how to edit its content.
I googled hard but came up empty; I hope someone here can help. I tried reading the documentation, but I must be honest: it's a bit tough to understand. I am not familiar with the "slang".
Thanks
PS: an ezodf method would be great, but any other way will do too. The program doesn't have to be pretty; it just has to work (so I can work less ^_^)
Well, I figured it out and finished the program. I used ezodf to create the file, then zipfile to extract and edit content.xml, and then repacked the whole thing with a nice helper function I found here. I tried to mess with etree, but I couldn't figure it out.
from ezodf import newdoc
import os
import zipfile
import tempfile

def updateZip(zipname, filename, data):
    # generate a temp file
    tmpfd, tmpname = tempfile.mkstemp(dir=os.path.dirname(zipname))
    os.close(tmpfd)
    # create a temp copy of the archive without filename
    with zipfile.ZipFile(zipname, 'r') as zin:
        with zipfile.ZipFile(tmpname, 'w') as zout:
            zout.comment = zin.comment  # preserve the comment
            for item in zin.infolist():
                if item.filename != filename:
                    zout.writestr(item, zin.read(item.filename))
    # replace with the temp archive
    os.remove(zipname)
    os.rename(tmpname, zipname)
    # now add filename with its new data
    with zipfile.ZipFile(zipname, mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(filename, data)

# temp2 and cname come from earlier in the program (not shown)
for s in temp2:
    input2 = str(s)
    input1 = cname[0]
    namef = input2 + input1 + '.odt'
    odt = newdoc(doctype='odt', filename=namef, template='template.odt')
    odt.save()
    with zipfile.ZipFile('template.odt') as a:
        content = a.read('content.xml').decode('utf8')
    content = content.replace('XXDATEXX', input2)
    content = content.replace('XXNAMEXX', input1)
    updateZip(namef, 'content.xml', content)
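For comparison, here is a stdlib-only sketch that does the copy and the replacement in a single pass. It writes content.xml with its new text while the template is being copied, so the delete-and-re-add step of updateZip isn't needed.

* fill_odt_template is a hypothetical name.
* It assumes each placeholder is stored as one unbroken string in content.xml. If the word processor splits a placeholder across formatting spans, plain str.replace will not find it.
* Each member is written back with its original ZipInfo, which keeps the uncompressed mimetype entry first, as ODF readers expect.

```python
import zipfile


def fill_odt_template(template, output, replacements):
    """Copy template to output, replacing placeholder strings in content.xml.

    Every other archive member (including the uncompressed 'mimetype' entry
    that ODF readers expect first) is copied unchanged with its original
    ZipInfo, so member order and compression settings are preserved.
    """
    with zipfile.ZipFile(template) as zin, zipfile.ZipFile(output, "w") as zout:
        zout.comment = zin.comment
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for placeholder, value in replacements.items():
                    text = text.replace(placeholder, value)
                data = text.encode("utf-8")
            zout.writestr(item, data)
```

With the variables from the loop above, a call would look like fill_odt_template('template.odt', namef, {'XXDATEXX': input2, 'XXNAMEXX': input1}).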
