Removing extra column in a csv file while exporting data using python3

I wrote a function in Python 3 that merges some files in the same directory and returns a CSV file as the output. The problem is that the CSV file has one extra column at the beginning: it has no header, and its other rows are numbers starting from 0. How can I write the CSV file without the extra column?

You can split each line on "," and then use slicing to remove the first element.
Example:
original = """col1,col2,col3
0,val01,val02,val03
1,val11,val12,val13
2,val21,val22,val23
"""
original_lines = original.splitlines()
result = original_lines[:1] # copy header
for line in original_lines[1:]:
    result.append(','.join(line.split(',')[1:]))
print('\n'.join(result))
Output:
col1,col2,col3
val01,val02,val03
val11,val12,val13
val21,val22,val23
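Splitting on "," breaks if a value itself contains a quoted comma. Here is a minimal sketch of the same idea using the csv module instead. The function name drop_first_column and the sample data are made up for illustration, and it assumes the header row also starts with an (empty) cell for the extra column, so the first field is dropped from every row. If the file happens to be written with pandas (the question does not say), passing index=False to DataFrame.to_csv should avoid writing that column in the first place.

```python
import csv
import io


def drop_first_column(csv_text):
    """Return csv_text with the first field of every row removed.

    Hypothetical helper: assumes every row, including the header,
    carries the unwanted leading field.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row[1:])  # slicing drops the leading field
    return out.getvalue()


# Made-up sample: the header starts with an empty cell, and one value
# contains a comma inside quotes.
sample = ',col1,col2\n0,"a,b",c\n1,d,e\n'
cleaned = drop_first_column(sample)
print(cleaned)
```

Because the rows are parsed rather than split, the quoted value "a,b" stays in one piece.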

Related

Stop reading the CSV file after finding empty rows python

I am trying to read a CSV file that has four parts on the same sheet, separated by some empty rows in the middle of the spreadsheet. I want pandas to stop reading the rest of the file as soon as it finds an empty row.
Edit: I need to elaborate on the problem. I have a CSV file that has 4 different sections separated by 3-4 empty rows. I need to extract each of these sections, or at least the first one. In other words, I want read_csv to stop when it finds the first empty row (after skipping the rows with details about the file, of course).
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile
import pandas as pd
url = urlopen("https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/30_Industry_Portfolios_CSV.zip")
zipfile = ZipFile(BytesIO(url.read()))
data = pd.read_csv(zipfile.open('30_Industry_Portfolios.CSV'),
                   header=0, index_col=0,
                   skiprows=11, parse_dates=True)
You could use a generator.
Suppose the csv module is generating rows.
(We might use yield from sheet,
except that we'll change the loop in a moment.)
import csv
import pandas as pd
def get_rows(csv_fspec, skip_rows=12):
    with open(csv_fspec) as fin:
        sheet = csv.reader(fin)
        for _ in range(skip_rows):
            next(sheet)  # discard initial rows
        for row in sheet:
            yield row
df = pd.DataFrame(get_rows(my_csv))
Now you want to ignore the rows after some condition is met,
perhaps once the first column is empty.
OK, that's simple enough; just change the loop body:
        for row in sheet:
            if row and row[0]:  # a blank line comes back as an empty list
                yield row
            else:
                break  # Ignore rest of input file.
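For reference, here is a self-contained sketch of that generator idea that runs without the downloaded file. The name first_section and the sample text are made up for illustration. Note one deliberate difference: it stops at the first row in which every cell is empty, rather than testing only the first column, so a header row with an empty first cell is still kept.

```python
import csv
import io


def first_section(lines, skip_rows=0):
    """Yield parsed rows until the first completely empty row.

    Hypothetical helper: `lines` is any iterable of text lines,
    such as an open file.
    """
    reader = csv.reader(lines)
    for _ in range(skip_rows):
        next(reader, None)  # discard descriptive rows at the top
    for row in reader:
        if not any(cell.strip() for cell in row):
            break  # blank row: ignore the rest of the input
        yield row


# Made-up sample: one note line, a first section, a blank row,
# then a second section that should not be read.
text = "note line\n,a,b\n1,2,3\n4,5,6\n\n,c,d\n7,8,9\n"
rows = list(first_section(io.StringIO(text), skip_rows=1))
print(rows)
```

The resulting list of rows can be handed to pd.DataFrame in the same way as the output of get_rows.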

Deleting a particular column/row from a CSV file using python

I want to delete a particular row whose column value matches a given user input.
Let's say I get an employee ID and delete all its corresponding values in the row.
I'm not sure how to approach this problem; other sources suggest copying all the values to a temporary CSV file and iterating again.
Since these are very primitive requirements, I would just do it manually.
Read it line by line - if you want to delete the current line, just don't write it back.
If you want to delete a column, for each line, parse it as csv (using the module csv - do not use .split(',')!) and discard the unwanted column.
The upside of these solutions is that they are very light on memory and about as fast as they can be at run time.
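A rough sketch of the column case, processing one row at a time with the csv module. The function name drop_column and the sample data are made up for illustration, and it assumes the first row is a header that names the column to discard.

```python
import csv
import io


def drop_column(src, dst, column_name):
    """Copy CSV rows from src to dst, leaving out one named column.

    Hypothetical helper: src and dst are open text streams, and the
    first row of src is assumed to be the header.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    index = header.index(column_name)  # ValueError if the name is absent
    writer.writerow(header[:index] + header[index + 1:])
    for row in reader:
        writer.writerow(row[:index] + row[index + 1:])


# Made-up sample with a quoted comma, which .split(',') would mishandle.
source = io.StringIO('Employee ID,Data1,Data2\n111,"a, b",c\n222,d,e\n')
target = io.StringIO()
drop_column(source, target, "Data2")
result = target.getvalue()
print(result)
```

Only one row is held in memory at a time, so the same function should work on real files opened with open(path, newline="").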
That's pretty much the way to do it.
Something like:
import shutil
file_path = "test.csv"
# Creates a test file
data = ["Employee ID,Data1,Data2",
        "111,Something,Something",
        "222,Something,Something",
        "333,Something,Something"]
with open(file_path, 'w') as write_file:
    for item in data:
        write_file.write(item + "\n")
# /Creates a test file
input("Look at the test.csv file if you like, close it, then press enter.")
employee_ID = "222"
with open(file_path) as read_file:
    with open("temp_file.csv", 'w') as temp_file:
        for line in read_file:
            if employee_ID in line:
                continue  # skip the matching row instead of writing it
            temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.
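A minimal sketch of that stricter check, comparing only the ID column after parsing each line with the csv module. The function name delete_rows_by_id, the id_column parameter and the demo data are assumptions made for illustration.

```python
import csv
import os
import tempfile


def delete_rows_by_id(path, employee_id, id_column=0):
    """Rewrite the CSV at `path` without rows whose ID column matches.

    Hypothetical helper: writes to a temporary file in the same
    directory, then replaces the original with it.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False, suffix=".csv"
    ) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row and row[id_column] == employee_id:
                continue  # drop this row
            writer.writerow(row)
    os.replace(dst.name, path)


# Demo with made-up data: "222" also appears as ordinary data in the
# first data row, which must survive.
demo_path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(demo_path, "w", newline="") as f:
    csv.writer(f).writerows([
        ["Employee ID", "Data1", "Data2"],
        ["111", "Something", "222"],
        ["222", "Something", "Something"],
        ["333", "Something", "Something"],
    ])
delete_rows_by_id(demo_path, "222")
with open(demo_path, newline="") as f:
    remaining = list(csv.reader(f))
print(remaining)
```

A plain substring test would also have removed the row for employee 111 here, because "222" occurs in its Data2 field.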

Reading from file returns 2 dictionaries

data = [line.strip('\n') for line in file3]
# print(data)
data2 = [line.split(',') for line in data]
data_dictionary = {t[0]:t[1] for t in data2}
print(data_dictionary)
So I'm reading content from a file under the assumption that there is no whitespace at the beginning of each line and no blank lines anywhere.
When I read this file, I first strip the newline character and then split the data on ',' because that is the separator used in the file. But when I build the dictionary, it returns two dictionaries instead of one, and it does the same for other files where I use this procedure. How do I fix this?
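For comparison, here is a self-contained sketch of the procedure described above, with the dictionary built and printed exactly once. The function name read_pairs and the sample contents are made up, and an in-memory file stands in for file3. It does not diagnose the duplicate output, but it may help to check whether the duplication comes from this code or from how it is called.

```python
import io


def read_pairs(lines):
    """Build a single dict from lines of the form 'key,value'.

    Hypothetical helper: assumes no blank lines and at least two
    comma-separated fields per line, as stated in the question.
    """
    result = {}
    for line in lines:
        fields = line.rstrip("\n").split(",")
        result[fields[0]] = fields[1]
    return result


# Stand-in for the opened file; the contents are invented.
file3 = io.StringIO("a,1\nb,2\nc,3\n")
data_dictionary = read_pairs(file3)
print(data_dictionary)
```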

Trouble writing a header line with a comma to an Excel csv file

I'm trying to write a simple header line in Intel Fortran (containing commas that are part of the content) to an Excel csv. What I'd like to see in the first two columns is:
FMG(1,1) FMG(2,1)
Enclosing each term in quotes, "FMG(i,j)", worked when I did it line by line:
Code: write (*,*) "FMG(1,1), kg/s (O2): ", FMG(1,1)
Output: FMG(1,1), kg/s (O2): 0.129000000000000
Some of the things I've tried include:
code: write (10,*) "FMG(1,1)","FMG(2,1)"
csv column output: FMG(1 1)FMG(2 1)
code: write (10,*) "FMG(1,1)" , "FMG(2,1)"
csv column output: FMG(1 1)FMG(2 1) (same thing)
code: write (10,*) " FMG(1,1)," "FMG(2,1)"
csv column output: FMG(1 1) FMG(2,1)
(This got the 2nd one right.)
CSV stands for Comma-Separated Values. If you output "FMG(1,1),FMG(1,2)", the text is split at each comma, and you get
FMG(1
1)
FMG(1
2)
which is what you are seeing. To keep the commas, each string needs to be enclosed in quotes inside the file itself. If you write
write (10,*) '"FMG(1,1)","FMG(2,1)"'
it might achieve what you are looking for.
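The effect of the quotes can be checked outside Fortran. The following is a small Python sketch (not Fortran, and not how Excel itself is implemented) that parses both versions of the header line with the csv module, which uses the same double-quote convention for fields that contain commas.

```python
import csv
import io

# The header line as written without quotes, and with each term quoted.
unquoted = 'FMG(1,1),FMG(2,1)\n'
quoted = '"FMG(1,1)","FMG(2,1)"\n'

unquoted_fields = next(csv.reader(io.StringIO(unquoted)))
quoted_fields = next(csv.reader(io.StringIO(quoted)))

print(unquoted_fields)  # ['FMG(1', '1)', 'FMG(2', '1)']
print(quoted_fields)    # ['FMG(1,1)', 'FMG(2,1)']
```

The unquoted line comes apart into four fields, while the quoted line gives the two intended column headers.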

Rename images in file based on csv in Python

I have a folder with a couple thousand images named: 10000.jpg, 10001.jpg, etc.; and a csv file with two columns: id and name.
The csv id matches the images in the folder.
I need to rename the images as per the name column in the csv (e.g. from 10000.jpg to name1.jpg).
I've been trying the os.rename() inside a for loop as per below.
with open('train_labels.csv') as f:
    lines = csv.reader(f)
    for line in lines:
        os.rename(line[0], line[1])
This gives me an encoding error inside the loop.
Any idea what I'm missing in the logic?
Also tried another strategy (below), but got the error: IndexError: list index out of range.
with open('train_labels.csv', 'rb') as csvfile:
    lines = csv.reader(csvfile, delimiter = ' ', quotechar='|')
    for line in lines:
        os.rename(line[0], line[1])
I got the same error. When I opened the CSV file in Notepad, I found that there was no comma between the ID and the name, so please check that. Otherwise, you can see the solutions in Renaming images in folder.
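Once the file really is comma-separated, the renaming loop could look roughly like this. It is only a sketch under several assumptions: the CSV has a header row with the columns id and name, it is UTF-8 encoded, and neither column includes the .jpg extension. The function name rename_images and the demo files are made up.

```python
import csv
import os
import tempfile


def rename_images(folder, csv_path, extension=".jpg"):
    """Rename <id>.jpg to <name>.jpg for each row of the CSV.

    Hypothetical helper: skips rows whose source file is missing or
    whose target name already exists.
    """
    renamed = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # the header row supplies the keys
            src = os.path.join(folder, row["id"] + extension)
            dst = os.path.join(folder, row["name"] + extension)
            if os.path.exists(src) and not os.path.exists(dst):
                os.rename(src, dst)
                renamed.append((src, dst))
    return renamed


# Demo in temporary folders with empty stand-in image files.
folder = tempfile.mkdtemp()
for image_id in ("10000", "10001"):
    open(os.path.join(folder, image_id + ".jpg"), "w").close()
csv_path = os.path.join(tempfile.mkdtemp(), "train_labels.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(
        [["id", "name"], ["10000", "name1"], ["10001", "name2"]]
    )
renamed = rename_images(folder, csv_path)
print(sorted(os.listdir(folder)))
```

Joining each name with the folder path and adding the extension matters here, since the csv holds bare ids and names rather than file names.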
