Merge line in csv file python - python-3.x

I have this in csv file:
Titre,a,b,c,d,e
01,jean,paul,,
01,,,jack,
02,jeanne,jack,,
02,,,jean
and i want :
Titre,a,b,c,d,e
01,jean,paul,jack,
02,jeanne,jack,,jean
can you help me ?

In general, a good approach is to read the csv file and iterate through the rows using Python's CSV module.
CSV will create an iterator that will let you loop through your file like this:
import csv
with open('your filename.csv', 'r') as infile:
reader = csv.reader(infile)
for line in reader:
for value in line:
# Do your thing
You're going to need to construct a new data set that has different properties. The requirements you described:
Ignore any empty cells
Any time you encounter a row that has a new index number, add a new row to your new data set
Any time you encounter a row that has an index number you've seen before, add it to the row that you already created (except for that index number value itself)
I'm not writing that part of the code for you because you need to learn and grow. It's a good task for a beginner.
Once you've constructed that data set, it will look like this:
example_processed_data = [["Titre","a","b","c","d","e"],
["01","jean","paul","jack"],
["02","jeanne","jack","","jean"]]
You can then create a CSV writer, and create your outfile by iterating over that data, similarly to how you iterated over the infile:
with open('outfile.csv', 'w') as outfile:
writer = csv.writer(outfile)
for line in example_processed_data:
writer.writerow(line)
print("Done! Wrote", len(example_processed_data), "lines to outfile.csv.")

Related

Python problems writing rows in CSV

I have this script that reads a CSV and saves the second column to a list, I'm trying to get it to write the contents of the list to a new CSV. The problem is every entry should have its own row but the new file sets everything into the same row.
I've tried moving the second with open code to within the first with open and I've tried adding a for loop to the second with open but no matter what I try I don't get the right results.
Here is the code:
import csv
col_store=[]
with open('test-data.csv', 'r') as rf:
reader = csv.reader(rf)
for row in reader:
col_store.append(row[1])
with open('meow.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerows([col_store])
In your case if you have a column of single letters/numbers then Y.R answer will work.
To have a code that works in all cases, use this.
with open('meow.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerows(([_] for _ in col_store))
From here it is mentioned that writerows expect an an iterable of row objects. Every row object should be an iterable of strings or numbers for Writer objects
The problem is that you are using 'writerows' treating 'col_store' as a list with one item.
The simplest approach to fixing this is calling
csv_writer.writerows(col_store)
# instead of
csv_writer.writerows([col_store])
However, this will lead to a probably unwanted result - having blank lines between the lines.
To solve this, use:
with open('meow.csv', 'wt', newline='') as f:
csv_writer = csv.writer(f)
csv_writer.writerows(col_store)
For more about this, see CSV file written with Python has blank lines between each row
Note: writerows expects 'an iterable of row objects' and 'row objects must be an interable of strings or numbers'.
(https://docs.python.org/3/library/csv.html)
Therefore, in the generic case (trying to write integers for examlpe), you should use Sam's solution.

Deleting a particular column/row from a CSV file using python

I want to delete a particular row from a given user input that matches with a column.
Let's say I get an employee ID and delete all it's corresponding values in the row.
Not sure how to approach this problem and other sources suggest using a temporary csv file to copy all values and re-iterate.
Since these are very primitive requirements, I would just do it manually.
Read it line by line - if you want to delete the current line, just don't write it back.
If you want to delete a column, for each line, parse it as csv (using the module csv - do not use .split(',')!) and discard the correct column.
The upside of these solutions is that it's very light on the memory and as fast as it can be runtime-wise.
That's pretty much the way to do it.
Something like:
import shutil
file_path = "test.csv"
# Creates a test file
data = ["Employee ID,Data1,Data2",
"111,Something,Something",
"222,Something,Something",
"333,Something,Something"]
with open(file_path, 'w') as write_file:
for item in data:
write_file.write(item + "\n")
# /Creates a test file
input("Look at the test.csv file if you like, close it, then press enter.")
employee_ID = "222"
with open(file_path) as read_file:
with open("temp_file.csv", 'w') as temp_file:
for line in read_file:
if employee_ID in line:
next(read_file)
temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.

Write into CSV only if row does not exist

I am saving an array into a csv file, using this code:
def save_annotations(**kwargs):
ann = request.get_json()
print(ann)
filename = ann[3].split('.')[0]
run_id = ann[4]
run_number = ann[4].split('/')[0]
exp_id = ann[4].split('/')[1]
ann_type = ann[2]
if ann_type == 'wrongDetection':
with open(f"/code/data/mlruns/{run_number}/{exp_id}/wrong_annotations_{filename}_{run_id.replace('/', '_')}.csv",'a') as w_ann:
writer = csv.writer(w_ann, delimiter=',')
writer.writerow(ann[0:2])
w_ann.close()
else:
with open(f"/code/data/mlruns/{run_number}/{exp_id}/new_detections_{filename}_{run_id.replace('/', '_')}.csv",'a') as w_ann:
writer = csv.writer(w_ann, delimiter=',')
writer.writerow(ann[0:2])
w_ann.close()
However, I don't want repeated rows in my csv file. I only want to write to csv if ann[0] and ann[1] are not in the csv already.
What would be the best approach to do this?
kind regards
One way to do this would be to collect the already existing values in a set, and check new values to see if they are in the set before processing. You would need a set for each csv file.
For example:
def build_set(filename):
with open(filename, 'r') as f:
reader = csv.reader(f)
# Skip header row, if necessary
next(reader)
return {tuple(row[0:2]) for row in reader}
Then in your function you could do:
if tuple(ann[0:2]) in set_for_this_file:
continue
set_for_this_file.add(tuple(ann[0:2]))
# write data to file
Building the sets would require reading through all the csv files each time the program is executed, which might be inefficient if the files were large and/or numerous.
A more efficient approach might be to store the data in a database table, with columns for ann[0], ann[1], anntype, exp_id, run_mumber and run_id. Add a unique constraint over these columns and you would have the same functionality.

Writerows gives me a blank csv file

I made an array called list_of_rows, and looped appending list_of_cells to aforemention list_of_rows
I then attempt to make a csv file by creating one and assigning a variable called writer and well, it gives me a blank file
outfile = open("./inmates.csv", "wb")
writer = csv.writer(outfile)
writer.writerows(list_of_rows)
Its supposed to give me a list of rows with cells and I don't know what's wrong.

Original order of columns in csv not retained in unicodecsv.DictReader

I am trying read a CSV file into python 3 using unicodecsv library. Code follows :
with open('filename.csv', 'rb') as f:
reader = unicodecsv.DictReader(f)
Student_Data = list(reader)
But the order of the columns in the CSV file is not retained when I output any element from the Student_Data. The output contains any random order of the columns. Is there anything wrong with the code? How do I fix this?
As stated in csv.DictReader documentation, the DictReader object behaves like a dict - so it is not ordered.
You can obtain the list of the fieldnames with:
reader.fieldnames
But if you only want to obtain a list of the field values, in original order, you can just use a normal reader:
with open('filename.csv', 'rb') as f:
reader = unicodecsv.reader(f)
for row in reader:
Student_Data = row

Resources