Rename images in folder based on csv in Python - python-3.x

I have a folder with a couple thousand images named: 10000.jpg, 10001.jpg, etc.; and a csv file with two columns: id and name.
The csv id matches the images in the folder.
I need to rename the images as per the name column in the csv (e.g. from 10000.jpg to name1.jpg).
I've been trying os.rename() inside a for loop, as below.
import csv
import os

with open('train_labels.csv') as f:
    lines = csv.reader(f)
    for line in lines:
        os.rename(line[0], line[1])
This gives me an encoding error inside the loop.
Any idea what I'm missing in the logic?
Also tried another strategy (below), but got the error: IndexError: list index out of range.
with open('train_labels.csv', 'rb') as csvfile:
    lines = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for line in lines:
        os.rename(line[0], line[1])

I also got the same error. When I opened the CSV file in Notepad, I found that there was no comma between the ID and the name, so please check that first. Otherwise, you can see the solutions in Renaming images in folder.
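If the comma is there and you still hit the encoding error, here is a minimal sketch of the rename loop, assuming the CSV has id and name columns (without the .jpg extension) and sits in the same folder as the images; opening with encoding='utf-8-sig' also strips a BOM, a common cause of that error:

import csv
import os

with open('train_labels.csv', newline='', encoding='utf-8-sig') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row, if there is one
    for image_id, name in reader:
        # Both columns are assumed to lack the .jpg extension.
        os.rename(image_id + '.jpg', name + '.jpg')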

Related

One person's name in my dataframe keeps showing up as \ufeff when I print to the console

I have Python code that loads a group of exam results. Each exam is saved in its own CSV file.
import glob
import pandas as pd

files = glob.glob('Exam *.csv')
frame = []
files1 = glob.glob('Exam 1*.csv')
for file in files:
    frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))
for file in files1:
    frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))
There is one person in the whole dataframe whose name column shows up as
\ufeffStudents Name
It happens for every single exam. I tried using the encoding argument, but that's not fixing the issue. I am out of ideas. Does anyone have anything?
That character is the BOM or "Byte Order Mark."
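A quick way to confirm it (a sketch; the file name is an assumption):

# The first three raw bytes of a UTF-8-with-BOM file are b'\xef\xbb\xbf',
# which decode to '\ufeff'.
with open('Exam 1.csv', 'rb') as f:
    print(f.read(3))  # prints b'\xef\xbb\xbf' when a BOM is present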
There are several ways to resolve it.
First, I suggest adding the engine parameter (for example, engine='python') to pd.read_csv() when reading the CSV files:
pd.read_csv(file, index_col=[0], engine='python', encoding='utf-8-sig')
Secondly, you can simply remove it by replacing it with an empty string ('').
df['student_name'] = df['student_name'].apply(lambda x: x.replace("\ufeff", ""))
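Putting both together, a minimal sketch (the file name and the column name student_name are assumptions for illustration):

import pandas as pd

df = pd.read_csv('Exam 1.csv', index_col=[0], engine='python',
                 encoding='utf-8-sig')
# Strip any BOM characters that survive in the data itself.
df['student_name'] = df['student_name'].str.replace('\ufeff', '')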

Python problems writing rows in CSV

I have a script that reads a CSV and saves the second column to a list. I'm trying to get it to write the contents of the list to a new CSV. The problem is that every entry should have its own row, but the new file puts everything into the same row.
I've tried moving the second with open block inside the first, and I've tried adding a for loop to the second with open, but no matter what I try I don't get the right results.
Here is the code:
import csv

col_store = []
with open('test-data.csv', 'r') as rf:
    reader = csv.reader(rf)
    for row in reader:
        col_store.append(row[1])
with open('meow.csv', 'wt') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows([col_store])
In your case, if you have a column of single letters/numbers, then Y.R's answer will work.
To have code that works in all cases, use this:
with open('meow.csv', 'wt') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(([_] for _ in col_store))
The docs mention that writerows expects an iterable of row objects, and that every row object should be an iterable of strings or numbers for Writer objects.
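As a quick demonstration (a sketch) of why the wrapping matters: a bare string is itself an iterable, so csv.writer splits it into one cell per character.

import csv
import io

buf = io.StringIO()
# Writing a bare multi-character string as a "row" splits it into
# one cell per character.
csv.writer(buf).writerows(['abc'])
print(buf.getvalue())  # prints: a,b,c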
The problem is that you are calling writerows with [col_store], which treats the whole of col_store as a single row.
The simplest approach to fixing this is calling:
csv_writer.writerows(col_store)
# instead of
csv_writer.writerows([col_store])
However, this will probably lead to an unwanted result: blank lines between the rows.
To solve this, use:
with open('meow.csv', 'wt', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(col_store)
For more about this, see CSV file written with Python has blank lines between each row
Note: writerows expects 'an iterable of row objects' and 'row objects must be an iterable of strings or numbers' (https://docs.python.org/3/library/csv.html).
Therefore, in the generic case (trying to write integers, for example), you should use Sam's solution.
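Putting both answers together, a minimal end-to-end sketch (file names and column index taken from the question):

import csv

col_store = []
with open('test-data.csv', newline='') as rf:
    for row in csv.reader(rf):
        col_store.append(row[1])
# Wrap each value in its own list so writerows emits one row per value,
# and pass newline='' to avoid blank lines between rows on Windows.
with open('meow.csv', 'w', newline='') as f:
    csv.writer(f).writerows([value] for value in col_store)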

Deleting a particular column/row from a CSV file using python

I want to delete the row whose column value matches a given user input.
Let's say I get an employee ID and delete all of its corresponding values in that row.
I'm not sure how to approach this problem; other sources suggest using a temporary CSV file to copy all the values into and re-iterate.
Since these are very primitive requirements, I would just do it manually.
Read the file line by line; if you want to delete the current line, just don't write it back.
If you want to delete a column, parse each line as CSV (using the csv module; do not use .split(',')!) and discard the correct column.
The upside of this approach is that it's very light on memory and about as fast as it can get runtime-wise.
That's pretty much the way to do it.
Something like:
import shutil

file_path = "test.csv"

# Creates a test file
data = ["Employee ID,Data1,Data2",
        "111,Something,Something",
        "222,Something,Something",
        "333,Something,Something"]
with open(file_path, 'w') as write_file:
    for item in data:
        write_file.write(item + "\n")
# /Creates a test file

input("Look at the test.csv file if you like, close it, then press enter.")

employee_ID = "222"
with open(file_path) as read_file:
    with open("temp_file.csv", 'w') as temp_file:
        for line in read_file:
            if employee_ID in line:
                continue  # skip the matching row instead of copying it
            temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.
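A sketch of that column-aware variant, reusing the test.csv created above and assuming the Employee ID is the first column:

import csv
import shutil

employee_ID = "222"
with open("test.csv", newline='') as read_file, \
        open("temp_file.csv", 'w', newline='') as temp_file:
    writer = csv.writer(temp_file)
    for row in csv.reader(read_file):
        if row and row[0] == employee_ID:
            continue  # drop only rows whose Employee ID column matches
        writer.writerow(row)
shutil.move("temp_file.csv", "test.csv")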

I want to create a corpus in python from multiple text files

I want to do text analytics on some text data. The issue is that so far I have worked with a CSV file or just one file, but here I have multiple text files. So my approach is to combine them all into one file and then use nltk to do some text preprocessing and further steps.
I tried to download the gutenberg package from nltk, and I am not getting any error in the code. But I am not able to see the content of the 1st text file in the 1st cell, the 2nd text file in the 2nd cell, and so on. Kindly help.
import nltk

filenames = [
    "246.txt",
    "276.txt",
    "286.txt",
    "344.txt",
    "372.txt",
    "383.txt",
    "388.txt",
    "392.txt",
    "556.txt",
    "665.txt"
]
with open("result.csv", "w") as f:
    for filename in filenames:
        f.write(nltk.corpus.gutenberg.raw(filename))
Expected result: I should get one CSV file with the contents of these 10 text files listed in 10 different rows.
import nltk

filenames = [
    "246.txt",
    "276.txt",
    "286.txt",
    "344.txt",
    "372.txt",
    "383.txt",
    "388.txt",
    "392.txt",
    "556.txt",
    "665.txt"
]
with open("result.csv", "w") as f:
    for index, filename in enumerate(filenames):
        f.write(nltk.corpus.gutenberg.raw(filename))
        # Append a comma after the file content when filename
        # is not the last file in the list.
        if index != (len(filenames) - 1):
            f.write(",")
Output:
this,is,a,sentence,spread,over,multiple,files,and,the end
Code and .txt files are available at https://github.com/michaelhochleitner/stackoverflow.com-questions-57081411 .
Using Python 2.7.15+ and nltk 3.4.4. I had to move the .txt files to /home/mh/nltk_data/corpora/gutenberg.
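If you want each file's content in its own row rather than separated by commas, a Python 3 sketch using csv.writer (same file list, shortened here for illustration) would be:

import csv
import nltk

# csv.writer quotes each file's raw text, so embedded commas and
# newlines stay inside a single cell and each file becomes one row.
filenames = ["246.txt", "276.txt"]
with open("result.csv", "w", newline='') as f:
    writer = csv.writer(f)
    for filename in filenames:
        writer.writerow([nltk.corpus.gutenberg.raw(filename)])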

Removing extra column in a csv file while exporting data using python3

I wrote a function in Python 3 which merges some files in the same directory and returns a CSV file as the output. The problem with the CSV file is that I get one extra column at the beginning which does not have a header, and the rows of that column are numbers starting from 0. How do I write the CSV file without the extra column?
You can split each line by ',' and then use slicing to remove the first element.
example:
original = """col1,col2,col3
0,val01,val02,val03
1,val11,val12,val13
2,val21,val22,val23
"""
original_lines = original.splitlines()
result = original_lines[:1] # copy header
for line in original_lines[1:]:
    result.append(','.join(line.split(',')[1:]))
print('\n'.join(result))
Output:
col1,col2,col3
val01,val02,val03
val11,val12,val13
val21,val22,val23
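As an aside, if the merged file is produced with pandas (a common source of an unnamed 0, 1, 2, ... column), the cleaner fix is to suppress the index at write time rather than post-process the text; the file names below are hypothetical:

import pandas as pd

# Read the merged file, treating the extra leading column as the index,
# then write it back without that index column.
df = pd.read_csv('merged.csv', index_col=0)
df.to_csv('merged_clean.csv', index=False)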
