Python 3.7: Batch renaming numbered files in a directory while preserving their sequence

I'm relatively new to Python and have only recently started using it for data analysis. I have a set of image files in a directory that were acquired in sequence, and they are named like so:
IMG_E5.1.tif
IMG_E5.2.tif
IMG_E5.3.tif
...
...
IMG_E5.107.tif
I would like to replace the dot and the number following it with an underscore and a four-digit integer, while preserving the initial numbering of the file, like so:
IMG_E5_0001.tif
IMG_E5_0002.tif
IMG_E5_0003.tif
...
...
IMG_E5_0107.tif
Could you advise me on how this can be done or, if there is already an answer that I'm not aware of, link me to it? Many thanks!

I managed to find a method that works for this:
import os
import os.path as path
from glob import glob
# Get current working directory
file_path = os.getcwd()
file_list = []
for i in range(1, 500):
    # Generate file name (with wildcards) to search for
    file_name = path.join(file_path, "IMG*." + str(i) + ".tif")
    # Search for files
    file = glob(file_name)
    # If anything was found, append the first match to the list
    if file:
        file_list.append(file[0])
for file in file_list:
    # Split at the last two periods only, so periods elsewhere in the path are left alone
    file_name, file_num, file_ext = file.rsplit(".", 2)
    file_new = file_name + "_" + file_num.zfill(4) + "." + file_ext
    os.rename(file, file_new)
I am still relatively inexperienced with coding, so if there is a more straightforward and efficient way to tackle this problem, do let me know. Thanks.
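One possibly more direct route is to list the directory once and let a regular expression pick out the number, instead of probing for each number from 1 to 499. The following is only a sketch under assumptions: the function name pad_sequence_numbers and its width parameter are made up for illustration, every file to rename is assumed to look like stem.N.tif, and no target name is assumed to exist already.

```python
import re
from pathlib import Path

# Matches names such as "IMG_E5.12.tif": a stem, a dot, a number, then ".tif".
PATTERN = re.compile(r"^(?P<stem>.+)\.(?P<num>\d+)\.tif$")


def pad_sequence_numbers(directory, width=4):
    """Rename 'stem.N.tif' files in `directory` to 'stem_<zero-padded N>.tif'.

    Hypothetical helper, not from the original post. Returns the new names.
    """
    renamed = []
    # sorted() materialises the listing before any file is renamed
    for p in sorted(Path(directory).iterdir()):
        m = PATTERN.match(p.name)
        if m is None or not p.is_file():
            continue  # leave anything that does not fit the pattern untouched
        new_name = f"{m['stem']}_{int(m['num']):0{width}d}.tif"
        p.rename(p.with_name(new_name))
        renamed.append(new_name)
    return renamed
```

Calling pad_sequence_numbers(".") would process the current working directory, much like the script above. Trying it on a copy of the folder first is probably wise, since renames are not easily undone.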

Related

Loop through xmls and check if contents are in a csv

I have a .csv containing a list of names, and I'm trying to check whether those names are contained within a bunch of .xml files in a directory. I've tried my best to make the code open each .xml and check whether the name inside it matches one in my csv list.
My csv has no headers; it is just a single column of names. Examples of the names:
epsilon-prod-tps
display-eng-sl
alantest-prod-ab
So I need the code to open an xml, check whether the name listed inside the .xml is in my csv, close the xml, and move on to the next one, recording any that don't match, of course. The xml part works, and so does the "check if in csv" part. I'm just struggling to combine the two so that they work together, if that makes sense.
My code is as follows:
import os
from xml.etree import cElementTree as ET
InputPath = open('//auditdrive.local/audittest/Oisin/py/Auditor/List.csv', 'r')
string_append = ''
file_path = r'\\prod.mfg\xmlfolder'
directory = os.listdir(file_path)
for fname in directory:
    if os.path.isfile(file_path + os.sep + fname + os.sep + fname+'.xml.'):
        with open(file_path + os.sep + fname+ os.sep +fname+'.xml.', 'r') as xml:
            product_count += 1
            print(file_path + os.sep + fname+ os.sep +fname+'.xml.')
            tree = ET.parse(xml)
            root = tree.getroot()
            for recipe in root.findall('RecName'):
                rec_name = root.find('RecName').text
                print( rec_name, 'extracted from - ', fname+ '.xml')
            goodlist = InputPath.read()
            print(' Checking for match in /List.csv')
            for i in goodlist:
                if rec_name in goodlist:
                    print(' PASS')
                else:
                    print(' FAIL ...')
                    print('Invalid name in %s' % fname)
                    string_append = string_append + file_path + os.sep + fname+ os.sep +fname+'.xml.' + ' ,'
            xml.close()
Currently it just prints the following:
PASS
PASS
PASS
PASS
PASS
PASS
PASS
PASS
\\prod.mfg\xmlfolder\projecta\projecta.xml.
epsilon-prod-tps extracted from - projecta.xml
Checking for match in /List.csv
\\prod.mfg\xmlfolder\projectb\projectb.xml.
display-eng-sl extracted from - projectb.xml
Checking for match in /List.csv
\\prod.mfg\xmlfolder\projectc\projectc.xml.
alantest-prod-ab extracted from - projectc.xml
Checking for match in /List.csv
It seems to loop through my csv and print a PASS for each of the rows in it (approx. 200),
then it runs the other for loop and doesn't check whether the names are in the csv at all.
This is my first Python project, so sorry for any errors or mistakes in my question.
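One thing that may explain the behaviour described: InputPath.read() returns the whole file the first time it is called and an empty string on every later call, and "for i in goodlist" walks over a string one character at a time. A possible way to combine the two parts is to read the csv once into a set and then test each extracted name against it. This is a sketch only; the function names load_names and find_unlisted are invented here, the files are assumed to end in ".xml" (without the trailing dot used above), and RecName elements are looked up at any depth, which may differ from the real file layout.

```python
import csv
import os
import xml.etree.ElementTree as ET


def load_names(csv_path):
    """Read a headerless, single-column csv into a set of names (hypothetical helper)."""
    with open(csv_path, newline="") as f:
        return {row[0].strip() for row in csv.reader(f) if row}


def find_unlisted(xml_root, names):
    """Return (xml_path, rec_name) pairs whose RecName is not in `names`.

    Assumes the layout <xml_root>/<project>/<project>.xml, as in the question.
    """
    failures = []
    for fname in sorted(os.listdir(xml_root)):
        xml_path = os.path.join(xml_root, fname, fname + ".xml")
        if not os.path.isfile(xml_path):
            continue
        root = ET.parse(xml_path).getroot()
        # iter() visits matching elements at any depth (an assumption about the XML)
        for node in root.iter("RecName"):
            rec_name = (node.text or "").strip()
            if rec_name not in names:
                failures.append((xml_path, rec_name))
    return failures
```

The set is built once, so each membership test is a single lookup rather than a scan of the csv text, and the returned list can be printed or written out as the record of mismatches.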

saving text files to .npy file

I have many text files in a directory with numerical extensions (for example: signal_data1.9995100000000001, signal_data1.99961, etc.).
The contents of the files are as given below:
signal_data1.9995100000000001
-1.710951390504200198e+00
5.720409824754981720e-01
2.730176313110273423e+00
signal_data1.99961
-6.710951390504200198e+01
2.720409824754981720e-01
6.730176313110273423e+05
I just want to arrange the above files into a single .npy file as
-1.710951390504200198e+00,5.720409824754981720e-01, 2.730176313110273423e+00
-6.710951390504200198e+01,2.720409824754981720e-01, 6.730176313110273423e+05
So, I want to implement the same procedure for many files of a directory.
I tried a loop as follows:
import numpy as np
import glob
for file in glob.glob('./signal_*'):
    np.savez('data', file)
However, it does not give what I want as depicted above. So here I need help. Thanks in advance.
Here is another way of achieving it:
import os
dirPath = './data/'  # folder where you store your data
output = ""
with os.scandir(dirPath) as entries:
    for entry in entries:  # read each file in your folder
        with open(dirPath + entry.name, "r") as dataFile:
            dataLines = dataFile.readlines()
        for line in dataLines:
            output += line.strip() + " "  # strip surrounding whitespace & append
        output += '\n'  # break the line after each file
# Note: this writes plain text; the .npy extension alone does not make it a NumPy binary file
with open("a.npy", "w") as writeFile:
    writeFile.write(output)
You can use np.loadtxt() and np.save():
import glob
import numpy as np
a = np.array([np.loadtxt(f) for f in sorted(glob.glob('./signal_*'))])
np.save('data.npy', a)

copy and rename files starting at a specific integer value in python

My code works except for a few pain points that perhaps you can help me understand. I want to copy files from one directory to another and rename them at the same time. For example:
c:\path\
octo.jpeg
novem.jpeg
decem.jpeg
to:
c:\newpath\
001.jpeg
002.jpeg
003.jpeg
The code I wrote after a cursory Google search is below, but I'm not sure why I need the 'r' in the path variables. I'm sure I don't need the 'files = os.listdir(srcPath)' line. The code moves the files and renames them using the 'count' variable in the for loop, but I want to name the files starting at a specific number, say 65. Should I use the shutil library and its copy2 method to copy the files first and then rename them, or is there an easier way?
import os
from os import path
srcPath = r'C:\Users\Talyn\Desktop\New folder\Keep\New folder'
destPath = r'C:\Users\Talyn\Desktop\New folder\Keep\hold'
#files = os.listdir(srcPath)
def main():
    for count, filename in enumerate(os.listdir(srcPath)):
        dst = '{:03d}'.format(count) + ".jpeg"
        os.rename(os.path.join(srcPath, filename), os.path.join(destPath, dst))
if __name__=="__main__":
    main()
From the official Python Docs:
Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters.
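The r tells the Python interpreter to treat the backslashes (\) in the path string as literal characters and not as escape characters.
To name the files starting from a specific number:
dst = '{:03d}'.format(count + your_number) + ".jpeg"
To copy instead of move, use copyfile from shutil, joining the directory and file name with os.path.join so that the path separator is not lost:
from shutil import copyfile
copyfile(os.path.join(srcPath, filename), os.path.join(destPath, dst))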
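Putting the two suggestions together, a minimal sketch might look like the following. The function name copy_numbered and its start and width parameters are invented for illustration. It uses shutil.copy2 (which also tries to preserve file metadata), keeps each file's own extension rather than forcing ".jpeg", and numbers files in alphabetical order, which is an assumption since os.listdir itself returns names in arbitrary order.

```python
import os
import shutil


def copy_numbered(src_dir, dest_dir, start=1, width=3):
    """Copy every file in src_dir to dest_dir under a zero-padded number.

    Hypothetical helper: numbering begins at `start` and follows alphabetical order.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # keep regular files only, in a stable order
    files = [f for f in sorted(os.listdir(src_dir))
             if os.path.isfile(os.path.join(src_dir, f))]
    copied = []
    for number, filename in enumerate(files, start=start):
        ext = os.path.splitext(filename)[1]
        new_name = f"{number:0{width}d}{ext}"
        shutil.copy2(os.path.join(src_dir, filename),
                     os.path.join(dest_dir, new_name))
        copied.append(new_name)
    return copied
```

For example, copy_numbered(srcPath, destPath, start=65) would leave the originals in place and create 065.jpeg, 066.jpeg and so on in the destination. Passing start to enumerate avoids adding an offset by hand.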

how do i manipulate the path name so it doesn't print out the entire name

I'm new to programming. I need to index three separate txt files and run a search based on user input. When I print the result, it gives me the entire path name; I would like to print only the txt file name.
I've tried using os.listdir in the function.
import os
import time
import string
import os.path
import sys
word_occurrences= {}
def index_text_file (txt_filename,ind_filename, delimiter_chars=",.;:!?"):
    try:
        txt_fil = open(txt_filename, "r")
        fileString = txt_fil.read()
        for word in fileString.split():
            if word in word_occurrences:
                word_occurrences[word] += 1
            else:
                word_occurrences [word] = 1
        word_keys = word_occurrences.keys()
        print ("{} unique words found in".format(len(word_keys)),txt_filename)
        word_keys = word_occurrences.keys()
        sorted(word_keys)
    except IOError as ioe: #if the file can't be opened
        sys.stderr.write ("Caught IOError:"+ repr(ioe) + "\n")
        sys.exit (1)
index_text_file("/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt","/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.idx")
SyntaxError: invalid syntax
(base) 8c85908188d1:CODE z007881$ python3 indexed.py
9395 unique words found in /Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt
I would like it to say: 9395 unique words found in book3.txt
One way to do it would be to split the path on the directory separator / and pick the last element:
file_name = txt_filename.split("/")[-1]
# ...
# Then:
print("{} unique words found in".format(len(word_keys)), file_name)
# I would prefer using an f-string, unless your Python version is too old:
print(f"{len(word_keys)} unique words found in {file_name}")
I strongly advise renaming txt_filename to something less misleading, like txt_filepath, since it contains not just a file name but a whole path (including, but not limited to, the file name).
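As an alternative to splitting on "/" by hand, the standard library can extract the last path component with os.path.basename, which uses the separator rules of the platform the script runs on. A small sketch, where display_name is a made-up helper name:

```python
import os.path


def display_name(txt_filepath):
    """Return only the final component of a path (hypothetical helper)."""
    return os.path.basename(txt_filepath)
```

Inside index_text_file, the print call could then pass display_name(txt_filename) instead of txt_filename. pathlib.Path(txt_filename).name should give the same result if you prefer pathlib.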

Count multiple files in a directory with the same name

I'm relatively new to Python and was working on a project where the user can navigate to a folder, after which the program does a count of all the files in that folder with a specific name.
The problem is that I have a folder with over 5000 files, many of them sharing the same name but with different extensions. I wrote code that somewhat does what I want the final version to do, but it's VERY redundant and I can't see myself doing this for over 600 file names.
I wanted to ask whether it is possible to make this program "automated" or less redundant, so that I don't have to manually type out the names of 600 files to return data for.
Sample code I currently have:
import os, sys
print(sys.version)
file_counting1 = 0
file_counting2 = 0
filepath = input("Enter file path here: ")
if os.path.exists(filepath):
    for file in os.listdir(filepath):
        if file.startswith('expressmail'):
            file_counting1 += 1
    print('expressmail')
    print('Total files found:', file_counting1)
    for file in os.listdir(filepath):
        if file.startswith('prioritymail'):
            file_counting2 += 1
    print('prioritymail')
    print('Total files found:', file_counting2)
Sample Output:
expressmail
Total files found: 3
prioritymail
Total files found: 1
The following script will count occurrences of files with the same name. If the file does not have an extension, the whole filename is treated as the name. It also does not traverse subdirectories, since the original question just asks about files in the given folder.
import os
dir_name = "."
files = next(os.walk(dir_name))[2] # get all the files directly in the directory
names = [f[:f.rindex(".")] for f in files if "." in f] # drop the extensions
names += [f for f in files if "." not in f] # add those without extensions
for name in set(names): # for each unique name
    print("{}\nTotal files found: {}".format(name, names.count(name)))
If you want to support files in subdirectories, you could use something like
files = [os.path.join(r,file) for r,d,f in os.walk(dir_name) for file in f]
If you don't want to consider files without extensions, just remove the line:
names += [f for f in files if "." not in f]
There are a number of ways you can do what you're trying to do. Partly it depends on whether or not you need to recover the list of extensions for a given duplicated file.
Counter, from the collections module: use this for a simple count of files. Ignore the extensions when building the count.
A dictionary: use the filename without its extension as the key, and store a list as the value, where each item in the list is one occurrence of the file.
Here's an example using the Counter class:
import os, sys, collections
c = collections.Counter()
for root, dirs, files in os.walk('/home/myname/hg/2018/'):
    # discard any path data and just use filename
    for names in files:
        name, ext = os.path.splitext(names)
        # discard any extension
        c[name] += 1
# Counter.most_common() gives the values in the form of (entry, count)
# Counter.most_common(x) - pass a value to display only the top x counts
# e.g. Counter.most_common(2) = top 2
for x in c.most_common():
    print(x[0] + ': ' + str(x[1]))
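The dictionary option mentioned above has no example, so here is a rough sketch of it. The name group_by_stem is hypothetical, and the sketch only looks at files directly inside the given folder (no subdirectories), which is an assumption about what is wanted.

```python
import os
from collections import defaultdict


def group_by_stem(directory):
    """Map each filename-without-extension to the list of extensions found for it.

    Hypothetical helper; only regular files directly in `directory` are considered.
    """
    groups = defaultdict(list)
    for entry in os.listdir(directory):
        if os.path.isfile(os.path.join(directory, entry)):
            stem, ext = os.path.splitext(entry)
            groups[stem].append(ext)
    return dict(groups)
```

len(groups[name]) then gives the same count as the Counter version, while the list itself tells you which extensions exist for that name.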
You can use regular expressions:
import os, sys, re
print(sys.version)
filepath = input("Enter file path here: ")
if os.path.exists(filepath):
    allfiles = "\n".join(os.listdir(filepath))
    file_counting1 = len(re.findall("^expressmail",allfiles,re.M))
    print('expressmail')
    print('Total files found:', file_counting1)
    file_counting2 = len(re.findall("^prioritymail",allfiles,re.M))
    print('prioritymail')
    print('Total files found:', file_counting2)
