I have a folder containing 5 files named respectively 'out1.jpg', 'out2a.jpg', 'out2b.jpg', 'out3.jpg' and 'out4.jpg' in addition to other files in different formats.
I have this Python script which is supposed to print all the filenames that match:
import fnmatch
import os
c = 1
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, 'out'+str(c)+'*.jpg'):
        print(file)
        c += 1
However, when I run this script, the output is limited to the following:
out1.jpg
out2a.jpg
out3.jpg
Does anyone have an idea how to change the script so that it displays all the matching filenames (that is, the 5 filenames I mentioned)?
You are incrementing c on each iteration (well, on each iteration that finds a match, but anyway...), so it obviously cannot match both "out2a.jpg" and "out2b.jpg". Assuming you want all file names that match "out" + some number + optionally something else, you can use character ranges instead, i.e.:
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, 'out[0-9]*.jpg'):
        print(file)
NB : you might have to adjust the exact fnmatch pattern according to your needs and what you have in your directory.
You can also use glob.glob instead, which is both simpler and (according to the doc) more efficient:
import glob
for file in glob.glob("out[0-9]*.jpg"):
    print(file)
EDIT :
I totally understand why it does not display out2a.jpg and out2b.jpg together, but I didn't get why out4.jpg is not displayed!
Quite simply because os.listdir() does not necessarily return the filenames in the order you seem to expect (on my Linux station here, "out4.jpg" comes before the other "outXXX.jpg" files). You can inspect what's happening just by adding a couple of prints:
c = 1
for file in os.listdir('.'):
    exp = 'out{}*.jpg'.format(c)
    print("file: {} - c : {} - exp : {}".format(file, c, exp))
    if fnmatch.fnmatch(file, exp):
        print(file)
        c += 1
And the result here:
file: sofnm.py~ - c : 1 - exp : out1*.jpg
file: out4.jpg - c : 1 - exp : out1*.jpg
file: out2b.jpg - c : 1 - exp : out1*.jpg
file: out1.jpg - c : 1 - exp : out1*.jpg
out1.jpg
file: out2a.jpg - c : 2 - exp : out2*.jpg
out2a.jpg
file: sofnm.py - c : 3 - exp : out3*.jpg
file: out42a.jpg - c : 3 - exp : out3*.jpg
file: out3.jpg - c : 3 - exp : out3*.jpg
out3.jpg
As you can see, your assumption that os.listdir() would return the files in a given order (starting with "out1.jpg" and ending with "out4.jpg") was wrong. As a general rule, when your code doesn't behave as you expect, tracing the code execution (and the relevant values) is most often the simplest way to find out why.
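If the goal is simply to print every matching name in a predictable order, one option is to drop the counter and sort whatever matches. The following is only a minimal sketch: matching_names is a hypothetical helper name, and the sort is plain lexicographic, so a name like "out10.jpg" would be listed before "out2a.jpg".

```python
import fnmatch
import os


def matching_names(names, pattern='out[0-9]*.jpg'):
    """Return the names that match pattern, sorted so the order is stable."""
    # fnmatch.filter keeps only the entries matching the shell-style pattern
    return sorted(fnmatch.filter(names, pattern))


if __name__ == '__main__':
    # os.listdir() order is arbitrary; sorting makes the output repeatable
    for name in matching_names(os.listdir('.')):
        print(name)
```

Because the helper takes a plain list of names, it can be tried on a hand-written list before pointing it at a real directory.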
You are incrementing c only after a file matches, so it is quite possible that the file name is out2a.jpg while the value of c is still 1, in which case it will not match. You should either list the files in ascending order, so that out1 comes before out2 in the listdir output, or use a generic numeric match instead of matching one number at a time, as shown below:
import fnmatch
import os
for file in os.listdir('.'):
    #print(file)
    if fnmatch.fnmatch(file, 'out[0-9]*.jpg'):
        print(file)
Running through this for loop, you're checking each file in the directory against a very specific file name (first out1*.jpg, then out2*.jpg) with no guarantee that the order of these files matches. When I tried to run the code locally for example, it first compared out2a.jpg with the pattern out1*.jpg, then out2b.jpg with out2*.jpg, then test.py (the script) with out3*.jpg.
You'd be better off using a module like glob (https://docs.python.org/3/library/glob.html) to search for all the matching files at once: glob.glob('out[0-9]*.jpg').
Related
I'm trying to list files from a directory. The problem is that some of these files have numbers after their extensions that look like sample1.csv.1. When I try to list files in the directory, these files are omitted from the list. This is not a problem for files like sample.txt.1. I've tried these 3 approaches:
import os
path = 'C:\\my\\path\\here'
missingFiles1 = os.listdir(path)
missingFiles2 = []
for files in os.walk(path):
    missingFiles2.append(files)
import glob
missingFiles3 = []
withRegex = path + "\\sample*.[0-9]" # Actual file starts with an L
for files in glob.glob(withRegex):
    missingFiles3.append(files)
I tried an iterator too but have already forgotten what the code looked like. I got some good pointers here but I couldn't get it to work. Any help would be greatly appreciated.
Using:
Python 3.6.8
glob3 0.0.1
\sample.[0-9]
Should match sample.1, but not sample.txt.1 and not sample1.txt or sample1.txt.1
What you're looking for is not possible to express using glob patterns.
You could get all files starting with sample and ending in any extension and .1 using this glob-pattern:
sample*.*.[0-9]
This would fit any file starting with sample, followed by anything, followed by a dot, followed by anything followed by yet another dot and a number
Glob patterns are not regular expressions and you can't make it match "one or none".
Glob only knows:
* => any characters (1-N) or no character at all
? => one character, but not none
[ab] => either a or b, but not none
[0-9] => any from 0-9, but not none
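The wildcard rules listed above can be checked without touching the file system. This is a small sketch using fnmatch.fnmatchcase from the standard library, which applies the same *, ? and [...] rules to a single name; the sample names are made up for illustration, and glob itself adds extra handling for paths and leading dots that is not shown here.

```python
from fnmatch import fnmatchcase

# (name, pattern) pairs chosen to exercise each wildcard
cases = [
    ("sample.txt", "sample*.txt"),       # * can match nothing at all
    ("sample12.txt", "sample*.txt"),     # ... or several characters
    ("sample.txt", "sample?.txt"),       # ? needs exactly one character
    ("sample1.txt", "sample?.txt"),
    ("sample.txt", "sample[0-9].txt"),   # [0-9] needs exactly one digit
    ("sample1.txt", "sample[0-9].txt"),
]

results = [fnmatchcase(name, pattern) for name, pattern in cases]
for (name, pattern), matched in zip(cases, results):
    print(f"{pattern!r} vs {name!r}: {matched}")
```

Only * accepts "nothing"; there is no wildcard meaning "this one character, or none", which is why the optional digit cannot be written as a single glob pattern.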
You're looking to match a number or no number in front of the first extension and then a number in the second extension if I understand you correctly.
import os
path = 'C:\\Users\\mastacheata\\test'
missingFiles1 = os.listdir(path)
missingFiles2 = []
for files in os.walk(path):
    missingFiles2.append(files)
import glob
missingFiles3 = []
withRegex = path + "/sample*.*.[0-9]" # Actual file starts with an L
for files in glob.glob(withRegex):
    missingFiles3.append(files)
print(missingFiles3)
In C:\Users\mastacheata\test there are 3 files: findfiles.py, sample1.txt.1 and sample.txt.1
This is the output:
['C:\\Users\\mastacheata\\test\\sample.txt.1', 'C:\\Users\\mastacheata\\test\\sample1.txt.1']
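When "one or none" really is needed, a regular expression applied to the directory listing can express it. This is a rough sketch under an assumption about the intended names: "sample", an optional single digit, one extension, then a dot and a single digit. Both the pattern and the helper name numbered_samples are illustrative, not taken from the question.

```python
import os
import re

# "sample", an optional digit, ".<extension>", then "." and a single digit
SAMPLE_RE = re.compile(r"sample[0-9]?\.[^.]+\.[0-9]")


def numbered_samples(names):
    """Keep only the names that match SAMPLE_RE from start to end."""
    return [name for name in names if SAMPLE_RE.fullmatch(name)]


if __name__ == "__main__":
    # Lists the current directory; swap in the real path as needed
    print(numbered_samples(os.listdir(".")))
```

Unlike glob.glob, this returns bare file names, so os.path.join would be needed to turn them back into full paths.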
I'm new to programming. I need to index three separate txt files and run a search from user input. When I print, it gives me the entire path name; I would like to print just the txt file name.
I've tried using os.list in the function.
import os
import time
import string
import os.path
import sys
word_occurrences = {}
def index_text_file(txt_filename, ind_filename, delimiter_chars=",.;:!?"):
    try:
        txt_fil = open(txt_filename, "r")
        fileString = txt_fil.read()
        for word in fileString.split():
            if word in word_occurrences:
                word_occurrences[word] += 1
            else:
                word_occurrences[word] = 1
        word_keys = word_occurrences.keys()
        print("{} unique words found in".format(len(word_keys)), txt_filename)
        word_keys = word_occurrences.keys()
        sorted(word_keys)
    except IOError as ioe: # if the file can't be opened
        sys.stderr.write("Caught IOError:" + repr(ioe) + "\n")
        sys.exit(1)
index_text_file("/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt","/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.idx")
SyntaxError: invalid syntax
(base) 8c85908188d1:CODE z007881$ python3 indexed.py
9395 unique words found in /Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt
I would like it to say: 9395 unique words found in book3.txt
One way to do it would be to split the path on the directory separator / and pick the last element:
file_name = txt_filename.split("/")[-1]
# ...
# Then:
print("{} unique words found in".format(len(word_keys)), file_name)
# I would prefer using an f-string, unless your Python version is too old:
print(f"{len(word_keys)} unique words found in {file_name}")
I strongly advise renaming txt_filename to something less misleading like txt_filepath, since it does not contain a file name but a whole path (including, but not limited to, the file name).
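As an alternative to splitting on "/" by hand, the standard library's os.path.basename returns the last component of a path. A minimal sketch, reusing the path from the question and the txt_filepath name suggested above:

```python
import os.path

txt_filepath = "/Users/z007881/Documents/ABooks_search/CODE/booksearch/book3.txt"

# basename keeps only the final component of the path
file_name = os.path.basename(txt_filepath)
print(file_name)
```

This avoids hard-coding the separator in your own code, which matters if the script ever runs with Windows-style paths.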
I'm relatively new to Python and am working on a project where the user can navigate to a folder, after which the program counts all the files in that folder with a specific name.
The problem is that I have a folder with over 5000 files, many of them sharing the same name but with different extensions. I wrote code that somewhat does what I want the final version to do, but it's VERY redundant and I can't see myself doing this for over 600 file names.
Is it possible to make this program "automated" or less redundant, so that I don't have to manually type out the names of 600 files to return data for?
Sample code I currently have:
import os, sys
print(sys.version)
file_counting1 = 0
file_counting2 = 0
filepath = input("Enter file path here: ")
if os.path.exists(filepath):
    for file in os.listdir(filepath):
        if file.startswith('expressmail'):
            file_counting1 += 1
    print('expressmail')
    print('Total files found:', file_counting1)
    for file in os.listdir(filepath):
        if file.startswith('prioritymail'):
            file_counting2 += 1
    print('prioritymail')
    print('Total files found:', file_counting2)
Sample Output:
expressmail
Total files found: 3
prioritymail
Total files found: 1
The following script will count occurrences of files with the same name. If the file does not have an extension, the whole filename is treated as the name. It also does not traverse subdirectories, since the original question just asks about files in the given folder.
import os
dir_name = "."
files = next(os.walk(dir_name))[2] # get all the files directly in the directory
names = [f[:f.rindex(".")] for f in files if "." in f] # drop the extensions
names += [f for f in files if "." not in f] # add those without extensions
for name in set(names): # for each unique name
    print("{}\nTotal files found: {}".format(name, names.count(name)))
If you want to support files in subdirectories, you could use something like
files = [os.path.join(r,file) for r,d,f in os.walk(dir_name) for file in f]
If you don't want to consider files without extensions, just remove the line:
names += [f for f in files if "." not in f]
There are a number of ways to do what you're trying to do. Partly it depends on whether or not you need to recover the list of extensions for a given duplicated file.
Counter, from the collections module: use this for a simple count of files, ignoring the extensions when building the count.
A dictionary: use the filename without extension as the key and a list as the value, where the list holds each occurrence of the file.
Here's an example using the Counter class:
import os, sys, collections
c = collections.Counter()
for root, dirs, files in os.walk('/home/myname/hg/2018/'):
    # discard any path data and just use filename
    for names in files:
        name, ext = os.path.splitext(names)
        # discard any extension
        c[name] += 1
# Counter.most_common() gives the values in the form of (entry, count)
# Counter.most_common(x) - pass a value to display only the top x counts
# e.g. Counter.most_common(2) = top 2
for x in c.most_common():
    print(x[0] + ': ' + str(x[1]))
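To see the counting step on its own, the same idea can be run against a plain list of names instead of a directory walk. This is only a sketch: count_stems is a hypothetical helper, and the file names are invented to mirror the sample output in the question.

```python
import collections
import os


def count_stems(filenames):
    """Count filenames by their name with the last extension removed."""
    return collections.Counter(os.path.splitext(f)[0] for f in filenames)


# Made-up names: three files share the stem "expressmail", one is "prioritymail"
counts = count_stems([
    "expressmail.txt",
    "expressmail.pdf",
    "expressmail.csv",
    "prioritymail.txt",
])

for name, total in counts.most_common():
    print(f"{name}\nTotal files found: {total}")
```

Note that os.path.splitext only removes the last extension, so "archive.tar.gz" would be counted under "archive.tar".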
You can use regular expressions:
import os, sys, re
print(sys.version)
filepath = input("Enter file path here: ")
if os.path.exists(filepath):
    allfiles = "\n".join(os.listdir(filepath))
    file_counting1 = len(re.findall("^expressmail",allfiles,re.M))
    print('expressmail')
    print('Total files found:', file_counting1)
    file_counting2 = len(re.findall("^prioritymail",allfiles,re.M))
    print('prioritymail')
    print('Total files found:', file_counting2)
I am trying to see if I can extract the file names from an os.listdir() output, omitting the '.csv' part, in a single-line for loop.
For example, my list of file names looks like this:
files = ['OPS020.csv','OPS340.csv','OPS230.csv','OPS349.csv']
Then all I could do was this:
file_names = [f.split('.') for f in files]
file_names = [f[0] for f in file_names]
Is there a more elegant and shorter way to do this?
The output I'm expecting is:
file_names : ['OPS020','OPS340','OPS230','OPS349']
I guess, something like this would work.
from os import path
files = ['OPS020.csv','OPS340.csv','OPS230.csv','OPS349.csv']
filenames = [path.splitext(x)[0] for x in files]
Docs
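Another option, if pathlib is acceptable, is the stem attribute of a Path object, which is the file name without its last suffix. A short sketch using the list from the question:

```python
from pathlib import Path

files = ['OPS020.csv', 'OPS340.csv', 'OPS230.csv', 'OPS349.csv']

# Path.stem drops only the final suffix (".csv" here)
file_names = [Path(f).stem for f in files]
print(file_names)
```

Like os.path.splitext, this handles names containing extra dots more safely than split('.')[0], which would cut the name at the first dot.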
I am trying to write a Python program that takes n text files; each file contains names, one per line, like this:
Steve
Mark
Sarah
The program should print only the names that exist in all the input files.
I am new to programming, so I don't really know how to implement this idea. I tried recursion, but the program seems to run in an infinite loop and I am not sure what the problem is. Is the implementation wrong? If so, do you have a better idea of how to implement it?
import sys
arguments = sys.argv[1:]
files = {}
file = iter(arguments)
for number in range(len(sys.argv[1:])):
    files[number] = open(next(file))
def close_files():
    for num in files:
        files[num].close()
def start_next_file(line,files,orderOfFile):
    print('starting next file')
    if orderOfFile < len(files): # to avoid IndexError
        for line_searched in files[orderOfFile]:
            if line_searched.strip():
                line_searched = line_searched[:-1]
                print('searched line = '+line_searched)
                print('searched compared to = ' + line)
                if line_searched == line:
                    #good now see if that name exists in the other files as well
                    start_next_file(line,files,orderOfFile+1)
    elif orderOfFile >= len(files): # when you finish searching all the files
        print('got ya '+line) #print the name that exists in all the files
        for file in files:
            # to make sure the cursor is at the beginning of the read files
            #so we can loop through them again
            files[file].seek(0)
def start_find_match(files):
    orderOfFile = 0
    for line in files[orderOfFile] :
        # for each name in the file see if it exists in all other files
        if line.strip():
            line = line[:-1]
            print ('starting line = '+line)
            start_next_file(line,files,orderOfFile+1)
start_find_match(files)
close_files()
I'm not sure how to fix your code exactly but here's one conceptual way to think about it.
listdir gets all the files in the directory as a list. We narrow that to only .txt files. Next, open, read, split on newlines, and lower to make a larger list containing names. So, files will be a list of lists. Last, find the intersection across all lists using some set logic.
import os
folder = [f for f in os.listdir() if f[-4:] == '.txt']
files = []
for i,file in enumerate(folder):
    with open(file) as f:
        files.append([name.lower() for name in f.read().splitlines()])
result = set.intersection(*map(set, files))
Example:
#file1.txt
john
smith
mary
sue
pretesh
ashton
olaf
Elsa
#file2.txt
David
Lorenzo
Cassy
Grant
elsa
Felica
Salvador
Candance
Fidel
olaf
Tammi
Pasquale
#file3.txt
Jaleesa
Domenic
Shala
Berry
Pamelia
Kenneth
Georgina
Olaf
Kenton
Milly
Morgan
elsa
Returns:
{'olaf', 'elsa'}
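The same set logic can be adapted to the original requirement of passing the files on the command line rather than scanning a folder. This is just one possible sketch: common_names is a hypothetical helper, and lower-casing the names (so that "Olaf" and "olaf" count as the same name) is an assumption carried over from the example above.

```python
import sys


def common_names(paths):
    """Return the set of lower-cased names that appear in every file in paths."""
    name_sets = []
    for path in paths:
        with open(path) as handle:
            # strip the newline and surrounding spaces, skip blank lines
            name_sets.append(
                {line.strip().lower() for line in handle if line.strip()}
            )
    # with no input files there is nothing in common
    return set.intersection(*name_sets) if name_sets else set()


if __name__ == "__main__":
    # e.g. python common.py file1.txt file2.txt file3.txt
    for name in sorted(common_names(sys.argv[1:])):
        print(name)
```

Each file is read once and closed by the with block, so there is no need for recursion or for rewinding file cursors with seek(0).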