Iterate over images with pattern - python-3.x

I have thousands of images labeled IMG_####_0, where the first image is IMG_0001_0.png, the 22nd is IMG_0022_0.png, the 100th is IMG_0100_0.png, and so on. I want to perform some tasks by iterating over them.
I used fnames = ['IMG_{}_0.png'.format(i) for i in range(150)] to iterate over the first 150 images, but I get this error: FileNotFoundError: [Errno 2] No such file or directory: '/Users/me/images/IMG_0_0.png', which suggests that this is not the correct way to do it. Any ideas how to capture this pattern while iterating over the specified number of images, i.e. in my case from IMG_0001_0.png to IMG_0150_0.png?

fnames = ['IMG_{0:04d}_0.png'.format(i) for i in range(1, 151)]
print(fnames)
for fn in fnames:
    try:
        with open(fn, "r") as reader:
            # do smth here
            pass
    except (FileNotFoundError, OSError) as err:
        print(err)
Output:
['IMG_0001_0.png', 'IMG_0002_0.png', ..., 'IMG_0149_0.png', 'IMG_0150_0.png']
Documentation: str.format()
and the format specification mini-language.
'{:04d}' # format the given parameter with 0 filled to 4 digits as decimal integer
The other way to do it would be to take an ordinary string and pad it with zeros using str.zfill():
print(str(22).zfill(10))
Output:
0000000022
But for your case, the format mini-language makes more sense.

You need to use a format pattern to get the format you're looking for. You don't just want the integer converted to a string, you specifically want it to always be a string with four digits, using leading 0's to fill in any empty space. The best way to do this is:
'IMG_{:04d}_0.png'.format(i)
instead of your current format string. The result looks like this:
In [2]: 'IMG_{:04d}_0.png'.format(3)
Out[2]: 'IMG_0003_0.png'
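On Python 3.6+, the same format specifier also works inside an f-string, which reads a little more directly:

```python
# f-string equivalent of 'IMG_{:04d}_0.png'.format(i)
fnames = [f"IMG_{i:04d}_0.png" for i in range(1, 151)]
print(fnames[0])   # IMG_0001_0.png
print(fnames[-1])  # IMG_0150_0.png
```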

Generating a list of possible names and testing whether each one exists is a slow and awkward way to iterate over files.
Have a look at the glob module: https://docs.python.org/3/library/glob.html
So, something like:
from glob import iglob
filenames = iglob("/path/to/folder/IMG_*_0.png")
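Note that glob does not guarantee any particular order. If processing order matters, sort the results; because these numbers are zero-padded, plain lexicographic sorting is already correct (a sketch, assuming the images live in the working directory):

```python
from glob import glob

# Zero-padded numbers sort correctly as strings,
# so sorted() yields IMG_0001_0.png, IMG_0002_0.png, ...
for filename in sorted(glob("IMG_*_0.png")):
    print(filename)
```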

Related

F string is adding new line

I am trying to make a name generator. I am using an f-string to concatenate the first and last names. But instead of getting them together, I am getting them on separate lines.
print(f"Random Name Generated is:\n{random.choice(firstname_list)}{random.choice(surname_list)}")
This give the output as:
Random Name Generated is:
Yung
heady
Instead of:
Random Name Generated is:
Yung heady
Can someone please explain why so?
The code seems right; perhaps there are newline (\n) characters in the list elements.
Check the strings in the lists.
import random

if __name__ == '__main__':
    firstname_list = ["yung1", "yung2", "yung3"]
    surname_list = ["heady1", "heady2", "heady3"]
    firstname_list = [name.replace('\n', '') for name in firstname_list]
    print(f"Random Name Generated is:\n{random.choice(firstname_list)} {random.choice(surname_list)}")
Output:
Random Name Generated is:
yung3 heady2
Since I had pulled these values from a UTF-8 encoded .txt file, readlines() did convert the names to list elements, but each one carried a hidden '\xa0\n'.
This caused the printing problem. Using .strip() removed the stray whitespace.
print(f"Random Name Generated is:\n{random.choice(firstname_list).strip()} {random.choice(surname_list).strip()}")

How do I delete rows in one CSV based on another CSV

I am working with two CSV files, both contain only one column of data, but are over 50,000 rows. I need to compare the data from CSV1 against CSV2 and remove any data that displays in both of these files. I would like to print out the final list of data as a 3rd CSV file if possible.
The CSV files contain usernames. I have tried running deduplication scripts but realize that this does not remove entries found in both CSV files entirely since it only removes the duplication of a username. This is what I have been currently working with but I can already tell that this isn't going to give me the results I am looking for.
import csv

AD_AccountsCSV = open("AD_Accounts.csv", "r")
BA_AccountsCSV = open("BA_Accounts.csv", "r+")

def Remove(x, y):
    final_list = []
    for item in x:
        if item not in y:
            final_list.append(item)
    for i in y:
        if i not in x:
            final_list.append(i)
    print(final_list)
The way that I wrote this code would print the results within the terminal after running the script but I realize that my output may be around 1,000 entries.
# define the paths
fpath1 = "/path/to/file1.csv"
fpath2 = "/path/to/file2.csv"
fpath3 = "/path/to/your/file3.csv"

with open(fpath1) as f1, open(fpath2) as f2, open(fpath3, "w") as f3:
    l1 = [line.strip() for line in f1]
    l2 = [line.strip() for line in f2]
    # keep only the usernames that appear in exactly one of the two files
    not_in_both = [x for x in set(l1 + l2) if not (x in l1 and x in l2)]
    for x in not_in_both:
        print(x, file=f3)
The with open() as ... clause takes care of closing the files, and several file openings can be combined under one with statement.
Assuming each line holds exactly one username, stripping each line removes the trailing newline character. Note that readlines() does not remove newlines for you, so comparing unstripped lines can silently fail.
List comprehensions make it easy to filter a list by a condition.
print() then adds a newline at the end of each written line (its default is end='\n').
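The same filtering can also be expressed directly with set operations: the symmetric difference (^) keeps exactly the elements that appear in one collection but not the other (a sketch with inline data standing in for the file contents):

```python
l1 = ["alice", "bob", "carol"]
l2 = ["bob", "dave"]

# Symmetric difference: in l1 or l2, but not in both
exclusive = sorted(set(l1) ^ set(l2))
print(exclusive)  # ['alice', 'carol', 'dave']
```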
In the way you did it:
For formatting code, please follow the official style guide, e.g.
https://www.python.org/dev/peps/pep-0008/
def select_exclusive_accounts(path_to_f1, path_to_f2, path_to_f3):
    # note: use 4 spaces per indentation level
    with open(path_to_f1) as f1, open(path_to_f2) as f2, \
            open(path_to_f3, "w") as f3:
        in_f1 = f1.readlines()
        in_f2 = f2.readlines()
        for item in in_f1:
            if item not in in_f2:
                f3.write(item)
        for item in in_f2:
            if item not in in_f1:
                f3.write(item)

select_exclusive_accounts("AD_Accounts.csv",
                          "BA_Accounts.csv",
                          "exclusive_accounts.csv")
Also, no imports are needed here, because only built-in Python functions are used.

Python 3 img2pdf wrong order of images in pdf

I am working on a small program that takes images from a website and puts them into a pdf for easy access and simpler viewing.
I have a small problem as the img2pdf module seems to put the images into the pdf in the wrong order and I don't really get why.
It seems to put the files in the order 1, 10, 11, ... rather than 1, 2, 3.
import urllib.request
import os
import img2pdf

n = 50
all = 0
for counter in range(1, n + 1):
    all = all + 1
    urllib.request.urlretrieve("https://website/images/" + str(all) + ".jpg",
                               "img" + str(all) + ".jpg")
cwd = os.getcwd()
if all == 50:
    with open("output2.pdf", "wb") as f:
        f.write(img2pdf.convert([i for i in os.listdir(cwd) if i.endswith(".jpg")]))
Without seeing the filenames you're trying to read in, a guess is that your filenames include numbers that are not zero-padded. Lexicographic ordering (sorting in alphabetical order) of a sequence of files called 0.jpg, 1.jpg, ... 11.jpg will lead to this ordering: 0.jpg, 1.jpg, 10.jpg, 11.jpg, 2.jpg, 3.jpg, 4.jpg, 5.jpg, 6.jpg, 7.jpg, 8.jpg, 9.jpg, because "1" < "2".
To combine your files such that 2 comes before 10, you can zero-pad the filenames (but also beware that some software will interpret leading zeros as indicators of an octal representation of a number, as opposed to just a leading zero.)
If you can't manipulate the filenames, then you could change your file-getting code as follows: use a regular expression to extract the numbers, as int type, from the filenames of your entire list of files, then sort the list of filenames by those extracted numbers (which will be sorted as int, for which 2 < 10).
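That last approach can be sketched like this: pull the first run of digits out of each name with a regular expression and use it as an integer sort key (the filenames here are hypothetical):

```python
import re

filenames = ["img10.jpg", "img2.jpg", "img1.jpg", "img11.jpg"]

def numeric_key(name):
    # Extract the first run of digits and compare as int, not str
    match = re.search(r"\d+", name)
    return int(match.group()) if match else 0

print(sorted(filenames, key=numeric_key))
# ['img1.jpg', 'img2.jpg', 'img10.jpg', 'img11.jpg']
```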

Unicode manipulation and garbage '[]' characters

I have a 4GB text file which I can't even load to view so I'm trying to separate it but I need to manipulate the data a bit at a time.
The problem is I'm getting these garbage white vertical rectangular characters, and I can't search for what they are in a search engine because they won't paste, nor can I get rid of them.
They look like square brackets '[]' but without the small gap in the middle.
Their Unicode values differ, so I can't just select one value and remove it.
I want to get rid of all of these rectangles.
Two more questions.
1) Why are there any Unicode characters here (in the img below) at all? I decoded them. What am I missing? Note: Later on I get string output that looks like a normal string such as 'code1234' etc but there are those Unicode exceptions there as well.
2) Can you see why larger end values would get this exception list index out of range? This only happens towards the end of the range and it isn't constant i.e. if end is 100 then maybe the last 5 will throw that exception but if end is 1000 then ONLY the LAST let's say 10 throw that exception.
Some code:
from itertools import islice

def read_from_file(file, start, end):
    with open(file, 'rb') as f:
        for line in islice(f, start, end):
            data.append(line.strip().decode("utf-8"))
    for i in range(len(data) - 1):
        try:
            if '#' in data[i]:
                a = data.pop(i)
                mail.append(a)
            else:
                print(data[i], data[i].encode())
        except Exception as e:
            print(str(e))

data = []
mail = []
read_from_file('breachcompilationuniq.txt', 0, 10)
Some Output:
Image link here as it won't let me format after pasting.
There's also this stuff later on, I don't know what these are either.
It appears that you have a text file which is not in the default encoding assumed by Python (UTF-8), but which nevertheless uses byte values in the range 128-255. Try:
f = open(file, encoding='latin_1')
content = f.read()
As for the "list index out of range" exception: data.pop(i) shrinks the list while the loop iterates over a range computed from its original length, so indices near the end no longer exist by the time the loop reaches them.

Skipping over array elements of certain types

I have a csv file that gets read into my code where arrays are generated out of each row of the file. I want to ignore all the array elements with letters in them and only worry about changing the elements containing numbers into floats. How can I change code like this:
myValues = []
data = open(text_file, "r")
for line in data.readlines()[1:]:
    myValues.append([float(f) for f in line.strip('\n').strip('\r').split(',')])
so that the last line knows to only try converting numbers into floats, and to skip the letters entirely?
Put another way, given this list,
list = ['2','z','y','3','4']
what command should be given so the code knows not to try converting letters into floats?
You could use try/except, catching only ValueError so that genuine errors are not silently swallowed:
for i in list:
    try:
        myVal.append(float(i))
    except ValueError:
        pass
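Applied to the example list, only the numeric strings survive (note that naming a variable list shadows the built-in, so a different name is safer):

```python
values = ['2', 'z', 'y', '3', '4']
floats = []
for v in values:
    try:
        floats.append(float(v))
    except ValueError:  # non-numeric strings are skipped
        pass
print(floats)  # [2.0, 3.0, 4.0]
```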
