python use for loop to modify list of variables - python-3.x

I have a script using argparse to gather a list of user defined directories. On the command line they may or may not specify a trailing "/" symbol. I'd like to do something up front so that all variables have the trailing "/" so I can reliably do:
# What I want:
with open(args.a + filename, "w") as fileout:
    # do stuff
    print('whatever', file=fileout)
rather than having to include an extra "/" in the name like this:
# What I have:
with open(args.a + "/" + filename, "w") as fileout:
    # do stuff
    print('whatever', file=fileout)
I also know that dir/ect/ory and dir//ect//ory are nearly equivalent save some fringe cases which are not applicable, but putting + "/" + all over the place seems wrong/wasteful.
In attempting to make a small function to run on all relevant variables, I'm only seeing the desired outcome when I explicitly call the function on a variable, not when I loop over a list containing the variables.
def trailingSlash(x):
    if x.endswith("/"):
        return x
    else:
        return x + "/"
a = 'ok/'
b = 'notok'
c = 'alsonotok'
for _ in [a, b, c]:
    _ = trailingSlash(_)
print(a,b,c) #gives ok/ notok alsonotok
c = trailingSlash(c)
print(c) #gives alsonotok/
I understand why changing a list as you are iterating over it is generally bad, and understand that in the for loop the loop variable is not actually pointing to a, b, or c. I also know that if I wanted the values in a new list I could do something like [trailingSlash(x) for x in [a,b,c]], but I need to maintain the a, b, c handles. I know that I can also solve this by specifically calling x = trailingSlash(x) on every individual variable, but it seems like there should be a better way. Any solutions I'm missing?

You can use os.path.join() to sidestep the whole issue. It behaves the same whether or not there is a slash at the end, and it is platform-independent as a bonus (that is, it inserts \ rather than / between components when running on Windows, for example):
import os
...
os.path.join("dir/", "ect", "ory")
# "dir/ect/ory" on Unix; on Windows the separator it inserts is "\\", giving "dir/ect\\ory"
In your case you'd want to do
with open(os.path.join(args.a, filename), "w") as fileout:
    ...
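If you would still rather normalize the arguments once up front, here is a minimal sketch. It assumes the directories live as attributes on the argparse namespace; the attribute names a, b, c, the file name out.txt and the helper with_trailing_sep are only illustrative. Rebinding the loop variable does nothing to the original names, but setattr() on the namespace does change what args.a refers to:

```python
import argparse
import os


def with_trailing_sep(path):
    """Return path unchanged if it already ends with os.sep, else append it."""
    return path if path.endswith(os.sep) else path + os.sep


# Hypothetical namespace standing in for parser.parse_args()
args = argparse.Namespace(a="ok" + os.sep, b="notok", c="alsonotok")

# Rebind each attribute by name; assigning to a plain loop variable would not do this.
for name in ("a", "b", "c"):
    setattr(args, name, with_trailing_sep(getattr(args, name)))

# os.path.join gives the same result with or without the trailing separator.
joined_with_sep = os.path.join(args.a, "out.txt")
joined_without_sep = os.path.join("ok", "out.txt")
```

With os.path.join() the normalization step is arguably unnecessary, which is why the answer above skips it.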

Related

Trying to dynamically change part of a function call

I might be (probably am) using the wrong terminology for some of this, but here's what I'm trying to do.
This is the current code:
with open('matchup' + MatchupStr_Test + '_' + HomeAway[MatchupNum_Test] + 'team_name_NewTest.txt', 'w') as f:
    f.write(str(box_scores[MatchupNum[MatchupNum_Test]].home_team.team_name))
I am trying to take the word home in the second line (.home_team.team_name) and make it change dynamically, depending on a value from an array.
Here's what I thought would work, but it doesn't:
with open('matchup' + MatchupStr_Test + '_' + HomeAway[MatchupNum_Test] + 'team_name_NewTest.txt', 'w') as f:
    f.write(str(box_scores[MatchupNum[MatchupNum_Test]].[HomeAway[MatchupNum_Test]]_team.team_name))
Absolute beginner here, so sorry if I'm wording this in a confusing way. Just trying to have some fun on a Raspbi I wasn't currently using, so it doesn't have to be perfect.
There is a function called getattr, which should work for this. The way getattr works is that it dynamically retrieves a property of an object using the name of the property.
For example:
team_name = HomeAway[MatchupNum_Test]
with open(f'matchup{MatchupStr_Test}_{team_name}team_name_NewTest.txt', 'w') as f:
    opposing_team = getattr(box_scores[MatchupNum[MatchupNum_Test]], f'{team_name}_team')
    f.write(str(opposing_team.team_name))
In the above, we dynamically retrieve the attribute f'{team_name}_team' from the object box_scores[MatchupNum[MatchupNum_Test]].
So if team_name was "home" for example, then getattr(box_scores[MatchupNum[MatchupNum_Test]], f'{team_name}_team') will be equivalent to box_scores[MatchupNum[MatchupNum_Test]].home_team
The f in front of the strings has nothing to do with the f that represents the file. Those strings are called f-strings, and they are a much nicer way of combining strings than using +.
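To see getattr in isolation, here is a small self-contained sketch. The box_score object and the team names are made up for illustration (SimpleNamespace just stands in for whatever class the real box_scores entries use); only the attribute names home_team, away_team and team_name come from the question:

```python
from types import SimpleNamespace

# Hypothetical stand-in for one entry of box_scores
box_score = SimpleNamespace(
    home_team=SimpleNamespace(team_name="Hawks"),
    away_team=SimpleNamespace(team_name="Owls"),
)


def team_name_for(score, side):
    """Look up score.<side>_team.team_name, where side is 'home' or 'away'."""
    team = getattr(score, f"{side}_team")
    return team.team_name


names = [team_name_for(box_score, side) for side in ("home", "away")]
print(names)
```

If side holds a string that does not correspond to an attribute, getattr raises AttributeError unless a default is passed as a third argument.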

Replacing "DoIt.py" script with flexible functions that match DFs on partial string matching of column names [Python3] [Pandas] [Merge]

I spent too much time trying to write a generic solution to a problem (below this). I ran into a couple issues, so I ended up writing a Do-It script, which is here:
# No imports necessary
# set file paths
annofh="/Path/To/Annotation/File.tsv"
datafh="/Path/To/Data/File.tsv"
mergedfh="/Path/To/MergedOutput/File.tsv"
# Read all the annotation data into a dict:
annoD={}
with open(annofh, 'r') as annoObj:
    h1=annoObj.readline()
    for l in annoObj:
        l=l.strip().split('\t')
        k=l[0] + ':' + l[1] + ' ' + l[3] + ' ' + l[4]
        annoD[k]=l
keyset=set(annoD.keys())
with open(mergedfh, 'w') as oF:
    with open(datafh, 'r') as dataObj:
        h2=dataObj.readline().strip(); oF.write(h2 + '\t'+ h1) # write the header line to the output file
        for l in dataObj:
            l=l.strip().split('\t') # Read through the data to be annotated line-by-line:
            if "-" in l[13]:
                pos=l[13].split('-')
                l[13]=pos[0]
            key=l[12][3:] + ":" + l[13] + " " + l[15] + " " + l[16]
            if key in annoD.keys():
                l = l + annoD[key]
                oF.write('\t'.join(l) + '\n')
            else:
                oF.write('\t'.join(l) + '\n')
The function of DoIt.py (which works correctly, above) is simple:
First, read a file containing annotation information into a dictionary.
Then read through the data to be annotated line by line, and add annotation info to the data by matching a string constructed by pasting together 4 columns.
As you can see, this script contains hard-coded index positions, which I obtained by writing a quick awk one-liner, finding the corresponding columns in both files, then putting these into the Python script.
Here's the thing. I do this kind of task all the time. I want to write a robust solution that will enable me to automate this task, even if column names vary. My first goal is to use partial string matching; but eventually it would be nice to be even more robust.
I got part of the way to doing this, but at present the solution below is actually no better than the DoIt.py script...
# Across many projects, the correct column names vary.
# For example, the name might be "#CHROM" or "Chromosome" or "CHR" for the first DF, But "Chrom" for the second df.
# in any case, if I conduct str.lower() then search for a substring, it should match any of the above options.
MasterColNamesList=["chr", "pos", "ref", "alt"]
def selectFields(h, columnNames):
    ##### currently this will only fix lower case uppercase problems. need to fix to catch any kind of mapping issue, like a partial string match (e.g., chr will match #CHROM)
    indices=[]
    h=list(map(str.lower,h)) # list() so that .index() works in Python 3
    for fld in columnNames:
        if fld in h:
            indices.append(h.index(fld))
    #### Now, this will work, but only if the field names are an exact match.
    return(indices)
def mergeDFsByCols(DF1, DF2, colnames): # <-- Single set of colnames; no need to use indices
    pass
    # eventually, need to write the merge statement; I could paste the cols together to a string and make that the indices for both DFs, then match on the indices, for example.
def mergeData(annoData, studyData, MasterColNamesList):
    ####
    import pandas as pd
    aDF=pd.read_csv(annoData, header=0, sep='\t') # header=0: first line holds the column names
    sDF=pd.read_csv(studyData, header=0, sep='\t')
    ####
    annoFieldIdx=selectFields(list(aDF.columns.values), columnNames1) # currently, columnNames1; should be MasterColNamesList
    dataFieldIdx=selectFields(list(sDF.columns.values), columnNames2)
    ####
    mergeDFsByCols(aDF, sDF, MasterColNamesList)
Now, although the above works, it is actually no more automated than the DoIt.py script, because the columnNames1 and 2 are specific to each file and still need to be found manually ...
What I want to be able to do is enter a list of generic strings that, if processed, will result in the correct columns being pulled from both files, then merge the pandas DFs on those columns.
Greatly appreciate your help.
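One possible starting point for the partial matching, offered only as a sketch: map each generic string to the first header column that contains it, ignoring case. The function name match_columns and the two example headers are assumptions made up for illustration, not taken from the files in the question. The code is plain Python so that it can be tried without pandas:

```python
def match_columns(header, generic_names):
    """Map each generic name to the first header column containing it (case-insensitive)."""
    matches = {}
    for generic in generic_names:
        for column in header:
            if generic.lower() in column.lower():
                matches[generic] = column
                break
        else:
            # No column contained the generic string: fail loudly instead of guessing.
            raise KeyError(f"no column matching {generic!r} in {header}")
    return matches


generic = ["chr", "pos", "ref", "alt"]
# Hypothetical headers for the annotation file and the data file
anno_cols = match_columns(["#CHROM", "POS", "ID", "REF", "ALT"], generic)
data_cols = match_columns(
    ["Sample", "Chromosome", "Position", "Ref_Allele", "Alt_Allele"], generic
)
print(anno_cols)
print(data_cols)
```

The resulting column names, in the same generic order, could then be handed to pandas' DataFrame.merge through its left_on and right_on parameters. Plain substring matching can produce false positives (for example "ref" would also match a column called "Preference"), so stricter rules may be needed for real data.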

I'm trying to 'shuffle' a folder of music and there is an error where random.choice() keeps choosing things that it is supposed to have removed

I'm trying to make a python script that renames files randomly from a list and I used numbers.remove(place) on it but it keeps choosing values that are supposed to have been removed.
I used to just use random.randint but now I have moved to choosing from a list then removing the chosen value from the list but it seems to keep choosing chosen values.
from os import chdir, listdir, rename
from random import choice
def main():
    chdir('C:\\Users\\user\\Desktop\\Folders\\Music')
    for f in listdir():
        if f.endswith('.mp4'):
            numbers = [str(x) for x in range(0, 100)]
            had = []
            print(f'numbers = {numbers}')
            place = choice(numbers)
            print(f'place = {place}')
            numbers.remove(place)
            print(f'numbers = {numbers}')
            while place in had:
                input('Place has been had.')
                place = choice(numbers)
            had.append(place)
            name = place + '.mp4'
            print(f'name = {name}')
            print(f'\n\nRenaming {f} to {name}.\n\n')
            try:
                rename(f, name)
            except FileExistsError:
                pass
if __name__ == '__main__':
    main()
It should randomly number the files without choosing the same value for a file twice but it does that and I have no idea why.
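When you call listdir() the first time, that's the same list that you're iterating over the entire time. Yes, you're changing the contents of the directory, but Python doesn't really care about that because you only asked for the contents of the directory at a specific point in time, before you began modifying it. Note also that numbers and had are re-created inside the loop for every file, so numbers.remove(place) never carries over to the next iteration; that is why the same value can be chosen twice.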
I would do this in two separate steps:
import os
import random
# get the current list of files in the directory
dirlist = os.listdir()
# choose a new name for each file
to_rename = zip(
    dirlist,
    [f'{num}.mp4' for num in random.sample(range(100), len(dirlist))]
)
# actually rename each file
for oldname, newname in to_rename:
    try:
        os.rename(oldname, newname)
    except FileExistsError:
        pass
This method is more concise than the one you're using. First, I use random.sample() on range(100) to generate distinct numbers from that range (without the extra bookkeeping of had that you're doing now). I generate exactly as many as I need (note that random.sample() raises ValueError if the directory holds more than 100 entries), and then use the built-in zip() function to pair the original filenames with these new numbers.
Then, I do the rename() operations all at once.
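To check the pairing logic without touching any real files, a sketch along these lines may help. The function plan_shuffle and the sample filenames are hypothetical; unlike the snippet above, it filters on the .mp4 extension and draws numbers from a range sized to the number of files, which are choices of this sketch rather than requirements:

```python
import random


def plan_shuffle(filenames, extension=".mp4"):
    """Pair each file that has the given extension with a distinct numbered name."""
    targets = [name for name in filenames if name.endswith(extension)]
    # random.sample never repeats a value, so every file gets a unique number.
    numbers = random.sample(range(len(targets)), len(targets))
    return {old: f"{num}{extension}" for old, num in zip(targets, numbers)}


plan = plan_shuffle(["a.mp4", "b.mp4", "notes.txt", "c.mp4"])
for old, new in plan.items():
    print(f"{old} -> {new}")
```

A new name can still collide with a file that already exists in the folder (say, a file already called 1.mp4), so a real run would need to handle that case, for instance by renaming to temporary names first.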

os.path.exists() always returns false

I am trying to check whether a file exists in the specified directory. If it does, I want to move the file to another directory. Here is my code:
def move(pnin, pno):
    if (os.path.exists(pnin)):
        shutil.move(pnin, pno)
Here is an example of pnin and pno:
pnin='D:\\extracted\\extrimg_2016000055202500\\2016000055202500_65500000007006_11_6.png'
pno='D:\folder\discarded'
I have a bit more than 8000 input directories. I copied this pnin from the output of print(pnin). When I define pnin explicitly as in the example, the if statement works. But when I run the move function in a loop, the body of the if statement is never executed. What could be the problem and how can I solve this?
Here is how I call move function:
def clean_Data(inputDir, outDir):
    if (len(listf) > 1):
        for l in range(1,len(listf)):
            fname = hashmd5[m][l]
            pathnamein = os.path.join(inputDir, fname)
            pathnamein = "%r"%pathnamein
            pathnameout = outfile
            move(pathnamein, pathnameout)
When I try the code below, it does not give any output. The for loop is working: when I use print(pathnamein) inside the for loop, it shows all the values of pathnamein.
def move(pnin, pno):
    os.path.exists(pnin)
You should use backslash to escape backslashes in your pno string:
pno='D:\\folder\\discarded'
or use a raw string instead:
pno=r'D:\folder\discarded'
Otherwise \f would be considered a formfeed character.
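It may also be worth looking at the line pathnamein = "%r"%pathnamein in clean_Data. Formatting with %r applies repr(), which wraps the string in quotes (and doubles any backslashes), so the result is no longer the path itself. A minimal sketch of that effect, using a temporary directory and a made-up file name so it can run anywhere:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.png")
    open(path, "w").close()  # create an empty file to test against

    quoted = "%r" % path  # repr() output: the path wrapped in quote characters

    plain_exists = os.path.exists(path)
    quoted_exists = os.path.exists(quoted)

print(plain_exists, quoted_exists)

# "\f" inside a normal string literal is a single form feed character
has_form_feed = "\x0c" in 'D:\folder'
print(has_form_feed)
```

Under that reading, passing the result of os.path.join() straight to move(), without the %r step, would be the thing to try.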

Python changing file name

My application offers the ability to the user to export its results. My application exports text files with name Exp_Text_1, Exp_Text_2 etc. I want it so that if a file with the same file name pre-exists in Desktop then to start counting from this number upwards. For example if a file with name Exp_Text_3 is already in Desktop, then I want the file to be created to have the name Exp_Text_4.
This is my code:
if len(str(self.Output_Box.get("1.0", "end"))) == 1:
    self.User_Line_Text.set("Nothing to export!")
else:
    import os.path
    self.txt_file_num = self.txt_file_num + 1
    file_name = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt" + "_" + str(self.txt_file_num) + ".txt")
    file = open(file_name, "a")
    file.write(self.Output_Box.get("1.0", "end"))
    file.close()
    self.User_Line_Text.set("A text file has been exported to Desktop!")
You likely want os.path.exists:
>>> import os
>>> help(os.path.exists)
Help on function exists in module genericpath:
exists(path)
    Test whether a path exists. Returns False for broken symbolic links
A very basic approach is to build the file name with a formatting mark where the number goes, so the same template can be checked for several numbers:
import os
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
# the "{}" is a formatting mark so we can do name_to_format.format(num)
num = 1
while os.path.exists(name_to_format.format(num)):
    num+=1
new_file_name = name_to_format.format(num)
this would check each filename starting with Exp_Txt_1.txt then Exp_Txt_2.txt etc. until it finds one that does not exist.
However the format mark may cause a problem if curly brackets {} are part of the rest of the path, so it may be preferable to do something like this:
import os
def get_file_name(num):
    return os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_" + str(num) + ".txt")
num = 1
while os.path.exists(get_file_name(num)):
    num+=1
new_file_name = get_file_name(num)
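The same loop can be wrapped in a function that takes the directory as a parameter, which makes it easy to try out in a temporary folder instead of the real Desktop. This is only a sketch; the name next_free_name and its parameters are assumptions for illustration:

```python
import os
import tempfile


def next_free_name(directory, prefix="Exp_Txt_", suffix=".txt", start=1):
    """Return the first path prefix<N>suffix in directory that does not exist yet."""
    num = start
    while os.path.exists(os.path.join(directory, f"{prefix}{num}{suffix}")):
        num += 1
    return os.path.join(directory, f"{prefix}{num}{suffix}")


with tempfile.TemporaryDirectory() as tmp:
    # Pretend two exports already exist
    for taken in (1, 2):
        open(os.path.join(tmp, f"Exp_Txt_{taken}.txt"), "w").close()
    chosen = os.path.basename(next_free_name(tmp))

print(chosen)
```

If another program might create the same file between the check and the write, opening the chosen path with mode "x" instead of "a" makes open() raise FileExistsError rather than reuse an existing file.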
EDIT: answer to why don't we need get_file_name function in first example?
First off if you are unfamiliar with str.format you may want to look at Python doc - common string operations and/or this simple example:
text = "Hello {}, my name is {}."
x = text.format("Kotropoulos","Tadhg")
print(x)
print(text)
The path string is figured out with this line:
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
But it has {} in the place of the desired number. (since we don't know what the number should be at this point) so if the path was for example:
name_to_format = "/Users/Tadhg/Desktop/Exp_Txt_{}.txt"
then we can insert a number with:
print(name_to_format.format(1))
print(name_to_format.format(2))
and this does not change name_to_format: str objects are immutable, so .format returns a new string without modifying name_to_format. However, we would run into a problem if our path was something like these:
name_to_format = "/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Smiley{:/Desktop/Exp_Txt_{}.txt"
Since the formatting mark we want to use is no longer the only curly brackets and we can get a variety of errors:
KeyError: 'Cat'
IndexError: tuple index out of range
ValueError: unmatched '{' in format spec
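The three failures can be reproduced with a short sketch. The helper try_format is hypothetical and only reports which exception type was raised, since the exact message text can differ between Python versions. The last line shows that doubling the braces makes str.format treat them as literal characters:

```python
def try_format(template, num):
    """Return the formatted string, or the name of the exception that was raised."""
    try:
        return template.format(num)
    except (KeyError, IndexError, ValueError) as exc:
        return type(exc).__name__


results = [
    try_format("/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt", 1),
    try_format("/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt", 1),
    try_format("/Users/Smiley{:/Desktop/Exp_Txt_{}.txt", 1),
]
print(results)

# "{{" and "}}" are escapes for literal braces, so only the last "{}" is filled in
escaped = "/Users/Bob{{Cat}}/Desktop/Exp_Txt_{}.txt".format(1)
print(escaped)
```

Escaping works when you write the template yourself, but a path returned by os.path.expanduser() would have to be escaped in code first, which is one reason the get_file_name approach is simpler.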
So you only want to rely on str.format when you know it is safe to use. Hope this helps, have fun coding!
