for loop and file saving - python-3.x

I am using Jupyter, working on pyspark (Python).
I have used a "for" loop to iterate the process, and I am trying to save a file after each iteration.
for example:
name = "mea"
for i in range(2):
print "name[i]"
i +=1
and output is:
name[i]
name[i]
The above snippet is a short illustration of the main algorithm I am working on.
The problem is that it prints the literal text name[i], and I want it to give me name[1] and, for the second iteration, name[2].
I need to use " " because I want to save my file to a specific folder, and I need to specify the path in " ". So after the first iteration it should save the file as name[1], and after the second iteration it should save the file as name[2].
[screenshot of the notebook output]
As the screenshot shows, in my actual algorithm, result is the output that I am getting after each loop iteration, and I want to save each output to a new file like result[0], result[1], result[2] instead of result[i], result[i], result[i], because the latter keeps overwriting the old file.

I guess there is nothing pyspark-specific in what you are trying to achieve. As per your example, what you need is string formatting: using the value of a variable inside a string.
So this will suffice for your example:
name = "mea"
for i in range(2):
    print("name[%s]" % i)
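On Python 3.6+, an f-string does the same job:
print(f"name[{i}]")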

You can also modify your print statement as follows:
print("name[" + str(i) + "]")


Replacing "DoIt.py" script with flexible functions that match DFs on partial string matching of column names [Python3] [Pandas] [Merge]

I spent too much time trying to write a generic solution to a problem (below this). I ran into a couple of issues, so I ended up writing a Do-It script, which is here:
# No imports necessary
# set file paths
annofh = "/Path/To/Annotation/File.tsv"
datafh = "/Path/To/Data/File.tsv"
mergedfh = "/Path/To/MergedOutput/File.tsv"

# Read all the annotation data into a dict:
annoD = {}
with open(annofh, 'r') as annoObj:
    h1 = annoObj.readline()
    for l in annoObj:
        l = l.strip().split('\t')
        k = l[0] + ':' + l[1] + ' ' + l[3] + ' ' + l[4]
        annoD[k] = l
keyset = set(annoD.keys())

with open(mergedfh, 'w') as oF:
    with open(datafh, 'r') as dataObj:
        h2 = dataObj.readline().strip(); oF.write(h2 + '\t' + h1)  # write the header line to the output file
        # Read through the data to be annotated line-by-line:
        for l in dataObj:
            l = l.strip().split('\t')
            if "-" in l[13]:
                pos = l[13].split('-')
                l[13] = pos[0]
            key = l[12][3:] + ":" + l[13] + " " + l[15] + " " + l[16]
            if key in annoD.keys():
                l = l + annoD[key]
                oF.write('\t'.join(l) + '\n')
            else:
                oF.write('\t'.join(l) + '\n')
The function of DoIt.py (which works correctly, above ^) is simple:
first, read a file containing annotation information into a dictionary;
then read through the data to be annotated line by line, and add the annotation info to the data by matching on a string constructed by pasting together 4 columns.
As you can see, this script hard-codes index positions, which I obtained by writing a quick awk one-liner, finding the corresponding columns in both files, and then putting those indices into the Python script.
Here's the thing: I do this kind of task all the time. I want to write a robust solution that will enable me to automate it, even if the column names vary. My first goal is to use partial string matching; eventually it would be nice to be even more robust.
I got part of the way to doing this, but at present the below solution is actually no better than the DoIt.py script...
# Across many projects, the correct column names vary.
# For example, the name might be "#CHROM" or "Chromosome" or "CHR" for the first DF, but "Chrom" for the second DF.
# In any case, if I conduct str.lower() and then search for a substring, it should match any of the above options.
MasterColNamesList = ["chr", "pos", "ref", "alt"]

def selectFields(h, columnNames):
    # currently this only fixes lower-/upper-case problems. Need to extend it to catch
    # any kind of mapping issue, like a partial string match (e.g., chr should match #CHROM)
    indices = []
    h = [s.lower() for s in h]  # a list, not map(), so .index() works on Python 3
    for fld in columnNames:
        if fld in h:
            indices.append(h.index(fld))
    # Now, this will work, but only if the field names are an exact match.
    return indices

def MergeDFsByCols(DF1, DF2, colnames):  # <-- Single set of colnames; no need to use indices
    # eventually, need to write the merge statement; I could paste the cols together to a string
    # and make that the index for both DFs, then match on the indices, for example.
    pass

def mergeData(annoData, studyData, MasterColNamesList):
    import pandas as pd
    aDF = pd.read_csv(annoData, header=0, sep='\t')
    sDF = pd.read_csv(studyData, header=0, sep='\t')
    annoFieldIdx = selectFields(list(aDF.columns.values), columnNames1)  # currently columnNames1; should be MasterColNamesList
    dataFieldIdx = selectFields(list(sDF.columns.values), columnNames2)
    MergeDFsByCols(aDF, sDF, MasterColNamesList)
Now, although the above works, it is actually no more automated than the DoIt.py script, because columnNames1 and columnNames2 are specific to each file and still need to be found manually...
What I want to be able to do is enter a list of generic strings that, when processed, result in the correct columns being pulled from both files, and then merge the pandas DFs on those columns.
Greatly appreciate your help.
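Not a complete answer, but a minimal sketch of the partial-matching idea (the function names are mine, and it assumes each generic string matches exactly one column per file; annofh and datafh are the paths from DoIt.py above):
import pandas as pd

def select_fields(header, generic_names):
    # map each real column name to the generic name it partially matches,
    # e.g. "#CHROM" -> "chr", "Position" -> "pos"
    mapping = {}
    for generic in generic_names:
        for col in header:
            if generic in col.lower():
                mapping[col] = generic
                break  # take the first match for each generic name
    return mapping

def merge_on_generic(anno_path, data_path, generic_names):
    aDF = pd.read_csv(anno_path, sep='\t')
    sDF = pd.read_csv(data_path, sep='\t')
    # rename the matched columns to the shared generic names...
    aDF = aDF.rename(columns=select_fields(aDF.columns, generic_names))
    sDF = sDF.rename(columns=select_fields(sDF.columns, generic_names))
    # ...then merge on those names directly
    return sDF.merge(aDF, on=generic_names, how='left')

merged = merge_on_generic(annofh, datafh, ["chr", "pos", "ref", "alt"])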

Updating values in an external file only works if I restart the shell window

Hi there, and thank you in advance for your response! I'm very new to Python, so please keep that in mind as you read through this. Thanks!
I've been working on some code for a very basic game in Python (just for practice). I've written a function that opens another file, selects a variable from it, and adjusts that variable by an amount, or, if it's a string, changes it into another string. The function looks like this:
def ovr(file, target, change):
    with open(file, "r+") as open_file:
        opened = open_file.readlines()
    with open(file, "w+") as open_file:
        position = []
        for appended_list, element in enumerate(opened):
            if target in element:
                position.append(appended_list)
        if type(change) == int:
            opened[position[0]] = str(target) + " = " + str(change) + "\n"
        else:
            opened[position[0]] = str(target) + " = " + "'" + str(change) + "'" + "\n"
        open_file.writelines(opened)

for loop in range(5):
    ovr(file="test.py", target="gold", change=gold + 1)
At the end I have a basic loop that should rewrite my file 5 times, each time increasing the amount of gold by 1. If I call the ovr() function outside of the loop and just run the program over and over, it works just fine, increasing the number in the external file by 1 each time.
Edit: I should mention that, as it stands, running this loop increases the value of gold by 1. If I close the shell and rerun the loop, it increases by 1 again, becoming 2. However many times I make the loop run, it only ever increases the value of gold by 1.
Edit 2: I found a truly horrific way of fixing this issue; if anyone has a better way, for the love of god please let me know. Code below.
for loop in range(3):
    ovr(file="test.py", target="gold", change=test.gold + 1)
    reload(test)
    sleep(1)
    print(test.gold)
The sleep part is there because rewriting the file takes longer than running the full loop.
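For reference, a slightly tidier version of that workaround using importlib (this assumes test.py defines gold at module level and reuses the ovr() function from above; on Python 3, reload lives in importlib):
import importlib
import test  # the module whose source file ovr() rewrites

for _ in range(3):
    ovr(file="test.py", target="gold", change=test.gold + 1)
    importlib.reload(test)  # re-import so test.gold reflects the file on disk
print(test.gold)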
You can go for a workaround and write your new information into a second file, called file1.
That way you can keep your working loop as it is, and after the loop runs you can update the content of your original file with the following two steps.
This way you don't need to rewrite your loop and can still change your file's content.
first step:
with open('file.txt', 'r') as input_file, open('file1.txt', 'w') as output_file:
    for line in input_file:
        output_file.write(line)
second step:
with open('file1.txt', 'r') as input_file, open('file.txt', 'w') as output_file:
    for line in input_file:
        # old_value / new_value are placeholders for your variable's old and new values
        if line.strip() == 'text' + str(old_value) + 'text':
            output_file.write('text' + str(new_value) + '\n')
        else:
            output_file.write(line)
Then your text file is updated.

I would like a way to have a "try again" for wrong user inputs. Is there a way to do this?

So I've got a list of files I'm looping over and a list of folders; I match my filenames to the folders that contain matching words, and that works fine. My code detects whether there is a matching_folder for a file and tells me which one(s); then I can type the name of that folder and the file will be moved there. It loops over the files in the list, which can sometimes be large. However, if I accidentally mistype the folder name (a user input), the code skips that file and moves on to the next one. Is there a way I can get my code to NOT move on to the next file, but to prompt me again instead?
if len(matching_folders) >= 2:
    print(f"There is MORE than one folder for {filename}" + "\n")
    if filename not in files_to_move:
        continue
    for item in matching_folders:
        print(item + "\n")
    answer_2 = input("Type name of folder: " + "\n")
    item_words = answer_2.lower().split(' ')
    for folder in folder_list:
        count = 0
        folder_words = folder.lower().split(' ')
        for word in item_words:
            if word in folder_words:
                count += 1
        if count == 2:
            folder_path = os.path.join(paths[1], folder)
            destination_file_path = os.path.join(folder_path, filename)
            shutil.move(source_file_path, destination_file_path)
            print(f"File moved to --> {folder}")
Excuse me if this is bad code, but I'm just learning and taking it one step at a time. Again: if I make a typo, my code goes on to the next file in the loop (there is actually a for loop over filename one level above all of this code, but there is a lot of other irrelevant stuff above it, so I didn't include it). I want it not to go to the next file when I make a typo. Thanks.
Use os.path.isdir(user_inputted_folder) to control your logic. It returns True if user_inputted_folder exists and is a directory (not a regular file).
https://docs.python.org/3/library/os.path.html#os.path.isdir
If the actual folder is named "Folder1" and you mistype "Folderrr1", this returns False, as long as "Folderrr1" doesn't also exist (if it does exist, that kind of typo can't be caught).
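A minimal retry-loop sketch built on that check (paths[1] is the base directory from the question; the loop keeps prompting until the input names a real folder):
import os

while True:
    answer_2 = input("Type name of folder: \n")
    if os.path.isdir(os.path.join(paths[1], answer_2)):
        break  # valid folder: fall through to the matching/move logic
    print(f"'{answer_2}' is not an existing folder, try again.")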

how would i go about reading a .txt file then ordering the data in descending order

OK, so I would like it so that when the user wants to check the high scores, the output prints the data in descending order. Keep in mind that there are both names and numbers in the .txt file, which is why I'm finding this so hard. If there is anything else you need, please tell me in the comments.
def highscore():
    global line  # sets global variable
    for line in open('score.txt'):
        print(line)
# =================================================================================
def new_highscores():
    global name, f  # sets global variables
    if score >= 1:  # if score is equal to or more than 1, run the code below
        name = input('what is your name? ')
        f = open('score.txt', 'a')  # opens score.txt in append mode
        f.write(str(name))   # write name to the .txt file
        f.write(' - ')       # write the separator
        f.write(str(score))  # write score to the .txt file
        f.write('\n')        # signifies end of line
        f.close()            # closes the .txt file
    if score <= 0:  # if score is zero or less, go back to menu 2
        menu2()
I added this in case there was a problem with the way I was writing to the file.
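For the printing side of the question, a minimal sketch that sorts whatever is currently in score.txt (it assumes every line has the name - score form written above):
def highscore():
    with open('score.txt') as f:
        entries = [line.rstrip('\n').split(' - ') for line in f if line.strip()]
    # sort by the numeric score, highest first
    for name_, score_ in sorted(entries, key=lambda e: float(e[1]), reverse=True):
        print(name_, '-', score_)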
The easiest thing to do is just maintain the high scores file in a sorted state. That way every time you want to print it out, just go ahead and do it. When you add a score, sort the list again. Here's a version of new_highscores that accomplishes just that:
def new_highscores():
    """Adds a global variable score to scores.txt after asking for name"""
    # not sure you need name and f as global variables without seeing
    # the rest of your code. This shouldn't hurt though
    global name, f  # sets global variables
    if score >= 1:  # if score is equal or more than 1 run code below
        name = input('What is your name? ')
        # here is where you do the part I was talking about:
        # get the lines from the file
        with open('score.txt') as f:
            lines = f.readlines()
        scores = []
        for line in lines:
            name_, score_ = line.split(' - ')
            # turn score_ from a string to a number
            score_ = float(score_)
            # store score_ first so that we are sorting by score_ later
            scores.append((score_, name_))
        # add the data from the user
        scores.append((score, name))
        # sort the scores
        scores.sort(reverse=True)
        # erase the file and write the new data
        with open('score.txt', 'w') as f:
            for score_, name_ in scores:
                f.write('{} - {}\n'.format(name_, score_))
    if score <= 0:  # if score is equal to zero go back to menu 2
        menu2()
You'll notice I'm using the with statement. You can learn more about that here, but essentially it works like this:
with open(filename) as file:
    # do stuff with file
# file is **automatically** closed once you get down here.
Even if you leave the block for another reason (an exception is thrown, you return from a function early, etc.), the file still gets closed. Using a with statement is a safer way to deal with files, because you're letting Python handle the closing of the file for you, and Python will never forget the way a programmer will.
You can read more about split and format here and here
P.S. There is a technique called binary search that would be more efficient, but I get the feeling you're just starting out, so I wanted to keep it simple. Essentially, you would search for the location in the file where the new score should be inserted by halving the search area at each step; then, when you write back to the file, you only write the part that changed (from the new score onward).
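For the curious, the standard library's bisect module does exactly that binary search on an in-memory list (the scores here are made up):
import bisect

scores = [(7.0, 'bob'), (12.0, 'alice')]  # (score, name) tuples, kept sorted ascending
bisect.insort(scores, (9.5, 'carol'))     # binary search finds the insertion point
print(scores)  # [(7.0, 'bob'), (9.5, 'carol'), (12.0, 'alice')]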

Something's wrong with my Python code (complete beginner)

So I am completely new to Python and can't figure out what's wrong with my code.
I need to write a program that asks for the name of an existing text file and then for the name of a second file, which doesn't necessarily need to exist yet. The task of the program is to take the content of the first file, convert it to upper-case letters, and paste it into the second file. Then it should return the number of symbols in the file(s).
The code is:
file1 = input("The name of the first text file: ")
file2 = input("The name of the second file: ")
f = open(file1)
file1content = f.read()
f.close
f2 = open(file2, "w")
file2content = f2.write(file1content.upper())
f2.close
print("There is ", len(str(file2content)), "symbols in the second file.")
I created two text files to check whether Python performs the operations correctly. It turns out the reported length is incorrect: there were 18 symbols in my file(s), but Python showed there were 2.
Could you please help me with this one?
Issues I see with your code:
close is a method, so you need to call it with the () operator; a bare f.close does not do what you think (it does nothing at all).
It is usually preferred in any case to use the with form of opening a file -- then it is closed automatically at the end.
In Python 3, the write method returns the number of characters written, so file2content = f2.write(file1content.upper()) is the integer 18 here -- and len(str(18)) is 2, which is exactly the number you saw.
There is no reason to read the entire file contents in; just loop over each line, since it is a text file.
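A quick demonstration of the point about write (demo.txt is just a scratch file):
with open('demo.txt', 'w') as f:
    n = f.write('abcdefghijklmnopqr'.upper())  # an 18-character string
print(n)            # 18 -- the number of characters written
print(len(str(n)))  # 2  -- what the original code effectively measured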
(Not tested) but I would write your program like this:
file1 = input("The name of the first text file: ")
file2 = input("The name of the second file: ")
chars = 0
with open(file1) as f, open(file2, 'w') as f2:
    for line in f:
        f2.write(line.upper())
        chars += len(line)
print("There are", chars, "symbols in the second file.")
One last note: advice to use raw_input() instead of input() applies only to Python 2; on Python 3, which this code appears to be running on, input() is correct.
