Assigning a name to an output file with a string variable - python-3.x

I'm writing a program to concatenate 3 csv files into a single new output csv file. As part of this I have to ask the user for the name they want to use for the output filename. My problem is that my output filename is declared in the arguments of the function and therefore, I get an error on the first line of this function because myFile is not declared (see 2 lines later for the declaration).
def concatenate(indir="C:\\Conc", outfile="C:\\Conc\\%s.csv" %myFile):
    os.chdir(indir)
    myFile = input("Please type the name of the file: ")
    fileList=glob.glob("*.csv")
    dfList=[]
    colnames=["Symbol", "Name", "LastSale", "MarketCap", "ADR TSO", "IPOYear", "Sector", "Industry", "Summary Quote", " "]
    for filename in fileList:
        print("merging " + filename + "...")
        df=pandas.read_csv(filename, skiprows=1,header=None)
        dfList.append(df)
    concatDf=pandas.concat(dfList, axis=0)
    concatDf.columns=colnames
    concatDf = concatDf.sort_values(by=["Symbol"], axis=0, ascending=True)
    concatDf.to_csv(outfile, index=None)
    print("Completed. Your merged file can be found at: " + outfile + "\n")
This function is called from a menu function (as below), so I was wondering if it's possible to pass it from the menu?
if choice == "1":
    myFile = input("Please type the name of the file: ")
    concatenate(myFile)
    menu()
But neither option seems to work, because I always get an error saying that myFile is not declared. I know this is basic stuff but I'm scratching my head trying to figure it out.

Your reputation suggests you're new to Stack Overflow and your name suggests you're new to Python. So welcome to both! Here's a quite verbose answer to hopefully make things clear.
There are a few issues here:
1. concatenate() takes two arguments, indir and outfile, but you're only passing one argument when you're calling it: concatenate(myFile). Both arguments have default values, so they can also be passed as keyword arguments by name (i.e. indir and outfile). When you're only passing one argument (myFile), without using the keyword, the myFile variable is passed as the first argument, which, in this case, is indir (this doesn't happen yet in your code, as the error you're getting precedes it and stops it from being executed). However, you seem to want the myFile variable assigned to your outfile argument. You can achieve this by being explicit in your call to concatenate() like so: concatenate(outfile=myFile). Now, your indir argument remains the default you've set (i.e. "C:\\Conc"), but your outfile argument has the value of myFile. This would, however, not fix your problem (detailed in point 2). I suggest you change the outfile argument to represent an output directory (see full example below).
2. This is where you get an error. The default value "C:\\Conc\\%s.csv" % myFile is evaluated once, when Python executes the def line, and at that moment no variable called myFile exists, so a NameError is raised before the function is ever called. The myFile you assign next to your if-statement does not help either: it lives in a different scope (the menu function), and concatenate() can only see its own arguments, indir and outfile. That's why function arguments exist, to pass data from one scope to another. Also, avoid string manipulation in default argument values; build the string inside the function body instead.
3. You're asking the user for myFile twice: once in the if-statement and once in your concatenate() function. You only need to ask for it once; I suggest keeping it in the function only.
4. Small correction: You should combine directory paths and filenames in a platform-independent manner. The os module has a function for that (os.path.join(), see example below). Paths in Windows and paths in Linux look different, and the os module handles those differences for us :)
Here's a final suggestion, with comments, addressing all points:
# Import your modules
import glob  # <-- Needed for glob.glob() below
import os  # <-- We're gonna need this
import pandas
# Your code
# ...
# The concatenate function, with arguments having a default value, allowing to optionally specify the input directory and the output *directory* (not output file)
def concatenate(indir="C:\\Conc", outdir="C:\\Conc"):
    os.chdir(indir)
    myFile = input("Please type the name of the file: ")  # <-- We ask for the output filename here
    # Your code
    fileList = glob.glob("*.csv")
    dfList = []
    colnames = ["Symbol", "Name", "LastSale", "MarketCap", "ADR TSO", "IPOYear", "Sector", "Industry", "Summary Quote", " "]
    for filename in fileList:
        print("merging " + filename + "...")
        df = pandas.read_csv(filename, skiprows=1, header=None)
        dfList.append(df)
    concatDf = pandas.concat(dfList, axis=0)
    concatDf.columns = colnames
    concatDf = concatDf.sort_values(by=["Symbol"], axis=0, ascending=True)
    # Saving the data
    # First, we need to create the full output path, i.e. the output directory + output filename + ".csv". We use os.path.join() for that
    output_path = os.path.join(outdir, myFile + ".csv")
    # The rest of your code
    concatDf.to_csv(output_path, index=None)  # <-- Note output_path, not outfile
    print("Completed. Your merged file can be found at: " + output_path + "\n")
# The if-statement calling the concatenate() function
if choice == "1":
    # We're calling concatenate() with no arguments, since we're asking for the filename within the function.
    # You could, however, ask the user for the input directory or output directory and pass that along, like this:
    # input_directory_from_user = input("Please type the path to the input directory: ")
    # output_directory_from_user = input("Please type the path to the output directory: ")
    # concatenate(indir=input_directory_from_user, outdir=output_directory_from_user)
    concatenate()
    menu()
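To see the two ideas in isolation, here is a minimal sketch. The helper name build_output_path and the folder name "results" are made up for illustration and are not part of the original code. It shows that a default value referring to an undefined name fails as soon as the def statement runs, and that passing the name in as an argument avoids the problem.

```python
import os

def build_output_path(file_name, outdir="C:\\Conc"):
    """Join the output directory and the user-supplied name, adding .csv if needed."""
    if not file_name.lower().endswith(".csv"):
        file_name += ".csv"
    return os.path.join(outdir, file_name)

# A default value is evaluated when `def` runs, so an undefined name fails right there
def_failed = False
try:
    def broken(outfile="%s.csv" % name_not_defined_yet):
        return outfile
except NameError:
    def_failed = True

print(def_failed)
print(build_output_path("merged", outdir="results"))
```

Because the file name is an ordinary argument here, the menu code can ask for it once and hand it over, instead of relying on a variable that the function cannot see.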

Related

os.path.exists() always returns false

I am trying to check whether a file exists in the specified directory. If it does, I want to move the file to another directory. Here is my code:
def move(pnin, pno):
    if (os.path.exists(pnin)):
        shutil.move(pnin, pno)
Here is an example of pnin and pno:
pnin='D:\\extracted\\extrimg_2016000055202500\\2016000055202500_65500000007006_11_6.png'
pno='D:\folder\discarded'
I have a bit more than 8000 input directories. I copied this pnin from the output of print(pnin). When I define pnin externally as in the example, the if statement works. But when I run the move function iteratively, the body of the if statement is never executed. What could be the problem and how can I solve this?
Here is how I call move function:
def clean_Data(inputDir, outDir):
    if (len(listf) > 1):
        for l in range(1,len(listf)):
            fname = hashmd5[m][l]
            pathnamein = os.path.join(inputDir, fname)
            pathnamein = "%r"%pathnamein
            pathnameout = outfile
            move(pathnamein, pathnameout)
When I try the code below, it does not give any output. The for loop is working: when I use print(pathnamein) in the for loop, it shows all the values of pathnamein.
def move(pnin, pno):
    os.path.exists(pnin)
You should use backslash to escape backslashes in your pno string:
pno='D:\\folder\\discarded'
or use a raw string instead:
pno=r'D:\folder\discarded'
Otherwise \f would be considered a formfeed character.
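Something else in the question may be worth checking, offered here as an assumption about the cause rather than a confirmed diagnosis: the line pathnamein = "%r"%pathnamein replaces the path with its repr(), which wraps it in quote characters. A minimal sketch using a temporary file (the file name example.png is made up) shows the effect on os.path.exists():

```python
import os
import tempfile

# Create a real file so that os.path.exists() has something to find
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.png")
    open(path, "w").close()

    quoted = "%r" % path  # repr() wraps the path in quote characters
    plain_exists = os.path.exists(path)
    quoted_exists = os.path.exists(quoted)

print(plain_exists, quoted_exists)  # True False
```

If that is what happens in the loop, passing the result of os.path.join() straight to move() without the "%r" step should be enough.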

for loop and file saving

I am using Jupyter, which is running pyspark (Python).
I use a for loop to iterate the process and I am trying to save a file after each iteration.
For example:
name = "mea"
for i in range(2):
    print "name[i]"
    i +=1
and output is:
name[i]
name[i]
The code above is a short illustration of the main algorithm that I am working on.
The problem is that the output is name[i]; I want name[1] for the first iteration and name[2] for the second.
I need to use " " because I want to save my file to a specific folder and I need to specify the path in " ". So after the first iteration it should save the file as name[1] and after the second iteration as name[2].
In my actual algorithm (shown in a screenshot that is not reproduced here), result is the output I get after each iteration of the for loop, and I want to save each output in a new file such as result[0], result[1], result[2] instead of result[i], result[i], result[i], because the latter overwrites the previous file each time.
I don't think what you are trying to achieve is specific to pyspark. Going by your example, what you need is to use a variable inside a string,
so this will suffice for your example:
name = "mea"
for i in range(2):
    print "name[%s]" % i
    i +=1
You can modify your print statement as follows
print "name[" + str(i) + "]"
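The same idea carries over to building one file path per iteration. The sketch below is written for Python 3; the helper numbered_paths, the folder name "output" and the ".txt" extension are assumptions made for illustration, not part of the question.

```python
import os

def numbered_paths(folder, base_name, count, extension=".txt"):
    """Return one distinct path per iteration, e.g. folder/result[0].txt."""
    paths = []
    for i in range(count):
        # Insert the loop counter into the file name instead of the literal text "i"
        file_name = "{}[{}]{}".format(base_name, i, extension)
        paths.append(os.path.join(folder, file_name))
    return paths

for path in numbered_paths("output", "result", 3):
    print(path)
```

Since every iteration gets its own name, a later result no longer overwrites an earlier one.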

How to write output of os.walk() to a file in python 3

The Python code below changes to "/home/sam" and traverses it using os.walk().
The three values that os.walk() yields are read in the for loop and should then be written to the file "Dir_traverse_date.txt".
My problem is that when the program has finished, the only word written to "Dir_traverse_date.txt" is None.
How can I fix this? How do I get the output of the function into the text file?
================================CODE=====================================
import os
def dir_trav():
    os.chdir("/home/sam")
    print("Current Directory", os.getcwd())
    for dirpath,dirname,filename in os.walk(os.getcwd()):
        print ("Directory Path ----> ", dirpath)
        print ("Directory Name ----> ", dirname)
        print ("File Name ----> ", filename)
    return
funct_out=dir_trav()
new_file=open('Dir_traverse_date.txt','w')
new_file.write(str(funct_out))
new_file.close()
========================================================================
In Python, return must be followed by the object you wish the function to return; a bare return gives back None, which is why your file contains "None". You can begin by manually placing a hard-coded string in the return line, for example return "To Sender". Your file should now contain the text "To Sender" instead of "None". Try this with a few other strings or even numbers. Regardless of where you run os.walk your output will always be the same. What matters is what you place beside return.
Your goal is to construct a string from the data gathered for you by os.walk and return it. I see that you are already printing some of the data. Let's begin fixing this by just gathering file names. Start off with an empty string and then accumulate your output with the += operator.
def dir_trav():
    os.chdir("/home/sam")
    print("Current Directory", os.getcwd())
    output = ''
    for dirpath, dirnames, filenames in os.walk(os.getcwd()):
        # filenames is a list of names, so add them one at a time
        for name in filenames:
            output += name
    return output
Now, you'll notice that your output will change to include filenames, but they'll all be stuck together end to end (e.g. file1file2file3). This is because we need to insert a newline after each piece of data we are extracting.
def dir_trav():
    os.chdir("/home/sam")
    print("Current Directory", os.getcwd())
    output = ''
    for dirpath, dirnames, filenames in os.walk(os.getcwd()):
        # filenames is a list of names, so add them one at a time
        for name in filenames:
            output += name + '\n'
    return output
From this point you should be able to move closer to the results you were looking for. String concatenation (+) is not the most efficient method for building strings from many pieces, but it will serve your purposes.
Note: Functions in Python can return multiple values, but technically they are bundled into a single object, a tuple.
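For comparison, here is a sketch of the list-and-join alternative to repeated +=. The function name list_files is made up, and the example runs against a temporary directory rather than /home/sam so that it can be tried anywhere.

```python
import os
import tempfile

def list_files(root):
    """Collect the full path of every file under root, one per line."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            lines.append(os.path.join(dirpath, name))
    # Join once at the end instead of repeated string concatenation
    return "\n".join(sorted(lines))

# Try it on a throwaway directory so the example does not depend on /home/sam
with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(root, name), "w").close()
    report = list_files(root)
    report_path = os.path.join(root, "Dir_traverse_date.txt")
    with open(report_path, "w") as new_file:
        new_file.write(report + "\n")
    line_count = len(report.splitlines())

print(line_count)  # 2
```

The report is built before the output file is created, so the report file itself is not listed.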
You didn't return anything, so the function returned None. Return the data and write it out:
import os
def dir_trav():
    os.chdir("/home/sam")
    print("Current Directory: ", os.getcwd())
    data = []
    for dirpath, dirnames, filenames in os.walk(os.getcwd()):
        for name in filenames:
            filename = os.path.join(dirpath, name)
            data.append(filename)
    return data
new_file = open('Dir_traverse_date.txt', 'w')
for filename in dir_trav():
    new_file.write(filename)
    new_file.write('\n')
new_file.close()

Python changing file name

My application offers the ability to the user to export its results. My application exports text files with name Exp_Text_1, Exp_Text_2 etc. I want it so that if a file with the same file name pre-exists in Desktop then to start counting from this number upwards. For example if a file with name Exp_Text_3 is already in Desktop, then I want the file to be created to have the name Exp_Text_4.
This is my code:
if len(str(self.Output_Box.get("1.0", "end"))) == 1:
    self.User_Line_Text.set("Nothing to export!")
else:
    import os.path
    self.txt_file_num = self.txt_file_num + 1
    file_name = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt" + "_" + str(self.txt_file_num) + ".txt")
    file = open(file_name, "a")
    file.write(self.Output_Box.get("1.0", "end"))
    file.close()
    self.User_Line_Text.set("A text file has been exported to Desktop!")
You likely want os.path.exists:
>>> import os
>>> help(os.path.exists)
Help on function exists in module genericpath:
exists(path)
Test whether a path exists. Returns False for broken symbolic links
A very basic approach is to create a file name containing a formatting mark, so the number can be inserted for each check:
import os
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
# the "{}" is a formatting mark so we can do name_to_format.format(num)
num = 1
while os.path.exists(name_to_format.format(num)):
    num+=1
new_file_name = name_to_format.format(num)
This checks each filename, starting with Exp_Txt_1.txt, then Exp_Txt_2.txt, etc., until it finds one that does not exist.
However, the format mark may cause a problem if curly brackets {} are part of the rest of the path, so it may be preferable to do something like this:
import os
def get_file_name(num):
    return os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_" + str(num) + ".txt")
num = 1
while os.path.exists(get_file_name(num)):
    num+=1
new_file_name = get_file_name(num)
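The loop above can also be wrapped in a helper that takes the directory as a parameter, which makes it easy to try in a temporary folder instead of the real Desktop. This is only a sketch: next_free_name and its parameters are hypothetical names, not part of the original application.

```python
import os
import tempfile

def next_free_name(directory, prefix="Exp_Txt_", extension=".txt"):
    """Return the first prefix<N>extension path in directory that does not exist yet."""
    num = 1
    while True:
        candidate = os.path.join(directory, prefix + str(num) + extension)
        if not os.path.exists(candidate):
            return candidate
        num += 1

with tempfile.TemporaryDirectory() as folder:
    # Pretend exports 1 and 2 already exist
    for n in (1, 2):
        open(os.path.join(folder, "Exp_Txt_%d.txt" % n), "w").close()
    chosen = os.path.basename(next_free_name(folder))

print(chosen)  # Exp_Txt_3.txt
```

In the application, the directory argument would be the Desktop path built with os.path.expanduser("~"), as in the code above.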
EDIT: answer to why don't we need get_file_name function in first example?
First off if you are unfamiliar with str.format you may want to look at Python doc - common string operations and/or this simple example:
text = "Hello {}, my name is {}."
x = text.format("Kotropoulos","Tadhg")
print(x)
print(text)
The path string is figured out with this line:
name_to_format = os.path.join(os.path.expanduser("~"), "Desktop", "Exp_Txt_{}.txt")
But it has {} in the place of the desired number. (since we don't know what the number should be at this point) so if the path was for example:
name_to_format = "/Users/Tadhg/Desktop/Exp_Txt_{}.txt"
then we can insert a number with:
print(name_to_format.format(1))
print(name_to_format.format(2))
and this does not change name_to_format, since str objects are immutable: .format returns a new string without modifying name_to_format. However, we would run into a problem if our path was something like these:
name_to_format = "/Users/Bob{Cat}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Bobcat{}/Desktop/Exp_Txt_{}.txt"
#or
name_to_format = "/Users/Smiley{:/Desktop/Exp_Txt_{}.txt"
Since the formatting mark we want to use is no longer the only curly brackets and we can get a variety of errors:
KeyError: 'Cat'
IndexError: tuple index out of range
ValueError: unmatched '{' in format spec
So you only want to rely on str.format when you know it is safe to use. Hope this helps, have fun coding!

Something's wrong with my Python code (complete beginner)

So I am completely new to Python and can't figure out what's wrong with my code.
I need to write a program that asks for the name of the existing text file and then of the other one, that doesn't necessarily need to exist. The task of the program is to take content of the first file, convert it to upper-case letters and paste to the second file. Then it should return the number of symbols used in the file(s).
The code is:
file1 = input("The name of the first text file: ")
file2 = input("The name of the second file: ")
f = open(file1)
file1content = f.read()
f.close
f2 = open(file2, "w")
file2content = f2.write(file1content.upper())
f2.close
print("There is ", len(str(file2content)), "symbols in the second file.")
I created two text files to check whether Python performs the operations correctly. Turns out the length of the file(s) is incorrect as there were 18 symbols in my file(s) and Python showed there were 2.
Could you please help me with this one?
Issues I see with your code:
close is a method, so you need to call it with (); otherwise f.close only refers to the method and does not do what you think.
It is usually preferred in any case to use the with form of opening a file -- then it is closed automatically at the end.
In Python 3 the write method returns the number of characters written, not the text, so file2content = f2.write(file1content.upper()) is the integer 18 and len(str(file2content)) is 2, the number of digits in 18. (In Python 2, write returns None.)
There is no reason to read the entire file contents in; just loop over each line if it is a text file.
(Not tested) but I would write your program like this:
file1 = input("The name of the first text file: ")
file2 = input("The name of the second file: ")
chars=0
with open(file1) as f, open(file2, 'w') as f2:
    for line in f:
        f2.write(line.upper())
        chars+=len(line)
print("There are ", chars, "symbols in the second file.")
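As a side note for Python 3, and as an assumption about why the question reports 2 symbols: there, write() returns the number of characters written, so taking len(str(...)) of that result counts digits rather than symbols. A small sketch with a made-up 18-character string and a temporary file:

```python
import os
import tempfile

text = "eighteen symbols!!"  # 18 characters
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "out.txt")
    with open(path, "w") as f2:
        written = f2.write(text.upper())  # write() returns the character count
    with open(path) as f:
        content = f.read()

print(written, len(str(written)), len(content))  # 18 2 18
```

Counting the characters of the text itself, or using the value returned by write() directly, gives the expected 18.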
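If you are running Python 2, input() does not do what you expect (it evaluates what you type); use raw_input() instead. In Python 3, input() already returns the typed text as a string.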

Resources