Python: get the filename a variable refers to, not the file contents (Python 3)

I have a file that I want to pass to odo. odo takes a filename as input. Because I need to tidy the CSV up first, I run it through a script that creates a new file, which I reference with a variable.
How can I get just the filename from that variable so I can pass it as an argument to odo (from the Blaze project)?
As you can see from this script pasted into IPython, I get the entire contents of the file instead.
In [8]: %paste
from odo import odo
import pandas as pd
from clean2 import clean
import os
filegiven = '20150704RHIL0.csv'
myFile = clean(filegiven)
toUse = (filegiven + '_clean.csv')
print(os.path.realpath(toUse))
## -- End pasted text --
Surfin' Safari 3 0
... Many lines later
Search Squad (NZ) 4 5
C:\Users\sayth\Repos\Notebooks\20150704RHIL0.csv_clean.csv # from print
I just need to get this name so that my script could look like the following, where myFile gives odo the filename rather than the contents.
from odo import odo
import pandas as pd
from clean2 import clean
filegiven = '20150704RHIL0.csv'
myFile = clean(filegiven)
odo(myFile, pd.DataFrame)
Solution
This is how I solved it; there are likely better ways.
from odo import odo
import pandas as pd
from clean2 import clean
import os.path
filegiven = '20150704RHIL0.csv'
clean(filegiven)
fileName = os.path.basename(filegiven)
fileNameSplit = fileName.split(".")
fileNameUse = fileNameSplit[0] + '_clean.' + fileNameSplit[1]
odo(fileNameUse, pd.DataFrame)

To get the filename from a file object (assuming it is a standard Python file object created with open()), you can use its name attribute.
Example -
>>> f = open("a.py",'r')
>>> f.name
'a.py'
Note that this is unnecessary in your situation: you could have clean(filegiven) return the filename instead of a file object, and open the file in your script if you really need the file object.
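A minimal sketch of that suggestion, assuming clean() can be changed. The body below is a hypothetical stand-in for clean2.clean (it only strips whitespace from each line), and the "_clean" naming follows the solution above; the point is just that the function returns the path it wrote.

```python
import os
import tempfile

def clean(path):
    """Hypothetical stand-in for clean2.clean: write a tidied copy and return its path."""
    root, ext = os.path.splitext(path)
    out_path = root + "_clean" + ext
    with open(path) as src, open(out_path, "w") as dst:
        for line in src:
            # Placeholder "cleaning": strip surrounding whitespace from each line.
            dst.write(line.strip() + "\n")
    return out_path

# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "20150704RHIL0.csv")
    with open(raw, "w") as fh:
        fh.write("  a,b  \n1,2\n")
    cleaned = clean(raw)                    # a path, not file contents
    cleaned_name = os.path.basename(cleaned)
    with open(cleaned) as fh:
        cleaned_text = fh.read()

print(cleaned_name)  # 20150704RHIL0_clean.csv
```

With a clean() like this, the returned path can be handed straight to odo, e.g. odo(clean(filegiven), pd.DataFrame).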

Related

how to remove the last characters of a variable in python?

I want to rename files whose names are some digits followed by .py and .BR, such as 0001.py.BR or 0005.py.BR, by removing the .BR from the end of the name.
I tried this code
import os
x = input("")
os.rename(x, x[:7])
but it doesn't work for files with longer names: 00001.py.BR gets renamed to 00001.p. Is there a way to do something like x - ".BR"?
If you are talking about a file path, use os.path.splitext():
>>> import os
>>> os.path.splitext('00001.py.BR')[0]
'00001.py'
>>>
You can use the string split method like this:
import os
x=input("")
x_new = x.split(".BR")[0]
os.rename(x, x_new)
If you're using Python 3, check the standard pathlib:
from pathlib import Path
old_path = Path(input(""))
if old_path.suffix == '.BR':
    # with_suffix('') drops only the last suffix and keeps the file in its directory
    old_path.rename(old_path.with_suffix(''))
else:
    print('this is not a .BR file')
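The "x - '.BR'" idea from the question can be sketched as a small helper that removes the suffix only when the name actually ends with it. The helper name strip_suffix is made up for illustration. Unlike x.split(".BR")[0], it does not cut the name short when ".BR" also appears earlier in it. On Python 3.9 and later, str.removesuffix does the same job.

```python
def strip_suffix(name, suffix=".BR"):
    """Return name without a trailing suffix; leave other names unchanged."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

print(strip_suffix("0001.py.BR"))   # 0001.py
print(strip_suffix("00001.py.BR"))  # 00001.py
print(strip_suffix("notes.txt"))    # notes.txt
```

The result can then be passed to os.rename(x, strip_suffix(x)) as in the question.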

Parse filename information into multiple columns in the concatenated csv file

I have multiple CSV files in a folder, and each has a unique file name such as W10N1_RTO_T0_1294_TL_IV_Curve.csv. I would like to concatenate all the files and create multiple columns based on the information in the filename. For example, W10N1 would go in a column called DieID.
I am a beginner in programming and Python, and I couldn't figure out how to do it easily.
import os
import glob
import pandas as pd
import csv
os.chdir('filepath')
extension='csv'
all_filenames=[i for i in glob.glob('*.{}'.format(extension))]
combined_csv=pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv('combined_csv.csv',index=False)
import os
os.listdir("your_target_directory")
will return a list of all files and directories in "your_target_directory".
Then it is just string manipulation, e.g.:
>>> x = 'blue_red_green'
>>> x.split("_")
['blue', 'red', 'green']
>>> a, b, c = x.split("_")
>>> a
'blue'
>>> b
'red'
>>> c
'green'
Also split on "." first to remove the .csv extension.
Finally, write a CSV file using whatever separator you want.
i = 0  # example counter; in practice this would come from your loop
f = open("yourfancyname.csv", "w+")
f.write("DieID You_fancy_other_IDs also_if_u_want_variable_use_this_%d\r\n" % (i+1))
f.close()
EZ as A B C
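The string manipulation above can be sketched as one helper that turns a filename into a dict of column values. Only the DieID name comes from the question; the helper name parse_filename and the field_2 ... field_7 labels are placeholders to be replaced with real column names.

```python
from pathlib import Path

def parse_filename(filename, names=("DieID",)):
    """Split the filename (without extension) on "_" and label the pieces.

    Pieces beyond the supplied names get placeholder labels field_N.
    """
    parts = Path(filename).stem.split("_")
    labels = list(names) + [
        f"field_{i}" for i in range(len(names) + 1, len(parts) + 1)
    ]
    return dict(zip(labels, parts))

fields = parse_filename("W10N1_RTO_T0_1294_TL_IV_Curve.csv")
print(fields["DieID"])  # W10N1
```

In the pandas loop from the question, each file's frame could then be tagged before concatenating, for example with pd.read_csv(f).assign(**parse_filename(f)).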

How do I use pathlib to create files named for range of dates?

I can print a range of filenames with the code below, but I need to write empty files instead of printing. I know that pathlib.Path.touch can accomplish this, but do I need to define a method for the touch?
import datetime
import pathlib
for i in range(0, 180):
    print((datetime.date.today() + datetime.timedelta(i)).strftime("%Y%m%d" + ".exml"))
I don't think you need to define a method. Try this:
import datetime
import pathlib
for i in range(0, 180):
    fname = (datetime.date.today() + datetime.timedelta(i)).strftime("%Y%m%d" + ".exml")
    print(fname)
    pathlib.Path(fname).touch()
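If the files should go into a specific folder rather than the current directory, the same loop can be wrapped in a small function. This is only a sketch: the function name touch_date_files and its parameters are made up for illustration, and the demo uses a temporary directory and a fixed start date so the result is predictable.

```python
import datetime
import pathlib
import tempfile

def touch_date_files(directory, start, days, suffix=".exml"):
    """Create one empty file per day, named YYYYMMDD plus suffix, in directory."""
    directory = pathlib.Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    created = []
    for offset in range(days):
        day = start + datetime.timedelta(days=offset)
        path = directory / f"{day:%Y%m%d}{suffix}"
        path.touch()  # creates the file if it does not exist yet
        created.append(path)
    return created

with tempfile.TemporaryDirectory() as tmp:
    files = touch_date_files(tmp, datetime.date(2020, 1, 30), 3)
    names = [p.name for p in files]
    all_exist = all(p.exists() for p in files)

print(names)  # ['20200130.exml', '20200131.exml', '20200201.exml']
```

For the original use case, the call would be touch_date_files(some_folder, datetime.date.today(), 180).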

How to move files older than y days from Archival folder to somefolder using python in Databricks

I have to find all files older than y days in an Archival folder and move them to another folder. I have found some files older than y days in Archival and tried moving them to the other folder with the Python code below. While running the code I get this error: "java.io.FileNotFoundException: /dbfs/FileStore/Archival/testparquet.parquet". I have checked that the file exists in DBFS. Can someone please help me with this?
from pathlib import Path
import arrow
import os, time, sys
vFilePath="/dbfs/FileStore/"
path = "/dbfs/FileStore/Archival/"
path1="dbfs:/FileStore/Archival/"
#####FOR Dbutils path###
vDbuPath="/FilsStore/Archival/"
deleteFullPath="FileStore/Deleted/"
now = time.time()
print (now)
vdelFullPath=deleteFullPath+"/"
for f in os.listdir(path):
    Filename=str(print(f))
    print(Filename)
    f = os.path.join(path,f)
    print(os.stat(os.path.join(path,f)).st_mtime)
    if os.stat(os.path.join(path,f)).st_mtime < now - 1 * 86400:
        print("f value: "+f)
        filename=os.path(f)
        print("dbutilspath: " +filename)
        if not os.path.exists("dbfs:/"+deleteFullPath + Filename):
            dbutils.fs.mv(filename,"dbfs:/"+deleteFullPath+"testparquet.parquet",recurse=True)
One way to do this is to use the Hadoop filesystem API. The code below gives you a list of dictionaries with the file dates and names.
You can then move the files that are old enough.
import time
from time import mktime
from datetime import datetime
# Assumption: source_dir is the folder to scan; it was not defined in the original snippet.
source_dir = "dbfs:/FileStore/Archival/"
list_of_files = []
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
path_exists = fs.exists(spark._jvm.org.apache.hadoop.fs.Path(source_dir))
if path_exists:
    # True = list files recursively
    file_list = fs.listFiles(spark._jvm.org.apache.hadoop.fs.Path(source_dir), True)
    while file_list.hasNext():
        file = file_list.next()
        # getModificationTime() is in milliseconds; dropping the last three digits gives seconds
        list_of_files.append({'filedate': datetime.fromtimestamp(mktime(time.localtime(int(str(file.getModificationTime())[:-3])))), "filename": str(file.getPath())})
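Once list_of_files is built, the remaining step is to pick out the entries older than the cutoff. A minimal sketch of that filter, assuming each entry has the 'filedate' and 'filename' keys used above. The helper name older_than and the sample paths are hypothetical, and the actual move is left as a comment because dbutils is only available inside Databricks.

```python
from datetime import datetime, timedelta

def older_than(files, days, now=None):
    """Return the filenames whose 'filedate' is more than `days` days before `now`."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    return [entry["filename"] for entry in files if entry["filedate"] < cutoff]

# Sample data in the same shape as list_of_files (paths are made up).
sample = [
    {"filedate": datetime(2020, 1, 1), "filename": "dbfs:/FileStore/Archival/old.parquet"},
    {"filedate": datetime(2020, 1, 9), "filename": "dbfs:/FileStore/Archival/new.parquet"},
]
to_move = older_than(sample, days=5, now=datetime(2020, 1, 10))
print(to_move)  # ['dbfs:/FileStore/Archival/old.parquet']

# In Databricks, each selected file could then be moved, for example:
# for name in to_move:
#     dbutils.fs.mv(name, "dbfs:/FileStore/Deleted/" + name.rsplit("/", 1)[-1])
```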

PyPDF2 difference resulting in 1 character per line

I'm trying to create a simple script that shows the differences between two PDFs (similar to a GitHub merge view) using difflib's HtmlDiff class.
So far I have opened my PDF files in binary mode and can extract their contents using PyPDF2 functions.
import difflib
import os
import PyPDF2
os.chdir('.../MyPythonScripts/PDFtesterDifflib')
file1 = 'pdf1.pdf'
file2 = 'pdf2.pdf'
file1RL = open(file1, 'rb')
pdfreader1 = PyPDF2.PdfFileReader(file1RL)
PageOBJ1 = pdfreader1.getPage(0)
textOBJ1 = PageOBJ1.extractText()
file2RL = open(file2, 'rb')
pdfreader2 = PyPDF2.PdfFileReader(file2RL)
PageOBJ2 = pdfreader2.getPage(0)
textOBJ2 = PageOBJ2.extractText()
difference = difflib.HtmlDiff().make_file(textOBJ1,textOBJ2,file1,file2)
diff_report = open('...MyPythonScripts/PDFtesterDifflib/diff_report.html','w')
diff_report.write(difference)
diff_report.close()
The result is a report with one character per line.
How can I get my lines to read normally?
It should read:
1.apples
2.oranges
3. --this line should differ--
I am running Python 3.6 on a Mac.
Thanks in advance!
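One likely cause: HtmlDiff.make_file expects two lists of lines, and a plain string passed in their place is iterated character by character, which gives one character per row. A minimal sketch of the fix, using made-up strings in place of the text extracted by PyPDF2 and assuming that text contains newline characters between lines:

```python
import difflib

# Stand-ins for textOBJ1 / textOBJ2 from the question (sample content is made up).
text1 = "1.apples\n2.oranges\n3.pears\n"
text2 = "1.apples\n2.oranges\n3.plums\n"

# Split each string into a list of lines before diffing.
lines1 = text1.splitlines()
lines2 = text2.splitlines()

report = difflib.HtmlDiff().make_file(lines1, lines2, "pdf1.pdf", "pdf2.pdf")
print(lines1)  # ['1.apples', '2.oranges', '3.pears']
```

The report string can then be written to diff_report.html exactly as in the question. If the extracted text has no newlines at all, it would need to be split some other way first.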
