Bokeh how to load a file with its directory? - python-3.x

I am trying to get a file to load in Bokeh. The FileInput widget only gives me the file name. How can I get the file name together with its directory, so that the file opens without errors?
csvfile = FileInput() # csvfile = ('C:/matlab0012.csv')
csvopen = myfun(csvfile) # myfun-my function that creates graphs from data from a file

@bigreddot is right: you cannot get the full path. You can still reach the selected file, but you have to decode it first.
Minimal example:
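import io
import pandas as pd
from bokeh.models import FileInput
from pybase64 import b64decode

def get_file(attr, old, new):
    # `new` holds the file contents as a base64-encoded string
    file = io.BytesIO(b64decode(new))
    new_data = pd.read_csv(file) # pandas, or parse the decoded bytes yourself

file_input = FileInput(name="fileinput", accept=".csv")
file_input.on_change('value', get_file)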

This is impossible. For security reasons, browsers will not provide the full path. They will only provide the filename and the file contents from the file that was requested.
Assuming this is a Bokeh server application, you can only respond to a file selection with an on_change callback that you add to the value property of the input widget.
If this is standalone output (not Bokeh server) then you can only respond with a JavaScript js_on_change callback since the Bokeh content displayed in the browser is not connected to any Python process.
In either case, all that the browser will provide is the file contents (which Bokeh stores as base64 encoded strings in the value property).
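Since the value property holds the file contents as a base64 string, the decoding step on the Python side can be sketched with the standard library alone. The helper name decode_file_input_value and the sample CSV text are hypothetical; in a real server app the string would arrive as the new argument of the on_change callback, and the sketch assumes the uploaded file is UTF-8 text.

```python
import base64
import csv
import io


def decode_file_input_value(value):
    """Decode a base64 string (as stored in FileInput.value) into CSV rows."""
    raw_bytes = base64.b64decode(value)
    text = raw_bytes.decode("utf-8")  # assumption: the uploaded file is UTF-8 text
    return list(csv.reader(io.StringIO(text)))


# Simulate what the browser would send for a small CSV file.
sample_value = base64.b64encode(b"x,y\n1,2\n3,4\n").decode("ascii")
rows = decode_file_input_value(sample_value)
print(rows)
```

The same decoded bytes could be wrapped in io.BytesIO and handed to pandas instead of the csv module.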

Related

Exporting panda data frame as excel file on FTP

I am exporting a pandas data frame as an Excel file to an FTP server using the code below. The code creates the file on the server. The issue is that if I change the code and expect a different output file, the file on the server stays the same as before. However, if I change the file name in myFTP.storbinary('STOR %s.xlsx' %filename,bio), it works fine. Moreover, if I write the output locally under the same name, it also works fine. I don't want to change the file name every time I change my code. In short, it is not creating a different file with the same name. Below is the code:
myFTP = ftplib.FTP("ftp address","username","password")
myFTP.cwd("change directory/")
buffer=io.BytesIO()
df.to_excel(buffer,index=False)
text = buffer.getvalue()
bio = io.BytesIO(text)
file name = 'FileName_{0}{1}'.format(current_year,current_month)
myFTP.storbinary('STOR %s.xlsx'%file_name,bio)
myFTP.close()
Name of the output file must be: FileName_currentyearcurrentmonth
file name = 'FileName_{0}{1}'.format(current_year,current_month)
If this line is exactly as it appears in your code, you have a syntax error: file name is not a valid identifier, so it should be file_name. Also, in cases like this a context manager is pretty useful. Why don't you try it like this? That way, if you get an error, the connection is not left open.
with ftplib.FTP("ftp address","username","password") as myFTP:
    myFTP.cwd("change directory/")
    buffer=io.BytesIO()
    df.to_excel(buffer,index=False)
    text = buffer.getvalue()
    bio = io.BytesIO(text)
    file_name = 'FileName_{0}{1}'.format(current_year,current_month)
    myFTP.storbinary('STOR %s.xlsx'%file_name,bio)
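As a rough sketch, the upload step can also be pulled into a small helper that rewinds the buffer with seek(0) instead of copying it into a second BytesIO. The names build_file_name, upload_buffer and FakeFTP are hypothetical: FakeFTP only stands in for ftplib.FTP so the sketch runs without a server, and the zero-padded year-month format is an assumption about how current_year and current_month are produced.

```python
import datetime
import io


def build_file_name(today):
    """Return 'FileName_<year><month>.xlsx' for the given date (zero-padded month)."""
    return "FileName_{:%Y%m}.xlsx".format(today)


def upload_buffer(ftp, file_name, buffer):
    """Send an in-memory buffer through ftp.storbinary, reading from its start."""
    buffer.seek(0)  # rewind, otherwise storbinary would start reading at the end
    ftp.storbinary("STOR " + file_name, buffer)


class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP that records what was stored."""

    def __init__(self):
        self.stored = {}

    def storbinary(self, cmd, fp):
        self.stored[cmd] = fp.read()


buffer = io.BytesIO()
buffer.write(b"spreadsheet bytes")  # df.to_excel(buffer, index=False) would go here
fake = FakeFTP()
name = build_file_name(datetime.date(2020, 3, 15))
upload_buffer(fake, name, buffer)
print(name, fake.stored)
```

With a real connection, the ftp argument would be the myFTP object from the with block above.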

How to paste image from clipboard into PyQT5 Document?

I'm a noob PyQt5 user following a tutorial and I'm confused how I might extend the sample code below.
The two Qt5 handler methods canInsertFromMimeData and insertFromMimeData accept an image MIME type dragged and dropped onto the document (that works great). They both take a parameter, source, which is a QMimeData object.
However, if I try to paste an image copied from the Windows clipboard into the document, it just crashes, as there is no handler for this.
Searching the Qt5 documentation at https://doc.qt.io/qt-5/qmimedata.html just leads me to further confusion as I'm not a C++ programmer and I'm using Python 3.x and PyQt5 to do this.
How would I write a handler to allow an image copied to the clipboard to be pasted into the document directly?
class TextEdit(QTextEdit):
    def canInsertFromMimeData(self, source):
        if source.hasImage():
            return True
        else:
            return super(TextEdit, self).canInsertFromMimeData(source)
    def insertFromMimeData(self, source):
        cursor = self.textCursor()
        document = self.document()
        if source.hasUrls():
            for u in source.urls():
                file_ext = splitext(str(u.toLocalFile()))
                if u.isLocalFile() and file_ext in IMAGE_EXTENSIONS:
                    image = QImage(u.toLocalFile())
                    document.addResource(QTextDocument.ImageResource, u, image)
                    cursor.insertImage(u.toLocalFile())
                else:
                    # If we hit a non-image or non-local URL break the loop and fall out
                    # to the super call & let Qt handle it
                    break
            else:
                # If all were valid images, finish here.
                return
        elif source.hasImage():
            image = source.imageData()
            uuid = hexuuid()
            document.addResource(QTextDocument.ImageResource, uuid, image)
            cursor.insertImage(uuid)
            return
        super(TextEdit, self).insertFromMimeData(source)
code source: https://www.learnpyqt.com/examples/megasolid-idiom-rich-text-editor/
I was exactly in the same position as you. I am also new to Python, so there might be mistakes.
Passing the variable uuid directly to document.addResource(QTextDocument.ImageResource, uuid, image) does not work. The name argument should be a QUrl: QUrl(uuid).
Now you can insert the image. However, because the path to an image from the clipboard keeps changing, it would be better to use a different path, for example the directory where you are also saving the files.
Also be aware that the user has to select the file type when saving (.html).
For my own project I am going to print the file as a PDF. That way you don't have to worry about paths to images ^-^
I got around this by embedding the images inline as base64, so there are no separate resource files and everything is in one file.
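The inline base64 approach can be sketched without Qt as a function that turns encoded image bytes into an HTML img tag with a data URI. The function name image_bytes_to_img_tag is hypothetical and the bytes below are placeholders; getting real PNG bytes out of the pasted QImage is not shown, and the sketch assumes the rich-text document accepts data URIs when the tag is inserted as HTML.

```python
import base64


def image_bytes_to_img_tag(image_bytes, mime_type="image/png"):
    """Return an HTML <img> tag with the image embedded as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return '<img src="data:{};base64,{}" />'.format(mime_type, encoded)


# Placeholder bytes; real code would pass the encoded PNG data of the pasted image.
tag = image_bytes_to_img_tag(b"\x89PNG fake data")
print(tag)
```

Because the image travels inside the HTML itself, the saved .html file needs no separate image files next to it.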

Opening another .py file in a function to pass arguments in Python3.5

I'm pretty new to Python, and the overall goal of the project I am working on is to set up a SQLite DB that will allow easy entries in the future for non-programmers (this is for a small group of people who are all technically competent). The way I am trying to accomplish this right now is to have people save their new data entry as a .py file through a simple text editor and then open that .py file within the function that enters the values into the DB. So far I have:
def newEntry(material=None, param=None, value=None):
    if param == 'density':
        print('The density of %s is %s' % (material, value))
import fileinput
for line in fileinput.input(files=('testEntry.py')):
    process(line)
Then, with a simple text editor, I created a file called testEntry.py that will hopefully be called by newEntry.py when newEntry is executed in the terminal. The idea here is that a user would just have to write the function name with the arguments they are inputting within the parentheses. testEntry.py is simply:
# Some description for future users
newEntry(material='water', param='density', value='1')
When I run newEntry.py in my terminal nothing happens. Is there some other way to open and execute a .py file within another that I do not know of? Thank you very much for any help.
Your solution works, but as a commenter said, it is very insecure and there are better ways. Presuming your process(...) method is just executing some arbitrary Python code, this could be abused to execute system commands, such as deleting files (very bad).
Instead of using a .py file consisting of a series of newEntry(...) calls, one per line, have your users produce a CSV file with the appropriate column headers. For example:
material,param,value
water,density,1
Then parse this CSV file to add new entries:
import csv

with open('entries.csv') as entries:
    csv_reader = csv.reader(entries)
    header = True
    for row in csv_reader:
        if header: # Skip header
            header = False
            continue
        material = row[0]
        param = row[1]
        value = row[2]
        if param == 'density':
            print('The density of %s is %s' % (material, value))
Your users could use Microsoft Excel, Google Sheets, or any other spreadsheet software that can export .csv files to create/edit these files, and you could provide a template to the users with predefined headers.
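To connect this to the SQLite goal in the question, here is a minimal sketch that reads the same CSV layout with csv.DictReader and inserts each row through a parameterized query. The table name entries, its all-TEXT schema and the in-memory database are assumptions made for illustration only.

```python
import csv
import io
import sqlite3

# Stand-in for the contents of entries.csv.
csv_text = "material,param,value\nwater,density,1\n"

conn = sqlite3.connect(":memory:")  # a real project would pass a file path here
conn.execute("CREATE TABLE entries (material TEXT, param TEXT, value TEXT)")

reader = csv.DictReader(io.StringIO(csv_text))  # the header row becomes the dict keys
for row in reader:
    # Parameterized query: values are passed as data and never executed as code.
    conn.execute(
        "INSERT INTO entries (material, param, value) VALUES (?, ?, ?)",
        (row["material"], row["param"], row["value"]),
    )
conn.commit()

stored = conn.execute("SELECT material, param, value FROM entries").fetchall()
print(stored)
```

Unlike executing a user-supplied .py file, nothing in the CSV is ever run, so a malformed or malicious entry can at worst produce a bad row.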

Python3 multiprocessing

I am an absolute beginner. I fumble my way through code by analogy to examples so apologies for any misuse of terminology.
I have written a small piece of code in python 3 which:
takes a user input (a folder on their computer)
searches the folder for PDF files
turns each page of the PDF to an image with sequential numbering
iterates through the jpgs in order of numbering, turning them black and white
OCR scans the files and outputs the text into an object
saves the text contents to a .txt file (via pytesseract)
deletes the jpgs, leaving the .txt file
Most time is taken in converting to jpgs and possibly making them black and white.
The code works, though I am sure it could be improved. It takes a while so I thought I'd try multiprocessing using Pools.
My code appears to create pools. I can also get the function to print a list of files in the folder, so it appears to have the list passed to it in one form or another.
I cannot get it to work and have now hacked the code about repeatedly with various errors. I think the main problem is, I am clueless.
My code begins:
User input block (asks for a folder in the user's directory, checks it is a valid folder, etc.).
OCR block as a function (parses the PDF, then outputs the contents into a single .txt file).
For loop block as a function (is supposed to loop over each PDF in the folder and execute the OCR block on it).
Multiprocessing block (is supposed to feed the list of files in the directory to the loop block).
To avoid writing War and Peace, I set out the last version of the loop and multiprocessing blocks below:
#import necessary modules
home_path = os.path.expanduser('~')
#ask for input with various checking mechanisms to make sure a useful pdfDir is obtained
pdfDir = home_path + '/Documents/' + input('Please input the folder name where the PDFs are stored. The folder must be directly under the Documents folder. It cannot have a space in it. \n \n Name of folder:')
def textExtractor():
    #convert pdf to jpeg with a tesseract friendly resolution
    with Img(filename=pdf_filename, resolution=300) as img: #some can be encrypted so use OCR instead of other libraries
        #various lines of code here
    compilation_temp.close()
def per_file_process (subject_files):
    for pdf in subject_files:
        #decode the whole file name as a string
        pdf_filename = os.fsdecode(pdf)
        #check whether the string ends in .pdf
        if pdf_filename.endswith(".pdf"):
            #call the OCR function on it
            textExtractor()
        else:
            print ('nonsense')
if __name__ == '__main__':
    pool = Pool(2)
    pool.map(per_file_process, os.listdir(pdfDir))
Is anyone willing/able to point out my errors, please?
The relevant bits of the code from the version that worked (without multiprocessing):
#import necessary
home_path = os.path.expanduser('~')
#block accepting input
pdfDir = home_path + '/Documents/' + input('Please input the folder name where the PDFs are stored. The folder must be directly under the Documents folder. It cannot have a space in it. \n \n Name of folder:')
def textExtractor():
    #convert pdf to jpeg with a tesseract friendly resolution
    with Img(filename=pdf_filename, resolution=300) as img: #need to think about using generic expanduser or other libraries to allow portability
        #various lines of code to OCR and output .txt file
    compilation_temp.close()
subject_files = os.listdir(pdfDir)
for pdf in subject_files:
    #decode the whole file name as a string you can see
    pdf_filename = os.fsdecode(pdf)
    #check whether the string ends in /pdf
    if pdf_filename.endswith(".pdf"):
        textExtractor()
    else:
        #print for debugging
Pool.map calls the worker function once for each name returned by os.listdir. In per_file_process, subject_files is therefore a single filename, and for pdf in subject_files: iterates over the individual characters in the name. Further, listdir returns only the base name, without the directory, so you aren't looking in the right place for the PDF. You can use glob to filter by extension and return a working path to the file.
Your example is confusing: textExtractor() takes no parameters, so how is it to know which file it is processing? I'm going out on a limb and assuming that it really does take the path to the file it is processing. If so, you can parallelize rather easily just by feeding it the PDF paths via map. Assuming processing time will vary by PDF, I am setting chunksize to 1 so that an early-finishing worker can grab extra files to process.
from glob import glob
import os
from multiprocessing import Pool
def textExtractor(pdf_filename):
    #convert pdf to jpeg with a tesseract friendly resolution
    with Img(filename=pdf_filename, resolution=300) as img: #some can be encrypted so use OCR instead of other libraries
        #...various lines of code here
    compilation_temp.close()
if __name__ == '__main__':
    #pdfDir is the folder inputted by user
    with Pool(2) as pool:
        # assuming call signature: textExtractor(path_to_file)
        pool.map(textExtractor,
                 (filename for filename in glob(os.path.join(pdfDir, '*.pdf'))
                  if os.path.isfile(filename)),
                 chunksize=1)
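The two problems described in this answer can be reproduced without any PDFs. The sketch below uses the built-in map as a stand-in for Pool.map, since both call the worker once per element of the iterable; the function name count_loop_items and the two empty files are hypothetical and exist only for the demonstration.

```python
import glob
import os
import tempfile


def count_loop_items(subject_files):
    """Mimic the original worker: loop over whatever single item map passes in."""
    return len([pdf for pdf in subject_files])


with tempfile.TemporaryDirectory() as pdf_dir:
    # Create two empty stand-in files so there is something to list.
    for name in ("a.pdf", "notes.txt"):
        open(os.path.join(pdf_dir, name), "w").close()

    base_names = sorted(os.listdir(pdf_dir))  # names only, no directory part
    full_paths = sorted(glob.glob(os.path.join(pdf_dir, "*.pdf")))  # usable paths

# map calls the worker once per name, so the loop inside sees characters, not files.
per_call_counts = list(map(count_loop_items, base_names))
pdf_base_names = [os.path.basename(p) for p in full_paths]
print(base_names, per_call_counts)
print(pdf_base_names)
```

Each count equals the length of one file name, which shows the loop walking over characters, while glob returns only the .pdf file and includes the directory in its path.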

Use Selenium to Save File to Specific Location with A Specific Name

I am trying to download a vcard to a specific location on my desktop, with a specific file name (which I define).
I have code that can download the file to my desktop.
url = "http://www.kirkland.com/vcard.cfm?itemid=10485&editstatus=0"
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", os.getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/x-vcard")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get(url)
Note, the URL above is a link to a vcard.
This is saving to the same directory where the code itself exists, and using a file name that was generated by the site I am downloading from.
I want to specify the directory where the file goes, and the name of the file.
Specifically, I would like to call the file something.txt
Also note, I realize there are much easier ways to do this (using urllib or urllib2). I need to do it this specific way (if possible) because some links are JavaScript, which requires me to use Selenium. I used the above URL as an example to simplify the situation. I can provide other examples/code to show more complex situations if necessary.
Finally, thank you very much for the help I am sure I will get for this post, and for all the help you have provided me over the last year. I don't know how I would have learned all I have learned in this last year had it not been for this community.
I have code that works. It's more of a hack than a solution, but here it is:
# SET FIREFOX PROFILE
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", os.getcwd())
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/x-vcard")
#OPEN URL
browser = webdriver.Firefox(firefox_profile=fp)
browser.get(url)
#FIND MOST RECENT FILE IN (YOUR) DIR AND RENAME IT
os.chdir("DIR-STRING")
files = filter(os.path.isfile, os.listdir("DIR-STRING"))
files = [os.path.join("DIR-STRING", f) for f in files]
files.sort(key=lambda x: os.path.getmtime(x))
newest_file = files[-1]
os.rename(newest_file, "NEW-FILE-NAME"+"EXTENSION")
#GET THE STRING, AND DELETE THE FILE
f = open("DIR-STRING"+"NEW-FILE-NAME"+"EXTENSION", "r")
string = f.read()
#DO WHATEVER YOU WANT WITH THE STRING/TEXT FROM THE DOWNLOAD
f.close()
os.remove("DIR-STRING"+"NEW-FILE-NAME"+"EXTENSION")
DIR-STRING is the path to the directory where the file is saved
NEW-FILE-NAME is the name of the file you want
EXTENSION is the .txt, etc.
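The rename step of this hack can be sketched more compactly with pathlib. The helper name rename_newest_file is hypothetical, the temporary directory and file names stand in for the real download folder, and the sketch assumes the browser has already finished writing the download.

```python
import os
import tempfile
from pathlib import Path


def rename_newest_file(directory, new_name):
    """Rename the most recently modified file in directory and return its new path."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    newest = max(files, key=lambda p: p.stat().st_mtime)
    target = newest.with_name(new_name)
    newest.rename(target)
    return target


with tempfile.TemporaryDirectory() as download_dir:
    old = Path(download_dir, "old.vcf")
    new = Path(download_dir, "download.vcf")
    old.write_text("older file")
    new.write_text("BEGIN:VCARD")
    # Set modification times explicitly so the example is deterministic.
    os.utime(old, (1_000_000, 1_000_000))
    os.utime(new, (2_000_000, 2_000_000))

    renamed = rename_newest_file(download_dir, "something.txt")
    renamed_name = renamed.name
    contents = renamed.read_text()
    remaining = sorted(p.name for p in Path(download_dir).iterdir())

print(renamed_name, contents, remaining)
```

Working with full Path objects avoids the os.chdir call and the string concatenation of directory and file name.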
