At the organization I work for, different printers are set up at various locations. All are mainly used to print A4-sized documents, so the defaults are set up accordingly.
We are also using a bunch of custom-sized forms which people have up to now been filling in by hand.
Recently, I was tasked with automating printing onto these forms from our central database.
I'm using ReportLab to create temporary PDF files, which I am then trying to send to the default printer. All of this is relatively simple, save for getting the printers to register a custom paper size.
I got as far as the following code snippet, but I'm really stuck.
import tempfile
import win32api
import win32print
pdf_file = tempfile.mktemp(".pdf")
#CREATION OF PDF FILE WITH REPORTLAB
printer = win32print.GetDefaultPrinter()
PRINTER_DEFAULTS = {"DesiredAccess":win32print.PRINTER_ALL_ACCESS}
pHandle = win32print.OpenPrinter(printer, PRINTER_DEFAULTS)
level = 2
properties = win32print.GetPrinter(pHandle, level)
pDevModeObj = properties["pDevMode"]
pDevModeObj.PaperSize = 0
pDevModeObj.PaperLength = 2200 #SIZE IN 1/10 mm
pDevModeObj.PaperWidth = 1000 #SIZE IN 1/10 mm
properties["pDevMode"]=pDevModeObj
win32print.SetPrinter(pHandle,level,properties,0)
#OPTION ONE
#win32api.ShellExecute(0, "print", pdf_file, None, ".", 0)
#OPTION TWO
win32api.ShellExecute (0,"printto",pdf_file,'"%s"' % printer,".",0)
win32print.ClosePrinter(pHandle)
It just does not work. Printers do not report a "paper size mismatch", like they should when a non-A4 document is being sent to them. And when I try printing to a PDF printer, it also defaults to A4.
When calling
print(pDevModeObj.PaperSize)
print(pDevModeObj.PaperLength)
print(pDevModeObj.PaperWidth)
everything seems to be in order, so I'm guessing I don't know how to send those paper size values back to the printer settings.
Here is a list of all the resources I checked out (examples not all in python, and a few are not using the win32api), and couldn't get the thing to work properly:
Programmatically Print a PDF File - Specifying Printer
Python's win32api only printing to default printer
https://mail.python.org/pipermail/python-win32/2005-August/003683.html
https://learn.microsoft.com/en-us/troubleshoot/windows/win32/modify-printer-settings-setprinter-api
Print PDF file in duplex mode via Python
https://www.thinbug.com/q/39249360
Saving / Restoring Printer DevModes - wxPython / win32print
pywin32: how do I get a pyDEVMODE object?
https://learn.microsoft.com/en-us/troubleshoot/windows/win32/modify-printer-settings-documentproperties
How to change printer preference settings using python
Print file to continuous paper using win32print Python
python win32print can't set custom page size
http://timgolden.me.uk/pywin32-docs/PyDEVMODE.html
https://newcenturycomputers.net/projects/pythonicwindowsprinting.html
Printing a file and configure printer settings
Change printer default paper size
https://grokbase.com/t/python/python-win32/085x5hdbtd/how-to-change-paper-size-while-printing
openpyxl - set custom paper size for printing
Python win32print changing advanced printer options
Printing PDF files with Python
Python silent print PDF to specific printer
https://learn.microsoft.com/en-us/windows/win32/cimwin32prov/win32-printerconfiguration
Printing PDF's using Python,win32api, and Acrobat Reader 9
Python print pdf file with win32print
How to chose Paper Format when printing a PDF File with Python?
Access denied when attempting to remove printer
https://www.programcreek.com/python/example/24860/win32api.ShellExecute
https://opensource.gonnerman.org/?p=192
Python27 - on windows 10 how can i tell printing paper size is 50.8mm x 25.4mm?
https://mail.python.org/pipermail/python-win32/2008-May/007640.html
http://timgolden.me.uk/python/win32_how_do_i/print.html
ShellExecute uses the default printing parameters. If you need to print with the modified DEVMODE, you can use CreateDC instead.
Refer: GDI Print API
If you use SetPrinter to modify the default DEVMODE structure for a
printer (globally setting the printer defaults), you must first call
the DocumentProperties function to validate the DEVMODE structure.
Refer:
SetPrinter Remarks
Modify printer settings by using the SetPrinter function
You can also directly use DocumentProperties to modify printer initialization information.
Then pass pDevModeObj to CreateDC, and use StartDoc and StartPage to print.
Similar case: Change printer tray with pywin32
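The steps above could look roughly like the sketch below. It is untested on real hardware and makes several assumptions: the helper names mm_to_devmode_units and print_text_on_custom_paper are made up for illustration, PaperSize = 0 is kept from the question to mean "use PaperWidth/PaperLength", and a single line of text is drawn only to show where page content goes. Note that a GDI device context prints whatever you draw on it; it does not accept a PDF file, so the ReportLab output would still need to be rendered by some other means.

```python
def mm_to_devmode_units(mm):
    """Convert millimetres to the tenths of a millimetre that DEVMODE uses."""
    return int(round(mm * 10))


def print_text_on_custom_paper(text, width_mm, length_mm, printer_name=None):
    """Draw one line of text on a custom-sized page via GDI (Windows only)."""
    # pywin32 is imported lazily so this module can be imported anywhere.
    import win32con
    import win32gui
    import win32print

    printer_name = printer_name or win32print.GetDefaultPrinter()
    handle = win32print.OpenPrinter(printer_name)
    try:
        devmode = win32print.GetPrinter(handle, 2)["pDevMode"]
        devmode.PaperSize = 0  # assumption from the question: custom size
        devmode.PaperWidth = mm_to_devmode_units(width_mm)
        devmode.PaperLength = mm_to_devmode_units(length_mm)
        # Tell the driver which DEVMODE members were changed.
        devmode.Fields |= (win32con.DM_PAPERSIZE
                           | win32con.DM_PAPERWIDTH
                           | win32con.DM_PAPERLENGTH)
        # Let the driver validate and merge the modified DEVMODE.
        win32print.DocumentProperties(
            0, handle, printer_name, devmode, devmode,
            win32con.DM_IN_BUFFER | win32con.DM_OUT_BUFFER)
    finally:
        win32print.ClosePrinter(handle)

    # Create a device context for this job only; printer defaults are untouched.
    hdc = win32gui.CreateDC("WINSPOOL", printer_name, devmode)
    try:
        win32print.StartDoc(hdc, ("Custom form", None, None, 0))
        win32print.StartPage(hdc)
        win32gui.DrawText(hdc, text, -1, (100, 100, 2000, 400),
                          win32con.DT_LEFT)
        win32print.EndPage(hdc)
        win32print.EndDoc(hdc)
    finally:
        win32gui.DeleteDC(hdc)
```

Because the DEVMODE is passed to CreateDC rather than written back with SetPrinter, the custom size applies to this job only and no PRINTER_ALL_ACCESS rights should be needed.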
I am trying to get file input to work in Bokeh. When I use the FileInput widget, it only gives me the file name. How can I get the file name together with its directory, so that the file can be opened without errors?
csvfile = FileInput() # csvfile = ('C:/matlab0012.csv')
csvopen = myfun(csvfile) # myfun-my function that creates graphs from data from a file
@bigreddot is right: you cannot get the full path. You can still reach the contents of the selected file, but you have to decode them first.
Minimal example:
import io
import pandas as pd
from bokeh.models import FileInput
from pybase64 import b64decode
def get_file(attr, old, new):
    file = io.BytesIO(b64decode(new))
    new_data = pd.read_csv(file)  # pandas, or read the decoded bytes yourself
file_input = FileInput(name="fileinput", accept=".csv")
file_input.on_change('value', get_file)
This is impossible. For security reasons, browsers will not provide the full path. They will only provide the filename and the file contents from the file that was requested.
Assuming this is a Bokeh server application, you can only respond to a file selection with an on_change callback that you add to the value property of the input widget.
If this is standalone output (not Bokeh server) then you can only respond with a JavaScript js_on_change callback since the Bokeh content displayed in the browser is not connected to any Python process.
In either case, all that the browser will provide is the file contents (which Bokeh stores as base64 encoded strings in the value property).
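As a rough illustration of the server-side case, the sketch below decodes the base64 string using only the standard library. The names decode_file_input, make_file_input and on_file_selected are hypothetical, and the file is assumed to be UTF-8 encoded CSV.

```python
import base64
import csv
import io


def decode_file_input(value):
    """Turn the base64 string from FileInput.value into a list of CSV rows."""
    raw = base64.b64decode(value)
    text = raw.decode("utf-8")  # assumption: the uploaded file is UTF-8
    return list(csv.reader(io.StringIO(text)))


def make_file_input():
    """Build a FileInput widget wired to the decoder (needs Bokeh installed)."""
    from bokeh.models import FileInput  # lazy import, only needed in the app

    def on_file_selected(attr, old, new):
        # new holds the file contents, never a path on the user's disk.
        rows = decode_file_input(new)
        print("loaded", len(rows), "rows")

    widget = FileInput(accept=".csv")
    widget.on_change("value", on_file_selected)
    return widget
```

In a Bokeh server script, the widget returned by make_file_input() would be added to the document like any other widget; the decoded rows can then be handed to the plotting function instead of a path.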
For reference, this is all using PyQt5 and Python 3.6:
I've got a QStandardItemModel built from QStandardItems, which are strings naming the items in a zip (the model displays all the contents of a zip file). I went with this choice because I cannot cache the files locally, and my research shows that QFileSystemModel cannot work on archives unless I unpack them at least temporarily.
All items in the QStandardItemModel end in the correct extension for the file (.csv, .txt, etc.), and I need to display the icon a user would see if they were looking at the file in Windows Explorer, but show it in the QTreeView (a user seeing content.csv should also see the icon for Excel). On that note, this application only runs on Windows.
How can I pull the default system file icon for an extension and set it while I am creating these items? Would I have to manually download the icons for my known file types, or does the system store them somewhere I can access?
Here's some basic code of how I build and display the model and treeview:
self.zip_model = QtGui.QStandardItemModel()
# My Computer directory explorer
self.tree_zip = QTreeView()
self.tree_zip.setModel(self.zip_model)
def build_zip_model(self, current_directory):
    self.zip_model.clear()
    with zipfile.ZipFile(current_directory) as zip_file:
        for item in zip_file.namelist():
            model_item = QtGui.QStandardItem(item)
            self.zip_model.appendRow(model_item)
You can use QFileIconProvider:
def build_zip_model(self, current_directory):
    iconProvider = QtWidgets.QFileIconProvider()
    self.zip_model.clear()
    with zipfile.ZipFile(current_directory) as zip_file:
        for item in zip_file.namelist():
            icon = iconProvider.icon(QtCore.QFileInfo(item))
            model_item = QtGui.QStandardItem(icon, item)
            self.zip_model.appendRow(model_item)
We have a large number of PDFs which have been created with InDesign and not all the text was being extracted by PyPDF2. Here is the code:-
for pageNum in range(0, pdfReader.numPages):
    pageObj = pdfReader.getPage(pageNum)
    # note text is a bytes object not string.
    text = pageObj.extractText().encode('utf-8')
    search_text = text.lower()
    if search_word in search_text.decode("utf-8"):
        search_word = search_word.strip()
        search_word_count += 1
        print("Pattern Found on Page: " + str(pageNum+1))
search_word_count_list.append(search_word_count)
print("The word:- '{}' was found:- {} times\n".format(search_word, search_word_count))
I did some testing with PDFMiner and got the same results, i.e. the same bits of text were extracted or not extracted. So I figured there must be something going on with the PDF.
Off the back of this, I worked with a typesetter on some testing and discovered that when text boxes are locked in InDesign (Ctrl+L), the exported PDF has that text locked and not extractable; that is, the locked bits cannot be extracted via PyPDF2 or PDFMiner.
Going forward I can ask the typesetters to unlock text before exporting PDFs. But with thousands of existing PDFs, I want to be able to extract the locked text; asking the typesetters to unlock thousands of files is not an option. Does anyone have experience of this? Any ideas on how to access the locked text?
Edit 1
I did some testing with Adobe Acrobat Pro 11. When saving as plain text, the locked text is not written to the .txt file, but the unlocked text is.
Checking the Security tab in Acrobat:-
With all tested documents, open in Acrobat I pick File -> Properties, switch to the Security tab of the Document Properties dialog, and there I read "Security Method: No Security", and under the restrictions everything is 'Allowed' (Printing, Changing, Copying ...). So I think these are all valid PDFs, which are unprotected.
Edit 2
I have tried to install pdf2txt but my machine does not meet requirements as I am missing "Microsoft Visual C++ 14.0" and as it's a work machine it is locked down.
Edit 3
Acrobat says the PDF version is 1.7 and the PDF producer is Adobe PDF Library 15.0.
I can copy-paste the locked text so I do not think rasterisation is the problem.
Edit 4 - possible solution
So I have tested using the https://pdftotext.com/ and it was able to access the locked text. So I will talk to the IT department to get "Microsoft Visual C++ 14.0" installed so I can use the pdf2txt library.
Edit 5
Have not had much luck with installing PDFtotext due to problems installing poppler which is nothing short of a nightmare to install.
Off the back of usr2564301's input, I have done some more testing with PDFMiner. Here is the code I am using to test with:-
from pdfminer.pdfinterp import PDFResourceManager,PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from io import BytesIO
def pdf_to_text(path):
    manager = PDFResourceManager()
    retstr = BytesIO()
    layout = LAParams(all_texts=True)
    device = TextConverter(manager, retstr, laparams=layout)
    filepath = open(path, 'rb')
    interpreter = PDFPageInterpreter(manager, device)
    for page in PDFPage.get_pages(filepath, check_extractable=True):
        interpreter.process_page(page)
    text = retstr.getvalue()
    filepath.close()
    device.close()
    retstr.close()
    return text
As for versions I have pdfminer==20191125 installed. On Github it says "Supports PDF-1.7. (well, almost)" so maybe the problem is PDF 1.7?
Edit 6
Just tried to use PDFMiner (pdf2txt.py) from the Command Prompt, using this command: python C:\Users\my_name\AppData\Local\Programs\Python\Python37-32\Scripts\pdf2txt.py -o output.txt file_name.pdf but I get the same result, i.e. the locked text does not come through.
Edit 7
So I did some testing with a designer, and we proved that it is when the text is on the master page that it is not accessible to PDFMiner. The designers have multiple preset master pages so that they can drag the required one onto the front cover. If a designer wrongly works directly on the master page, the content is locked and cannot be accessed via PDFMiner.
Note there is not a problem when text is on the frontcover and locked. Only when on the master page.
I'm a noob PyQt5 user following a tutorial and I'm confused how I might extend the sample code below.
The two handlers, canInsertFromMimeData and insertFromMimeData, are Qt5 methods that accept an image MIME type dragged and dropped onto the document (that works great). They both receive a parameter, source, which is a QMimeData object.
However, if I try to paste an image copied from the Windows clipboard into the document, it just crashes, as there is no handler for this.
Searching the Qt5 documentation at https://doc.qt.io/qt-5/qmimedata.html just leads me to further confusion as I'm not a C++ programmer and I'm using Python 3.x and PyQt5 to do this.
How would I write a handler to allow an image copied to the clipboard to be pasted into the document directly?
class TextEdit(QTextEdit):
    def canInsertFromMimeData(self, source):
        if source.hasImage():
            return True
        else:
            return super(TextEdit, self).canInsertFromMimeData(source)
    def insertFromMimeData(self, source):
        cursor = self.textCursor()
        document = self.document()
        if source.hasUrls():
            for u in source.urls():
                file_ext = splitext(str(u.toLocalFile()))
                if u.isLocalFile() and file_ext in IMAGE_EXTENSIONS:
                    image = QImage(u.toLocalFile())
                    document.addResource(QTextDocument.ImageResource, u, image)
                    cursor.insertImage(u.toLocalFile())
                else:
                    # If we hit a non-image or non-local URL break the loop and fall out
                    # to the super call & let Qt handle it
                    break
            else:
                # If all were valid images, finish here.
                return
        elif source.hasImage():
            image = source.imageData()
            uuid = hexuuid()
            document.addResource(QTextDocument.ImageResource, uuid, image)
            cursor.insertImage(uuid)
            return
        super(TextEdit, self).insertFromMimeData(source)
code source: https://www.learnpyqt.com/examples/megasolid-idiom-rich-text-editor/
I was exactly in the same position as you. I am also new to Python, so there might be mistakes.
Passing the plain uuid string to document.addResource(QTextDocument.ImageResource, uuid, image) does not work. It should be wrapped in a URL: QUrl(uuid).
Now you can insert the image. However, because the path to an image from the clipboard is changing, it would be better to use a different path, for example to the directory where you are also saving the files.
Also be aware that the user has to select the file type when saving (.html).
For my own project I am going to print the file as a PDF. That way you don't have to worry about paths to images ^-^
I got around this by converting to base64 inline embedding of the images, then no resource files as it is all in one file.
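The base64 approach mentioned above might look something like this sketch. The helper names png_bytes_to_img_tag and qimage_to_png_bytes are made up for illustration, and PNG is an assumed format choice; the sketch relies on Qt's rich text engine accepting data: URIs in img tags.

```python
import base64


def png_bytes_to_img_tag(png_bytes):
    """Wrap raw PNG bytes in an <img> tag that uses a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return '<img src="data:image/png;base64,{}" />'.format(encoded)


def qimage_to_png_bytes(image):
    """Serialize a QImage to PNG bytes in memory (needs PyQt5)."""
    from PyQt5.QtCore import QBuffer, QByteArray, QIODevice

    data = QByteArray()
    buffer = QBuffer(data)
    buffer.open(QIODevice.WriteOnly)
    image.save(buffer, "PNG")
    buffer.close()
    return bytes(data)


# Inside insertFromMimeData, the clipboard branch could then become:
#     elif source.hasImage():
#         html = png_bytes_to_img_tag(qimage_to_png_bytes(source.imageData()))
#         self.textCursor().insertHtml(html)
#         return
```

With the image embedded in the HTML itself, no addResource call or resource name is needed, and the saved .html file stays self-contained.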
I am upgrading some code from python 2 to python 3.
There is a function that opens and reads files. In Python 2 there is no need to specify whether a file is read as binary or as a string, while in Python 3 I have to specify the mode.
The python 2 code is:
with open(f_path, mode=open_mode) as fp:
    content = fp.read()
This is causing me problems as it is called by various other functions where I don't necessarily know the file type in advance. (Sometimes the data is written to a zip file, other times the data is returned via an HTTP endpoint).
I expect most data will be binary image files, though CSV and text files will also be present.
What would be the best way of opening a file of unknown type and detecting if it is binary or string data?
Is it possible for example to open a file in binary mode, then detect that it contains text and convert it (or alternatively generate an exception and open it in string mode instead)?
You might try the binaryornot library.
pip install binaryornot
Then in the code:
from binaryornot.check import is_binary
is_binary(f_path)
Here is their documentation:
https://pypi.org/project/binaryornot/
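If an extra dependency is not wanted, a standard-library-only alternative is to always read in binary mode and then try to decode. This is a heuristic sketch, not what binaryornot does internally: read_text_or_bytes is a hypothetical name, UTF-8 is an assumed encoding, and treating any NUL byte as "binary" is a simplification.

```python
def read_text_or_bytes(f_path, encoding="utf-8"):
    """Return (content, is_text): str if the file decodes cleanly, else bytes."""
    with open(f_path, mode="rb") as fp:
        raw = fp.read()
    # A NUL byte is a strong hint that the data is not plain text.
    if b"\x00" in raw:
        return raw, False
    try:
        return raw.decode(encoding), True
    except UnicodeDecodeError:
        # Not valid text in the assumed encoding; hand back the raw bytes.
        return raw, False
```

Callers that write to a zip file or an HTTP response can keep the bytes as they are, while callers that need a string can check the is_text flag. Unlike opening the file in text mode, decoding this way does not translate line endings.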