I had the problem of how to install NLTK Python modules under Python 3.7.1 for SQL Server 2017 (CU22) with no direct internet access (i.e. offline).
So how to do this?
Here is a method that appears to work OK:
(1) Go to https://github.com/nltk/nltk_data/archive/refs/heads/gh-pages.zip and download the zip file (600 MB).
(2) Dig down into the zip file and find the required packages, located at:
nltk_data-gh-pages.zip\nltk_data-gh-pages\packages\
(3) Create a directory, either
C:\nltk_data (this is a default search path for nltk)
or
D:\Program Files\Microsoft SQL Server\MSSQL14.<instance_name>\PYTHON_SERVICES.3.7\nltk_data
(4) For each package, copy the .zip file and its .xml file, and also extract the .zip file so that it is an uncompressed directory.
For example, for the punkt module the final layout should look like this:
c:\nltk_data\tokenizers\punkt.zip
c:\nltk_data\tokenizers\punkt\<and_sub_dirs_and_files>
c:\nltk_data\tokenizers\punkt.xml
(5) Run T-SQL code to confirm it can see the punkt module:
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'
import nltk
x = nltk.data.find("tokenizers/punkt")
print(x)
'
You should get output similar to this if it can see the punkt module on disk:
STDOUT message(s) from external script:
Express Edition will continue to be enforced.
C:\nltk_data\tokenizers\punkt\PY3
HTH someone!
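Step (4) can also be scripted, which helps if you need more than one package. The sketch below uses only the Python standard library and is not part of NLTK; it assumes the archive layout from step (2) (nltk_data-gh-pages/packages/<category>/<name>.zip plus <name>.xml) and that each package zip contains a top-level <name>/ folder, as punkt's appears to. The function name install_nltk_package is just illustrative.

```python
import os
import shutil
import zipfile


def install_nltk_package(archive_path, category, name, target_dir):
    """Copy <name>.zip and <name>.xml out of the nltk_data gh-pages archive
    into <target_dir>/<category>/ and unpack <name>.zip next to them.

    Assumes the archive layout: nltk_data-gh-pages/packages/<category>/<name>.zip
    (and <name>.xml), with a top-level <name>/ folder inside <name>.zip.
    """
    prefix = "nltk_data-gh-pages/packages/{}/".format(category)
    dest_dir = os.path.join(target_dir, category)
    os.makedirs(dest_dir, exist_ok=True)

    with zipfile.ZipFile(archive_path) as outer:
        for ext in (".zip", ".xml"):
            member = prefix + name + ext  # zip member names always use "/"
            target = os.path.join(dest_dir, name + ext)
            with outer.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)

    # Unpack the package zip so the uncompressed directory exists as well.
    with zipfile.ZipFile(os.path.join(dest_dir, name + ".zip")) as inner:
        inner.extractall(dest_dir)

    return os.path.join(dest_dir, name)
```

For example, install_nltk_package(r"C:\Downloads\nltk_data-gh-pages.zip", "tokenizers", "punkt", r"C:\nltk_data") should produce the punkt layout shown in step (4); the download path here is hypothetical.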
I have installed the pytesseract module in my venv and want to extract text from a German file by executing this script from pytesseract and setting the language to German:
import cv2
import pytesseract

try:
    from PIL import Image
except ImportError:
    import Image

print(pytesseract.image_to_string(Image.open('test.jpg')))
print(pytesseract.image_to_string(Image.open('test.jpg'), lang='ger'))
which gives me:
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Error opening data file C:\\Program Files (x86)\\Tesseract-OCR/tessdata/ger.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language \'ger\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I have found the language data on tessdoc/Data-Files (https://github.com/tesseract-ocr/tessdoc/blob/master/Data-Files.md).
So far I have only found a guide for Linux: "How do I install a new language pack for Tesseract on 16.04".
Where do I need to move the language files in my pytesseract site-package to get the script working?
There are two ways.
1. Install the corresponding Tesseract package for your language:
apt-get install tesseract-ocr-YOUR_LANG_CODE
For example, in my case it was Bengali, so I installed:
apt-get install tesseract-ocr-ben
or, to install all languages:
apt-get install tesseract-ocr-all
This worked for me in an Ubuntu environment.
2. The other way is mentioned in the error message itself: add an environment variable TESSDATA_PREFIX that points to the language pack. You can download the language pack from here: https://github.com/tesseract-ocr/tessdata
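Once you have downloaded the data pack, you can also set the environment variable programmatically, before calling pytesseract:
import os
# Tesseract 3.05 (as in the error above) expects the parent directory of "tessdata"
os.environ['TESSDATA_PREFIX'] = 'path/to/parent/of/your/tessdata'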
Best way I've found:
Download and install tesseract-ocr-w64-setup-v5.0.0-rc1.20211030.exe.
Open https://github.com/tesseract-ocr/tessdata and download your language. For example, for Farsi download fas.traineddata.
Copy the downloaded file into the tessdata folder of the Tesseract-OCR installation, for example: C:\Program Files\Tesseract-OCR\tessdata
Don't forget to use the traineddata name for the language. For Farsi, I use lang='fas'.
I found a guide for doing this on a German site: Python Texterkennung: Bild zu Text mit PyTesseract in Windows (Python text recognition: image to text with PyTesseract on Windows).
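Since the lang value has to match the name of a .traineddata file, a quick way to see which codes are actually available is to list the tessdata folder. This is a standard-library sketch; list_tesseract_languages and check_language are hypothetical helper names, and the tessdata path is whatever your installation uses. Note that in the tessdata repository German is published as deu.traineddata, so lang='deu' rather than 'ger' should be the matching code. Tesseract also accepts several codes joined with "+", such as 'deu+eng', which the check below splits.

```python
import os


def list_tesseract_languages(tessdata_dir):
    """Return the language codes (file stems) of every *.traineddata file
    in tessdata_dir, i.e. the values usable as lang=... ."""
    return sorted(
        os.path.splitext(entry)[0]
        for entry in os.listdir(tessdata_dir)
        if entry.endswith(".traineddata")
    )


def check_language(tessdata_dir, lang):
    """Raise a readable ValueError before calling pytesseract if any code in
    lang (possibly several joined with '+') has no .traineddata file."""
    available = set(list_tesseract_languages(tessdata_dir))
    missing = [code for code in lang.split("+") if code not in available]
    if missing:
        raise ValueError(
            "missing traineddata for {} in {}; available: {}".format(
                ", ".join(missing), tessdata_dir, ", ".join(sorted(available))))
```

For example, check_language(r"C:\Program Files\Tesseract-OCR\tessdata", "deu") raises a ValueError listing the installed codes if deu.traineddata has not been copied there yet.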
I know many people have had this same issue, but I have not been able to find anyone with exactly my situation. I am building an executable with pyinstaller and I keep getting an ImportError. I am using the ibm_db package to connect to an IBM DB2 database and do inserts into a table using the pandas to_sql method. I ran pyinstaller on my program before I added the SQL code, so I'm pretty sure it has something to do with my trying to connect to DB2, but for the life of me I cannot figure this out.
I get lots of warnings and info messages when I'm running pyinstaller, but no errors that I can see. I only get the error once I try to run the executable that pyinstaller built.
I tried running it in a virtual environment to isolate the issue, but I am not that familiar with virtual environments, so I stopped trying to use one.
Traceback (most recent call last):
File "rebate_gui_sql.py", line 9, in <module>
File "c:\users\dt24358\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages\ibm_db.py", line 10, in <module>
File "site-packages\ibm_db.py", line 9, in __bootstrap__
File "imp.py", line 342, in load_dynamic
ImportError: DLL load failed: The specified module could not be found.
[11020] Failed to execute script rebate_gui_sql
Update 5/1/2019: following a comment below, here is my simple program:
import pandas as pd
from tkinter import *
from tkinter import ttk
import ibm_db
import ibm_db_dbi as db
from sqlalchemy import create_engine

class Application(Frame):
    def __init__(self, master):
        ttk.Frame.__init__(self, master)
        self.master = master
        self.run_process()

    def run_process(self):
        engine = create_engine("db2+ibm_db://userid:password@url:port/database")
        conn = engine.connect()
        print("Connected to " + str(engine))
        sql = '''
            Select *
            from rebteam.pd5
            fetch first row only
            '''
        df = pd.read_sql(sql, conn)
        print(df)
        df.to_csv(r'c:\users\dt24358\scripts\pricing tool\GUI_SQL\test.csv', index=False)
        self.result_label = Label(root, text="Select of PD5 Successful", bg="light green", width=80, justify=LEFT)
        self.result_label.grid(row=0,columnspan=2)

root=Tk()
root.title("Rebate Bid Data Upload")
root.configure(background="light green")
app = Application(root)
root.mainloop()
This answer is relevant for these versions:
python - up to 3.7 (but not higher)
pyinstaller 3.4
setuptools 41.0.1
ibm_db 3.0.1
ibm_db_sa 0.3.4
sqlalchemy 1.3.3
There are separate issues here.
The immediate issue (the ImportError) is the failure to load ibm_db.dll.
The ImportError happens because pyinstaller does not copy external (non-Python) libraries into the bundle unless you explicitly request that.
pyinstaller will also not copy a Db2 client into the bundle unless you explicitly tell it to. So if the target host to which you deploy your built executable does not already have a preinstalled, preconfigured Db2 client, loading the ibm_db module will also fail there.
The pyinstaller option --add-binary gives a workaround for some kinds of ImportError; see the example below.
The pyinstaller option --add-data gives a workaround for adding directories (for example the clidriver directory for adding a Db2-driver) when your target environment lacks a Db2-driver.
Note that this answer does not require you to use SQLAlchemy, the answer is also relevant if you are only using ibm_db (or ibm_db_dbi), in which case just skip the SQLAlchemy parts.
If your python script uses SQLAlchemy to access Db2, then you may see a second symptom at run time after building with pyinstaller. The run time symptom is either:
"sqlalchemy.exc.NoSuchModuleError: Can't load plugin:
sqlalchemy.dialects:ibm_db_sa"
or
"sqlalchemy.exc.NoSuchModuleError: Can't load plugin:
sqlalchemy.dialects:db2.ibm_db"
(depending on the prefix for the url given to create_engine())
This symptom (sqlalchemy.exc.NoSuchModuleError) is not specific to Db2: it can affect other databases accessed via SQLAlchemy through an external dialect (Db2, Teradata, Snowflake, Presto, ...). Databases that use SQLAlchemy's internal dialects may just work out of the box.
Here is one workaround for SQLAlchemy, other workarounds are possible.
SQLAlchemy external dialects use pkg_resources entry points so that SQLAlchemy can discover them, but pyinstaller cannot yet handle these without some assistance from you. Such entry point information is a kind of metadata about the module.
This workaround uses pyinstaller hooks to collect the metadata of the relevant modules, and tells pyinstaller the directory (or directories) containing these hook files. For Db2 with SQLAlchemy, three hook files are needed: hook-ibm_db.py, hook-ibm_db_sa.py and hook-sqlalchemy.py. I chose to put these hook files in the same directory as my Python source script.
Each of these files contains just two trivial lines, and the files differ only in the module name they contain. Here is one of them, hook-sqlalchemy.py (for the other two files, just replace the module name appropriately):
from PyInstaller.utils.hooks import copy_metadata
datas = copy_metadata('sqlalchemy')
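Since the three hook files differ only in the module name, they can be generated instead of written by hand. A minimal sketch, assuming you want them in the directory you later pass to --additional-hooks-dir; write_metadata_hooks is a hypothetical helper name, and the file contents are exactly the two lines shown above with the module name substituted.

```python
import os

# Modules whose metadata (entry points) must be collected, per the answer above.
HOOK_MODULES = ("ibm_db", "ibm_db_sa", "sqlalchemy")

HOOK_TEMPLATE = (
    "from PyInstaller.utils.hooks import copy_metadata\n"
    "datas = copy_metadata({module!r})\n"
)


def write_metadata_hooks(hooks_dir, modules=HOOK_MODULES):
    """Write one hook-<module>.py file per module into hooks_dir and
    return the paths of the files written."""
    os.makedirs(hooks_dir, exist_ok=True)
    paths = []
    for module in modules:
        path = os.path.join(hooks_dir, "hook-{}.py".format(module))
        with open(path, "w") as fh:
            fh.write(HOOK_TEMPLATE.format(module=module))
        paths.append(path)
    return paths
```

Running write_metadata_hooks(".") next to your script would give you the same three files the answer describes.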
To add ibm_db.dll via the --add-binary method, you can either use a command line option to pyinstaller or edit the spec file.
For handling load failures of ibm_db.dll alone, just use the --add-binary additional option like this:
pyinstaller -y --add-binary %LOCALAPPDATA%\Programs\Python\Python37\Lib\site-packages\ibm_db_dlls\ibm_db.dll;.\ibm_db_dlls your_script.py
If you want to include clidriver in your bundle, first find the fully qualified pathname to its location via:
pip show ibm_db
In the output of that command, the Location: line gives the first part of the fully qualified pathname; append \clidriver to that path and use it in the --add-data option like this:
--add-data="c:\path\to\clidriver;.\clidriver"
If you do include clidriver in your bundle, there are additional considerations, see the notes section below.
For apps that also use SQLAlchemy, you need additional steps.
Suppose that the ibm_db.dll lives in this directory:
%LOCALAPPDATA%\programs\python\python37\lib\site-packages\ibm_db_dlls
and you make a variable in a CMD.EXE shell to point to that location:
> set ibm_db_path=%LOCALAPPDATA%\programs\python\python37\lib\site-packages\ibm_db_dlls
For an MS-Windows batch file (using ^ as line continuation character), the pyinstaller command line example to handle both of the workarounds mentioned above is:
pyinstaller -y ^
--additional-hooks-dir=. ^
--hidden-import ibm_db_sa.ibm_db ^
--hidden-import ibm_db_dbi ^
--hidden-import ibm_db ^
--add-binary %LOCALAPPDATA%\Programs\Python\Python37\Lib\site-packages\ibm_db_dlls\ibm_db.dll;.\ibm_db_dlls ^
your_script.py
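If you build from a Python script rather than a batch file, the same options can be assembled as a list. This is a sketch under the assumptions of this answer (ibm_db up to 3.0.2, with ibm_db.dll in site-packages\ibm_db_dlls and clidriver in site-packages\clidriver); build_pyinstaller_args is a hypothetical helper. The SRC/DEST separator is os.pathsep, which is what PyInstaller 3.x expects (';' on Windows, ':' elsewhere).

```python
import os


def build_pyinstaller_args(site_packages, script, use_sqlalchemy=False,
                           include_clidriver=False, hooks_dir="."):
    """Assemble the pyinstaller options discussed above as an argument list.

    Assumes <site_packages>/ibm_db_dlls/ibm_db.dll and, if bundled,
    <site_packages>/clidriver.
    """
    sep = os.pathsep  # SRC;DEST on Windows, SRC:DEST elsewhere
    dll = os.path.join(site_packages, "ibm_db_dlls", "ibm_db.dll")
    # ibm_db.dll must land in an ibm_db_dlls subdirectory of the bundle.
    args = ["-y", "--add-binary", dll + sep + "ibm_db_dlls"]

    if include_clidriver:
        clidriver = os.path.join(site_packages, "clidriver")
        args += ["--add-data", clidriver + sep + "clidriver"]

    if use_sqlalchemy:
        # Hook files (hook-ibm_db.py etc.) live in hooks_dir.
        args.append("--additional-hooks-dir=" + hooks_dir)
        for module in ("ibm_db_sa.ibm_db", "ibm_db_dbi", "ibm_db"):
            args += ["--hidden-import", module]

    args.append(script)
    return args
```

The resulting list can be passed to PyInstaller.__main__.run(), PyInstaller's documented entry point for running it from Python code, which also sidesteps shell quoting of paths that contain spaces.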
Notes:
If your Python script explicitly imports the SQLAlchemy modules, then you do not need to specify them via --hidden-import options (but you still need the hooks for SQLAlchemy to operate after bundling).
For ibm_db versions up to 3.0.2, the ibm_db.dll needs to be in subdirectory ibm_db_dlls in your
bundle, which is the reason for specifying that destination on the
--add-binary option.
If you are building for Linux/Unix, use \ instead of ^ as the line continuation character, as usual.
If you intend to copy your built executable to a new host that does not already have a preinstalled Db2 client, and you do not wish to install a separate Db2 client on the target, then you can bundle clidriver with the pyinstaller output using the --add-data option shown above.
If you bundle clidriver be aware that you may also need to bundle its configuration files (such as db2dsdriver.cfg and db2cli.ini) if they are in non-default locations, depending on whether your code uses externally configured DSNs or long connection-strings. If you do not bundle such configuration files (implicitly or explicitly) and you deploy your built environment to a different hostname than the build environment then you will need to reconfigure those files at the target hostname. The default location for these files is in the clidriver\cfg directory which will get included via --add-data as mentioned earlier.
If you bundle clidriver and you are using encrypted connections to Db2 via TLS/SSL, be aware that you may also need to bundle additional files such as certificates, keystore/stash files, etc., when you run the pyinstaller build.
If you bundle clidriver, be aware that IBM refreshes this component a couple of times per year with bug fixes and security fixes and new functions, so you may need to refresh your executables periodically to prevent them from becoming security holes by being frozen in time with old versions.
If you bundle clidriver and need to use odbcad32 on the target host to configure Db2 DSNs, then after deployment remember to run the clidriver\bin\db2cli install -setup command on the target host.
Thanks for your question and answer. I ran into the same situation on Windows 7 with Python 3.7 and ibm-db 3.0.1.
With your hint, I think the reason is that the exe can't find the *.dll files in clidriver\bin or ibm_db.dll,
and I solved it with a similar method in two steps.
First:
same as you, add the clidriver directory to the system PATH:
**\site-packages\clidriver\bin
Second:
package with the --add-binary argument:
pyinstaller --add-binary **\Lib\site-packages\ibm_db_dlls\ibm_db.dll;.\ibm_db_dlls myproject.py
Then it's OK!
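An alternative to editing the system PATH on every target machine is to adjust it inside the program before ibm_db is imported. A minimal sketch, assuming clidriver was bundled (for example with --add-data) and that you work out its bin directory yourself; add_dll_search_dir is a hypothetical helper. On Python 3.7, as used here, prepending to PATH should be enough; Python 3.8+ on Windows no longer uses PATH for extension-module DLL dependencies, so the sketch also calls os.add_dll_directory when it exists.

```python
import os


def add_dll_search_dir(directory):
    """Make DLLs in `directory` findable by extension modules imported later.

    Prepends the directory to PATH (searched by Python 3.7 on Windows) and,
    on Windows with Python 3.8+, also registers it via os.add_dll_directory.
    """
    current = os.environ.get("PATH", "")
    if directory not in current.split(os.pathsep):
        os.environ["PATH"] = directory + os.pathsep + current if current else directory
    if os.name == "nt" and hasattr(os, "add_dll_directory"):
        # Calling close() on the returned object removes the directory again.
        return os.add_dll_directory(directory)
    return None
```

In a PyInstaller bundle the directory might be os.path.join(sys._MEIPASS, "clidriver", "bin"), followed by import ibm_db; treat that path as an assumption that depends on how you bundled clidriver.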
similar question:
PyQt5 Executable is crashing with Missing DLL
I have just installed Python Pillow 5 on a Raspberry Pi. It installed fine and works OK.
The issue I am having is finding the pilfont.py file.
I have several BDF fonts I need to convert and have been searching the web for how to do this.
All the information I have found points to the pilfont utility, but I can't find it on the Pi.
Can anyone point me in the right direction as to where it is? I understand how to use it to convert the fonts; I just cannot find a way to run it.
Cheers
As of at least October 2018, the previous answer doesn't work anymore, since the package does not include the pilfont utility. But it turns out you don't need to spend time hunting down an external utility, since pilfont is just a very simple script you can recreate in a couple of minutes.
Here's my own "pilfont utility", which converts all the .bdf and .pcf fonts in the script's own directory to .pil and .pbm:
#!/usr/bin/env python
# Author: Peter Samuel Anttila
# License: The Unlicense <http://unlicense.org, October 16 2018>

from PIL import BdfFontFile
from PIL import PcfFontFile
import os
import glob

font_file_paths = []

current_dir_path = os.path.dirname(os.path.abspath(__file__))
font_file_paths.extend(glob.glob(current_dir_path+"/*.bdf"))
font_file_paths.extend(glob.glob(current_dir_path+"/*.pcf"))

for font_file_path in font_file_paths:
    try:
        with open(font_file_path,'rb') as fp:
            # despite what the syntax suggests, .save(font_file_path) won't
            # overwrite your .bdf files, it just creates new .pil and .pbm
            # files in the same folder
            if font_file_path.lower().endswith('.bdf'):
                p = BdfFontFile.BdfFontFile(fp)
                p.save(font_file_path)
            elif font_file_path.lower().endswith('.pcf'):
                p = PcfFontFile.PcfFontFile(fp)
                p.save(font_file_path)
            else:
                # sanity catch-all
                print("Unrecognized extension.")
    except (SyntaxError,IOError) as err:
        print("File at '"+str(font_file_path)+"' could not be processed.")
        print("Error: " +str(err))
For those on a tight deadline:
You don't need the utility. Just use the following code to convert it yourself:
from PIL import BdfFontFile

font_file_path = "myfont.bdf"  # hypothetical path to your font file
with open(font_file_path,'rb') as fp:
    p = BdfFontFile.BdfFontFile(fp)  # use PcfFontFile.PcfFontFile if you're reading PCF files
    # won't overwrite the .bdf; creates new .pil and .pbm files in the same dir
    p.save(font_file_path)
It raises SyntaxError and/or IOError if the file can't be read as a BDF or PCF file.
It seems you installed Pillow 5 via pip3. I did this myself too, and it did not include the pilfont utility. I could not find the file in the Pillow Git repository either, nor any deprecation notice. Therefore I suggest this workaround:
Create an empty directory and change into it.
Now:
apt-get download python3-pil
This downloads the Raspbian package, which contains Pillow 4 including pilfont. It does not install the package.
Next, extract the downloaded .deb package (the filename may vary):
ar -x python3-pil_4.0.0-4.deb
This leaves you with several files; one of them is data.tar.xz, which you need to extract:
tar -xvf data.tar.xz
This gives you ./usr/bin/pilfont
Now you may copy it to /usr/bin/
sudo cp ./usr/bin/pilfont /usr/bin/pilfont
After that you can delete the downloaded archive and its extracted contents.
I have an application developed in Python 3.2 that uses library modules (e.g. Tkinter, matplotlib, openpyxl), user-defined modules and classes (e.g. draw_graph, generate_report), icon files, a log file, .csv and .docx files, etc. I run this application from a script (e.g. testapplication.py).
My setup file is:
import sys
from cx_Freeze import setup, Executable

exe = Executable(
    script=r"C:\Python32\testapplication.py",
    base="Win32GUI",
)

setup(
    name = "TESTApp",
    version = "0.1",
    description = "An example",
    executables = [exe]
)
Now I want to create an exe file of this application. Can anyone suggest a way to do this?
So this is what you need to do. For starters, change script=r"C:\Python32\testapplication.py" to script=r"testapplication.py"
Then put ALL the files you need to convert into C:\Python32, including the setup file. Next, open a command prompt and type the following commands (assuming your cx_Freeze setup file is named setup.py):
cd /d C:\Python32
python setup.py build
And then you should have a build folder in that directory containing the exe file.
I can't get it working :(
What I did so far:
installed:
-tortoisehg-2.1.3-hg-1.9.2-x86.msi
-python-2.7.2.msi
-mercurial-1.9.2-x86.msi
My PATH Variable contains: D:\Program Files\TortoiseHg\;D:\Python27;
Created D:\MercurialWeb\ and set it up in IIS to run a test python cgi script.
I copied the templates directory from TortoiseHg to that web directory and extracted the library zip too.
My hgweb.cgi looks like this:
#!/usr/bin/env python
#
# An example hgweb CGI script, edit as necessary
# See also http://mercurial.selenic.com/wiki/PublishingRepositories
# Path to repo or hgweb config to serve (see 'hg help hgweb')
config = "/path/to/repo/or/config"
# Uncomment and adjust if Mercurial is not installed system-wide:
#import sys; sys.path.insert(0, "/path/to/python/lib")
# Uncomment to send python tracebacks to the browser if an error occurs:
import cgitb; cgitb.enable()
from mercurial import demandimport; demandimport.enable()
from mercurial.hgweb import hgweb, wsgicgi
application = hgweb(config)
wsgicgi.launch(application)
My hgweb.config:
[paths]
MySourceCode = D:\MercurialRepos\**
[web]
style = monoblue
But if I open the site, I get this:
<type 'exceptions.ImportError'>: No module named mercurial
args = ('No module named mercurial',)
message = 'No module named mercurial'
There are a number of different blogs about setting up Mercurial on Windows Server, and most of them target a specific Mercurial version (the highest I've seen is 1.7).
See if this blog helps: http://hyperionchaos.net/blog
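One way to confirm that the ImportError is a sys.path problem is to run a small check with the same python.exe that IIS uses for the CGI script, with and without the directory you would pass to the commented-out sys.path.insert line in hgweb.cgi. This is a hypothetical diagnostic helper, not part of Mercurial, and it sticks to constructs that run on both Python 2.7 and 3.

```python
import sys


def check_importable(module_name, extra_dir=None):
    """Return the file of module_name if it can be imported (optionally after
    putting extra_dir at the front of sys.path), else None."""
    if extra_dir and extra_dir not in sys.path:
        # Same effect as the commented-out sys.path.insert line in hgweb.cgi.
        sys.path.insert(0, extra_dir)
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)
```

For example, check_importable("mercurial", r"D:\MercurialWeb\lib") returns the module's file path if Mercurial can be imported from that directory, or None if not; the directory is a placeholder.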