Cloud dataflow python3 job not solving dependencies - python-3.x

I have a simple Apache Beam project using Python 3 that transforms some data and writes it to BigQuery. It uses a package called textstat. When I run it locally everything works, but when I run it on Dataflow I get the following error:
NameError: name 'textstat' is not defined [while running 'generatedPtransform-441']
This is my current setup.py file:
import setuptools
REQUIRED_PACKAGES = ['textstat==0.5.6']
PACKAGE_NAME = 'my_package'
PACKAGE_VERSION = '0.0.1'
setuptools.setup(
    name=PACKAGE_NAME,
    version=PACKAGE_VERSION,
    description='Example project',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)
and these are my pipeline args:
pipeline_args = [
    '--project={}'.format('etl-example'),
    '--runner={}'.format('Dataflow'),
    '--temp_location=gs://dataflowtemporal/',
    '--setup_file=./setup.py',
]
and I run it like this:
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(StandardOptions).streaming = True
pipeline = beam.Pipeline(options=pipeline_options)
...
pipeline.run()
I also tried running this in the terminal before running the job:
python setup.py sdist --formats=gztar
but I get the same result of textstat not being found.
Another thing I tried was dropping setup.py and using only the argument
--requirements_file=./requirements.txt
But again, textstat is not found.
At this point I don't know what else to try.

Normally this happens because the library is not imported locally inside your DoFn.
Alternatively, you can try the --save_main_session option, as mentioned in https://cloud.google.com/dataflow/docs/resources/faq
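For instance, a minimal sketch of the first suggestion (the DoFn and field names here are made up for illustration, not taken from the question): import textstat inside the DoFn so the module is resolved on the Dataflow workers rather than only in the launching process.
import apache_beam as beam
class ScoreText(beam.DoFn):  # hypothetical DoFn, for illustration only
    def process(self, element):
        import textstat  # imported here so each worker process resolves it
        yield {'text': element, 'score': textstat.flesch_reading_ease(element)}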

Related

Jenkins executing python script

I am trying to call a Python script from Jenkins. I am able to execute the script from the command prompt successfully. However, when the same script is called from Jenkins, I run into a DLL problem. My script is as follows:
import isystem.connect as ic
import time
import os
print('isystem.connect version: ' + ic.getModuleVersion())
# 1. connect to winIDEA Application
pathTowinIDEA = 'C:/winIDEA/iConnect.dll'
cmgr_APPL = ic.ConnectionMgr(pathTowinIDEA)
cmgr_APPL.connectMRU('U:/winIDEA/Winidea_cust_config.xjrf.xjrf')
debug_APPL = ic.CDebugFacade(cmgr_APPL)
ec = ic.CExecutionController(cmgr_APPL)
print('APPL Configuration launched')
The error appears at line 9:
cmgr_APPL.connectMRU('U:/winIDEA/Winidea_cust_config.xjrf.xjrf')
Error message in Jenkins:
E:6 Can not load iConnect.dll: 'C:/winIDEA/iConnect64.dll'
SystemError: 'The specified module could not be found.
I have verified that iConnect64.dll is present at this path. How can I resolve this?

How to pass optionparser arguments through pylint?

test.py
from optparse import OptionParser
class Test(object):
    def __init__(self):
        pass
    def _test1(self, some_val):
        print(some_val)
    def main(self, some_val):
        self._test1(some_val)
if __name__ == "__main__":
    parser = OptionParser()
    parser.add_option("-a", "--abcd", dest="abcd", default=None, help="some_val")
    (options, args) = parser.parse_args()
    val = options.abcd
    mainobj = Test()
    mainobj.main(val)
The above script works when executed as python test.py --abcd=wxyz.
When I run python -m pylint test.py --abcd=wxyz, it doesn't execute.
Error:
Usage: __main__.py [options]
__main__.py: error: no such option: --abcd
How to execute through pylint ?
Can you please help me ?
You cannot pass new arguments like that through Pylint.
Pylint is a static code analysis tool. It does NOT run your program. Therefore it does not matter what arguments are sent to the main program you want to test. Imagine it inspects your code as if it were "text", without running it.
Also, the syntax you are using on the command line is not right, because everything after -m pylint is treated as Pylint's own arguments. Pylint has its own set of options and rules you can set; you can view a summary here
or just type this in your command line:
pylint --help
The error message you get is a Pylint error. If you want Pylint to accept your own custom options, you would have to change its source code, I reckon...
Hope I understood your question correctly and that it helps.
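To make the distinction concrete, here are example invocations of my own (not from the original answer): run the program and the linter as two separate commands, and pass only Pylint's own options to Pylint:
python test.py --abcd=wxyz                # runs your program with its own option
python -m pylint test.py                  # static analysis only; the program never runs
python -m pylint test.py --errors-only    # --errors-only is a Pylint option, not your program's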

Why does the program run in command line but not with IDLE?

The code uses a Reddit wrapper called praw
Here is part of the code:
import praw
from praw.models import MoreComments
username = 'myusername'
userAgent = 'MyAppName/0.1 by ' + username
clientId = 'myclientID'
clientSecret = 'myclientSecret'
threadId = input('Enter your thread id: ')
reddit = praw.Reddit(user_agent=userAgent, client_id=clientId, client_secret=clientSecret)
submission = reddit.submission(id=threadId)
subredditName = submission.subreddit
subredditName = str(subredditName)
act = input('type in here what you want to see: ')
comment_queue = submission.comments[:]  # Seed with top-level
submission.comments.replace_more(limit=None)
def dialogues():
    for comment in submission.comments.list():
        if comment.body.count('"') > 7 or comment.body.count('\n') > 3:
            print(comment.body + '\n \n \n')
def maxLen():
    res = 'abc'
    for comment in submission.comments.list():
        if len(comment.body) > len(res):
            res = comment.body
    print(res)
# http://code.activestate.com/recipes/269708-some-python-style-switches/
eval('%s()' % act)
Since I am new to Python and don't really get programming in general, I am surprised to see that every bit of the code works in the command line, but in IDLE I get an error on the first line saying ModuleNotFoundError: No module named 'praw'.
You have to install praw using the command
pip install praw
which installs the latest version of praw in the environment.
What must be happening is that your cmd and IDLE are using different Python interpreters, i.e. you have two different installations that can execute Python code. They can be different versions of Python, or the same version installed in different locations on your machine.
Let's call the two interpreters PyA and PyB for now. If you ran pip install praw for PyA, only PyA will be able to import and use that library; PyB still has no idea what praw means.
What you can do is install the library for PyB and everything will be good to go.
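As a quick check (a sketch of my own, not part of the original answer), print which interpreter each environment actually uses:
import sys
print(sys.executable)  # run this in both the command line and IDLE; different paths mean different interpreters
If the two paths differ, run python -m pip install praw with the interpreter path that IDLE printed, so the package lands in the environment IDLE actually uses.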

Python3.6 how to install libtorrent?

Does libtorrent support Python 3 now? If it is supported, how do I install it?
I want to code a DHT crawler with Python 3. I don't know why my code always gets a "Connection reset by peer" error, so I want to use libtorrent; if there is another lib, I'm happy to use it. My biggest problem is converting the infohash to a torrent file. Could it be a code problem?
class Crawler(Maga):
    async def handler(self, infohash, addr):
        fetchMetadata(infohash, addr)
def fetchMetadata(infohash, addr, timeout=5):
    tcpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcpServer.settimeout(timeout)
    if tcpServer.connect_ex(addr) == 0:
        try:
            # handshake
            send_handshake(tcpServer, infohash)
            packet = tcpServer.recv(4096)
            # handshake error
            if not check_handshake(packet, infohash):
                return
            # ext handshake
            send_ext_handshake(tcpServer)
            packet = tcpServer.recv(4096)
            # get ut_metadata and metadata_size
            ut_metadata, metadata_size = get_ut_metadata(packet), get_metadata_size(packet)
            # request each piece of metadata
            metadata = []
            for piece in range(int(math.ceil(metadata_size / (16.0 * 1024)))):
                request_metadata(tcpServer, ut_metadata, piece)
                packet = recvall(tcpServer, timeout)  # the_socket.recv(1024*17) #
                metadata.append(packet[packet.index("ee") + 2:])
            metadata = "".join(metadata)
            logging.info(bencoder.bdecode(metadata))
        except ConnectionResetError as e:
            logging.error(e)
        except Exception as e:
            logging.error(e)
        finally:
            tcpServer.close()
Yes, libtorrent is supposed to support Python 3.
The basic way to build and install the Python binding is by running setup.py. That requires that all dependencies are properly installed where distutils expects them, though. If this works, it's probably the simplest way, so it's worth a shot. I believe you'll have to invoke this Python script using python3, if that's what you're building for.
To build using the main build system (and install the resulting module manually) you can follow the steps in the .travis file (for Unix) or the appveyor file (for Windows).
In your case you want to specify a 3.x version of python, but the essence of it is:
brew install boost-python
echo "using python : 2.7 ;" >> ~/user-config.jam
cd bindings/python
bjam -j3 stage_module libtorrent-link=static boost-link=static
To test:
python test.py
Note, in the actual build step you may want to link shared against boost if you already have that installed, and shared against libtorrent if you have that installed, or will install it too.
On windows:
add the following to $HOMEPATH\user-config.jam:
using msvc : 14.0 ;
using gcc : : : <cxxflags>-std=c++11 ;
using python : 3.5 : c:\\Python35-x64 : c:\\Python35-x64\\include : c:\\Python35-x64\\libs ;
Then run:
b2.exe --hash openssl-version=pre1.1 link=shared libtorrent-link=shared stage_module stage_dependencies
To test:
c:\Python35-x64\python.exe test.py
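After either build, a quick sanity check (my own sketch, not part of the original build steps) is to import the binding with the Python 3 interpreter you built against:
import libtorrent
print(libtorrent.version)  # prints the libtorrent version string if the binding loaded correctly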

ipython notebook --script deprecated. How to replace with post save hook?

I have been using "ipython --script" to automatically save a .py file for each ipython notebook so I can use it to import classes into other notebooks. But this recently stopped working, and I get the following error message:
`--script` is deprecated. You can trigger nbconvert via pre- or post-save hooks:
ContentsManager.pre_save_hook
FileContentsManager.post_save_hook
A post-save hook has been registered that calls:
ipython nbconvert --to script [notebook]
which behaves similarly to `--script`.
As I understand this I need to set up a post-save hook, but I do not understand how to do this. Can someone explain?
[Updated per comment by @mobius dumpling]
Find your config files:
Jupyter / ipython >= 4.0
jupyter --config-dir
ipython <4.0
ipython locate profile default
If you need a new config:
Jupyter / ipython >= 4.0
jupyter notebook --generate-config
ipython <4.0
ipython profile create
Within this directory, there will be a file called [jupyter | ipython]_notebook_config.py. Put the following code from ipython's GitHub issues page in that file:
import os
from subprocess import check_call
c = get_config()
def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return  # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['ipython', 'nbconvert', '--to', 'script', fname], cwd=d)
c.FileContentsManager.post_save_hook = post_save
For Jupyter, replace ipython with jupyter in check_call.
Note that there's a corresponding 'pre-save' hook, and also that you can call any subprocess or run any arbitrary code there... if you want to do anything fancy like checking some condition first, notifying API consumers, or adding a git commit for the saved script.
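For instance, a sketch of the git-commit variant mentioned above (entirely my own, built on the same post_save skeleton; the hook name and commit message are made up):
def post_save_with_git(model, os_path, contents_manager):
    """post-save hook that exports the script and commits it (illustrative only)"""
    if model['type'] != 'notebook':
        return
    d, fname = os.path.split(os_path)
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)
    script_name = os.path.splitext(fname)[0] + '.py'
    check_call(['git', 'add', script_name], cwd=d)
    check_call(['git', 'commit', '-m', 'Auto-export script for ' + fname], cwd=d)
c.FileContentsManager.post_save_hook = post_save_with_git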
Cheers,
-t.
Here is another approach that doesn't spawn a separate process (as check_call does). Add the following to jupyter_notebook_config.py, as in Tristan's answer:
import io
import os
from notebook.utils import to_api_path
_script_exporter = None
def script_post_save(model, os_path, contents_manager, **kwargs):
    """convert notebooks to Python script after save with nbconvert
    replaces `ipython notebook --script`
    """
    from nbconvert.exporters.script import ScriptExporter
    if model['type'] != 'notebook':
        return
    global _script_exporter
    if _script_exporter is None:
        _script_exporter = ScriptExporter(parent=contents_manager)
    log = contents_manager.log
    base, ext = os.path.splitext(os_path)
    py_fname = base + '.py'
    script, resources = _script_exporter.from_filename(os_path)
    script_fname = base + resources.get('output_extension', '.txt')
    log.info("Saving script /%s", to_api_path(script_fname, contents_manager.root_dir))
    with io.open(script_fname, 'w', encoding='utf-8') as f:
        f.write(script)
c.FileContentsManager.post_save_hook = script_post_save
Disclaimer: I'm pretty sure I got this from SO somewhere, but can't find it now. Putting it here so it's easier to find in the future (:
I just encountered a problem where I didn't have rights to restart my Jupyter instance, and so the post-save hook I wanted couldn't be applied.
So, I extracted the key parts and could run this with python manual_post_save_hook.py:
from io import open
from re import sub
from os.path import splitext
from nbconvert.exporters.script import ScriptExporter
for nb_path in ['notebook1.ipynb', 'notebook2.ipynb']:
    base, ext = splitext(nb_path)
    script, resources = ScriptExporter().from_filename(nb_path)
    # mine happen to all be in Python so I needn't bother with the full flexibility
    script_fname = base + '.py'
    with open(script_fname, 'w', encoding='utf-8') as f:
        # remove 'In [ ]' commented lines peppered about
        f.write(sub(r'[\n]{2}# In\[[0-9 ]+\]:\s+[\n]{2}', '\n', script))
You can add your own bells and whistles as you would with the standard post save hook, and the config is the correct way to proceed; sharing this for others who might end up in a similar pinch where they can't get the config edits to go into action.
