AZURE FUNCTIONS: PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? for pdf2image - azure

I am getting this error in Azure Functions: "Result: Failure Exception: PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?"
I am using the pdf2image library's convert_from_path() to convert my PDF to images. This works fine when I test locally. When I publish the function to Azure, the poppler-utils package also gets installed there, but the error still occurs. I have read many threads about this error and tried their suggestions, but I would like to know whether anyone has experienced this specifically with Azure Functions.

A suggestion for this issue was provided in the thread:
"you should try to troubleshoot it by simply having a function that opens a process and prints the help of pdftoppm (poppler). You will be able to get a different message that might be more relevant.
Something like this:
import subprocess
def main():
    p = subprocess.Popen(["pdftoppm", "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print(out, err)
As a general recommendation, I would bundle the poppler utilities with your package to avoid installing it in the function environment. You can call the function with poppler_path."
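The bundling recommendation above could look roughly like the following. This is a minimal sketch, not a tested Azure deployment: the helper names find_bundled_poppler and pdf_to_images and the idea of passing in a bundle directory are assumptions made for illustration. Only convert_from_path and its poppler_path parameter come from pdf2image.

```python
from pathlib import Path
from typing import Optional


def find_bundled_poppler(bundle_dir) -> Optional[str]:
    """Return bundle_dir if it contains both pdfinfo and pdftoppm, else None.

    bundle_dir is a hypothetical folder shipped with the function code.
    """
    bundle_dir = Path(bundle_dir)
    for name in ("pdfinfo", "pdftoppm"):
        # Accept the plain name (Linux) or the .exe variant (Windows).
        if not any((bundle_dir / (name + ext)).is_file() for ext in ("", ".exe")):
            return None
    return str(bundle_dir)


def pdf_to_images(pdf_path, bundle_dir):
    """Convert a PDF to images, preferring the bundled poppler binaries."""
    # Imported lazily so the helper above can be used without pdf2image.
    from pdf2image import convert_from_path

    poppler_dir = find_bundled_poppler(bundle_dir)
    # With poppler_path=None, pdf2image looks the tools up on PATH instead.
    return convert_from_path(pdf_path, poppler_path=poppler_dir)
```

If find_bundled_poppler returns None in the deployed function, the binaries were probably not included in the published package, which would narrow the problem down before pdf2image is involved at all.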


FlaskInjector : Working outside of request context

I am building a Flask web app that communicates with another application through HTTP requests, so I am trying to use dependency injection for the HttpClient object.
Here is the code:
from flask import Flask
from flask_injector import FlaskInjector
from injector import singleton
class HttpClient(object):
    def __init__(self, host):
        self.host = host
        self.httpclient = 3partyModule.connect(url=host,
                                               verbose=False,
                                               max_greenlets=1)
def configure(binder):
    # Bind HttpClient as a singleton so one instance is shared.
    binder.bind(HttpClient, to=HttpClient, scope=singleton)
if __name__ == '__main__':
    app = Flask(__name__)
    FlaskInjector(app=app, modules=[configure])
    app.run()
When I run the application, I get the following error -
Exception has occurred: RuntimeError
Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
I have tried to look this up but could not find any helpful lead.
I would appreciate it if anyone could shed some light on the issue.
Thank you in advance.
Package Versions:
Flask==2.0.1
Flask-Injector==0.12.3
Python==3.8
Check your Werkzeug version. I noticed this issue with Werkzeug==2.0.1 and had to roll back to Werkzeug==0.15.2.
pip install --upgrade Werkzeug==0.15.2
I had this issue with Flask-Injector==0.12.3 and managed to resolve it by upgrading to Flask-Injector==0.13.0.
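To see which of the versions mentioned in these answers is actually installed in the environment that raises the error, a small check like the one below may help. It is only a sketch: version_tuple, has_fix and installed_version are hypothetical helper names, the version parsing is deliberately rough, and the 0.13.0 threshold is taken from the answer above rather than from the Flask-Injector changelog.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Rough parse: '0.12.3' -> (0, 12, 3). Ignores pre-release suffixes."""
    return tuple(int(number) for number in re.findall(r"\d+", text)[:3])


def has_fix(installed, minimum="0.13.0"):
    """True if the installed version is at least the reported fixed version."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("Flask", "Flask-Injector", "Werkzeug"):
        print(name, installed_version(name))
```

Running this inside the same virtual environment as the app shows whether the upgrade or rollback really took effect.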

Chrome run fails in Azure Functions: An attempt was made to access a socket in a way forbidden by its access permissions

I wrote a web bot that uses the Selenium framework to crawl. I installed ChromeDriver 72.0.3626.69 and also downloaded Chromium 72.0.3626.121. The app initializes ChromeDriver with this included Chromium binary (and NOT a locally installed Chrome binary). All of this works perfectly on my local machine.
I have now been attempting to port the app to Azure Functions. I wrote a function, tested it, and it works fine locally. But once I publish it to Azure Functions, it fails with about 182 errors of this type:
An attempt was made to access a socket in a way forbidden by its
access permissions
I know this happens when the TCP connection limits of the Azure sandbox are exceeded, but the only thing attempted here was to create an instance of ChromeDriver (not even navigating anywhere yet!).
Here is a screenshot of the Azure Function call log.
That error appears about 182 times in a row, and that is just from an attempt to create a browser instance (or a ChromeDriver instance, to be precise; I can't be sure whether Chromium or ChromeDriver is causing the issue).
The question: has anyone experienced ChromeDriver/Chromium creating so many (obviously excessive) connections when launching? And what might help to avoid this?
If it helps, this is the piece of code that crashes on the last line:
ChromeOptions options = new ChromeOptions();
options.BinaryLocation = this.chromePath;
options.AddArgument("no-sandbox");
options.AddArgument("disable-infobars");
options.AddArgument("--disable-extensions");
if (this.headlessMode)
{
    options.AddArgument("headless");
}
options.AddUserProfilePreference("profile.default_content_setting_values.images", 2);
Log.LogInformation("Chrome options compiled. Creating ChromeDriverService...");
var driverService = ChromeDriverService.CreateDefaultService(this.driverPath);
driver = new ChromeDriver(driverService, options, timeout);
I believe you are running this function in a Windows Function App, which is subject to quite a few limitations, as described in this wiki.
When running on Linux, however, functions basically run in a Docker container, which removes most of the restrictions that Windows has. I believe what you are trying to do should be possible there.
You could either deploy your function to a Linux Function App or build a container and use that directly.

How to get messages from Google's Pub/Sub system by using the current Pub/Sub subscriber

I need to receive published messages from Google's Pub/Sub system using a Python-based subscriber.
For this I did the following steps:
In the web console I created a project, a registry, a telemetry topic, and a device, and attached a subscription topic to the telemetry topic.
At the moment my code can publish messages over the MQTT bridge and also through the publish functionality of the pubsub library.
I can pull these messages in the terminal with the following command:
gcloud pubsub subscriptions pull --auto-ack projects/{project_id}/subscriptions/{subscription_topic}
Below is the important snippet of my code. It is based on the Git examples, but some functions do not seem to exist anymore in version 0.39.1 of the google-cloud-pubsub package. One example is the subscriber.subscription_path() method.
def receive_messages(subscription_path, service_account_json):
    import time
    from google.cloud import pubsub_v1
    subscriber = pubsub_v1.SubscriberClient(credentials=service_account_json)
    #subscription_path = subscriber.subscription_path(
    #    project_id, subscription_name)
    def callback(message):
        print('Received message: {}'.format(message))
        message.ack()
    subscriber.subscribe(subscription_path, callback=callback)
    print('Listening for messages on {}'.format(subscription_path))
    while True:
        time.sleep(60)
When I run this function, countless threads are started in the background bit by bit, but none of them seem to ever quit or start the callback function.
I believe I have installed all the requirements:
pip3 freeze
asn1crypto==0.24.0
cachetools==3.0.0
certifi==2018.11.29
cffi==1.11.5
chardet==3.0.4
cryptography==2.4.2
google-api-core==1.7.0
google-api-python-client==1.7.5
google-auth==1.6.2
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.2.0
google-cloud-bigquery==1.8.1
google-cloud-core==0.29.1
google-cloud-datastore==1.7.3
google-cloud-monitoring==0.31.1
google-cloud-pubsub==0.39.1
google-resumable-media==0.3.2
googleapis-common-protos==1.5.6
grpc-google-iam-v1==0.11.4
grpcio==1.17.1
httplib2==0.12.0
idna==2.8
keyring==10.1
keyrings.alt==1.3
oauthlib==3.0.0
paho-mqtt==1.4.0
protobuf==3.6.1
pyasn1==0.4.5
pyasn1-modules==0.2.3
pycparser==2.19
pycrypto==2.6.1
pycurl==7.43.0
pygobject==3.22.0
PyJWT==1.6.4
python-apt==1.4.0b3
pytz==2018.9
pyxdg==0.25
redis==3.0.1
requests==2.21.0
requests-oauthlib==1.2.0
RPi.GPIO==0.6.5
rsa==4.0
SecretStorage==2.3.1
six==1.12.0
unattended-upgrades==0.1
uritemplate==3.0.0
urllib3==1.24.1
virtualenv==16.2.0
I ran that code on Debian as well as on Windows 10 and updated gcloud:
gcloud components update
For the past week I have been trying various solutions and running the seemingly obsolete Google examples. The documentation, which seems even older than the code examples, did not help either. So I hope someone here can help me finally receive messages from the Pub/Sub system with a Python-based client.
I hope I have provided the most important information, and thank you in advance for your help.
The examples maintained on the Python documentation site here should be up to date. Make sure that you have followed all the steps in the "In order to use this library, you first need to go through the following steps" section before running any code. In particular, you may not have set up authentication properly; I don't believe you should be passing the credentials path manually.
def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    print(f"Received {message}.")
    message.ack()
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
print(f"Listening for messages on {subscription_path}..\n")
try:
    streaming_pull_future.result(timeout=timeout)
except TimeoutError:
    streaming_pull_future.cancel()  # Trigger the shutdown.
    streaming_pull_future.result()  # Block until the shutdown is complete.
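Putting the authentication advice and the snippet above together, a self-contained version might look like the sketch below. It rests on a few assumptions: the key file is supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable instead of the credentials= argument, the subscription path is built by hand (build_subscription_path is a hypothetical helper, not a library call), and the installed google-cloud-pubsub version returns a future from subscribe(). The function and parameter names are illustrative.

```python
import os
from concurrent.futures import TimeoutError


def build_subscription_path(project_id: str, subscription_name: str) -> str:
    """Build the fully qualified subscription name used by Pub/Sub."""
    return f"projects/{project_id}/subscriptions/{subscription_name}"


def receive_messages(project_id, subscription_name, key_file=None, timeout=30.0):
    # Imported lazily so the path helper works without the library installed.
    from google.cloud import pubsub_v1

    if key_file is not None:
        # Application Default Credentials read the key file path from here.
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_file

    # No credentials= argument: a path string is not a credentials object.
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = build_subscription_path(project_id, subscription_name)

    def callback(message):
        print(f"Received {message}.")
        message.ack()

    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    print(f"Listening for messages on {subscription_path}..")
    try:
        # Blocks the main thread; callbacks run on background threads.
        streaming_pull_future.result(timeout=timeout)
    except TimeoutError:
        streaming_pull_future.cancel()  # Trigger the shutdown.
        streaming_pull_future.result()  # Block until the shutdown is complete.
```

Blocking on the returned future replaces the `while True: time.sleep(60)` loop from the question and surfaces errors from the streaming pull that the sleep loop would silently hide.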

Azure: importing not already existing packages in 'src'

I have an experiment in which an R script module uses functions defined in a zip source (Data Exploration). How to handle packages that do not already exist in the Azure environment is described here.
The DataExploration module has been imported from a file, Azure.zip, containing all the packages and functions I need (as shown in the next picture).
When I run the experiment, nothing goes wrong. On the contrary, the log makes it clear that Azure is able to handle the source.
The problem is that when I deploy the web service (classic) and run the experiment, I get the following error:
FailedToEvaluateRScript: The following error occurred during
evaluation of R script: R_tryEval: return error: Error in
.zip.unpack(pkg, tmpDir) : zip file 'src/scales_0.4.0.zip' not found ,
Error code: LibraryExecutionError, Http status code: 400, Timestamp:
Thu, 21 Jul 2016 09:05:25 GMT
It is as if it cannot see scales_0.4.0.zip in the 'src' folder.
The strange thing is that everything worked until a few days ago. Then I copied the experiment to a second workspace, and it gives me the above error.
I have also tried uploading the DataExploration module again to the new workspace, but the result is the same.
I have "solved" it thanks to the help of AzureML support: it is a bug they are trying to fix right now.
The bug shows up when you have several R script modules and the first one has no zip input module while the following ones do.
Workaround: connect the zip input module to the first R script module too.

bottle.py WSGI server stops responding

I'm trying to build a simple API with the bottle.py (Bottle v0.11.4) web framework. To 'daemonize' the app on my server (Ubuntu 10.04.4), I'm running the shell command
nohup python test.py &
where test.py is the following Python script:
import sys
import bottle
from bottle import route, run, request, response, abort, hook
@hook('after_request')
def enable_cors():
    response.headers['Access-Control-Allow-Origin'] = '*'
@route('/')
def ping():
    return 'Up and running!'
if __name__ == '__main__':
    run(host=<my_ip>, port=3000)
I'm running into the following issue:
This works initially, but the server stops responding after some time (~24 hours). Unfortunately, the logs don't contain any revealing error messages.
The only way I have been able to reproduce the issue is by running a second script on my Ubuntu server that creates another server listening on a different port (i.e., exactly the same script as above but with port=3001). If I send a request to the newly created server, I also get no response and the connection eventually times out.
Any suggestions are greatly appreciated. I'm new to this, so if there's something fundamentally wrong with this approach, any links to reference guides would also be appreciated. Thank you!
Can you make sure the server isn't sleeping?
If it is, try enabling Wake On LAN http://ubuntuforums.org/showthread.php?t=234588
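Since the logs show nothing, one way to find out when the server stops answering (and whether that lines up with the machine going to sleep) is to poll it from another machine and log the result. The sketch below is only an illustration using the standard library. The names is_responding and watch, the polling interval, and the URL passed in are all assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_responding(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts and HTTP errors all count as "down".
        return False


def watch(url, interval=60.0):
    """Print a timestamped up/down line for the URL every `interval` seconds."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "up" if is_responding(url) else "DOWN", flush=True)
        time.sleep(interval)
```

Calling watch() with the address of the '/' route from the script above gives a timestamp for the first failure, which can then be compared against the system logs on the server.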
