I want to create a Python script that will parse 40,000 PDF files (text and images). Since I saw that there is no easy way to check whether a page contains images, I think I should use the textract module.
Ideally I would deploy to Google App Engine.
My question is: for textract I also had to install other, non-Python packages on my system. Can I deploy the script (with a proper requirements.txt file) on Google Cloud App Engine without problems, or will I have to use something else?
It is possible to use App Engine, but only with the Flexible environment and using a custom runtime, which allows you to add non-Python dependencies (and also Python dependencies not installable via pip):
Custom runtimes allow you to define new runtime environments, which
might include additional components like language interpreters or
application servers.
See also Building Custom Runtimes.
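Whichever runtime you end up with, it can help to verify at startup that the non-Python tools are really present in the image. Below is a minimal sketch using only the standard library. The tool names in REQUIRED_TOOLS are an assumption on my part (which executables textract shells out to depends on the file types and methods you use), so adjust the list to your setup.

```python
import shutil

# Hypothetical list: the executables actually needed depend on which
# textract parsers you use; these two are assumptions for PDF text and OCR.
REQUIRED_TOOLS = ["pdftotext", "tesseract"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All required system tools found.")
```

Running this inside the container before parsing anything should tell you early whether the custom runtime image installed what the script expects.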
Why do I need to bundle the runtime environment with every Electron app and punish those trying to download it? Can't it work like a ClickOnce app, where the runtime environment is downloaded only if it is not already available, or where additional dependencies are downloaded as necessary and linked dynamically? I am sure it would save a lot of storage space worldwide.
error while loading shared libraries: libnss3.so cannot open shared object file
I want to deploy my puppeteer app on Google App Engine, since it says their Node.js environment supports puppeteer. However, I still get this error.
What do I need to do?
Puppeteer requires custom libraries, so you need to make sure you are using a custom runtime in the flexible environment in your app.yaml:
runtime: custom
env: flex
You can find a similar issue in this GitHub thread.
You should also check the App Engine documentation for your language on specifying dependencies, to make sure your steps are aligned with the guidelines.
If you are using the App Engine standard environment, its Node.js runtime comes with all system packages needed to run Headless Chrome.
To use puppeteer, simply list the module as a dependency in your package.json and deploy to Google App Engine. Read more about using puppeteer on App Engine by following the official tutorial.
I see that Docker is intended to deploy applications, but what about libraries? For instance I have a library called RAILWAY that is a set of headers, binary code libraries, and command line tools.
I was thinking the output of the railway CI/CD pipeline could be a Docker image that is pushed to a registry. Any application that wants to use railway must be built using Docker, and it will just put FROM railway:latest and COPY --from=railway ... in its Dockerfile. The application can copy whatever it needs from the library image into its own image.
Is this a normal use-case?
I could use a Debian package for railway, but Azure Artifacts does not support Debian packages (only NuGet and npm). And Docker is just so damn easy!
Most languages have their own systems for distributing and managing dependencies (like NuGet, which you mentioned), and you should use those instead.
The problem with your suggestion is that it's not as simple as "applications use libraries", it's rather "applications use libraries which use libraries which use libraries which use...".
E.g. if your app wants to use libraries A and B, but library A also uses library B itself, how do you handle that in your setup? Is there a binary for B in A's Docker image that gets copied over? Does it overwrite the binary for B that you copied earlier? What if they're different versions with different methods in them?
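To make the A/B problem concrete, here is a small sketch of the check a package manager does for you and a plain COPY --from does not. All package names and versions are made up for illustration; it just collects every version each library is requested at and reports the ones requested at more than one version.

```python
from collections import defaultdict

# Hypothetical dependency declarations: each package maps to the
# (dependency, version) pairs it was built against.
REQUIREMENTS = {
    "app": [("A", "1.0"), ("B", "2.0")],
    "A": [("B", "1.5")],
    "B": [],
}


def find_conflicts(requirements):
    """Return {library: sorted versions} for libraries requested at more than one version."""
    requested = defaultdict(set)
    for deps in requirements.values():
        for name, version in deps:
            requested[name].add(version)
    return {name: sorted(v) for name, v in requested.items() if len(v) > 1}


print(find_conflicts(REQUIREMENTS))  # {'B': ['1.5', '2.0']}
```

With images copied on top of each other, whichever copy of B lands last silently wins, which is exactly the situation this check flags.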
I am trying to use John the Ripper (JTR) to brute-force a PDF file.
The PDF's password consists of four letters followed by four digits, e.g. ABCD1234 or ZDSC1977.
I've downloaded the jumbo source code from GitHub and extracted the hash using pdf2john.pl.
But now, reading the documentation, it says I need to configure and install john, which is not going to work in my case.
Cloud Functions (or Firebase Functions) does not allow sudo apt-get install, and that's the reason we can't use tools like poppler-utils, which includes the amazing pdftotext.
How can I use JTR in Cloud Functions properly without needing to install it?
Is there a portable or prebuilt version of JTR for Ubuntu 18.04?
It is important to keep in mind that you can't arrange for packages to be installed on Cloud Functions instances. This is because your code doesn't run with root privileges.
If you need binaries to be available to your code deployed to Cloud Functions, you will have to build them yourself for Debian and include them in your functions directory so they get deployed along with the rest of your code.
Even if you're able to do that, there's no guarantee it will work, because the Cloud Functions images may not include all the shared libraries required for the executables to work.
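As a rough sketch of what calling such a bundled binary could look like (in Python here; the same idea applies in Node.js), the helper below runs an executable from a directory you deployed and reports failures instead of crashing. The function name and directory layout are hypothetical, not part of any Cloud Functions API.

```python
import os
import subprocess


def run_bundled(bin_dir, name, *args):
    """Run bin_dir/name with args; return (ok, stdout_or_error_message)."""
    path = os.path.join(bin_dir, name)
    if not os.path.isfile(path):
        return False, f"{path} was not deployed"
    try:
        done = subprocess.run(
            [path, *args], capture_output=True, text=True, check=True
        )
    except OSError as exc:
        # e.g. the file is not executable or was built for another architecture
        return False, str(exc)
    except subprocess.CalledProcessError as exc:
        # A missing shared library usually surfaces here: the loader prints
        # its complaint to stderr and the process exits with a non-zero status.
        return False, exc.stderr
    return True, done.stdout
```

In a deployed function, bin_dir might be something like a "bin" folder next to your source file (again, an assumed layout). If the error message mentions shared libraries, you have hit the case described above, where the image lacks something the executable links against.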
You can request that new packages be added to the runtime using the Public Issue Tracker.
Otherwise, you can use Cloud Run or Compute Engine.
My app needs cmake, libx11-dev and libpng-dev to build. I came across this documentation, which leads me to believe that I can list these as dependencies for my app to run on the Google App Engine platform, although I cannot figure out how. I was successfully able to run my app in a Compute Engine instance, although this is costly and, if I'm not mistaken, unnecessary. How do I get the packages listed at the beginning of the question installed beyond session end?
You can only list Node.js dependencies that way. From Declaring and managing dependencies (emphasis mine):
You can use any Linux-compatible Node.js package with App Engine
flexible environment, including packages that require native (C)
extensions.
You can use dependencies other than Node.js (at least cmake in your list) but only in the flexible environment, via a custom runtime. From About Custom Runtimes:
Custom runtimes allow you to define new runtime environments, which
might include additional components like language interpreters or
application servers.
See also Building Custom Runtimes.
You need to keep in mind that the App Engine flexible environment still uses Compute Engine instances, so you may not get any additional benefit from moving to it:
Based on Google Compute Engine, the App Engine flexible environment
automatically scales your app up and down while balancing the load.
The issue is that if you require cmake, libx11-dev and libpng-dev to build your application, you'll still need an underlying Compute Engine VM to run it. This will be the case even if you consider moving to Kubernetes Engine.
If you're looking to manage costs for your application, consider downsizing the VM to a smaller instance, modifying your application to suit the App Engine standard environment, or using Cloud Functions.