I want to build a website that takes an input file, processes it with Apache Spark in the backend, and then sends the output back to the website.
I don't understand how to connect Spark, which is running in a Jupyter notebook, to my website.
Any ideas are highly welcome.
Spark doesn't really communicate directly with your web application servers. One way around this is to publish your results to a database (MongoDB or PostgreSQL, for instance) and then have your website read from that database.
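A minimal sketch of that pattern, assuming PySpark and a PostgreSQL table (the table, database, and credentials are illustrative, and the PostgreSQL JDBC driver has to be on Spark's classpath):

```python
# Sketch: a PySpark job that publishes its results to PostgreSQL so the
# website can read them independently of Spark. All names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("publish-results").getOrCreate()

# ...whatever processing your job does on the uploaded file...
result = (spark.read.text("input.txt")
          .selectExpr("explode(split(value, ' ')) AS word")
          .groupBy("word").count())

# Publish to a table the web application can query on its own schedule
(result.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/appdb")
    .option("dbtable", "word_counts")
    .option("user", "appuser")
    .option("password", "secret")
    .mode("overwrite")
    .save())
```

The website then only talks to the database, so it never needs a live connection to the Spark cluster.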
I found the answer to my question.
I embedded Python Flask code in my PySpark program, and it is giving me the desired result on the website.
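For anyone else landing here, a minimal sketch of what that can look like (the route, file handling, and word-count logic are my own illustrative assumptions, not the poster's exact code):

```python
# Sketch: Flask embedded in a PySpark program. A POST with a file upload
# triggers a Spark computation and the result is returned to the website.
from flask import Flask, request, jsonify
from pyspark.sql import SparkSession

app = Flask(__name__)
spark = SparkSession.builder.appName("web-wordcount").getOrCreate()

@app.route("/process", methods=["POST"])
def process():
    # Save the uploaded file somewhere Spark can read it
    path = "/tmp/upload.txt"
    request.files["file"].save(path)

    # Count words with Spark and collect the (small) result to the driver
    counts = (spark.read.text(path)
              .selectExpr("explode(split(value, ' ')) AS word")
              .groupBy("word").count()
              .collect())
    return jsonify({row["word"]: row["count"] for row in counts})

if __name__ == "__main__":
    app.run(port=5000)
```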
I have written a Python REST API using FastAPI. It connects to JanusGraph on a remote machine and runs some Gremlin queries using the GremlinPython API. While writing my unit tests with FastAPI's built-in test client, I cannot mock JanusGraph to test my APIs. In the worst case I could run JanusGraph in Docker in my local setup and test there, but I would like to do a pure unit test. I haven't come across any useful documentation so far. Can anyone please help?
I think running Gremlin Server locally is how a lot of people do local testing. If you do not need to test data persistence, you could configure JanusGraph to use the "inmemory" backend and avoid the need to provision any storage nodes.
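For the pure unit test, one approach that works with FastAPI's test client is to put the Gremlin traversal behind a FastAPI dependency and override it in tests. A sketch, assuming a hypothetical `get_traversal` dependency and `/vertices/count` route in your app:

```python
# Sketch: unit-testing a FastAPI endpoint without a running JanusGraph by
# overriding the (hypothetical) dependency that yields the Gremlin traversal.
from unittest.mock import MagicMock

from fastapi.testclient import TestClient

from myapp.main import app, get_traversal  # hypothetical module and names

def test_vertex_count_without_janusgraph():
    fake_g = MagicMock()
    # Stub the exact Gremlin call chain the endpoint performs
    fake_g.V.return_value.count.return_value.next.return_value = 42

    app.dependency_overrides[get_traversal] = lambda: fake_g
    try:
        client = TestClient(app)
        response = client.get("/vertices/count")  # hypothetical route
        assert response.status_code == 200
        assert response.json() == {"count": 42}
    finally:
        app.dependency_overrides.clear()
```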
I will just try to explain my simplified use case. There is:
A Spark application that counts words.
A web server that serves a web page with a form.
A user who can type a word into the form and submit it.
The server receives the word and sends it to the Spark application.
The Spark application takes this word as input and, based on some data and this word, launches a job with recalculations. Once Spark is done with the calculations, it sends the results to the web server, which shows them on a web page.
The question is: how can I establish communication between the Spark application and the web server?
I guess that spark-jobserver or Spark Streaming can help me here, but I am not sure.
There are a few projects that will help you with this.
Generally you run a separate web server for managing the Spark jobs, since there is some messy system-exec work around the spark-submit CLI to accomplish this. Obviously this runs on a different port than your primary application and is only accessible by the server component of the primary web application.
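A rough sketch of that system-exec wrapper, with the master, job script path, and arguments as illustrative assumptions:

```python
# Sketch: the management web server shells out to spark-submit.
import subprocess

def launch_wordcount(word: str) -> int:
    proc = subprocess.run(
        ["spark-submit",
         "--master", "yarn",
         "/opt/jobs/wordcount.py",  # hypothetical job script
         word],
        capture_output=True, text=True, timeout=600,
    )
    return proc.returncode  # 0 means spark-submit exited cleanly
```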
There are a few open source projects that will handle this for you, most notably:
https://github.com/spark-jobserver/spark-jobserver
https://github.com/cloudera/livy
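With Livy, for instance, the web server drives Spark over plain HTTP. A sketch under assumed names (the Livy host, session config, submitted code, and simplified polling are all illustrative):

```python
# Sketch: submitting the recalculation to Spark through Livy's REST API.
import time

import requests

LIVY = "http://livy-host:8998"

# Start an interactive PySpark session
session = requests.post(f"{LIVY}/sessions", json={"kind": "pyspark"}).json()
sid = session["id"]

# Wait until the session is ready (simplified polling, no error handling)
while requests.get(f"{LIVY}/sessions/{sid}").json()["state"] != "idle":
    time.sleep(1)

# Run the recalculation for the word the user submitted
code = "print(sc.textFile('/data/corpus.txt').filter(lambda l: 'word' in l).count())"
stmt = requests.post(f"{LIVY}/sessions/{sid}/statements", json={"code": code}).json()

# Poll for the statement result and hand it back to the web page
while True:
    result = requests.get(f"{LIVY}/sessions/{sid}/statements/{stmt['id']}").json()
    if result["state"] == "available":
        print(result["output"])
        break
    time.sleep(1)
```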
I'm new to Cassandra.
I've deployed a Cassandra 2.0 cluster and everything works as expected.
There's one thing I don't understand, though.
From within a web app that uses the database, to which node should I connect? I know they're all the same, but how do I know that node isn't down?
I read that you're not supposed to use a load balancer, so I'm a little confused.
Any help appreciated. Thanks!
Depending on which driver you are using to connect, you can typically provide more than one node to connect to, usually in the form "node1,node2" ("192.168.1.1,192.168.1.2"). The driver uses these as contact points to discover the rest of the cluster and will fail over to another node if one of them is down.
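For example, with the DataStax Python driver (the IPs are placeholders):

```python
# Sketch: give the driver several contact points; it discovers the rest of
# the cluster from whichever one answers and routes around nodes that are down.
from cassandra.cluster import Cluster

cluster = Cluster(['192.168.1.1', '192.168.1.2'])
session = cluster.connect()

row = session.execute('SELECT release_version FROM system.local').one()
print(row.release_version)

cluster.shutdown()
```

This is also why a load balancer is unnecessary: the driver itself balances requests across the live nodes.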
I am very impressed with IPython Notebook, and I'd like to use it more extensively. My question has to do with secure data. I know only very little about networking. If I use IPython Notebook, is the data sent out over the web to a remote server? Or is it all contained locally? I am not talking about setting up a common resource for multiple access points, just using the data on my machine as I would with SAS or R.
Thanks
If you run the notebook on your machine, then no, it doesn't send anything externally. There are sites like Wakari where you can use the IPython notebook that's running on a server, and obviously that will send your code and data to their servers.
If you did want to expose your notebook server on the internet, then there are security measures you should take, but that's not necessary if you're just running IPython Notebook locally, which is how it starts up by default.
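If you ever do go down that road, the (historical) IPython Notebook docs describe settings along these lines; a sketch of a server config, with the paths and password hash as placeholders (current Jupyter uses analogous options):

```python
# Sketch of an ipython_notebook_config.py for a *publicly reachable* server.
# None of this is needed for the default localhost-only setup.
c = get_config()

# Listen on all interfaces instead of only localhost
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 9999

# Hashed password, generated with IPython.lib.passwd() (placeholder below)
c.NotebookApp.password = 'sha1:...'

# Serve over HTTPS so code and data are encrypted in transit
c.NotebookApp.certfile = '/path/to/mycert.pem'
```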
I have been struggling with this tutorial, https://www.windowsazure.com/en-us/develop/nodejs/tutorials/web-app-with-mongodb/, which builds a simple node.js application that has access to MongoDB. I keep running into the following issue when launching the program locally with the command Start-AzureEmulator:
"No connection could be made because the target machine actively refused it 127.0.0.1:27017"
I tried various ports and configurations with no success. Oddly enough, when I run mongodb.exe, the database launches without a hiccup (this is just through the command line, not within the Azure emulator). I have also tried reinstalling all of the tools multiple times. It seems I am at a loss as to what to do next.
Have any of you experienced this problem, or been able to complete this tutorial?
As a side note, do any of you know any cloud providers that allow the use of sockets with node.js? This is one of the main reasons I am trying to use Azure.
I assume you've followed the instructions step by step and haven't modified anything yet?
I note from a screenshot of the sample that it tries to open Mongo at 127.255.0.1:27017, not 127.0.0.1:27017.
I suggest checking your Azure services' URLs in case you're looking for Mongo at the wrong address.
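A quick way to check which loopback address is actually listening (a sketch in Python, just for the probe; both addresses come from the error message and the screenshot):

```python
# Sketch: probe both loopback addresses to see which one accepts TCP
# connections on Mongo's port; "actively refused" means nothing listens there.
import socket

for host in ('127.0.0.1', '127.255.0.1'):
    try:
        with socket.create_connection((host, 27017), timeout=2):
            print(f'{host}:27017 is accepting connections')
    except OSError as exc:
        print(f'{host}:27017 refused: {exc}')
```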