How to debug the Presto server project?

I am trying to debug the Presto server in Eclipse with the following steps:
com.facebook.presto.PrestoServer class > set a debug breakpoint inside the main method.
But the following errors come up:
1) Explicit bindings are required and com.facebook.presto.execution.TaskManager is not explicitly bound.
while locating com.facebook.presto.execution.TaskManager
for the 1st parameter of com.facebook.presto.server.GracefulShutdownHandler.<init>(GracefulShutdownHandler.java:66)
at com.facebook.presto.server.GracefulShutdownModule.setup(GracefulShutdownModule.java:27)

The simplest approach is to follow https://github.com/prestosql/presto/blob/master/README.md#running-presto-in-your-ide
If you would like to attach a debugger to a server in an existing Presto installation, add the line below to the jvm.config file (see https://prestosql.io/docs/current/installation/deployment.html#jvm-config), restart the Presto server, and then attach the debugger:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
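For reference, a full jvm.config with that line added might look roughly like the sketch below; the memory and GC settings are only illustrative values in the spirit of the deployment docs, not requirements:
-server
-Xmx16G
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
In Eclipse, the counterpart is a "Remote Java Application" debug configuration pointing at the server host and port 5005 (use suspend=y if the JVM should wait for the debugger before starting).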

Related

"Error: Key not loaded" in h2o deployed through a K3s cluster, using python3 client

I can confirm the 3-replica cluster of h2o inside K3s is correctly deployed, as executing h2o.init(ip="x.x.x.x") in the Python3 interpreter works as expected. I followed the instructions noted here: https://www.h2o.ai/blog/running-h2o-cluster-on-a-kubernetes-cluster/
Nevertheless, I had to modify the service.yaml and comment out the line which says clusterIP: None, as K3s was complaining about its inability to set the clusterIP to None. Even so, I can certify it is working correctly, and I am able to use an external IP to connect to the cluster.
If I try to load the dataset into the h2o cluster inside the K3s cluster, following the exact same steps described here http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html, this is the output that I get:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Key not loaded: Key<Frame> https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv
Request: POST /3/ParseSetup
data: {'check_header': '0', 'source_frames': '["https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv"]'}
The same error occurs if I use the h2o.upload_file("x.csv") method.
There is a clue about what may be happening here: Key not loaded: Key<Frame> while POSTing source frame through ParseSetup in H2O API call, but I am not using curl, and I cannot find any parameter that could help me overcome this issue: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/h2o.html?highlight=import_file#h2o.import_file
I need to use the Python client inside the same K3s cluster for various technical reasons, so I am not able to launch either Flow or Firebug to see what may be happening.
I can confirm it works correctly when I simply issue an h2o.init(), using the local Java instance.
UPDATE 1:
I have tried this in different K3s clusters without success. I changed the service.yaml to a NodePort, and now this is the error traceback:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Job is missing
Request: GET /3/Jobs/$03010a2a016132d4ffffffff$_a2366be93ec99a78d7bc161de8c54d67
UPDATE 2:
I have tried using different services (NodePort, LoadBalancer, ClusterIP) and none of them work. I have also tried using Minikube with the official image, and with a custom image made by me, without success. I suspect this is something related either to h2o itself or to the clustering between pods. I will keep digging and hope there is some gold in it.
UPDATE 3:
I also found out that the blog post about running H2O in Docker is really outdated (https://www.h2o.ai/blog/h2o-docker/), and the Dockerfile present on GitHub does not work either (I changed it to uncomment the ENTRYPOINT section, without success): https://github.com/h2oai/h2o-3/blob/master/Dockerfile
Even so, I tried the custom image I built for h2o-k8s, and it works seamlessly in pure Docker. I am wondering why it still does not work in K8s...
UPDATE 4:
I have tried modifying the environment variable called H2O_KUBERNETES_SERVICE_DNS without success.
In the meantime, the cluster became unavailable, that is, the readinessProbes would not complete successfully. No matter what I change now, it does not work.
I spun up a local K3d cluster to see what happened, and surprisingly, the readinessProbes were not failing, using v3.30.0.6. I then started testing it with R instead of Python. I am glad I tried, because I may have pinpointed what was wrong: there is a version mismatch between the client and the server. So I updated the image accordingly to v3.30.0.1.
But now, again, the readinessProbe is not working in my K3d cluster, so I am unable to test it.
It seems to be working now: R client version 3.30.0.1 with server version 3.30.0.1. I also tried Python client version 3.30.0.7 with server version 3.30.0.7 and it started working. Marvelous. The problem was caused by a version mismatch between the client and the server, as the Python client was updated to 3.30.0.7 while the latest server image for Docker was 3.30.0.6.
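In case it helps anyone hitting the same thing, a quick way to spot such a mismatch from the Python client (the IP and port are placeholders) is something like:
import h2o
h2o.init(ip="x.x.x.x", port=54321)       # connect to the remote cluster instead of starting a local JVM
print("client:", h2o.__version__)        # version of the installed h2o Python package
print("server:", h2o.cluster().version)  # version reported by the cluster; the two should match
h2o.init() also prints the cluster version in its startup summary, which is easy to miss when scripts run non-interactively.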

How can I print my MongoDB queries from loopback

I want to print all the queries executed on MongoDB from my LoopBack 3 application when in debug mode. I tried setting "DEBUG": "loopback:connector:mongodb".
I am using LoopBack 2 and I also had to check the MongoDB queries for my APIs. I just used the DEBUG=loopback:connector:mongodb node . command to start my LoopBack server with debugging enabled.
There is one more alternative way to do this: you can add a debug key and set it to true in your datasource config file (datasources.json).
If the above two methods don't work for you, please check the value of the debug property in the MongoDB function in the node_modules/loopback-connector-mongodb/lib/mongodb.js file.
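For illustration, the debug flag from the second option would sit in the datasource entry roughly like this (a sketch; the datasource name and connection details are placeholders):
{
  "mongoDs": {
    "name": "mongoDs",
    "connector": "mongodb",
    "host": "localhost",
    "port": 27017,
    "database": "mydb",
    "debug": true
  }
}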
Resources
https://loopback.io/doc/en/lb2/Setting-debug-strings.html

Spring Data GemFire Server java.net.BindException in Linux

I have a Spring Boot app that I am using to start a Pivotal GemFire CacheServer.
When I jar up the file and run it locally:
java -jar gemfire-server-0.0.1-SNAPSHOT.jar
It runs fine without issue. The server is using the default properties:
spring.data.gemfire.cache.log-level=info
spring.data.gemfire.locators=localhost[10334]
spring.data.gemfire.cache.server.port=40404
spring.data.gemfire.name=CacheServer
spring.data.gemfire.cache.server.bind-address=localhost
spring.data.gemfire.cache.server.host-name-for-clients=localhost
If I deploy this to a CentOS distribution and run it with the same script but passing the "test" profile:
java -jar gemfire-server-0.0.1-SNAPSHOT.jar -Dspring.profiles.active=test
with my test profile application-test.properties looking like this:
spring.data.gemfire.cache.server.host-name-for-clients=server.centralus.cloudapp.azure.com
I can see during startup that the server finds the Locator already running on the host (I start it through a separate process with Gfsh).
The server even joins the cluster for about a minute. But then it shuts down because of a bind exception.
I have checked to see if there is anything running on that port (40404), and nothing shows up.
EDIT
Apparently I DO get this exception locally - it just takes a lot longer.
It is almost instant when I start it up on the CentOS distribution. On my Mac it takes around 2 minutes before the process throws the exception:
Adding a few more images of this:
Two bash windows: the left one is monitoring GemFire locally, and the right one is where I check the port and start the Java process:
The server is added to the cluster. Note the timestamp of 16:45:05.
Here is the server added and it appears to be running:
Finally, the exception after about two minutes - again look at the timestamp on the exception - 16:47:09. The server is stopped and dropped from the cluster.
Did you start other servers using Gfsh? That is, with a Gfsh command similar to...
gfsh>start server --name=ExampleGfshServer --log-level=config
Gfsh will start CacheServers listening on the default CacheServer port of 40404.
You have a few options.
1) First, you can disable the default CacheServer when starting a server with Gfsh like so...
gfsh>start server --name=ExampleGfshServer --log-level=config --disable-default-server
2) Alternatively, you can change the CacheServer port when starting other servers using Gfsh...
gfsh>start server --name=ExampleGfshServer --log-level=config --server-port=50505
3) If you are starting multiple instances of your Spring Boot, Pivotal GemFire CacheServer class, then you can vary the spring.data.gemfire.cache.server.port property by declaring it as a System property at startup.
For instance, you can, in the Spring Boot application.properties, do...
#application.properties
...
spring.data.gemfire.cache.server.port=${gemfire.cache.server.port:40404}
And then when starting the application from the command-line...
java -Dgemfire.cache.server.port=48484 -jar ...
Of course, you could just set the SDG property from the command line too...
java -Dspring.data.gemfire.cache.server.port=48484 -jar ...
Anyway, I guarantee you have another process (e.g., a Pivotal GemFire CacheServer) running with a ServerSocket listening on port 40404. netstat -a | grep 40404 should give you better results.
Hope this helps.
Regards,
John
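For completeness, variants of that check which also show which process owns the port (a sketch; flags differ slightly between Linux and macOS):
netstat -tlnp | grep 40404            # Linux: -p prints the owning PID/program name (may need sudo)
lsof -nP -iTCP:40404 -sTCP:LISTEN     # macOS/Linux alternative that lists the listening process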

spark-submit in cluster deploy mode: get the application id printed to the console

I am stuck on a problem that I need to resolve quickly. I have gone through many posts and tutorials about Spark cluster deploy mode, but I am still clueless about the approach, as I have been stuck for some days.
My use case: I have lots of Spark jobs submitted using the 'spark2-submit' command, and I need to get the application id printed in the console once they are submitted. The Spark jobs are submitted using cluster deploy mode. (In normal client mode, it gets printed.)
Points I need to consider while creating a solution: I am not supposed to change code (as it would take a long time, because there are many applications running); I can only provide log4j properties or some custom coding.
My approach:
1) I have tried changing the log4j levels and various log4j parameters, but the logging still goes to the centralized log directory.
Part of my log4j.properties:
log4j.logger.org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend=ALL,console
log4j.appender.org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.Target=System.out
log4j.logger.org.apache.spark.deploy.SparkSubmit=ALL
log4j.appender.org.apache.spark.deploy.SparkSubmit=console
log4j.logger.org.apache.spark.deploy.SparkSubmit=TRACE,console
log4j.additivity.org.apache.spark.deploy.SparkSubmit=false
log4j.logger.org.apache.spark.deploy.yarn.Client=ALL
log4j.appender.org.apache.spark.deploy.yarn.Client=console
log4j.logger.org.apache.spark.SparkContext=WARN
log4j.logger.org.apache.spark.scheduler.DAGScheduler=INFO,console
log4j.logger.org.apache.hadoop.ipc.Client=ALL
2) I have also tried to add a custom listener, and I am able to get the Spark application id after the application finishes, but not in the console.
Code logic:
public void onApplicationEnd(SparkListenerApplicationEnd arg0)
{
    for (Thread t : Thread.getAllStackTraces().keySet())
    {
        if (t.getName().equals("main"))
        {
            System.out.println("The current state : " + t.getState());
            Configuration config = new Configuration();
            ApplicationId appId = ConverterUtils.toApplicationId(getjobUId);
            // some logic to communicate with the main thread to print the app id to console.
        }
    }
}
3) I have enabled spark.eventLog (set it to true) and specified a directory in HDFS to write the event logs to from the spark-submit command.
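(For reference, those event-log settings are usually passed as configuration on the command line, roughly like the sketch below; the HDFS path is a placeholder:)
spark2-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///user/spark/eventlogs <rest of the parameters>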
If anyone could help me find an approach to a solution, it would be really helpful. Or if I am doing something very wrong, any insights would help me.
Thanks.
After being stuck at the same place for some days, I was finally able to find a solution to my problem.
After going through the Spark code for cluster deploy mode and some blogs, a few things became clear. It might help someone else looking to achieve the same result.
In cluster deploy mode, the job is submitted via a Client thread from the machine the user is submitting from. I was passing the log4j configs to the driver and executors, but missed the fact that the log4j config for the "Client" was missing.
So we need to use:
SPARK_SUBMIT_OPTS="-Dlog4j.debug=true -Dlog4j.configuration=<location>/log4j.properties" spark-submit <rest of the parameters>
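For illustration, a minimal log4j.properties for that client-side JVM could look roughly like the sketch below (the appender name and pattern are arbitrary); the idea is simply to let org.apache.spark.deploy.yarn.Client, which logs the submitted application id, write to the console:
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# let the YARN client log at INFO so the "Submitted application ..." line comes through
log4j.logger.org.apache.spark.deploy.yarn.Client=INFO
On YARN, the id then appears in the spark-submit output as a line containing application_<id>, so it can be captured with something like:
spark2-submit <rest of the parameters> 2>&1 | grep -oE "application_[0-9]+_[0-9]+" | head -1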
To clarify:
client mode means the Spark driver is running on the same machine you ran spark-submit from
cluster mode means the Spark driver is running out on the cluster somewhere
You mentioned that it is getting logged when you run the app in client mode and you can see it in the console. Your output is also getting logged when you run in cluster mode; you just can't see it because it is running on a different machine.
Some ideas:
Aggregate the logs from the worker nodes into one place where you can parse them to get the app ID.
Write the appIDs to some shared location like HDFS or a database. You might be able to use a Log4j appender if you want to keep log4j.

Server Error in '/DotNetNuke_Community' Application

I'm getting the following error when attempting to run DotNetNuke 7.1 from IIS.
Object reference not set to an instance of an object.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.
Source Error:
Line 572: //first call GetProviderPath - this insures that the Database is Initialised correctly
Line 573: //and also generates the appropriate error message if it cannot be initialised correctly
Line 574: string strMessage = DataProvider.Instance().GetProviderPath();
Line 575: //get current database version from DB
Line 576: if (!strMessage.StartsWith("ERROR:"))
I've tried running it from Visual Studio 2012 after downloading and extracting the source code to a folder, then running, but I get the same error (also, VS loads about 13 instances of its built-in web server, which can't be right).
Clearly, there is something wrong with the database. From what I've read in the past, there should have been a start-up configuration page (for configuring settings the first time you run the project).
I did look at the local version of IIS (running on Windows 8) and it created the site fine there; however, for some reason the internal web server attempts to run (and the option to run on an external IIS is greyed out).
Anyone run into this problem with DNN Community edition? I've tried running as admin and setting permissions with no luck at all.
Any way to fix this?
OK, the key is to delete the Database.mdf file completely. Then:
1) Create a new empty database of your choice in SQL Server (2008 or greater).
2) Create a new user account with db_owner access (as it must be able to create tables, etc.).
3) Change the connection strings in release.config and development.config to connect to the database.
4) DELETE the web.config file.
5) RENAME either config file to "web.config".
6) Set the default project to the web project in VS.
7) Set the default page to default.aspx.
8) Run.
I made the erroneous assumption that running the app would rename the config file for me (not sure why I assumed that).
SOLVED!
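For reference, the connection string those steps refer to is the SiteSqlServer entry; a sketch with placeholder server, database, and credentials might look roughly like this (the same value is also expected in the SiteSqlServer key under appSettings):
<connectionStrings>
  <add name="SiteSqlServer"
       connectionString="Data Source=.\SQLEXPRESS;Initial Catalog=DNN_Dev;User ID=dnnuser;Password=your_password"
       providerName="System.Data.SqlClient" />
</connectionStrings>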
