running presto-cli warning: SerDe org.apache.hadoop.hive.contrib.serde2.RegexSerDe does not exist - presto

I deployed Presto on a single node. When running the Presto CLI, I got the following error:
presto:default> select * from test1;
Query 20131116_233859_00005_5a2yh failed: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe org.apache.hadoop.hive.contrib.serde2.RegexSerDe does not exist)
Hive is operating normally. Why did Presto fail?
my profile:
export JAVA_HOME=/usr/java
export JRE_HOME=/usr/java/jre
export HADOOP_HOME=/usr/hadoop
export HIVE_HOME=/usr/hive
export PRESTO_HOME=/usr/presto
export CLASSPATH=:.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME:`find /usr/hadoop -name '*.jar' | grep -v 'test' | grep -v 'example' | perl -e '@jars=<STDIN>;chomp @jars; print join(":",@jars);'`:$PRESTO_HOME/lib:$HADOOP/lib:$HIVE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PRESTO_HOME/bin:$HIVE_HOME/bin
export HADOOP_HEAPSIZE=4096
I figured out that when the Hive table is created with a RegEx SerDe, presto-cli fails.
I have no idea how to fix it. Can somebody help me, please?

On Master node:
Place your RegexSerDe JAR into the Hive connector plugin directory that corresponds to your Hadoop distribution (e.g., for a hadoop2 distribution you may place the JAR file in ../presto/plugin/hive-hadoop2/).
Make sure the RegexSerDe JAR has the same ownership as the other JAR files present in this directory.
Restart the presto-server process (sudo service presto-server restart). If this does not work, you may need to restart it with the launcher: sudo /usr/lib/presto/bin/launcher restart
Repeat this on all slave nodes! A sketch of these steps is below.
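A minimal sketch of those steps, assuming the paths from the question's profile (PRESTO_HOME=/usr/presto, HIVE_HOME=/usr/hive), the hive-hadoop2 plugin directory mentioned above, and a presto service user; the exact jar name, directory, and ownership may differ on your system:
# RegexSerDe (org.apache.hadoop.hive.contrib.serde2.RegexSerDe) ships in the hive-contrib jar
sudo cp /usr/hive/lib/hive-contrib-*.jar /usr/presto/plugin/hive-hadoop2/
# match the ownership of the jars already in the plugin directory (user/group assumed to be presto)
sudo chown presto:presto /usr/presto/plugin/hive-hadoop2/hive-contrib-*.jar
# restart the server (use the launcher if there is no service script)
sudo /usr/presto/bin/launcher restart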

I followed the same process as you mentioned above but am getting the same error. Any suggestions, please?
Query failed: org/apache/hadoop/hive/serde2/SerDe
Using the hive-hadoop2 version, I placed the SerDe JAR at the following path:
presto/plugin/hive-hadoop2/hive-serde-1.0.0.jar and restarted Presto.

You have to put the SerDe jar inside the plugin directory (plugin/hive-cdh4).
I didn't use RegexSerDe but it worked for CSVSerDe.

Related

DSBulk CSV Load Failure to DataStax Astra Cassandra Database, missing file config.json

I am trying to load a csv into a database in DataStax Astra using the DSBulk tool.
Here is the command I ran minus the sensitive details:
dsbulk load -url D:\\App\\data.csv -k data -t data -b D:\\App\\secure-connect-myapp -u username -p password
Here is the error I get back:
Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
Here is the full log:
2022-12-06 00:44:21 INFO Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided: ignoring all explicit contact points.
2022-12-06 00:44:21 INFO A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
2022-12-06 00:44:21 INFO Operation directory: C:\Program Files\dsbulk-1.10.0\bin\logs\LOAD_20221206-004421-512000
2022-12-06 00:44:21 ERROR Operation LOAD_20221206-004421-512000 failed: Invalid bundle: missing file config.json.
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildDefaultSessionAsync(SessionBuilder.java:876)
at com.datastax.oss.driver.api.core.session.SessionBuilder.buildAsync(SessionBuilder.java:817)
at com.datastax.oss.driver.api.core.session.SessionBuilder.build(SessionBuilder.java:835)
at com.datastax.oss.dsbulk.workflow.commons.settings.DriverSettings.newSession(DriverSettings.java:560)
at com.datastax.oss.dsbulk.workflow.load.LoadWorkflow.init(LoadWorkflow.java:145)
at com.datastax.oss.dsbulk.runner.WorkflowThread.run(WorkflowThread.java:52)
The error says that config.json is missing, but it isn't. So I'm stuck. Unless it's looking somewhere other than in the bundle I specified, but the bundle definitely has the config.json file.
This error:
...
java.lang.IllegalStateException: Invalid bundle: missing file config.json
at com.datastax.oss.driver.internal.core.config.cloud.CloudConfigFactory.createCloudConfig(CloudConfigFactory.java:114)
...
indicates that the Java driver bundled with DSBulk is unable to connect to your Astra DB because it couldn't get the configuration details from the secure connect bundle.
Please make sure that the secure connect bundle ZIP is accessible to DSBulk. You need to provide the path to the ZIP file, not just the directory. For example:
$ dsbulk ... -b /path/to/secure-connect-db.zip ...
Please check the path in your command then try again. Cheers!
In order for you to leverage the DataStax Bulk Loader (DSBulk for short), you need to pass in the secure connect bundle (SCB) correctly. What I mean is that you need either the fully qualified path or the relative path to the SCB file.
The correct command in your case would look like:
./dsbulk load -url 'D:\\App\\data.csv' -k data -t data -b 'D:\\App\\secure-connect-myapp.zip' -u username -p password
Note that the -b option takes the full SCB filename, including the .zip file extension.
Other Resources:
Load data using DSBulk into DataStax Astra DB
-b command-line option reference
BONUS TIP: One could easily configure everything within a configuration file and leverage that. See documentation for additional details.
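As a hedged illustration of that tip, a configuration file for this load might look roughly like the sketch below, reusing the keyspace, table, and paths from the question; the option names follow DSBulk's configuration-file layout, but check the documentation for your version before relying on them:
# dsbulk.conf (HOCON); paths and credentials are illustrative
dsbulk {
  connector.csv.url = "D:\\App\\data.csv"
  schema.keyspace = "data"
  schema.table = "data"
}
datastax-java-driver {
  basic.cloud.secure-connect-bundle = "D:\\App\\secure-connect-myapp.zip"
}
It could then be passed to DSBulk with the -f option, e.g. dsbulk load -f dsbulk.conf -u username -p password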

Can I execute presto CLI without specifying --server or --catalog

I would like to know where, if it is possible, I can configure default catalog and server values to use when executing the presto CLI.
Presto CLI info:
ls -lthr /opt/presto-server-0.169/presto
/opt/presto-server-0.169/presto -> presto-cli-0.169-executable.jar
And instead of executing:
/opt/presto-server-0.169/presto --server localhost:6666 --schema abc --catalog catalog-1
I would like to execute:
/opt/presto-server-0.169/presto
with it picking up localhost:6666 as my server and catalog-1 as my catalog. I would like to specify the schema once I make the connection.
Any help will be appreciated!
Thanks.
There is no such option to set the host lazily in the console. The server needs to be defined upfront; by default localhost:8080 is used.
If you cannot pass the proper arguments to presto-cli and cannot use the default server host, you can change the default values in the presto-cli source code and compile your own version:
Check out the project from GitHub.
Change the default values in ClientOptions (see the sketch below).
Package the JAR for the Presto CLI: cd presto-cli && mvn package
You can find the jar in target/presto-cli-0.201-SNAPSHOT.jar
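A rough sketch of those steps, assuming the prestodb GitHub repository and a version tag matching your server; the ClientOptions file path shown is the usual location but may differ between releases:
git clone https://github.com/prestodb/presto.git
cd presto
git checkout 0.169    # match the server version from the question
# edit presto-cli/src/main/java/com/facebook/presto/cli/ClientOptions.java
# and change the default values of the server (and catalog) fields
cd presto-cli && mvn package -DskipTests
# the built jar ends up under presto-cli/target/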
For the schema/catalog, you can define it in the console itself with the USE command. The syntax is as follows: USE [<catalog>.]<schema>.
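For example, with the catalog and schema names from the question (a hedged illustration; a hyphenated catalog name has to be quoted):
presto> USE "catalog-1".abc;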
Please note that with each new version of Presto you also need to compile and maintain your own version of presto-cli, which might become a burden quite soon.

Spark is not started automatically on the AWS cluster - how to launch it?

A Spark cluster has been launched using the ec2/spark-ec2 script from within the branch-1.4 codebase. I can log in to it, and it reports 1 master, 2 slaves:
11:35:10/sparkup2 $ec2/spark-ec2 -i ~/.ssh/hwspark14.pem login hwspark14
Searching for existing cluster hwspark14 in region us-east-1...
Found 1 master, 2 slaves.
Logging into master ec2-54-83-81-165.compute-1.amazonaws.com...
Warning: Permanently added 'ec2-54-83-81-165.compute-1.amazonaws.com,54.83.81.165' (RSA) to the list of known hosts.
Last login: Tue Jun 23 20:44:05 2015 from c-73-222-32-165.hsd1.ca.comcast.net
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2013.03-release-notes/
Amazon Linux version 2015.03 is available.
But... where are they? The only Java processes running are:
Hadoop: NameNode and SecondaryNode
Tachyon: Master and Worker
It is a surprise to me that the Spark master and workers are not started. When looking for the scripts to start them manually, it is not at all obvious where they are located.
Hints on why Spark did not start automatically, and on where the launch scripts live, would be appreciated. In the meantime I will run an exhaustive
find / -name start-all.sh
And... survey says:
[root@ip-10-151-25-94 etc]$ find / -name start-all.sh
/root/persistent-hdfs/bin/start-all.sh
/root/ephemeral-hdfs/bin/start-all.sh
Which suggests to me that Spark was not even installed?
Update: I wonder, is this a bug in 1.4.0? I ran the same set of commands in 1.3.1 and the Spark cluster came up.
There was a bug in the Spark 1.4.0 provisioning script (which spark-ec2 clones from the GitHub repository https://github.com/mesos/spark-ec2/) with similar symptoms: Apache Spark hadn't started. The reason was that the provisioning script failed to download the Spark archive.
Check whether Spark was downloaded and uncompressed on the master host (ls -altr /root/spark); there should be several directories there. From your description it looks like the /root/spark/sbin/start-all.sh script is missing.
Also check the contents of the log file (cat /tmp/spark-ec2_spark.log); it should have information about the uncompressing step.
Another thing to try is to run spark-ec2 with a different provisioning-script branch by adding --spark-ec2-git-branch branch-1.4 to the spark-ec2 command line arguments (see the sketch after the next command).
Also, when you run spark-ec2, save all the output and check whether there is anything suspicious:
spark-ec2 <...args...> 2>&1 | tee start.log
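Put together, a relaunch with the alternate provisioning branch might look like the sketch below; the key-pair name is a placeholder, while the identity file and cluster name are taken from the question:
ec2/spark-ec2 -k <your-keypair> -i ~/.ssh/hwspark14.pem \
  --spark-ec2-git-branch branch-1.4 \
  launch hwspark14 2>&1 | tee start.log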

Cassandra dead but pid file exists

I am a novice to Cassandra and tried my hand at installing cassandra-2.1.2 on CentOS 7.0.
After completing the installation I executed the cqlsh command and created a few keyspaces and column families.
At first glance it seemed to be working perfectly.
But later on I realized the issues below:
1. When I execute the "service cassandra status" command, I get the error below:
Output: Cassandra dead but pid file exists.
I googled the above issue and found some links
http://www.datastax.com/support-forums/topic/dse-dead-but-pid-file-exists
https://baioradba.wordpress.com/2014/06/13/how-to-install-cassandra-on-centos-6-5/
and found that I had the same configuration mentioned in the links above, but the error still persists.
Please tell me the root cause and how to resolve it.
2. The second issue is in the cassandra.log file.
When I analysed the cassandra.log file there was an exception:
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
Below is the complete log:
12:01:40.816 [main] ERROR o.a.c.config.DatabaseDescriptor - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:73) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:84) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:158) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:110) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:554) [apache-cassandra-2.1.3.jar:2.1.3]
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file://<server>/ for remote files. Aborting.
Fatal configuration error; unable to start. See log for stacktrace.
I again searched for the same issue on Google, but the links were not that useful as they contained the Java class code for cassandra.config.
Again, please tell me the root cause and how to resolve it.
Thanks in advance.
rm /var/run/cassandra.pid
Run ps -ef | grep cassandra
Kill the pid of the cassandra process.
Start cassandra
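A minimal sketch of those steps; the paths assume a typical package install, and the <pid> placeholder has to be replaced with the number printed by ps:
sudo rm /var/run/cassandra.pid
ps -ef | grep -i cassandra      # note the PID of the CassandraDaemon process
sudo kill <pid>                 # replace <pid> with the number from the previous step
sudo service cassandra start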
To fix this issue, edit cassandra-env.sh:
sudo vi /etc/cassandra/conf/cassandra-env.sh
Increase the heap size for Cassandra; this should resolve your issue.
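For example, a hedged sketch of the relevant lines in cassandra-env.sh; the values are illustrative, should be sized to the RAM actually available, and the two variables are normally set together:
# in /etc/cassandra/conf/cassandra-env.sh
MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="400M"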
Check if you have enough memory to start the Cassandra service with this command:
cat /proc/meminfo
I was running a Hortonworks VM with VirtualBox, and I had a lot of Hadoop components started which needed a lot of memory, so for me the solution was to stop the unnecessary Hadoop components and add some extra memory to the virtual machine.
From https://github.com/apache/cassandra/blob/cassandra-2.1/examples/client_only/README.txt#L43-L49 :
cassandra.yaml can be on the classpath as is done here, can be specified (by modifying the script) in a location within the classpath like this:
java -Xmx1G -Dcassandra.config=/path/in/classpath/to/cassandra.yaml ...
or can be retrieved from a location outside the classpath like this:
... -Dcassandra.config=file:///path/to/cassandra.yaml ...
or
... -Dcassandra.config=http://awesomesauce.com/cassandra.yaml ...
So you probably had a misconfigured startup option.
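As a hedged example of setting that option with the file:/// prefix, assuming the configuration lives in the usual package location, the following line could be added to cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.config=file:///etc/cassandra/conf/cassandra.yaml"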
Remove the pid file. Try
rm /var/run/cassandra.pid

Add storm to PATH

Here are the steps I should follow:
1. Download a Storm release, unpack it, and put the unpacked bin/ directory on your PATH.
2. To be able to start and stop topologies on a remote cluster, put the cluster information in ~/.storm/storm.yaml.
I downloaded a Storm release and set it up.
I want to do the step
"put the unpacked bin/ directory on your PATH"
because I can't use storm as a command.
And for the second step, what cluster info should I put in storm.yaml?
Let us say your Storm release is unpacked at /home/your/download/storm.
Define that as an environment variable:
STORM_HOME="/home/your/download/storm"
Export this variable:
export STORM_HOME
Do not forget to update the PATH as well:
export PATH=$PATH:$STORM_HOME/bin
Then you can use the storm command.
Making it simpler with a single export command:
I downloaded Apache Storm from here, and let's say my download path is /home/username/Downloads/apache-storm-1.0.5.
Change this to your actual path and run the command at the terminal:
export PATH=$PATH:/home/username/Downloads/apache-storm-1.0.5/bin
All done.
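To make this survive new shells, the export can be appended to your shell profile; a small sketch, reusing the download path from the answer above:
echo 'export PATH=$PATH:/home/username/Downloads/apache-storm-1.0.5/bin' >> ~/.bashrc
source ~/.bashrc
storm version    # storm should now resolve from any new terminal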
