Unable to start geomesa-accumulo - accumulo

hduser@Neha-PC:/usr/local/geomesa-tutorials$ java -cp geomesa-tutorials-accumulo/geomesa-tutorials-accumulo-quickstart/target/geomesa-tutorials-accumulo-quickstart-2.3.0-SNAPSHOT.jar org.geomesa.example.accumulo.AccumuloQuickStart --accumulo.instance.id accumulo --accumulo.zookeepers localhost:2184 --accumulo.user root --accumulo.password PASS1234 --accumulo.catalog table1
Picked up JAVA_TOOL_OPTIONS: -Dgeomesa.hbase.coprocessor.path=hdfs://localhost:8020/hbase/lib/geomesa-hbase-distributed-runtime_2.11-2.2.0.jar
Loading datastore
java.lang.IncompatibleClassChangeError: Method org.locationtech.geomesa.security.AuthorizationsProvider.apply(Ljava/util/Map;Ljava/util/List;)Lorg/locationtech/geomesa/security/AuthorizationsProvider; must be InterfaceMethodref constant
at org.locationtech.geomesa.accumulo.data.AccumuloDataStoreFactory$.buildAuthsProvider(AccumuloDataStoreFactory.scala:234)
at org.locationtech.geomesa.accumulo.data.AccumuloDataStoreFactory$.buildConfig(AccumuloDataStoreFactory.scala:162)
at org.locationtech.geomesa.accumulo.data.AccumuloDataStoreFactory.createDataStore(AccumuloDataStoreFactory.scala:48)
at org.locationtech.geomesa.accumulo.data.AccumuloDataStoreFactory.createDataStore(AccumuloDataStoreFactory.scala:36)
at org.geotools.data.DataAccessFinder.getDataStore(DataAccessFinder.java:121)
at org.geotools.data.DataStoreFinder.getDataStore(DataStoreFinder.java:71)
at org.geomesa.example.quickstart.GeoMesaQuickStart.createDataStore(GeoMesaQuickStart.java:103)
at org.geomesa.example.quickstart.GeoMesaQuickStart.run(GeoMesaQuickStart.java:77)
at org.geomesa.example.accumulo.AccumuloQuickStart.main(AccumuloQuickStart.java:25)

You need to ensure that all versions of GeoMesa on the classpath are the same. Just from your command, it seems you are at least mixing 2.3.0-SNAPSHOT with 2.2.0. Try checking out the git tag of the tutorial project that corresponds to the GeoMesa version you want, as described here. If you want to use a SNAPSHOT version, you need to make sure that you have pulled the latest changes for each project.

Related

How can I install flashtext on every executor?

I am using the flashtext library in a couple of UDFs. It works when I run it locally in client mode, but once I try to run it in the Cloudera Workbench with several executors, I get a ModuleNotFoundError.
After some research I found that it is possible to add archives (and packages?) to a SparkSession when creating it, so I tried:
SparkSession.builder.config('spark.archives', 'flashtext-2.7-pyh9f0a1d_0.tar.gz')
but it didn't help; the same error remains.
According to the Spark Configuration doc, there are other configs I could try, e.g. spark.submit.pyFiles, but I don't understand what these py-files would have to look like.
Would it be enough to just create a Python script with this content?
from flashtext import KeywordProcessor
Could you tell me the easiest way how I can install flashtext on every node?
Edit:
In the meantime, I figured out that not only flashtext was causing issues, but also every relative import from other scripts that I intended to use in a UDF. To fix it, I followed this article. I also copied the flashtext source code into the main file instead of installing the actual library.
I think that in order to point the Spark executors to the Python modules extracted from your archive, you will need to add another config setting that adds their location to PYTHONPATH. Something like this:
SparkSession.builder \
    .config('spark.archives', 'flashtext-2.7-pyh9f0a1d_0.tar.gz#myUDFs') \
    .config('spark.executorEnv.PYTHONPATH', './myUDFs')
Citing from the same link you have in the question:
spark.executorEnv.[EnvironmentVariableName] ... Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.
There are no environment details in your question (or I'm simply not familiar with Cloudera Workbench), but if you're trying to run Spark on YARN, you may need to use a slightly different setting, spark.yarn.dist.archives.
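For example, a minimal sketch of the YARN variant (assuming a SparkSession built in the driver; the archive name and the myUDFs alias are carried over from the snippet above):

from pyspark.sql import SparkSession

# Sketch for a YARN deployment: the archive is shipped to the executors and
# unpacked into a directory named 'myUDFs', which is then put on PYTHONPATH.
spark = (
    SparkSession.builder
    .config('spark.yarn.dist.archives', 'flashtext-2.7-pyh9f0a1d_0.tar.gz#myUDFs')
    .config('spark.executorEnv.PYTHONPATH', './myUDFs')
    .getOrCreate()
)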
Also, please make sure that your driver log contains a message confirming that the archive was actually uploaded, as in:
:
22/11/08 INFO yarn.Client: Uploading resource file:/absolute/path/to/your/archive.zip -> hdfs://nameservice/user/<your-user-id>/.sparkStaging/<application-id>/archive.zip
:
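As a rough sanity check (the function name below is made up for illustration, not from the original answer), you can run a throwaway UDF that reports whether the executors can actually import the module:

import importlib.util
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Returns 'found' or 'missing' depending on whether flashtext is importable
# on the executor that processes the row.
@udf(StringType())
def probe_flashtext(_):
    return 'found' if importlib.util.find_spec('flashtext') else 'missing'

spark.range(1).select(probe_flashtext('id')).show()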

AWS RDS Postgres PostGIS upgrade problems

I have an RDS instance running Postgres 11.16. I'm trying to upgrade it to 12.11 but it's giving me errors on PostGIS. If I try a "modify" I get the following error in the precheck log file:
Upgrade could not be run on Sun Sep 18 06:05:13 2022
The instance could not be upgraded from 11.16.R1 to 12.11.R1 because of following reasons. Please take appropriate action on databases that have usages incompatible with requested major engine version upgrade and try again.
Following usages in database 'XXXXX' need to be corrected before upgrade:
-- The instance could not be upgraded because there are one or more databases with an older version of PostGIS extension or its dependent extensions (address_standardizer, address_standardizer_data_us, postgis_tiger_geocoder, postgis_topology, postgis_raster) installed. Please upgrade all installations of PostGIS and drop its dependent extensions and try again.
----------------------- END OF LOG ----------------------
First, I tried removing PostGIS so I could upgrade and then add it back again. I used drop extension postgis cascade;. However, this generated the same error.
Second, I tried running SELECT postgis_extensions_upgrade();. However, it gives me the following error:
NOTICE: Updating extension postgis_raster from unpackaged to 3.1.5
ERROR: function st_convexhull(raster) does not exist
CONTEXT: SQL statement "ALTER EXTENSION postgis_raster UPDATE TO "3.1.5";"
PL/pgSQL function postgis_extensions_upgrade() line 82 at EXECUTE
SQL state: 42883
Third, I tried to do a manual snapshot and upgrade the snapshot. Same results.
One additional piece of information: I ran SELECT PostGIS_Full_Version(); and this is what it returned:
"POSTGIS=""3.1.5 c60e4e3"" [EXTENSION] PGSQL=""110"" GEOS=""3.7.3-CAPI-1.11.3 b50468f"" PROJ=""Rel. 5.2.0, September 15th, 2018"" GDAL=""GDAL 2.3.1, released 2018/06/22"" LIBXML=""2.9.1"" LIBJSON=""0.12.1"" LIBPROTOBUF=""1.3.0"" WAGYU=""0.5.0 (Internal)"" TOPOLOGY RASTER (raster lib from ""2.4.5 r16765"" need upgrade) (raster procs from ""2.4.4 r16526"" need upgrade)"
As you'll notice, the raster lib is old but I can't really figure out how to upgrade it. I think this is what is causing me problems but I don't know how to overcome it.
I appreciate any thoughts.
After many failed attempts, I finally gave up on upgrading in place and solved this by:
Spinning up a new instance on the desired postgres version
Using pg_dump on the old version (schema and data)
Using pg_restore on the new version
I'm not sure if I did something wrong with the above, but I found my sequences were out of sync on a number of tables afterwards, so I wrote some scripts to reset the sequence values. I had to use something like this to re-sync each sequence:
SELECT setval('the_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);
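A rough sketch of what such a reset script could look like in Python with psycopg2 (the connection string is an assumption, and the catalog query that finds sequence-backed columns is not from the original post):

import psycopg2

# Assumed connection details; adjust for your instance.
conn = psycopg2.connect('dbname=XXXXX user=postgres host=localhost')

with conn, conn.cursor() as cur:
    # Find every column whose default draws from a sequence.
    cur.execute("""
        SELECT pg_get_serial_sequence(quote_ident(table_schema) || '.' || quote_ident(table_name),
                                      column_name),
               table_schema, table_name, column_name
        FROM information_schema.columns
        WHERE column_default LIKE 'nextval%'
    """)
    for seq, schema, table, column in cur.fetchall():
        if seq is None:
            continue
        # Reset the sequence so the next value is MAX(column) + 1 (or 1 for an empty table).
        # Identifiers come straight from the catalog and are not quoted here; fine for a sketch.
        cur.execute(
            f'SELECT setval(%s, COALESCE((SELECT MAX({column}) FROM {schema}.{table}), 0) + 1, false)',
            (seq,),
        )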
I wasted enough time and this got me past the issue. Hopefully the next upgrade doesn't give me this much trouble.

Hikari NoSuchMethodError on AWS EMR/Spark

I am trying to upgrade EMR from 5.13 to 5.35 using spark-2.4.8. The jar I'm trying to use has a dependency on HikariCP:4.0.3, which is called to set the db pool config setKeepaliveTime. While I can run my job fine on my local machine, it bombs out on EMR-5.35 with the following error:
java.lang.NoSuchMethodError: com.zaxxer.hikari.HikariConfig.setKeepaliveTime(J)
The problem is that, at runtime, HikariConfig is being loaded from file:/usr/lib/spark/jars/HikariCP-java7-2.4.12.jar instead of from the dependency provided in my custom/fat jar. The workaround right now is to remove that jar, but is there an elegant way to know where that jar is coming from on EMR, and how could we remove it on start-up?
Just in case anyone else faces this: the fix was shading (the process of renaming packages inside the uber jar). I basically had to make sure the dependency I use doesn't get overridden by the stale one in EMR-5.35.0. It looked something like the below:
assembly / assemblyShadeRules := Seq(
  ShadeRule
    .rename("com.zaxxer.hikari.**" -> "x_hikari_conf.@1")
    .inLibrary("x" % "y" % z)
    .inProject
)
And that was pretty much it: after the above lines were put in and the new jar was created, it worked like a charm.
More on shading can be read here.

Which config file to use for each GG example

Which spring-????-config.xml should I use to start GG nodes so that the .NET example GridClientApiExample works?
Each GridGain example provides a short description of how to run remote nodes in the example documentation.
Usually there are two ways to run remote nodes for an example. The first and probably most convenient one is to run the corresponding *NodeStartup class from an IDE in the examples project. The name of the startup class is specified in the example documentation. The second way is to start a node with the ggstart.{sh|bat} script, passing the configuration file specified in the documentation (if available).
GridClientApiExample works only with a node started from the IDE via ClientExampleNodeStartup, and there is a reason for it. The example expects a specific task class (org.gridgain.examples.misc.client.api.ClientExampleTask) to be in the node's classpath. Since this is an example-only class, it is not present in the node classpath when running ggstart.{sh|bat}.
If for some reason you want to run a node with the command-line script for this example, you should build the examples jar file and drop it into $GRIDGAIN_HOME/libs/ext (the startup script will automatically pick up all additional libraries placed in this folder). Then you can use the same config that ClientExampleNodeStartup uses, namely examples/config/example-compute.xml.
You can use ClientExampleNodeStartup or start a node with ggstart.sh examples/config/example-compute.xml.

When to use SPARK_CLASSPATH or SparkContext.addJar

I'm using a standalone spark cluster, one master and 2 workers.
I really don't understand how to use SPARK_CLASSPATH or SparkContext.addJar wisely. I tried both, and it looks like addJar doesn't work as I believed it would.
In my case I tried to use some joda-time functions, both inside closures and outside. If I set SPARK_CLASSPATH with a path to the joda-time jar, everything works OK. But if I remove SPARK_CLASSPATH and add the following in my program:
JavaSparkContext sc = new JavaSparkContext("spark://localhost:7077", "name", "path-to-spark-home", "path-to-the-job-jar");
sc.addJar("path-to-joda-jar");
It doesn't work anymore, although in the logs I can see:
14/03/17 15:32:57 INFO SparkContext: Added JAR /home/hduser/projects/joda-time-2.1.jar at http://127.0.0.1:46388/jars/joda-time-2.1.jar with timestamp 1395066777041
and immediately after:
Caused by: java.lang.NoClassDefFoundError: org/joda/time/DateTime
at com.xxx.sparkjava1.SimpleApp.main(SimpleApp.java:57)
... 6 more
Caused by: java.lang.ClassNotFoundException: org.joda.time.DateTime
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
I used to suppose that SPARK_CLASSPATH sets the classpath for the driver part of the job, and SparkContext.addJar sets the classpath for the executors, but that no longer seems right.
Does anyone know better than I do?
SparkContext.addJar is broken in 0.9, as is the ADD_JARS environment variable. It used to work as documented in 0.8.x, and the fix is already committed to master, so it's expected in the next release. For now, you can either use the workaround described in Jira or make a patched Spark build.
See the relevant mailing list discussion: http://mail-archives.apache.org/mod_mbox/spark-user/201402.mbox/%3C5234E529519F4320A322B80FBCF5BDA6@gmail.com%3E
Jira issue: https://spark-project.atlassian.net/plugins/servlet/mobile#issue/SPARK-1089
SPARK_CLASSPATH has been deprecated since Spark 1.0. You can add jars to the classpath programmatically, in the spark-defaults.conf file, or with spark-submit flags.
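A minimal sketch of the programmatic route on a recent Spark version (PySpark is used here for illustration; the jar path is the one from the question):

from pyspark.sql import SparkSession

# spark.jars distributes the listed jars to both the driver and the executors.
spark = (
    SparkSession.builder
    .master('spark://localhost:7077')
    .appName('joda-example')
    .config('spark.jars', '/home/hduser/projects/joda-time-2.1.jar')
    .getOrCreate()
)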
Add jars to a Spark Job - spark-submit

Resources