Syntax error for Delta SQL time travel - apache-spark

I ran the example in delta doc:
SELECT * FROM delta.`/delta/events` VERSION AS OF 1
But got the following error:
mismatched input 'AS' expecting {<EOF>, ';'}(line 3, pos 44)
Does anyone know the correct syntax?
Spark version: 3.1.2
Delta version: 1.0.0
Spark is configured as follows:
spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog

This syntax is not supported in the open-source version right now, because it requires changes in Spark (those changes have already been committed). Specifically, this is a documentation bug: the example was copied from the Databricks Delta documentation. The issue has already been reported and the documentation will be fixed in the next major release.
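Until the SQL syntax is supported in the open-source build, the same read can be done through the DataFrame reader, which Delta 1.0.0 does support. A minimal sketch, assuming the same /delta/events path from the question and an existing SparkSession configured with the Delta extension:

```python
# Workaround: time travel via the DataFrame reader instead of SQL.
# `spark` is assumed to be an existing SparkSession configured as in
# the question (requires a running Spark + Delta environment).
df = (spark.read
      .format("delta")
      .option("versionAsOf", 1)   # or .option("timestampAsOf", "2021-01-01")
      .load("/delta/events"))
df.show()
```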

Related

AWS RDS Postgres PostGIS upgrade problems

I have an RDS instance running Postgres 11.16. I'm trying to upgrade it to 12.11 but it's giving me errors on PostGIS. If I try a "modify" I get the following error in the precheck log file:
Upgrade could not be run on Sun Sep 18 06:05:13 2022
The instance could not be upgraded from 11.16.R1 to 12.11.R1 because of following reasons. Please take appropriate action on databases that have usages incompatible with requested major engine version upgrade and try again.
Following usages in database 'XXXXX' need to be corrected before upgrade:
-- The instance could not be upgraded because there are one or more databases with an older version of PostGIS extension or its dependent extensions (address_standardizer, address_standardizer_data_us, postgis_tiger_geocoder, postgis_topology, postgis_raster) installed. Please upgrade all installations of PostGIS and drop its dependent extensions and try again.
----------------------- END OF LOG ----------------------
First, I tried removing PostGIS so I could upgrade and then add it back afterwards, using drop extension postgis cascade;. However, the upgrade produced the same error.
Second, I tried running SELECT postgis_extensions_upgrade();. However, it gives me the following error:
NOTICE: Updating extension postgis_raster from unpackaged to 3.1.5
ERROR: function st_convexhull(raster) does not exist
CONTEXT: SQL statement "ALTER EXTENSION postgis_raster UPDATE TO "3.1.5";"
PL/pgSQL function postgis_extensions_upgrade() line 82 at EXECUTE
SQL state: 42883
Third, I tried to do a manual snapshot and upgrade the snapshot. Same results.
One additional piece of information, I ran SELECT PostGIS_Full_Version(); and this is what it returns:
"POSTGIS=""3.1.5 c60e4e3"" [EXTENSION] PGSQL=""110"" GEOS=""3.7.3-CAPI-1.11.3 b50468f"" PROJ=""Rel. 5.2.0, September 15th, 2018"" GDAL=""GDAL 2.3.1, released 2018/06/22"" LIBXML=""2.9.1"" LIBJSON=""0.12.1"" LIBPROTOBUF=""1.3.0"" WAGYU=""0.5.0 (Internal)"" TOPOLOGY RASTER (raster lib from ""2.4.5 r16765"" need upgrade) (raster procs from ""2.4.4 r16526"" need upgrade)"
As you'll notice, the raster lib is old, but I can't figure out how to upgrade it. I think this is what is causing my problems.
I appreciate any thoughts.
After many failed attempts, I finally gave up on upgrading in place and solved this by:
Spinning up a new instance on the desired postgres version
Using pg_dump on the old version (schema and data)
Using pg_restore on the new version
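The dump-and-restore steps above can be sketched as follows (the hostnames, user, and database name are hypothetical placeholders, not values from my setup):

```shell
# Dump schema and data from the old 11.16 instance in custom format.
# Hostnames, user and database name below are placeholders.
pg_dump -Fc -h old-instance.example.rds.amazonaws.com -U admin \
        -d mydb -f mydb.dump

# Restore into the freshly created instance on the target version
# (create the postgis extension there first if needed):
pg_restore -h new-instance.example.rds.amazonaws.com -U admin \
           -d mydb --no-owner mydb.dump
```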
I'm not sure whether I did something wrong in the process, but afterwards the sequences on a number of tables were out of sync, so I wrote some scripts to reset them. For each affected sequence I had to use something like this:
SELECT setval('the_sequence', (SELECT MAX(the_primary_key) FROM the_table)+1);
I wasted enough time and this got me past the issue. Hopefully the next upgrade doesn't give me this much trouble.
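If many tables are affected, the per-sequence setval call above can be generated for every serial column in one pass. A sketch using standard catalog queries and pg_get_serial_sequence (nothing here is specific to my schema; adjust the schema name as needed):

```sql
-- Reset every sequence owned by a serial column to MAX(column)+1.
-- Discovers the columns from the catalog and runs one setval each.
DO $$
DECLARE
    rec record;
BEGIN
    FOR rec IN
        SELECT quote_ident(t.relname) AS tbl,
               quote_ident(a.attname) AS col,
               pg_get_serial_sequence(quote_ident(n.nspname) || '.' ||
                                      quote_ident(t.relname),
                                      a.attname) AS seq
        FROM pg_class t
        JOIN pg_namespace n ON n.oid = t.relnamespace
        JOIN pg_attribute a ON a.attrelid = t.oid
        WHERE n.nspname = 'public'
          AND t.relkind = 'r'
          AND a.attnum > 0
          AND NOT a.attisdropped
          AND pg_get_serial_sequence(quote_ident(n.nspname) || '.' ||
                                     quote_ident(t.relname),
                                     a.attname) IS NOT NULL
    LOOP
        EXECUTE format(
            'SELECT setval(%L, COALESCE((SELECT MAX(%s) FROM %s), 0) + 1, false)',
            rec.seq, rec.col, rec.tbl);
    END LOOP;
END $$;
```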

Failure to execute cql script using cqlsh with nested SOURCE on DSE 5.1.2

I have a script I used to execute queries and DDL in DSE 4.8.10.
The script include nested use of the SOURCE command.
E.g.
1.sql
USE test;
SOURCE '2.sql'
exit;
2.sql
SELECT count(1) FROM user;
SOURCE '3.sql';
3.sql
SELECT count(1) FROM user;
When executing this script with DSE 4.8.10 it runs correctly and output
cqlsh -f 1.sql
count
--------
0
(1 rows)
count
--------
0
(1 rows)
Running the same script in DSE 5.1.2.
cqlsh -f 1.sql
count
-------
0
(1 rows)
Warnings:
Aggregation query used without partition key
2.sql:3:DSEShell instance has no attribute 'execution_profiles'
The actual issue is that the script in 3.sql is not executed.
I failed to find any useful information on the error
"DSEShell instance has no attribute 'execution_profiles'"
and I could not figure out what execution_profiles are, although they are mentioned in the Python driver docs here.
Note: I am using Python 2.7.7
Update
I did some additional investigations
With DSE 5.1.2 I switched to authenticator: AllowAllAuthenticator and authorizer: AllowAllAuthorizer, but I am still experiencing the issue
With DSE-5.1.1 it also happens
With DSE-5.0.9 it works
I failed to reproduce this in Apache Cassandra 3.11.0
Update 2: following a support ticket we filed with DataStax, we received a patch for this issue; I expect it to be fixed in an official release in the near future.
As of DSE 5.1.4, this issue is resolved as part of DSP-14494.
See the 5.1.4 release notes
DSP-14494:Always define execution_profiles in cqlsh.py.
I tested this with 5.1.4 and the issue was resolved.
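For versions that predate the fix (e.g. 5.1.2), a possible workaround is to flatten the nested SOURCE calls into one script before invoking cqlsh. A sketch using the example files above (it assumes SOURCE and exit appear at the start of a line, as in the scripts shown):

```shell
# Recreate the example scripts from the question.
printf 'USE test;\nSOURCE '\''2.sql'\''\nexit;\n' > 1.sql
printf 'SELECT count(1) FROM user;\nSOURCE '\''3.sql'\'';\n' > 2.sql
printf 'SELECT count(1) FROM user;\n' > 3.sql

# Flatten: concatenate in execution order, dropping the SOURCE and
# exit lines so nothing is nested.
grep -h -v -e '^SOURCE' -e '^exit' 1.sql 2.sql 3.sql > combined.sql

# combined.sql can now be run with a single, non-nested invocation:
# cqlsh -f combined.sql
```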

Warnings trying to read Spark 1.6.X Parquet into Spark 2.X

When attempting to load a Spark 1.6.x Parquet file into Spark 2.x I am seeing many WARN-level statements.
16/08/11 12:18:51 WARN CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:567)
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:544)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:431)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:107)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:109)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:369)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:343)
at [rest of stacktrace omitted]
I am running the 2.1.0 release and there are multitudes of these warnings. Is there any way, short of changing the logging level to ERROR, to suppress these?
It seems these warnings are the byproduct of an earlier fix, but the warning itself has not yet been removed. Here are some details from that JIRA:
https://issues.apache.org/jira/browse/SPARK-17993
I have built the code from the PR and it indeed succeeds reading the
data. I have tried doing df.count() and now I'm swarmed with
warnings like this (they are just keep getting printed endlessly in
the terminal):
Setting the logging level to ERROR is a last-ditch approach: it swallows messages we rely on for standard monitoring. Has anyone found a workaround?
For the time being, i.e. until/unless this Spark/Parquet bug is fixed, I will be adding the following to log4j.properties:
log4j.logger.org.apache.parquet=ERROR
The location is:
when running against an external Spark server: $SPARK_HOME/conf/log4j.properties
when running locally inside IntelliJ (or another IDE): src/main/resources/log4j.properties
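If editing log4j.properties is inconvenient, the same per-logger override can in principle be applied from a running PySpark session through the py4j gateway. A sketch, assuming an existing SparkSession named spark; note that spark._jvm is an internal handle, so treat this as a hack rather than a stable API:

```python
# Silence only the noisy parquet logger, leaving other WARNs intact.
# `spark` is assumed to be an existing SparkSession.
log4j = spark._jvm.org.apache.log4j
log4j.LogManager.getLogger("org.apache.parquet").setLevel(log4j.Level.ERROR)
```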

Cassandra no viable alternative at input Like

I am very new to Cassandra and am trying to use the new LIKE feature but keep getting the error
Line 1: no viable alternative at input 'LIKE'
I am using DataStax DevCenter and am following the examples at https://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html. I am using Cassandra 3.7.0, CQL 3.4.2, and DevCenter 1.6.0 Community. I have a table named zips with a text field called city that has 10,000 records, and I am simply using this CQL:
SELECT * FROM "MyTable".zips WHERE city LIKE 'M%';
Before that I added an index using
CREATE CUSTOM INDEX fn_prefix ON "MyTable".zips (city) USING 'org.apache.cassandra.index.sasi.SASIIndex';
I know that the index works because it allowed me to run this query
SELECT * FROM "Exoler".zips WHERE city='Miami';
without ALLOW FILTERING, and it returns values. Any suggestions would be great; as stated, I am very new to this.
If you use Cassandra 3.9 with DataStax DevCenter version 1.5.0 or 1.6.0, DevCenter won't support LIKE (at least on Windows); the result is only "no viable alternative at input 'LIKE'".
But it works fine if you use command prompt:
WINDOWS-Key
cmd
"%CASSANDRA_HOME%\bin\cqlsh"
I guess it is just a bug in DataStax DevCenter.
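For reference, which LIKE patterns work depends on the SASI mode chosen at index-creation time. A hedged sketch against the zips table from the question (the mode values are the standard SASI options; the index names here are made up):

```sql
-- Default PREFIX mode: supports prefix patterns such as LIKE 'M%'.
CREATE CUSTOM INDEX city_prefix_idx ON "MyTable".zips (city)
USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- CONTAINS mode: additionally supports LIKE '%iam%' and LIKE '%mi'.
CREATE CUSTOM INDEX city_contains_idx ON "MyTable".zips (city)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};

SELECT * FROM "MyTable".zips WHERE city LIKE 'M%';
```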

Is Cassandra Update IF statement where the condition is an inequality expression supported?

Is UPDATE IF [condition] where [condition] is an inequality, supported by Cassandra CQL?
I had a look over the language reference and it seems that CQL should not be able to support the inequality through its grammar. I used the language reference from here: https://cassandra.apache.org/doc/cql3/CQL.html#updateStmt
What is confusing is that the query executes successfully via the C# driver or cqlsh, but when doing the same thing from DataStax DevCenter I get an error.
I'm using a query similar to the one below:
UPDATE product
SET edit_date = [new_value]
WHERE customer_id = '4' AND code = 'AMZ-ISMDB'
IF edit_date < [new_value]
The results are as follows:
DataStax DevCenter throws an error when trying to execute the script snippet; the error complains about the inequality in the UPDATE ... IF part of the script.
There is one syntax error in the current script (see details below).
Please fix it and try again.
Line 29: no viable alternative at input '<'
If I use LINQ with the DataStax C# driver, the query is generated as above, and when executed the update is persisted if the new edit_date is after the existing edit_date, which is the expected behaviour.
Using cqlsh, the query runs successfully and the update is persisted.
The issue is most likely that DevCenter has not been updated to support the latest syntax.
Inequality conditions were added in:
CASSANDRA-6839
Looks like they missed updating the docs when that was added; I opened CASSANDRA-10752 to get that fixed.
This issue has been fixed in DevCenter version 1.6.0.
