SQL Parser Visitor + Metabase + Presto

I'm facing what seems to be a fairly easy problem, but I can't wrap my head around it well enough to find a suitable solution.
Problem:
I need to prepend the schema to the table names in my SQL statement, in a "weird" way (with the schema in double quotes).
FROM "SCHEMA".tableB tableB
LEFT JOIN "SCHEMA".tableC tableC
Context:
We host and expose a Metabase instance that connects to our Hive database and runs queries through Presto SQL.
Metabase allows customers to write SQL statements, and some customers simply don't include the schema in their statements. Today we throw an error for those queries, but I could easily retrieve the schema from the Authorization header: in our multi-tenant product the schema is the tenant id of the logged-in user. With that information in hand, I could add the schema to the customer's SQL statement and avoid the error.
Imagine that the customer typed the following statement:
SELECT tableA.*
, (tableA.valorfaturado + tableA.valorcortado) valorpedido
FROM (SELECT from_unixtime(tableB.datacorte / 1000) datacorte
, COALESCE((tableB.quantidadecortada * tableC.preco), 0) valorcortado
, COALESCE((tableB.quantidade * tableC.preco), 0) valorfaturado
, tableB.quantidadecortada
FROM tableB tableB
LEFT JOIN tableC tableC
ON tableC.numeropedido = tableB.numeropedido
AND tableC.codigoproduto = tableB.codigoproduto
AND tableC.codigofilial = tableB.codigofilial
LEFT JOIN tableD tableD
ON tableD.numero = tableB.numeropedido
WHERE (CASE
WHEN COALESCE(tableB.codigofilial, '') = '' THEN
tableD.codigofilial
ELSE
tableB.codigofilial
END) = '10'
AND from_unixtime(tableB.datacorte / 1000) BETWEEN from_iso8601_timestamp('2020-07-01T03:00:00.000Z') AND from_iso8601_timestamp('2020-08-01T02:59:59.999Z')) tableA
ORDER BY datacorte
I need to convert it into this (adding the "SCHEMA" qualifier):
SELECT tableA.*
, (tableA.valorfaturado + tableA.valorcortado) valorpedido
FROM (SELECT from_unixtime(tableB.datacorte / 1000) datacorte
, COALESCE((tableB.quantidadecortada * tableC.preco), 0) valorcortado
, COALESCE((tableB.quantidade * tableC.preco), 0) valorfaturado
, tableB.quantidadecortada
FROM "SCHEMA".tableB tableB
LEFT JOIN "SCHEMA".tableC tableC
ON tableC.numeropedido = tableB.numeropedido
AND tableC.codigoproduto = tableB.codigoproduto
AND tableC.codigofilial = tableB.codigofilial
LEFT JOIN "SCHEMA".tableD tableD
ON tableD.numero = tableB.numeropedido
WHERE (CASE
WHEN COALESCE(tableB.codigofilial, '') = '' THEN
tableD.codigofilial
ELSE
tableB.codigofilial
END) = '10'
AND from_unixtime(tableB.datacorte / 1000) BETWEEN from_iso8601_timestamp('2020-07-01T03:00:00.000Z') AND from_iso8601_timestamp('2020-08-01T02:59:59.999Z')) tableA
ORDER BY datacorte
I'm still trying to find a solution that uses only presto-parser with a Visitor + instrumentation approach.
I also know about JSQLParser and tried it, but I always come back to looking for a "plain" solution, because I'm afraid JSQLParser won't support all Presto/Hive queries, which differ a little from standard SQL.
I created a small project on GitHub with a test case to validate this:
https://github.com/genyherrera/prestosqlerror
For those who don't want to clone a repository, here are the classes and dependencies:
import java.util.Optional;
import com.facebook.presto.sql.SqlFormatter;
import com.facebook.presto.sql.parser.ParsingOptions;
import com.facebook.presto.sql.parser.SqlParser;
public class SchemaAwareQueryAdapter {
// Inspired from
// https://github.com/prestodb/presto/tree/master/presto-parser/src/test/java/com/facebook/presto/sql/parser
private static final SqlParser SQL_PARSER = new SqlParser();
public String rewriteSql(String sqlStatement, String schemaId) {
com.facebook.presto.sql.tree.Statement statement = SQL_PARSER.createStatement(sqlStatement, ParsingOptions.builder().build());
SchemaAwareQueryVisitor visitor = new SchemaAwareQueryVisitor(schemaId);
statement.accept(visitor, null);
return SqlFormatter.formatSql(statement, Optional.empty());
}
}
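import java.lang.reflect.Field;
import java.util.List;
import com.facebook.presto.sql.tree.DefaultTraversalVisitor;
import com.facebook.presto.sql.tree.QualifiedName;
import com.facebook.presto.sql.tree.Table;
public class SchemaAwareQueryVisitor extends DefaultTraversalVisitor<Void, Void> {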
private String schemaId;
public SchemaAwareQueryVisitor(String schemaId) {
super();
this.schemaId = schemaId;
}
/**
* The customer can type:
* [table name]
* [schema].[table name]
* [catalog].[schema].[table name]
*/
@Override
protected Void visitTable(Table node, Void context) {
List<String> parts = node.getName().getParts();
// [table name] -> is the only one we need to modify, so let's check by parts.size() ==1
if (parts.size() == 1) {
try {
Field privateStringField = Table.class.getDeclaredField("name");
privateStringField.setAccessible(true);
QualifiedName qualifiedName = QualifiedName.of("\""+schemaId+"\"",node.getName().getParts().get(0));
privateStringField.set(node, qualifiedName);
} catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e) {
throw new SecurityException("Unable to execute query");
}
}
return null;
}
}
import static org.testng.Assert.assertEquals;
import org.gherrera.prestosqlparser.SchemaAwareQueryAdapter;
import org.testng.annotations.Test;
public class SchemaAwareTest {
private static final String schemaId = "SCHEMA";
private SchemaAwareQueryAdapter adapter = new SchemaAwareQueryAdapter();
@Test
public void testAppendSchemaA() {
String sql = "select * from tableA";
String bound = adapter.rewriteSql(sql, schemaId);
assertEqualsFormattingStripped(bound,
"select * from \"SCHEMA\".tableA");
}
private void assertEqualsFormattingStripped(String sql1, String sql2) {
assertEquals(sql1.replace("\n", " ").toLowerCase().replace("\r", " ").replaceAll(" +", " ").trim(),
sql2.replace("\n", " ").toLowerCase().replace("\r", " ").replaceAll(" +", " ").trim());
}
}
<dependencies>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-parser</artifactId>
<version>0.229</version>
</dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.10</version>
<scope>test</scope>
</dependency>
</dependencies>
PS: I was able to add the schema without the double quotes, but then I ran into the error "identifiers must not start with a digit; surround the identifier with double quotes". This error comes from the SqlParser$PostProcessor.exitDigitIdentifier(...) method.
Thanks

I found a solution for my case; either way, I'll share my findings on the Presto Slack to check whether this is expected behavior.
If you want the schema to be emitted in double quotes, create your own Visitor class and override the visitTable method. When you qualify the table name with the schema (here's the trick), pass the schema in UPPERCASE: it then fails to match the regex pattern used by the formatName method of the SqlFormatter class, so the formatter adds the double quotes.
public class SchemaAwareQueryVisitor extends DefaultTraversalVisitor<Void, Void> {
private String schemaId;
public SchemaAwareQueryVisitor(String schemaId) {
super();
this.schemaId = schemaId;
}
@Override
protected Void visitTable(Table node, Void context) {
try {
Field privateStringField = Table.class.getDeclaredField("name");
privateStringField.setAccessible(true);
QualifiedName qualifiedName = QualifiedName.of(schemaId, node.getName().getParts().get(0));
privateStringField.set(node, qualifiedName);
} catch (NoSuchFieldException
| SecurityException
| IllegalArgumentException
| IllegalAccessException e) {
throw new SecurityException("Unable to execute query");
}
return null;
}
}
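To illustrate why the uppercase trick works, here is a minimal standalone sketch of the quoting rule described above. It does not use presto-parser: the class name NameQuotingSketch and the exact pattern are assumptions made for illustration (the real pattern lives in SqlFormatter and may differ between Presto versions), so check the source of the version you depend on.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the formatter's name-quoting step; not Presto code.
class NameQuotingSketch {
    // Assumed pattern: only plain lowercase identifiers are emitted unquoted.
    private static final Pattern PLAIN_NAME = Pattern.compile("[a-z_][a-z0-9_]*");

    // Returns the name unchanged when it matches the pattern; otherwise wraps
    // it in double quotes, doubling any embedded double quote.
    static String formatName(String name) {
        if (PLAIN_NAME.matcher(name).matches()) {
            return name;
        }
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(formatName("tablea"));    // plain lowercase name
        System.out.println(formatName("SCHEMA"));    // uppercase schema
        System.out.println(formatName("123tenant")); // starts with a digit
    }
}
```

Under this assumed rule, an uppercase schema or a tenant id that starts with a digit falls outside the pattern and gets quoted, whereas an all-lowercase schema would be printed bare.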

Related

Is it possible to connect to a Java jOOQ DB?

I discovered an interesting new service and I'm trying to understand how it works. Could you explain how to connect to my jOOQ database from another program?
MockDataProvider provider = new MyProvider();
MockConnection connection = new MockConnection(provider);
DSLContext create = DSL.using(connection, SQLDialect.H2);
Field<Integer> id = field(name("BOOK", "ID"), SQLDataType.INTEGER);
Field<String> book = field(name("BOOK", "NAME"), SQLDataType.VARCHAR);
So I have created the connection, but how can I connect to it?
Here I have added your code, Lukas:
try (Statement s = connection.createStatement();
ResultSet rs = s.executeQuery("SELECT ...")
) {
while (rs.next())
System.out.println(rs.getString(1));
}
This example was found here
https://www.jooq.org/doc/3.7/manual/tools/jdbc-mocking/
public class MyProvider implements MockDataProvider {
@Override
public MockResult[] execute(MockExecuteContext ctx) throws SQLException {
// You might need a DSLContext to create org.jooq.Result and org.jooq.Record objects
//DSLContext create = DSL.using(SQLDialect.ORACLE);
DSLContext create = DSL.using(SQLDialect.H2);
MockResult[] mock = new MockResult[1];
// The execute context contains SQL string(s), bind values, and other meta-data
String sql = ctx.sql();
// Dynamic field creation
Field<Integer> id = field(name("AUTHOR", "ID"), SQLDataType.INTEGER);
Field<String> lastName = field(name("AUTHOR", "LAST_NAME"), SQLDataType.VARCHAR);
// Exceptions are propagated through the JDBC and jOOQ APIs
if (sql.toUpperCase().startsWith("DROP")) {
throw new SQLException("Statement not supported: " + sql);
}
// You decide, whether any given statement returns results, and how many
else if (sql.toUpperCase().startsWith("SELECT")) {
// Always return one record
Result<Record2<Integer, String>> result = create.newResult(id, lastName);
result.add(create
.newRecord(id, lastName)
.values(1, "Orwell"));
mock[0] = new MockResult(1, result);
}
// You can detect batch statements easily
else if (ctx.batch()) {
// [...]
}
return mock;
}
}
I'm not sure what lines 3-5 of your example are supposed to do, but if you implement your MockDataProvider and put that into a MockConnection, you just use that like any other JDBC connection, e.g.
try (Statement s = connection.createStatement();
ResultSet rs = s.executeQuery("SELECT ...")
) {
while (rs.next())
System.out.println(rs.getString(1));
}

jooq: Add interval to timestamp postgres

I'm trying to bump a timestamptz value further into the future by an interval given as a number of seconds. Is there a way to massage these types so that jOOQ will let me do this in one statement, or do I need to fetch the TriggerRecord and do the calculation in Java code?
Code and attempt follows:
public final TableField<TriggerRecord, Instant> PAUSED_UNTIL = createField(DSL.name("paused_until"), SQLDataType.TIMESTAMPWITHTIMEZONE(6), this, "", new OffsetDateTimeInstantConverter());
public class OffsetDateTimeInstantConverter implements Converter<OffsetDateTime, Instant> {
private static Instant min;
public OffsetDateTimeInstantConverter() {
}
public Instant from(OffsetDateTime databaseObject) {
return databaseObject == null ? null : databaseObject.toInstant();
}
public OffsetDateTime to(Instant userObject) {
if (userObject == null) {
return null;
} else {
return userObject.isBefore(min) ? OffsetDateTime.MIN : userObject.atOffset(ZoneOffset.UTC);
}
}
public Class<OffsetDateTime> fromType() {
return OffsetDateTime.class;
}
public Class<Instant> toType() {
return Instant.class;
}
static {
min = OffsetDateTime.MIN.toInstant();
}
}
In one case it errors out:
final Long ps = 360L;
query = using(configuration)
.update(TRIGGER)
.set(TRIGGER.ACTIVE, active)
.set(TRIGGER.PAUSED_UNTIL,
TRIGGER.PAUSED_UNTIL.add(ps))
.returning()
.fetch();
ERROR: operator does not exist: timestamp with time zone + timestamp with time zone
Another attempt fails with a different error:
final var query = using(configuration)
.update(TRIGGER)
.set(TRIGGER.ACTIVE, active)
.set(TRIGGER.PAUSED_UNTIL,
TRIGGER.PAUSED_UNTIL
.add(val(DayToSecond.valueOf(Duration.ofSeconds(ps)))))
org.jooq.exception.DataTypeException: Cannot convert from +0 00:06:00.000000000 (class org.jooq.types.DayToSecond) to class java.time.OffsetDateTime
update trigger set "paused_until" = ("alert"."trigger"."paused_until" + cast(? as timestamp(6) with time zone))
This looks like bug #12036, which has been fixed in jOOQ 3.17.0, 3.16.4, and 3.15.8. The workaround is to use plain SQL templating for this particular expression.
DSL.field("{0} + {1}",
TRIGGER.PAUSED_UNTIL.getDataType(),
TRIGGER.PAUSED_UNTIL, ps
);
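For comparison, the Java-side fallback mentioned in the question (read the current value, compute the new one in application code, write it back) could look roughly like the sketch below. It uses only java.time; the class and method names are hypothetical, and the jOOQ fetch and update calls around it are deliberately left out.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper; only the date arithmetic is shown, not the jOOQ calls.
class PauseBumpSketch {
    // Moves the given instant forward by the given number of seconds.
    static Instant bump(Instant pausedUntil, long seconds) {
        return pausedUntil.plusSeconds(seconds);
    }

    // Converts an Instant to an OffsetDateTime at UTC, mirroring what the
    // converter in the question does for values inside the supported range.
    static OffsetDateTime toUtc(Instant instant) {
        return instant.atOffset(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Instant current = Instant.parse("2020-07-01T03:00:00Z");
        Instant bumped = bump(current, 360L);
        System.out.println(toUtc(bumped));
    }
}
```

The trade-off is an extra round trip and a read-modify-write that is not atomic unless it runs inside a suitably isolated transaction, which is why the single-statement form is usually preferable when it works.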

How to prevent Azure table injection?

Is there a general way to prevent Azure storage injection?
If the query contains a user-entered string, for example the user's name, then an injection such as jan + ' or PartitionKey eq 'kees is possible.
This will end up returning the object jan as well as an object with the PartitionKey kees.
One option is URL encoding. In that case ' and " are encoded, and the above injection is no longer possible.
Is this the best option or are there better ones?
In my experience, there are two general ways to prevent Azure storage table injection.
The first is to replace the ' character with another string such as ;, ", or the URL-encoded form of '. This is the option you mention.
The other is to store the table key in an encoded format (such as Base64) instead of plain content.
This is my test Java program:
import org.apache.commons.codec.binary.Base64;
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.table.CloudTable;
import com.microsoft.azure.storage.table.CloudTableClient;
import com.microsoft.azure.storage.table.TableOperation;
import com.microsoft.azure.storage.table.TableQuery;
import com.microsoft.azure.storage.table.TableQuery.QueryComparisons;
public class TableInjectTest {
private static final String storageConnectString = "DefaultEndpointsProtocol=http;" + "AccountName=<ACCOUNT_NAME>;"
+ "AccountKey=<ACCOUNT_KEY>";
public static void reproduce(String query) {
try {
CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConnectString);
CloudTableClient tableClient = storageAccount.createCloudTableClient();
// Create table if not exist.
String tableName = "people";
CloudTable cloudTable = new CloudTable(tableName, tableClient);
final String PARTITION_KEY = "PartitionKey";
String partitionFilter = TableQuery.generateFilterCondition(PARTITION_KEY, QueryComparisons.EQUAL, query);
System.out.println(partitionFilter);
TableQuery<CustomerEntity> rangeQuery = TableQuery.from(CustomerEntity.class).where(partitionFilter);
for (CustomerEntity entity : cloudTable.execute(rangeQuery)) {
System.out.println(entity.getPartitionKey() + " " + entity.getRowKey() + "\t" + entity.getEmail() + "\t"
+ entity.getPhoneNumber());
}
} catch (Exception e) {
e.printStackTrace();
}
}
/*
* The one way is replace ' with other symbol string
*/
public static String preventByReplace(String query, String symbol) {
return query.replaceAll("'", symbol);
}
public static void addEntityByBase64PartitionKey() {
try {
CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConnectString);
CloudTableClient tableClient = storageAccount.createCloudTableClient();
// Create table if not exist.
String tableName = "people";
CloudTable cloudTable = new CloudTable(tableName, tableClient);
String partitionKey = Base64.encodeBase64String("Smith".getBytes());
CustomerEntity customer = new CustomerEntity(partitionKey, "Will");
customer.setEmail("will-smith@contoso.com");
customer.setPhoneNumber("400800600");
TableOperation insertCustomer = TableOperation.insertOrReplace(customer);
cloudTable.execute(insertCustomer);
} catch (Exception e) {
e.printStackTrace();
}
}
// The other way is store PartitionKey using encoding format such as Base64
public static String preventByEncodeBase64(String query) {
return Base64.encodeBase64String(query.getBytes());
}
public static void main(String[] args) {
String queryNormal = "Smith";
reproduce(queryNormal);
/*
* Output as follows:
* PartitionKey eq 'Smith'
* Smith Ben Ben@contoso.com 425-555-0102
* Smith Denise Denise@contoso.com 425-555-0103
* Smith Jeff Jeff@contoso.com 425-555-0105
*/
String queryInjection = "Smith' or PartitionKey lt 'Z";
reproduce(queryInjection);
/*
* Output as follows:
* PartitionKey eq 'Smith' or PartitionKey lt 'Z'
* Webber Peter Peter@contoso.com 425-555-0101 <= This is my information
* Smith Ben Ben@contoso.com 425-555-0102
* Smith Denise Denise@contoso.com 425-555-0103
* Smith Jeff Jeff@contoso.com 425-555-0105
*/
reproduce(preventByReplace(queryNormal, "\"")); // Same result as queryNormal
reproduce(preventByReplace(queryInjection, "\"")); // No results, because the filter becomes: PartitionKey eq 'Smith" or PartitionKey lt "Z'
reproduce(preventByReplace(queryNormal, "&")); // Same result as queryNormal
reproduce(preventByReplace(queryInjection, "&")); // No results, because the filter becomes: PartitionKey eq 'Smith& or PartitionKey lt &Z'
/*
* The second prevent way
*/
addEntityByBase64PartitionKey(); // Will Smith
reproduce(preventByEncodeBase64(queryNormal));
/*
* Output as follows:
* PartitionKey eq 'U21pdGg='
* U21pdGg= Will will-smith@contoso.com 400800600 <= The Base64 string can be decoded to "Smith"
*/
reproduce(preventByEncodeBase64(queryInjection)); // No results
/*
* Output as follows:
* PartitionKey eq 'U21pdGgnIG9yIFBhcnRpdGlvbktleSBsdCAnWg=='
*/
}
}
I think the best option is to choose whichever prevention technique suits the application's context.
If you have any concerns, please feel free to let me know.
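As a compact illustration of the second approach, the sketch below builds the filter string from an encoded key using only the JDK, without the Azure SDK or commons-codec. The class and method names are hypothetical. It uses the URL-safe Base64 alphabet, which is a deliberate deviation from the program above: the standard alphabet can produce '/', a character that is not allowed in PartitionKey values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; it only builds strings and never calls Azure.
class EncodedKeySketch {
    // Encodes a raw key so that it contains no quote characters.
    static String encodeKey(String raw) {
        return Base64.getUrlEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Recovers the original key from its encoded form.
    static String decodeKey(String encoded) {
        return new String(Base64.getUrlDecoder().decode(encoded),
                StandardCharsets.UTF_8);
    }

    // Builds an equality filter on PartitionKey from untrusted input.
    static String partitionFilter(String userInput) {
        return "PartitionKey eq '" + encodeKey(userInput) + "'";
    }

    public static void main(String[] args) {
        System.out.println(partitionFilter("Smith"));
        System.out.println(partitionFilter("Smith' or PartitionKey lt 'Z"));
    }
}
```

Because the encoded value can never contain a single quote, user input cannot close the string literal in the filter. The cost is that keys are no longer human-readable and range queries on the original values stop being meaningful.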

Is it possible to do paging with JoinSqlBuilder?

I have a pretty normal join that I create via JoinSqlBuilder
var joinSqlBuilder = new JoinSqlBuilder<ProductWithManufacturer, Product>()
.Join<Product, Manufacturer>(sourceColumn: p => p.ManufacturerId,
destinationColumn: mf => mf.Id,
sourceTableColumnSelection: p => new { ProductId = p.Id, ProductName = p.Name },
destinationTableColumnSelection: m => new { ManufacturerId = m.Id, ManufacturerName = m.Name })
Of course, the join created by this could potentially return a lot of rows, so I want to use paging, preferably on the server side. However, I cannot find anything in JoinSqlBuilder that would let me do this. Am I missing something, or does JoinSqlBuilder not support this (yet)?
If you aren't using MS SQL Server I think the following will work.
var sql = joinSqlBuilder.ToSql();
var data = this.Select<ProductWithManufacturer>(
q => q.Select(sql)
.Limit(skip,rows)
);
If you are working with MS SQL Server, it will most likely blow up on you. I am working to merge a more elegant solution similar to this into JoinSqlBuilder. The following is a quick and dirty method to accomplish what you want.
I created the following extension class:
public static class Extension
{
private static string ToSqlWithPaging<TResult, TTarget>(
this JoinSqlBuilder<TResult, TTarget> bldr,
string orderColumnName,
int limit,
int skip)
{
var sql = bldr.ToSql();
return string.Format(@"
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [{0}]) As RowNum, *
FROM (
{1}
)as InnerResult
)as RowConstrainedResult
WHERE RowNum > {2} AND RowNum <= {3}
", orderColumnName, sql, skip, skip + limit);
}
public static string ToSqlWithPaging<TResult, TTarget>(
this JoinSqlBuilder<TResult, TTarget> bldr,
Expression<Func<TResult, object>> orderSelector,
int limit,
int skip)
{
var member = orderSelector.Body as MemberExpression;
if (member == null)
throw new ArgumentException(
"TResult selector refers to a non member."
);
var propInfo = member.Member as PropertyInfo;
if (propInfo == null)
throw new ArgumentException(
"TResult selector refers to a field, it must be a property."
);
var orderSelectorName = propInfo.Name;
return ToSqlWithPaging(bldr, orderSelectorName, limit, skip);
}
}
It is applied as follows:
List<Entity> GetAllEntities(int limit, int skip)
{
var bldr = GetJoinSqlBuilderFor<Entity>();
var sql = bldr.ToSqlWithPaging(
entity => entity.Id,
limit,
skip);
return this.Db.Select<Entity>(sql);
}
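The ROW_NUMBER() wrapping itself is independent of OrmLite, so the window arithmetic can be shown in isolation. Below is a minimal sketch, written in Java rather than C#, with hypothetical names; the column-name check is an added assumption that keeps arbitrary text out of the ORDER BY clause.

```java
// Hypothetical helper that wraps an inner query in a ROW_NUMBER() window.
class PagingSqlSketch {
    static String withPaging(String innerSql, String orderColumn, int limit, int skip) {
        // Assumption: order columns are simple identifiers.
        if (!orderColumn.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid column name: " + orderColumn);
        }
        if (limit < 0 || skip < 0) {
            throw new IllegalArgumentException("limit and skip must be non-negative");
        }
        long upper = (long) skip + limit; // computed as long to avoid int overflow
        return "SELECT * FROM (\n"
             + "  SELECT ROW_NUMBER() OVER (ORDER BY [" + orderColumn + "]) AS RowNum, *\n"
             + "  FROM (\n" + innerSql + "\n  ) AS InnerResult\n"
             + ") AS RowConstrainedResult\n"
             + "WHERE RowNum > " + skip + " AND RowNum <= " + upper;
    }

    public static void main(String[] args) {
        // Third page of ten rows: row numbers 21 through 30.
        System.out.println(withPaging("SELECT 1 AS Id", "Id", 10, 20));
    }
}
```

ROW_NUMBER() starts at 1, so a skip of 20 with a limit of 10 selects rows 21 through 30, which is what the "RowNum > skip AND RowNum <= skip + limit" predicate expresses.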

ServiceStack.OrmLite Select<> throws npgsql syntax error when using WITH CTE

From the error I thought this was an issue with Npgsql (see closed issue), however the error is with OrmLite Select<> as it's changing the executed sql.
Question:
Other than not using the WITH CTE, is there another way around this error in OrmLite?
Is db.Select<> the wrong command to be using?
Note: WITH CTE works with OrmLite.Scalar
Postgres WITH CTE: http://www.postgresql.org/docs/current/static/queries-with.html
UPDATE: The issue seems to be in how OrmLite prepares the SQL statement: because the statement does not start with "SELECT", OrmLite treats the SQL as a "WHERE" parameter.
[Test]
public void with_cte_ormlite_obj()
{
using (var db = DbConnection)
{
var sql = "WITH w_cnt AS (SELECT 5 AS cnt, 'me' AS name) SELECT cnt, name FROM w_cnt";
// An exception of type 'Npgsql.NpgsqlException' occurred in Npgsql.dll
// ERROR: 42601: syntax error at or near "WITH w_cnt"
// Actual Exec Sql:
// SELECT "cnt", "name" FROM "my_with_cte_obj" WHERE WITH w_cnt AS (SELECT 5 AS cnt, 'me' AS name) SELECT cnt, name FROM w_cnt
var cnt = db.Select<MyWithCteObj>(sql);
var first = cnt.First();
Assert.AreEqual(5, first.Cnt);
Assert.AreEqual("me", first.Name);
}
}
public class MyWithCteObj
{
public int Cnt { get; set; }
public string Name { get; set; }
}
The db.Select<T>() API should only be used for SQL SELECT statements.
The db.SqlList<T>() API should be used for non-SELECT queries, e.g.:
using (var db = DbConnection)
{
var cnt = db.SqlList<MyWithCteObj>(
"WITH w_cnt AS (SELECT 5 AS cnt, 'me' AS name) SELECT cnt, name FROM w_cnt");
}
See the docs for more examples of the custom SQL APIs.
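The behavior reported in the question's update can be pictured with a simplified model. This is not OrmLite's code: the names are hypothetical and the sketch is written in Java. It only reproduces the observed effect, where text that does not begin with SELECT is appended as a WHERE fragment to a generated SELECT.

```java
import java.util.Locale;

// Simplified, hypothetical model of the observed behavior; not OrmLite code.
class SelectExpansionSketch {
    // Returns the SQL unchanged when it starts with SELECT; otherwise treats
    // it as a filter and appends it to the generated SELECT prefix.
    static String expand(String sql, String generatedSelect) {
        String trimmed = sql.trim();
        if (trimmed.toUpperCase(Locale.ROOT).startsWith("SELECT")) {
            return trimmed;
        }
        return generatedSelect + " WHERE " + trimmed;
    }

    public static void main(String[] args) {
        String prefix = "SELECT \"cnt\", \"name\" FROM \"my_with_cte_obj\"";
        String cte = "WITH w_cnt AS (SELECT 5 AS cnt, 'me' AS name) SELECT cnt, name FROM w_cnt";
        // Prints the malformed statement shown in the question's test comments.
        System.out.println(expand(cte, prefix));
    }
}
```

Seen this way, the syntax error follows directly: a statement beginning with WITH lands after WHERE, which is why an API that passes the SQL through untouched is the right fit for CTEs.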
