Variadic function and parameterized safe queries - node.js

Using node-pg and Postgres 11.
I have a variadic function in Postgres:
CALL schema.function(('1'),('2'))
In order to prevent SQL injection I need to do something like
await client.query('CALL schema.function($1::smallint);', XXX);
Where XXX is what the call uses for substitutions.
The problem is that my function expects a list of records like ('1'),('2'), not an array.
Has anyone else encountered this?
Stored procedure:
CREATE OR REPLACE PROCEDURE update_events_variadic(
VARIADIC _events_array event_array[]
)
LANGUAGE plpgsql AS
$$
DECLARE
EVENT_RECORD RECORD;
BEGIN
FOR EVENT_RECORD IN
SELECT
metadata,
payload
FROM unnest(_events_array) as t(
metadata,
payload
)
LOOP
INSERT INTO events
(
metadata,
payload
)
VALUES
(
EVENT_RECORD.metadata,
EVENT_RECORD.payload
);
END LOOP;
END;
$$;

If that's the only type of call you expect, variadic is just syntax noise. You can redefine your procedure to accept a regular array of a type that fits the events table you're trying to insert into. Also, you don't have to loop over a SELECT from unnest() - there's a FOREACH loop that lets you iterate over the array directly.
CREATE OR REPLACE PROCEDURE update_events ( events_array events[] )
LANGUAGE plpgsql AS $$
DECLARE
event_record events;
BEGIN
FOREACH event_record IN ARRAY events_array LOOP
INSERT INTO events
( metadata,
payload
)
VALUES
( event_record.metadata,
event_record.payload
);
END LOOP;
END; $$;
You can also insert directly from a SELECT without any loop, gaining some performance from that and from switching to plain LANGUAGE SQL, which doesn't have the PL/pgSQL overhead:
CREATE OR REPLACE PROCEDURE update_events ( events_array events[] )
LANGUAGE SQL AS $$
INSERT INTO events
( metadata,
payload
)
select
event_record.metadata,
event_record.payload
from unnest(events_array)
as event_record
( metadata,
payload
)
$$;
You can either cast each element in the array, or cast the whole array at once:
await client.query('CALL schema.function( ARRAY[$1]::events[] );', XXX);
If you happen to get a malformed record literal error after these changes, it means that some or all of what you're supplying as XXX isn't a valid set of values that can make up an events record.
I'm naively assuming your events table has only those two columns - if it has more, you'll have to define an intermediate type or table and use that as your array type in the procedure definition and call argument cast.
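On the caller side, one way to build XXX safely from node-pg is to give every field its own placeholder, flatten the rows into a single values array, and cast the whole ARRAY constructor to events[]. This is a hedged sketch: buildCall and the two-column rows are illustrative assumptions, not part of node-pg itself.

```javascript
// Illustrative helper (not a node-pg API): builds "($1, $2), ($3, $4), ..."
// with one placeholder per field, so node-pg substitutes every value safely.
function buildCall(rows) {
  const placeholders = rows
    .map((_, i) => `($${2 * i + 1}, $${2 * i + 2})`)
    .join(', ');
  return {
    text: `CALL update_events(ARRAY[${placeholders}]::events[]);`,
    values: rows.flat(), // parameter list in placeholder order
  };
}

const { text, values } = buildCall([
  ['metadata1', 'payload1'],
  ['metadata2', 'payload2'],
]);
// text is "CALL update_events(ARRAY[($1, $2), ($3, $4)]::events[]);"
// values is ['metadata1', 'payload1', 'metadata2', 'payload2']
// await client.query(text, values);
```

Because the SQL text contains only placeholders and the cast, no user data is ever interpolated into the command string.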
In this case variadic would be useful if you wanted to be able to use both types of calls below:
call update_events( --not an array in a strict sense, just a bunch of arguments
('metadata1','payload1')::events,
('metadata2','payload2')::events,
('metadata3','payload3')::events);
The above isn't passing an array in a strict sense, just a bunch of arguments that the function will internally collect and make available as a single, actual array.
call update_events( VARIADIC
ARRAY[('metadata1','payload1'),
('metadata2','payload2'),
('metadata3','payload3')
]::events[] );
VARIADIC informs the function that nothing has to be collected into an array as the argument list is already provided as one.

Related

Alias for RETURNING result in jooq

How can I denote the following PostgreSQL syntax with jooq?
WITH main AS
(DELETE FROM maintable WHERE id = 1 RETURNING name)
INSERT INTO subtable (name) VALUES (main.name)
jOOQ's as function expects a Select type as its argument, but the returning function returns a DeleteResultStep type?
It seems this is not yet supported; it is discussed in https://github.com/jOOQ/jOOQ/issues/4474

exec StoredProcedure passing Array of Integers?

I have a stored procedure which has .. WHERE something IN ? .....
I can't find any documentation on how to call this procedure using "exec"
I tried all combinations
exec bestThumbs ([[324622 ,321235]]);
Invalid parameter count for procedure: bestThumbs (expected: 1, received: 2)
exec bestThumbs [324622,321235,3454345];
Invalid parameter count for procedure: bestThumbs (expected: 1, received: 3)
exec bestThumbs [[324622 ,321235, 3454345]];
Invalid parameter count for procedure: bestThumbs (expected: 1, received: 3)
Furthermore, trying to do the same in PHP via the JSON interface:
$a = array([163195,163199,163196]);
$params = json_encode($a);
$params = urlencode($params);
$querystring = "Procedure=$proc&Parameters=$params";
returns: VOLTDB ERROR: PROCEDURE bestThumbs TYPE ERROR FOR PARAMETER 0: org.voltdb.VoltTypeException: tryScalarMakeCompatible: Unable to match parameter array:int to provided long
What is the proper way of doing this ?
Thanks !
VoltDB's sqlcmd interface and PHP client library do not support array parameters. Some of the other client libraries do.
If you are using a Java procedure, you can format the array of numbers as a string, then split the string and parse the values within the procedure to build an int[] or long[] to pass into voltQueueSQL() when calling the SQLStmt.
However, if your procedure's only input was intended to be an array of integers, keep in mind that a concatenated String parameter as suggested would not allow the procedure to be partitioned. Even if you were using a client library such as Python or Java that supports array parameters, a procedure cannot be partitioned on a parameter that is an array, so it would have to be a multi-partition procedure that runs in all of the partitions. It would be more scalable to have a procedure that takes a single parameter value and is partitioned on that value, so it runs in only one partition. If the client has an array of values to evaluate, you could iterate through the array and make a separate call to the procedure for each one, each executed on only one partition.
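The per-value fan-out described above can be sketched generically. Here callProcedure is a hypothetical stand-in for whatever single-call method your client library exposes, not an actual VoltDB API:

```javascript
// Generic fan-out sketch: one single-partition call per id instead of
// one multi-partition call with an array parameter.
// `callProcedure(name, param)` is a hypothetical stand-in for your client's API.
async function callPerValue(callProcedure, ids) {
  // Each call can be routed to a single partition based on its id.
  return Promise.all(ids.map((id) => callProcedure('bestThumbs', id)));
}
```

The calls are issued concurrently, so the client-side latency stays close to that of a single round trip while each invocation remains single-partition.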
I work at VoltDB.

How can I define multiple input file patterns in USQL?

I have U-SQL script where I need to process some data. The data is stored in blob, with ~100 files per day in this folder structure: /{year}/{month}/{day}/{hour}/filenames.tsv
Getting one day of data is easy, just put a wildcard in the end and it will pick out all the files for all the hours for the day.
However, in my script I want to read out the current day and the last 2 hours of the previous day. The naive way is with 3 extract statements in this way:
DECLARE @input1 = @"/data/2017/10/08/22/{*}.tsv";
DECLARE @input2 = @"/data/2017/10/08/23/{*}.tsv";
DECLARE @input3 = @"/data/2017/10/09/{*}.tsv";
@x1 = EXTRACT .... FROM @input1 USING Extractors.Tsv();
@x2 = EXTRACT .... FROM @input2 USING Extractors.Tsv();
@x3 = EXTRACT .... FROM @input3 USING Extractors.Tsv();
But in my case each EXTRACT line is very long and complicated (~50 columns) using the AvroExtractor, so I would really prefer to specify the columns and extractor only once instead of 3 times. Also, by having 3 inputs it's not possible from the caller side to decide how many hours from the previous day should be read.
My question is how can I define this in a convenient way, ideally using only one extract statement?
You could wrap your logic up into a U-SQL stored procedure so it is encapsulated. Then you need only make a few calls to the proc. A simple example:
CREATE PROCEDURE IF NOT EXISTS main.getContent(@inputPath string, @outputPath string)
AS
BEGIN
@output =
EXTRACT
...
FROM @inputPath
USING Extractors.Tsv();
OUTPUT @output
TO @outputPath
USING Outputters.Tsv();
END;
Then to call it (untested):
main.getContent (
@"/data/2017/10/08/22/{*}.tsv",
@"/output/output1.tsv"
);
main.getContent (
@"/data/2017/10/08/23/{*}.tsv",
@"/output/output2.tsv"
);
main.getContent (
@"/data/2017/10/09/{*}.tsv",
@"/output/output3.tsv"
);
That might be one way to go about it?

String concatenation within an exception

In my trigger procedures I use RAISE EXCEPTION for messages. I have no problem with simple messages, but if I want to give the user some more complex feedback, I face a problem: the concatenation operator doesn't work within a RAISE EXCEPTION statement.
First, I tried this:
CREATE OR REPLACE FUNCTION hlidej_datum_kon() RETURNS trigger AS $$
DECLARE
od date;
BEGIN
SELECT a.datum_od FROM akce AS a WHERE a.kod_akce = (
SELECT b.kod_akce FROM sj AS b WHERE b.kod_sj = NEW.kod_sj
) INTO od;
IF NEW.datum < od THEN
RAISE EXCEPTION 'Kontext nemohl být odkryt před začátkem akce ('||TO_CHAR(od)||')!'
ROLLBACK;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Didn't work. So I tried to put the whole text into a text variable, but I couldn't find how to put the variable's contents into the exception statement so that it would be printed as a message.
My question is: how to print a message containing variables in a PostgreSQL trigger function?
Just for sake of completeness, here is my trigger:
CREATE TRIGGER hlidej_datum_kon
AFTER INSERT OR UPDATE ON kontext
FOR EACH ROW
EXECUTE PROCEDURE hlidej_datum_kon();
END;
You don't need to use concatenation. You can use placeholders instead:
RAISE EXCEPTION 'Kontext nemohl být odkryt před začátkem akce (%)!', od;
There are two bugs:
The first parameter of the RAISE statement is a format string, and this string should be a constant. It can contain '%' substitution symbols, and the values for these symbols are passed as the other parameters of the RAISE statement.
The ROLLBACK statement should not be used there. RAISE EXCEPTION throws an exception, so the ROLLBACK statement is never executed. You cannot control transactions explicitly in PL/pgSQL, so you can never use ROLLBACK or COMMIT statements in plpgsql. You can use exception trapping instead:
BEGIN
RAISE EXCEPTION 'blabla';
EXCEPTION WHEN raise_exception THEN
-- do something, or do nothing
END;

Replace empty strings with null values

I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL, and typecast some columns as well. I read through some of the posts and I could not find a query, which would let me do it across all the columns in a single query, without using multiple statements.
Let me know if it is possible for me to iterate across all columns and replace cells with empty strings with null.
Ref: How to convert empty spaces into null values, using SQL Server?
To my knowledge there is no built-in function to replace empty strings across all columns of a table. You can write a plpgsql function to take care of that.
The following function replaces empty strings in all basic character-type columns of a given table with NULL. You can then cast to integer if the remaining strings are valid number literals.
CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- ARRAY of all basic character types
_sql text;
BEGIN
SELECT INTO _sql -- build SQL command
'UPDATE ' || _tbl
|| E'\nSET ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n ,')
|| E'\nWHERE ' || string_agg(col || ' = ''''', ' OR ')
FROM (
SELECT quote_ident(attname) AS col
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND NOT attnotnull -- exclude columns defined NOT NULL!
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) sub;
-- RAISE NOTICE '%', _sql; -- test?
-- Execute
IF _sql IS NULL THEN
updated_rows := 0; -- nothing to update
ELSE
EXECUTE _sql;
GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
END IF;
END
$func$;
Call:
SELECT f_empty_text_to_null('mytable');
SELECT f_empty_text_to_null('myschema.mytable');
To also get the number of updated rows (the OUT column updated_rows):
SELECT * FROM f_empty_text_to_null('mytable');
Major points
Table name has to be valid and visible and the calling user must have all necessary privileges. If any of these conditions are not met, the function will do nothing - i.e. nothing can be destroyed, either. I cast to the object identifier type regclass to make sure of it.
The table name can be supplied as is ('mytable'), then the search_path decides. Or schema-qualified to pick a certain schema ('myschema.mytable').
Query the system catalog to get all (character-type) columns of the table. The provided function uses these basic character types: text, bpchar, varchar. Only relevant columns are processed.
Use quote_ident() or format() to sanitize column names and safeguard against SQLi.
The updated version uses the basic SQL aggregate function string_agg() to build the command string without looping, which is simpler and faster. And more elegant. :)
Has to use dynamic SQL with EXECUTE.
The updated version excludes columns defined NOT NULL and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns.
Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.
