I'm wondering if it's possible to concatenate PLyResults somehow inside a function. For example, say I first have a function _get_data that, given a pair (id, index), returns a table of values:
CREATE OR REPLACE FUNCTION _get_data(id bigint, index bigint)
RETURNS TABLE(oid bigint, id bigint, val double precision) AS
$BODY$
#...process for fetching the data, irrelevant for the question.
return recs
$BODY$
LANGUAGE plpython3u;
Now I would like to create a generic function, defined as follows, that fetches data between two boundaries for a given ID, using the previous function to fetch the data individually and then aggregating the results somehow:
CREATE OR REPLACE FUNCTION get_data(id bigint, lbound bigint, ubound bigint)
RETURNS TABLE(oid bigint, id bigint, val double precision) AS
$BODY$
concatenated_recs = [] #<-- For the sake of argument.
plan = plpy.prepare("SELECT oid, id, val FROM _get_data($1, $2);", ['bigint', 'bigint'])
for i in range(lbound, ubound+1):
    recs = plpy.execute(plan, [id, i]) # <-- Records fetched individually
    concatenated_recs += [recs] #<-- Not sure how to concatenate them...
return concatenated_recs
$BODY$
LANGUAGE plpython3u;
Perhaps I am missing something, but the answer you gave looks like a slower, more complicated version of this query:
SELECT oid, id, val
FROM generate_series(your_lower_bound, your_upper_bound) AS g(i),
_get_data(your_id, i);
You could put that in a simple SQL function with no loops or temporary tables:
CREATE OR REPLACE FUNCTION get_data(id bigint, lbound bigint, ubound bigint)
RETURNS TABLE(oid bigint, id bigint, val double precision) AS
$BODY$
SELECT oid, id, val
FROM generate_series(lbound, ubound) AS g(i),
_get_data(id, i);
$BODY$ LANGUAGE SQL;
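As a usage illustration (the id and bounds here are made-up values), the call is the same as it would be for the PL/Python version:
SELECT * FROM get_data(42, 1, 10);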
Although I wasn't able to find a way to concatenate the results in the PL/Python documentation (as of June 2019 I'm not sure the language supports this), I could solve it by creating a temp table, inserting the records into it on each iteration, and then returning the full table:
CREATE OR REPLACE FUNCTION get_data(id bigint, lbound bigint, ubound bigint)
RETURNS TABLE(oid bigint, id bigint, val double precision) AS
$BODY$
#Creates the temp table
plpy.execute("""CREATE TEMP TABLE temp_results(oid bigint, id bigint, val double precision)
ON COMMIT DROP""")
plan = plpy.prepare("INSERT INTO temp_results SELECT oid, id, val FROM _get_data($1, $2);",
['bigint', 'bigint'])
#Inserts the results in the temp table
for i in range(lbound, ubound+1):
    plpy.execute(plan, [id, i])
#Returns the whole table
recs = plpy.execute("SELECT * FROM temp_results")
return recs
$BODY$
LANGUAGE plpython3u;
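For what it's worth, a PLyResult can be iterated like a sequence of dict-like rows, so the results may also be accumulated in a plain Python list and returned directly. This is an untested sketch along those lines, using the same signature as above:
CREATE OR REPLACE FUNCTION get_data(id bigint, lbound bigint, ubound bigint)
RETURNS TABLE(oid bigint, id bigint, val double precision) AS
$BODY$
concatenated_recs = []
plan = plpy.prepare("SELECT oid, id, val FROM _get_data($1, $2);", ['bigint', 'bigint'])
for i in range(lbound, ubound+1):
    recs = plpy.execute(plan, [id, i])
    concatenated_recs.extend(recs)  # each row behaves like a dict keyed by column name
return concatenated_recs
$BODY$
LANGUAGE plpython3u;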
(
ResponseRgBasketId STRING,
RawStandardisedLoadDateTime TIMESTAMP,
InfoMartLoadDateTime TIMESTAMP,
Operaame STRING,
RequestTimestamp TIMESTAMP,
RequestSiteId STRING,
RequestSalePointId STRING,
RequestdTypeId STRING,
RequeetValue DECIMAL(10,2),
ResponsegTimestamp TIMESTAMP,
RequessageId STRING,
RequestBasketId STRING,
ResponsesageId STRING,
RequestTransmitAttempt INT,
ResponseCode STRING,
RequestasketItems INT,
ResponseFinancialTimestamp TIMESTAMP,
RequeketJsonString STRING,
LoyaltyId STRING
)
USING DELTA
PARTITIONED BY (RequestTimestamp)
TBLPROPERTIES
(
delta.deletedFileRetentionDuration = "interval 1 seconds",
delta.autoOptimize.optimizeWrite = true
)
It has been partitioned by RequestTimestamp, but the values have a format like 2020-12-12T07:39:35.000+0000. Could I change the format in PARTITIONED BY to something different, such as a date-only value like 2020-12-12?
Short answer: No regexp or other transformation is possible in PARTITIONED BY.
The only solution is to apply substr(timestamp, 1, 10) during/before load.
See also this answer: https://stackoverflow.com/a/64171676/2700344
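For illustration, one way to apply that during load is to add a derived date column and partition on it instead. This is only a sketch; my_table and staging_table are placeholder names and most columns are omitted:
CREATE TABLE my_table
(
  RequestTimestamp TIMESTAMP,
  RequestDate DATE,  -- derived, used only for partitioning
  LoyaltyId STRING
)
USING DELTA
PARTITIONED BY (RequestDate);

INSERT INTO my_table
SELECT RequestTimestamp,
       CAST(RequestTimestamp AS DATE) AS RequestDate,  -- or substr(ts_string, 1, 10) if it is a string
       LoyaltyId
FROM staging_table;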
Long answer:
No regexp is possible in PARTITIONED BY. No functions are allowed in table DDL; only a type can be specified. The type in the column specification works as a constraint and at the same time can cause implicit type conversion. For example, if you are loading strings into dates, they will be cast implicitly if possible and loaded into the null default partition if the cast is not possible. Also, if you are loading BIGINT values into an INT column, they will be silently truncated; as a result you will see corrupted data and duplicates.
Does the same implicit cast work with PARTITIONED BY? Let's see:
DROP TABLE IF EXISTS test_partition;
CREATE TABLE IF NOT EXISTS test_partition (Id int)
partitioned by (dt date) --Hope timestamp will be truncated to DATE
;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table test_partition partition(dt)
select 1 as id, current_timestamp as dt;
show partitions test_partition;
Result (We expect timestamp truncated to DATE...):
dt=2021-03-24 10%3A19%3A19.985
No, it does not work. I tested the same with a varchar(10) column and strings like yours.
See short answer.
I am new to SQL. I am using PostgreSQL. I have created a stored function called report:
CREATE OR REPLACE FUNCTION report(sensors VARCHAR,fromdate DATE,todate DATE)
RETURNS TABLE (
sensor VARCHAR,
id INT,
value INT,
created_date DATE
)
AS $$
DECLARE
var_r record;
BEGIN
FOR var_r IN (SELECT *
              FROM probe_data
              WHERE probe_data.sensor = sensors
                AND probe_data.created_date >= fromdate
                AND probe_data.created_date <= todate)
LOOP
sensor := var_r.sensor ;
id := var_r.id;
created_date := var_r.created_date;
value:= var_r.value;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
I want to get the data between two dates. When I execute it in a SQL tool, it returns my desired output. But when I call the function
let sql= `select * from public.report($1,$2,$3);`
let values=[req.body.sensors,req.body.fromdates,req.body.todates];
from my Node.js program, I am getting data outside the range.
Here is my screen shot:
This is the structure of my return table
As you can see in my second screenshot, I'm getting data out of range. I don't know where I am going wrong.
Your parameter is declared as DATE but you are trying to pass a timestamp, so Postgres will cast the passed value to a date and you lose the time information. You need to define the parameter as timestamp if you want it to be treated as one. The return type of created_date should probably be a timestamp as well.
You can also get rid of the CURSOR loop and PL/pgSQL as well:
CREATE OR REPLACE FUNCTION report(sensors VARCHAR, fromdate timestamp, todate timestamp)
RETURNS TABLE (sensor VARCHAR, id INT, value INT, created_date timestamp)
AS $$
SELECT pb.sensor, pb.id, pb.value, pb.created_date
FROM probe_data pb
WHERE pb.sensor = sensors
AND pb.created_date >= fromdate
AND pb.created_date <= todate;
$$
language sql;
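A hypothetical call with timestamp literals (the values are made up) then keeps the time part:
SELECT * FROM report('sensor1', '2021-01-01 00:00:00', '2021-01-31 23:59:59');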
Using PostgreSQL, PostGIS, pgRouting and Node.js, I am working on a project which basically finds a path between shops.
There are three tables in my database:
1. CREATE TABLE public."edges" (id int, name varchar(100), highway varchar(100), oneway varchar(100), surface varchar(100), the_geom geometry, source int, target int);
2. CREATE TABLE public."edges_noded" (id bigint, old_id int, sub_id int, source bigint, target bigint, the_geom geometry, name varchar(100), type varchar(100), distance double precision);
3. CREATE TABLE public."edges_noded_vertices_pgr" (id bigint, cnt int, chk int, ein int, eout int, the_geom geometry);
And here is the query with which I am finding the path:
client.query(
  "WITH dijkstra AS (SELECT * FROM pgr_dijkstra('SELECT id, source, target, distance AS cost FROM edges_noded', " + source + ", " + target + ", FALSE)) " +
  "SELECT seq, " +
  "CASE WHEN dijkstra.node = edges_noded.source THEN ST_AsGeoJSON(edges_noded.the_geom) " +
  "ELSE ST_AsGeoJSON(ST_Reverse(edges_noded.the_geom)) END AS route_geom_x, " +
  "CASE WHEN dijkstra.node = edges_noded.source THEN ST_AsGeoJSON(edges_noded.the_geom) " +
  "ELSE ST_AsGeoJSON(ST_Reverse(edges_noded.the_geom)) END AS route_geom_y " +
  "FROM dijkstra JOIN edges_noded ON (edge = id) ORDER BY seq",
  (err, res) => { })
This query works for me but takes too much time. For example, if I want to find a path between 30 shops, it takes almost 25 to 30 seconds, which is too much.
After searching about this problem I found this link
https://gis.stackexchange.com/questions/16886/how-can-i-optimize-pgrouting-for-speed/16888
In this link Délawenis suggests using ST_Buffer so that it doesn't get all the ways, but just the "nearby" ways:
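For context, a rough, untested sketch of that idea (the vertex ids 1 and 200 and the 0.05-degree expansion are placeholders) restricts the edge set inside the SQL passed to pgr_dijkstra:
SELECT seq, edge, node, cost
FROM pgr_dijkstra(
  'SELECT e.id, e.source, e.target, e.distance AS cost
     FROM edges_noded e
    WHERE e.the_geom && ST_Expand(
            (SELECT ST_Collect(v.the_geom)
               FROM edges_noded_vertices_pgr v
              WHERE v.id IN (1, 200)), 0.05)',
  1, 200, FALSE);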
So I tried to apply ST_Buffer along those lines in the above query but had no success.
If someone has any idea, please help me with this problem.
If this approach is wrong, please also tell me the right way.
I created a table:
CREATE TABLE human (chromosome text, position bigint,
hg01583 frozen<set<text>>,
hg03006 frozen<set<text>>,
PRIMARY KEY (chromosome, position)
)
and I created a function:
CREATE FUNCTION process(sample list<frozen<set<text>>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java
AS
$$
return sample==null?null:sample.getClass().toString()+" "+sample.toString();
$$;
When I issue the CQL query
SELECT chromosome,position,hg01583, hg03006, process([hg01583,hg03006]) from human;
I get this error:
SyntaxException: line 1:80 no viable alternative at input ',' ([[hg01583],..
How can I pass hg01583, hg03006 as a list to the process function?
With each as its own argument, like: SELECT chromosome, position, hg01583, hg03006, process(hg01583, hg03006) FROM human;
CREATE FUNCTION process(hg01583 frozen<set<text>>, hg03006 frozen<set<text>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java AS
$$
return hg01583==null? null : ...
$$;
If you want them to be dynamic, instead of creating fixed columns for each one, make it a wide row and use a UDA to aggregate them with an accumulator function (a sketch follows the table below), like:
CREATE TABLE human (chromosome text, position bigint,
sample text,
value frozen<set<text>>,
PRIMARY KEY (chromosome, position, sample)
)
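A sketch of such a UDA (untested; the function and aggregate names are just examples) could accumulate the samples into a single text state:
CREATE FUNCTION state_concat(state text, sample text, value frozen<set<text>>)
  CALLED ON NULL INPUT
  RETURNS text
  LANGUAGE java AS
$$
  if (value == null) return state;
  String piece = sample + "=" + value.toString();
  return state == null ? piece : state + " " + piece;
$$;

CREATE AGGREGATE process_all(text, frozen<set<text>>)
  SFUNC state_concat
  STYPE text;

-- hypothetical usage, aggregating all samples of one row key:
-- SELECT process_all(sample, value) FROM human WHERE chromosome = '1' AND position = 12345;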
SELECT count(*) FROM device_stats
WHERE orgid = 'XYZ'
AND regionid = 'NY'
AND campusid = 'C1'
AND buildingid = 'C1'
AND floorid = '2'
AND year = 2017;
The above CQL query returns the correct result, 32032, in CQL shell.
But when I run the same query using the QueryBuilder Java API, I see the count as 0:
BuiltStatement summaryQuery = QueryBuilder.select()
.countAll()
.from("device_stats")
.where(eq("orgid", "XYZ"))
.and(eq("regionid", "NY"))
.and(eq("campusid", "C1"))
.and(eq("buildingid", "C1"))
.and(eq("floorid", "2"))
.and(eq("year", "2017"));
try {
ResultSetFuture tagSummaryResults = session.executeAsync(summaryQuery);
tagSummaryResults.getUninterruptibly().all().stream().forEach(result -> {
System.out.println(" totalCount > "+result.getLong(0));
});
I have only 20 partitions and 32032 rows per partition.
What could be the reason QueryBuilder is not executing the query correctly?
Schema:
CREATE TABLE device_stats (
orgid text,
regionid text,
campusid text,
buildingid text,
floorid text,
year int,
endofwindow timestamp,
categoryid timeuuid,
devicestats map<text,bigint>,
PRIMARY KEY ((orgid, regionid, campusid, buildingid, floorid,year),endofwindow,categoryid)
) WITH CLUSTERING ORDER BY (endofwindow DESC,categoryid ASC);
// Using the keys function to index the map keys
CREATE INDEX ON device_stats (keys(devicestats));
I am using Cassandra 3.10 and com.datastax.cassandra:cassandra-driver-core:3.1.4.
Moving my comment to an answer since that seems to solve the original problem:
Changing .and(eq("year", "2017")) to .and(eq("year", 2017)) solves the issue since year is an int and not a text.
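For clarity, the corrected builder (everything else unchanged) is:
BuiltStatement summaryQuery = QueryBuilder.select()
    .countAll()
    .from("device_stats")
    .where(eq("orgid", "XYZ"))
    .and(eq("regionid", "NY"))
    .and(eq("campusid", "C1"))
    .and(eq("buildingid", "C1"))
    .and(eq("floorid", "2"))
    .and(eq("year", 2017));  // int literal, matching the int column in the schema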