Implementing Multithreading in Oracle Procedures

I am working on Oracle 10gR2, and here is my problem:
I have a procedure, let's call it *proc_parent* (inside a package), which is supposed to call another procedure, let's call it *user_creation*. I have to call *user_creation* inside a loop that reads some columns from a table, and these column values are passed as parameters to the *user_creation* procedure.
The code is like this:
FOR i IN (SELECT community_id,
                 password,
                 username
            FROM customer
           WHERE community_id IS NOT NULL
             AND created_by = 'SRC_GLOB')
LOOP
   user_creation(i.community_id, i.password, i.username);
END LOOP;
COMMIT;
The *user_creation* procedure invokes a web service for some business logic and then, based on the response, updates a table.
I need to find a way to use multithreading here, so that I can run multiple instances of this procedure to speed things up. I know I can use *DBMS_SCHEDULER* and probably *DBMS_ALERT*, but I am not able to figure out how to use them inside a loop.
Can someone guide me in the right direction?
Thanks,
Ankur

What you can do is submit lots of jobs at the same time. See Example 28-2, Creating a Set of Lightweight Jobs in a Single Transaction, in the Oracle documentation.
It fills a PL/SQL table with all the jobs you want to submit in one transaction, all at the same time. As soon as they are submitted (enabled) they start running, as many as the system can handle, or as many as a resource manager plan allows.
The overhead of lightweight jobs is minimal.
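That documentation example looks roughly like the sketch below, adapted to the cursor from the question. Treat it as a sketch only: the job names and the USER_CREATION_PROG program (a scheduler program wrapping user_creation, via whose arguments the per-row values would be supplied) are assumptions, not something from the original post.

DECLARE
   newjob    sys.job;
   newjobarr sys.job_array := sys.job_array();
BEGIN
   FOR c IN (SELECT community_id, password, username
               FROM customer
              WHERE community_id IS NOT NULL
                AND created_by = 'SRC_GLOB')
   LOOP
      -- Build one lightweight job definition per row; per-row values would be
      -- passed as arguments of the template program (omitted in this sketch).
      newjob := sys.job(job_name     => 'USER_CREATION_' || c.community_id,
                        job_style    => 'LIGHTWEIGHT',
                        job_template => 'USER_CREATION_PROG',
                        enabled      => TRUE);
      newjobarr.EXTEND;
      newjobarr(newjobarr.LAST) := newjob;
   END LOOP;
   -- Create (and start) all the jobs in a single transaction
   DBMS_SCHEDULER.CREATE_JOBS(newjobarr, 'TRANSACTIONAL');
END;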

I would like to close this question. DBMS_SCHEDULER as well as DBMS_JOB (though DBMS_SCHEDULER is preferred) can be used inside the loop to submit and execute the jobs.
For instance, here is some sample code using DBMS_JOB that can be invoked inside a loop:
...
FOR i IN (SELECT community_id,
                 password,
                 username
            FROM customer
           WHERE community_id IS NOT NULL
             AND created_by = 'SRC_GLOB')
LOOP
   -- jobnum must be declared (e.g. jobnum NUMBER;) in the enclosing block.
   -- The job runs in its own session, where the loop variable i does not
   -- exist, so the row values are embedded in the WHAT string (quote or
   -- convert them as appropriate for their data types).
   DBMS_JOB.SUBMIT(
      JOB  => jobnum,
      WHAT => 'BEGIN user_creation(' || i.community_id || ', ''' ||
              i.password || ''', ''' || i.username || '''); END;');
   COMMIT;
END LOOP;
Issuing a COMMIT after each SUBMIT kicks off the job (and hence the procedure) in parallel.
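The DBMS_SCHEDULER variant of the same loop would look roughly like the sketch below (the job name is an assumption, and the quoting of the values again depends on their data types). Note that DBMS_SCHEDULER.CREATE_JOB commits implicitly, so no explicit COMMIT is needed to start the jobs.

BEGIN
   FOR i IN (SELECT community_id, password, username
               FROM customer
              WHERE community_id IS NOT NULL
                AND created_by = 'SRC_GLOB')
   LOOP
      -- One independent, auto-dropping job per row, started immediately
      DBMS_SCHEDULER.CREATE_JOB(
         job_name   => 'USER_CREATION_' || i.community_id,
         job_type   => 'PLSQL_BLOCK',
         job_action => 'BEGIN user_creation(' || i.community_id || ', ''' ||
                       i.password || ''', ''' || i.username || '''); END;',
         enabled    => TRUE,
         auto_drop  => TRUE);
   END LOOP;
END;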

Related

Group together celery results

TL;DR
I want to label results in the backend.
I have a flask/celery project and I'm new to celery.
A user sends in a batch of tasks for celery to work on.
Celery saves the results to a backend SQL database (table automatically created by Celery, named celery_taskmeta).
I want to let the user see the status of his batch, and request the results from the backend.
My problem is that all the results are in one table. What are my options to label this batch, so the user can differentiate the batches?
My ideas:
Can I add a label to each task, e.g. "Bob's batch no. 12", and then query celery_taskmeta for that?
Can I put each batch in named backend tables, so ask Celery to save results to a table named task_12?
Trying with groups
I've tried the following code to group the results
job_group = group(api_get.delay(url) for url in urllist)
But I don't see any way to identify the group in the backend/results DB
Trying with task name
In the backend I see an empty column header 'name' so I thought I could add an arbitrary string there:
@app.task(name="an amazing vegetable")
def api_get(url: str) -> tuple:
...
But then the celery worker throws an error when I run the task:
KeyError: 'an amazing vegetable'
[2020-12-18 12:07:22,713: ERROR/MainProcess] Received unregistered task of type 'an amazing vegetable'.
Probably the simplest solution is to use a group and its GroupResult to periodically poll for the group state.
A1: As for the label question - yes, you can "label" your task by using the custom state feature.
A2: You can hack around to put each batch of tasks inside its own backend table, but I strongly advise against messing with it. If you really want to go this route, use a separate database for this particular use.
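A minimal sketch of that group/GroupResult approach, assuming the existing api_get task and a result backend that supports saving group results (note the tasks go into the group as signatures via .s(), not .delay()):

from celery import group
from celery.result import GroupResult

def submit_batch(urllist):
    job = group(api_get.s(url) for url in urllist)
    result = job.apply_async()
    result.save()         # persist the group's task ids in the result backend
    return result.id      # hand this batch/group id back to the user

def batch_status(batch_id):
    restored = GroupResult.restore(batch_id)
    return {
        "completed": restored.completed_count(),  # tasks finished so far
        "total": len(restored.results),           # tasks in the batch
        "ready": restored.ready(),                # True once every task finished
    }

The group id effectively becomes the batch label: store it per user (e.g. "Bob's batch no. 12" -> group id) instead of trying to tag rows in celery_taskmeta.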

Mocking Postgresql now() function for testing

I have the following stack
Node/Express backend
Postgresql 10 database
Mocha for testing
Sinon for mocking
I have written a bunch of end-to-end tests to test all my webservices. The problem is that some of them are time dependent (as in "give me the modified records of the last X seconds").
Sinon is pretty good at mocking all the time/date-related stuff in Node; however, I have a modified field in my Postgresql tables that is populated by a trigger:
CREATE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
  NEW.modified = now();
  RETURN NEW;
END;
$$ LANGUAGE 'plpgsql';
The problem of course is that sinon can't override that now() function.
Any idea on how I could solve this? The problem is not setting a specific date at the start of the test, but advancing time faster than real time (in one of my tests I want to change some stuff in the database, advance the 'current time' by one day, change some more stuff in the database, and make webservice calls to see the result).
I can figure out a few solutions myself, but they all involve changing the application code and making it less elegant. I don't think your application code should be impacted by the fact that you want to test it.
Here's an idea: Create your own mock_now() with mock_dates table:
create table mock_dates (
  id serial PRIMARY KEY,
  mock_date timestamptz not null
);

create or replace function mock_now()
returns timestamptz
as $$
declare
  RET timestamptz;
begin
  -- Delete first added date and assign it to RET
  delete from mock_dates where id in (
    select id from mock_dates order by id asc limit 1
  )
  returning mock_dates.mock_date into RET;

  -- If no deletion happened just return the current timestamp
  if RET is null then
    return now();
  end if;

  -- Otherwise return the mocked date
  return RET;
end;
$$
language plpgsql;
And insert some mocked dates
insert into mock_dates (mock_date) values ('2001-03-11 02:34:00'::timestamptz);
insert into mock_dates (mock_date) values ('2002-05-22 01:49:00'::timestamptz);
and use mock_now() instead of now(). It will return each timestamp inserted into the mock_dates table once (first in, first out).
When the table is empty it will work like the default now().
Just ensure the mock_dates table is empty in production 😁
Or you could even define a different function for production which does not even try to read the mock_dates table.
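For the trigger function from the question, that just means swapping the call (a sketch based on the definitions above):

CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
  -- mock_now() falls back to now() when mock_dates is empty
  NEW.modified = mock_now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;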
I found this very neat gist which provides a fake NOW() function that lives in a separate schema. You load it into your test database and then modify the search path of each testing session so that the override schema is searched before pg_catalog. Two functions, freeze_time and unfreeze_time, are provided to enable and disable frozen time.
To be honest, DB internal stuff is always hard to test from application code. In my experience, the best thing to do is to just verify the record's state.
So, rather than testing specifically that the now function is called internally, write a test that creates a new record, then verify that the record has its created and modified fields set. At that point they should equal each other, and probably be within the last second or two, so that's all stuff you can write assertions around.
Then you write another test that changes some value in the record, and write assertions that the modified stamp is different from the created stamp, is more recent, and is probably within the last second or two.
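For example, the first of those tests might look like the sketch below (Mocha with Node's built-in assert; the pool (node-postgres) and the widgets table are placeholders for whatever the application actually uses):

const assert = require('assert');

it('sets created and modified on insert', async () => {
  const { rows } = await pool.query(
    "INSERT INTO widgets (name) VALUES ('test') RETURNING created, modified"
  );
  const { created, modified } = rows[0];
  // Freshly inserted: both stamps set, equal, and written within ~2 seconds
  assert.strictEqual(created.getTime(), modified.getTime());
  assert.ok(Date.now() - created.getTime() < 2000);
});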

Right way to delete and then reindex ES documents

I have a python3 script that attempts to reindex certain documents in an existing ElasticSearch index. I can't update the documents because I'm changing from an autogenerated id to an explicitly assigned id.
I'm currently attempting to do this by deleting existing documents using delete_by_query and then indexing once the delete is complete:
self.elasticsearch.delete_by_query(
    index='%s_*' % base_index_name,
    doc_type='type_a',
    conflicts='proceed',
    wait_for_completion=True,
    refresh=True,
    body={}
)
However, the index is massive, and so the delete can take several hours to finish. I'm currently getting a ReadTimeoutError, which is causing the script to crash:
WARNING:elasticsearch:Connection <Urllib3HttpConnection: X> has failed for 2 times in a row, putting on 120 second timeout.
WARNING:elasticsearch:POST X:9200/base_index_name_*/type_a/_delete_by_query?conflicts=proceed&wait_for_completion=true&refresh=true [status:N/A request:140.117s]
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='X', port=9200): Read timed out. (read timeout=140)
Is my approach correct? If so, how can I make my script wait long enough for the delete_by_query to complete? There are two timeout parameters that can be passed to delete_by_query - search_timeout and timeout - but search_timeout defaults to no timeout (which I think is what I want), and timeout doesn't seem to do what I want. Is there some other parameter I can pass to delete_by_query to make it wait as long as it takes for the delete to finish? Or do I need to make my script wait some other way?
Or is there some better way to do this using the ElasticSearch API?
You should set wait_for_completion to False. In that case you'll get task details back and will be able to track the task's progress using the corresponding Tasks API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api
To expand on the previous answer with code, for ES/Python newbies like me:

from elasticsearch import Elasticsearch

ES = Elasticsearch(['http://localhost:9200'])
query = {'query': {'match_all': dict()}}
# With wait_for_completion=False the call returns immediately with a task handle
response = ES.delete_by_query(index='index_name', doc_type='sample_doc',
                              wait_for_completion=False, body=query, ignore=[400, 404])
task_id = response['task']
response_task = ES.tasks.get(task_id=task_id)  # look up the task
is_completed = response_task["completed"]      # True once the task has finished
One can write a small helper that checks in a while loop, at some interval, whether the task has completed.
I used Python 3.x and Elasticsearch 6.x.
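A minimal sketch of such a polling helper (the function name and poll interval are just illustrative):

import time

def wait_for_task(es, task_id, poll_seconds=30):
    # Block until the Tasks API reports the task as completed
    while True:
        status = es.tasks.get(task_id=task_id)
        if status.get("completed"):
            return status
        time.sleep(poll_seconds)

# e.g. wait_for_task(ES, task_id) after the delete_by_query call above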
You can use the 'request_timeout' global param. This overrides the connection's timeout setting for that request, as mentioned here.
For example -
es.delete_by_query(index=<index_name>, body=<query>, request_timeout=300)
Or set it at the connection level, for example
es = Elasticsearch(**(get_es_connection_parms()), timeout=60)

In H2 database, the auto_increment field is incremented by 32?

I have this simple table (just for a test):
create table table
(
  key int not null primary key auto_increment,
  name varchar(30)
);
Then I execute the following statements:
insert into table values ( null , 'one'); // key=1
insert into table values ( null , 'two'); // key=2
At this stage all goes well. Then I close the H2 Console, re-open it, and execute this statement:
insert into table values ( null , 'three'); // key=33
Finally, here are all the results:
I do not know how to solve this problem, if it is a real problem...
pending a response from the author...
The database uses a cache of 32 entries for sequences, and auto-increment is internally implemented as a sequence. If the system crashes without closing the database, at most this many numbers are lost. This is similar to how sequences work in other databases. Sequence values are not guaranteed to be generated without gaps in such cases.
So, did you really close the database? You should - it's not technically a problem if you don't, but closing the database will ensure such strange things do not occur. I can't reproduce the problem if I normally close the database (stop the H2 Console tool). Closing all connections will close the database, and the database is closed when the application is stopped normally (using a shutdown hook).
By the way, what is your exact database URL? It seems you are using jdbc:h2:tcp://... but I can't see the rest of the URL.
Don't close the terminal. The terminal is the parent process of the H2 TCP server; they are not detached. When you just close the terminal, its process closes all child processes, which means an emergency server shutdown.
This happens when a database "thinks" it was forced to close (an accident or emergency, for example), and it's related to the "identity cache".
In my case I was facing this issue while learning and playing with the H2 database in a Spring Boot application. The solution was, when finished playing, to execute the SHUTDOWN; command in the h2-console; after that you can safely stop your Spring Boot application without this huge jump in your autogenerated fields.
Personal note: this usually is not a problem if you are creating a new database on every application start, but when you persist the data (for example with a file-based URL like in the properties below) it survives restarts, and then this happens - so close the database safely with the SHUTDOWN command.
spring.datasource.url=jdbc:h2:./src/main/resources/data;DB_CLOSE_ON_EXIT=FALSE;AUTO_RECONNECT=TRUE
spring.jpa.hibernate.ddl-auto=update
References:
Solution
https://stackoverflow.com/a/40135657/10195307
Learn about identity-cache https://www.sqlshack.com/learn-to-avoid-an-identity-jump-issue-identity_cache-with-the-help-of-trace-command-t272/

Strict control over the statement_timeout variable in PostgreSQL

Does anybody know how to limit a user's ability to set variables? Specifically statement_timeout?
Regardless of whether I alter the user to have this variable set to a minute, or have it set to a minute in the postgresql.conf file, a user can always just type SET statement_timeout TO 0; to disable the timeout completely for that session.
Does anybody know a way to stop this? I know some variables can only be changed by a superuser but I cannot figure out if there is a way to force this to be one of those controlled variables. Alternatively, is there a way to revoke SET from their role?
In my application, this variable is used to limit the ability of random users (user registration is open to the public) to use up all the CPU time with (near) infinite queries. If they can disable it, then I must find a new methodology for limiting resources per user. If there is no method for securing this variable, are there other ways of achieving the same goal that you might suggest?
Edit 2011-03-02
The reason the database is open to the public and arbitrary SQL is allowed is because this project is for a game played directly in the database. Every player is a database user. Data is locked down behind views, rules and triggers, CREATE is revoked from public and the player role to prevent most alterations to the schema and SELECT on pg_proc is removed to secure game-sensitive function code.
This is not some mission critical system I have opened up to the world. It is a weird proof of concept that puts an abnormal amount of trust in the database in an attempt to maintain the entire CIA security triangle within it.
Thanks for your help,
Abstrct
There is no way to override this. If you allow the user to run arbitrary SQL commands, changing statement_timeout is just the tip of the iceberg anyway... If you don't trust your users, you shouldn't let them run arbitrary SQL - or accept that they can run, well, arbitrary SQL. And have some sort of external monitor that cancels the queries.
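Such an external monitor can be as simple as a periodic (e.g. cron-driven) superuser query like the sketch below. The role name and the one-minute threshold are assumptions; the pg_stat_activity column names shown (pid, state, query_start) are the current ones - very old releases used procpid/current_query instead.

-- Cancel any query from the untrusted role that has been running too long
SELECT pg_cancel_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'active'
   AND usename = 'player'
   AND now() - query_start > interval '1 minute';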
Basically you can't do this in plain Postgres.
Meanwhile, to accomplish your goal you may use some kind of proxy and rewrite/forbid certain queries.
There are several solutions for that, e.g.:
db-query-proxy - an article on how it was born (in Russian).
PgBouncer + pgbouncer-rr-patch
The latter contains very useful examples and is very simple to do in Python:
import re

def rewrite_query(username, query):
    q1 = "SELECT storename, SUM\(total\) FROM sales JOIN store USING \(storeid\) GROUP BY storename ORDER BY storename"
    q2 = "SELECT prodname, SUM\(total\) FROM sales JOIN product USING \(productid\) GROUP BY prodname ORDER BY prodname"
    if re.match(q1, query):
        new_query = "SELECT storename, SUM(total) FROM store_sales GROUP BY storename ORDER BY storename;"
    elif re.match(q2, query):
        new_query = "SELECT prodname, SUM(total) FROM product_sales GROUP BY prodname ORDER BY prodname;"
    else:
        new_query = query
    return new_query

Resources