ERROR: COALESCE types timestamp without time zone and integer cannot be matched (Postgresql)

## PROBLEM ##
I get this error from the following script on PostgreSQL 9.3.2 (it works fine in MS SQL Server):
SELECT
CASE COALESCE(my_date_field,0)
WHEN 0 THEN 0
ELSE 1 END
AS status
FROM
my_table
Error: COALESCE types timestamp without time zone and integer cannot be matched
Line 2: CASE COALESCE(my_date_field,0)
## SOLVED ##
SELECT
CASE WHEN my_date_field IS NULL
THEN 0 ELSE 1 END
AS status
FROM
my_table
COALESCE accepts pretty much any number of parameters, but they must all be of the same data type.
(Quoted from COALESCE Function in TSQL.)

Zero is not a valid date. It's surprising that it works in MS SQL. You need to use a sensible date, or accept NULL.
CASE COALESCE(my_date_field, DATE '0001-01-01')
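For completeness, a sketch of the whole query using that sentinel date (assuming the same table and column as above; note it would misclassify any genuine 0001-01-01 values, which is why the IS NULL form below is preferable):
SELECT
CASE COALESCE(my_date_field, DATE '0001-01-01')
WHEN DATE '0001-01-01' THEN 0
ELSE 1 END
AS status
FROM
my_table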
The query is a bit bizarre in general. Isn't that an incredibly long-winded and complicated way of writing IS NULL?
SELECT
my_date_field IS NULL AS status
FROM
my_table
If, per comment, you want 0 or 1, use:
SELECT
CASE WHEN my_date_field IS NULL THEN 1 ELSE 0 END AS status
FROM
my_table

Related

What can cause "java.lang.IndexOutOfBoundsException: Index: 1, Size: 1" in a prepared query?

This is in Cassandra 3.11.4.
I'm running a modified version of a query that previously worked fine in my app. The original query that is fine was like:
SELECT SerializedRecord FROM SxRecord WHERE Mark=?
I modified the query to have a range of a timestamp (which I also added an index for, though I don't think that is relevant):
SELECT SerializedRecord FROM SxRecord WHERE Mark=? AND Timestamp>=? AND Timestamp<=?
This results in:
ResponseError {reHost = datacenter1:rack1:127.0.0.1:9042, reTrace = Nothing, reWarn = [], reCause = ServerError "java.lang.IndexOutOfBoundsException: Index: 1, Size: 1"}
When this occurs, I don't see the query CQL being logged in system_traces.sessions, which is interesting, because if I put a syntax error into the query, it is still logged there.
Additionally, when I run an (as far as I know, identical, up to timestamps) query in cqlsh, there doesn't seem to be a problem:
cqlsh> SELECT SerializedRecord FROM test_fds.SxRecord WHERE Mark=8391 AND Timestamp >= '2021-03-06 00:00:00.000+0000' AND Timestamp <= '2021-03-09 00:00:00.000+0000';
serializedrecord
------------------
This results in the following query trace:
cqlsh> select parameters from system_traces.sessions;
parameters
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{'consistency_level': 'ONE', 'page_size': '100', 'query': 'SELECT SerializedRecord FROM test_fds.SxRecord WHERE Mark=8391 AND Timestamp >= ''2021-03-06 00:00:00.000+0000'' AND Timestamp <= ''2021-03-09 00:00:00.000+0000'';', 'serial_consistency_level': 'SERIAL'}
null
It seems that the query, executed as a prepared/bound statement, is not receiving all the parameters it needs, or is receiving too many (something bound by previous code).
The fact that you don't see the query traced comes from the driver not even performing the query, as it still has unbound parameters.
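If you want to check what the server actually has cached for this statement, Cassandra 3.10+ exposes the prepared-statement cache as a table; a diagnostic sketch (not a fix):
SELECT logged_keyspace, query_string FROM system.prepared_statements;
Comparing query_string against what the application thinks it prepared can confirm whether a stale statement with the old single-parameter form is still being bound.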

AWS Athena working with nested arrays, trying to search for a field within the array

I have a sql query:
SELECT id_str, entities.hashtags
FROM tweets, unnest(entities.hashtags) as t(hashtag)
WHERE cardinality(entities.hashtags)=2 and id_str='1248585590573948928'
limit 5
which returns:
id_str hashtags
1248585590573948928 [{text=LUCAS, indices=[75, 81]}, {text=WayV, indices=[83, 88]}]
1248585590573948928 [{text=LUCAS, indices=[75, 81]}, {text=WayV, indices=[83, 88]}]
The unnesting has returned two rows for what was originally one row, because there are 2 objects in the array.
The next part I wanted to add to the SQL query was
hashtag['text'] as htag
in the existing select, which should still return 2 rows, but this time with LUCAS and WayV in separate rows of the same column, named htag.
But I get this error - any idea what I am doing wrong?
Your query has the following error(s):
SYNTAX_ERROR: line 1:8: '[]' cannot be applied to row(text varchar,indices array(bigint)), varchar(4)
I assume it is because I have another array within this array.. ?
Thanks in advance
I'm not entirely sure where you're adding the hashtag['text'] expression, so I can't say with confidence what your problem is, but I have two suggestions for you to try:
The error says that hashtag is of type row(text varchar, …), which suggests that hashtag.text should work.
If that doesn't work, you can try using element_at e.g. element_at(hashtag, 'text').
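Putting that together with the original query, a sketch of what the full statement could look like (assuming hashtag.text works as suggested and the same tweets table as above):
SELECT id_str, hashtag.text AS htag
FROM tweets, unnest(entities.hashtags) as t(hashtag)
WHERE cardinality(entities.hashtags)=2 and id_str='1248585590573948928'
limit 5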
I came across this issue as well, and since no solution was provided I'd like to chip in:
After you unnest an array, you can address the result with a . reference instead of ['']:
WITH dataset AS (
SELECT ARRAY[
CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
] AS users
)
SELECT
user,
user.name
FROM dataset
cross join unnest (users) as t(user)

Athena query results show null values despite is not null condition in query

I have the following query which I run in Athena. I would like to receive all the results that contain a tag in the 'resource_tags_aws_cloudformation_stack_name'. However, when I run the query my results show me rows where the 'resource_tags_aws_cloudformation_stack_name' is empty and I don't know what I am doing wrong.
SELECT
cm.line_item_usage_account_id,
pr.line_of_business,
cm.resource_tags_aws_cloudformation_stack_name,
SUM(CASE WHEN cm.line_item_product_code = 'AmazonEC2'
THEN line_item_unblended_cost * 0.97
ELSE cm.line_item_unblended_cost END) AS discounted_cost,
CAST(cm.line_item_usage_start_date AS DATE) AS start_day
FROM cost_management cm
JOIN prod_cur_metadata pr ON cm.line_item_usage_account_id = pr.line_item_usage_account_id
WHERE cm.line_item_usage_account_id IN ('1234504482')
AND cm.resource_tags_aws_cloudformation_stack_name IS NOT NULL
AND cm.line_item_usage_start_date
BETWEEN date '2020-01-01'
AND date '2020-01-30'
GROUP BY cm.line_item_usage_account_id,pr.line_of_business, cm.resource_tags_aws_cloudformation_stack_name, CAST(cm.line_item_usage_start_date AS DATE), pr.line_of_business
HAVING sum(cm.line_item_blended_cost) > 0
ORDER BY cm.line_item_usage_account_id
I modified my query to exclude ' ' and that seems to work:
SELECT
cm.line_item_usage_account_id,
pr.line_of_business,
cm.resource_tags_aws_cloudformation_stack_name,
SUM(CASE WHEN cm.line_item_product_code = 'AmazonEC2'
THEN line_item_unblended_cost * 0.97
ELSE cm.line_item_unblended_cost END) AS discounted_cost,
CAST(cm.line_item_usage_start_date AS DATE) AS start_day
FROM cost_management cm
JOIN prod_cur_metadata pr ON cm.line_item_usage_account_id = pr.line_item_usage_account_id
WHERE cm.line_item_usage_account_id IN ('1234504482')
AND NOT cm.resource_tags_aws_cloudformation_stack_name = ' '
AND cm.line_item_usage_start_date
BETWEEN date '2020-01-01'
AND date '2020-01-30'
GROUP BY cm.line_item_usage_account_id,pr.line_of_business, cm.resource_tags_aws_cloudformation_stack_name, CAST(cm.line_item_usage_start_date AS DATE), pr.line_of_business
HAVING sum(cm.line_item_blended_cost) > 0
ORDER BY cm.line_item_usage_account_id
You can handle the space case as below:
AND Coalesce(cm.resource_tags_aws_cloudformation_stack_name,' ') !=' '
Or, if you may have multiple spaces, try the query below (not a good option if spaces are required in the actual data):
AND Regexp_replace(cm.resource_tags_aws_cloudformation_stack_name,' ') is not null
Adding to this, you may also have special characters like CR or LF in the data, although that is a rare scenario.
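A sketch of one condition that covers NULL, empty, and whitespace-only tags in a single predicate (assuming the column is a plain varchar):
AND trim(coalesce(cm.resource_tags_aws_cloudformation_stack_name, '')) <> ''
Whether CR/LF characters are stripped as well depends on how trim() treats them, so a regexp-based cleanup may still be needed for those.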

Azure SQL Data Warehouse hanging or not responding to simple query after large BCP operation

I have a preview version of Azure SQL Data Warehouse running, which was working fine until I imported a large table (~80 GB) through BCP. Now none of the tables, including small ones, respond even to a simple query
select * from <MyTable>
Queries to sys tables are still working.
select * from sys.objects
The BCP process was left running over the weekend, so any statistics update should have finished by now. Is there any way to figure out what is making this happen? Or at least see what is currently running, to check whether anything is blocking?
I'm using SQL Server Management Studio 2014 to connect to the Data Warehouse and executing queries.
@user5285420 - run the code below to get a good view of what's going on. You should be able to find the query easily by looking at the value in the "command" column. Can you confirm whether the BCP command still shows status = 'Running' when the query steps are all complete?
select top 50
(case when requests.status = 'Completed' then 100
when progress.total_steps = 0 then 0
else 100 * progress.completed_steps / progress.total_steps end) as progress_percent,
requests.status,
requests.request_id,
sessions.login_name,
requests.start_time,
requests.end_time,
requests.total_elapsed_time,
requests.command,
errors.details,
requests.session_id,
(case when requests.resource_class is NULL then 'N/A'
else requests.resource_class end) as resource_class,
(case when resource_waits.concurrency_slots_used is NULL then 'N/A'
else cast(resource_waits.concurrency_slots_used as varchar(10)) end) as concurrency_slots_used
from sys.dm_pdw_exec_requests AS requests
join sys.dm_pdw_exec_sessions AS sessions
on (requests.session_id = sessions.session_id)
left join sys.dm_pdw_errors AS errors
on (requests.error_id = errors.error_id)
left join sys.dm_pdw_resource_waits AS resource_waits
on (requests.resource_class = resource_waits.resource_class)
outer apply (
select count (steps.request_id) as total_steps,
sum (case when steps.status = 'Complete' then 1 else 0 end ) as completed_steps
from sys.dm_pdw_request_steps steps where steps.request_id = requests.request_id
) progress
where requests.start_time >= DATEADD(hour, -24, GETDATE())
ORDER BY requests.total_elapsed_time DESC, requests.start_time DESC
Checkout the resource utilization and possibly other issues from https://portal.azure.com/
You can also run sp_who2 from SSMS to get a snapshot of which threads are active and whether there's some crazy blocking chain causing problems.
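If you prefer the DMVs directly, a minimal sketch that lists whatever is still active; the KILL shown at the end targets a hypothetical session id:
SELECT request_id, session_id, status, command, submit_time
FROM sys.dm_pdw_exec_requests
WHERE status IN ('Running', 'Suspended')
ORDER BY submit_time DESC;
-- if a runaway request is found, its session can be terminated:
-- KILL 'SID1234';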

postgresql insert rules for parallel transactions

We have a PostgreSQL connection pool used by a multithreaded application that permanently inserts records into a big table. So, let's say we have 10 database connections executing the same function, which inserts a record.
The trouble is, we end up with 10 records inserted, whereas it should be only 2-3 records, if only the transactions could see each other's records (our function decides not to insert a record based on the date of the last record found).
We cannot afford to lock the table for the duration of the function.
We have tried different techniques to make the database apply our rules to new records immediately, despite the fact that they are created in parallel transactions, but haven't succeeded yet.
So, I would be very grateful for any help or idea!
To be more specific, here is the code:
CREATE TABLE schm.events ( evtime TIMESTAMP, ref_id INTEGER, param INTEGER, type INTEGER );
record filter rule:
BEGIN
select count(*) into nCnt
from events e
where e.ref_id = ref_id and e.param = param and e.type = type
and e.evtime between (evtime - interval '10 seconds') and (evtime + interval '10 seconds');
if nCnt = 0 then
insert into schm.events values (evtime, ref_id, param, type);
end if;
END;
UPDATE (comment length is not enough unfortunately)
I've applied the unique index solution to production. The results are pretty acceptable, but the initial target has not been achieved.
The issue is that with the unique hash I cannot control the interval between two records whose hash codes fall into consecutive buckets.
Here is the code:
CREATE TABLE schm.events_hash (
hash_code bigint NOT NULL
);
CREATE UNIQUE INDEX ui_events_hash_hash_code ON schm.events_hash
USING btree (hash_code);
--generate the hash code data by partitioning (splitting) evtime into 10-second intervals:
INSERT into schm.events_hash
select distinct ( cast( trunc( extract(epoch from evtime) / 10 ) || cast( ref_id as TEXT) || cast( type as TEXT ) || cast( param as TEXT ) as bigint) )
from schm.events;
--and then in a concurrently executed function I insert sequentially:
begin
INSERT into schm.events_hash values ( cast( trunc( extract(epoch from evtime) / 10 ) || cast( ref_id as TEXT) || cast( type as TEXT ) || cast( param as TEXT ) as bigint) );
insert into schm.events values (evtime, ref_id, param, type);
end;
In that case, if evtime lies within a hash-determined interval, only one record is inserted.
The problem is that records which fall into different intervals can slip through even though they are close to each other (less than a 60-second gap).
insert into schm.events values ( '2013-07-22 19:32:37', '123', '10', '20' ); --inserted, test ok, (trunc( extract(epoch from cast('2013-07-22 19:32:37' as timestamp)) / 10 ) = 137450715 )
insert into schm.events values ( '2013-07-22 19:32:39', '123', '10', '20' ); --filtered out, test ok, (trunc( extract(epoch from cast('2013-07-22 19:32:39' as timestamp)) / 10 ) = 137450715 )
insert into schm.events values ( '2013-07-22 19:32:41', '123', '10', '20' ); --inserted, test fail, (trunc( extract(epoch from cast('2013-07-22 19:32:41' as timestamp)) / 10 ) = 137450716 )
I think there must be a way to modify the hash function to achieve the initial target, but I haven't found it yet. Maybe there are some table constraint expressions that are executed by PostgreSQL itself, outside the transaction?
About your only options are:
Using a unique index with a hack to collapse 20-second ranges to a single value;
Using advisory locking to control communication; or
SERIALIZABLE isolation and intentionally creating a mutual dependency between sessions. Not 100% sure this will be practical in your case.
What you really want is a dirty read, but PostgreSQL does not support dirty reads, so you're kind of stuck there.
You might end up needing a coordinator outside the database to manage your requirements.
Unique index
You can truncate your timestamps for the purpose of uniqueness checking, rounding them to regular boundaries so they jump in 20-second chunks. Then add them to a unique index on (chunk_time_seconds(evtime, 20), ref_id, param, type).
Only one insert will succeed and the rest will fail with an error. You can trap the error in a BEGIN ... EXCEPTION block in PL/PgSQL, or preferably just handle it in the application.
I think a reasonable definition of chunk_time_seconds might be:
CREATE OR REPLACE FUNCTION chunk_time_seconds(t timestamptz, round_seconds integer)
RETURNS bigint
AS $$
SELECT (floor(extract(epoch from t) / round_seconds) * round_seconds)::bigint;
$$ LANGUAGE sql IMMUTABLE;
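A rough sketch of how the pieces could fit together, assuming evtime is stored as timestamptz (so the index expression stays immutable) and that the application treats a duplicate as "already recorded"; the insert_event wrapper is hypothetical:
CREATE UNIQUE INDEX ui_events_dedup
ON schm.events (chunk_time_seconds(evtime, 20), ref_id, param, type);
CREATE OR REPLACE FUNCTION schm.insert_event(p_evtime timestamptz, p_ref_id integer, p_param integer, p_type integer)
RETURNS boolean AS $$
BEGIN
    INSERT INTO schm.events VALUES (p_evtime, p_ref_id, p_param, p_type);
    RETURN true;  -- first writer for this 20-second chunk wins
EXCEPTION WHEN unique_violation THEN
    RETURN false; -- a parallel transaction already inserted a row for this chunk
END;
$$ LANGUAGE plpgsql;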
A starting point for advisory locking:
Advisory locks can be taken on a single bigint or a pair of 32-bit integers. Your key is bigger than that, it's three integers, so you can't directly use the simplest approach of:
IF pg_try_advisory_lock(ref_id, param) THEN
... do insert ...
END IF;
then after 10 seconds, on the same connection (but not necessarily in the same transaction), issue pg_advisory_unlock(ref_id, param).
It won't work because you must also filter on type and there's no three-integer-argument form of pg_advisory_lock. If you can turn param and type into smallints you could:
IF pg_try_advisory_lock(ref_id, (param << 16) + type) THEN
but otherwise you're in a bit of a pickle. You could hash the values, of course, but then you run the (small) risk of incorrectly skipping an insert that should not be skipped in the case of a hash collision. There's no way to trigger a recheck because the conflicting rows aren't visible, so you can't use the usual solution of just comparing rows.
So ... if you can fit the key into 64 bits and your application can deal with the need to hold the lock for 10-20s before releasing it in the same connection, advisory locks will work for you and will be very low overhead.
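A minimal sketch of the lock/unlock pair, using hypothetical literal values and assuming param and type each fit in 16 bits:
SELECT pg_try_advisory_lock(123, (10 << 16) + 20); -- true means this session may insert
-- ... insert into schm.events here if it returned true ...
-- roughly 10-20 seconds later, on the same connection:
SELECT pg_advisory_unlock(123, (10 << 16) + 20);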
