AWS CLI Scan with placeholder - aws-cli

With this scan I can get 1 item (it's a test item):
aws dynamodb scan --table-name my_table --select "COUNT" \
    --filter-expression "attribute_type(sender.test, :v_sub)" \
    --expression-attribute-values file://expression-attribute-values.json
But when I try to use a placeholder instead of the real path sender.test, I get 0 items. What am I doing wrong?
aws dynamodb scan --table-name my_table --select "COUNT" \
    --filter-expression "attribute_type(#code, :v_sub)" \
    --expression-attribute-names '{"#code": "sender.test"}' \
    --expression-attribute-values file://expression-attribute-values.json
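For reference, expression-attribute-values.json holds the same value that is passed inline further down, so presumably:
{
    ":v_sub": {"S": "N"}
}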

Finally I managed it with the help of a colleague. This is the way:
aws dynamodb scan --table-name my_table --select "COUNT" \
    --filter-expression "attribute_type(#code.#cor, :v_sub)" \
    --expression-attribute-names '{"#code": "sender", "#cor": "custom:myAttribute"}' \
    --expression-attribute-values '{":v_sub":{"S":"N"}}'
I had to build the path with two placeholders, #code.#cor; I had been trying to build it with only one.
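The reason, as far as I understand it: each placeholder is substituted as a single, literal attribute name, so one placeholder mapped to "sender.test" matches a top-level attribute whose name literally contains a dot, not the nested attribute test inside sender. Roughly (illustrative mappings):
--expression-attribute-names '{"#code": "sender.test"}'            # one attribute literally named "sender.test" -> 0 items
--expression-attribute-names '{"#code": "sender", "#cor": "test"}' # document path #code.#cor -> sender.test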

Related

SparkSQL in EMR to fetch Data from AWS Glue (Cross Account) (Permission issue)

I have an EMR cluster on which I am running a SparkSQL job to fetch data from the AWS Glue catalog (S3), and the two are in different accounts.
Related to my earlier post: SparkSQL in EMR to fetch Data from AWS Glue (Cross Account)
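For context, the cross-account wiring on the EMR side is usually done by pointing the metastore client at the other account's Glue catalog. A sketch of the EMR configuration classification I assume is in place (glueAccount is the placeholder account ID used throughout):
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
      "hive.metastore.glue.catalogid": "glueAccount"
    }
  }
]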
My query is something of the form:
SELECT array_join(collect_list(month),'\',\'') mnth FROM ( SELECT DISTINCT regexp_extract(col1, '(\\d+)-(\\d+)-(\\d+)', 2) month FROM (SELECT explode_outer(sequence(CURRENT_DATE, CURRENT_DATE - 15)) AS col1));##var2;
SELECT array_join(collect_list(day),'\',\'') dy FROM ( SELECT DISTINCT regexp_extract(col1, '(\\d+)-(\\d+)-(\\d+)', 3) day FROM (SELECT explode_outer(sequence(CURRENT_DATE, CURRENT_DATE - 15)) AS col1));##var3;
CREATE OR REPLACE VIEW employee AS
SELECT
pay.recordid,
pay.employeeid,
pay.amount,
pay.paycode,
pay.paydate,
pay.paycycle,
pay.updatetime
FROM masterpoc.payrollinputsaudit pay
WHERE
pay.partition_0 in (var1)
and pay.partition_1 in (var2)
and pay.partition_2 in (var3)
and paycode = 'P1'
AND paycycle = 'Monthly'
AND country = 'test'
AND paydate = ( SELECT DISTINCT paydate FROM master-reference_data_atp.paydate
WHERE CURRENT_DATE < DATE(paydate) AND CURRENT_DATE > DATE(payperiodstart)
AND paycycle = 'M' AND iso3 = 'test')
AND amount > 0;
CACHE TABLE employee;
SELECT employeeid,SUM(amount) amount,paydate FROM (
SELECT employeeid,recordid,amount,paydate FROM (
SELECT row_number() over (partition by employeeid,recordid order by updatetime DESC) rank_num,
* FROM employee )
WHERE rank_num = 1)
group by employeeid,paydate
Here I am running into the following issue.
The following permissions have been granted in the Glue catalog settings:
{
  "Version" : "2012-10-17",
  "Statement" : [ {
    "Effect" : "Allow",
    "Principal" : {
      "AWS" : [ "arn:aws:iam::test:role/fades-emr-ec2-instance-profile-role-gamma" ]
    },
    "Action" : [ "glue:GetDatabase", "glue:GetUserDefinedFunctions", "glue:GetTable", "glue:GetPartitions" ],
    "Resource" : [
      "arn:aws:glue:us-east-1:glueAccount:catalog",
      "arn:aws:glue:us-east-1:glueAccount:catalog:database/master",
      "arn:aws:glue:us-east-1:glueAccount:catalog:database/masterpoc",
      "arn:aws:glue:us-east-1:glueAccount:catalog:database/master-reference_data_atp",
      "arn:aws:glue:us-east-1:glueAccount:catalog:database/master-reference_data_atp/*",
      "arn:aws:glue:us-east-1:glueAccount:catalog:table/master/*",
      "arn:aws:glue:us-east-1:glueAccount:catalog:table/masterpoc/*",
      "arn:aws:glue:us-east-1:glueAccount:catalog:database/default",
      "arn:aws:glue:us-east-1:glueAccount:catalog:database/default/*",
      "arn:aws:glue:us-east-1:glueAccount:catalog:table/default/*",
      "arn:aws:glue:us-east-1:glueAccount:catalog:table/master-reference_data_atp/*"
    ]
  } ]
}
Even though the query does not fetch any data from the default database, I still have to add permissions for default; otherwise I get an access denied exception.
Can someone explain?
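Not a definitive answer, but Spark's session catalog typically checks that the default database exists when it initializes, which would explain a glue:GetDatabase call on default even though no query reads from it. One way to confirm which call is being rejected is to replay it with the CLI from an EMR node, using the instance profile's credentials:
aws glue get-database --name default --catalog-id glueAccount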

PostgreSQL double colon to Sequelize query

select column_name::date, count(*) from table_name group by column_name::date
What is the equivalent of this SQL query in Sequelize?
I couldn't find what to do when there is a double colon in a PostgreSQL query.
Thanks to a_horse_with_no_name's comment, I decided to use
sequelize.literal("cast(time_column_name as date)")
in the grouping section, and the final code takes this form:
ModelName.findAndCountAll({
  attributes: [
    [sequelize.literal("cast(time_column_name as date)"), "time_column_name"],
  ],
  group: sequelize.literal("cast(time_column_name as date)"),
})
So it generates two SQL queries (because of the findAndCountAll() function):
SELECT count(*) AS "count"
FROM "table_name"
GROUP BY cast(time_column_name as date);
and
SELECT cast(time_column_name as date) AS "time_column_name"
FROM "table_name"
GROUP BY cast(time_column_name as date);
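For what it's worth, Sequelize also has a built-in cast helper that should render the same CAST(... AS date) expression, so the raw literal string can be avoided. A minimal sketch (same model and column names as above):
ModelName.findAndCountAll({
  attributes: [
    // renders as CAST("time_column_name" AS DATE), which is what :: expands to
    [sequelize.cast(sequelize.col("time_column_name"), "date"), "time_column_name"],
  ],
  group: sequelize.cast(sequelize.col("time_column_name"), "date"),
})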

CQL UPDATE a set<bigint> with join query

I have to delete some data from a table using CQL, based on a condition that fetches data from another table, but I am unable to form the query.
Here are the details of the table from which I need to delete data:
Table Name : xyz_group
Columns : dept_id [int] , sub_id [set<bigint>]
PRIMARY KEY (Partition key) : dept_id
The same sub_id can appear under multiple dept_ids. The data looks something like this:
dept_id | sub_id
-------------------------------
1098 | 345678298, 24579123, 8790455308
2059 | 398534698, 24579123, 8447659928
3467 | 311209878, 24579123, 8790455308, 987654321
I need to remove only 24579123 and 8790455308 from all the rows.
And here is my SELECT query, which fetches from another table abc_list the data that is to be removed from xyz_group:
select sub_id from abc_list where sub_name='XYZ';
The above query gives me the list of sub_ids that I want to remove from xyz_group. So basically I want to update the set by removing those values from it, something like this:
UPDATE xyz_group SET sub_id = sub_id - [ query result from above select query ] WHERE dept_id in (1098, 2059, 3467, ...);
I tried to remove one element from the set, but I am getting the error below:
UPDATE xyz_group SET sub_id = sub_id - [ 24579123 ] WHERE dept_id in (1098, 2059, 3467, ...);
Error : Column sub_id type set<bigint> is not compatible with type list<int>
The tables have around 50k records. Can anyone please help me form a single correct query for the update?
The query below is working for me now:
UPDATE xyz_group SET sub_id = sub_id - { 24579123 } WHERE dept_id in (1098, 2059, 3467, ...);
But this is a two-step process: first collecting the required sub_ids, then running a separate UPDATE query to update the table. I am not able to do it in a single query.
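For the record, since CQL has no joins or server-side subqueries, the two-step approach appears unavoidable; the consolation is that the second step can subtract all collected values in one set literal (braces, not brackets, which is what the type error above was complaining about):
SELECT sub_id FROM abc_list WHERE sub_name = 'XYZ';
UPDATE xyz_group SET sub_id = sub_id - { 24579123, 8790455308 } WHERE dept_id IN (1098, 2059, 3467);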

Yugabyte Row-Level Geo-Partitioning

I wrote a row-level geo-partitioning POC following the guide linked below, but I found that the data in the table is exactly the same on every node. How do I achieve the horizontal split? Thank you!
https://docs.yugabyte.com/latest/explore/multi-region-deployments/row-level-geo-partitioning/
/usr/local/yugabyte-2.9.0.0/bin/yugabyted start \
--base_dir=/home/yugabyte/yugabyte-data \
--listen=192.168.106.34 \
--master_flags "placement_cloud=aws,placement_region=us-east-1,placement_zone=us-east-1a" \
--tserver_flags "placement_cloud=aws,placement_region=us-east-1,placement_zone=us-east-1a" &
/usr/local/yugabyte-2.9.0.0/bin/yugabyted start \
--base_dir=/home/yugabyte/yugabyte-data \
--listen=192.168.106.23 \
--join=192.168.106.34 \
--tserver_flags "placement_cloud=aws,placement_region=us-east-1,placement_zone=us-east-1a"
CREATE TABLE transactions (
user_id INTEGER NOT NULL,
account_id INTEGER NOT NULL,
geo_partition VARCHAR,
account_type VARCHAR NOT NULL,
amount NUMERIC NOT NULL,
txn_type VARCHAR NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (geo_partition);
CREATE TABLESPACE us_central_1_tablespace WITH (
replica_placement='{"num_replicas": 1, "placement_blocks":
[{"cloud":"aws","region":"us-east-1","zone":"us-east-1a","min_num_replicas":1}]}'
);
CREATE TABLESPACE ap_south_1_tablespace WITH (
replica_placement='{"num_replicas": 1, "placement_blocks":
[{"cloud":"cloud1","region":"datacenter1","zone":"rack1","min_num_replicas":1}]}'
);
CREATE TABLE transactions_us
PARTITION OF transactions
(user_id, account_id, geo_partition, account_type,
amount, txn_type, created_at,
PRIMARY KEY (user_id HASH, account_id, geo_partition))
FOR VALUES IN ('US') TABLESPACE us_central_1_tablespace;
CREATE TABLE transactions_default
PARTITION OF transactions
(user_id, account_id, geo_partition, account_type,
amount, txn_type, created_at,
PRIMARY KEY (user_id HASH, account_id, geo_partition))
FOR VALUES IN ('India') TABLESPACE ap_south_1_tablespace;
INSERT INTO transactions
VALUES (200, 20001, 'India', 'savings', 1000, 'credit');
INSERT INTO transactions
VALUES (300, 30001, 'US', 'checking', 105.25, 'debit');
select * from transactions;
select * from transactions_us;
select * from transactions_default;
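One thing worth checking with a setup like this is which tablespace each partition actually landed in; since YSQL is PostgreSQL-compatible, a query along these lines should show it:
SELECT tablename, tablespace FROM pg_tables WHERE tablename LIKE 'transactions%';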

How to use ORDER BY (sorting) on a secondary index using Cassandra DB

My table schema is:
CREATE TABLE users
(user_id BIGINT PRIMARY KEY,
user_name text,
email_ text);
I inserted the rows below into the table.
INSERT INTO users(user_id, email_, user_name)
VALUES(1, 'abc#test.com', 'ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(2, 'abc#test.com', 'ZYX ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(3, 'abc#test.com', 'Test ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(4, 'abc#test.com', 'C ABC');
For searching the user_name column, I created an index so that I can use the LIKE operator with a '%...%' pattern:
CREATE CUSTOM INDEX idx_users_user_name ON users (user_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};
Problem 1
When I execute the query below, it returns only 3 records instead of 4.
select *
from users
where user_name like '%ABC%';
Problem 2
When I use the query below, it gives this error:
ERROR: com.datastax.driver.core.exceptions.InvalidQueryException:
ORDER BY with 2ndary indexes is not supported.
Query:
select *
from users
where user_name like '%ABC%'
ORDER BY user_name ASC;
My requirement is to filter on user_name and order the results by user_name.
The first query does work correctly for me using cassandra:latest, which is currently cassandra:3.11.3. You might want to double-check the inserted data (or just recreate it from scratch using the CQL statements you provided).
The second one gives you enough info: ordering by secondary indexes is not supported in Cassandra. You will have to sort the result set in your application.
That being said, I would not recommend running this setup in real apps. With some additional scale (when you have many records) this will be suicide performance-wise. I won't go into much detail since you may already understand this and SO is not a wiki/documentation site, so here is a link.
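If it helps, a rough sketch of the client-side sort with the DataStax Java driver 3.x (matching the com.datastax.driver.core package in the error; the contact point and keyspace name are illustrative assumptions):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.Comparator;
import java.util.List;

public class SortClientSide {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace"); // assumption: keyspace name
        // Fetch the filtered rows, then order them in the application,
        // since ORDER BY on a secondary index is rejected by Cassandra.
        List<Row> rows = session.execute(
                "SELECT * FROM users WHERE user_name LIKE '%ABC%'").all();
        rows.sort(Comparator.comparing((Row r) -> r.getString("user_name"),
                String.CASE_INSENSITIVE_ORDER));
        rows.forEach(r -> System.out.println(r.getString("user_name")));
        cluster.close();
    }
}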
