How does CqlConfigHelper.setOutputCql() work? - cassandra

I am following the hadoop_cql3_word_count example in Cassandra and have questions with the following code segment:
String query =
"UPDATE " + KEYSPACE + "." + OUTPUT_COLUMN_FAMILY +
" SET count_num = ? ";
CqlConfigHelper.setOutputCql(job.getConfiguration(), query);
My questions are:
What is the definition of the question mark (i.e., ?) in the above query? Does Cassandra process it in a way such that the question mark is replaced by some value?
If I would like to update multiple columns of a row given its key, how should I modify the above update statement?
Thank you,

The ? represents a slot for a variable in a prepared statement. When your MR job completes the values will be placed into the ?s in order.
If your MR results looked like (key=key1, 1) (key=key2, 2) (key=key3, 3)
Then the statements executed would be
Update Keyspace.columnfamily SET count_num = 1 where key=key1
Update Keyspace.columnfamily SET count_num = 2 where key=key2
Update Keyspace.columnfamily SET count_num = 3 where key=key3
To update multiple columns you just need to write a larger prepared statement and make sure your map reduce job is providing all of the appropriate values.
In the WC example
keys.put("row_id1", ByteBufferUtil.bytes(partitionKeys[0]));
keys.put("row_id2", ByteBufferUtil.bytes(partitionKeys[1]));
...
keys.put("word", ByteBufferUtil.bytes(word.toString()));
variables.add(ByteBufferUtil.bytes(String.valueOf(sum)));
...
context.write(keys, getBindVariables(word, sum));
This makes the reducer output look like
({row_id1=1,row_id2=3,word=pizza},4)
And the prepared statement will be executed like
UPDATE cql3_worldcount.output_words SET count_num = 4 where row_id1=1 AND row_id2=3 AND word=pizza ;
If I wanted a prepared statement with multiple columns it would look like
UPDATE test SET a =?,b=?,c=?,d=? (This gets filled in by the connector: where key=...)
With a real prepared statement we would also fill in the key as well, but here the connector to Cassandra will just use whatever mappings you have in your reducer output.
({key='mykey'},(1,2,3,4))
becomes
UPDATE test SET a =1,b=2,c=3,d=4 where key=mykey
For more information on prepared statements in general check
SO Question about Prepared Statements in CQL

Related

how do I get rid of leading/trailing spaces in SAS search terms?

I have had to look up hundreds (if not thousands) of free-text answers on google, making notes in Excel along the way and inserting SAS-code around the answers as a last step.
The output looks like this:
This output contains an unnecessary number of blank spaces, which seems to confuse SAS's search to the point where the observations can't be properly located.
It works if I manually erase superflous spaces, but that will probably take hours. Is there an automated fix for this, either in SAS or in excel?
I tried using the STRIP-function, to no avail:
else if R_res_ort_txt=strip(" arild ") and R_kom_lan=strip(" skåne ") then R_kommun=strip(" Höganäs " );
If you want to generate a string like:
if R_res_ort_txt="arild" and R_kom_lan="skåne" then R_kommun="Höganäs";
from three variables, let's call them A B C, then just use code like:
string=catx(' ','if R_res_ort_txt=',quote(trim(A))
,'and R_kom_lan=',quote(trim(B))
,'then R_kommun=',quote(trim(C)),';') ;
Or if you are just writing that string to a file just use this PUT statement syntax.
put 'if R_res_ort_txt=' A :$quote. 'and R_kom_lan=' B :$quote.
'then R_kommun=' C :$quote. ';' ;
A saner solution would be to continue using the free-text answers as data and perform your matching criteria for transformations with a left join.
proc import out=answers datafile='my-free-text-answers.xlsx';
data have;
attrib R_res_ort_txt R_kom_lan length=$100;
input R_res_ort_txt ...;
datalines4;
... whatever all those transforms will be performed on...
;;;;
proc sql;
create table want as
select
have.* ,
answers.R_kommun_answer as R_kommun
from
have
left join
answers
on
have.R_res_ort_txt = answers.res_ort_answer
& have.R_kom_lan = abswers.kom_lan_answer
;
I solved this by adding quotes in excel using the flash fill function:
https://www.youtube.com/watch?v=nE65QeDoepc

Update multiple column with bind variable using cx_Oracle.executemany()

I have been trying to update some columns of a database table using cx_Oracle in Python. I created a list named check_to_process which is a result of another sql query. I am creating log_msg in the program based on success or failure and want to update same in the table only for records in check_to_process list. When I update the table without using bind variable <MESSAGE = %s>, it works fine. But when I try to use bind variable to update the columns it gives me error :
cursor.executemany("UPDATE apps.SLCAP_CITI_PAYMENT_BATCH SET MESSAGE = %s, "
TypeError: an integer is required (got type str)
Below is the code, I am using:
import cx_Oracle
connection = cx_Oracle.connect(user=os.environ['ORA_DB_USR'], password=os.environ['ORA_DB_PWD'], dsn=os.environ['ORA_DSN'])
cursor = connection.cursor()
check_to_process = ['ACHRMUS-20-OCT-2021 00:12:57', 'ACHRMUS-12-OCT-2021 16:12:01']
placeholders = ','.join(":x%d" % i for i,_ in enumerate(check_to_process))
log_msg = 'Success'
cursor.executemany("UPDATE apps.SLCAP_CITI_PAYMENT_BATCH SET MESSAGE = %s, "
"PAYMENT_FILE_CREATED_FLAG='N' "
"WHERE PAYMENT_BATCH_NAME = :1",
[(i,) for i in check_to_process], log_msg, arraydmlrowcounts=True)
Many thanks for suggestions and insights!
Your code has an odd mix of string substitution (the %s) and bind variable placeholders (the :1). And odd code that creates bind variable placeholders that aren't used. Passing log_msg the way you do isn't going to work, since executemany() syntax doesn't support string substitution.
You probably want to use some kind of IN list, as shown in the cx_Oracle documentation Binding Multiple Values to a SQL WHERE IN Clause. Various solutions are shown there, depending on the number of values and frequency that the statement will be re-executed.
Use only bind variables. You should be able to use execute() instead of executemany(). Effectively you would do:
cursor.execute("""UPDATE apps.SLCAP_CITI_PAYMENT_BATCH
SET MESSAGE = :1
WHERE PAYMENT_BATCH_NAME IN (something goes here - see the doc)""",
bind_values)
The bottom line is: read the documentation and review examples like batch_errors.py. If you still have problems, refine your question, correct it, and add more detail.

Flag difference in a string variable in SQL

I have data as:
Image of data I have
I want to add flag variables in the data as:
Image of data I want
I have tried the lag function but it didn't work due to the variable being character.
I want to flag any change in string variable.Please help.
I solved this using the query along the lines of:
CREATE TEMP TABLE WANT AS(
SELECT *, CASE WHEN LAG(NAME) OVER(PARTITION BY ID ORDER BY ID) != NAME
THEN 1
ELSE 0
END AS FLAG1
FROM DATA_HAVE
ORDER BY
ID);
No judging, just sharing.

Condition IF In Power Query's Advanced Editor

I have a field named field, and I would like to see if it is null, but I get an error in the query, my code is this:
let
Condition= Excel.CurrentWorkbook(){[Name="test_table"]}[Content],
field= Condition{0}[fieldColumn],
query1="select * from students",
if field <> null then query1=query1 & " where id = '"& field &"',
exec= Oracle.Database("TESTING",[Query=query1])
in
exec
but I get an error in the condition, do you identify the mistake?
I got Expression.SyntaxError: Token Identifier expected.
You need to assign the if line to a variable. Each M line needs to start with an assignment:
let
Condition= Excel.CurrentWorkbook(){[Name="test_table"]}[Content],
field= Condition{0}[fieldColumn],
query1="select * from students",
query2 = if field <> null then query1 & " some stuff" else " some other stuff",
exec= Oracle.Database("TESTING",[Query=query2])
in
exec
In query2 you can build the select statement. I simplified it, because you also have conflicts with the double quotes.
I think you're looking for:
if Not IsNull(field) then ....
Some data types you may have to check using IsEmpty() or 'field is Not Nothing' too. Depending on the datatype and what you are using.
To troubleshoot, it's best to try to set a breakpoint and locate where the error is happening and watch the variable to prevent against that specific value.
To meet this requirement, I would build a fresh Query using the PQ UI to select the students table/view from Oracle, and then use the UI to Filter the [id] column on any value.
Then in the advanced editor I would edit the generated FilteredRows line using code from your Condition + field steps, e.g.
FilteredRows = Table.SelectRows(TESTING_students, each [id] = Excel.CurrentWorkbook(){[Name="test_table"]}{0}[fieldColumn])
This is a minor change from a generated script, rather than trying to write the whole thing from scratch.

Replace empty strings with null values

I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL, and typecast some columns as well. I read through some of the posts and I could not find a query, which would let me do it across all the columns in a single query, without using multiple statements.
Let me know if it is possible for me to iterate across all columns and replace cells with empty strings with null.
Ref: How to convert empty spaces into null values, using SQL Server?
To my knowledge there is no built-in function to replace empty strings across all columns of a table. You can write a plpgsql function to take care of that.
The following function replaces empty strings in all basic character-type columns of a given table with NULL. You can then cast to integer if the remaining strings are valid number literals.
CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- ARRAY of all basic character types
_sql text;
BEGIN
SELECT INTO _sql -- build SQL command
'UPDATE ' || _tbl
|| E'\nSET ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n ,')
|| E'\nWHERE ' || string_agg(col || ' = ''''', ' OR ')
FROM (
SELECT quote_ident(attname) AS col
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND NOT attnotnull -- exclude columns defined NOT NULL!
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) sub;
-- RAISE NOTICE '%', _sql; -- test?
-- Execute
IF _sql IS NULL THEN
updated_rows := 0; -- nothing to update
ELSE
EXECUTE _sql;
GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
END IF;
END
$func$;
Call:
SELECT f_empty2null('mytable');
SELECT f_empty2null('myschema.mytable');
To also get the column name updated_rows:
SELECT * FROM f_empty2null('mytable');
db<>fiddle here
Old sqlfiddle
Major points
Table name has to be valid and visible and the calling user must have all necessary privileges. If any of these conditions are not met, the function will do nothing - i.e. nothing can be destroyed, either. I cast to the object identifier type regclass to make sure of it.
The table name can be supplied as is ('mytable'), then the search_path decides. Or schema-qualified to pick a certain schema ('myschema.mytable').
Query the system catalog to get all (character-type) columns of the table. The provided function uses these basic character types: text, bpchar, varchar, "char". Only relevant columns are processed.
Use quote_ident() or format() to sanitize column names and safeguard against SQLi.
The updated version uses the basic SQL aggregate function string_agg() to build the command string without looping, which is simpler and faster. And more elegant. :)
Has to use dynamic SQL with EXECUTE.
The updated version excludes columns defined NOT NULL and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns.
Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.

Resources