Cassandra: update table set column1 with column2 values

I am trying to update a table so that column1's values are copied from column2 in the same table in Cassandra.
I have tried the statements below, but they throw this error:
no viable alternative at input 'where' (...emp set col1_name = [shape] where...)
UPDATE emp SET col1_name = col2_name WHERE id IN (1,2,3);
UPDATE emp SET col1_name = select(col2_name) WHERE id IN (1,2,3);
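As far as I know, CQL's SET clause only accepts literals or bind markers, not references to other columns, which is why both statements fail. A minimal sketch of the usual workaround is to read the values first and then write them back (the quoted literal below is just a placeholder for a value you read):
-- read the current values of col2_name for the ids in question
SELECT id, col2_name FROM emp WHERE id IN (1, 2, 3);
-- then write each value back into col1_name, one UPDATE per id,
-- either by hand or from a small script using a driver
UPDATE emp SET col1_name = 'valueReadForId1' WHERE id = 1;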

Related

Update Delta table based on condition from another table

I want to update a Delta table based on a condition that matches against another table.
Below is my SQL, which I want to convert into a Delta update:
update emp set empid = (select emp_id from dept where empname="xyz")
where emp.deptid in (select dept_id from emp where emp_dept="IT")
I tried something like this with a merge statement:
emp.merge(dept, "emp.dept_id = dept.merge_id")
.whenMatched()
.updateExpr(Map("emp_id" -> "dept.emp_id"))
.execute()
but now I'm stuck on how to check that the emp_id has the name "xyz", and I also want to apply the dept_id check from the WHERE clause.
Any help would be appreciated.
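For what it's worth, a hedged sketch of one way to fold those extra checks into the merge is to push them into the merge and whenMatched conditions. This assumes a SparkSession named spark, that emp is a registered Delta table, and reuses the column names from the SQL above; it is not an exact translation of the IN subquery:
import io.delta.tables.DeltaTable

// sketch only: emp is assumed to be a Delta table, dept a queryable table
val empTable = DeltaTable.forName(spark, "emp")
val deptDf = spark.table("dept")

empTable.as("emp")
  .merge(
    deptDf.as("dept"),
    // join condition plus the empname = 'xyz' check from the subquery
    "emp.deptid = dept.dept_id AND dept.empname = 'xyz'")
  // extra row-level filter standing in for the IN (...) predicate
  .whenMatched("emp.emp_dept = 'IT'")
  .updateExpr(Map("empid" -> "dept.emp_id"))
  .execute()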

A materialized view in Cassandra 3.11.8 does not show the same number of rows as the base table

I have the following table in Cassandra 3.11.8
create table MyTable (
id int,
farm_id int,
etc....,
primary key (farm_id, id)
);
After populating the table with data (14,273,683 rows):
select count(*)
from MyTable
where farm_id = 1504;
count
20964
Note: there is no row in the table (MyTable) with ID null.
Then I created a materialized view as follows:
create materialized view MyView
as
select id, farm_id
from MyTable
where farm_id = 1504
and id is not null
primary key (id, farm_id);
But when checking the number of rows inside the view, I got the following result:
select count(*) from MyView;
count
10297
I tried many times and the result is the same.
What is happening?
The only difference is the id is not null filter added in the where clause of the view. Maybe you can check how many rows have id = null for the given farm_id in the original table.

Insert new rows, continue existing rowset row_number count

I'm attempting to perform some sort of upsert operation in U-SQL where I pull data every day from a file and compare it with yesterday's data, which is stored in a table in Data Lake Storage.
I have created an ID column in the table in DL using row_number(), and it is this "counter" I wish to continue when appending new rows to the old dataset. E.g.
Last inserted row in DL table could look like this:
ID | Column1 | Column2
---+------------+---------
10 | SomeValue | 1
I want the next rows to have the following ascending IDs:
11 | SomeValue | 1
12 | SomeValue | 1
How would I go about making sure that the next X rows continue the ID count incrementally, such that each new row's ID is 1 more than the last?
You could use ROW_NUMBER then add it to the max value from the original table (i.e. using CROSS JOIN and MAX). A simple demo of the technique:
DECLARE @outputFile string = @"\output\output.csv";

// the existing rows, with the last assigned id
@originalInput =
    SELECT *
    FROM ( VALUES
        ( 10, "SomeValue 1", 1 )
    ) AS x ( id, column1, column2 );

// the new rows that still need ids
@newInput =
    SELECT *
    FROM ( VALUES
        ( "SomeValue 2", 2 ),
        ( "SomeValue 3", 3 )
    ) AS x ( column1, column2 );

// number the new rows on from MAX(id) of the original set and append them
@output =
    SELECT id, column1, column2
    FROM @originalInput
    UNION ALL
    SELECT (int)(x.id + ROW_NUMBER() OVER()) AS id, column1, column2
    FROM @newInput
    CROSS JOIN ( SELECT MAX(id) AS id FROM @originalInput ) AS x;

OUTPUT @output
TO @outputFile
USING Outputters.Csv(outputHeader:true);
You will have to be careful if the original table is empty and add some additional conditions / null checks, but I'll leave that up to you (one possibility is sketched below).
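Purely as an untested sketch, one way to guard that MAX step when @originalInput is empty is to union in a sentinel id of 0 before aggregating, so the CROSS JOIN always sees a row (the rowset names here are hypothetical):
// sketch only: union a sentinel 0 into the ids before taking the max
@idsWithSentinel =
    SELECT id FROM @originalInput
    UNION ALL
    SELECT * FROM ( VALUES ( 0 ) ) AS d ( id );

@maxId =
    SELECT MAX(id) AS id
    FROM @idsWithSentinel;
// then CROSS JOIN @maxId AS x in place of the inline MAX subquery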

Unable to delete data from a Cassandra CF

So I have a CF whose schema looks something like this:
CREATE TABLE "emp" (
id text,
column1 text,
column2 text,
PRIMARY KEY (id, column1, column2)
)
I have an entry which looks like this, and I want to delete it:
20aff8144049 | name | someValue
So I tried this command:
Delete column2 from emp where id='20aff8144049';
It failed with the error below:
no viable alternative at input '20aff8144049' (...column2 from emp where id=["20aff8144049]...)
Can someone help with where I'm going wrong? Thanks!
You can't delete or set null on a primary key column; you have to delete the entire row.
You can only delete an entry using valid values for your primary key. You defined your primary key to include (id, column1, column2), which means that you have to put all the corresponding values in your where clause.
However, I assume you wanted to be able to delete by id only. Therefore, I'd suggest you re-define your column family like this:
CREATE TABLE "emp" (
id text,
column1 text,
column2 text,
PRIMARY KEY ((id), column1, column2)
)
where id is your partition key and column1 and column2 are your clustering columns.
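With that layout, a delete by id alone is a partition-level delete and removes every clustering row under that id, for example:
DELETE FROM emp WHERE id = '20aff8144049';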

Pivoting in Pig

This is related to the question in Pivot table with Apache Pig.
I have the input data as
Id Name Value
1 Column1 Row11
1 Column2 Row12
1 Column3 Row13
2 Column1 Row21
2 Column2 Row22
2 Column3 Row23
and want to pivot and get the output as
Id Column1 Column2 Column3
1 Row11 Row12 Row13
2 Row21 Row22 Row23
Please let me know how to do it in Pig.
The simplest way to do it without a UDF is to group on Id and then, in a nested foreach, select the rows for each of the column names and join them in the generate. See the script:
-- load the (Id, Name, Value) triples
inpt = load '~/rows_to_cols.txt' as (Id : chararray, Name : chararray, Value: chararray);
grp = group inpt by Id;
maps = foreach grp {
    -- inside the nested foreach, pick out the rows for each column name
    col1 = filter inpt by Name == 'Column1';
    col2 = filter inpt by Name == 'Column2';
    col3 = filter inpt by Name == 'Column3';
    -- one output tuple per Id, with the three values side by side
    generate flatten(group) as Id, flatten(col1.Value) as Column1, flatten(col2.Value) as Column2, flatten(col3.Value) as Column3;
};
Output:
(1,Row11,Row12,Row13)
(2,Row21,Row22,Row23)
Another option would be to write a UDF which converts a bag{name, value} into a map[], then get the values by using column names as keys (e.g. vals#'Column1').
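A hedged sketch of how that UDF-based variant might be wired up (the jar and the BagToMap UDF are hypothetical and would still have to be written):
-- sketch only: myudfs.jar / myudfs.BagToMap do not exist out of the box
REGISTER myudfs.jar;
inpt = load '~/rows_to_cols.txt' as (Id : chararray, Name : chararray, Value: chararray);
grp = group inpt by Id;
vals_by_id = foreach grp generate group as Id, myudfs.BagToMap(inpt.(Name, Value)) as vals;
pivoted = foreach vals_by_id generate Id, vals#'Column1' as Column1, vals#'Column2' as Column2, vals#'Column3' as Column3;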
Not sure about Pig, but in Spark you could do this with a one-line command:
df.groupBy("Id").pivot("Name").agg(first("Value"))
