how can we page results from a select query in cassandra

I have a Cassandra table that has more than 2 million rows. I need to fetch my results and page them.
How can I page my results from the select query?
I am getting an RPC timeout when I try to retrieve 1M rows.

From the cqlsh command prompt, one way to do this is by restricting your hashed partition key values via the token function. Let's say that I have a table that keeps track of ship crew members (with crewname as my partition key):
aploetz#cqlsh:presentation> SELECT crewname,token(crewname),firstname,lastname
FROM crew;
 crewname | token(crewname)      | firstname | lastname
----------+----------------------+-----------+-----------
 Simon    | -8694467316808994943 | Simon     | Tam
 Jayne    | -3415298744707363779 | Jayne     | Cobb
 Wash     |   596395343680995623 | Hoban     | Washburne
 Mal      |  4016264465811926804 | Malcolm   | Reynolds
 Zoey     |  7853923060445977899 | Zoey      | Washburne
 Sheppard |  8386579365973272775 | Derial    | Book
(6 rows)
If I just want to bring back all the crew members from Jayne to Zoey (inclusive), I can run a query like this:
aploetz#cqlsh:presentation> SELECT crewname,token(crewname),firstname,lastname
FROM crew WHERE token(crewname) >= token('Jayne') AND token(crewname) <= token('Zoey');
 crewname | token(crewname)      | firstname | lastname
----------+----------------------+-----------+-----------
 Jayne    | -3415298744707363779 | Jayne     | Cobb
 Wash     |   596395343680995623 | Hoban     | Washburne
 Mal      |  4016264465811926804 | Malcolm   | Reynolds
 Zoey     |  7853923060445977899 | Zoey      | Washburne
(4 rows)
You should be able to do something similar with your partition key values as well.
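To walk an entire table in pages this way, remember the last partition key you read and restart just past its token, using LIMIT as the page size. A minimal sketch against the crew table above (page size of 3 for illustration):
-- first page
SELECT crewname, token(crewname), firstname, lastname
FROM crew LIMIT 3;
-- next page: restart just past the last key read ('Wash' here)
SELECT crewname, token(crewname), firstname, lastname
FROM crew WHERE token(crewname) > token('Wash') LIMIT 3;
Repeat until a page comes back with fewer rows than the limit.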
Otherwise, you could accomplish this using one of the drivers. In her article Things You Should Be Doing When Using Cassandra Drivers, DataStax's Rebecca Mills describes how to page through large result sets using setFetchSize (her example, lightly adapted, is below):
Statement stmt = new SimpleStatement(
    "SELECT * FROM raw_weather_data WHERE wsid = '725474:99999' AND year = 2005 AND month = 6");
stmt.setFetchSize(24);  // rows per page pulled from the cluster
ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    // Iterating alone is enough: the driver fetches new pages transparently.
    // Optionally pre-fetch the next page in the background before this one runs out.
    if (rs.getAvailableWithoutFetching() == 24 && !rs.isFullyFetched()) {
        rs.fetchMoreResults();
    }
    System.out.println(row);
}

Related

Oracle: update table where number column is in a string variable

Here is what I want to do:
current table:
+----+-------+
| id | data  |
+----+-------+
| 1  | max   |
| 2  | linda |
| 3  | sam   |
| 4  | henry |
+----+-------+
I have an id_str = '1,3,4'.
Mystery query - something like:
UPDATE table SET data = 'jen' where id in (id_str)
resulting table:
+----+-------+
| id | data  |
+----+-------+
| 1  | jen   |
| 2  | linda |
| 3  | jen   |
| 4  | jen   |
+----+-------+
Starting from a list of ids given as a CSV string, say :id_str, you can do:
update mytable
set data = 'jen'
where ',' || :id_str || ',' like '%,' || id || ',%'
An alternative is a regexp function:
where regexp_like(:id_str, '(^|,)' || id || '(,|$)')
Both solutions work, but are rather inefficient. A much better solution would be to pass the search parameters as a proper list of values rather than as a CSV string.
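In Oracle you can, for instance, bind a collection and expand it with TABLE() - a sketch using the built-in SYS.ODCINUMBERLIST nested-table type, with literal values standing in for a bound collection:
update mytable
set data = 'jen'
where id in (select column_value
             from table(sys.odcinumberlist(1, 3, 4)))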

Cross-referencing values from a reference table with fuzzy inputs

I've got a Microsoft Access database with several tables. I've thrown 2 of those into an Excel file to simplify my work, but either an Access or an Excel solution can be used for this. Below are examples of the data that needs to be manipulated; the real records contain a lot of other columns and information.
I've got Table 1 (Input Table):
| Bank | Reference |
|-----------------|-----------|
| Chase Bank LLC | |
| JPMorgan Chase | |
| Chase | |
| Bank of America | |
| Bank of America | |
| Wells Fargo | |
The Reference column is empty. I want to fill it based on the reference table, which contains the IDs that would go into the Reference column.
Table 2 (Reference Table):
| Bank | ID |
|-----------------|-----------|
| Chase Bank | 1 |
| Bank of America | 2 |
| Wells Fargo | 3 |
So the solution would fill the "Reference" column like this:
| Bank | Reference |
|-----------------|-----------|
| Chase Bank LLC | 1 |
| JPMorgan Chase | 1 |
| Chase | 1 |
| Bank of America | 2 |
| Bank of America | 2 |
| Wells Fargo | 3 |
Since this is taken from a database's table, these aren't really ordered records. The purpose of this is to create a relationship in an already-existing database that didn't have those relationships set up.
A join between the two text fields in an Update query will write the ID for those records that match exactly.
There is no built-in technology/option for the non-matching records; you can only apply some creative designs. For instance, "Chase Bank LLC" matches "Chase Bank" on the first 10 characters. So for the unmatched records you could set up a temp table with a new field defined as Left(fieldname, 10), join on this new field to get the ID into the temp table, and then run a second Update query to move the ID into the final table, joining on the full name.
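A sketch of the two passes in Access SQL (the table and field names InputTable(Bank, Reference) and RefTable(Bank, ID) are assumptions, not taken from the actual database):
UPDATE InputTable INNER JOIN RefTable
ON InputTable.Bank = RefTable.Bank
SET InputTable.Reference = RefTable.ID;
Then a cruder pass on the first 10 characters for whatever is still unmatched:
UPDATE InputTable INNER JOIN RefTable
ON Left(InputTable.Bank, 10) = Left(RefTable.Bank, 10)
SET InputTable.Reference = RefTable.ID
WHERE InputTable.Reference IS NULL;
Note this still won't catch "JPMorgan Chase" or "Chase"; those need manual rules or fuzzier matching.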

add uniqueness to a table column just for some cases in mysql using knex

I'm using MySQL. I want a column to have unique values just in some cases.
For example, the table can have the following values:
+----+---------+------+------+
| id | user_id | col1 | col2 |
+----+---------+------+------+
| 1  | 2       | no   | no   |
| 2  | 2       | no   | no   |
| 3  | 3       | no   | yes  |
| 4  | 2       | yes  | no   |
| 5  | 2       | no   | yes  |
+----+---------+------+------+
I want the no|no combination to be able to repeat for the same user, but not the yes|no combination. Is this possible in MySQL? And with knex?
My migration for that table looks like this:
return knex.schema.createTable('myTable', table => {
  table.increments('id').unsigned().primary();
  table.integer('user_id').unsigned().notNullable()
    .references('id').inTable('table_user').onDelete('CASCADE').index();
  table.string('col1').defaultTo('yes');
  table.string('col2').defaultTo('no');
});
That doesn't seem to be an easy task. You would need a partial unique index over multiple columns, and MySQL does not appear to support partial indexes (https://dev.mysql.com/doc/refman/8.0/en/create-index.html).
You could do something like what is described here, but using triggers for that seems a bit overkill: https://dba.stackexchange.com/questions/41030/creating-a-partial-unique-constraint-for-mysql
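One trigger-free workaround, assuming MySQL 5.7+ and the table above: add a generated column that is non-NULL only for the yes|no combination, and put a unique index on (user_id, that column). MySQL unique indexes allow any number of NULLs, so other combinations never conflict. A sketch (the column and index names are made up):
ALTER TABLE myTable
  ADD COLUMN yes_no_marker CHAR(1)
    GENERATED ALWAYS AS (IF(col1 = 'yes' AND col2 = 'no', 'x', NULL)) STORED,
  ADD UNIQUE KEY uq_user_yes_no (user_id, yes_no_marker);
In a knex migration you could issue this statement with knex.schema.raw(...).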

What algorithm to use to exchange data between multiple parties

Let's say there are Alice, Bob, Eve and Arbitrator.
And let's say
Alice has a table of records
| id | pet type | birth date |
|----|----------|------------|
| 1  | cat      | 2010-03-03 |
| 2  | dog      | 2011-06-12 |
Bob has a table of records
| id | pet type | color |
|----|----------|-------|
| 2  | dog      | white |
| 3  | bird     | green |
Eve has a table of records
| id | pet type | size  |
|----|----------|-------|
| 1  | cat      | small |
| 3  | bird     | small |
Now everyone wants to enrich their own data with the neighbors' data with the corresponding id, but without disclosing this id. For example,
Alice wants her data to be like the following
| id | pet type | birth date | color | size  |
|----|----------|------------|-------|-------|
| 1  | cat      | 2010-03-03 |       | small |
| 2  | dog      | 2011-06-12 | white |       |
Bob wants his data to be like the following
| id | pet type | birth date | color | size  |
|----|----------|------------|-------|-------|
| 2  | dog      | 2011-06-12 | white |       |
| 3  | bird     |            | green | small |
and so on.
The arbitrator coordinates all the exchange operations between the parties and matches the data using the corresponding encrypted id fields from each party's dataset, so the parties must communicate through the arbitrator, not directly with each other.
The arbitrator must also be able to ensure that
hash(Alice's id = 2) = hash(Bob's id = 2), hash(Bob's id = 3) = hash(Eve's id = 3)
and so on, but must not be able to recover the original identifiers, and must not be able to brute-force the encrypted identifiers (so if we are talking about some kind of hashes, they must be salted).
To simplify things for Alice, Bob and Eve, each of them would like to use only a single key to encrypt their own identifiers, but this key should be different for each party, i.e.
F1(alice_key(alice_id)) = F2(bob_key(bob_id)) = F3(eve_key(eve_id))
where F1, F2, F3 are some functions the arbitrator applies to the encrypted identifiers of Alice, Bob and Eve; these functions do not decrypt the original identifiers, but map the encrypted identifiers to the same value.
So the question: is there any algorithm that can help solve such a problem?

How to pivot data using Informatica when you have a variable number of pivot rows?

Based on my earlier questions, how can I pivot data using Informatica PowerCenter Designer when I have a variable number of addresses in my data? I would like to pivot, e.g., four addresses from my data. This is the structure of the source data file:
+---------+--------------+-----------------+
| ADDR_ID | NAME | ADDRESS |
+---------+--------------+-----------------+
| 1 | John Smith | JohnsAddress1 |
| 1 | John Smith | JohnsAddress2 |
| 1 | John Smith | JohnsAddress3 |
| 2 | Adrian Smith | AdriansAddress1 |
| 2 | Adrian Smith | AdriansAddress2 |
| 3 | Ivar Smith | IvarAddress1 |
+---------+--------------+-----------------+
And this should be the resulting table:
+---------+--------------+-----------------+-----------------+---------------+----------+
| ADDR_ID | NAME | ADDRESS1 | ADDRESS2 | ADDRESS3 | ADDRESS4 |
+---------+--------------+-----------------+-----------------+---------------+----------+
| 1 | John Smith | JohnsAddress1 | JohnsAddress2 | JohnsAddress3 | NULL |
| 2 | Adrian Smith | AdriansAddress1 | AdriansAddress2 | NULL | NULL |
| 3 | Ivar Smith | IvarAddress1 | NULL | NULL | NULL |
+---------+--------------+-----------------+-----------------+---------------+----------+
I guess I can use
SOURCE --> SOURCE_QUALIFIER --> SORTER --> AGGREGATOR --> EXPRESSION --> TARGET TABLE
But what kind of ports should I use in the AGGREGATOR and EXPRESSION transformations?
You should use something along the lines of this:
Source->Expression->Aggregator->Target
In the Expression, add a variable port:
v_count expr: IIF(ISNULL(v_COUNT) OR v_COUNT=3, 1, v_COUNT + 1)
or, with a second variable port v_PREVIOUS_ADDR_ID (defined after v_count and simply storing ADDR_ID, so it still holds the previous row's key when v_count is evaluated):
v_count expr: IIF(ADDR_ID=v_PREVIOUS_ADDR_ID, v_COUNT + 1, 1)
And 3 output ports:
o_addr1 expr: DECODE(TRUE, v_COUNT=1, ADDR_IN, NULL)
o_addr2 expr: DECODE(TRUE, v_COUNT=2, ADDR_IN, NULL)
o_addr3 expr: DECODE(TRUE, v_COUNT=3, ADDR_IN, NULL)
Then use the Aggregator: group by ADDR_ID and always take the MAX,
e.g.
agg_addr1: expr: MAX(O_ADDR1)
agg_addr2: expr: MAX(O_ADDR2)
agg_addr3: expr: MAX(O_ADDR3)
If you need more denormalized ports, add additional output ports and adjust the reset
condition of the v_count variable (the =3 above) accordingly.
Try this:
SOURCE --> SOURCE_QUALIFIER --> RANK --> AGGREGATOR --> TARGET
In the RANK transformation, group by ADDR_ID and select ADDRESS as the rank port. On the Properties tab, set Number of Ranks to 4.
In the AGGREGATOR transformation, group by ADDR_ID and use the following output port expressions (RANKINDEX is generated by the RANK transformation):
ADDRESS1 = MAX(ADDRESS,RANKINDEX=1)
ADDRESS2 = MAX(ADDRESS,RANKINDEX=2)
ADDRESS3 = MAX(ADDRESS,RANKINDEX=3)
ADDRESS4 = MAX(ADDRESS,RANKINDEX=4)
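If the source were a relational table rather than a flat file, a further option would be to pivot in a SQL override in the Source Qualifier. A sketch, assuming a database with window functions and a source table named addresses (both assumptions, not from the question):
SELECT addr_id,
       name,
       MAX(CASE WHEN rn = 1 THEN address END) AS address1,
       MAX(CASE WHEN rn = 2 THEN address END) AS address2,
       MAX(CASE WHEN rn = 3 THEN address END) AS address3,
       MAX(CASE WHEN rn = 4 THEN address END) AS address4
FROM (SELECT addr_id, name, address,
             ROW_NUMBER() OVER (PARTITION BY addr_id ORDER BY address) AS rn
      FROM addresses) t
GROUP BY addr_id, name;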
