Cassandra 2.2.11: add a new map column from a text column

Let's say I have a table with two columns:
primary key: id, type varchar
non-primary-key: data, type text
The data column consists only of JSON values, for example:
{
"name":"John",
"age":30
}
I know that I cannot alter this column to a map type, but maybe I can add a new map column with the values from the data column? Or maybe you have some other idea?
What can I do about it? I want a map column in this table holding the values from data.

You might want to use the cqlsh COPY command to export all your data to a CSV file.
Then alter your table and add a new column of type map.
Convert the exported data into another file containing UPDATE statements that only set the newly created column, with values converted from JSON to a map. For the conversion, use a tool or language of your choice (bash, Python, Perl or whatever).
Be aware that with a map you specify the data type of the map's keys and the data type of its values. So if you want to stay generic, you will most probably be limited to strings only, i.e. a map<text, text>. Consider whether this is appropriate for your use case.
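A minimal Python sketch of the conversion step, assuming the export was done without a header row (for example COPY my_table (id, data) TO 'export.csv';), that every JSON value is a flat object, and that the new column is a map<text, text>. The names my_table and data_map are hypothetical; non-string JSON values are converted with Python's str(), so 30 becomes '30'.

```python
import csv
import io
import json


def cql_string(value):
    """Render a value as a CQL text literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def json_to_cql_map(json_text):
    """Convert a flat JSON object into a CQL map<text, text> literal."""
    obj = json.loads(json_text)
    pairs = ", ".join(f"{cql_string(k)}: {cql_string(v)}" for k, v in obj.items())
    return "{" + pairs + "}"


def csv_to_updates(csv_text, table="my_table", map_column="data_map"):
    """Turn exported (id, data) CSV rows into UPDATE statements.

    table and map_column are hypothetical names; adjust them to your schema.
    """
    statements = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        row_id, data = row
        statements.append(
            f"UPDATE {table} SET {map_column} = {json_to_cql_map(data)} "
            f"WHERE id = {cql_string(row_id)};"
        )
    return statements


if __name__ == "__main__":
    exported = 'abc,"{""name"":""John"",""age"":30}"\n'
    for statement in csv_to_updates(exported):
        print(statement)
        # UPDATE my_table SET data_map = {'name': 'John', 'age': '30'} WHERE id = 'abc';
```

The generated statements can be written to a file and executed with cqlsh -f.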

Related

How to copy Numeric Array from parquet to postgres using Azure data factory

We are trying to copy a Parquet file from Blob storage to a Postgres table. The problem is that the source Parquet file has some columns with number arrays, which ADF complains are not supported. If I change them to a string data type, Postgres says it expects a number array.
Is there some solution or workaround to tackle this?
A workaround is to change the type of those columns from an array type to a string type in your Postgres table. This can be done with the following statement:
ALTER TABLE <table_name> ALTER COLUMN <column_name> TYPE text;
I have taken a sample table player1 consisting of two array columns: position (integer array) and role (text array).
After changing the type of these columns, the table looks like this.
ALTER TABLE player1 ALTER COLUMN position TYPE varchar(40);
ALTER TABLE player1 ALTER COLUMN role TYPE varchar(40);
You can now complete the copy activity in ADF without getting any errors.
If there are existing records, their array values are converted to strings as well, and the copy activity still completes without errors. The following is an example of this case.
Initial table data (array type columns): https://i.stack.imgur.com/O6ErV.png
Convert to String type: https://i.stack.imgur.com/Xy69B.png
After using ADF copy activity: https://i.stack.imgur.com/U8pFg.png
NOTE:
If you have changed the array column to a string type in the source file and can make the lists of values enclosed within {} rather than [], you can convert the column type back to an array type with an ALTER query (using a USING cast, e.g. USING position::integer[]).
If the lists of elements are enclosed within [] and you try to convert the columns back to an array type in your table, Postgres throws the following error:
ERROR: malformed array literal: "[1,1,0]"
DETAIL: Missing "]" after array dimensions.
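If the source values can be rewritten before loading, a small Python helper along these lines could turn a JSON-style list such as [1,1,0] into the PostgreSQL array literal {1,1,0}. This is a sketch with a hypothetical helper name (json_list_to_pg_array), assuming the elements are numbers, strings, booleans, nulls or nested lists; strings are double-quoted with backslashes and quotes escaped, following PostgreSQL's array input syntax.

```python
import json


def render_element(value):
    """Render one JSON value as an element of a PostgreSQL array literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return "true" if value else "false"
    if isinstance(value, list):  # nested list -> nested {...}
        return "{" + ",".join(render_element(v) for v in value) + "}"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)  # ints and floats


def json_list_to_pg_array(text):
    """Convert a JSON list literal like "[1,1,0]" to a Postgres array literal."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError(f"expected a JSON list, got {text!r}")
    return render_element(values)


if __name__ == "__main__":
    print(json_list_to_pg_array("[1,1,0]"))        # {1,1,0}
    print(json_list_to_pg_array("[[1,2],[3,4]]"))  # {{1,2},{3,4}}
```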

Passing the Dataflow Parameter to Sink Key column in Azure Data factory

I want to implement SCD type 2 logic using dynamic tables and dynamic key fields from a config table. My challenge is passing a data flow parameter as the sink key column for my Alter Row transformation: it does not take the parameter value and always fails with an "invalid key column name" error. I tried picking the data flow parameter in the expression builder for the sink key column, passing the value from the Alter Row transformation, and naming the field after the parameter in the Select transformation as well. Any help or suggestions are highly appreciated.
Please see the images below:
Sample of how I want to pass dynamic values in the sink mapping
Trying to give the dynamic value to the key column
You have "List of columns" selected, so ADF is looking for a column in your target table that is literally called "$TargetPK1Parameter".
Change the selector to "Custom expression" and enter a string array parameter. The parameter can be an array of strings that represent names of key columns in your target table.
It should look something like this:
I encountered a similar problem when trying to pass a parameterized composite key as part of the update method to the sink. The following approach allows me to fully parameterise my data flow, and it handles both composite keys and single-column keys.
Here's how the data looks in my config table:
UpsertKeyColumn = DOMNAME,DDLANGUAGE,AS4LOCAL,VALPOS,AS4VERS
A parameter value is set in the dataflow
Upsert_Key_Column = #item().UpsertKeyColumn
Finally, in the Sink settings, Custom Expression is selected for Key columns and the following expression is entered - split($upsert_key_column,',')

Create a Presto table with a column as an Array datatype

How does one create a table in Presto with one of the columns having an Array datatype?
For example:
CREATE TABLE IF NOT EXISTS (ID BIGINT, ARRAY_COL ARRAY)...
Edit
The syntax for array type is array(element_type), like this:
create table memory.default.t (a array(varchar));
Original answer
The syntax for array type is array<element_type>, like this:
create table memory.default.t (a array<varchar>);
Note: the connector in which you create the table must support the array data type; not every connector does.

How to convert int column to float/double column in Cassandra database table

I am using a Cassandra database in production. I have a column in a Cassandra table, e.g. coin_deducted, of int data type.
I need to convert coin_deducted to a float/double data type.
I tried to change the data type with the ALTER TABLE command, but Cassandra reports that int and float are incompatible. Is there any way to do this?
For example, it currently looks like this:
user_id | start_time | coin_deducted (int)
122 | 26-01-01 | 12
I want it to be:
user_id | start_time | coin_deducted (float)
122 | 26-01-01 | 12.0
Is it possible to copy an entire column into a newly added column in the same table?
Changing the type of a column is possible only if the old and new types are compatible. From the documentation:
To change the storage type for a column, the type you are changing to
and from must be compatible.
One more proof that this cannot be done: when you run the statement
ALTER TABLE table_name ALTER int_column TYPE float;
Cassandra tells you that the types are incompatible. This is expected: altering a column type does not rewrite the data already stored, so existing values encoded as ints would have to be read as floats, which use a different binary encoding. Here is a list of compatible types which can be altered one to another without problems.
Solution 1
You can do it at the application level: add one more column of type float to that table, and create a background job that loops through all records and copies the int value into the new float column.
We created a Cassandra migration tool for data and schema migrations for cases like this. You add it as a dependency, write a schema migration that adds the new column, and add a data migration that runs in the background and copies values from the old column to the new one. Here is a link to a Java example application showing its usage.
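A minimal Python sketch of such a background job, assuming a hypothetical table user_coins with PRIMARY KEY (user_id, start_time) and a new column added beforehand with ALTER TABLE user_coins ADD coin_deducted_float float;. The session argument can be any object with an execute(query, parameters) method, for example a Session from the DataStax Python driver (cassandra-driver), whose result sets page through large tables automatically when iterated.

```python
def backfill_float_column(session, table="user_coins"):
    """Copy coin_deducted (int) into coin_deducted_float for every row.

    Assumes the hypothetical schema described above. Returns the number
    of rows that were updated.
    """
    select = f"SELECT user_id, start_time, coin_deducted FROM {table}"
    update = (
        f"UPDATE {table} SET coin_deducted_float = %s "
        "WHERE user_id = %s AND start_time = %s"
    )
    copied = 0
    for row in session.execute(select):  # full table scan, paged by the driver
        user_id, start_time, coins = row
        if coins is None:
            continue  # nothing to copy for this row
        session.execute(update, (float(coins), user_id, start_time))
        copied += 1
    return copied
```

With cassandra-driver this could be called as backfill_float_column(Cluster(['127.0.0.1']).connect('my_keyspace')) after from cassandra.cluster import Cluster; the contact point and keyspace are placeholders. Keep in mind that a full-table scan is expensive on a production cluster.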
Solution 2
If you do not have an application layer and want to do this purely with cqlsh, you can use the COPY command to export the data to CSV, create a new table with a float column, convert the int values in the CSV manually, and import the data into the new table.
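The manual CSV step can also be scripted. A Python sketch, assuming the export was made with something like COPY user_coins (user_id, start_time, coin_deducted) TO 'coins.csv'; (hypothetical table name, no header row), so that coin_deducted is the third column. Empty fields, which represent nulls in the export, are left untouched.

```python
import csv
import io


def ints_to_floats(csv_text, column_index=2):
    """Rewrite an integer CSV column as floats, e.g. 12 -> 12.0."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if row[column_index] != "":  # empty field = null, keep as-is
            row[column_index] = str(float(int(row[column_index])))
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(ints_to_floats("122,26-01-01,12\n"), end="")  # 122,26-01-01,12.0
```

The converted file can then be loaded into the new table with COPY ... FROM.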

Time UUID type in pycassa

I'm having problems using the time_uuid type as a key in my column family. I want to store my records and have them ordered by when they were inserted, and I figured that time_uuid is a good way to go. This is how I've set up my column family:
sys.create_column_family("keyspace", "records", comparator_type=TIME_UUID_TYPE)
When I try to insert, I do this:
q=pycassa.ColumnFamily(pycassa.connect("keyspace"), "records")
myKey=pycassa.util.convert_time_to_uuid(datetime.datetime.utcnow())
q.insert(myKey,{'somedata':'comevalue'})
However, when I insert data, I always get an error:
Argument for a v1 UUID column name or value was neither a UUID, a datetime, or a number.
If I change the comparator_type to UTF8_TYPE, it works, but the items are not returned in the order they should be. What am I doing wrong?
The problem is that in your data model, you are using the time as a row key. Although this is possible, you won't get a meaningful ordering unless you also use the ByteOrderedPartitioner.
For this reason, most people insert time-ordered data using the time as a column name, not a row key. In this model, your insert statement would look like:
q.insert(someKey, {datetime.datetime.utcnow(): 'somevalue'})
where someKey is a key that relates to the entire time series that you're inserting (for example, a username). (Note that you don't have to convert the time to UUID, pycassa does it for you.) To store something more than a single value, use a supercolumn or a composite key.
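To illustrate what converting a datetime to a version-1 (time-based) UUID involves, and why ordering must use the embedded timestamp rather than the UUID's raw bytes or string form, here is a standard-library Python sketch. time_to_uuid1 and uuid1_to_time are hypothetical helpers, not pycassa's convert_time_to_uuid; they use a fixed node and clock sequence for reproducibility. Because the first field of a v1 UUID holds the low 32 bits of the timestamp, plain byte or string order is not chronological in general.

```python
import datetime
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime.datetime(1970, 1, 1)


def time_to_uuid1(dt, node=0x0123456789AB, clock_seq=0):
    """Build a version-1 UUID whose timestamp is the naive UTC datetime dt."""
    delta = dt - UNIX_EPOCH
    ts = ((delta.days * 86400 + delta.seconds) * 10_000_000
          + delta.microseconds * 10 + UUID_EPOCH_OFFSET)
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,  # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                  # clock_seq_low
        node,                              # 48-bit node id
    ))


def uuid1_to_time(u):
    """Recover the datetime from a version-1 UUID's 60-bit timestamp."""
    return UNIX_EPOCH + datetime.timedelta(
        microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)


if __name__ == "__main__":
    base = datetime.datetime(2011, 1, 1)
    times = [base + datetime.timedelta(minutes=m) for m in (0, 5, 10, 15)]
    ids = [time_to_uuid1(t) for t in times]
    # Sort by the embedded timestamp, as a time-UUID comparator does.
    assert sorted(ids, key=lambda u: u.time) == ids
    print([uuid1_to_time(u) == t for u, t in zip(ids, times)])  # [True, True, True, True]
```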
If you really want to store the time in your row keys, then you need to specify key_validation_class, not comparator_type. comparator_type sets the type of the column names, while key_validation_class sets the type of the row keys.
sys.create_column_family("keyspace", "records", key_validation_class=TIME_UUID_TYPE)
Remember the rows will not be sorted unless you also use the ByteOrderedPartitioner.
The comparator for a column family is used for ordering the columns within each row. You are seeing that error because 'somedata' is valid utf-8 but not a valid uuid.
The ordering of the rows stored in cassandra is determined by the partitioner. Most likely you are using RandomPartitioner which distributes load evenly across your cluster but does not allow for meaningful range queries (the rows will be returned in a random order.)
http://wiki.apache.org/cassandra/FAQ#range_rp
