Cassandra COPY makes empty string null on reimport

I use the COPY command to take a copy of my data; COPY looks simpler than dealing with SSTables. But it seems it can't import empty strings: columns that are empty in the original table come back as null in the imported one. Steps to reproduce below.
CREATE TABLE empty_example (id bigint PRIMARY KEY, empty_column text, null_column text);
INSERT INTO empty_example (id, empty_column) VALUES ( 1, '');
SELECT * from empty_example ;
 id | empty_column | null_column
----+--------------+-------------
  1 |              |        null
COPY empty_example TO 'empty_example.csv';
TRUNCATE empty_example ;
COPY empty_example FROM 'empty_example.csv';
SELECT * from empty_example ;
 id | empty_column | null_column
----+--------------+-------------
  1 |         null |        null
I tried playing with the WITH options but couldn't solve the issue.
Is it possible to preserve the null/empty-string distinction with COPY?

Which version of Cassandra are you using? Since Cassandra 3.4, the COPY command has a bunch of options to handle empty or null strings:
cqlsh:system_schema> help COPY
COPY [cqlsh only]
COPY x FROM: Imports CSV data into a Cassandra table
COPY x TO: Exports data from a Cassandra table in CSV format.
COPY <table_name> [ ( column [, ...] ) ]
FROM ( '<file_pattern_1, file_pattern_2, ... file_pattern_n>' | STDIN )
[ WITH <option>='value' [AND ...] ];
File patterns are either file names or valid python glob expressions, e.g. *.csv or folder/*.csv.
COPY <table_name> [ ( column [, ...] ) ]
TO ( '<filename>' | STDOUT )
[ WITH <option>='value' [AND ...] ];
Available common COPY options and defaults:
DELIMITER=',' - character that appears between records
QUOTE='"' - quoting character to be used to quote fields
ESCAPE='\' - character to appear before the QUOTE char when quoted
HEADER=false - whether to ignore the first line
NULL='' - string that represents a null value
As you can see, the default NULL='' means that an empty string is treated as a null value. To change this behavior, set NULL='null' or whatever string you want to represent a null value ...
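For example, re-running the reproduction above with an explicit null marker keeps the empty string on reimport (a sketch; it assumes no real value in the table is the literal string 'null'):
COPY empty_example TO 'empty_example.csv' WITH NULL='null';
TRUNCATE empty_example ;
COPY empty_example FROM 'empty_example.csv' WITH NULL='null';
SELECT * from empty_example ;
After this round trip, empty_column should come back as an empty string rather than null.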

Related

How do I escape the ampersand character (&) in cql?

I am inserting a statement into a table that looks something like this:
insert into db.table (field1, field2) values (1, 'eggs&cheese')
but when I later query this on our servers, my query returns:
eggs\u0026cheese instead.
I'm not sure whether to use \ or '.
If anyone can help, that would be great. Thank you!
This doesn't appear to be a problem with CQL but with the way your app displays the value.
For example, if the CQL column type is text, the unicode character is encoded as a UTF-8 string.
Using this example schema:
CREATE TABLE unicodechars (
    id int PRIMARY KEY,
    randomtext text
);
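The ampersand itself needs no escaping on insert; a plain string literal works:
INSERT INTO unicodechars (id, randomtext) VALUES (1, 'eggs&cheese');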
cqlsh displays the ampersand as expected:
cqlsh> SELECT * FROM unicodechars ;
id | randomtext
----+-------------
1 | eggs&cheese

Is it possible to create a PERSISTED column that's made up of an array of specific JSON values and if so how?

Simple Example (json column named data):
{ name: "Jerry", age: 91, mother: "Janet", father: "Eustace" }
Persisted Column Hopeful (assuming json column is called 'data'):
ALTER TABLE tablename ADD parents [ data::$mother, data::$father ] AS PERSISTED JSON;
Expected Output
| data (json) | parents (persisted json) |
| -------------------------------------------------------------- | ------------------------- |
| { name: "Jerry", age: 91, mother: "Janet", father: "Eustace" } | [ "Janet", "Eustace" ] |
| { name: "Eustace", age: 106, mother: "Jane" } | [ "Jane" ] |
| { name: "Jim", age: 54, mother: "Rachael", father: "Dom" } | [ "Rachael", "Dom ] |
| -------------------------------------------------------------- | ------------------------- |
The above doesn't work, but hopefully it conveys what I'm trying to accomplish.
There is no PERSISTED ARRAY data type for columns, but there is a JSON column type that can store arrays.
For example:
-- The existing table
create table tablename (
id int primary key AUTO_INCREMENT
);
-- Add the new JSON column
ALTER TABLE tablename ADD column parents JSON;
-- Insert data into the table
INSERT INTO tablename (parents) VALUES
('[ "Janet", "Eustace" ]'),
('[ "Jane" ]');
-- Select table based on matches in the JSON column
select *
from tablename
where JSON_ARRAY_CONTAINS_STRING(parents, 'Jane');
-- Change data in the JSON column
update tablename
set parents = JSON_ARRAY_PUSH_STRING(parents, 'Jon')
where JSON_ARRAY_CONTAINS_STRING(parents, 'Jane');
-- Show changed data
select *
from tablename
where JSON_ARRAY_CONTAINS_STRING(parents, 'Jane');
Check out more examples of pushing and selecting JSON data in the docs at https://docs.memsql.com/v7.0/concepts/json-guide/
Here is a sample table definition where I do something similar with customer and event:
CREATE TABLE `eventsext2` (
`data` JSON COLLATE utf8_bin DEFAULT NULL,
`memsql_insert_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`customer` as data::$custID PERSISTED text CHARACTER SET utf8 COLLATE utf8_general_ci,
`event` as data::$event PERSISTED text CHARACTER SET utf8 COLLATE utf8_general_ci,
customerevent as concat(data::$custID,", ",data::$event) persisted text,
`generator` as data::$genID PERSISTED text CHARACTER SET utf8 COLLATE utf8_general_ci,
`latitude` as (substr(data::$longlat from (instr(data::$longlat,'|')+1))) PERSISTED decimal(21,18),
`longitude` as (substr(data::$longlat from 1 for (instr(data::$longlat,'|')-1))) PERSISTED decimal(21,18),
`location` as concat('POINT(',latitude,' ',longitude,')') PERSISTED geographypoint,
KEY `memsql_insert_time` (`memsql_insert_time`)
/*!90618 , SHARD KEY () */
) /*!90623 AUTOSTATS_CARDINALITY_MODE=OFF, AUTOSTATS_HISTOGRAM_MODE=OFF */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */;
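Closer to what the question asks for, a persisted JSON column can also be computed from the data column. A rough, untested sketch that reuses the JSON_ARRAY_PUSH_STRING function shown above (it assumes tablename actually has the data JSON column from the question and that both the mother and father keys are present):
ALTER TABLE tablename ADD parents AS
  JSON_ARRAY_PUSH_STRING(JSON_ARRAY_PUSH_STRING('[]', data::$mother), data::$father)
  PERSISTED JSON;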
Though not your question, denormalizing this table into two tables might be a good choice:
create table parents (
  id int primary key auto_increment,
  tablenameid int not null,
  name varchar(20),
  type int not null -- 1=Father, 2=Mother, ideally a foreign key to another table
);
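Querying the parents of a given row is then a plain join; for example:
select p.name, p.type
from tablename t
join parents p on p.tablenameid = t.id
where t.id = 1;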

How to insert an array of strings in javascript into PostgreSQL

I am building an API server which accepts file uploads using multer.
I need to store an array of all the paths to all files uploaded for each request to a column in the PostgreSQL database which I have connected to the server.
Say I have a table created with the following query
CREATE TABLE IF NOT EXISTS records
(
id SERIAL PRIMARY KEY,
created_on TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by INTEGER,
title VARCHAR NOT NULL,
type VARCHAR NOT NULL
)
How do I define a new column filepaths on the above table where I can insert a JavaScript string array (e.g. ['path-to-file-1', 'path-to-file-2', 'path-to-file-3'])?
Also, how do I retrieve and update/edit the list in JavaScript using node-postgres?
You have 2 options:
Use the json or jsonb type. In that case the string to insert will look like:
'["path-to-file-1", "path-to-file-2", "path-to-file-3"]'
I would prefer jsonb: it supports useful indexes, whereas json is essentially just text with some additional built-in functions.
Use an array of text, something like filepaths text[]. To insert you can use:
ARRAY ['path-to-file-1', 'path-to-file-2', 'path-to-file-3']
or
'{path-to-file-1,path-to-file-2,path-to-file-3,"path to file 4"}'
You only need the double quotes here for elements that contain spaces and the like, but feel free to use them for all elements too.
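For example, a minimal sketch of the text[] option (the column name filepaths comes from the question; node-postgres will normally serialize a JavaScript array passed as a query parameter into a Postgres array for a text[] column):
ALTER TABLE records ADD COLUMN filepaths text[];
-- insert a whole array
INSERT INTO records (title, type, filepaths)
VALUES ('demo', 'upload', ARRAY['path-to-file-1', 'path-to-file-2', 'path-to-file-3']);
-- append one more path later
UPDATE records SET filepaths = array_append(filepaths, 'path-to-file-4') WHERE id = 1;
-- read the array back
SELECT filepaths FROM records WHERE id = 1;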
You can create a file table that has a path column and a foreign key reference to the record it belongs to. This way you store each path as a plain text column instead of storing an array in a column, which is better practice for relational databases. You'll also be able to store additional information about a file if you need to later. And it'll be simpler to interact with the file path records, since you add a new file path by just inserting a new row into the file table (with the appropriate foreign key) and remove one by deleting a row from the file table.
For example:
CREATE TABLE IF NOT EXISTS file (
record_id integer NOT NULL REFERENCES records(id) ON DELETE CASCADE,
path text NOT NULL
);
Then to get all the files for a record you can join the two tables together and convert to an array if you want.
For example:
SELECT
    records.*,
    ARRAY(
        SELECT file.path
        FROM file
        WHERE records.id = file.record_id
    ) AS file_paths
FROM records;
Sample input (using only the title field of records):
INSERT INTO records (title) VALUES ('A'), ('B'), ('C');
INSERT INTO file (record_id, path) VALUES (1, 'patha1'), (1, 'patha2'), (1, 'patha3'), (2, 'pathb1');
Sample output:
 id | title |       file_paths
----+-------+------------------------
  1 | A     | {patha1,patha2,patha3}
  2 | B     | {pathb1}
  3 | C     | {}
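Removing a path later is then just a row delete; for example:
DELETE FROM file WHERE record_id = 1 AND path = 'patha2';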

Updating to empty set

I just created a new column for my table
alter table user add (questions set<timeuuid>);
Now the table looks like
user (
google_id text PRIMARY KEY,
date_of_birth timestamp,
display_name text,
joined timestamp,
last_seen timestamp,
points int,
questions set<timeuuid>
)
Then I tried to update all those null values to empty sets, by doing
update user set questions = {} where google_id = ?;
for each google id.
However, they are still null.
How can I fill that column with empty sets?
A set, list, or map needs to have at least one element because an
empty set, list, or map is stored as a null set.
source
Also, this might be helpful if you're using a client (java for instance).
I've learnt that there's not really such a thing as an empty set, or list, etc.
These display as null in cqlsh.
However, you can still add elements to them, e.g.
> select * from id_set;
set_id | set_content
-----------------------+---------------------------------
104649882895086167215 | null
105781005288147046623 | null
> update id_set set set_content = set_content + {'apple','orange'} where set_id = '105781005288147046623';
> select * from id_set;
set_id | set_content
-----------------------+---------------------------------
104649882895086167215 | null
105781005288147046623 | { 'apple', 'orange' }
So even though it displays as null you can think of it as already containing the empty set.
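Conversely, removing the last remaining elements takes the column back to null, since empty collections are not stored; a sketch against the same table:
> update id_set set set_content = set_content - {'apple','orange'} where set_id = '105781005288147046623';
After this, set_content for that row should display as null again.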

COPY FROM CSV with static fields on Postgres

I'd like to switch our current system, which imports data from CSV files into a PostgreSQL 9.5 database, to something more efficient.
I'd like to use the COPY statement because of its good performance. The problem is that I need to have one field populated that is not in the CSV file.
Is there a way to have the COPY statement add a static field to all the rows inserted?
The perfect solution would look something like this:
COPY data(field1, field2, field3='Account-005')
FROM '/tmp/Account-005.csv'
WITH DELIMITER ',' CSV HEADER;
Do you know a way to have that field populated in every row?
My server is running Node.js, so I'm open to any cost-efficient solution that completes the files using Node before COPYing them.
Use a temp table to import into. Before inserting the new records into the actual table, this allows you to:
add/remove/update columns
add extra literal data
delete or ignore records (such as duplicates)
-- target table
CREATE TABLE data
( id SERIAL PRIMARY KEY
, batch_name varchar NOT NULL
, remote_key varchar NOT NULL
, payload varchar
, UNIQUE (batch_name, remote_key)
-- or:
-- , UNIQUE (remote_key)
);
-- temp table
CREATE TEMP TABLE temp_data
( remote_key varchar -- PRIMARY KEY
, payload varchar
);
COPY temp_data(remote_key,payload)
FROM '/tmp/Account-005'
;
-- The actual insert
-- (you could also filter out or handle duplicates here)
INSERT INTO data(batch_name, remote_key, payload)
SELECT 'Account-005', t.remote_key, t.payload
FROM temp_data t
;
BTW, it is possible to automate the above: put it into a function (or maybe a prepared statement) that takes the filename/literal as an argument.
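One possible shape of such a function (a rough sketch; the names are illustrative, COPY does not accept parameters so the statement is built dynamically, and reading a server-side file requires the appropriate privileges):
CREATE OR REPLACE FUNCTION import_batch(p_batch text, p_path text)
RETURNS void AS $$
BEGIN
    CREATE TEMP TABLE IF NOT EXISTS temp_data
    ( remote_key varchar
    , payload varchar
    );
    TRUNCATE temp_data;
    EXECUTE format('COPY temp_data(remote_key, payload) FROM %L WITH (FORMAT csv, HEADER)', p_path);
    INSERT INTO data(batch_name, remote_key, payload)
    SELECT p_batch, t.remote_key, t.payload
    FROM temp_data t;
END;
$$ LANGUAGE plpgsql;
-- e.g. SELECT import_batch('Account-005', '/tmp/Account-005.csv');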
Set a default for the column:
alter table data
alter column field3 set default 'Account-005';
Do not mention it in the COPY command:
COPY data(field1, field2) FROM...
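Since the static value changes from file to file (Account-005, Account-006, ...), the default can simply be reset before each import; a sketch:
alter table data
alter column field3 set default 'Account-006';
COPY data(field1, field2)
FROM '/tmp/Account-006.csv'
WITH DELIMITER ',' CSV HEADER;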
