How to insert an array of strings in javascript into PostgreSQL - node.js

I am building an API server which accepts file uploads using multer.
I need to store an array of all the paths to all files uploaded for each request to a column in the PostgreSQL database which I have connected to the server.
Say I have a table created with the following query
CREATE TABLE IF NOT EXISTS records
(
id SERIAL PRIMARY KEY,
created_on TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by INTEGER,
title VARCHAR NOT NULL,
type VARCHAR NOT NULL
)
How do I define a new column filepaths on the above table where I can insert a JavaScript string array (e.g. ['path-to-file-1', 'path-to-file-2', 'path-to-file-3'])?
Also, how do I retrieve and update/edit the list in JavaScript using node-postgres?

You have 2 options:
Use the json or jsonb type. In that case the string to insert will look like:
'["path-to-file-1", "path-to-file-2", "path-to-file-3"]'
I would prefer jsonb - it allows good indexing. json is essentially just text with some additional built-in functions.
Use an array of text - something like filepaths text[]. To insert you can use:
ARRAY ['path-to-file-1', 'path-to-file-2', 'path-to-file-3']
or
'{path-to-file-1,path-to-file-2,path-to-file-3,"path to file 4"}'
You need to use double quotes here only for elements that contain spaces and the like, but feel free to use them for all elements too.
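For the node-postgres part of the question, here is a minimal sketch assuming the second option (a filepaths text[] column added with something like ALTER TABLE records ADD COLUMN filepaths text[]) and a Pool configured via environment variables; the function names are only for illustration. The driver converts between JavaScript arrays and text[] in both directions:
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from PG* environment variables

// Insert: node-postgres serializes a JavaScript string array into a PostgreSQL text[] value
async function addRecord(title, type, paths) {
  const result = await pool.query(
    'INSERT INTO records (title, type, filepaths) VALUES ($1, $2, $3) RETURNING id',
    [title, type, paths]
  );
  return result.rows[0].id;
}

// Retrieve: a text[] column comes back as a plain JavaScript array
async function getFilepaths(id) {
  const { rows } = await pool.query('SELECT filepaths FROM records WHERE id = $1', [id]);
  return rows[0].filepaths;
}

// Update: append one more path with array_append, or pass a whole new array instead
async function addFilepath(id, path) {
  await pool.query('UPDATE records SET filepaths = array_append(filepaths, $2) WHERE id = $1', [id, path]);
}
For the jsonb option you would pass JSON.stringify(paths) as the parameter instead, since the driver otherwise serializes a JavaScript array using the PostgreSQL array syntax.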

You can create a file table that has a path column and a foreign key reference to the record it belongs to. This way you store each path as a plain text column instead of storing an array in a column, which is better practice for relational databases. You'll also be able to store additional information about a file if you need to later. And it is simpler to interact with the file path records, since you add a new file path by just inserting a new row into the file table (with the appropriate foreign key) and remove one by deleting a row from the file table.
For example:
CREATE TABLE IF NOT EXISTS file (
record_id integer NOT NULL REFERENCES records(id) ON DELETE CASCADE,
path text NOT NULL
);
Then to get all the files for a record you can join the two tables together and convert to an array if you want.
For example:
SELECT
records.*,
ARRAY (
SELECT
file.path
FROM
file
WHERE
records.id = file.record_id
) AS file_paths
FROM
records;
Sample input (using only the title field of records):
INSERT INTO records (title) VALUES ('A'), ('B'), ('C');
INSERT INTO file (record_id, path) VALUES (1, 'patha1'), (1, 'patha2'), (1, 'patha3'), (2, 'pathb1');
Sample output:
 id | title |       file_paths
----+-------+------------------------
  1 | A     | {patha1,patha2,patha3}
  2 | B     | {pathb1}
  3 | C     | {}
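To tie this back to node-postgres: with this schema the insert, remove, and retrieve operations are plain queries. A rough sketch, assuming a Pool configured elsewhere (the helper names are just for illustration):
const { Pool } = require('pg');
const pool = new Pool();

// Add a file path to a record: just insert a row into the file table
async function addFilePath(recordId, path) {
  await pool.query('INSERT INTO file (record_id, path) VALUES ($1, $2)', [recordId, path]);
}

// Remove a single file path: delete its row
async function removeFilePath(recordId, path) {
  await pool.query('DELETE FROM file WHERE record_id = $1 AND path = $2', [recordId, path]);
}

// Fetch records together with their file paths as an array (same query as above)
async function getRecordsWithPaths() {
  const { rows } = await pool.query(
    'SELECT records.*, ARRAY(SELECT file.path FROM file WHERE records.id = file.record_id) AS file_paths FROM records'
  );
  return rows; // each row carries a file_paths JavaScript array
}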

Is it possible for CQL to parse a JSON object to insert data?

From what I've seen so far, it seems impossible with Cassandra. But I thought I'd give it a shot:
How can I select a value of a json property, parsed from a json object string, and use it as part of an update / insert statement in Cassandra?
For example, I'm given the json object:
{
  "id": 123,
  "some_string": "hello there",
  "mytimestamp": "2019-09-02T22:02:24.355Z"
}
And this is the table definition:
CREATE TABLE IF NOT EXISTS myspace.mytable (
id text,
data blob,
PRIMARY KEY (id)
);
Now the thing to know at this point is that, for reasons of my own, the data field will be set to the JSON string. In other words, there is no 1:1 mapping between the given JSON and the table columns; rather, the data field contains the JSON object as a kind of blob value.
... Is it possible to parse the timestamp value of the given json object as part of an insert statement?
Pseudo code example of what I mean, which obviously doesn't work ($myJson is a placeholder for the json object string above):
INSERT INTO myspace.mytable (id, data)
VALUES (123, $myJson)
USING timestamp toTimeStamp($myJson.mytimestamp)
The quick answer is no, it's not possible to do that with CQL.
The norm is to parse the elements of the JSON object within your application to extract the corresponding values to construct the CQL statement.
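For example, with the Node.js driver the extraction could look roughly like this (a sketch only: the client configuration, keyspace name, and the myJsonString variable are assumptions, and USING TIMESTAMP takes microseconds since epoch):
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'myspace'
});

// myJsonString is the incoming JSON object as a string (placeholder name)
const obj = JSON.parse(myJsonString);
const writeTimeMicros = new Date(obj.mytimestamp).getTime() * 1000;

client.execute(
  'INSERT INTO mytable (id, data) VALUES (?, ?) USING TIMESTAMP ?',
  [String(obj.id), Buffer.from(myJsonString, 'utf8'), writeTimeMicros],
  { prepare: true }
);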
As a side note, I would discourage using the CQL blob type due to possible performance issues should the blob size exceed 1 MB. If it's JSON, consider storing it as the CQL text type instead. Cheers!
Worth mentioning: CQL can do a limited amount of JSON parsing on its own, albeit not as detailed as what you're asking for here (e.g. USING timestamp).
But something like this works:
> CREATE TABLE myjsontable (
... id TEXT,
... some_string TEXT,
... PRIMARY KEY (id));
> INSERT INTO myjsontable JSON '{"id":"123","some_string":"hello there"}';
> SELECT * FROM myjsontable WHERE id='123';
id | some_string
-----+-------------
123 | hello there
(1 rows)
In your case you'd either have to redesign the table or the JSON payload so that they match. But as Erick and Cédrick have mentioned, the USING timestamp part would have to happen client-side.
What you detailed is doable with Cassandra.
Timestamp
To insert a timestamp in a query, it should be formatted as an ISO 8601 string. Sample examples can be found here. In your code, you might have to convert your incoming value to the expected type and format.
Blob:
Blob is meant to store binary data, so it cannot be dropped ad hoc as a string into a CQL query. (You can use the TEXT type instead if you are willing to base64-encode the data.)
When you need to insert binary data you need to provide the proper type as well. For instance, if you are working with JavaScript you need to provide a Buffer, as described in the documentation. Then, when you execute the query, you externalize your parameters:
const sampleId = '123';                                 // the id column is text
const sampleData = Buffer.from('hello world', 'utf8');  // blob columns expect a Buffer
const sampleTimeStamp = new Date();
// USING TIMESTAMP expects microseconds since epoch, so convert the Date client-side
client.execute(
  'INSERT INTO myspace.mytable (id, data) VALUES (?, ?) USING TIMESTAMP ?',
  [sampleId, sampleData, sampleTimeStamp.getTime() * 1000],
  { prepare: true });

how to concatenate multiple row or column data into one row or column from a text file while importing data into db2 table

For example:
1)File has
ID|Name|job|hobby|salary|hobby2
2)Data:
1|ram|architect|tennis|20000|cricket
1|ram|architect|football|20000|gardening
2|krish|teacher|painting|25000|cooking
3)Table:
Columns in table: ID-Name-Job-Hobby-Salary
Is it possible to load data into table as below:
1-ram-architect-tenniscricketfootballgardening-20000
2-krish-teacher-paintingcooking-25000
Command: db2 "Load CLIENT FROM ABC.FILE of DEL MODIFIED BY coldel0x7x keepblanks REPLACE INTO tablename(ID,Name,Job,Hobby,salary) nonrecoverable"
You cannot achieve what you think you want in a single action with either LOAD CLIENT or IMPORT.
You are asking to denormalize, and I presume you understand the consequences.
Regardless, you can use a multi-step approach: first load/import into a temporary table, then in a second step use SQL to denormalize into the final table, and finally discard the temporary table.
Or, if you are adept with awk and the data file is correctly sorted, you can pre-process the file outside the database before the load/import.
Or use an ETL tool.
You may use the INGEST command instead of LOAD.
You must create the corresponding infrastructure for INGEST beforehand, for example with the following call:
CALL SYSINSTALLOBJECTS('INGEST', 'C', 'USERSPACE1', NULL);
Load your file afterwards with the following command:
INGEST FROM FILE ABC.FILE
FORMAT DELIMITED by '|'
(
$id INTEGER EXTERNAL
, $name CHAR(8)
, $job CHAR(20)
, $hobby CHAR(20)
, $salary INTEGER EXTERNAL
, $hobby2 CHAR(20)
)
MERGE INTO tablename
ON ID = $id
WHEN MATCHED THEN
UPDATE SET hobby = hobby CONCAT $hobby CONCAT $hobby2
WHEN NOT MATCHED THEN
INSERT (ID, NAME, JOB, HOBBY, SALARY) VALUES($id, $name, $job, $hobby CONCAT $hobby2, $salary);

How to insert value in already created Database table through pandas `df.to_sql()`

I'm creating a new table and then inserting values into it: the TSV file doesn't have headers, so I need to create the table structure first and then insert the values. I'm using the df.to_sql function to insert the TSV values into the database table, but it creates the table without inserting any values into it, and it doesn't give any kind of error either.
I have tried to create a new table through SQLAlchemy and insert values, which worked, but it didn't work for an already created table.
import csv
import sys
import pandas as pd
from sqlalchemy import create_engine

conn, cur = create_conn()
engine = create_engine('postgresql://postgres:Shubham#123#localhost:5432/walmart')
create_query = '''create table if not exists new_table(
    "item_id" TEXT, "product_id" TEXT, "abstract_product_id" TEXT,
    "product_name" TEXT, "product_type" TEXT, "ironbank_category" TEXT,
    "primary_shelf" TEXT, "apparel_category" TEXT, "brand" TEXT)'''
cur.execute(create_query)
conn.commit()
file_name = 'new_table'
new_file = "C:\\Users\\shubham.shinde\\Desktop\\wallll\\new_file.txt"
data = pd.read_csv(new_file, delimiter="\t", chunksize=500000, error_bad_lines=False, quoting=csv.QUOTE_NONE, dtype="unicode", iterator=True)
with open(file_name + '_bad_rows.txt', 'w') as f1:
    sys.stderr = f1
    for df in data:
        df.to_sql('new_table', engine, if_exists='append')
    data.close()
I want to insert values from df.to_sql() into database table
Not 100% certain if this argument works with PostgreSQL, but I had a similar issue when doing this on MSSQL. .to_sql() already creates the table named by its first argument, new_table. if_exists='append' also doesn't check for duplicate values: if the data in new_file is overwritten, or run through your function again, it will simply be added to the table. As to why you're seeing the table name but not the data in it, that might be due to the size of the df. Try setting fast_executemany=True as the second argument of create_engine.
My suggestion: get rid of create_query and handle the data types after to_sql(). Once the SQL table is created, you can use your actual SQL table and join against this staging table for duplicate testing. The non-duplicates can be written to the actual table, converting data types on UPDATE to match the table's data type structure.

How to make a lookup-table in cassandra

I want to create a table in Cassandra that is used as a lookup table. I have a lot of URLs in my database and want to store IDs instead of the URL strings. So my approach is to store the URLs in a table with two columns: id (int) and url (text).
My problem is that I need an index on the url field and also on the id field.
The first index is used while processing new URLs (to find the id for a URL in the database), and the second one is used while displaying data (to get the URL for an id).
How can I implement that in cassandra?
I would suggest creating 2 separate tables for this:
CREATE TABLE id_url (id int primary key, url text);
and
CREATE TABLE url_id (url text primary key, id int);
Inserts to these tables should be done with a batch:
BEGIN BATCH
INSERT INTO id_url (id, url) VALUES (1, '<url1>');
INSERT INTO url_id (url, id) VALUES ('<url1>', 1);
APPLY BATCH
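If you are using the Node.js driver, the same pair of inserts can be sent from the application as a single logged batch; a rough sketch (client configuration and keyspace name are assumptions):
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'mykeyspace' // keyspace name is an assumption
});

// A logged batch keeps the two lookup tables in sync: either both inserts apply or neither does
async function storeUrl(id, url) {
  const queries = [
    { query: 'INSERT INTO id_url (id, url) VALUES (?, ?)', params: [id, url] },
    { query: 'INSERT INTO url_id (url, id) VALUES (?, ?)', params: [url, id] }
  ];
  await client.batch(queries, { prepare: true });
}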
You could create your table like this:
CREATE TABLE urls_table(
id int PRIMARY KEY,
url text
);
and then create an index on the second column:
create index urls_table_url on urls_table (url);
Your first query is satisfied since you're querying on the partition key. The second one is satisfied since you created an index on the url column.

COPY FROM CSV with static fields on Postgres

I'd like to switch an existing system that imports data from CSV files into a PostgreSQL 9.5 database over to a more efficient setup.
I'd like to use the COPY statement because of its good performance. The problem is that I need to have one field populated that is not in the CSV file.
Is there a way to have the COPY statement add a static field to all the rows inserted?
The perfect solution would look something like this:
COPY data(field1, field2, field3='Account-005')
FROM '/tmp/Account-005.csv'
WITH DELIMITER ',' CSV HEADER;
Do you know a way to have that field populated in every row?
My server is running Node.js, so I'm open to any cost-efficient solution that completes the files using Node before COPYing them.
Use a temp table to import into. This allows you to:
add/remove/update columns
add extra literal data
delete or ignore records (such as duplicates)
, before inserting the new records into the actual table.
-- target table
CREATE TABLE data
( id SERIAL PRIMARY KEY
, batch_name varchar NOT NULL
, remote_key varchar NOT NULL
, payload varchar
, UNIQUE (batch_name, remote_key)
-- or::
-- , UNIQUE (remote_key)
);
-- temp table
CREATE TEMP TABLE temp_data
( remote_key varchar -- PRIMARY KEY
, payload varchar
);
COPY temp_data(remote_key,payload)
FROM '/tmp/Account-005'
;
-- The actual insert
-- (you could also filter out or handle duplicates here)
INSERT INTO data(batch_name, remote_key, payload)
SELECT 'Account-005', t.remote_key, t.payload
FROM temp_data t
;
BTW It is possible to automate the above: put it into a function (or maybe a prepared statement), using the filename/literal as argument.
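Since your server runs Node.js, that automation can also live in the application. A rough node-postgres sketch of the same flow, assuming the CSV file path is readable by the PostgreSQL server process (COPY ... FROM 'file' runs server-side) and with importBatch as a made-up helper name:
const { Client } = require('pg');

async function importBatch(batchName, serverFilePath) {
  const client = new Client(); // connection settings from PG* environment variables
  await client.connect();
  try {
    await client.query('BEGIN');
    await client.query('CREATE TEMP TABLE temp_data (remote_key varchar, payload varchar)');
    // COPY runs on the server, so serverFilePath must exist on the database host
    // (the path is interpolated for brevity; only pass trusted values here)
    await client.query(`COPY temp_data (remote_key, payload) FROM '${serverFilePath}' WITH DELIMITER ',' CSV HEADER`);
    // The static field is supplied as a parameter of the INSERT ... SELECT
    await client.query(
      `INSERT INTO data (batch_name, remote_key, payload)
       SELECT $1, t.remote_key, t.payload FROM temp_data t`,
      [batchName]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    await client.end();
  }
}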
Set a default for the column:
alter table data
alter column field3 set default 'Account-005'
Do not mention it in the COPY command:
COPY data(field1, field2) FROM...
