Range filters on Google BigTable - node.js

I am currently testing out Bigtable to see if it is something we will use.
We currently use Cloud SQL with Postgres 9.6, with the following schema:
id, sensor_id, time, value
Most of our queries fetch data within a time range, something like this:
SELECT *
FROM readings
WHERE sensor_id IN (7297, 7298, 7299, 7300)
  AND time BETWEEN '2018-07-15 00:00:00' AND '2019-07-15 00:00:00'
ORDER BY time, sensor_id
Each sensor can have a reading every 10 minutes or so, so that's a fair bit of data.
At last check we had 2 billion records, and that number is increasing a lot each day.
For Bigtable I am importing with a row key of
readings#timestamp#sensorId, so something like this: readings#20180715000000#7297
So far, so good.
To query a range (using Node) I am doing this:
const fromDate = '20180715000000'
const toDate = '20190715000000'
const ranges = sensorIds.map(sensorId => {
  return {
    start: `readings#${fromDate}#${sensorId}`,
    end: `readings#${toDate}#${sensorId}`,
  }
});

const results = [];

await table.createReadStream({
  column: {
    cellLimit: 1,
  },
  ranges
})
  .on('error', err => {
    console.log(err);
  })
  .on('data', row => {
    results.push({
      id: row.id,
      data: row.data
    })
  })
  .on('end', async () => {
    console.log(` ${results.length} Rows`)
  })
My understanding was that this would return results similar to the SQL query above, but it seems to return rows for all sensor IDs across the date range, not just the ones specified in the query.
My questions:
Is this the correct row key that we should be using for this type of querying?
If it is, can we filter per range? Or is there a filter we have to use to only return the values for the given date range and sensor ID range?
Thanks in advance for your advice.

The problem is that you are setting up your ranges variable incorrectly, and Bigtable is getting lost because of that. Try the following:
const fromDate = '20180715000000'
const toDate = '20190715000000'
const sensorId = sensorIds[0]
const filter = {
  column: {
    cellLimit: 1,
  },
  value: {
    start: `readings#${fromDate}#${sensorId}`,
    end: `readings#${toDate}#${sensorId}`,
  }
};

const results = [];

await table.createReadStream({
  filter
})
  .on('error', err => {
    console.log(err);
  })
  .on('data', row => {
    results.push({
      id: row.id,
      data: row.data
    })
  })
  .on('end', async () => {
    console.log(` ${results.length} Rows`)
  })
NOTE: I am getting the first position of sensorIds, which I assume is a list of all the IDs, but you can select any of them. Also, this is all untested, but it should be a good starting point for you.
You can find snippets on the usage of the Node.js client for Bigtable in this GitHub repo.
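Adding to the answer above: since the row key here is readings#timestamp#sensorId, another option (an untested sketch, not part of the answer) is to scan the whole time range once and add a row key regex filter so that only the requested sensor IDs come back. The cellLimit and result handling are kept from the question; the regex and range bounds are assumptions you would want to verify against your data.

const fromDate = '20180715000000'
const toDate = '20190715000000'

// sensorIds is assumed to be the same array of IDs used in the question.
// The regex keeps only rows whose key ends with one of those sensor IDs.
const sensorIdPattern = new RegExp(`.*#(${sensorIds.join('|')})$`);

const results = [];

await table.createReadStream({
  ranges: [{
    start: `readings#${fromDate}`,
    end: `readings#${toDate}`, // adjust if rows at exactly toDate must be included
  }],
  filter: [
    { row: { key: sensorIdPattern } }, // row key regex filter on the sensor ID suffix
    { column: { cellLimit: 1 } },
  ],
})
  .on('error', err => {
    console.log(err);
  })
  .on('data', row => {
    results.push({
      id: row.id,
      data: row.data
    })
  })
  .on('end', () => {
    console.log(` ${results.length} Rows`)
  })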

Related

Multiple queries on Firestore CollectionReference and QuerySnapshot: cloud functions in node.js

Using Cloud Functions and Node.js, I have a Firestore collection that I am querying and then selecting one random document from the returned documents. The problem is that I can't seem to query the QuerySnapshot that is returned from the initial query, and I can't think of another way of going about it. For context, what I want to do is get the server time and subtract 30 days from it. I then want to query the available-players collection for all player documents whose last_used field (which contains a timestamp) is more than 30 days old, and then run some logic (which I know works independently, so no need to show it all here).
const availablePlayers = db.collection("available-players");
const now = admin.firestore.Timestamp.now();
const intervalInMillis = 30 * 24 * 60 * 60 * 1000;
const cutoffTime = admin.firestore.Timestamp.fromMillis(now.toMillis() - intervalInMillis);
const key = availablePlayers.doc().id;

// query the last_used field
const returnedPlayers = availablePlayers.where("last_used", "<=", cutoffTime);

// this now doesn't work
// it does work if I don't run the above query and just query availablePlayers
returnedPlayers.where(admin.firestore.FieldPath.documentId(), '>=', key).limit(1).get()
  .then(snapshot => {
    if (snapshot.size > 0) {
      snapshot.forEach(doc => {
        // do some stuff
      });
    }
    else {
      const player = returnedPlayers.where(admin.firestore.FieldPath.documentId(), '<', key).limit(1).get()
        .then(snapshot => {
          snapshot.forEach(doc => {
            // do some stuff
          });
        })
        .catch(err => {
          console.log('Error getting documents', err);
        });
    }
  })
  .catch(err => {
    console.log('Error getting documents', err);
  });
The idea is that I want to run the query, get the returned documents meeting the time criteria, generate a key from these, then use the key to select a random document. Is this because I am trying to query a QuerySnapshot, whereas just querying availablePlayers works because I'm querying a CollectionReference? How do I get around this? Any help would be greatly appreciated!
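For reference, here is a minimal, untested sketch of the flow described above. It sidesteps the second documentId() range query by picking the random document in code from the time-filtered snapshot (a different technique from the random-key trick), and it passes last_used to where() as a string, which it has to be. Collection and field names are taken from the question; the rest is an assumption.

const availablePlayers = db.collection("available-players");
const now = admin.firestore.Timestamp.now();
const intervalInMillis = 30 * 24 * 60 * 60 * 1000;
const cutoffTime = admin.firestore.Timestamp.fromMillis(now.toMillis() - intervalInMillis);

// Note the quotes: the field name must be a string (or a FieldPath).
availablePlayers.where("last_used", "<=", cutoffTime).get()
  .then(snapshot => {
    if (snapshot.empty) {
      console.log('No players older than 30 days');
      return;
    }
    // Pick one random document from the returned set instead of running a
    // second documentId() range query against the filtered query.
    const docs = snapshot.docs;
    const player = docs[Math.floor(Math.random() * docs.length)];
    // do some stuff with player.data()
  })
  .catch(err => {
    console.log('Error getting documents', err);
  });

The trade-off is that this reads every document matching the time filter, which is fine for modest collections but not for very large ones.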

Node.js SQLite query returns different values than direct SQL on the database

I'm a bit puzzled by the situation I have now.
I have a simple SQL statement that I execute from Node.js on an SQLite database. The SQL statement returns values with many decimal places, although my data only contains two decimals.
When I run the exact same query in DB Browser for SQLite, I get the correct result.
My Node.js code:
app.get('/payerComparison/', (req, res) => {
  // Returns labels and values within response
  var response = {};

  let db = new sqlite3.Database('./Spending.db', sqlite3.OPEN_READONLY, (err) => {
    if (err) { console.log(err.message); return }
  });

  response['labels'] = [];
  response['data'] = [];

  db.each("SELECT payer, sum(amount) AS sum FROM tickets GROUP BY payer", (err, row) => {
    if (err) { console.log(err.message); return }
    response['labels'].push(row.payer);
    response['data'].push(row.sum);
  });

  db.close((err) => {
    if (err) { console.log(err.message); return }
    // Send data
    console.log(response);
    res.send(JSON.stringify(response));
  });
})
What I have in the command line
{
  labels: [ 'Aurélien', 'Commun', 'GFIS', 'Pauline' ],
  data: [ 124128.26, 136426.43000000008, 5512.180000000001, 39666.93 ]
}
The result in DB Browser
I hope you can help me clarify this mystery!
Thank you
Round the values to 2 decimals :).
SELECT payer, round(sum(amount),2) AS sum FROM tickets GROUP BY payer
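The long decimals are a classic floating-point artifact of summing REAL values; DB Browser most likely just displays the result rounded. If you would rather keep the SQL untouched, you can also round on the Node side; an untested variation of the db.each handler above:

db.each("SELECT payer, sum(amount) AS sum FROM tickets GROUP BY payer", (err, row) => {
  if (err) { console.log(err.message); return }
  response['labels'].push(row.payer);
  // Round the floating-point sum to 2 decimals before sending it to the client.
  response['data'].push(Math.round(row.sum * 100) / 100);
});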

How to bulk insert in psql using knex.js?

I've searched a lot, but what I found seems deprecated.
I'm trying to bulk insert into a table.
My approach was like this:
knex('test_table').where({
  user: 'user#example.com',
})
  .then(result => {
    knex.transaction(trx => {
      Bluebird.map(result, data => {
        return trx('main_table')
          .insert(data.insert_row)
      }, { concurrency: 3 })
        .then(trx.commit);
    })
      .then(() => {
        console.log("done bulk insert")
      })
      .catch(err => console.error('bulk insert error: ', err))
  })
This could work if the columns were text or numeric columns, but I have jsonb columns.
But I got this error:
invalid input syntax for type json
How can I solve this problem?
It sounds like some of the json columns don't have their data stringified when sent to the DB.
Also, that is pretty much the slowest way to insert multiple rows, because you are doing one query for each inserted row and using a single connection for all the inserts.
That concurrency of 3 only causes the pg driver to buffer those extra queries before they are sent to the DB through the same transaction as all the others.
Something like this should be pretty efficient (didn't test running the code, so there might be errors):
const rows = await knex('test_table').where({ user: 'user#example.com' });

rows.forEach(row => {
  // make sure that json columns are actually json strings
  row.someColumnWithJson = JSON.stringify(row.someColumnWithJson);
});

await knex.transaction(async trx => {
  const chunk = 200;
  // insert rows in 200 row batches
  for (let i = 0; i < rows.length; i += chunk) {
    const rowsToInsert = rows.slice(i, i + chunk);
    await trx('main_table').insert(rowsToInsert);
  }
});
Also knex.batchInsert might work for you.
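For completeness, knex.batchInsert takes the table name, the rows, and a chunk size, so the manual chunking loop could be replaced with something like this (an untested sketch, with the same stringify caveat for the jsonb columns as above):

const rows = await knex('test_table').where({ user: 'user#example.com' });

rows.forEach(row => {
  // jsonb columns still need to be sent as JSON strings
  row.someColumnWithJson = JSON.stringify(row.someColumnWithJson);
});

// Insert in batches of 200 rows inside a single transaction.
await knex.transaction(trx =>
  knex.batchInsert('main_table', rows, 200).transacting(trx)
);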

How to add limit options while fetching data from Bigtable? Can someone give me the proper syntax to do so in Node.js?

Currently I am doing it like this:
var [rowData] = await table.row(key).get({limit: 2});
I am still getting 4 results instead of 2.
If you're looking to get 2 columns in an individual row there are a few ways to do that using filters.
You can use the cells per row filter:
const filter = {
  row: {
    cellLimit: 2,
  },
};

await table
  .createReadStream({
    filter,
  })
  .on('error', err => {
    // Handle the error.
    console.log(err);
  })
  .on('data', row => {
    // Use the row data.
  })
  .on('end', () => {
    // All rows retrieved.
  });
You could also do cells per column filter:
const filter = {
  column: {
    cellLimit: 2,
  },
};
The Bigtable filter documentation is still a work in progress, but here is a set of code samples with various filters you can use with your reads.
I don't believe you can apply a filter to a single-row get, but you can create a scan that only reads that row key, which will effectively be the same thing. Let me know if you need more support on this question.
Worked for me:
var filter = [
  {
    family: 'payloads'
  },
  {
    row: {
      key: identifier,
      cellLimit: 2
    }
  }
];
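In case the usage is unclear: that filter array can be passed to a scan in the same way as in the snippets above (untested; table and identifier are assumed to be set up as in the question):

table
  .createReadStream({
    filter,
  })
  .on('error', err => {
    // Handle the error.
    console.log(err);
  })
  .on('data', row => {
    // row.data now only contains the 'payloads' family,
    // limited to 2 cells for the matching row.
  })
  .on('end', () => {
    // Scan finished; only rows whose key matches identifier are returned.
  });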

Node Postgres COPY FROM failing silently

I am trying to use PostgreSQL's COPY FROM API to stream potentially thousands of records into a database as they are dynamically generated in Node.js code. To do so, I wrote this generic wrapper function:
function streamRows(client, { table, columns, data }) {
  return new Promise((resolve, reject) => {
    const sqlStream = client.query(
      copyFrom(`COPY ${ table } (${ columns.join(', ') }) FROM STDIN`));
    const rowStream = new Readable();
    rowStream.pipe(sqlStream)
      .on('finish', resolve)
      .on('error', reject);
    for (const row of data) {
      rowStream.push(`${ row.join('\t') }\n`);
    }
    rowStream.push('\\.\n');
    rowStream.push(null);
  });
}
The database table I'm writing into looks like this:
CREATE TABLE devices (
  id SERIAL PRIMARY KEY,
  group_id INTEGER REFERENCES groups(id),
  serial_number CHAR(12) NOT NULL,
  status INTEGER NOT NULL
);
And I am calling it as follows:
function *genRows(id, devices) {
  let count = 0;
  for (const serial of devices) {
    yield [ id, serial, UNSTARTED ];
    count++;
    if (count % 10 === 0) log.info(`Streamed ${ count } rows...`);
  }
  log.info(`Streamed ${ count } rows.`);
}

await streamRows(client, {
  table: 'devices',
  columns: [ 'group_id', 'serial_number', 'status' ],
  data: genRows(id, devices),
});
The log statements in my generator function that's producing the per-row data all run as expected, and the output indicates that it is in fact always running the generator to completion, and streaming all the data rows I want. No errors are ever thrown. But if I wait for it to complete, the table sometimes ends up with 0 rows added to it--i.e., it looks like I sent all that data to Postgres, but none of it was actually inserted. What am I doing wrong?
I do not know exactly which parts of this made the difference and which are purely stylistic, but after playing around with a bunch of different examples from across the web, I managed to cobble together this function, which works:
function streamRows(client, { table, columns, data }) {
  return new Promise((resolve, reject) => {
    const iterator = data[Symbol.iterator]();
    const rs = new Readable();
    const ws = client.query(copyFrom(`COPY ${ table } (${ columns.join(', ') }) FROM STDIN`));
    rs._read = function() {
      const { value, done } = iterator.next();
      rs.push(done ? null : `${ value.join('\t') }\n`);
    };
    rs.on('error', reject);
    ws.on('error', reject);
    ws.on('end', resolve);
    rs.pipe(ws);
  });
}
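Untested, but on Node 12.3+ Readable.from() can replace the hand-rolled _read, since it builds the same pull-based stream from the iterator, only consuming rows as fast as Postgres accepts them (copyFrom is the same pg-copy-streams helper used above):

const { Readable } = require('stream');

function streamRows(client, { table, columns, data }) {
  return new Promise((resolve, reject) => {
    const ws = client.query(copyFrom(`COPY ${ table } (${ columns.join(', ') }) FROM STDIN`));
    // Readable.from() pulls from the iterator on demand, so rows are only
    // generated as fast as the COPY stream can take them.
    const rs = Readable.from(
      (function* () {
        for (const row of data) {
          yield `${ row.join('\t') }\n`;
        }
      })()
    );
    rs.on('error', reject);
    ws.on('error', reject);
    ws.on('end', resolve);
    rs.pipe(ws);
  });
}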
