I have a simple BigQuery table with a few columns. One of the columns is named my_id (of type STRING). I'm querying my BigQuery datasets like this:
import * as bq from "@google-cloud/bigquery";
const bqdb = new bq.BigQuery();
// ...
const projectId = 'my_project';
const datasetId = "my_dataset";
const tableId = "my_table";
const dbId = [projectId, datasetId, tableId].join('.');
// myIds is an array of strings
const stringedArray = myIds.map((id) => '\'' + id + '\'');
const sql_select_query = `
SELECT my_id
FROM \`${dbId}\`
WHERE my_id IN (${String(stringedArray)})
LIMIT 1
;
`;
const dataset = bqdb.dataset(datasetId);
const destinationTable = dataset.table(tableId);
console.log("Querying database...");
const queryOptions = {
query: sql_select_query,
destination: destinationTable,
write_disposition: "WRITE_APPEND",
priority: 'BATCH',
};
// Run the query as a job
const [job] = await bqdb.createQueryJob(queryOptions);
// Wait for the job to finish.
const results = await job.getQueryResults({maxResults: 500});
const resultsArray = results[0];
This query brings back the ENTIRE table (all rows, all columns). In other words, the result of this query is the same as if I'd written:
const sql_select_query = `
SELECT *
FROM \`${dbId}\`
;
`;
The output is formatted like a successful query: there are no error messages or warnings. But all my conditions are being ignored, even the LIMIT.
Why is BigQuery dumping the entire table into the response?
If your query options specify a destination table and write_disposition: "WRITE_APPEND", job.getQueryResults() returns the existing data of the destination table along with the newly appended rows, which is expected BigQuery behavior.
job.getQueryResults() will return only the rows your query selected if a destination table is configured and the write disposition is 'write if empty' (WRITE_EMPTY) or 'overwrite table' (WRITE_TRUNCATE).
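For comparison, a minimal sketch (reusing the names from your snippet) of the same job with the destination overwritten rather than appended to, so that getQueryResults() reflects only the selected rows:
// Sketch only: same shape as your queryOptions, but overwriting the destination
// table instead of appending, so the job results contain just the selected rows.
const queryOptionsOverwrite = {
query: sql_select_query,
destination: destinationTable,
write_disposition: "WRITE_TRUNCATE", // 'overwrite table'
priority: 'BATCH',
};
const [overwriteJob] = await bqdb.createQueryJob(queryOptionsOverwrite);
const [overwriteRows] = await overwriteJob.getQueryResults({maxResults: 500});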
As a workaround, you can run the query twice: first into a temporary table to return only the rows matching your conditions, then again appending to the destination table.
Using your code, you can create two query option objects. The first has no destination; the second has the destination along with write_disposition. Then create one job for each option.
Code snippet:
const queryOptions = {
query: sql_select_query,
priority: 'BATCH',
};
const queryOptionsWrite = {
query: sql_select_query,
destination: destinationTable,
write_disposition: "WRITE_APPEND",
priority: 'BATCH',
};
const [queryJob] = await bqdb.createQueryJob(queryOptions);
const queryResults = await queryJob.getQueryResults();
console.log("Query result:");
console.log(queryResults[0]);
const [writeJob] = await bqdb.createQueryJob(queryOptionsWrite);
const writeResults = await writeJob.getQueryResults();
console.log("\nUpdated table values:");
console.log(writeResults[0]);
I have an Azure Storage table which I'm trying to filter by timestamp. I am using TableClient from @azure/data-tables to achieve this. I used an OData query expression to do the filtering.
import { AzureNamedKeyCredential, TableClient } from "@azure/data-tables";
const credential = new AzureNamedKeyCredential("AccountName", "AccountKey");
const tableName = "SomeTable";
const serviceClient = new TableClient(
`https://${connectionObject.AccountName}.table.core.windows.net`,
tableName,
credential
);
const page = await serviceClient
.listEntities({
queryOptions: {
// filter: `RowKey ge 'someunixvalue'`, //this works
filter: `timestamp ge datetime'2022-06-27T02:57:35.1831423Z'` // this doesn't work.
}
})
.byPage({ maxPageSize: 1000, continuationToken })
.next();
I have confirmed that the data has a field called timestamp and that its value is 2022-06-27T02:57:35.1831423Z for one of the entities. The value of page.value is an empty array in the failing case, whereas the passing case has values.
What am I missing?
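For reference, @azure/data-tables also ships an odata tagged-template helper for building these filter strings; a minimal sketch, under the assumption that the capitalized Timestamp system property is what the service expects (worth verifying against your entities):
import { odata } from "@azure/data-tables";
// Sketch only: Date values passed to the odata helper are serialized into the
// OData datetime literal for you. "Timestamp" (capital T) is the table service's
// system property name; whether that matches your data is an assumption to check.
const cutoff = new Date("2022-06-27T02:57:35.1831423Z");
const pageWithHelper = await serviceClient
.listEntities({
queryOptions: { filter: odata`Timestamp ge ${cutoff}` },
})
.byPage({ maxPageSize: 1000, continuationToken })
.next();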
Based on the below request object, I need to update the table:
var reqBody = {
"Name":"testing11",
"columns":[
{
"fieldExistsIn":"BOTH",
"columnWidth":5,
"hide":false
},
{
"fieldExistsIn":"BOTH",
"columnWidth":10,
"hide":false
}
],
"Range":{
"startDate":"20-Oct-2022",
"endDate":"26-Oct-2022"
}
}
UPDATE table_name
SET requestData = reqBody
WHERE requestData.Name = reqBody.oldName;
I am doing the insertion using the below code:
await bigquery
.dataset(datasetId)
.table(tableId)
.insert(reqBody);
For the table schema, you can refer to this question:
Node JS - Big Query insert to a request object fully into a record data type
As per the table schema and sample data you have provided, I tried to replicate it on my end.
Table schema:
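(The original answer showed the schema as a screenshot. For readers without it, a sketch along these lines would be consistent with the request body and the UPDATE query below; the field types are assumptions reconstructed from the code, not the original screenshot.)
// Hypothetical reconstruction of the table schema from the request body and the
// UPDATE query; the exact types in the original screenshot may differ.
const assumedSchema = [
{name: 'reqData', type: 'RECORD', fields: [
{name: 'Name', type: 'STRING'},
{name: 'columns', type: 'RECORD', mode: 'REPEATED', fields: [
{name: 'fieldExistsIn', type: 'STRING'},
{name: 'columnWidth', type: 'INTEGER'},
{name: 'hide', type: 'BOOLEAN'},
]},
{name: 'Range', type: 'RECORD', fields: [
{name: 'startDate', type: 'DATE'},
{name: 'endDate', type: 'DATE'},
]},
]},
];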
As per your requirement, I have modified the code by referring to the queryParamsStructs and query sample codes from the Google BigQuery Node.js Client API. To update multiple columns in a table using a JSON object through the BigQuery client, you have to write an UPDATE query and pass it in the code. The JSON object is passed in params, and you access it in the UPDATE query as shown below:
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();
async function query() {
// Updates the nested reqData fields from the JSON object passed in params.
const query = `UPDATE
\`ProjectID.DatasetID.TableID\`
SET reqData.columns = ARRAY(
SELECT AS STRUCT * FROM UNNEST(@reqData.columns)
),
reqData.Range.startDate = CAST(@reqData.Range.startDate AS DATE),
reqData.Range.endDate = CAST(@reqData.Range.endDate AS DATE)
WHERE reqData.Name = @reqData.Name`;
// For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
const options = {
query: query,
// Location must match that of the dataset(s) referenced in the query.
location: 'US',
params: {
"reqData": {
"Name": "testing",
"columns": [
{
"fieldExistsIn": "One",
"columnWidth": 1,
"hide": true
},
{
"fieldExistsIn": "four",
"columnWidth": 4,
"hide": true
}
],
"Range":{
"startDate": "2021-1-11",
"endDate": "2022-2-22"
}
}
},
};
// Run the query as a job
const [job] = await bigquery.createQueryJob(options);
console.log(`Job ${job.id} started.`);
// Wait for the query to finish
const [rows] = await job.getQueryResults();
// Print the results
console.log('Rows:');
rows.forEach(row => console.log(row));
}
query();
Initial Data:
Updated Result:
Note: If you have inserted rows into the table recently using streaming inserts, those rows cannot be modified with UPDATE, DELETE, or MERGE for the first 30 minutes. Refer to this limitations doc for more information. If you try to update such a table within that window, you will get the below error:
UnhandledPromiseRejectionWarning: Error: UPDATE or DELETE statement over table projectID.datasetID.tableID would affect rows in the streaming buffer, which is not supported
I am trying to compute skip and limit from the total number of documents (count).
The issue is that the query object returns the count value when I try to get the items.
Here is an example:
const query = MyModel.find().or([AbilityRule1, AbilityRule2, ...]);
const count = await query.countDocuments(); // count = 3
// Some logic to compute the values of `skip` and `limit` with `count`
// const skip = ...
// const limit = ...
const items = await query.skip(skip).limit(limit); // items = 3 instead of [Model, Model, Model]
I found myself with a similar question when I was trying to implement pagination. The answer I came up with was to use the merge function on the Query object.
const query = MyModel.find().or([AbilityRule1, AbilityRule2, ...]);
const count = await MyModel.find().merge(query).countDocuments();
const items = await query.skip(skip).limit(limit);
Source: https://mongoosejs.com/docs/api/query.html#query_Query-merge
countDocuments() looks to be a method of Model, not Query. I guess that by calling it on an existing query object the way you are here, you may just be overwriting the query.
Why not just:
const query = MyModel.find();
const count = await MyModel.countDocuments();
// ...
const items = await query.skip(skip).limit(limit);
Inspired by Mathew's answer:
I am adding this answer because I find it important that the second and third instructions do not depend on MyModel; they depend only on the query object.
const query = MyModel.find().or([AbilityRule1, AbilityRule2, ...]);
const count = await query.model.find().merge(query).countDocuments();
const items = await query.skip(skip).limit(limit);
In BigQuery, how do I use a function's parameters inside the SQL statement?
I want to get the result of a SQL statement by inserting the parameters as @variable names.
However, I can't find a method that supports this in Node.js.
For Python, there are methods like the following example.
You can use the function's parameters as @variable names.
query = """
SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = @corpus
AND word_count >= @min_word_count
ORDER BY word_count DESC;"""
query_params = [
bigquery.ScalarQueryParameter('corpus', 'STRING', 'romeoandjuliet'),
bigquery.ScalarQueryParameter('min_word_count', 'INT64', 250)]
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params
related document:
https://cloud.google.com/bigquery/docs/parameterized-queries#bigquery-query-params-python
I would like to ask for advice.
The BigQuery Node.js client supports parameterized queries when you pass them with the params key in the query options. Just updated the docs to show this. Hope this helps!
Example:
const sqlQuery = `SELECT word, word_count
FROM \`bigquery-public-data.samples.shakespeare\`
WHERE corpus = @corpus
AND word_count >= @min_word_count
ORDER BY word_count DESC`;
const options = {
query: sqlQuery,
// Location must match that of the dataset(s) referenced in the query.
location: 'US',
params: {corpus: 'romeoandjuliet', min_word_count: 250},
};
// Run the query
const [rows] = await bigquery.query(options);
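As a side note, the library also documents a types option alongside params for typing parameters explicitly, which matters when a value can be null and the type cannot be inferred from it. A minimal sketch, worth verifying against the client version in use:
// Sketch only: explicit parameter types; mainly useful when a value may be null
// and the client can't infer the BigQuery type from the JavaScript value.
const optionsWithTypes = {
query: sqlQuery,
location: 'US',
params: {corpus: 'romeoandjuliet', min_word_count: 250},
types: {corpus: 'STRING', min_word_count: 'INT64'},
};
const [typedRows] = await bigquery.query(optionsWithTypes);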
let ip_chunk = "'1.2.3.4', '2.3.4.5', '10.20.30.40'"
let query = `
SELECT
ip_address.ip as ip,
instance.zone as zone,
instance.name as vmName,
instance.p_name as projectName
FROM
\`${projectId}.${datasetId}.${tableId}\` instance,
UNNEST(field_x.DATA.some_info) ip_address
WHERE ip_address.networkIP IN (${ip_chunk})`
Use WHERE ip_address.networkIP IN (${ip_chunk}) instead of WHERE ip IN (${ip_chunk}).
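The interpolated IN list can also be replaced with an array parameter, in the spirit of the accepted answer. A minimal sketch reusing the names from the snippet above (the bigquery client instance is assumed):
// Sketch only: pass the IP list as an array parameter and expand it with UNNEST,
// instead of concatenating quoted strings into the SQL text.
const ipOptions = {
query: `
SELECT
ip_address.ip as ip,
instance.zone as zone,
instance.name as vmName,
instance.p_name as projectName
FROM
\`${projectId}.${datasetId}.${tableId}\` instance,
UNNEST(field_x.DATA.some_info) ip_address
WHERE ip_address.networkIP IN UNNEST(@ips)`,
params: { ips: ['1.2.3.4', '2.3.4.5', '10.20.30.40'] },
};
const [ipRows] = await bigquery.query(ipOptions);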
It is worth adding that you can create a stored procedure and pass parameters the same way as the accepted answer shows.
const { BigQuery } = require('@google-cloud/bigquery');
async function testProc() {
const bigquery = new BigQuery();
const sql = "CALL `my-project.my-dataset.getWeather`(@dt);";
const options = {
query: sql,
params: {dt: '2022-09-01'},
location: 'US'
};
// Run the query; the rows come from the procedure's final SELECT
const [rows] = await bigquery.query(options);
console.log(rows);
return rows;
}
testProc().catch((err) => { console.error(JSON.stringify(helpers.getError(err.message))); });
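For context, the procedure being called could itself be created with a DDL statement run the same way. A hypothetical sketch (the weather table name and columns are assumptions, not from the answer; a BigQuery client instance named bigquery is assumed):
// Sketch only: hypothetical definition of the procedure called above.
const ddl = `
CREATE OR REPLACE PROCEDURE \`my-project.my-dataset.getWeather\`(dt DATE)
BEGIN
SELECT * FROM \`my-project.my-dataset.weather\` WHERE date = dt;
END`;
await bigquery.query({ query: ddl, location: 'US' });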
I'm writing a small utility to copy data from one sqlite database file to another. Both files have the same table structure - this is entirely about moving rows from one db to another.
My code right now:
let tables: Array<string> = [
"OneTable", "AnotherTable", "DataStoredHere", "Video"
]
tables.forEach((table) => {
console.log(`Copying ${table} table`);
sourceDB.each(`select * from ${table}`, (error, row) => {
console.log(row);
destDB.run(`insert into ${table} values (?)`, ...row) // this is the problem
})
})
row here is a js object, with all the keyed data from each table. I'm certain that there's a simple way to do this that doesn't involve escaping stringified data.
If your database driver has not blocked ATTACH, you can simply tell the database to copy everything:
ATTACH '/some/where/source.db' AS src;
INSERT INTO main.MyTable SELECT * FROM src.MyTable;
You could iterate over the row and setup the query with dynamically generated parameters and references.
let tables: Array<string> = [
"OneTable", "AnotherTable", "DataStoredHere", "Video"
]
tables.forEach((table) => {
console.log(`Copying ${table} table`);
sourceDB.each(`select * from ${table}`, (error, row) => {
console.log(row);
const keys = Object.keys(row); // ['column1', 'column2']
const columns = keys.toString(); // 'column1,column2'
let parameters = {};
let values = '';
// Generate values and named parameters
keys.forEach((r) => {
const key = '$' + r;
// Generates '$column1,$column2' (no leading comma)
values = values === '' ? key : values.concat(',', key);
// Generates { $column1: 'foo', $column2: 'bar' }
parameters[key] = row[r];
});
// SQL: insert into OneTable (column1,column2) values ($column1,$column2)
// Parameters: { $column1: 'foo', $column2: 'bar' }
destDB.run(`insert into ${table} (${columns}) values (${values})`, parameters);
})
})
Tried editing the answer by @Cl., but was rejected. So, adding on to the answer, here's the JS code to achieve the same:
let sqlite3 = require('sqlite3-promise').verbose();
let sourceDBPath = '/source/db/path/logic.db';
let tables = ["OneTable", "AnotherTable", "DataStoredHere", "Video"];
let destDB = new sqlite3.Database('/your/dest/logic.db');
await destDB.runAsync(`ATTACH '${sourceDBPath}' AS sourceDB`);
await Promise.all(tables.map(table =>
destDB.runAsync(`
CREATE TABLE ${table} AS
SELECT * FROM sourceDB.${table}`
).catch((e) => {
console.error(e);
throw e;
})
));
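If the destination file already contains the tables (as the question states both files share the same structure), appending with INSERT INTO ... SELECT, as in the first answer, avoids the CREATE TABLE collision. A minimal sketch under that assumption:
// Sketch only: when the destination tables already exist, append the rows rather
// than creating the tables, then detach the source database.
await Promise.all(tables.map(table =>
destDB.runAsync(`INSERT INTO ${table} SELECT * FROM sourceDB.${table}`)
));
await destDB.runAsync(`DETACH DATABASE sourceDB`);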