BigQuery Internal Error with `pageToken` when running in GCP - node.js

I run into this error with BigQuery:
"An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 5034996"
Two application use the same way with pageToken to paginate trough big result sets.
run query with initital startIndex: 0, maxResults: 10
get result together with pageToken
send to client
... some time may pass ...
request "next page": use pageToken together with maxResults: 10 to get the next result
repeat from 3.
NodeJS 16, #google-cloud/bigquery 6.0.3
Locally (Windows 10), for both application every thing works, pagination with pageToken returns results quite fast (<5s). All steps 1 to 6 and requesting multiple next pages one after another works, even tested that the pageToken still works after 60min+.
Production Cloud has problems: the initial query works always, but as soon as a pageToken is given, the query fails after ~15s+, even when "requested the next page directly (1-5s. delay) after getting the first page". Steps 1 to 3 work, but requesting next page fails nearly most time, it's very rare that it doesn't fail.
Production uses Google Cloud Functions and Google Cloud Run to serve the applications.
One application is an internal experiment, this application uses the same dataset + table when running locally and when running in "production".
The other application uses the same dataset but different tables for local/production - and is in another Google Cloud project than the first application.
Thus project-level quotas or e.g. different table setups locally/prod shouldn't cause the issue here (hopefully).
Example code used:
const [rows, , jobQueryResults] = await (job.getQueryResults(
('maxResults' in paginate ?
{
// there seems to be no need to give the startIndex again / but tested it also with giving a stable `0` index; atm. as soon as a pageToken is given the `startIndex` is omitted
startIndex: paginate.pageToken ? undefined : paginate.startIndex,
maxResults: paginate.maxResults,
pageToken: paginate.pageToken,
} : undefined) as QueryResultsOptions,
) as Promise<QueryRowsResponse>)
What wonders me is that the pageToken isn't shown in the log of the failure, the maxResults is visible:
Edit
The error suggests some SLA problem, one of the GCP projects only include experimentals (non public) applications, thus any traffic/usage can be easily monitored.
The monitoring for BigQuery in that project shows roughly 1 job per 1 second when testing it, job 1+2 where "load without pageToken" -> 3 used the pageToken from 2 and run into an error, the retries must happen from BigQuery side, there is nothing implemented from my side (using only the official BigQuery package).

Related

"the operation was canceled" error while creating Indexer to seed Azure search index

I am using Azure's SearchServiceClient to create indexer from my API. My data source for this indexer is a sql view which returns 2 million records and it is a long running query. On the call Indexers.CreateOrUpdateAsync to create indexer, I am getting this error - "the operation was canceled".
I tried adding 30 minute query timeout to indexer definition, but no luck. ({ "queryTimeout", "00:30:00" }. Reference -https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.search.models.indexingparameters.configuration?view=azure-dotnet#Microsoft_Azure_Search_Models_IndexingParameters_Configuration)no
The 'queryTimeout' parameter you are passing in looks correct, so it's possible you are hitting a timeout on the client side in the SDK rather than from the service. You should be able to configure the 'HttpClient' timeout on your SearchServiceClient to be longer. https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.search.searchserviceclient?view=azure-dotnet
You may also want to consider working to reduce your sql query time for best indexer performance.

Firebase Dynamic Pages using Cloud Functions with Firestore are slow

We have dynamic pages being served by Firebase Cloud Functions, but the TTFB is very slow on these pages with TTFB of 900ms - 2s, at first we just assumed it to be a cold start issue, but even with consistent traffic it is very slow at TTFB of 700ms - 1.2s.
This is a bit problematic for our project since it is organic traffic dependent and Google Pagespeed would need a server response of less than 200ms.
Anyway, we tried to check what might be causing the issue and we pinpointed it with Firestore, when a Cloud Function accesses Firestore, we noticed there are some delays. This is a basic sample code of how we implement Cloud Function and Firestore:
dynamicPages.get('/ph/test/:id', (req, res) => {
var globalStartTime = Date.now();
var period = [];
db.collection("CollectionTest")
.get()
.then((querySnapshot) => {
period.push(Date.now() - globalStartTime);
console.log('1', period);
return db.collection("CollectionTest")
.get();
})
.then((querySnapshot) => {
period.push(Date.now() - globalStartTime);
console.log('2', period);
res.status(200)
.send('Period: ' + JSON.stringify(period));
return true;
})
.catch((error) => {
console.log(error);
res.end();
return false;
});
});
This is running on Firebase + Cloud Functions + NodeJS
CollectionTest is very small with only 100 documents inside, with each document having the following fields:
directorName: (string)
directorProfileUrl: (string)
duration: (string)
genre: (array)
posterUrl: (string)
rating: (string)
releaseDate: (string)
status: (int)
synopsis: (string)
title: (string)
trailerId: (string)
urlId: (string)
With this test, we would get the following results:
[467,762] 1.52s
[203,315] 1.09s
[203,502] 1.15s
[191,297] 1.00s
[206,319] 1.03s
[161,267] 1.03s
[115,222] 843ms
[192,301] 940ms
[201,308] 945ms
[208,312] 950ms
This data is [Firestore Call 1 Exectution Time, Firestore Call 2 Exectution Time] TTFB
If we check the results of the test, there are signs that the TTFB is getting lower, maybe that is when the Cloud Function has already warmed up? But even so, Firestore is eating up 200-300ms in the Cloud Function based on the results of our second Firestore Call and even if Firestore took lesser time to execute, TTFB would still take up 600-800ms, but that is a different story.
Anyway, can anyone help how we can improve Firestore performance in our Cloud Functions (or if possible, the TTFB performance)? Maybe we are doing something obviously wrong that we don't know about?
I will try to help, but maybe lacks a bit of context about what you load before returning dynamicPages but here some clues:
First of all, the obvious part (I have to point it anyway):
1 - Take care how you measure your TTFB:
Measuring TTFB remotely means you're also measuring the network
latency at the same time which obscures the thing TTFB is actually
measuring: how fast the web server is able to respond to a request.
2 - And from Google Developers documentation about Understanding Resource Timing (here):
[...]. Either:
Bad network conditions between client and server, or
A slowly responding server application
To address a high TTFB, first cut out as much network as possible.
Ideally, host the application locally and see if there is still a big
TTFB. If there is, then the application needs to be optimized for
response speed. This could mean optimizing database queries,
implementing a cache for certain portions of content, or modifying
your web server configuration. There are many reasons a backend can be
slow. You will need to do research into your software and figure out
what is not meeting your performance budget.
If the TTFB is low locally then the networks between your client and
the server are the problem. The network traversal could be hindered by
any number of things. There are a lot of points between clients and
servers and each one has its own connection limitations and could
cause a problem. The simplest method to test reducing this is to put
your application on another host and see if the TTFB improves.
Not so obvious ones:
You can take a look at the official Google documentation regarding Cloud Functions Performance here: https://cloud.google.com/functions/docs/bestpractices/tips
Did you require some files before?
According to this answer from Firebase cloud functions is very slow: Firebase cloud functions is very slow:
Looks like a lot of these problems can be solved using the hidden
variable process.env.FUNCTION_NAME as seen here:
https://github.com/firebase/functions-samples/issues/170#issuecomment-323375462
Are these dynamic pages loaded being accessed by a guest user or a logged user? Because maybe the first request has to sort out the authentication details, so it's known to be slower...
If nothing of this works, I will take a look at the common performance issues like DB connection (here: Optimize Database Performance), improving server configuration, cache all you can and take care of possible redirections in your app...
To end, reading through Internet, there are a lot of threads with your problem (low performance on simple Cloud Functions). Like this one: https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2374 && in S.O: https://stackoverflow.com/search?q=%5Bgoogle-cloud-functions%5D+slow
With comments like:
since when using cloud functions, the penalty is incurred on each http
invocation the overhead is still very high (i.e. 0.8s per HTTP call).
or:
Bear in mind that both Cloud Functions and Cloud Firestore are both in
beta and provide no guarantees for performance. I'm sure if you
compare performance with Realtime Database, you will see better
numbers.
Maybe it is still an issue.
Hope it helps!

Firebase Functions - ERROR, but no Event Message in Console

I have written a function on firebase that downloads an image (base64) from firebase storage and sends that as response to the user:
const functions = require('firebase-functions');
import os from 'os';
import path from 'path';
const storage = require('firebase-admin').storage().bucket();
export default functions.https.onRequest((req, res) => {
const name = req.query.name;
let destination = path.join(os.tmpdir(), 'image-randomNumber');
return storage.file('postPictures/' + name).download({
destination
}).then(() => {
res.set({
'Content-Type': 'image/jpeg'
});
return res.status(200).sendFile(destination);
});
});
My client calls that function multiple times after one another (in series) to load a range of images for display, ca. 20, of an average size of 4KB.
After 10 or so pictures have been loaded (amount varies), all other pictures fail. The reason is that my function does not respond correctly, and the firebase console shows me that my function threw an error:
The above image shows that
A request to the function (called "PostPictureView") suceeds
Afterwards, three requests to the controller fail
In the end, after executing a new request to the "UserLogin"-function, also that fails.
The response given to the client is the default "Error: Could not handle request". After waiting a few seconds, all requests get handled again as they are supposed to be.
My best guesses:
The project is on free tier, maybe google is throttling something? (I did not hit any limits afaik)
Is there a limit of messages the google firebase console can handle?
Could the tmpdir from the functions-app run low? I never delete the temporary files so far, but would expect that either google deletes them automatically, or warns me in a different way that the space is running low.
Does someone know an alternative way to receive the error messages, or has experienced similar issues? (As Firebase Functions is still in Beta, it could also be an error from google)
Btw: Downloading the image from the client (android app, react-native) directly is not possible, because I will use the function to check for access permissions later. The problem is reproducable for me.
In Cloud Functions, the /tmp directory is backed by memory. So, every file you download there is effectively taking up memory on the server instance that ran the function.
Cloud Functions may reuses server instances for repeated calls to the same function. This means your function is downloading another file (to that same instance) with each invocation. Since the names of the files are different each time, you are accumulating files in /tmp that each occupy memory.
At some point, this server instance is going to run out of memory with all these files in /tmp. This is bad.
It's a best practice to always clean up files after you're done with them. Better yet, if you can stream the file content from Cloud Storage to the client, you'll use even less memory (and be billed even less for the memory-hours you use).
After some more research, I've found the solution: The Firebase Console seems to not show all error information.
For detailed information to your functions, and errors that might be omitted in the Firebase Console, check out the website from google cloud functions.
There I saw: The memory (as suggested by #Doug Stevensson) usage never ran over 80MB (limit of 256MB) and never shut the server down. Moreover, there is a DNS resolution limit for the free tier, that my application hit.
The documentation points to a limit of DNS resolutions: 40,000 per 100 seconds. In my case, this limit was never hit - firebase counts a total executions of roundabout 8000 - but it seems there is a lower, undocumented limit for the free tier. After upgrading my account (I started the trial that GCP offers, so actually not paying anything) and linking the project to the billing account, everything works perfectly.

Meteor MongoDB subscription delivering data in 10 second intervals instead of live

I believe this is more of a MongoDB question than a Meteor question, so don't get scared if you know a lot about mongo but nothing about meteor.
Running Meteor in development mode, but connecting it to an external Mongo instance instead of using Meteor's bundled one, results in the same problem. This leads me to believe this is a Mongo problem, not a Meteor problem.
The actual problem
I have a meteor project which continuosly gets data added to the database, and displays them live in the application. It works perfectly in development mode, but has strange behaviour when built and deployed to production. It works as follows:
A tiny script running separately collects broadcast UDP packages and shoves them into a mongo collection
The Meteor application then publishes a subset of this collection so the client can use it
The client subscribes and live-updates its view
The problem here is that the subscription appears to only get data about every 10 seconds, while these UDP packages arrive and gets shoved into the database several times per second. This makes the application behave weird
It is most noticeable on the collection of UDP messages, but not limited to it. It happens with every collection which is subscribed to, even those not populated by the external script
Querying the database directly, either through the mongo shell or through the application, shows that the documents are indeed added and updated as they are supposed to. The publication just fails to notice and appears to default to querying on a 10 second interval
Meteor uses oplog tailing on the MongoDB to find out when documents are added/updated/removed and update the publications based on this
Anyone with a bit more Mongo experience than me who might have a clue about what the problem is?
For reference, this is the dead simple publication function
/**
* Publishes a custom part of the collection. See {#link https://docs.meteor.com/api/collections.html#Mongo-Collection-find} for args
*
* #returns {Mongo.Cursor} A cursor to the collection
*
* #private
*/
function custom(selector = {}, options = {}) {
return udps.find(selector, options);
}
and the code subscribing to it:
Tracker.autorun(() => {
// Params for the subscription
const selector = {
"receivedOn.port": port
};
const options = {
limit,
sort: {"receivedOn.date": -1},
fields: {
"receivedOn.port": 1,
"receivedOn.date": 1
}
};
// Make the subscription
const subscription = Meteor.subscribe("udps", selector, options);
// Get the messages
const messages = udps.find(selector, options).fetch();
doStuffWith(messages); // Not actual code. Just for demonstration
});
Versions:
Development:
node 8.9.3
mongo 3.2.15
Production:
node 8.6.0
mongo 3.4.10
Meteor use two modes of operation to provide real time on top of mongodb that doesn’t have any built-in real time features. poll-and-diff and oplog-tailing
1 - Oplog-tailing
It works by reading the mongo database’s replication log that it uses to synchronize secondary databases (the ‘oplog’). This allows Meteor to deliver realtime updates across multiple hosts and scale horizontally.
It's more complicated, and provides real-time updates across multiple servers.
2 - Poll and diff
The poll-and-diff driver works by repeatedly running your query (polling) and computing the difference between new and old results (diffing). The server will re-run the query every time another client on the same server does a write that could affect the results. It will also re-run periodically to pick up changes from other servers or external processes modifying the database. Thus poll-and-diff can deliver realtime results for clients connected to the same server, but it introduces noticeable lag for external writes.
(the default is 10 seconds, and this is what you are experiencing , see attached image also ).
This may or may not be detrimental to the application UX, depending on the application (eg, bad for chat, fine for todos).
This approach is simple and and delivers easy to understand scaling characteristics. However, it does not scale well with lots of users and lots of data. Because each change causes all results to be refetched, CPU time and network bandwidth scale O(N²) with users. Meteor automatically de-duplicates identical queries, though, so if each user does the same query the results can be shared.
You can tune poll-and-diff by changing values of pollingIntervalMs and pollingThrottleMs.
You have to use disableOplog: true option to opt-out of oplog tailing on a per query basis.
Meteor.publish("udpsPub", function (selector) {
return udps.find(selector, {
disableOplog: true,
pollingThrottleMs: 10000,
pollingIntervalMs: 10000
});
});
Additional links:
https://medium.baqend.com/real-time-databases-explained-why-meteor-rethinkdb-parse-and-firebase-dont-scale-822ff87d2f87
https://blog.meteor.com/tuning-meteor-mongo-livedata-for-scalability-13fe9deb8908
How to use pollingThrottle and pollingInterval?
It's a DDP (Websocket ) heartbeat configuration.
Meteor real time communication and live updates is performed using DDP ( JSON based protocol which Meteor had implemented on top of SockJS ).
Client and server where it can change data and react to its changes.
DDP (Websocket) protocol implements so called PING/PONG messages (Heartbeats) to keep Websockets alive. The server sends a PING message to the client through the Websocket, which then replies with PONG.
By default heartbeatInterval is configure at little more than 17 seconds (17500 milliseconds).
Check here: https://github.com/meteor/meteor/blob/d6f0fdfb35989462dcc66b607aa00579fba387f6/packages/ddp-client/common/livedata_connection.js#L54
You can configure heartbeat time in milliseconds on server by using:
Meteor.server.options.heartbeatInterval = 30000;
Meteor.server.options.heartbeatTimeout = 30000;
Other Link:
https://github.com/meteor/meteor/blob/0963bda60ea5495790f8970cd520314fd9fcee05/packages/ddp/DDP.md#heartbeats

Umbraco 7.6.0 - Site becomes unresponsive for several minutes every day

We've been having a problem for several months where the site becomes completely unresponsive for 5-15 minutes every day. We have added a ton of request logging, enabled DEBUG logging, and have finally found a pattern: Approximately 2 minutes prior to the outages (in every single log file I've looked at, going back to the beginning), the following lines appear:
2017-09-26 15:13:05,652 [P7940/D9/T76] DEBUG
Umbraco.Web.PublishedCache.XmlPublishedCache.XmlCacheFilePersister -
Timer: release. 2017-09-26 15:13:05,652 [P7940/D9/T76] DEBUG
Umbraco.Web.PublishedCache.XmlPublishedCache.XmlCacheFilePersister -
Run now (sync).
From what I gather this is the process that rebuilds the umbraco.config, correct?
We have ~40,000 nodes, so I can't imagine this would be the quickest process to complete, however the strange thing is that the CPU and Memory on the Azure Web App do not spike during these outages. This would seem to point to the fact that the disk I/O is the bottleneck.
This raises a few questions:
Is there a way to schedule this task in a way that it only runs
during off-peak hours?
Are there performance improvements in the newer versions (we're on 7.6.0) that might improve this functionality?
Are there any other suggestions to help correct this behavior?
Hosting environment:
Azure App Service B2 (Basic)
SQL Azure Standard (20 DTUs) - DTU usage peaks at 20%, so I don't think there's anything there. Just noting for completeness
Azure Storage for media storage
Azure CDN for media requests
Thank you so much in advance.
Update 10/4/2017
If it helps, It appears that these particular log entries correspond with the first publish of the day.
I don't feel like 40,000 nodes is too much for Umbraco, but if you want to schedule republishes, you can do this:
You can programmatically call a cache refresh using:
ApplicationContext.Current.Services.ContentService.RePublishAll();
(Umbraco source)
You could create an API controller which you could call periodically by a URL. The controller would probably look something like:
public class CacheController : UmbracoApiController
{
[HttpGet]
public HttpResponseMessage Republish(string pass)
{
if (pass != "passcode")
{
return Request.CreateResponse(HttpStatusCode.Unauthorized, new
{
success = false,
message = "Access denied."
});
}
var result = Services.ContentService.RePublishAll();
if (result)
{
return Request.CreateResponse(HttpStatusCode.OK, new
{
success = true,
message = "Republished"
});
}
return Request.CreateResponse(HttpStatusCode.InternalServerError, new
{
success = false,
message = "An error occurred"
});
}
}
You could then periodically ping this URL:
/umbraco/api/cache/republish?code=passcode
I have a blog post on how you can read on how to schedule events like these to occur. I recommend just using the Windows Task Scheduler to ping the URL: https://harveywilliams.net/blog/better-task-scheduling-in-umbraco#windows-task-scheduler

Resources