Surprisingly high latencies for selects/inserts in spanner - google-cloud-spanner

I am getting latencies of around 50-100ms for simple queries in Spanner (updates or selects by primary key), connecting to Spanner from the same project/region. Is this expected behavior? I expected much lower latencies for these operations.

No, the latency for a simple select using the primary key should be a lot lower than that.
I did a quick benchmark based on the information you provided above using the following simple program:
package main

import (
    "context"
    "fmt"
    "math/rand"
    "time"

    "cloud.google.com/go/spanner"
    "github.com/montanaflynn/stats"
    "google.golang.org/api/iterator"
)

func main() {
    fmt.Printf("Simple Spanner benchmarking...\n")
    source := rand.NewSource(time.Now().UnixNano())
    rnd := rand.New(source)
    client, err := spanner.NewClient(context.Background(), "projects/my-project/instances/my-instance/databases/my-database")
    if err != nil {
        fmt.Printf("Client creation failed: %v", err)
        return
    }
    var times stats.Float64Data
    for i := 0; i < 25; i++ {
        id := rnd.Int63n(1000) + 100000
        // Named query parameters in Spanner use the @ prefix.
        statement := spanner.NewStatement("SELECT * FROM Singers WHERE SingerId=@id")
        statement.Params["id"] = id
        start := time.Now()
        iter := client.Single().Query(context.Background(), statement)
        for {
            row, err := iter.Next()
            if err == iterator.Done {
                break
            }
            if err != nil {
                fmt.Printf("Query failure: %v", err)
                break
            }
            var fullName string
            if err := row.ColumnByName("FullName", &fullName); err != nil {
                fmt.Printf("Reading column failed: %v", err)
                break
            }
            fmt.Printf("Singer name: %s\n", fullName)
            elapsed := time.Since(start)
            fmt.Printf("Time: %v\n", elapsed)
            times = append(times, float64(elapsed.Milliseconds()))
        }
        iter.Stop()
    }
    median, _ := stats.Median(times)
    avg, _ := stats.Mean(times)
    p90, _ := stats.Percentile(times, 90)
    fmt.Printf("Median: %v\n", median)
    fmt.Printf("P90: %v\n", p90)
    fmt.Printf("Avg: %v\n", avg)
}
The application was executed on the smallest possible Google Cloud Compute Engine VM located in the same region as the Spanner instance. The results were:
Simple Spanner benchmarking...
Singer name: FirstName LastName 100960
Time: 374.627846ms
Singer name: FirstName LastName 100865
Time: 4.102019ms
Singer name: FirstName LastName 100488
Time: 3.479059ms
...
Singer name: FirstName LastName 100542
Time: 3.986866ms
Singer name: FirstName LastName 100822
Time: 3.978838ms
Singer name: FirstName LastName 100235
Time: 4.511711ms
Singer name: FirstName LastName 100020
Time: 3.476673ms
Singer name: FirstName LastName 100234
Time: 3.191529ms
Singer name: FirstName LastName 100219
Time: 4.451639ms
Median: 3
P90: 4
Avg: 18.44
So your execution times of around 50-100ms sound like a lot. Normal execution time in this (simple) test case is around 3-4ms for a single-row select (except for the first request, as that also initializes the backing session pool).
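If the first-request overhead matters for your measurements, the session pool can also be pre-initialized when the client is created. A minimal sketch that would replace the spanner.NewClient call in the benchmark above (the MinOpened value is only an example, not a recommendation):
// Pre-create sessions so the first query does not pay the session
// creation cost (MinOpened is an example value, not a recommendation).
client, err := spanner.NewClientWithConfig(context.Background(),
    "projects/my-project/instances/my-instance/databases/my-database",
    spanner.ClientConfig{
        SessionPoolConfig: spanner.SessionPoolConfig{MinOpened: 25},
    })
if err != nil {
    fmt.Printf("Client creation failed: %v", err)
    return
}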
Could it be that your table has a primary key that uses a monotonically increasing value? That could create hotspots in the backing index of the primary key.
Could it be that you are closing and creating a new client between each query? That would require the session pool to be re-initialized for each query.
Are you using a single-use read-only transaction for your queries? Or are you using some other type of transaction to read the data? (See the sketch after these questions for the difference.)
Could you please provide some additional details on how exactly you are executing the query (preferably with a code sample)?
Are you using a client library? If so, which one? (Java, Node, Go, ...?)
Are you only measuring the very first query that you are executing after starting your application? The very first query will be slower than later queries, as the client library will need to first create a session and then execute the query.
You write that you are connecting from the same project/region. Does that mean that your client code is running on a Google Cloud VM or similar?
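For reference, the two read patterns look roughly like this in Go. This is only a sketch that reuses the client and id from the benchmark above (ctx stands for a context.Context), not your actual code:
// Cheapest option: a single-use read-only transaction for a point read.
row, err := client.Single().ReadRow(ctx, "Singers", spanner.Key{id}, []string{"FullName"})
if err != nil {
    // handle error
}
var fullName string
if err := row.ColumnByName("FullName", &fullName); err != nil {
    // handle error
}

// More expensive: the same read wrapped in a read-write transaction,
// which adds lock acquisition and a commit round trip.
_, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
    _, err := txn.ReadRow(ctx, "Singers", spanner.Key{id}, []string{"FullName"})
    return err
})
If your selects are going through a read-write transaction, the extra round trips for locking and committing could already account for a noticeable part of the 50-100ms you are seeing.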

Related

Query for PageRanges of a Managed Disk Incremental Snapshot returns zero changes

We have a solution which takes incremental snapshots of all the disks of a virtual machine. Once a snapshot is created, the solution queries its page ranges to get the changed data.
The issue we are currently facing is that the page ranges are returned as empty even when it is the first snapshot for the disk and the disk has data. This happens intermittently for OS as well as data disks. Strangely, if a virtual machine has multiple disks, the page ranges return appropriate information for some of them and come back empty for others.
We create the incremental disk snapshots as shown below. The solution uses the Azure SDK for Go for these operations. Below is the sample code for the operations.
// Prepare azure snapshot api client
snapClient, err := armcompute.NewSnapshotsClient(subscriptionID, azureCred, nil)
// Configure snapshot parameters
snapParams := armcompute.Snapshot{
    Location: to.Ptr(location), // Snapshot location
    Name:     &snapshotName,
    Properties: &armcompute.SnapshotProperties{
        CreationData: &armcompute.CreationData{
            CreateOption:     to.Ptr(armcompute.DiskCreateOptionCopy),
            SourceResourceID: to.Ptr(getAzureIDFromName(constant.AzureDiskURI, subscriptionID, resourceGroupName, diskID, "")),
        }, // Disk ID for which the snapshot needs to be created
        Incremental: to.Ptr(true),
    },
}
// Create Disk Snapshot (Incremental)
snapPoller, err := snapClient.BeginCreateOrUpdate(ctx, resourceGroupName, snapshotName, snapParams, nil)
if err != nil {
    return nil, err
}
Once the snapshot is created successfully, we prepare the changed areas for the snapshot using the PageRanges feature, as shown below.
// Grant Snapshot Access
resp, err := snapClient.BeginGrantAccess(ctx, resourceGroupName, snapshotId, armcompute.GrantAccessData{
    Access:            &armcompute.PossibleAccessLevelValues()[1], // 0 is none, 1 is read and 2 is write
    DurationInSeconds: to.Ptr(duration),                           // 1 hr
}, nil)
if err != nil {
    return "", err
}
grantResp, err := resp.PollUntilDone(ctx, nil)
if err != nil {
    return "", err
}
currentSnapshotSAS = grantResp.AccessSAS
// Create Page Blob Client
pageBlobClient, err := azblob.NewPageBlobClientWithNoCredential(currentSnapshotSAS, nil)
if err != nil {
    return nil, err
}
pageOption := &azblob.PageBlobGetPageRangesDiffOptions{}
pager := pageBlobClient.GetPageRangesDiff(pageOption)
// Gather Page Ranges for all the changed data
var pageRanges []*azblob.PageRange
for pager.NextPage(ctx) {
    resp := pager.PageResponse()
    pageRanges = append(pageRanges, resp.PageRange...)
}
// Loop through page ranges and collect all changed data indexes
var changedAreaForCurrentIter int64
changedAreasString := ""
for _, page := range pageRanges {
    length := (*page.End + 1) - *page.Start
    changedAreaForCurrentIter = changedAreaForCurrentIter + length
    changedAreasString = changedAreasString + strconv.FormatInt(*page.Start, 10) + ":" + strconv.FormatInt(length, 10) + ","
}
zap.S().Debug("Change areas : [" + changedAreasString + "]")
It is this changed-areas string that is intermittently coming back empty. We have checked the disk properties for the disks which succeed and those which fail, and they are all the same. There is no lock configured on the disk.
Following are the SDK versions which we are using.
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v3 v3.0.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/storage/armstorage v1.1.0
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v0.4.1
Can someone please provide pointers on which factors could make this problem appear intermittently?

Unable to get latest updated data in AWS Lambda immediately after table update in AWS Aurora postgresql database

I have created an AFTER UPDATE trigger on the table below which invokes an AWS Lambda function when the active column is updated (I followed the AWS docs to invoke the Lambda).
--**Table creation script**
CREATE TABLE public.groupmember
(
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    groupid integer NOT NULL,
    employeeid integer,
    viewid integer,
    enddate timestamp with time zone,
    endby character varying(55) COLLATE public.case_insensitive,
    active boolean GENERATED ALWAYS AS (
        CASE
            WHEN (enddate IS NOT NULL) THEN false
            ELSE true
        END) STORED,
    test character varying COLLATE pg_catalog."default",
    CONSTRAINT "PK_groupmember" PRIMARY KEY (id),
    CONSTRAINT fk_groupmember_group FOREIGN KEY (groupid)
        REFERENCES public."group" (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE RESTRICT
)
-- **Trigger creation scripts**
CREATE TRIGGER groupmember_update_trigger
AFTER UPDATE
ON public.groupmember
FOR EACH ROW
EXECUTE PROCEDURE public.groupmember_update_triggerfn();
-- **Trigger function**
CREATE or replace FUNCTION public.groupmember_update_triggerfn()
RETURNS trigger
LANGUAGE 'plpgsql'
COST 100
VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE
    jsonstring character varying(500);
    jsontosend character varying(500);
    awsarn character varying(300);
BEGIN
    awsarn := 'arn:aws:lambda:us-east-1:xyz:function:test-function';
    IF New.active <> Old.active THEN
        IF New.active = false THEN
            SELECT json_build_object('Action','Delete','Entity','GroupMember','GroupId',New.groupid,'EmployeeId',COALESCE(New.employeeid,-1),'ViewId',COALESCE(New.viewid,-1)) INTO jsonstring;
            jsontosend := concat('{"body":',jsonstring,'}');
            PERFORM payload FROM aws_lambda.invoke(aws_commons.create_lambda_function_arn(awsarn),jsontosend::json,'Event');
        END IF;
    END IF;
    RETURN New;
END;
$BODY$;
Here is my AWS Aurora PostgreSQL configuration (image).
Consider that I have 50 records in the database with the active column set to true. When I update the active column of all the records to false and then check the same value in the Lambda, it still reads the value as true when it should be false.
Below is my Lambda
const { Pool, Client } = require('pg');
const connectionString = process.env.Conn;
exports.lambdaHandler = (event, context) => {
    const client = new Client({
        connectionString,
    })
    client.connect();
    client.query('select groupid, employeeid, active from groupmember where groupid = $1 and employeeid = $2', [event.body.GroupId, event.body.EmployeeId], (err, res) => {
        console.log(err);
        console.log("response is");
        console.log(res.rowCount);
        if (res.rowCount > 0) {
            // this prints true, even though the actual value is false.
            console.log(res.rows[0].active);
        }
        client.end();
    })
}
When I delay the Lambda for 5 seconds or so, it gives me the updated data without any issue. I am unable to figure out why it takes time to get the latest data.
This is not the case when I run the Lambda locally, nor when I query the active column within the Postgres trigger.
I don't see any replication lag in the metrics. Is there something I need to configure in RDS?
Can someone please help me here?
Any answer would be appreciated, thank you!

Google Datastore query very slow from Cloud Function compared to local machine

I'm using Google Cloud Function with google-cloud/datastore modules. My data is structured as 1 kind with 4 string properties, only 1000 entities, indexed on all properties.
My query is:
if (/^[a-z0-9]+$/i.test(name)) {
    name = name.toLowerCase();
    query = datastore.createQuery('IPPhone').filter('email', '>=', name).filter('email', '<', name + '\uffff');
} else if (name.includes('<')) {
    query = datastore.createQuery('IPPhone').filter('department', '>=', name).filter('department', '<', name + '\uffff');
    isDepartment = true;
} else {
    name = fixName(name);
    query = datastore.createQuery('IPPhone').filter('name', '=', name);
}
When I query from the Cloud Function, the query time is 14-17 seconds. However, doing the same thing on my local machine, the query time is much shorter, around 800-1000 ms. I'm in Hanoi, Vietnam, but the only region I can choose for the Cloud Function is us-central1.

Finding all properties for a schema-less vertex class

I have a class Node that extends V. I add instances to Node with some set of document-type information provided. I want to query the OrientDB database and return some information from Node; to display this in a formatted way, I want a list of all possible field names (in my application there are currently 115 field names, only one of which is a property used as an index).
To do this in pyorient, the only solution I found so far is (client is the name of the database handle):
count = client.query("SELECT COUNT(*) FROM Node")[0].COUNT
node_records = client.query("SELECT FROM Node LIMIT {0}".format(count))
node_key_list = set([])
for node in node_records:
    node_key_list |= node.oRecordData.keys()
I figured that much out pretty much through trial and error. It isn't very efficient or elegant. Surely there must be a way to have the database return a list of all possible fields for a class or any other document-type object. Is there a simple way to do this through either pyorient or the SQL commands?
I tried your case with a test dataset. In the structure of my class TestClass, only name, surname and timeStamp have been created in schema-full mode, while nameSchemaLess1 and nameSchemaLess2 have been inserted into the DB in schema-less mode.
After having done that, you can create a JavaScript function in OrientDB Studio or the Console (as explained here) and subsequently call it from pyOrient by using a SQL command.
The following function retrieves all the field names of the class TestClass without duplicates:
Javascript function:
var g = orient.getGraph();
var fieldsList = [];
var query = g.command("sql", "SELECT FROM TestClass");
for (var x = 0; x < query.length; x++){
    var fields = query[x].getRecord().fieldNames();
    for (var y = 0; y < fields.length; y++) {
        if (fieldsList.length == 0){
            fieldsList.push(fields[y]);
        } else {
            var fieldFound = false;
            for (var z = 0; z < fieldsList.length; z++){
                if (fields[y] == fieldsList[z]){
                    fieldFound = true;
                    break;
                }
            }
            if (fieldFound != true){
                fieldsList.push(fields[y]);
            }
        }
    }
}
return fieldsList;
pyOrient code:
import pyorient

db_name = 'TestDatabaseName'
print("Connecting to the server...")
client = pyorient.OrientDB("localhost", 2424)
session_id = client.connect("root", "root")
print("OK - sessionID: ", session_id, "\n")
if client.db_exists(db_name, pyorient.STORAGE_TYPE_PLOCAL):
    client.db_open(db_name, "root", "root")
    functionCall = client.command("SELECT myFunction() UNWIND myFunction")
    for idx, val in enumerate(functionCall):
        print("Field name: " + val.myFunction)
    client.db_close()
Output:
Connecting to the server...
OK - sessionID: 54
Field name: name
Field name: surname
Field name: timeStamp
Field name: out_testClassEdge
Field name: nameSchemaLess1
Field name: in_testClassEdge
Field name: nameSchemaLess2
As you can see, all of the field names, both schema-full and schema-less, have been retrieved.
Hope it helps.
Luca's answer worked. I modified it to fit my tastes/needs and am posting it here to increase the amount of OrientDB documentation on Stack Exchange. I took Luca's answer and translated it to Groovy. I also added a parameter to select the class to get fields for, and removed the UNWIND in the results. Thank you to Luca for helping me learn.
Groovy code for function getFieldList with 1 parameter (class_name):
g = orient.getGraph()
fieldList = [] as Set
ret = g.command("sql", "SELECT FROM " + class_name)
for (record in ret) {
    fieldList.addAll(record.getRecord().fieldNames())
}
return fieldList
For the pyorient part, leaving out the database connection code, it looks like this:
node_keys = {}
ret = client.command("SELECT getFieldList({0})".format("'Node'"))
node_keys = ret[0].oRecordData['getFieldList']
Special notice for the class name: in the string passed to client.command(), the parameter must be enclosed in quotes.

How to search for a string in indexed Elasticsearch documents in golang?

I am writing a function in golang to search for a string in Elasticsearch documents which are indexed. I am using the Elasticsearch golang client elastic. For example, consider that the object is a tweet:
type Tweet struct {
    User     string
    Message  string
    Retweets int
}
And the search function is
func SearchProject() error {
    // Search with a term query
    termQuery := elastic.NewTermQuery("user", "olivere")
    searchResult, err := client.Search().
        Index("twitter").   // search in index "twitter"
        Query(&termQuery).  // specify the query
        Sort("user", true). // sort by "user" field, ascending
        From(0).Size(10).   // take documents 0-9
        Pretty(true).       // pretty print request and response JSON
        Do()                // execute
    if err != nil {
        // Handle error
        panic(err)
        return err
    }
    // searchResult is of type SearchResult and returns hits, suggestions,
    // and all kinds of other information from Elasticsearch.
    fmt.Printf("Query took %d milliseconds\n", searchResult.TookInMillis)
    // Each is a convenience function that iterates over hits in a search result.
    // It makes sure you don't need to check for nil values in the response.
    // However, it ignores errors in serialization. If you want full control
    // over iterating the hits, see below.
    var ttyp Tweet
    for _, item := range searchResult.Each(reflect.TypeOf(ttyp)) {
        t := item.(Tweet)
        fmt.Printf("Tweet by %s: %s\n", t.User, t.Message)
    }
    // TotalHits is another convenience function that works even when something goes wrong.
    fmt.Printf("Found a total of %d tweets\n", searchResult.TotalHits())
    // Here's how you iterate through results with full control over each step.
    if searchResult.Hits != nil {
        fmt.Printf("Found a total of %d tweets\n", searchResult.Hits.TotalHits)
        // Iterate through results
        for _, hit := range searchResult.Hits.Hits {
            // hit.Index contains the name of the index
            // Deserialize hit.Source into a Tweet (could also be just a map[string]interface{}).
            var t Tweet
            err := json.Unmarshal(*hit.Source, &t)
            if err != nil {
                // Deserialization failed
            }
            // Work with tweet
            fmt.Printf("Tweet by %s: %s\n", t.User, t.Message)
        }
    } else {
        // No hits
        fmt.Print("Found no tweets\n")
    }
    return nil
}
This search prints tweets by the user 'olivere'. But if I give 'olive', the search does not work. How do I search for a string which is part of User/Message/Retweets?
And the Indexing function looks like this,
func IndexProject(p *objects.ElasticProject) error {
    // Index a tweet (using JSON serialization)
    tweet1 := `{"user" : "olivere", "message" : "It's a Raggy Waltz"}`
    put1, err := client.Index().
        Index("twitter").
        Type("tweet").
        Id("1").
        BodyJson(tweet1).
        Do()
    if err != nil {
        // Handle error
        panic(err)
        return err
    }
    fmt.Printf("Indexed tweet %s to index %s, type %s\n", put1.Id, put1.Index, put1.Type)
    return nil
}
Output:
Indexed tweet 1 to index twitter, type tweet
Got document 1 in version 1 from index twitter, type tweet
Query took 4 milliseconds
Tweet by olivere: It's a Raggy Waltz
Found a total of 1 tweets
Found a total of 1 tweets
Tweet by olivere: It's a Raggy Waltz
Version
Go 1.4.2
Elasticsearch-1.4.4
Elasticsearch Go Library
github.com/olivere/elastic
Could anyone help me with this? Thank you.
How you search and find data depends on your analyser; from your code it is likely that the standard analyser is being used (i.e. you haven't specified an alternative in your mapping).
The Standard Analyser will only index complete words. So to match "olive" against "olivere" you could either:
Change the search process, e.g. switch from a term query to a Prefix query or use a Query String query with a wildcard.
Change the index process: if you want to find strings within larger strings, then look at using nGrams or Edge nGrams in your analyser.
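As an illustration of the first option, a prefix query could look roughly like this. This is only a sketch that reuses the client and index from the question's code; check the query builders available in the elastic version you are using:
// Prefix query: matches terms in the "user" field that start with "olive".
prefixQuery := elastic.NewPrefixQuery("user", "olive")
searchResult, err := client.Search().
    Index("twitter").
    Query(&prefixQuery). // instead of the exact-match term query
    Do()
if err != nil {
    panic(err)
}
fmt.Printf("Found a total of %d tweets\n", searchResult.TotalHits())
// A query string query with a wildcard, e.g. elastic.NewQueryStringQuery("user:olive*"),
// is another way to achieve a similar result.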
A multi_match query with type phrase_prefix is another way to do prefix matching, in this case across several fields:
multiQuery := elastic.NewMultiMatchQuery(
    term,
    "name", "address", "location", "email", "phone_number", "place", "postcode",
).Type("phrase_prefix")
