I have a weird issue where doing an out() operation across a few edges causes my RU cost to triple. I hope someone can help me shed light on why, and on what I can do to mitigate it.
I have a graph in Cosmos DB with two vertex labels: "Profile" and "Score". Each profile has 0 or 1 score vertices, linked via a "ProfileHasAggregatedScore" edge. The partition key is the ID of the profile.
If I make the following query, the RU charge currently is:
g.V().hasLabel('Profile').out('ProfileHasAggregatedScore')
>78 RU (8 scores found)
And for reference, the cost of getting all vertices of a type is:
g.V().hasLabel('Profile')
>28 RU (110 profiles found)
g.E().hasLabel('ProfileHasAggregatedScore')
>11 RU (8 edges found)
g.V().hasLabel('AggregatedRating')
>11 RU (8 scores found)
And the cost of fetching a single vertex or edge is:
g.V('aProfileId').hasLabel('Profile')
>4 RU (1 found)
g.E('anEdgeId')
> 7 RU
g.V('aRatingId')
> 3.5 RU
Can someone please help me understand why a traversal that only touches a few vertices along the way (see the execution profile at the bottom) is more expensive than fetching everything, and whether there is something I can do to prevent it? Adding a has-filter on the partition key does not seem to help. It seems odd that traversing/finding 16 more elements (8 edges and 8 vertices) after finding the 110 vertices triples the cost of the operation.
(NB: with 1,000 profiles, the cost of one traversal along an edge to the score vertex is 2,200 RU. This seems high, considering the emphasis the Azure team puts on scalability.)
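For reference, the has-filter variant I tried looks roughly like the sketch below (a minimal example using the gremlin_python driver; the endpoint, key, and the 'partitionKey' property name are placeholders):

import sys
from gremlin_python.driver import client, serializer

gremlin_client = client.Client(
    "wss://<account>.gremlin.cosmos.azure.com:443/",  # placeholder endpoint
    "g",
    username="/dbs/<database>/colls/<graph>",         # placeholder resource path
    password="<primary-key>",                         # placeholder key
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

# Constrain the start of the traversal with the partition-key property
# before the out() step; in my tests this did not noticeably reduce the RU charge.
query = ("g.V().has('Profile', 'partitionKey', 'aProfileId')"
         ".out('ProfileHasAggregatedScore')")
results = gremlin_client.submit(query).all().result()
print(results)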
Here is the execution profile, in case it helps (it seems most of the time is spent finding the edges in the out() step):
[
{
"gremlin": "g.V().hasLabel('Profile').out('ProfileHasAggregatedScore').executionProfile()",
"totalTime": 46,
"metrics": [
{
"name": "GetVertices",
"time": 13,
"annotations": {
"percentTime": 28.26
},
"counts": {
"resultCount": 110
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 110,
"size": 124649,
"time": 2.47
}
]
},
{
"name": "GetEdges",
"time": 26,
"annotations": {
"percentTime": 56.52
},
"counts": {
"resultCount": 8
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 8,
"size": 5200,
"time": 6.22
},
{
"fanoutFactor": 1,
"count": 0,
"size": 49,
"time": 0.88
}
]
},
{
"name": "GetNeighborVertices",
"time": 7,
"annotations": {
"percentTime": 15.22
},
"counts": {
"resultCount": 8
},
"storeOps": [
{
"fanoutFactor": 1,
"count": 8,
"size": 6303,
"time": 1.18
}
]
},
{
"name": "ProjectOperator",
"time": 0,
"annotations": {
"percentTime": 0
},
"counts": {
"resultCount": 8
}
}
]
}
]
Related
I am having an issue with FormRecognizer not behaving the way I have seen it should. Here is the dilemma:
I have an invoice that, when run through https://{endpoint}/formrecognizer/v2.0/layout/analyze,
has its table recognized, and the proper JSON with the "tables" node is generated. Here is an example of part of it:
{
"rows": 8,
"columns": 8,
"cells": [
{
"rowIndex": 0,
"columnIndex": 4,
"columnSpan": 3,
"text": "% 123 F STREET Deer Park TX 71536",
"boundingBox": [
3.11,
2.0733
],
"elements": [
"#/readResults/0/lines/20/words/0",
"#/readResults/0/lines/20/words/1"
]
}
When I train a model with NO labels file via https://{endpoint}/formrecognizer/v2.0/custom/models, it does not generate a "tables" node (not even an empty one); instead it generates tokens. Here is an example of the data above without "tables":
{
"key": {
"text": "__Tokens__12",
"boundingBox": null,
"elements": null
},
"value": {
"text": "123 F STREET",
"boundingBox": [
5.3778,
2.0625,
6.8056,
2.0625,
6.8056,
2.2014,
5.3778,
2.2014
],
"elements": null
},
"confidence": 1.0
}
I am not sure exactly where this is not behaving as intended, but any insight would be appreciated!
If you train a model WITH labeling files and then call FR Analyze(), the FR service will call the Layout service, which returns tables in the "pageResults" section.
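For illustration, a minimal sketch of that flow against the v2.0 REST API (Python with requests; the endpoint, key, model ID, and file name are placeholders):

import time
import requests

endpoint = "https://<endpoint>/formrecognizer/v2.0"   # placeholder
model_id = "<model-id-trained-with-labels>"           # placeholder
key = "<subscription-key>"                            # placeholder

# Submit the document; the service answers 202 with an
# Operation-Location header that you poll for the result.
with open("invoice.pdf", "rb") as f:
    resp = requests.post(
        f"{endpoint}/custom/models/{model_id}/analyze",
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/pdf"},
        data=f.read(),
    )
op_url = resp.headers["Operation-Location"]

# Poll until the analysis finishes, then read tables from "pageResults".
while True:
    result = requests.get(op_url,
                          headers={"Ocp-Apim-Subscription-Key": key}).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(1)

tables = result["analyzeResult"]["pageResults"][0].get("tables", [])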
I am trying to calculate heating/cooling degree days using the (Tbase - Ta) formula, where Tbase is usually 65°F and Ta = (high_temp + low_temp)/2.
For example:
high_temp = 96.5°F, low_temp = 65.21°F, then
mean = (high_temp + low_temp)/2
result = mean - 65
where 65°F is the base (average room) temperature.
If mean is greater than 65, the excess is cooling degree days (CDD); otherwise the deficit is heating degree days (HDD).
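As a quick sanity check, here is the example above in Python (numbers taken from the question):

BASE_F = 65.0                        # base temperature
high_temp, low_temp = 96.5, 65.21    # example values from above
mean = (high_temp + low_temp) / 2    # 80.855
cdd = max(mean - BASE_F, 0)          # 15.855 -> cooling degree days
hdd = max(BASE_F - mean, 0)          # 0.0    -> heating degree days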
I get weather data from two APIs:
weatherbit
darksky
Weatherbit provides both CDD and HDD data directly, but for Dark Sky we need to calculate them with the formula above (Tbase - Ta).
My problem is that the two APIs show different results. For example, the Dark Sky JSON response for a day:
{
"latitude": 47.552758,
"longitude": -122.150589,
"timezone": "America/Los_Angeles",
"daily": {
"data": [
{
"time": 1560927600,
"summary": "Light rain in the morning and overnight.",
"icon": "rain",
"sunriseTime": 1560946325,
"sunsetTime": 1561003835,
"moonPhase": 0.59,
"precipIntensity": 0.0057,
"precipIntensityMax": 0.0506,
"precipIntensityMaxTime": 1561010400,
"precipProbability": 0.62,
"precipType": "rain",
"temperatureHigh": 62.44,
"temperatureHighTime": 1560981600,
"temperatureLow": 48,
"temperatureLowTime": 1561028400,
"apparentTemperatureHigh": 62.44,
"apparentTemperatureHighTime": 1560981600,
"apparentTemperatureLow": 46.48,
"apparentTemperatureLowTime": 1561028400,
"dewPoint": 46.61,
"humidity": 0.75,
"pressure": 1021.81,
"windSpeed": 5.05,
"windGust": 8.36,
"windGustTime": 1560988800,
"windBearing": 149,
"cloudCover": 0.95,
"uvIndex": 4,
"uvIndexTime": 1560978000,
"visibility": 4.147,
"ozone": 380.8,
"temperatureMin": 49.42,
"temperatureMinTime": 1561010400,
"temperatureMax": 62.44,
"temperatureMaxTime": 1560981600,
"apparentTemperatureMin": 47.5,
"apparentTemperatureMinTime": 1561014000,
"apparentTemperatureMax": 62.44,
"apparentTemperatureMaxTime": 1560981600
}
]
},
"offset": -7
}
Python calculation:
response = result.get("daily").get("data")[0]
low_temp = response.get("temperatureMin")
hi_temp = response.get("temperatureMax")
mean = (hi_temp + low_temp)/2
# 65°F is the base (normal room) temperature
print(65-mean)
Here 65 - mean prints 6.509999999999998 (i.e. mean is about 58.49).
So HDD is 6.51, and CDD is 0.
The Weatherbit JSON response for the same date is:
{
"threshold_units": "F",
"timezone": "America/Los_Angeles",
"threshold_value": 65,
"state_code": "WA",
"country_code": "US",
"city_name": "Newcastle",
"data": [
{
"rh": 68,
"wind_spd": 5.6,
"timestamp_utc": null,
"t_ghi": 8568.9,
"max_wind_spd": 11.4,
"cdd": 0.4,
"dewpt": 46.9,
"snow": 0,
"hdd": 6.7,
"timestamp_local": null,
"precip": 0.154,
"t_dni": 11290.6,
"temp_wetbulb": 53.1,
"t_dhi": 1413.9,
"date": "2019-06-20",
"temp": 58.6,
"sun_hours": 7.6,
"clouds": 58,
"wind_dir": 186
}
],
"end_date": "2019-06-21",
"station_id": "727934-94248",
"count": 1,
"start_date": "2019-06-20",
"city_id": 5804676
}
Here HDD is 6.7 and CDD is 0.4.
Can you explain how they get this result?
You need to use hourly data to calculate the HDD and CDD, and then average them to get the daily value.
More details here: https://www.weatherbit.io/blog/post/heating-and-cooling-degree-days-weather-api-release
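A minimal sketch of that hourly approach (assuming the response carries Dark Sky's hourly block, i.e. a list of hour objects with a "temperature" field under result["hourly"]["data"]):

BASE_F = 65.0

def daily_degree_days(hours):
    # Degree-hours above/below the base, averaged back to a daily value.
    hdd = sum(max(BASE_F - h["temperature"], 0) for h in hours)
    cdd = sum(max(h["temperature"] - BASE_F, 0) for h in hours)
    n = len(hours)                     # normally 24 hourly records
    return hdd / n, cdd / n

hours = result.get("hourly").get("data")   # assumes an 'hourly' block is present
hdd, cdd = daily_degree_days(hours)
print(hdd, cdd)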
When I use Microsoft.Azure.Search (v3.0.3)'s "SearchAsync" and "Search" methods to return the indexed items, the SDK doesn't return all the facets.
However, when I try the same thing using Postman, it returns all the facets correctly.
Could this be a bug in the SDK? (I believe it is, as a direct call to an SDK method doesn't return all the facets correctly, but I couldn't find any record of this possible bug.) If yes, is there a fix for it in the SDK? Any help is appreciated.
UPDATE:
After spending some more time on this, I have found that the bug is not .NET SDK specific.
Both the .NET SDK and the REST API appear to have this problem, and neither of them returns all the facets. Can you please tell me whether this is a known bug and what the fix for it is?
Please see the following example:
There should be 2 Coaching facets, but only 1 is returned by the Azure Search service.
The new search query (facet specialisms added):
https://MYPROJECT-search.search.windows.net/indexes/myproject-directory-qa/docs?api-version=2016-09-01&$count=true&facet=specialisms&$filter=listingType eq 'Therapist'
"Coaching:Development coaching", --> This doesn't return as a facet.
"Coaching:Executive coaching", -->This returns fine.
"#search.facets": {
"specialisms#odata.type": "#Collection(Microsoft.Azure.Search.V2016_09_01.QueryResultFacet)",
"specialisms": [
{
"count": 5,
"value": "Anxiety, depression and trauma:Depression"
},
{
"count": 4,
"value": "Addiction, self-harm and eating disorders:Obsessions"
},
{
"count": 4,
"value": "Anxiety, depression and trauma:Post-traumatic stress"
},
{
"count": 4,
"value": "Coaching:Executive coaching"
},
{
"count": 4,
"value": "Identity, culture and spirituality:Self esteem"
},
{
"count": 4,
"value": "Relationships, family and children:Pregnancy related issues"
},
{
"count": 4,
"value": "Stress and work:Redundancy"
},
{
"count": 3,
"value": "Addiction, self-harm and eating disorders:Eating disorders"
},
{
"count": 3,
"value": "Anxiety, depression and trauma:Bereavement"
},
{
"count": 3,
"value": "Anxiety, depression and trauma:Loss"
}
]
},
{
"#search.score": 1,
"contactId": "df394997-6e94-e711-80ed-3863bb34db00",
"location": {
"type": "Point",
"coordinates": [
-2.58586,
51.47873
],
"crs": {
"type": "name",
"properties": {
"name": "EPSG:4326"
}
}
},
"profileImageUrl": "https://myprojectwebqa.blob.core.windows.net/profileimage/3e31457c-5113-4062-b960-30f038ce7bfc.jpg",
"locationText": "Bristol",
"listingType": "Therapist",
"disabledAccess": true,
"flexibleHours": true,
"offersConcessionaryRates": false,
"homeVisits": true,
"howIWillWork": "<p>Some test data</p>",
"specialisms": [
"Health related issues:Asperger syndrome",
"Health related issues:Chronic fatigue syndrome/ME",
"Addiction, self-harm and eating disorders:Addictions",
"Addiction, self-harm and eating disorders:Eating disorders",
"Addiction, self-harm and eating disorders:Obsessions",
"Anxiety, depression and trauma:Bereavement",
"Anxiety, depression and trauma:Depression",
"Anxiety, depression and trauma:Loss",
"Coaching:Development coaching",
"Coaching:Executive coaching",
"Identity, culture and spirituality:Self esteem",
"Identity, culture and spirituality:Sexuality",
"Relationships, family and children:Infertility",
"Relationships, family and children:Relationships",
"Stress and work:Redundancy"
],
"clientele": [
"Adults",
"Children",
"Groups"
],
"approaches": [
"CBT",
"Cognitive",
"Psychoanalytic",
"Psychosynthesis"
],
"sessionTypes": [
"Home visits",
"Long-term face to face work"
],
"hourlyRate": 50,
"fullName": "Test Name",
"id": "ZWUwNGIyNjYtYjQ5Ny1lNzExLTgwZTktMzg2M2JiMzY0MGI4"
}
Please see the details below.
In my case, I found that by default the Azure Search service returns only 10 terms per facet. That is why I couldn't see all my facets.
After updating my search query as follows, the problem was fixed and I can now see all my facets in the search results; the facet part was updated to: facet=specialisms, count:9999.
https://MYPROJECTNAME-search.search.windows.net/indexes/MYPROJECTNAME-directory-qa/docs?api-version=2016-09-01&$count=true&facet=specialisms, count:9999&facet=clientele, count:9999&$filter=listingType eq 'Therapist'
For the Microsoft documentation ("max # of facet terms; default is 10"), see the following link:
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents
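For reference, the same request via the REST API in Python (a sketch using requests; the service URL, index name, and query key are placeholders):

import requests

url = ("https://MYPROJECTNAME-search.search.windows.net"
       "/indexes/MYPROJECTNAME-directory-qa/docs")
params = {
    "api-version": "2016-09-01",
    "$count": "true",
    # Raise the per-field facet term limit from its default of 10.
    "facet": ["specialisms, count:9999", "clientele, count:9999"],
    "$filter": "listingType eq 'Therapist'",
}
resp = requests.get(url, params=params,
                    headers={"api-key": "<query-key>"})  # placeholder key
facets = resp.json()["@search.facets"]["specialisms"]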
I am using ArangoDB with Node.js and the arangojs driver; one of the Arango collections has 10,000,000 documents.
Sometimes the page faults go up (150 or 500), and ArangoDB freezes and doesn't respond to query requests. The Arango web panel freezes as well.
My server config is:
RAM: 6 GB
CPU: 8 cores
(The web panel shows ArangoDB using 4.76 GB (83.90%) of the 6 GB of RAM.)
UPDATE 1
This is the output of /_api/collection/AdsStatics/figures:
{
"id": "191689719157",
"name": "AdsStatics",
"isSystem": false,
"doCompact": true,
"isVolatile": false,
"journalSize": 33554432,
"keyOptions": {
"type": "traditional",
"allowUserKeys": true
},
"waitForSync": false,
"indexBuckets": 8,
"count": 7816780,
"figures": {
"alive": {
"count": 7815806,
"size": 3563838968
},
"dead": {
"count": 306,
"size": 167464,
"deletion": 0
},
"datafiles": {
"count": 104,
"fileSize": 3530743672
},
"journals": {
"count": 1,
"fileSize": 33554432
},
"compactors": {
"count": 0,
"fileSize": 0
},
"shapefiles": {
"count": 0,
"fileSize": 0
},
"shapes": {
"count": 121,
"size": 56520
},
"attributes": {
"count": 24,
"size": 56
},
"indexes": {
"count": 3,
"size": 1660594864
},
"lastTick": "10044860034955",
"uncollectedLogfileEntries": 985,
"documentReferences": 0,
"waitingFor": "-",
"compactionStatus": {
"message": "checked datafiles, but no compaction opportunity found",
"time": "2016-02-24T08:29:27Z"
}
},
"status": 3,
"type": 2,
"error": false,
"code": 200
}
Thanks
It seems that your system is running out of memory. The datafiles for this one collection are 3,530,743,672 bytes in size, and the indexes are 1,660,594,864 bytes. That is about 5.2 GB for this one collection alone.
arangod will need additional memory for its WAL, the V8 contexts, and temporary query results in order to operate properly.
Provided the system has 6 GB of total RAM and the OS and other processes need some RAM, too, it looks like you're running out of memory.
I am wondering if you're seeing some sort of swapping activity, which would explain why (all) operations would get extremely slow.
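To put a number on it, you can pull the figures over the HTTP API and sum the sizes yourself (a sketch using Python's requests; the server URL is a placeholder, and the journal size is included alongside the datafiles and indexes):

import requests

url = "http://<server>:8529/_api/collection/AdsStatics/figures"  # placeholder host
fig = requests.get(url).json()["figures"]

on_disk = (fig["datafiles"]["fileSize"]    # 3,530,743,672 bytes
           + fig["journals"]["fileSize"]   # 33,554,432 bytes
           + fig["indexes"]["size"])       # 1,660,594,864 bytes
print(f"~{on_disk / 1e9:.2f} GB for this collection alone")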
I have a simple aggregation like
"aggs": {
"firm_aggregation": {
"terms": {
"field": "experience.company_name.slug",
"size": 10
}
}
}
and this gives me a result like
"aggregations": {
"firm_aggregation": {
"buckets": [
... (some others)
{
"key": "freelancer",
"doc_count": 33
},
but when I increase the aggregation size to 2000 I get
"aggregations": {
"firm_aggregation": {
"buckets": [
... (some others)
{
"key": "freelancer",
"doc_count": 35
},
Why is this happening? I thought that size would only increase the number of buckets that Elasticsearch returns.
This is owing to the estimation done at the shard level.
For a result of size 5, only the top 5 terms are taken from each shard, and these are added up to get the final result. This need not be very accurate.
There is a very good explanation of this here.
Along with size, you can pass the shard_size parameter, which controls this behavior without affecting the data that is returned.
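A sketch of that with elasticsearch-py (the client address and index name are placeholders, and the shard_size value is illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder address

body = {
    "size": 0,
    "aggs": {
        "firm_aggregation": {
            "terms": {
                "field": "experience.company_name.slug",
                "size": 10,
                # Ask each shard for a larger candidate list before the
                # final merge, which improves the accuracy of doc_count.
                "shard_size": 2000,
            }
        }
    },
}
resp = es.search(index="<your-index>", body=body)  # placeholder index name
buckets = resp["aggregations"]["firm_aggregation"]["buckets"]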