How do I create a composite index for my Firestore query? - node.js

I'm trying to perform a Firestore query on a collection, which fails because an index needs to be created for the query I'm attempting. The error contains a link that is supposed to auto-create the missing index for me. However, when I follow the link and attempt to create the index that has been prepared for me, I encounter an error stating "name only indexes are not supported". I should also point out that I have been using the npm functions-framework to test the cloud function that contains the relevant query.
I have tried creating the composite index manually myself, but none of the indexes I have made seems to satisfy my attempted query.
Sample docs in my Items Collection:
{
descriptionLastModified: someTimestamp <a timestamp datatype>
detectedLanguage: "en-us" <string>
}
{
descriptionLastModified: someTimestamp <a timestamp datatype>
detectedLanguage: "en-us" <string>
}
{
descriptionLastModified: someTimestamp <a timestamp datatype>
detectedLanguage: "fr" <string>
}
{
descriptionLastModified: someTimestamp <a timestamp datatype>
detectedLanguage: "en-us" <string>
}
These are all queries I have tried which fail:
let queryRef = itemsRef.where('descriptionLastModified','<=', oneDayAgoTimestamp).orderBy("descriptionLastModified","desc").where("detectedLanguage", '==', "en-us").get()
let queryRef = itemsRef.where('descriptionLastModified','<=', oneDayAgoTimestamp).where("detectedLanguage", '==', "en-us").get()
let queryRef = itemsRef.where("detectedLanguage", '==', "en-us").where('descriptionLastModified','<=', oneDayAgoTimestamp).get()
I have made the following composite indexes at the collection level to no avail:
CollectionId: items Fields: descriptionLastModified:DESC detectedLanguage:ASC
CollectionId: items Fields: descriptionLastModified:ASC detectedLanguage:ASC
CollectionId: items Fields: detectedLanguage:ASC descriptionLastModified:DESC
My expectation is that I should be able to filter my items by their descriptionLastModified timestamp field and additionally by the value of their detectedLanguage string field.
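For reference, a composite index matching the first query above, sketched in the Firebase CLI's firestore.indexes.json format (equality field first, then the range/order field with a direction matching the orderBy; deployable with firebase deploy --only firestore:indexes):
{
  "indexes": [
    {
      "collectionGroup": "items",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "detectedLanguage", "order": "ASCENDING" },
        { "fieldPath": "descriptionLastModified", "order": "DESCENDING" }
      ]
    }
  ]
}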

In case anyone finds this in the future: it's 2021, and I still find that manually created composite indexes, despite being incredibly simple (or so you'd think, and I fully understand why the OP thought his indexes would work), often just don't take. Doubtless there is some subtlety that reading some guides would make clear, but I haven't found the trick yet, and I have been using Firestore intensively at work for over 18 months.
The trick is to use the link the error creates, but this often fails: you get a dialogue box telling you an index will be created, but no details that would let you create it manually, and the friendly blue 'Create' button does nothing; it neither creates the index nor dismisses the window.
For a while I had it working in Firefox, but it stopped. A colleague a couple of desks away who has to create them a lot tells me that Edge is the most reliable, and that you have to be very careful not to have multiple Google accounts signed in. If Edge (or Chrome) takes you to the wrong login when following the link (it will assume your default login rather than, say, the one currently selected in your only Google Cloud console window), then even after switching back it works only about one time in three. He tells me that in Edge it works about 60% of the time.
I used to get about 30% in Firefox by just hitting refresh a few times, but I can't get it working in anything other than Edge now. In practice, unless there is a cash-strapped client who will notice, I just go for inefficient and costly queries that return a superset of the results and apply some filters afterwards (see the sketch below). Mostly running in Node.js, and it's nippy enough for my purposes. It's a real shame to ramp up the read counts and the consequent bills, but there just doesn't seem to be a fix.
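For what it's worth, that superset-and-filter fallback looks something like this, a minimal sketch reusing itemsRef and oneDayAgoTimestamp from the question (the range filter alone is served by Firestore's automatic single-field index, so no composite index is needed):
// Inside an async function: query on the range filter only...
const snapshot = await itemsRef
  .where('descriptionLastModified', '<=', oneDayAgoTimestamp)
  .orderBy('descriptionLastModified', 'desc')
  .get();

// ...then narrow by language in memory. Every matching doc is read (and billed).
const enUsItems = snapshot.docs
  .map(doc => doc.data())
  .filter(item => item.detectedLanguage === 'en-us');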

Related

immutable _id error when performing MongoDB bulkWrite replaceOne on first attempt only

I'm working on a little web application that will crawl and update baseball standings by day and track teams' positions (among other things) over time.
I have an API I grab all of this from and a collection in MongoDB that stores all the team data and information for the current day. Right now I just run this manually but eventually it'll be automated to run at like 3am or whenever.
The API supplies a unique ID for each team that never changes. So what I'm doing is taking in the team data from the API and passing it to a function that extracts the team's data (there is other data in the response object I don't need), puts it into an object for replacement, and then, wherever that team ID exists in the collection, replaces its document in a bulkWrite.
async function currentStandings(db, team_standings, callback) {
  const current_standings = db.collection('current_standings');
  let replacePool = [];
  // Build one replaceOne operation per team, matched on the stable team_id.
  for (const single_team of team_standings.data.standing) {
    let replaceOnePusher = {
      replaceOne: {
        "filter": { "team_id": single_team.team_id },
        "replacement": single_team
      }
    };
    replacePool.push(replaceOnePusher);
  }
  // Apply all replacements in a single bulk write.
  await current_standings.bulkWrite(replacePool);
  callback();
}
However, when I execute this code for the first time each day, I get an error reading BulkWriteError: After applying the update, the (immutable) field '_id' was found to have been altered to _id: ObjectId('5f26e57b6831761ac840bf1d') (not the same ID every day), and if I look in Compass the data isn't updated. If I immediately run the script again, it goes through successfully without error. Refreshing the data in Compass then shows the correct data.
Can someone explain to me what is going wrong here? This is actually my first time using MongoDB since I wanted to learn it and this pet project seemed like a good place to start.
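One commonly reported cause of this particular BulkWriteError is a replacement document that carries its own _id, different from the _id of the document it matches; whether that applies here depends on the API payload, but a hedged guard looks like this (a sketch, not a confirmed diagnosis):
// Strip any _id the API payload may carry so the replacement document
// cannot conflict with the matched document's immutable _id.
for (const single_team of team_standings.data.standing) {
  const { _id, ...replacement } = single_team;
  replacePool.push({
    replaceOne: {
      filter: { team_id: single_team.team_id },
      replacement: replacement
    }
  });
}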

Multiple-member MDX query returns error (permission to access the specified member)

I want to mention that I'm new to SSAS and MDX.
For the past several days I've been wrestling with an Excel-generated query that errors out.
The problem is that Excel generates this query when trying to read data from an online cube data source, and the failure prevents further reads from that cube. The query is executed against an Azure cube, and I managed to profile it and capture the following query:
with set __XLUniqueNames as
  { [Stores].[Chain].[Chain].&[SuperBrugsen], [Stores].[Chain].[Chain].&[Salling], [Stores].[Chain].[Chain].&[SuperBrugsen] }
set __XLDrilledUp as
  Generate(__XLUniqueNames,
    { IIF([Stores].[Chain].currentmember.LEVEL_NUMBER <= 2147483647,
          [Stores].[Chain].currentmember,
          Ancestor([Stores].[Chain].currentmember,
                   [Stores].[Chain].currentmember.LEVEL_NUMBER - 2147483647)) })
member [Measures].__XLPath as
  Generate(
    Ascendants([Stores].[Chain].currentmember),
    [Stores].[Chain].currentmember.unique_name,
    "__XLPSEP")
select { [Measures].__XLPath } on 0,
       __XLDrilledUp on 1
from [SomeCube]
cell properties value
Each time the query contains more than one member (an existing member from that dimension), it errors out with this message:
"Either you do not have permission to access the specified member or the specified member does not exist.".
What I have tried:
First, I tried to identify a pattern in the member combinations that error out, with no luck. It seems that for certain members I get the error and for others I don't. For a single member, duplicate members, or a combination of members that don't exist in the cube, it doesn't error.
Second, I did try the query on a different cube (on-premise SSAS) and I didn't get the error.
Third, by modifying the connection string, I tried to make Excel ignore the missing members, using the "MDXMissingMemberMode" flag set to Ignore, in the hope it would work (see the connection-string sketch after this list). It didn't work.
Fourth, I tried to dissect the query to see which clause was giving the error. With my limited knowledge of MDX, I suspect that "currentmember" with its "LEVEL_NUMBER" property is at fault. My guess is that it fails to get the current member for the next member in the set.
Fifth, the last thing and the longest: by accident I discovered that in SSMS you can execute a query in an MDX session (right-click on the cube -> New Query) or open the cube in browse mode (right-click on the cube -> Browse), which presents a UI similar to the MDX query window.
Now here comes the surprise: in this browse mode my query executes successfully every time. Intrigued by this, I started to profile the request to see what was different. The difference was some additional XML structure, a list of properties. Seeing this, I figured I could manipulate my connection string from Excel to send some of those properties and make it work, but in the end it didn't work.
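For reference, the ignore-missing-members attempt from the third step set a connection-string property along these lines (a sketch; the server address and catalog are placeholders, and "MDX Missing Member Mode" is the standard Analysis Services property behind the flag mentioned above):
Provider=MSOLAP;Data Source=asazure://region.asazure.windows.net/someserver;Initial Catalog=SomeCatalog;MDX Missing Member Mode=Ignore;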
Additional properties that worked:
<PropertyList xmlns="urn:schemas-microsoft-com:xml-analysis">
<Catalog>SomeCatalog</Catalog>
<ShowHiddenCubes>true</ShowHiddenCubes>
<SspropInitAppName>Microsoft SQL Server Management Studio</SspropInitAppName>
<Timeout>3600</Timeout>
<LocaleIdentifier>1033</LocaleIdentifier>
<ClientProcessID>24400</ClientProcessID>
<DataSourceInfo/>
<Format>Tabular</Format>
<Content>Schema</Content>
<DbpropMsmdFlattened2>true</DbpropMsmdFlattened2>
<ReturnCellProperties>true</ReturnCellProperties>
<DbpropMsmdActivityID>2309dfa2-3607-41b2-9446-8ece2f5ababa</DbpropMsmdActivityID>
<DbpropMsmdCurrentActivityID>2309dfa2-3607-41b2-9446-8ece2f5ababa</DbpropMsmdCurrentActivityID>
<DbpropMsmdRequestID>d3dbd079-5ca7-496c-ab55-afea71889238</DbpropMsmdRequestID>
</PropertyList>
Additional properties that didn't work:
<PropertyList xmlns="urn:schemas-microsoft-com:xml-analysis">
<Catalog>SomeCatalog</Catalog>
<SspropInitAppName>Microsoft SQL Server Management Studio - Query</SspropInitAppName>
<LocaleIdentifier>1033</LocaleIdentifier>
<ClientProcessID>24400</ClientProcessID>
<DataSourceInfo/>
<Format>Native</Format>
<AxisFormat>TupleFormat</AxisFormat>
<Content>SchemaData</Content>
<Timeout>0</Timeout>
<DbpropMsmdActivityID>e5e75ad6-8fca-4f25-abba-047f86198602</DbpropMsmdActivityID>
<DbpropMsmdCurrentActivityID>e5e75ad6-8fca-4f25-abba-047f86198602</DbpropMsmdCurrentActivityID>
<DbpropMsmdRequestID>8901787f-15a7-48a0-86eb-18ff0b92bdc4</DbpropMsmdRequestID>
</PropertyList>
Excel additional properties:
<PropertyList xmlns="urn:schemas-microsoft-com:xml-analysis" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<Catalog>SomeCatalog</Catalog>
<Timeout>0</Timeout>
<Format>Native</Format>
<DbpropMsmdFlattened2>false</DbpropMsmdFlattened2>
<SafetyOptions>2</SafetyOptions>
<Dialect>MDX</Dialect>
<MdxMissingMemberMode>Error</MdxMissingMemberMode>
<DbpropMsmdOptimizeResponse>9</DbpropMsmdOptimizeResponse>
<DbpropMsmdActivityID>9D69640F-553A-4970-BD4E-7234F1CD928C</DbpropMsmdActivityID>
<DbpropMsmdRequestID>B5E10FF0-EF2F-409E-83BF-CD2DBA20C2BE</DbpropMsmdRequestID>
<LocaleIdentifier>1030</LocaleIdentifier>
<DbpropMsmdMDXCompatibility>1</DbpropMsmdMDXCompatibility>
</PropertyList>
Result of a single-member working MDX query:
SuperBrugsen [Stores].[Chain].[Chain].&[SuperBrugsen]__XLPSEP[Stores].[Chain].[All]
This is all the info that I could gather for my problem. My next step is to turn to Microsoft for help, but I don't want to do that just yet due to the costs.
Can one of you please help me out? Any ideas or suggestions are most welcome, because I have run out of ideas.
It seems that the problem solved itself. Most likely there was an update that fixed this issue. See the Azure updates page: https://azure.microsoft.com/en-us/updates/?product=analysis-services&status=nowavailable

Query Google Cloud Datastore to retrieve matching results

I am using Google Cloud Datastore to save my application data. I have to add a query to get all results matching Name, Brand, or Sku.
Querying with one of the fields returns records, but using all the fields together returns an error.
Query:
const term = "My Red";
const q = gstore.createQuery(req.params.orgId, "Variant")
  .filter('brand', '=', term)
  .filter('sku', '=', term)
  .limit(10);
Error:
{"msec":435.96913800016046,"error":"no matching index found.
recommended index is:- kind: Variant properties: -
name: brand - name:
sku","data":{"code":412,"metadata":{"_internal_repr":{}},"isBoom":true,"isServer":true,"data":null,"output":{"statusCode":500,"payload":{"statusCode":500,"error":"Internal
Server Error","message":"An internal server error
occurred"},"headers":{}}}} Debug: internal, error
Also, I want to perform an OR operation to get matching results, as the above returns data with an AND operation.
Please help me find the correct path to achieve the desired result.
Thanks in advance, and let me know if something is not clear.
The error indicates that the composite index required by the respective query is not in Serving state.
That means it's either not created/deployed or it was recently deployed and is still being built.
Composite indexes must be specifically created and deployed in your app.
If you didn't create it, you need to do so. The error message indicates the content the index configuration requires (see the index.yaml sketch below). If you're using the development server, it might create the index configuration automatically, but you still need to deploy it.
See Indexes docs for more details.
If you recently deployed the composite index, please note that it can take a significant amount of time until the matching index is built, depending on how many entities of that kind already exist in the Datastore. You can check the status of the index building in the developer console, on the Indexes page.
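For reference, a sketch of the index configuration the error message recommends, in Datastore's index.yaml format (deployable with gcloud datastore indexes create index.yaml):
indexes:
# Composite index covering both equality filters on Variant.
- kind: Variant
  properties:
  - name: brand
  - name: sku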

Referencing external doc in CouchDB view

I am scraping a 90K-record database using JSON-RPC and I am trying to put in some basic error checking. I want to start by scraping the database twice using two different settings, adding a prefix to the second scrape. This way I can check that the two settings are not producing different records (due to dropped updates, etc.). I wanted to implement the comparison using a view that compares each document from the first scrape with its twin produced by the second scrape and then emits the names of records that differ.
However, I cannot quite figure out how to pull another doc into the view; everything I have read only discusses external docs using the emit() function, which is too late to permit the comparison. In the example below, the hypothetical lookup() function would grab the referenced document.
Is this just not possible?
function(doc) {
  if (doc._id.slice(0,1) !== '$' && doc._id.slice(0,1) !== '_') {
    // lookup() is hypothetical: fetch the twin doc produced by the second scrape.
    var otherDoc = lookup('$test' + doc._id);
    if (otherDoc) {
      var keys = doc.value.keys();
      var same = true;
      keys.forEach(function(key) {
        if ((key.slice(0,1) !== '_') && (key.slice(0,1) !== '$') && (key !== 'expires')) {
          if (!Object.equal(otherDoc[key], doc[key])) {
            same = false;
          }
        }
      });
      if (!same) {
        emit(doc._id, 1);
      }
    }
  }
}
Context
You are correct: this is not possible in CouchDB. The whole point of the map function is that it must be idempotent; otherwise you lose all the other nice benefits of a pre-calculated index.
This is why you cannot access external resources in the map function, whether they be other records or the clock. Any time you run a map you must always get the same result if you put the same record into it. Since there are no relationships between records in CouchDB, you cannot promise that this is possible.
Solution
However, you can still achieve your end goal, just by different means. Some possibilities:
Assuming there is some meaningful numeric value in each doc, you could use a view to take the sum of all those values and group them by which import you did ({key: <batch id>, value: <meaningful number>}). Then compare the two numbers in your client or the browser to see if they match.
A brute force approach would be to use a view to pair the docs that should match. Each doc is on a different row, but they're grouped by a common field. Then iterate through the entire index comparing the pairs. This would certainly be the quickest to code and doesn't depend on your application or data.
Implement a validation function to enforce a schema on your data. Just be warned that this will reduce your write throughput since each written record will be piped out of Erlang and into the JS engine. Also, this is only applicable if you're worried about properly formed records instead of their precise content, which might not be the case.
Instead of your different batch jobs creating different docs, have them place their results into the same doc. The structure might look like this: { "_id": "something meaningful", "batch_one": { ..data.. }, "batch_two": { ..data.. } } Then your validation function could compare them, or you could create a view that indexes all the docs that don't match (see the sketch below). It all depends on where in your pipeline you want to do the error checking and correction.
Personally I like the last option better, but only if you don't plan to use the database as is in production. Ie., you wouldn't want to carry around all that extra data in each record.
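A minimal map function along the lines of that last option might look like this (a sketch; batch_one and batch_two are the field names from the example structure above):
// Emit the IDs of combined docs whose two batches differ.
function(doc) {
  if (doc.batch_one && doc.batch_two) {
    // Naive deep comparison via serialization; assumes consistent key order.
    if (JSON.stringify(doc.batch_one) !== JSON.stringify(doc.batch_two)) {
      emit(doc._id, 1);
    }
  }
}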
Hope that helps.
Cheers.

Point-in-time restores of databases and documents using Cloudant

How can I save changes in CouchDB / Cloudant in order to later do point-in-time restores of my databases, or even specific documents?
We’re working on making this a first-class feature, but until we roll it out, this is how one of our customers did it:
You have collections, and within those collections, resources. So, you keep a logging database where every document has an ID like collection-resource, so for a collection named "cars" and a resource named "Ford", you'd have a document in your logging database named cars-ford. That document looks like this:
{
versions: [...]
}
Any time that resource is touched or modified, your application updates the logging document by appending the new version to the end of the versions field. That version might look like this:
{
timestamp: '...', # some integer timestamp, for sorting
doc: {...} # attributes of the document as of the save
}
We'll use a view to return a list of all versions of all documents, sorted by when each change occurred.
Then, here's how you use that to do restores and the like:
Getting the most recent version of a resource
Get the document in its entirety, and grab the last element in the versions field. That's the most recent version.
See all versions relative to a timestamp
We'll create a view to sort by timestamp. The view looks like this:
{
  map: "function(doc) {
    for (var i in doc.versions) {
      emit(doc.versions[i].timestamp, doc.versions[i].doc);
    }
  }"
}
Say our database is named loggy, the design doc where our views live is named restore, and the view itself is named time. Then we'll make a GET request to this URL:
{CLOUDANT_HOST}/loggy/_design/restore/_view/time?startkey='...'
...where the value for startkey is some timestamp. This, unmodified, will return every version after the indicated timestamp. Add limit=X and you'll get the X versions after the timestamp. Add descending=true and you'll get versions before the timestamp, instead of after.
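For example, to get the five versions immediately before some point in time (the timestamp value here is hypothetical):
{CLOUDANT_HOST}/loggy/_design/restore/_view/time?startkey=1425000000&limit=5&descending=true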
See the Nth revision for a resource
Much like above, but we'll tweak our view a little:
{
  map: "function(doc) {
    for (var i in doc.versions) {
      emit(i, doc.versions[i].doc);
    }
  }"
}
Now our view results are keyed by index rather than timestamp. So, instead of passing a timestamp to startkey, we just pass N to get the versions around the Nth revision.
Getting the number of revisions for a collection or resource
We'll use another view to group by collection and resource:
{
  map: "function(doc) {
    // split the ID into collection and resource
    var parts = doc._id.split('-');
    // emit them as keys so we can group by them
    emit([parts[0], parts[1]], null);
  }",
  reduce: "_count"
}
Use the query parameter group and group_level to group results by their keys. So, if we want the number of events that have touched resources in the cars collection, we would use a querystring like this:
?group=true&group_level=1&key="cars"
group groups results whose keys are the same, but group_level=1 says "only group on the first key", which in our case is the collection. key specifies to only return documents whose key matches the given value.
Getting all resources for a given collection
Using the _all_docs view, we'll use a querystring like this:
?reduce=false&startkey="{collection}-"&endkey="{collection}0"
Remember the reduce part of our view? That _count value means "return the number of records emitted by map". reduce=false means "don't do that"; instead, only the map function is run.
That startkey and endkey pair uses how Cloudant sorts results to exclude everything but the values matching IDs that start with the given collection.
Updating docs
Once you've got the versions you'd like to restore, GET the current version of the resource, GET the past version from the loggy database, and PUT the past version to the resource using the current version's _rev value. Bam, restored. Rinse and repeat for point-in-time restore.
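For what it's worth, that last step might look like this with the nano client (a sketch; the database names, URL, and the assumption that the logged doc attributes include the resource's _id are all hypothetical):
// Restore the Nth logged version of a resource.
const nano = require('nano')('https://USER:PASS@ACCOUNT.cloudant.com');
const resources = nano.db.use('resources'); // the live database (hypothetical name)
const loggy = nano.db.use('loggy');         // the logging database from above

async function restoreVersion(collection, resource, n) {
  // GET the past version from the logging database...
  const log = await loggy.get(collection + '-' + resource);
  const past = log.versions[n].doc;
  // ...GET the current version of the resource to learn its _rev...
  const current = await resources.get(past._id);
  // ...and PUT the past version back under the current _rev. Bam, restored.
  await resources.insert(Object.assign({}, past, { _rev: current._rev }));
}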
