Indexing Sharepoint Online Site in Azure search

I'm trying to index a SharePoint website document library, and I followed this tutorial:
https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online
Everything is working fine: my skillset is triggered and the standard SharePoint columns are retrieved.
My main issue is that I want to retrieve custom columns created in this library.
For example, I have a column displayed as "Document usage" whose technical name is "DocumentUsage". As I saw in the documentation, to get custom fields you need to specify them in the data source created in Azure Search, like below:
"container": { "name": "useQuery", "query": "includeLibrariesInSite=https://myorganisation.sharepoint.com/sites/mysite;additionalColumns:DocumentUsage,UploadableDocument,DocumentFormat,Langage,TargetForUse,PaidContent,PublicationDate,Activity,TargetApplication,ProductCategory,SerialNumber,KeyWords"
And I mapped the fields in my indexer like below:
"fieldMappings": [
{
"sourceFieldName": "metadata_spo_site_library_item_id",
"targetFieldName": "id",
"mappingFunction": {
"name": "base64Encode",
"parameters": null
}
},
{
"sourceFieldName": "content",
"targetFieldName": "content",
"mappingFunction": null
},
{
"sourceFieldName": "DocumentUsage",
"targetFieldName": "document_usage",
"mappingFunction": null
},
{
"sourceFieldName": "UploadableDocument",
"targetFieldName": "uploadable_document",
"mappingFunction": null
},
{
"sourceFieldName": "DocumentFormat",
"targetFieldName": "document_format",
"mappingFunction": null
},
{
"sourceFieldName": "Langage",
"targetFieldName": "language_sp",
"mappingFunction": null
},
{
"sourceFieldName": "TargetForUse",
"targetFieldName": "target_for_use",
"mappingFunction": null
},
{
"sourceFieldName": "PaidContent",
"targetFieldName": "paid_content",
"mappingFunction": null
},
{
"sourceFieldName": "PublicationDate",
"targetFieldName": "publication_date",
"mappingFunction": null
},
{
"sourceFieldName": "Activity",
"targetFieldName": "activity",
"mappingFunction": null
},
{
"sourceFieldName": "TargetApplication",
"targetFieldName": "target_application",
"mappingFunction": null
},
{
"sourceFieldName": "ProductCategory",
"targetFieldName": "product_category",
"mappingFunction": null
},
{
"sourceFieldName": "SerialNumber",
"targetFieldName": "serial_number",
"mappingFunction": null
},
{
"sourceFieldName": "KeyWords",
"targetFieldName": "key_words",
"mappingFunction": null
}
],
Everything is working fine except for my additional SharePoint custom columns.
My issue is that the standard fields are retrieved but not the custom ones specified in the data source query.
Do I have to use the technical names or the display names of the columns?
Is there a trick to retrieve special columns?
As you can see, the standard ones are retrieved but not the custom ones, which are filled in in the document library.
Thanks in advance for your help.
Regards

There is a typo in the 'query' value of your data source 'container'. You have a colon after 'additionalColumns' (additionalColumns:DocumentUsage), and it must be an equals sign (=) instead.
Try replacing additionalColumns:DocumentUsage with additionalColumns=DocumentUsage
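
To make the fix concrete, here is a minimal Python sketch that assembles the query string so every key is joined to its value with "=" and the pairs are separated by ";". The helper name build_container_query is hypothetical and only for illustration; the site URL and column names are taken from the question.

```python
def build_container_query(site_url, additional_columns):
    """Build the 'query' value for the data source container.

    Hypothetical helper for illustration: each key is joined to its
    value with '=', and key/value pairs are separated by ';'.
    """
    parts = ["includeLibrariesInSite=" + site_url]
    if additional_columns:
        # Column names are comma-separated after 'additionalColumns='.
        parts.append("additionalColumns=" + ",".join(additional_columns))
    return ";".join(parts)


query = build_container_query(
    "https://myorganisation.sharepoint.com/sites/mysite",
    ["DocumentUsage", "UploadableDocument", "DocumentFormat"],
)
container = {"name": "useQuery", "query": query}
print(container["query"])
```

Building the string from a list keeps the separators consistent, so a stray ":" cannot slip in when more columns are added later.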

Related

Unable to retrieve a document from Mongo (using mongoose) with a partial match

I have a project in NodeJS (v16.14.0) in which I want to filter a list of documents. Most of the attributes are strings, so they are easy to access; with the specialty (object) attribute, however, I am not able to get the info that I want:
{
"_id": {
"$oid": "--"
},
"name": "string",
"email": "string",
"password": "string",
"phoneNumber": "number",
"city": "string",
"myDescription": "string",
"specialty": {
"certified": ["plomero","electricista"],
"inProgress": ["albañil"],
"nonCertified": ["mecanico","veterinario"]
},
"image": {
"profile": [
"string",
"string"
],
"myJobs": [
"string",
"string"
]
},
"socialSecurity": {
"eps": "string",
"arl": "string"
},
"availability": {
"schedule": "string",
"fullAvailability": false
}
}
I want to use a string and return a list of documents that contain the given string in the certified attribute, for example:
db.collection.find({ specialty: { certified: { $in: ['plomero'] } } })
// return the document shown above
But I always get nothing. I have tried using $in and $elemMatch with no results. I can only retrieve the document as long as I copy the entire object:
db.collection.find({ specialty: { certified: ['plomero', 'electricista'], inProgress: ['albañil'], nonCertified: ['mecanico', 'veterinario'] } })
I have no clue on how to proceed, I have been reading MongoDB documentation but cannot find an example similar to this... Also I am quite new with Mongo.
Thanks for any guidance!

Use find with dot notation to match a value inside the nested array:
db.collection.find({
"specialty.certified": "plomero"
})
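
The difference between the two query shapes can be sketched in plain Python. This is only a simplified illustration of the matching semantics, not MongoDB's implementation; the function names are hypothetical, and the sample document is a trimmed copy of the one in the question.

```python
def matches_dot_path(doc, path, value):
    """Return True if the field at a dotted path equals value,
    or is a list that contains value.

    Simplified illustration of dot-notation matching; it ignores
    arrays of sub-documents and query operators.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    if isinstance(current, list):
        return value in current
    return current == value


def matches_exact_subdocument(doc, field, expected):
    """Embedded-document equality: the whole sub-document must be identical.

    (MongoDB additionally requires the same field order.)
    """
    return doc.get(field) == expected


worker = {
    "name": "string",
    "specialty": {
        "certified": ["plomero", "electricista"],
        "inProgress": ["albañil"],
        "nonCertified": ["mecanico", "veterinario"],
    },
}

print(matches_dot_path(worker, "specialty.certified", "plomero"))  # True
print(matches_exact_subdocument(worker, "specialty", {"certified": ["plomero"]}))  # False
```

The second call fails for the same reason the original query returned nothing: passing an object for specialty asks for the entire embedded document to be equal to it.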

How to define a default value when creating an index in Elasticsearch

I need to create an index in Elasticsearch and assign a default value to a field. For example,
in Python 3:
request_body = {
"settings":{
"number_of_shards":1,
"number_of_replicas":1
},
"mappings":{
"properties":{
"name":{
"type":"keyword"
},
"school":{
"type":"keyword"
},
"pass":{
"type":"keyword"
}
}
}
}
from elasticsearch import Elasticsearch
es = Elasticsearch(['https://....'])
es.indices.create(index="test-index", ignore=400, body= request_body)
In the above scenario, the index will be created with those fields. But I need to give "pass" a default value of True. Can I do that here?

Elasticsearch is schema-less. It allows any number of fields and any content in fields without any logical constraints.
In a distributed system, integrity checking can be expensive, so RDBMS-style checks are not available in Elasticsearch.
The best way is to do validations on the client side.
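
A client-side default could look like the following minimal Python sketch. The helper name with_defaults and the DEFAULTS dictionary are hypothetical; the field name and the string value 'true' are taken from this thread.

```python
def with_defaults(doc, defaults):
    """Return a copy of doc with missing or None fields filled from defaults."""
    result = dict(doc)
    for field, default in defaults.items():
        # Only fill the field when the caller did not provide a value.
        if result.get(field) is None:
            result[field] = default
    return result


# Assumption: "pass" is a keyword field, so the default is the string 'true'.
DEFAULTS = {"pass": "true"}

doc = with_defaults({"name": "a", "school": "aa"}, DEFAULTS)
print(doc)
# The resulting dict would then be handed to the client's index call
# (not executed here, since it needs a running cluster).
```

Applying the defaults before sending the document keeps the rule in one place in the application code, at the cost of every writer having to go through the same helper.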
Another approach is to use an ingest pipeline.
Ingest pipelines let you perform common transformations on your data before indexing. For example, you can use pipelines to remove fields, extract values from text, and enrich your data.
**For testing**
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"script": {
"lang": "painless",
"source": "if (ctx.pass ===null) { ctx.pass='true' }"
}
}
]
},
"docs": [
{
"_index": "index",
"_type": "type",
"_id": "2",
"_source": {
"name": "a",
"school":"aa"
}
}
]
}
PUT _ingest/pipeline/default-value_pipeline
{
"description": "Set default value",
"processors": [
{
"script": {
"lang": "painless",
"source": "if (ctx.pass ===null) { ctx.pass='true' }"
}
}
]
}
**Indexing document**
POST my-index-000001/_doc?pipeline=default-value_pipeline
{
"name":"sss",
"school":"sss"
}
**Result**
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "hlQDGXoB5tcHqHDtaEQb",
"_score" : 1.0,
"_source" : {
"school" : "sss",
"pass" : "true",
"name" : "sss"
}
},

Custom junction table in strapi

I have recently started working with Strapi and was looking at relations inside models in their documentation. My scenario is as follows. I have a model named course and another named tag. They have a many-to-many relationship between them. This is what ./api/course/models/course.settings.json contains after I created the relation between them, named tags2.
{
"connection": "default",
"collectionName": "course",
"info": {
"name": "course"
},
"options": {
"increments": true,
"timestamps": true
},
"attributes": {
"image_link": {
"type": "string"
},
"created_by": {
"columnName": "created_by_id",
"plugin": "users-permissions",
"model": "user"
},
"updated_by": {
"columnName": "updated_by_id",
"plugin": "users-permissions",
"model": "user"
},
"title": {
"type": "string"
},
"short_description": {
"type": "text"
},
"slug": {
"type": "string",
"unique": true
},
"tags2": {
"collection": "tag",
"via": "courses",
"dominant": true
}
}
}
When I specified the relation using the admin panel, Strapi itself created a junction table named courses_tags_2_s__tags_courses.
Here is what the tag model looks like:
{
"connection": "default",
"collectionName": "tag",
"info": {
"name": "tag",
"mainField": "ui_label"
},
"options": {
"increments": true,
"timestamps": true
},
"attributes": {
"code": {
"type": "string"
},
"description": {
"type": "string"
},
"created_by": {
"plugin": "users-permissions",
"model": "user",
"columnName": "created_by_id"
},
"updated_by": {
"plugin": "users-permissions",
"model": "user",
"columnName": "updated_by_id"
},
"ui_label": {
"type": "string"
},
"courses": {
"via": "tags2",
"collection": "course"
}
}
}
I have a couple of questions
1) Is there a way I can name the junction table courses_tags, i.e. override the one generated by Strapi?
2) I have set mainField to "ui_label" in tag.settings.json, but in the admin panel, while editing content in the course table (rows in the course table), the related field for tags2 shows the "code" field instead of "ui_label". How do I set the mainField?
Note: I have setup strapi with mysql server.

To answer your first question: there is currently no way to override the join table between two models. It is entirely auto-generated by Strapi.
For the second question, this part of the docs is out of date.
To manage display information, you will have to use the content manager configuration in the admin panel.
Here is a short video: https://www.youtube.com/watch?v=tzipS2CePRc&list=PL7Q0DQYATmvhlHxHqfKHsr-zFls2mIVTi&index=5&t=0s

For 1) Is there a way I can set up the junction table as courses_tags ? i.e overriding the strapi one:
You can specify the following option:
"collectionName": "courses_tags"

CouchDB index with $or and $and not working but just $and does

For some reason, I have the following .find() commands and I am getting conflicting indexing errors. Below are examples of one working when I only try to get one type of document. But then if I try to get 2 types of documents it doesn't work for some reason.
Does anyone know why this would be the case?
My index file:
{
"_id": "_design/index",
"_rev": "3-ce41abcc481f0a180eb722980d68f103",
"language": "query",
"views": {
"index": {
"map": {
"fields": {
"type": "asc",
"timestamp": "asc"
},
"partial_filter_selector": {}
},
"reduce": "_count",
"options": {
"def": {
"fields": [
"type",
"timestamp"
]
}
}
}
}
}
Works:
var result = await POUCHDB_DB.find({
selector:{
$and: [{type:"document"},{uid:"123"}]
},
limit:50,
bookmark: bookmark,
sort: [{timestamp: "desc"}]
});
Doesn't work:
var result = await POUCHDB_DB.find({
selector:{
$or: [
{$and: [{type:"document"},{uid:"123"}]},
{$and: [{type:"page"},{uid:"123"}]}
]
},
limit:50,
bookmark: bookmark,
sort: [{timestamp: "desc"}]
});

Missing timestamp in selector
In order to use the timestamp to sort, it must be in your selector. You can simply add it with a "$gte": null condition.
Redundant condition
The uid condition is repeated in both branches of your $or. For this reason, I would move it into a separate condition.
Finally, in order to use your index, you should create an index with the following fields: uid, timestamp, type (I think this last one is optional).
{
"selector": {
"$and": [{
"uid": "123",
"timestamp": {
"$gte": null
}
},
{
"$or": [{
"type": "document"
},
{
"type": "page"
}
]
}
]
},
"sort": [{
"timestamp": "desc"
}]
}
Recommendation
If you want your queries to use your index, I would recommend specifying the "use_index" field. If you can version your indexes and queries, it will make the queries faster.
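
The restructured query and the "use_index" recommendation can be put together in a small Python sketch that builds the request body. The helper name build_find_request and the design document name "uid-timestamp-index" are hypothetical; only the selector shape, the sort, and the limit come from this thread.

```python
import json


def build_find_request(uid, types, use_index=None, limit=50):
    """Build a Mango find request body (hypothetical helper for illustration).

    The shared uid condition and the timestamp placeholder sit in one
    clause; the alternative document types sit in a single $or.
    """
    request = {
        "selector": {
            "$and": [
                # timestamp must appear in the selector to be usable for sorting
                {"uid": uid, "timestamp": {"$gte": None}},
                {"$or": [{"type": t} for t in types]},
            ]
        },
        "sort": [{"timestamp": "desc"}],
        "limit": limit,
    }
    if use_index is not None:
        request["use_index"] = use_index
    return request


# "uid-timestamp-index" is an assumed design document name, not one from the question.
request = build_find_request("123", ["document", "page"], use_index="uid-timestamp-index")
print(json.dumps(request, indent=2))
```

json.dumps serializes Python's None as null, so the printed body matches the "$gte": null placeholder shown above.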

Speeding up Cloudant query for type text index

We have a table with this type of structure:
{_id: 15_0, createdAt: 1/1/1, task_id: [16_0, 17_0, 18_0], table: "details", a: b, c: d, more}
We created indexes using
{
"index": {},
"name": "paginationQueryIndex",
"type": "text"
}
It auto created
{
"ddoc": "_design/28e8db44a5a0862xxx",
"name": "paginationQueryIndex",
"type": "text",
"def": {
"default_analyzer": "keyword",
"default_field": {
},
"selector": {
},
"fields": [
],
"index_array_lengths": true
}
}
We are using the following query
{
"selector": {
"createdAt": { "$gt": 0 },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit": 20
}
It takes 700-800 ms the first time; after that it decreases to 500-600 ms.
Why does it take longer the first time?
Is there any way to speed up the query?
Is there any way to add indexes to specific fields if the type is "text" (instead of indexing all the fields in these records)?

You could try creating the index more explicitly, defining the type of each field you wish to index e.g.:
{
"index": {
"fields": [
{
"name": "createdAt",
"type": "string"
},
{
"name": "task_id",
"type": "string"
},
{
"name": "table",
"type": "string"
}
]
},
"name": "myindex",
"type": "text"
}
Then your query becomes:
{
"selector": {
"createdAt": { "$gt": "1970/01/01" },
"task_id": { "$in": [ "18_0" ] },
"table": "details"
},
"sort": [ { "createdAt": "desc" } ],
"limit": 20
}
Notice that I used strings where the data type is a string.
If you're interested in performance, try removing clauses from your query one at a time to see if one of them is causing the performance problem. You can also look at the explanation of your query to see if it is using your index correctly.
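
One way to organize the "remove one clause at a time" experiment is to generate the reduced selectors programmatically. This is a minimal Python sketch; the helper name selectors_without_one_clause is hypothetical, and timing each variant against the database is left out because it needs a live service.

```python
def selectors_without_one_clause(selector):
    """Yield (removed_field, reduced_selector) pairs, dropping one clause each time."""
    for field in selector:
        reduced = {k: v for k, v in selector.items() if k != field}
        yield field, reduced


selector = {
    "createdAt": {"$gt": "1970/01/01"},
    "task_id": {"$in": ["18_0"]},
    "table": "details",
}

variants = list(selectors_without_one_clause(selector))
for removed, reduced in variants:
    # Each reduced selector would be sent as a separate query and timed.
    print("without", removed, "->", reduced)
```

If one variant is markedly faster than the full query, the clause that was dropped from it is the first candidate to investigate.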
Documentation on creating an explicit text query index is here
