Is there a way to limit Azure Cognitive Search results based on a condition, for example:
if content.length < 500:
I have several thousand PDF files indexed, and many of them are essentially useless because they contain very little content. I don't want those files to show up in the search response.
I cannot delete them manually because there are too many of them.
Any help would be highly appreciated.
If you're using a blob indexer to populate your search index, you can add an additional index field and populate it with metadata_storage_size. Be sure that this "size" field is configured as filterable, and you should be able to use it to filter out small PDFs.
https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage#how-azure-cognitive-search-indexes-blobs
If you're populating the data in your search index manually, I think you'll still need a field to hold the document's size, and will need to populate it yourself.
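For illustration, a minimal sketch of the query side using the azure-search-documents Python SDK; the endpoint, key, index name, and the "size" field (mapped from metadata_storage_size, in bytes) are placeholder assumptions:

    # pip install azure-search-documents
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    # Placeholder service details -- substitute your own.
    client = SearchClient(
        endpoint="https://<service>.search.windows.net",
        index_name="pdf-index",
        credential=AzureKeyCredential("<query-key>"),
    )

    # "size" is the hypothetical filterable field mapped from metadata_storage_size.
    # Documents smaller than 500 bytes are excluded from the results.
    results = client.search(search_text="your query", filter="size gt 500")
    for doc in results:
        print(doc["metadata_storage_name"])  # field name also illustrative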
I have a combined index that crawls data from Azure SQL and Blob storage; the two sources are mapped on a common column.
The mapped blob document is optional.
When there is no blob document, the indexer indexes the respective SQL row with the content property set to null; when the document is available, the content property shows the correct data.
I have enabled blob deletion tracking, and the issue arises when a blob document is deleted: I expected the content property to be set back to null, but the deletion policy removes the mapped SQL row values from the index as well.
What am I doing wrong? Thanks a lot in advance.
The behavior you are seeing is by design. The SQL deletion policy deletes full index documents based on their document key, not specific fields; the policy has no way to know that the index also contains values from other sources. If you have a combined index and want the behavior you describe, you could instead use a Logic App SQL trigger together with the Add, Update or Delete Documents API to update only specific fields, rather than relying on the deletion policy.
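If you go that route, the key piece is the "merge" action of the documents API, which updates only the fields you send. A minimal sketch using the azure-search-documents Python SDK, where merge_documents issues the merge action; the endpoint, key, index name, and field names are placeholders:

    # pip install azure-search-documents
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(
        endpoint="https://<service>.search.windows.net",
        index_name="combined-index",
        credential=AzureKeyCredential("<admin-key>"),
    )

    # "merge" touches only the fields included in the payload, so the
    # SQL-sourced values stay intact while the blob content is cleared.
    # "id" and "content" are placeholder field names.
    client.merge_documents(documents=[{"id": "<document-key>", "content": None}])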
I am creating a saved search for my team where users can filter by different parameters, the most important one being a 'Keyword' field holding multiple text strings separated by commas. E.g., one record could have (Horses, Apples, Cows, Carrots, Balloons) and another (Apples, Cake, Silver, Horses, Bananas).
I want to be able to use the free text search field to look up all rows where I can find a relevant entry.
E.g., let's say I type "Apples" and "Horses". I want to see all entries where these are found together.
I have tried setting the criteria to "Contains" but can't seem to use operators in the input field. I have also tried to use expressions, but got the error "You cannot use an expression builder criteria filter as an available filter".
I'm not familiar with NetSuite but willing to learn. I was able to build this in Google Sheets, but since we already store our information in NetSuite, I want to find a way to do it there. Is there a way to achieve this?
Thank you.
When you create the saved search, you can just specify a default value that will be used in the initial search load (e.g. contains Apples). In the Available Filters tab, select the same filter and check Show in Filter Region.
When users run the saved search, they can change the criteria by typing into the field and pressing Tab afterwards (if you press Enter instead of Tab, the results are downloaded as a CSV file instead of being displayed on the page). In your example, they would type 'apples%horses' and then press Tab.
Additional reference: https://www.sikich.com/insight/using-formula-values-as-available-filters-in-netsuite-saved-searches/
Update:
Use 'has keywords' instead of 'contains' in the filter. When viewing the results, separate keywords with a comma. Example: 'apples, horses'
I have a collection in Cosmos DB that holds a large number of JSON documents, and a Python program that continuously writes and uploads data to that collection. The format of my data just changed, so I am now writing documents with a new structure, and I need to delete all the documents in the collection that have the old structure.
Question 1: Do the documents have a creation-date tag? If so, I would like to delete all the documents created earlier than a specific date. How can I do that?
Question 2: If the answer to the previous question is no: I cannot query the old documents in their entirety, but I can query fields inside them. Is there a way to delete entire documents based on a query against their contents? Perhaps if I could retrieve the IDs of all the documents matching my query, that would be enough.
All documents have a property called _ts, which is the Unix epoch timestamp of when the document was last written (for documents that have never been updated, that is their creation time) and is auto-populated by Cosmos DB. You should be able to query on this property to find all the documents created before a specific date.
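A minimal sketch with the azure-cosmos Python SDK; the account details, database/container names, cutoff date, and the partition key field are placeholder assumptions:

    # pip install azure-cosmos
    from datetime import datetime, timezone
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("<database>").get_container_client("<collection>")

    # _ts is in epoch seconds, so express the cutoff date the same way.
    cutoff = int(datetime(2023, 1, 1, tzinfo=timezone.utc).timestamp())

    old_docs = container.query_items(
        query="SELECT c.id, c.partitionKey FROM c WHERE c._ts < @cutoff",
        parameters=[{"name": "@cutoff", "value": cutoff}],
        enable_cross_partition_query=True,
    )

    for doc in old_docs:
        # delete_item needs the document id plus its partition key value;
        # "partitionKey" is a hypothetical field name -- use your container's.
        container.delete_item(item=doc["id"], partition_key=doc["partitionKey"])

The same pattern covers question 2: replace the WHERE clause with a condition on the fields inside the old documents, select their ids, and delete them one by one.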
Does anyone know how I can retrieve the NUMBER OF ITEMS in a category in NetSuite?
I'm hoping for a getAttribute tag of some sort. I need the total item count in order to build a pagination URL string.
The simple method is to create a saved search with the criteria/filters you need.
You can create the saved search either programmatically or by using the tools available in NetSuite.
The length of the saved search's result set gives you the total NUMBER OF ITEMS.
NOTE: If you want, you can also retrieve the full details of each product from the saved search results.
I am building a search engine and have a not-so-unique ID shared by a lot of different names. For example, the ID B0051QVF7A could have multiple names like "Kindle", "Amazon Kindle", "Amazon Kindle 3G", "Kindle Ebook Reader", "New Kindle", etc.
The problem, and my question, is that I am loading this data from a database of roughly 11 million rows, each read one at a time, so I don't have all the names for a given ID up front. I am adding new documents to the index as I go.
What I am trying to find out is: how do I add names to an existing document? If I am reading the documentation correctly, re-adding a document overwrites the whole thing rather than adding extra info to a field. I just want to add an extra name to the document's multivalued field.
I know this could cause some weird and wonderful "issues" if a name is removed (in the example above, "New Kindle" could be removed when a newer Kindle gets released), but I am thinking of recreating the index every now and again to clear out issues like that (once a month or so; it currently takes about 45 minutes to create the index).
So, how do you add a value to a multivalued field in Solr for an existing document?
According to the question linked in Mauricio Scheffer's comment, Solr does not currently support updating a single field value in an existing document. There are a couple of options here:
1. In your process that pulls data from the database, when it finds a new name, pull all fields for the existing document from Solr, add the new value, and resend the complete document to Solr (you may already be doing this; see the sketch below).
2. Add some additional logic to the code that reads from the database to gather all of the unique names for each document before inserting documents into the index. However, given that you have ~11 million records, resource constraints could make this infeasible.
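A sketch of option 1 using the pysolr library; the core URL and the field names ("id", "names") are assumptions, not taken from your schema:

    # pip install pysolr
    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/products", always_commit=True)

    def add_name(doc_id, new_name):
        results = list(solr.search(f"id:{doc_id}"))
        if not results:
            # No existing document yet: create one with a one-entry multivalued field.
            solr.add([{"id": doc_id, "names": [new_name]}])
            return
        doc = results[0]
        # Strip internal fields Solr returns but that should not be re-indexed.
        doc.pop("_version_", None)
        doc.pop("score", None)
        names = doc.get("names", [])
        if new_name not in names:
            names.append(new_name)
            doc["names"] = names
            solr.add([doc])  # resending the full document replaces the old one

    add_name("B0051QVF7A", "New Kindle")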