Pushing documents (blobs) for indexing - Azure Search

I've been working with Azure Search + Azure Blob Storage for a while, and I'm having trouble indexing incremental changes for newly uploaded files.
How can I refresh the index after uploading a new file into my blob container? These are my steps after uploading the file (I'm using the REST API to perform these actions): I'm using Microsoft Azure Storage Explorer [link].
Through this app I uploaded my new file to a folder that was created earlier. After that, I used the REST API to perform a 'Run' indexer command, as you can see in this [link].
The indexer shows that my new file was successfully added, but when I search for the content of this new file, it is not found.
Does anybody know how to add this new file to the index, and how to find it by searching for its content?
I'm following the Microsoft tutorials, but I couldn't find a solution for this issue.
Thanks, guys!

Assuming everything is set up correctly, you don't need to do anything special - new blobs will be picked up and indexed the next time the indexer runs according to its schedule, or when you run the indexer on demand.
However, when you run the indexer on demand, successful completion of the Run Indexer API means that the request to run the indexer has been submitted; it does not mean that the indexer has finished running. To determine when the indexer has actually finished running (and observe the errors, if any), you should use Indexer Status API.
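For example, a minimal sketch of that pattern in Python (the service name, indexer name, and key are placeholders, and the api-version is an assumption - use whichever your service targets): run the indexer on demand, then poll the Indexer Status API until the run finishes and inspect any errors.

import time
import requests

SERVICE = "<service name>"     # placeholder
INDEXER = "<indexer name>"     # placeholder
API_KEY = "<admin api-key>"    # placeholder
API_VERSION = "2020-06-30"     # assumption; use the api-version your service supports

headers = {"api-key": API_KEY, "Content-Type": "application/json"}
base = f"https://{SERVICE}.search.windows.net/indexers/{INDEXER}"

# Ask the service to run the indexer; a successful response only means the run was accepted.
requests.post(f"{base}/run?api-version={API_VERSION}", headers=headers).raise_for_status()

# Poll the status API until the last run completes, then print its outcome and errors.
while True:
    status = requests.get(f"{base}/status?api-version={API_VERSION}", headers=headers).json()
    last = status.get("lastResult") or {}
    if last.get("status") not in (None, "inProgress"):
        print(last.get("status"), last.get("errors"))
        break
    time.sleep(5)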
If you still have questions, please let us know your service name and indexer name and we can take a closer look at the telemetry.

I'll describe how I figured out this issue.
First, I created a DataSource with this request:
POST https://[service name].search.windows.net/datasources?api-version=[api-version]
https://learn.microsoft.com/en-us/rest/api/searchservice/create-data-source.
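For context, a minimal blob data source definition posted to that endpoint looks roughly like this sketch in Python (the names, connection string, container, and folder are placeholders, not values from the question):

import requests

url = "https://<service name>.search.windows.net/datasources?api-version=2020-06-30"  # api-version is an assumption
headers = {"api-key": "<admin api-key>", "Content-Type": "application/json"}
datasource = {
    "name": "blob-datasource",   # placeholder name
    "type": "azureblob",
    "credentials": {"connectionString": "<storage account connection string>"},
    # "query" optionally restricts indexing to a folder inside the container.
    "container": {"name": "<container name>", "query": "<folder name>"},
}
requests.post(url, headers=headers, json=datasource).raise_for_status()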
Second, I created the Index:
POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
https://learn.microsoft.com/en-us/rest/api/searchservice/create-index
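Likewise, a minimal index definition for blob content could look like the sketch below (the field list is an assumption about the schema; metadata_storage_path is commonly used as the key for blob documents):

import requests

url = "https://<service name>.search.windows.net/indexes?api-version=2020-06-30"  # api-version is an assumption
headers = {"api-key": "<admin api-key>", "Content-Type": "application/json"}
index = {
    "name": "blob-index",   # placeholder name
    "fields": [
        # Key field; the indexer typically base64-encodes this path via a field mapping.
        {"name": "metadata_storage_path", "type": "Edm.String", "key": True, "searchable": False},
        # The extracted document text lands here; this is what full-text search hits.
        {"name": "content", "type": "Edm.String", "searchable": True},
    ],
}
requests.post(url, headers=headers, json=index).raise_for_status()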
Finally, I created the Indexer. The problem happened at this step, because this is where all the configuration is set.
POST https://[service name].search.windows.net/indexers?api-version=[api-version]
https://learn.microsoft.com/en-us/rest/api/searchservice/create-indexer
Once all of this is done, the indexer starts indexing all content automatically (as long as there is content in blob storage).
Now comes the crucial part. While the indexer is trying to extract the text from your files, issues can occur when a file type is not 'indexable'. In particular, there are two properties you must pay attention to: the excluded extensions and the indexed extensions (excludedFileNameExtensions and indexedFileNameExtensions in the indexer configuration).
If you don't list the file types properly, the indexer throws an exception. The feedback message (which in my opinion is not good; it was misleading) says that to avoid this error you should set the indexer to '"dataToExtract" : "storageMetadata"'.
This setting means you are indexing only the metadata and no longer the content of your files, so you can no longer search on the content to retrieve them.
The same message, at the bottom, says that to avoid the issue you should set two properties (which is what solved the problem), as sketched below:
"failOnUnprocessableDocument" : false, "failOnUnsupportedContentType" : false
Now everything is working properly. I appreciate your help, @Eugene Shvets, and I hope this is useful for someone else.

Related

Archive not found in the storage location: Google Function

I have a running Google Cloud Function which I use in my code, and it works fine.
But when I go to the Cloud Function to see the source code, it shows:
Archive not found in the storage location
Why can't I see my source code? What should I do?
Runtime: Node.js 10
There are two possible reasons:
You may have deleted the source bucket in Google Cloud Storage. Have you perhaps deleted a GCS bucket named something like gcf-sources-xxxxxx? It is the storage where your source code is archived. If you are sure you have deleted the source bucket, there is no way to restore your source code.
Much more likely, though, is that you did not delete anything but instead changed the bucket, for example by choosing a nearby city in the location settings. If the GCS bucket's region does not match your Cloud Function's region, this error is thrown. You should check both services' regions.
You can check the Cloud Function's region under Details -> General Information.
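As a quick local check, assuming you have the google-cloud-storage Python client available, you can print the source bucket's location and compare it with the region shown on the Cloud Function's details page (the bucket name below is a placeholder):

from google.cloud import storage

client = storage.Client()
# Placeholder name; use the gcf-sources-... bucket that backs your function.
bucket = client.get_bucket("gcf-sources-XXXXXXXXXXXX")
print(bucket.location)  # compare this with the Cloud Function's region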
This error had appeared before when I merely browsed the Google Cloud Storage location that is used by the Cloud Function, without deleting anything there. What might have happened, though, is that I changed the location / city of the bucket to Region MY_REGION (MY_CITY). In my case, the Cloud Function was likely already in the chosen region, so bullet point 2 of the answer above probably does not cover the whole issue.
I guess there is a third point that could be added to the list:
3. If you choose a region for the first time, the bucket name gets a new suffix that was not there before, that is, it changes from gcf-sources-XXXXXXXXXXXX to gcf-sources-XXXXXXXXXXXX-MY_REGION. The Cloud Function is then no longer able to find its source code at the old bucket address. That would explain the first error.
Putting that first error aside, the error in question now appears again, and this time I have not done anything except run into 'Google App Engine deployment fails - Error while finding module specification for 'pip' (AttributeError: module '__main__' has no attribute '__file__')'. I left it alone for two days, doing nothing, only to get the error in question afterwards. Thus, you sometimes seem to just lose your deployed script out of nowhere; better keep a backup before each deployment.
Solution:
Create a new Cloud Function, or
edit the Cloud Function: choose the Inline Editor as the source, manually recreate the default files for the Node.js 10 runtime, and fill them with your backup code.

Archiving Azure Search Service

I need a suggestion on archiving unused data from a search service and reloading it when needed (the reload is to be done later).
The initial design draft looks like this:
Find the keys in the search service, based on some conditions (such as whether a document is inactive, or how old it is), that need to be archived.
Run an archiver job (I need a suggestion here; it could be a WebJob or a Function App).
Fetch the data, insert it into blob storage, and delete it from the search service.
Ideally the job should run in a pool and be asynchronous.
There's no right / wrong answer for this question. What you need to do is perform batch queries (up to 1000 docs at a time) and schedule the job to archive past data (e.g. run an Azure Function on a timer that searches for docs whose createdDate is older than your archiving cutoff).
Then persist that data somewhere (it can be Cosmos DB, or blobs in a storage account), as in the sketch below. Once you need to upload the data again, I would treat it as a new insert, so it should follow your current insert process.
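A rough sketch of that approach in Python, not a definitive implementation: the service, index, and key values are placeholders, and the 'createdDate' filter field, the 'id' key field, and the container name are assumptions about your schema. It pages one batch of old documents, writes each to blob storage, then deletes the batch from the index.

import json
import requests
from azure.storage.blob import BlobServiceClient

SERVICE, INDEX, API_KEY = "<service name>", "<index name>", "<admin api-key>"  # placeholders
API_VERSION = "2020-06-30"  # assumption
headers = {"api-key": API_KEY, "Content-Type": "application/json"}
search_url = f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs/search?api-version={API_VERSION}"
delete_url = f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs/index?api-version={API_VERSION}"

archive = BlobServiceClient.from_connection_string(
    "<storage connection string>"            # placeholder
).get_container_client("search-archive")     # assumed container name

# Pull one batch (max 1000) of documents older than the cutoff.
query = {"search": "*", "filter": "createdDate lt 2020-01-01T00:00:00Z", "top": 1000}
docs = requests.post(search_url, headers=headers, json=query).json()["value"]

# Persist each document as a JSON blob named after its key.
for doc in docs:
    archive.upload_blob(name=f"{doc['id']}.json", data=json.dumps(doc), overwrite=True)

# Then delete the archived batch from the index.
if docs:
    deletes = {"value": [{"@search.action": "delete", "id": d["id"]} for d in docs]}
    requests.post(delete_url, headers=headers, json=deletes).raise_for_status()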
You can also take a look at this tool, which helps copy data from your index pretty quickly:
https://github.com/liamca/azure-search-backup-restore

Active Storage returns attached? as true in console but on the Rails server it returns false

I am new to Rails Active Storage and am facing some issues with image uploading.
When I try to upload an image, it uploads successfully, but when I try to get the image it returns attached? as false. However, when I try the same record in the console, it returns the image URL.
Rails server output:
I ran into a similar situation when I had multiple records attached to the same blob.
Not sure if that's what happened here, but if you had two companies using the same attachment and then purged that attachment from one record, the purge deletes both the blob reference and the file itself without cleaning up the other record's attachment. This means the other record will still think it has a file attached (as it's still associated with a blob).
A good way to find out is to check this in the Rails console:
obj.image.blob.filename
This will show whether the actual file associated with the object exists, rather than just its blob record. It's a bug in Active Storage that they're apparently addressing; I'm not sure if it applies here or not.

Event Hub - Invalid Token error Azure Stream Analytics Input

I am trying to follow the tutorial below:
Azure Tutorial
As noted at the bottom, there appear to have been changes since it was created.
When I get to the part where I create an input for my Stream Analytics job, I cannot select an event hub, even though there is one in my subscription.
So I tried to provide the information manually, and I get an error stating 'invalid token'.
Has anyone got any ideas how to resolve this, or can anyone point me to a better/more recent tutorial?
I am looking to stream data in real time.
Paul
Thanks for the help here - I ended up using the secondary key and that worked fine!
Change to using the secondary connection string, or use a different shared access policy altogether.
You can use the primary key of the new shared access policy.
PS: It is a weird error; sometimes removing the trailing ';' from the connection string worked.
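If it helps, one way to sanity-check the connection string outside of Stream Analytics is the azure-eventhub Python package; this is just a hedged sketch with placeholder namespace, policy, key, and hub names, and it mirrors the trailing ';' note above.

from azure.eventhub import EventHubProducerClient

# Placeholder values; paste the secondary connection string of your shared access policy.
conn_str = (
    "Endpoint=sb://<namespace>.servicebus.windows.net/;"
    "SharedAccessKeyName=<policy name>;"
    "SharedAccessKey=<secondary key>"   # note: no trailing ';' here
)

client = EventHubProducerClient.from_connection_string(conn_str, eventhub_name="<event hub name>")
with client:
    # Raises an authentication error if the key or token is invalid.
    print(client.get_eventhub_properties())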

Is there a delay when using the Box.com search API?

I'm using the Search API as defined here:
https://developers.box.com/docs/#search
It works well, though I noticed that when I make a folder on the site, then immediately call the API searching for that folder name, it doesn't appear in the results for a minute or so. Is there something I'm doing wrong, or some way to force it to do a live search? Thanks.
You're not doing anything wrong. It just takes a little bit of time for the search indexes to be updated with the new file/folder metadata. There's nothing you can do client-side to speed this up.
If you need immediate access to that new folder, consider saving the folder ID that's returned in the response of the Create a New Folder request, as in the sketch below.
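For example, a minimal sketch against the Box REST API (the developer token, folder name, and parent ID "0" for the root folder are placeholders): create the folder, keep the id from the response, and fetch it directly instead of searching right away.

import requests

headers = {"Authorization": "Bearer <developer token>"}  # placeholder token

# Create the folder; "0" is the root folder, used here as a placeholder parent.
resp = requests.post(
    "https://api.box.com/2.0/folders",
    headers=headers,
    json={"name": "my-new-folder", "parent": {"id": "0"}},
)
folder_id = resp.json()["id"]

# Use the ID directly; no need to wait for the search index to catch up.
folder = requests.get(f"https://api.box.com/2.0/folders/{folder_id}", headers=headers).json()
print(folder["name"], folder["id"])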
