Azure Search indexer is failing frequently

I have a set of files stored in Azure Blob Storage, and I am trying to index all of them on a daily basis. Some indexer runs fail with errors while others complete successfully, and I cannot tell why.
I have tried to resolve the error but have not been able to, because the failures are intermittent.

Judging by the error, some of the files in your storage account are either corrupted or in a format that Azure Search cannot index. Check whether the failing files are corrupted, and whether their formats are among those the blob indexer supports.
I tried this on my side; these are the steps I followed:
I have a list of files in a storage account, in different formats.
I created an index, a data source, a skillset and an indexer.
Because the data source contains different formats, I configured the allowed formats in the indexer.
With that configuration, the indexer does not fail on unsupported formats.
Also, if you don't want a failure to stop the indexer run, you can configure the indexer to tolerate failed items.
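The two indexer settings described above might look roughly like this. It is only a sketch of an indexer definition built as a Python dict and printed as JSON. The resource names and the extension list are hypothetical, and the property names are the ones I understand the blob indexer to accept, so check them against the current indexer reference before using them.

```python
import json

# Hypothetical resource names; replace them with your own.
indexer = {
    "name": "blob-indexer",
    "dataSourceName": "blob-datasource",
    "targetIndexName": "blob-index",
    "schedule": {"interval": "P1D"},  # run once a day
    "parameters": {
        # -1 means: never stop the run because of failed documents.
        "maxFailedItems": -1,
        "maxFailedItemsPerBatch": -1,
        "configuration": {
            # Only pick up the formats you expect (example list, an assumption).
            "indexedFileNameExtensions": ".pdf,.docx,.txt",
            # Skip unsupported or unreadable files instead of failing the run.
            "failOnUnsupportedContentType": False,
            "failOnUnprocessableDocument": False,
        },
    },
}

indexer_json = json.dumps(indexer, indent=2)
print(indexer_json)
```

The printed JSON is what you would send as the body when creating or updating the indexer. Setting the failure limits to -1 hides real problems as well, so it is worth checking the execution history for warnings after each run.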

Related

Azure blob soft delete and versioning- how to restore files easily?

I am trying to understand how soft delete and versioning work within Azure Blob Storage.
It seems that if you have both soft delete and versioning turned on, you can’t just ‘undelete’ deleted files, as versioning actually saves a new version as a deleted file.
So instead you have to promote the last version of each deleted file.
But what if you have a structure of nested folders and thousands of blobs? You can’t just promote the top version of the top-level folder. Do you need to use PowerShell to list the files with no current version and promote them? How would you do this?
This seems awfully complicated, when without versioning a simple ‘undelete’ command is available from the GUI.
Am I missing something? What is the easiest way to ‘undelete’ a nested folder structure containing thousands of blobs when versioning is turned on?
Thanks
As Rob Minson pointed out, the approach involves copying a blob version to the same container. For PowerShell, use the Copy-AzStorageBlob cmdlet; for Azure CLI, use the az storage blob copy start command. You can pass an account key or SAS token, or use Azure AD.
We've updated the documentation to shed some light on an approach to restoring blobs when soft-delete and/or versioning is enabled. Code samples are available for both PowerShell and Azure CLI.
Simply put, no.
The first point to emphasize is that blobs in blob storage are not nested the way you might think. Blob storage looks like a local file system, with nested folders and many files inside them, but those folders are fake: the storage structure of blob storage is flat. Blob storage does not put a small box inside a box and then put items in the small box. All blobs are direct items of the container, and there is no such thing as a "small box".
The second point is that, for blob storage, soft delete applies to only two kinds of object: blobs and containers.
Check out this document:
https://learn.microsoft.com/en-us/azure/storage/blobs/soft-delete-container-overview?tabs=azure-cli#how-container-soft-delete-works
However, you can only use container soft delete to restore blobs if
the container itself was deleted. To restore a deleted blob when its
parent container has not been deleted, you must use blob soft delete
or blob versioning.
So unfortunately, there is no so-called easy way. You need to operate on each blob, because the nested structure does not actually exist.
If you are interested, you can read this blog:
https://medium.com/@loopjockey/structuring-azure-blobs-for-functions-8305ba427356
I completely agree that this seems really undocumented at the moment. I've raised a GitHub issue against this docs page to see if they can get the situation improved.
The best path through that I've found is something like the following:
Using Azure Storage Explorer, open up the container with the soft-deleted, versioned blobs, then change the drop down to "All blobs and blobs without current version". Now you can select a blob and hit 'Promote Version'. The deleted blob will be restored and in the Activities pane you can expand the operation and hit 'Copy AzCopy Command to Clipboard'.
The result will show you something like the following:
./azcopy.exe copy
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>&versionid=2021-04-22T11%3A35%3A36.9385599Z"
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>"
--overwrite=true
--recursive
--trusted-microsoft-suffixes=;
Now, based on this you can see you have a building block for automating the process you're talking about. Your problem at this point is finding this thing:
versionid=2021-04-22T11%3A35%3A36.9385599Z
Unfortunately that's a timestamp to 100-nanosecond precision, which you're not going to be able to infer. I can't find any functionality in PowerShell, the REST APIs or AzCopy to get this data; the only way I have found is this sample for the .NET SDK.
All this probably means you can either:
Implement your own C# console app using the Azure.Storage.Blobs library to list the versions of each blob, then perform the relevant copy command now that you know the magic version string
Wait for the REST API or PowerShell library to get the ability to list versions
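Once you have a version listing from somewhere, the selection step is simple. The sketch below is not the .NET sample. It is a plain Python illustration that takes a list of (blob name, version id, is-current-version) tuples, however you obtained them, and builds the source and destination URLs in the shape of the AzCopy command above. The function name, the input shape and the example blobs are all hypothetical, it assumes version ids are fixed-width timestamps so that string order matches time order, and the SAS token is left out.

```python
from urllib.parse import quote


def plan_promotions(entries, base_url, prefix=""):
    """Return (source_url, destination_url) pairs for blobs with no current version.

    entries: iterable of (blob_name, version_id, is_current_version) tuples.
    prefix:  optional virtual "folder" to restrict the restore to.
    """
    has_current = set()
    latest = {}
    for name, version_id, is_current in entries:
        if not name.startswith(prefix):
            continue
        if is_current:
            has_current.add(name)
        # Assumption: fixed-width UTC timestamps, so string order is time order.
        if name not in latest or version_id > latest[name]:
            latest[name] = version_id

    plan = []
    for name in sorted(latest):
        if name in has_current:
            continue  # not deleted, nothing to promote
        destination = f"{base_url}/{quote(name)}"
        source = f"{destination}?versionid={quote(latest[name], safe='')}"
        plan.append((source, destination))
    return plan


# Hypothetical listing: one deleted blob with two versions, one live blob,
# and one deleted blob outside the "docs/" prefix.
entries = [
    ("docs/a.txt", "2021-04-22T11:35:36.9385599Z", False),
    ("docs/a.txt", "2021-04-21T09:00:00.0000000Z", False),
    ("docs/sub/b.txt", "2021-04-20T08:00:00.0000000Z", True),
    ("other/c.txt", "2021-04-19T07:00:00.0000000Z", False),
]
plan = plan_promotions(
    entries,
    "https://accountname.blob.core.windows.net/containername",
    prefix="docs/",
)
for source, destination in plan:
    print(source, "->", destination)
```

Each pair can then be fed to whatever copy mechanism you use. Because the "folders" are only name prefixes, restoring a whole nested structure is just a matter of passing the top-level prefix.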

ADF Copy Activity FTP Source strange behavior

I have created an ADF pipeline to copy around 18 files from an FTP location into an Azure Blob container. First, a Get Metadata activity gets all the files from the FTP location. Then a ForEach activity loops through the files. Inside the ForEach activity, a Copy Data activity copies each file from the FTP location to the Blob location.
When I run the pipeline, some of the files are copied, but others fail with the error message below:
"ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The remote server returned an error: (550) File unavailable (e.g., file not found, no access).,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (550) File unavailable (e.g., file not found, no access).,Source=System,'"
I am not sure what is wrong here, because most files are copied successfully and only a few are not. I have tried running it multiple times, and there is still no guarantee that all files get copied.
When I test the connection of the FTP linked service, it connects successfully. The FTP linked service is SSL-enabled and configured to get its password from Azure Key Vault.
Any thoughts as to what is going wrong here? Is there a limit on the number of files that can be copied at one time?
Thank you in advance.
As @Joel Cochran said, this may be a concurrency limit issue.
When Sequential is selected on the ForEach activity, the Copy activities inside it run one at a time. When it is unchecked, they run in parallel, which is much more efficient.
So our solution is:
Uncheck Sequential
Increase the maximum number of parallel operations for the inner activities (the Batch count setting).
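In the pipeline JSON, those two settings would look roughly like the sketch below, written here as a Python dict and printed as JSON. The activity names and the batch count value are hypothetical, and the property names (isSequential, batchCount) are the ones I understand the ForEach activity to use, so compare them with the JSON view of your own pipeline.

```python
import json

# Hypothetical activity names; "GetFtpFiles" stands for the Get Metadata activity.
foreach_activity = {
    "name": "ForEachFtpFile",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": False,  # "Sequential" unchecked: iterations run in parallel
        "batchCount": 20,       # maximum number of parallel iterations (example value)
        "items": {
            "value": "@activity('GetFtpFiles').output.childItems",
            "type": "Expression",
        },
        "activities": [],       # the Copy Data activity goes here
    },
}

foreach_json = json.dumps(foreach_activity, indent=2)
print(foreach_json)
```

If the FTP server itself limits simultaneous connections, it may be worth trying different batch count values to see which one copies all files reliably.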

azcopy list function gives a different count (almost double) of objects than Storage Explorer

I am uploading files with AzCopy (one by one, as and when they are provided) to Azure Data Lake Storage Gen2, and I keep track of them with Storage Explorer and an individual log for each file.
There have been 6253 file uploads. Storage Explorer shows the same number, and so does the count of per-file upload logs.
But when I use azcopy list, it gives me 11254.
This makes it difficult to script and automate.
Is there a logical explanation for this?
There is no access issue; in fact, the same AzCopy is copying the files without problems.
I have also tried re-downloading, if that makes sense.
This is a known bug, scheduled for fixing in our next release: https://github.com/Azure/azure-storage-azcopy/issues/692

Azure convert blob to file

Some large disks containing hundreds of 30GB tar files have been prepared and are ready to ship.
The disks have been prepared as BLOB using the WAImportExport tool.
The Azure share is expecting files.
Ideally we don't want to redo the disks as FILE instead of BLOB. Are we able to upload as BLOBs to one storage area and extract the millions of files from the tarballs to a FILE storage area without writing code?
Thanks
Kevin
AzCopy will definitely do it, and we have tested it. We were able to move the data from blobs to files using the azcopy command from the CLI in Azure.
The information provided below was proven not to be true.
Microsoft Partner told me yesterday there is no realistic way to convert Blobs to Files in the above-mentioned scenario.
Essentially, it is important to select either WAImportExport.exe Version 1 for BLOBS or WAImportExport.exe Version 2 for files. Information on this can be found at this location.
The mistake is easily made, and a number of people here made it: the link to the tool that was sent pointed to the version 1 binary. Search results tend to direct users to version 1, and version 2 only appears after a deeper dig. Version 2 seems to be an afterthought by Microsoft from when they added the Files option to Azure. It's a pity they didn't use different binary names, or build a switch into version 2 to do both and delete the version 1 offering.

Why do duplicate folders exist in my Azure blob storage container?

I am aware that Azure blob storage does not use an actual folder structure but could not think of a better way to describe this.
The issue we're seeing is when opening Server Explorer (in Visual Studio) to browse through our blob storage container. We separate client resources and data by folder so in this case we have a blob titled productdata/Client_5/testimage.jpg.
The problem is that this Client_5 folder appears twice when inspecting our blob storage. So far I've double-checked that there are no weird special characters in either folder name, and I've double-checked case sensitivity. The two paths are EXACTLY the same; only their contents differ. Our application has no problems with this, because the path to the resources it's attempting to get is still exactly the same. (For example, since the folders are named exactly the same, https://myazureaccount.blob.core.windows.net/productdata/Client_5/image.jpg still takes us to exactly where we need to be.) It's just a pain when we use Server Explorer to view our blobs on Azure, because we have two folder locations to check. This could very well be a bug in Server Explorer for Visual Studio as well.
If anyone else has ever come across this, any info is appreciated. I couldn't find anything on the topic when searching online but figure I would post the question here for reference. Also, I'll be contacting Azure support soon to see if they can shed some light on any of this and will post what info I get from them here later.
It's true that blob storage has no concept of folders, but the API built on top of it does. I've seen exactly the same or similar problems in other tools as well, including Microsoft Azure Storage Explorer and even the Azure Portal. I tried to dig deeper, and when I executed:
CloudBlobContainer.ListBlobs(null, useFlatBlobListing: false)
it also returned duplicated directories. To be precise, it returned a list with several instances of CloudBlobDirectory that had the same Prefix. That sounds like a bug. If a tool uses this approach to get a list of directories, it will fail. If the tool uses flat listing and builds the folder structure in its own logic, it should be OK.
It's hard to say what causes this behaviour. In my case the files in blob storage were copied by an Azure Data Factory activity with the concurrency option, but I'm not sure whether that is always the cause.
BTW, Microsoft Azure Storage Explorer in my case showed only a subset of the folders, which is much worse than displaying duplicated directories, so I switched to Azure Explorer mentioned above, and it's worth recommending.
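The flat-listing approach mentioned above can be sketched in a few lines. This is just an illustration in plain Python, not what any of the tools actually do. It takes the blob names from a flat listing (hypothetical names here) and derives the set of virtual directories from them, so each directory can only appear once.

```python
def virtual_directories(blob_names, delimiter="/"):
    """Derive the unique virtual directory prefixes from a flat list of blob names."""
    directories = set()
    for name in blob_names:
        segments = name.split(delimiter)[:-1]  # drop the final "file" segment
        for depth in range(1, len(segments) + 1):
            directories.add(delimiter.join(segments[:depth]) + delimiter)
    return sorted(directories)


# Hypothetical blob names inside one container.
names = [
    "Client_5/testimage.jpg",
    "Client_5/image.jpg",
    "Client_5/thumbs/a.png",
    "Client_7/logo.png",
]
print(virtual_directories(names))
```

Because the directories are collected in a set keyed by the exact prefix string, two entries can only show up if the prefixes really differ in some character.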
I was experiencing an issue where the "folder" names appeared identical, but on closer inspection one had a trailing space.
Because folders don't really exist in blob storage and a space is a valid value, it is possible to have trailing or leading spaces in the names.
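A quick way to check for this is to group the "folder" segments of your blob names by their trimmed form and print the raw variants with repr. The sketch below is plain Python over a list of blob names that you have already fetched. The function name and the sample names are made up for illustration.

```python
from collections import defaultdict


def lookalike_folders(blob_names, delimiter="/"):
    """Find folder segments that differ only by leading or trailing whitespace."""
    groups = defaultdict(set)
    for name in blob_names:
        for segment in name.split(delimiter)[:-1]:  # skip the final "file" segment
            groups[segment.strip()].add(segment)
    # Keep only the names that have more than one raw spelling.
    return {shown: sorted(raw) for shown, raw in groups.items() if len(raw) > 1}


# Hypothetical blob names; the second one has a trailing space in the folder.
names = [
    "Client_5/testimage.jpg",
    "Client_5 /image.jpg",
    "Client_7/logo.png",
]
for shown, variants in lookalike_folders(names).items():
    # repr makes the otherwise invisible whitespace visible.
    print(shown, [repr(variant) for variant in variants])
```

This only catches leading and trailing whitespace. Other invisible characters would need their own normalisation step.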
Azure Blob Storage does not have the concept of a folder, only of a container. You can simulate a folder by giving the blob a name like 'folder/img.png', but folder/ is then part of the name of the blob.
Also, I use Storage Explorer; try this one: http://azurestorageexplorer.codeplex.com/releases/view/125870
