Data Factory copy from Azure Cognitive Search

The Copy activity does not support Azure Cognitive Search as a source; it works fine as a sink, but not as a source. This makes transferring indexed documents from one index to another tedious: an Until loop around GET + POST batches to the Search API, with conditional variables to break out of the outer Until iteration.
Is there an easier way?

As of now, Azure Cognitive Search is not supported as a source. Azure Cognitive Search can be used only as a sink in the Azure Data Factory Copy activity. Refer to the official documentation for supported data stores.
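In the meantime, here is a minimal sketch of the API-based workaround the question describes, assuming the azure-search-documents Python SDK, that all fields in the source index (including the key) are retrievable, and that the service name, index names, and keys shown are placeholders:

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    source = SearchClient(
        endpoint="https://<service>.search.windows.net",
        index_name="source-index",                      # placeholder
        credential=AzureKeyCredential("<query-key>"),
    )
    target = SearchClient(
        endpoint="https://<service>.search.windows.net",
        index_name="target-index",                      # placeholder
        credential=AzureKeyCredential("<admin-key>"),
    )

    batch = []
    # search("*") pages through all documents; for very large indexes you would
    # page deterministically (e.g. order by the key field) instead.
    for doc in source.search(search_text="*"):
        doc.pop("@search.score", None)                  # strip search metadata before re-indexing
        batch.append(doc)
        if len(batch) == 1000:                          # the Search API accepts up to 1000 docs per batch
            target.upload_documents(documents=batch)
            batch = []
    if batch:
        target.upload_documents(documents=batch)

This mirrors what the Until loop in the pipeline does, but without the conditional-variable bookkeeping.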

Related

Query data from Azure Purview

I am moving from AWS Glue to Azure Purview and I am confused about something.
Is it possible to query the Azure Purview data catalog/assets in the same way we can query the AWS Glue data catalog using AWS Athena?
Unfortunately, you cannot query data from Azure Purview.
The Purview search experience is powered by a managed search index. After a data source is registered with Purview, its metadata is indexed by the search service to allow easy discovery. The index provides search relevance capabilities and completes search requests by querying millions of metadata assets. Search helps you to discover, understand, and use the data to get the most value out of it.
The search experience in Purview is a three-stage process:
The search box shows the history containing recently used keywords and assets.
As you begin typing keystrokes, the search suggests matching keywords and assets.
The search result page is shown with assets matching the keyword entered.
For more details, refer to Understand search features in Azure Purview.
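For completeness, a rough sketch of hitting that managed search index programmatically via the Purview catalog query REST API; the endpoint shape and api-version string are assumptions, so check the current Purview REST reference before relying on them:

    import requests
    from azure.identity import DefaultAzureCredential

    # Token scope for Purview; assumed based on the public Purview REST docs.
    token = DefaultAzureCredential().get_token("https://purview.azure.net/.default").token

    account = "<purview-account>"                        # placeholder
    url = f"https://{account}.purview.azure.com/catalog/api/search/query"
    params = {"api-version": "2022-03-01-preview"}       # assumed api-version
    body = {"keywords": "customer", "limit": 10}         # keyword search, like the portal search box

    resp = requests.post(url, params=params, json=body,
                         headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    for asset in resp.json().get("value", []):
        print(asset.get("qualifiedName"), asset.get("entityType"))

Note that this searches the catalog's metadata, not the underlying data itself, which is the limitation described above.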

How to remove special characters from XML stored in ADLS using Azure data factory or any other option?

I have a scenario where I need to remove some characters from XML tags stored in ADLS. I am looking for an option with ADF. Can someone help me with the approach I should follow?
This is not possible with ADF alone. You could have a piece of code do this in Azure Functions, as Azure Data Factory can do data movement and data transformation only; editing the tags themselves does not come under that.
You may use the Azure Function activity in a Data Factory pipeline to run Azure Functions. To launch an Azure Function, you must first set up a linked service connection and an activity that specifies the Azure Function you want to execute.
There is a Microsoft document with deep insights about the Azure Function activity in ADF: Here.
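As an illustration, a minimal sketch of the kind of code such a function could run, assuming the azure-storage-file-datalake Python SDK, placeholder account, file-system, and file names, and that "special characters" means characters that are illegal in XML 1.0:

    import re
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",  # placeholder
        credential="<account-key>",                                    # placeholder
    )
    fs = service.get_file_system_client("raw")                         # assumed file system name
    source = fs.get_file_client("input/sample.xml")                    # assumed path

    xml_text = source.download_file().readall().decode("utf-8")

    # Drop control characters that XML 1.0 does not allow (keep tab, CR, LF).
    cleaned = re.sub(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]", "", xml_text)

    fs.get_file_client("clean/sample.xml").upload_data(
        cleaned.encode("utf-8"), overwrite=True
    )

In a real pipeline the file path would come from the Azure Function activity's request payload rather than being hard-coded.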

Searching through data stored in Azure Data Lake

I have the following use case for building a Data Lake (e.g. in Azure):
My organization deals with companies that go into bankruptcy. Once a company goes bankrupt, it needs to hand over all of their data to us, including structured data (e.g. CSVs) as well as semi-structured and unstructured data (e.g. PDFs, Word documents, images, JSON, .txt files etc.). Having a data lake would help here as the volumes of data can be large and unpredictable and Azure Data Lake seems like a relatively low-cost and scalable storage solution.
However, apart from storing all of that data we also need to give business users a tool that will enable them to search through all of that data. I can imagine two search types:
searching for specific files (using file names or part of file names as the search criteria)
searching through all text files (word documents, .txt and PDFs) and identifying those files that meet the search criteria (e.g. a specific phrase being searched for)
Are there any out-of-the-box tools that can use Azure Data Lake as a data source and would enable users to perform such searches?
Unfortunately, there isn't a tool that can filter the files directly in Data Lake for now.
Even Azure Storage Explorer only supports search by prefix.
Data Factory supports filtering the files, but it is usually used to copy and transfer data. Reference: Data Factory supports wildcard file filters for Copy Activity
Update:
Azure Cognitive Search seems to be a good choice.
Cognitive Search supports importing from Data Lake as a source, and it provides filters to help us search the files.
A filter provides criteria for selecting documents used in an Azure Cognitive Search query. Unfiltered search includes all documents in the index. A filter scopes a search query to a subset of documents.
We can refer to Filters in Azure Cognitive Search.
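To make that concrete, a hedged sketch of both search types from the question, assuming the azure-search-documents Python SDK, an index populated by the built-in blob/ADLS indexer (which emits fields like metadata_storage_name and metadata_storage_file_extension), and that those fields are marked searchable/filterable in your schema:

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(
        endpoint="https://<service>.search.windows.net",   # placeholder
        index_name="datalake-files",                        # placeholder index name
        credential=AzureKeyCredential("<query-key>"),
    )

    # 1) Search by (part of a) file name.
    for doc in client.search(search_text="invoice", search_fields=["metadata_storage_name"]):
        print(doc["metadata_storage_name"])

    # 2) Full-text phrase search across extracted document content,
    #    filtered down to PDFs only.
    results = client.search(
        search_text='"breach of contract"',
        filter="metadata_storage_file_extension eq '.pdf'",
        select=["metadata_storage_name", "metadata_storage_path"],
    )
    for doc in results:
        print(doc["metadata_storage_path"])

Both field names are just the blob indexer defaults; swap in whatever your index schema actually uses.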
Hope this helps.
Cognitive Search with Azure Data Lake is definitely an option, and it is what Microsoft recommends. There are several factors we need to consider:
Price. https://azure.microsoft.com/en-us/pricing/details/search/. It is not a cheap option.
The size of your source data and of the index you need.
Your familiarity with other open-source services. ELK is a popular open-source framework for full-text searching.

Azure Search - Which is the best way to follow, API or Portal, when there are two data sources, one SQL on a VM & the other Blob storage?

We have the following scenario and we need to implement Azure Search. We have to finalize the method/workflow of the process.
We have two data sources, one SQL on a VM & the other Blob storage. We need to combine data from both sources into a single index that can then be searched. Which is the best way to implement this, API or portal?
Unless you use two different indexes, there's no way to combine both using the portal. So you need to write some code that will merge information from both sources and push it to your Azure Search index.
Here's a sample using Cosmos DB and Blob Storage; all you need to do is use SQL rather than Cosmos DB and model your index properly:
https://learn.microsoft.com/en-us/azure/search/tutorial-multiple-data-sources
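A rough sketch of that "write some code" approach, assuming the azure-search-documents and pyodbc Python packages; the index name, table, columns, and connection strings are all placeholders:

    import pyodbc
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    search = SearchClient(
        endpoint="https://<service>.search.windows.net",
        index_name="combined-index",                      # placeholder
        credential=AzureKeyCredential("<admin-key>"),
    )

    # 1) Rows from the SQL Server on the VM (assumed table/columns).
    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};Server=<vm>;Database=<db>;UID=<user>;PWD=<pwd>"
    )
    cursor = conn.cursor()
    sql_docs = [
        {"id": str(row.Id), "title": row.Title, "source": "sql"}
        for row in cursor.execute("SELECT Id, Title FROM dbo.Items")
    ]

    # 2) Documents describing blobs (in practice these could come from a blob
    #    indexer or from listing the container yourself).
    blob_docs = [{"id": "blob-001", "title": "contract.pdf", "source": "blob"}]

    # mergeOrUpload lets documents sharing a key be enriched by either source
    # instead of being overwritten wholesale.
    search.merge_or_upload_documents(documents=sql_docs + blob_docs)

The same approach works whether you push from a console app, an Azure Function, or a custom activity.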

Azure Data Sync - Copy Each SQL Row to Blob

I'm trying to understand the best way to migrate a large set of data, roughly 6M text rows, from (an Azure-hosted) SQL Server to Blob storage.
For the most part, these records are archived records, and are rarely accessed - blob storage made sense as a place to hold these.
I have had a look at Azure Data Factory and it seems to be the right option, but I am unsure of it fulfilling requirements.
Simply put, the scenario is: for each row in the table, I want to create a blob with the contents of one column from that row.
I see the tutorial (i.e. https://learn.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-tutorial-using-azure-portal) is good at explaining migration of a bulk-to-bulk data pipeline, but I would like to migrate a bulk-to-many dataset.
Hope that makes sense and someone can help?
As of now, Azure Data Factory does not have anything built in like a For Each loop in SSIS. You could use a custom .NET activity to do this, but it would require a lot of custom code.
I would ask, if you were transferring this to another database, would you create 6 million tables all with the same structure? What is to be gained by having the separate items?
Another alternative might be converting it to JSON which would be easy using Data Factory. Here is an example I did recently moving data into DocumentDB.
Copy From OnPrem SQL server to DocumentDB using custom activity in ADF Pipeline
SSIS 2016 with the Azure Feature Pack provides Azure tasks such as the Azure Blob Upload Task and the Azure Blob Destination. You might be better off using this; an OLE DB Command or a For Each Loop with an Azure Blob destination could be another option.
Good luck!
Azure Data Factory has a ForEach activity which can be placed after a Lookup or Get Metadata activity to copy each row from SQL to a blob.
ForEach
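Purely as an illustration of what the per-row export could look like outside ADF, a sketch assuming a table with an Id key and a Payload text column (names, connection strings, and container are placeholders), using pyodbc and the azure-storage-blob SDK:

    import pyodbc
    from azure.storage.blob import BlobServiceClient

    blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    container = blob_service.get_container_client("archived-rows")      # assumed container

    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};Server=<server>;Database=<db>;UID=<user>;PWD=<pwd>"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT Id, Payload FROM dbo.ArchiveTable")          # assumed table/columns

    while True:
        rows = cursor.fetchmany(1000)          # stream in batches; ~6M rows won't fit in memory
        if not rows:
            break
        for row_id, payload in rows:
            # One blob per row, named after the row key.
            container.upload_blob(name=f"{row_id}.txt", data=payload or "", overwrite=True)

Within ADF itself, the equivalent pattern would be a Lookup feeding the ForEach activity mentioned above, with each iteration writing one blob.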
