Hi, is it possible to store .txt files inside an Azure SQL database? If not, which service provides this?
You can write them into an nvarchar(max) column if you'd like. If they are CSV files, you may want to shred them into columns in a table. However, if you have a lot of text file data, you may find it cheaper to use Azure Storage instead. It is generally better for colder data that you don't need to process, especially if the text files are all, or the majority, of what you are trying to store.
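For example, a minimal table for holding raw .txt contents directly in Azure SQL Database could look like this (the table and column names are just an illustration):
-- Illustrative schema: one row per stored text file.
CREATE TABLE dbo.TextFiles (
    FileId     INT IDENTITY(1,1) PRIMARY KEY,
    FileName   NVARCHAR(260) NOT NULL,
    Contents   NVARCHAR(MAX) NOT NULL,
    UploadedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
INSERT INTO dbo.TextFiles (FileName, Contents)
VALUES (N'notes.txt', N'...file contents go here...');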
Hope that helps.
You can have the files first uploaded/stored in an Azure Storage account and then automatically processed and loaded into a table in an Azure SQL Database using Azure Functions, as explained here. Azure Functions have triggers that respond to events such as a new file being uploaded.
I plan on using Azure Blob Storage to store images. I will have around 5,000 categories of images, which I plan to keep separated using folders. The file names won't differ much across the board, and there is the potential that I will need to change metadata frequently.
My original plan was to use a SQL database to index all of these files and store my metadata there, but I'm second guessing that plan.
Is it feasible to index files in Azure Blob storage using a database, or should I just stick with using blob metadata?
Edit: I guess this question should really be "are there any downsides to indexing Azure Blob storage using a relational database?". I'm much more comfortable working with a DB than I am Azure storage, so my preference is to use a DB.
I'm second-guessing whether or not to use a DB after looking at Azure Storage more and discovering metadata tags and indexing. Hope this helps.
You can use Azure Search for this task as well: store the images in Azure Storage (blob) and use Azure Search for crawling, indexing, and searching. Using metadata, you can enhance your search as well. This way you might not even need to use folders to separate the different categories.
Blob Index is a very feasible option, and it can save you cost, time, and overhead by not having to use SQL.
https://azure.microsoft.com/en-gb/blog/manage-and-find-data-with-blob-index-for-azure-storage-now-in-preview/
If you are looking for more information on this preview feature, I would love to hear more and work more closely with you on this. Could you please reach me at BlobIndexPreview#microsoft.com.
I am storing a series of Excel files in an Azure File Storage container for my company. My manager wants to be able to see the file created date for these files, as we will be running monthly downloads of the same reports. Is there a way to automatically store the created date as one of the properties in Azure, or perhaps add a bit of custom metadata? Thanks in advance.
You can certainly store the created date as part of custom metadata for the file. However, there are certain things you would need to be aware of:
Metadata is editable: Anybody with access to the storage account can edit the metadata. They can change the created date metadata value or even delete that information.
Querying is painful: Azure File Storage doesn't provide querying capability, so if you want to query on a file's created date, it is going to be a painful process. First you would need to list all files in a share and then fetch the metadata for each file separately. Depending on the number of files and the level of nesting, it could be a complicated process.
There are some alternatives available to you:
Use Blob Storage
If you can use Blob Storage instead of File Storage, use that. Blob Storage has a system-defined property for the created date, so you don't have to do anything special. However, like File Storage, Blob Storage also has limited querying capability, though it is comparatively less painful.
Use Table Storage/SQL Database For Reporting
For querying purposes, you can store the file's created date in either Azure Table Storage or a SQL Database. The downside of this approach is that because it is a completely separate system, it would be your responsibility to keep the data in sync. For example, if a file is deleted, you will need to ensure that the corresponding entry in the database is also removed.
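For example, a minimal tracking table in SQL Database could look something like this (the names are purely illustrative):
-- Illustrative only: one row per uploaded report file.
CREATE TABLE dbo.ReportFiles (
    FileId      INT IDENTITY(1,1) PRIMARY KEY,
    ShareName   NVARCHAR(255)  NOT NULL,
    FilePath    NVARCHAR(1024) NOT NULL,
    CreatedDate DATETIME2      NOT NULL
);
-- Querying by created date then becomes trivial, e.g. files from the last month:
SELECT FilePath, CreatedDate
FROM dbo.ReportFiles
WHERE CreatedDate >= DATEADD(MONTH, -1, SYSUTCDATETIME());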
I was wondering what the best practice is for moving a DocumentDB database to Azure Data Lake Storage.
Should I create a file for each document in a collection, or move the entire DocumentDB database?
Also, I didn't find much information on how I can access DocumentDB using U-SQL.
Input would be appreciated.
You currently cannot use U-SQL to access data in DocumentDB (now called Cosmos DB). There is a feature request here; please feel free to add your vote.
If you move the data over, the right organization depends on how you want to manage the data (delete all of it, or only parts?), how it is structured (keep similarly structured data together, either in the same file or the same folder), how you use it (do you always need all of it, or only parts?), and what gives you the best performance when accessing it (larger files are normally better, but if they are JSON, also make sure the extraction process can handle them).
You can use Azure Data Factory to connect to Document DB and store your data on Data Lake.
After that you can query the data directly from Data Lake using U-SQL.
I'm trying to understand the best way to migrate a large set of data - roughly 6M text rows - from an Azure-hosted SQL Server to Blob Storage.
For the most part, these records are archived records, and are rarely accessed - blob storage made sense as a place to hold these.
I have had a look at Azure Data Factory and it seems to be the right option, but I am unsure whether it fulfils my requirements.
Put simply, the scenario is: for each row in the table, I want to create a blob containing the contents of one column from that row.
I see the tutorial (i.e. https://learn.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-tutorial-using-azure-portal) is good at explaining a bulk-to-bulk data pipeline, but I would like to do a bulk-to-many migration.
Hope that makes sense and someone can help?
As of now, Azure Data Factory does not have anything built in like a For Each loop in SSIS. You could use a custom .NET activity to do this, but it would require a lot of custom code.
I would ask, if you were transferring this to another database, would you create 6 million tables all with the same structure? What is to be gained by having the separate items?
Another alternative might be converting it to JSON, which would be easy using Data Factory. Here is an example I did recently, moving data into DocumentDB:
Copy From OnPrem SQL server to DocumentDB using custom activity in ADF Pipeline
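If you go the JSON route, you can also preview the document shape directly in T-SQL before building the pipeline; here is a quick sketch with hypothetical table and column names:
-- Returns the rows as one JSON array so you can check the document shape.
SELECT Id, CreatedDate, ArchivedText
FROM dbo.ArchivedRecords
FOR JSON PATH, ROOT('records');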
SSIS 2016 with the Azure Feature Pack gives you Azure tasks such as the Azure Blob Upload Task and the Azure Blob Destination. You might be better off using this; an OLE DB Command or a Foreach Loop with an Azure Blob Destination could be another option.
Good luck!
Azure Data Factory has a ForEach activity, which can be placed after a Lookup or Get Metadata activity to copy each row from SQL to a blob.
ForEach
I have read this SO question, but mine is quite specific to the "import" of the CSV and not how to access the blob to get the CSV out.
Which is the best way?
1) CSV stored in the blob - use a worker role to read the CSV from the blob, parse the data, and update the database.
2) Is SQL BulkCopy/BulkInsert an option? The challenge here is that it should not have any on-premises involvement - everything within Azure: blob -> SQL Database.
3) Will Azure Automation help? Are there PowerShell scripts/workflows that help with such a bulk load of CSV data into Azure SQL DB? I haven't found any, though.
Are there other options that help import blob CSV data to SQL DB without having to write custom code?
Appreciate any thoughts...
Your first method would work. You could also use AzCopy (http://aka.ms/azcopy) to download the file locally and then use BCP to load it into SQL - this way you won't have to write any code for this.
Azure Automation would help if you want to do this repeatedly. You should be able to set this up as a script even if one doesn't exist.
I know this is an outdated question, but for anyone looking for a quick way to do this, feel free to check my article on how to do it quickly using a SQL procedure triggered by a Logic App.
In short, you first create a database master key in the target database:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'UNIQUE_STRING_HERE'
Then, still in that database, you run:
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=SAS_TOKEN_HERE';
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://<account_name>.blob.core.windows.net/<container_name>',
CREDENTIAL = BlobCredential
);
And then
BULK INSERT <my_table>
FROM '<file_name>.csv'
WITH (
DATA_SOURCE = 'AzureBlob',
FORMAT = 'CSV',
FIRSTROW = 2
);
Just wrap this insert in a procedure and execute it from the Logic App.
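A minimal sketch of that wrapper procedure, reusing the placeholders from the snippet above (the procedure name is arbitrary):
CREATE PROCEDURE dbo.ImportCsvFromBlob
AS
BEGIN
    -- Re-runs the bulk load against the external data source defined above.
    BULK INSERT <my_table>
    FROM '<file_name>.csv'
    WITH (
        DATA_SOURCE = 'AzureBlob',
        FORMAT = 'CSV',
        FIRSTROW = 2
    );
END;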
https://marczak.io/posts/azure-loading-csv-to-sql/
Or just use ADF, as shown here:
https://azure4everyone.com/posts/2019/07/data-factory-intro/
Late answer to an old question, but...
If you can use an Azure SQL Data Warehouse, you could take advantage of PolyBase to directly query the CSV data stored in the blob: https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-polybase-guide#export-data-to-azure-blob-storage. This allows you to map the data as an external table and query it dynamically.
This saves you the trouble of writing an external tool/solution for extracting, parsing, and uploading the data to the Azure SQL database. Unfortunately, PolyBase only works with Azure SQL Data Warehouse, not Azure SQL Database, but you could set up something that reads the structured data from the warehouse into your solution.
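For reference, a rough sketch of what the PolyBase setup looks like in T-SQL (all names, locations, and column definitions here are placeholders, and a database master key must already exist):
-- External data source over the blob container (placeholder names).
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user',
     SECRET = '<storage_account_key>';
CREATE EXTERNAL DATA SOURCE CsvBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container_name>@<account_name>.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);
-- How the CSV files are delimited.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', USE_TYPE_DEFAULT = TRUE)
);
-- External table mapped over the CSV folder; columns must match the file layout.
CREATE EXTERNAL TABLE dbo.ExternalCsvData (
    Id      INT,
    Payload NVARCHAR(4000)
)
WITH (
    LOCATION = '/archive/',
    DATA_SOURCE = CsvBlobStore,
    FILE_FORMAT = CsvFormat
);
-- The external table can then be queried like any other table.
SELECT TOP 10 * FROM dbo.ExternalCsvData;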
I know this question is two years old, but for those just now searching on the topic, I'd like to mention that the new Azure Feature Pack for SSIS makes this an easy task. In Visual Studio Data Tools, after installing the Azure Feature Pack, you would open an empty SSIS project and:
1) Create an Azure Storage Connection Manager.
2) Add a Data Flow Task.
3) Open the Data Flow Task and add a Blob Source to connect to the CSV.
4) Use the Destination Assistant to connect to the SQL table where the data is going.
You can then execute this as a one-time load interactively inside the VS Data Tools IDE, or publish it to the SQL Server instance and create a recurring job.