Azure Blob List Paging - azure

I have 3000 files in my container. My GridView shows the list of blobs in the container, but 3000 is too many and, I think, not good for performance.
I need paging code. For example, if my grid's PageSize is 50, the first page of the GridView should show the first 50 blobs in the container. I will also need code for PageIndexChanging, of course.
Or does it not affect performance?

Based on your description, I suggest using the Azure Storage SDK's ListBlobsSegmented method.
ListBlobsSegmented includes a maxResults parameter.
maxResults:
A non-negative integer value that indicates the maximum number of results to be returned at a time, up to the per-operation limit of 5000. If this value is null, the maximum possible number of results will be returned, up to 5000.
So you can fetch just the first batch of blobs when the page first loads.
When the page index changes, call the listing method again and ask for as many blobs as the new GridView page index requires.
Note: to keep performance reasonable, we do not list all the blobs up front, so the total number of blobs in the container is not known. I suggest listing 100 blobs on first load; when the user clicks page 2, extend the listing by another 100.
Here is an example demo; I hope it gives you some tips:
Gridview:
<asp:GridView ID="GridView1" AllowPaging="true" PageSize="50" OnPageIndexChanging="GridView1_PageIndexChanging" runat="server">
</asp:GridView>
Code behind:
BlobContinuationToken continuationToken = null;
protected void Page_Load(object sender, EventArgs e)
{
if (!IsPostBack)
{
//1*100 is the number of blobs you will list
ListBlobResult(1*100);
}
}
public void ListBlobResult(int index)
{
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("connectionstring");
var client = storageAccount.CreateCloudBlobClient();
var container = client.GetContainerReference("foo2");
string prefix = null;
bool useFlatBlobListing = true;
BlobListingDetails blobListingDetails = BlobListingDetails.All;
// int maxBlobsPerRequest = 50;
List<IListBlobItem> blobs = new List<IListBlobItem>();
// Always list from the start of the container: the continuationToken field is not
// preserved across postbacks, and the grid pages within the rows returned here.
continuationToken = null;
do
{
// A single call returns at most 5000 results, so request the remainder in batches.
int batchSize = Math.Min(index - blobs.Count, 5000);
var listingResult = container.ListBlobsSegmented(prefix, useFlatBlobListing, blobListingDetails, batchSize, continuationToken, null, null);
continuationToken = listingResult.ContinuationToken;
blobs.AddRange(listingResult.Results);
}
while (continuationToken != null && blobs.Count < index);
DataTable d1 = new DataTable();
d1.Columns.Add("Id");
d1.Columns.Add("Url");
foreach (var item in blobs)
{
if (item.GetType() == typeof(CloudBlockBlob))
{
CloudBlockBlob blob = (CloudBlockBlob)item;
d1.Rows.Add(blob.Name, blob.Uri);
}
}
GridView1.DataSource = d1;
GridView1.DataBind();
}
protected void GridView1_PageIndexChanging(object sender, GridViewPageEventArgs e)
{
GridView1.PageIndex = e.NewPageIndex;
//(e.NewPageIndex*100)+100 is the number of blobs you will list
ListBlobResult((e.NewPageIndex*100)+100);
}
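The demo above lists blobs from the start of the container each time the page changes. One possible alternative is to remember the continuation token that starts each page, so that moving to the next page costs a single segmented call. The sketch below shows only that bookkeeping, in Python and against a simulated container. `list_segment`, `PageTokenCache`, the integer tokens and the blob names are hypothetical stand-ins, not SDK calls; in a real page the token would come from ListBlobsSegmented and would need to be stored somewhere that survives postbacks.

```python
from typing import List, Optional, Tuple

# Simulated container: 3000 blob names (stand-in for a real listing call).
BLOBS = [f"file{i:04d}.txt" for i in range(3000)]


def list_segment(max_results: int, token: Optional[int]) -> Tuple[List[str], Optional[int]]:
    """Hypothetical stand-in for a segmented listing call.

    Returns up to max_results names starting at `token`, plus the token for
    the next segment (None when the listing is exhausted).
    """
    start = token or 0
    items = BLOBS[start:start + max_results]
    next_start = start + len(items)
    return items, (next_start if next_start < len(BLOBS) else None)


class PageTokenCache:
    """Remembers which continuation token starts each page index."""

    def __init__(self, page_size: int) -> None:
        self.page_size = page_size
        self.tokens: List[Optional[int]] = [None]  # page 0 starts with no token

    def get_page(self, page_index: int) -> List[str]:
        # Walk forward only as far as needed to learn the token for page_index.
        while len(self.tokens) <= page_index:
            _, next_token = list_segment(self.page_size, self.tokens[-1])
            if next_token is None:
                return []  # requested page is past the end of the container
            self.tokens.append(next_token)
        items, next_token = list_segment(self.page_size, self.tokens[page_index])
        # Cache the token for the following page so "next" needs one call.
        if next_token is not None and len(self.tokens) == page_index + 1:
            self.tokens.append(next_token)
        return items


cache = PageTokenCache(page_size=50)
first = cache.get_page(0)
second = cache.get_page(1)
print(first[0], first[-1])    # file0000.txt file0049.txt
print(second[0], second[-1])  # file0050.txt file0099.txt
```

Jumping straight to a distant page still has to walk through the intermediate segments once, because a continuation token is only obtained by listing the segment before it.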

I was searching for a pagination sample in Java, and for some reason Google lists this question in its top 3 results. Anyway, I found a solution; if anybody is interested in how to do pagination with Java and the latest Microsoft Azure client, here you go.
void listAllForContainer(BlobContainerClient container) {
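// Listing options were not defined in the original snippet; the page size of 50 is an example value.
ListBlobsOptions options = new ListBlobsOptions().setMaxResultsPerPage(50);
String token = null;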
do {
PagedResponse<BlobItem> pr = container
.listBlobs(options, token, Duration.ofSeconds(60))
.iterableByPage()
.iterator()
.next();
token = pr.getContinuationToken();
List<BlobItem> pageItems = pr.getValue();
pageItems.forEach(i->System.out.println(i.getName()));
} while (token != null);
}
Azure artifact
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob</artifactId>
<version>12.X.X</version>
</dependency>

Related

How to mock Azure blobContainerClient.GetBlobsAsync()

I have an Azure blob container that I access with the code below:
var blobContainerClient = GetBlobContainer(containerName);
if (blobContainerClient != null)
{
// List all blobs in the container
await foreach (BlobItem blobItem in blobContainerClient.GetBlobsAsync())
{
queuedBlobsList.Add(new QueuedBlobs { BlobName = blobItem.Name, LastModified = blobItem.Properties.LastModified });
}
}
private BlobContainerClient GetBlobContainer(string containerName)
{
return gen2StorageClient != null
? gen2StorageClient.GetBlobContainerClient(containerName)
: gen1StorageClient.GetBlobContainerClient(containerName);
}
The clients are initialised in constructor -
public class BlobService : IBlobService
{
private readonly BlobServiceClient gen1StorageClient, gen2StorageClient;
public BlobService(BlobServiceClient defaultClient, IAzureClientFactory<BlobServiceClient> clientFactory)
{
gen1StorageClient = defaultClient;
if (clientFactory != null)
{
gen2StorageClient = clientFactory.CreateClient("StorageConnectionString");
}
}
}
My unit test sets up GetBlobsAsync as shown below, but I want it to return a list of BlobItems so that I can test another loop.
private static Mock<BlobContainerClient> GetBlobContainerClientMockWithListOfBlobs()
{
var blobContainerClientMock = new Mock<BlobContainerClient>("UseDevelopmentStorage=true", EnvironmentConstants.ParallelUploadContainer);
var cancellationToken = new CancellationToken();
var blobs = new List<BlobItem>();
//AsyncPageable<BlobItem> blobItems = new AsyncPageable<BlobItem>(); -- Not allowing
blobContainerClientMock.Setup(x => x.GetBlobsAsync(BlobTraits.All, BlobStates.All, null, cancellationToken)).Returns(It.IsAny<AsyncPageable<BlobItem>>());
return blobContainerClientMock;
}
I came to this question because I also had the same issue.
Based on this article
AsyncPageable<T> and Pageable<T> are classes that represent collections of models returned by the service in pages.
The method GetBlobsAsync returns an AsyncPageable.
To create an AsyncPageable, you first need to create a Page of BlobItem.
To create a Page<T> instance, use the Page<T>.FromValues method, passing a list of items, a continuation token, and the Response.
So let's start creating the list of items:
var blobList = new BlobItem[]
{
BlobsModelFactory.BlobItem("Blob1"),
BlobsModelFactory.BlobItem("Blob2"),
BlobsModelFactory.BlobItem("Blob3")
};
Note: BlobItem has an internal constructor but I found in this answer that there's a BlobsModelFactory.
With the list of blobs in hand, it's time to create a Page<BlobItem>:
Page<BlobItem> page = Page<BlobItem>.FromValues(blobList, null, Mock.Of<Response>());
And finally, create the AsyncPageable<BlobItem>
AsyncPageable<BlobItem> pageableBlobList = AsyncPageable<BlobItem>.FromPages(new[] { page });
And now you are able to use this to mock GetBlobsAsync method:
blobContainerClientMock
.Setup(m => m.GetBlobsAsync(
It.IsAny<BlobTraits>(),
It.IsAny<BlobStates>(),
It.IsAny<string>(),
It.IsAny<CancellationToken>()))
.Returns(pageableBlobList);
I hope this helps others with this issue.
André

Storage quota has been exceeded for this service. You must either delete documents first, or use a higher SKU for additional quota

In Azure Search I sometimes (not every time) get this error when I create or update an index.
I am trying to post multiple documents, following the guideline "How to index large data sets in Azure Search".
Is there any way to update existing data in Azure? (See "How to rebuild an index".)
For reference, Azure offers several tiers (see "Available tiers").
Here is a simple code example:
private static SearchServiceClient searchServiceClient;
public virtual void CreateSearchServiceClient()
{
searchServiceClient = new SearchServiceClient(searchServiceName, new SearchCredentials("APIKEY"));
}
public virtual Index CreateIndex()
{
CreateSearchServiceClient();
Index index = new Index();
string scoringProfile = "AzureSearchScoringProfile";
// Create index if search client service is available
if (searchServiceClient != null)
{
AzureSearchItem azureSearchItem = new AzureSearchItem();
// Map index schema
var definition = new Index()
{
Name = IndexName,
Fields = FieldBuilder.BuildForType<AzureSearchItem>(),
Suggesters = new List<Suggester>() {
new Suggester()
{
Name = ConstantKeys.AzureSearchTopBarSuggestor,
SourceFields = FormatSuggesterFields()
}
}
};
if (!string.IsNullOrEmpty(scoringProfile))
{
definition.ScoringProfiles = new List<ScoringProfile>()
{
new ScoringProfile()
{
Name = scoringProfile
}
};
}
definition = SetAnalyzer(definition);
// Create index
index = searchServiceClient.Indexes.CreateOrUpdate(definition);
}
return index;
}

Netsuite Transaction search performance

I am using the NetSuite API (version v2016_2) to search data. With the code below, NetSuite takes a long time to respond to the query. I am searching GL transactions for a particular period that has 149 MainLine records and 3941 LineItem (journal entry line) records, and NetSuite takes almost 22 minutes to return this data. Below is the code snippet I am using to search transactions.
public void GetTransactionData()
{
DataTable dtData = new DataTable();
string errorMsg = "";
LoginToService(ref errorMsg);
TransactionSearch objTransSearch = new TransactionSearch();
TransactionSearchBasic objTransSearchBasic = new TransactionSearchBasic();
SearchEnumMultiSelectField semsf = new SearchEnumMultiSelectField();
semsf.@operator = SearchEnumMultiSelectFieldOperator.anyOf;
semsf.operatorSpecified = true;
semsf.searchValue = new string[] { "Journal" };
objTransSearchBasic.type = semsf;
objTransSearchBasic.postingPeriod = new RecordRef() { internalId = "43" };
objTransSearch.basic = objTransSearchBasic;
//Set Search Preferences
SearchPreferences _searchPreferences = new SearchPreferences();
Preferences _prefs = new Preferences();
_serviceInstance.preferences = _prefs;
_serviceInstance.searchPreferences = _searchPreferences;
_searchPreferences.pageSize = 1000;
_searchPreferences.pageSizeSpecified = true;
_searchPreferences.bodyFieldsOnly = false;
//Set Search Preferences
try
{
SearchResult result = _serviceInstance.search(objTransSearch);
/*
Above line taking almost 22 minutes for below record count
result.recordList.Length = 149
Total JournalEntryLine = 3941
*/
List<JournalEntry> lstJEntry = new List<JournalEntry>();
List<JournalEntryLine> lstLineItems = new List<JournalEntryLine>();
if (result.status.isSuccess)
{
for (int i = 0; i <= result.recordList.Length - 1; i += 1)
{
JournalEntry JEntry = (JournalEntry)result.recordList[i];
lstJEntry.Add((JournalEntry)result.recordList[i]);
if (JEntry.lineList != null)
{
foreach (JournalEntryLine line in JEntry.lineList.line)
{
lstLineItems.Add(line);
}
}
}
}
try
{
_serviceInstance.logout();
}
catch (Exception ex)
{
}
}
catch (Exception ex)
{
throw ex;
}
}
I can't tell whether I am missing something in my code or whether this is down to the data. Please suggest a solution.
Thanks.
You should set _searchPreferences.bodyFieldsOnly = true. It improves search performance because the related and sublist data is not returned. Note that the sublist data includes the journal entry lines that the question's code reads from lineList, so those are then left out of the results.
I assume you are running this search from outside NetSuite to get journal entries or their lines. Instead of searching directly from outside, create a RESTlet in NetSuite and call that. Perform the search inside the RESTlet and return the results; searches run within NetSuite return results faster.

Azure Storage returning no blobs when 1827 shown in azure explorer

I uploaded almost 2000 images without realizing I had to set their ContentType property, and I am now trying to set the proper content type. Fortunately, I realized this before I moved from .png files to some other type.
Here is my method:
private static void ChangeImageTypeInAzureStorage()
{
var client = GetAzureClient();
var blobContainer = client.GetContainerReference("accessibleimages");
var list = blobContainer.ListBlobs().OfType<CloudBlockBlob>().ToList();
if (!list.Any()) return; //log no entries returned
try
{
foreach (var item in list)
{
if (Path.GetExtension(item.Uri.AbsoluteUri) == ".png")
{
item.Properties.ContentType = "image/png";
}
item.SetProperties();
}
}
catch (Exception ex)
{
//log exceptions with your own methods
Console.WriteLine(ex);
}
Console.WriteLine("Done... press a key to end.");
Console.ReadKey();
}
I'm not getting why nothing is returned to list. The client and blobContainer are correct. I had no problem uploading those images to the same client blobContainer. Needless to say it fails because the list always has a count of 0.
Any help appreciated.
Well, I found the answer after a lot of googling. The boolean parameter useFlatBlobListing of the ListBlobs method has to be set to true.
var list = blobContainer.ListBlobs(null, true).OfType<CloudBlockBlob>().ToList();

Getting blob count in an Azure Storage container

What is the most efficient way to get the count on the number of blobs in an Azure Storage container?
Right now I can't think of any way other than the code below:
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
If you just want to know how many blobs are in a container without writing code, you can use the Microsoft Azure Storage Explorer application:
1. Open the desired BlobContainer.
2. Click the Folder Statistics icon.
3. Observe the count of blobs in the Activities window.
I tried counting blobs using ListBlobs() and for a container with about 400,000 items, it took me well over 5 minutes.
If you have complete control over the container (that is, you control when writes occur), you could cache the size information in the container metadata and update it every time an item gets removed or inserted. Here is a piece of code that would return the container blob count:
static int CountBlobs(string storageAccount, string containerId)
{
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(storageAccount);
CloudBlobClient blobClient = cloudStorageAccount.CreateCloudBlobClient();
CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerId);
cloudBlobContainer.FetchAttributes();
string count = cloudBlobContainer.Metadata["ItemCount"];
string countUpdateTime = cloudBlobContainer.Metadata["CountUpdateTime"];
bool recountNeeded = false;
if (String.IsNullOrEmpty(count) || String.IsNullOrEmpty(countUpdateTime))
{
recountNeeded = true;
}
else
{
DateTime dateTime = new DateTime(long.Parse(countUpdateTime));
// Are we close to the last modified time?
if (Math.Abs(dateTime.Subtract(cloudBlobContainer.Properties.LastModifiedUtc).TotalSeconds) > 5) {
recountNeeded = true;
}
}
int blobCount;
if (recountNeeded)
{
blobCount = 0;
BlobRequestOptions options = new BlobRequestOptions();
options.BlobListingDetails = BlobListingDetails.Metadata;
foreach (IListBlobItem item in cloudBlobContainer.ListBlobs(options))
{
blobCount++;
}
cloudBlobContainer.Metadata.Set("ItemCount", blobCount.ToString());
// Use UTC so the stored ticks are comparable with Properties.LastModifiedUtc.
cloudBlobContainer.Metadata.Set("CountUpdateTime", DateTime.UtcNow.Ticks.ToString());
cloudBlobContainer.SetMetadata();
}
else
{
blobCount = int.Parse(count);
}
return blobCount;
}
This, of course, assumes that you update ItemCount/CountUpdateTime every time the container is modified. CountUpdateTime is a heuristic safeguard (if the container did get modified without someone updating CountUpdateTime, this will force a re-count) but it's not reliable.
The API doesn't contain a container count method or property, so you'd need to do something like what you posted. However, you'll need to deal with NextMarker if more than 5,000 items are returned (or if you specify a maximum number to return and the list exceeds that number). You'll then make additional calls based on NextMarker and add up the counts.
EDIT: Per smarx: the SDK should take care of NextMarker for you. You'll need to deal with NextMarker if you're working at the API level, calling List Blobs through REST.
Alternatively, if you're controlling the blob insertions/deletions (through a wcf service, for example), you can use the blob container's metadata area to store a cached container count that you compute with each insert or delete. You'll just need to deal with write concurrency to the container.
Example using PHP API and getNextMarker.
Counts total number of blobs in an Azure container.
It takes a long time: about 30 seconds for 100000 blobs.
(assumes we have a valid $connectionString and a $container_name)
$blobRestProxy = ServicesBuilder::getInstance()->createBlobService($connectionString);
$opts = new ListBlobsOptions();
$nblobs = 0;
// Loop flag: must start as true, otherwise the while loop below never runs.
$cont = true;
while($cont) {
$blob_list = $blobRestProxy->listBlobs($container_name, $opts);
$nblobs += count($blob_list->getBlobs());
$nextMarker = $blob_list->getNextMarker();
if (!$nextMarker || strlen($nextMarker) == 0) $cont = false;
else $opts->setMarker($nextMarker);
}
echo $nblobs;
If you are not using virtual directories, the following will work as previously answered.
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
However, the above code snippet may not have the desired count if you are using virtual directories.
For instance, if your blobs are stored as /container/directory/filename.txt, where the blob name is directory/filename.txt, container.ListBlobs().Count() only counts the top-level items, that is, the "directory/" virtual directories (plus any blobs stored at the root). To list the blobs contained within virtual directories, set useFlatBlobListing = true in the ListBlobs() call.
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs(null, true).Count();
Note: the ListBlobs() call with useFlatBlobListing = true is a much more expensive/slow call...
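The difference between the two listing modes can be illustrated without the SDK. The following is a simplified Python model of the behaviour described above, not the storage service's actual implementation; the blob names and both helper functions are made up for the example.

```python
from typing import List

# Hypothetical blob names; "/" is the virtual-directory delimiter.
BLOB_NAMES = [
    "readme.txt",
    "images/a.png",
    "images/b.png",
    "docs/2020/report.pdf",
]


def list_hierarchical(names: List[str], delimiter: str = "/") -> List[str]:
    """Top-level view: root blobs plus one entry per virtual directory."""
    seen: List[str] = []
    for name in names:
        head, sep, _ = name.partition(delimiter)
        entry = head + sep  # "images/" for a directory, "readme.txt" for a blob
        if entry not in seen:
            seen.append(entry)
    return seen


def list_flat(names: List[str]) -> List[str]:
    """Flat view: every blob, regardless of virtual directory."""
    return list(names)


print(list_hierarchical(BLOB_NAMES))  # ['readme.txt', 'images/', 'docs/']
print(len(list_flat(BLOB_NAMES)))     # 4
```

In this model the hierarchical view yields three entries for four blobs, which is why counting the default listing undercounts once virtual directories are involved.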
Bearing in mind all the performance concerns from the other answers, here is a version for v12 of the Azure SDK leveraging IAsyncEnumerable. This requires a package reference to System.Linq.Async.
public async Task<int> GetBlobCount()
{
var container = await GetBlobContainerClient();
var blobsPaged = container.GetBlobsAsync();
return await blobsPaged
.AsAsyncEnumerable()
.CountAsync();
}
With the Python API for Azure Storage, it looks like this:
from azure.storage import *
blob_service = BlobService(account_name='myaccount', account_key='mykey')
blobs = blob_service.list_blobs('mycontainer')
len(blobs) #returns the number of blob in a container
If you are using Azure.Storage.Blobs library, you can use something like below:
public int GetBlobCount(string containerName)
{
int count = 0;
BlobContainerClient container = new BlobContainerClient(blobConnctionString, containerName);
container.GetBlobs().ToList().ForEach(blob => count++);
return count;
}
Another Python example; it is slow but works correctly with more than 5000 files:
from azure.storage.blob import BlobServiceClient
constr="Connection string"
container="Container name"
blob_service_client = BlobServiceClient.from_connection_string(constr)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs()
num = 0
size = 0
for blob in blobs_list:
    num += 1
    size += blob.size
    print(blob.name, blob.size)
print("Count: ", num)
print("Size: ", size)
I spent quite a while finding the solution below. I don't want someone like me to waste time, so I am replying here even after 9 years.
package com.sai.koushik.gandikota.test.app;
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.*;
public class AzureBlobStorageUtils {
public static void main(String[] args) throws Exception {
AzureBlobStorageUtils getCount = new AzureBlobStorageUtils();
String storageConn = "<StorageAccountConnection>";
String blobContainerName = "<containerName>";
String subContainer = "<subContainerName>";
Integer fileContainerCount = getCount.getFileCountInSpecificBlobContainersSubContainer(storageConn,blobContainerName, subContainer);
System.out.println(fileContainerCount);
}
public Integer getFileCountInSpecificBlobContainersSubContainer(String storageConn, String blobContainerName, String subContainer) throws Exception {
try {
CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConn);
CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
CloudBlobContainer blobContainer = blobClient.getContainerReference(blobContainerName);
return ((CloudBlobDirectory) blobContainer.listBlobsSegmented().getResults().stream().filter(listBlobItem -> listBlobItem.getUri().toString().contains(subContainer)).findFirst().get()).listBlobsSegmented().getResults().size();
} catch (Exception e) {
throw new Exception(e.getMessage());
}
}
}
Count all blobs in a classic and new blob storage account. Building on @gandikota-saikoushik, this solution works for blob containers with a very large number of blobs.
// Setup: set these values from the Azure Portal
var accountName = "<ACCOUNTNAME>";
var accountKey = "<ACCOUNTKEY>";
var containerName = "<CONTAINERNAME>";
var uristr = $"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey}";
var storageAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(uristr);
var client = storageAccount.CreateCloudBlobClient();
var container = client.GetContainerReference(containerName);
BlobContinuationToken continuationToken = new BlobContinuationToken();
var blobcount = CountBlobs(container, continuationToken).ConfigureAwait(false).GetAwaiter().GetResult();
Console.WriteLine($"blobcount:{blobcount}");
public static async Task<int> CountBlobs(CloudBlobContainer container, BlobContinuationToken currentToken)
{
BlobContinuationToken continuationToken = null;
var result = 0;
do
{
var response = await container.ListBlobsSegmentedAsync(continuationToken);
continuationToken = response.ContinuationToken;
result += response.Results.Count();
}
while (continuationToken != null);
return result;
}
The list-blobs approach is accurate but slow if you have millions of blobs. Another way, which works only in a few cases but is relatively fast, is to query the MetricsHourPrimaryTransactionsBlob table. It operates at the account level, and metrics are aggregated hourly.
https://learn.microsoft.com/en-us/azure/storage/common/storage-analytics-metrics
You can use this
public static async Task<List<IListBlobItem>> ListBlobsAsync()
{
BlobContinuationToken continuationToken = null;
List<IListBlobItem> results = new List<IListBlobItem>();
do
{
CloudBlobContainer container = GetContainer("containerName");
var response = await container.ListBlobsSegmentedAsync(null,
true, BlobListingDetails.None, 5000, continuationToken, null, null);
continuationToken = response.ContinuationToken;
results.AddRange(response.Results);
} while (continuationToken != null);
return results;
}
and then call
var count = (await ListBlobsAsync()).Count;
hope it will be useful

Resources