How to find out the number of entries or size of a table in Accumulo?

I have inserted data into Accumulo, but how would I know the exact entry count for a table?
Is there any method or API to read an Accumulo table's size or number of entries?

First off, Accumulo doesn't know the exact size of a table during active ingest -- it will be an approximation. I don't think any public API methods exist to get this information, although there are some internal methods you could call. Something like the following should work:
ClientContext context = new ClientContext(instance, credentials, clientConfiguration);
MasterClientService.Iface client = null;
MasterMonitorInfo mmi = null;
// Keep trying until the master returns its monitoring stats, closing the
// connection after each attempt.
while (null == mmi) {
  try {
    client = MasterClient.getConnection(context);
    if (client != null) {
      mmi = client.getMasterStats(Tracer.traceInfo(), context.rpcCreds());
    }
  } finally {
    if (null != client) {
      MasterClient.close(client);
    }
  }
}
// Per-table entry count: records in files plus records still in memory.
for (Entry<String,TableInfo> table : mmi.getTableMap().entrySet()) {
  System.out.println(table.getKey() + "=>" + (table.getValue().recs + table.getValue().recsInMemory));
}
This is similar to how the Accumulo monitor obtains these values. Because these are internal APIs, they're a little rough to use and may change across releases. If you'd like to see these APIs exposed through the normal Instance or Connector methods, please open an issue on the project's JIRA instance!

Do you need to do it programmatically? If not, there are various ways you can do this. The easiest is to go to the Accumulo monitor page on port 50095. If you don't have a ton of data, from the command line you can simply do
accumulo shell -u username -p password -e "scan -t foo -np" | wc -l
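If you do need the count programmatically and can tolerate a full scan, you can count entries with the public client API. A minimal sketch, assuming the instance handle from the answer above, a table named foo, and password authentication (adjust names to your setup):
import java.util.Map.Entry;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

// Exact count via a full table scan -- the programmatic equivalent of the
// shell one-liner above, and just as expensive on large tables.
Connector conn = instance.getConnector("username", new PasswordToken("password"));
Scanner scanner = conn.createScanner("foo", Authorizations.EMPTY);
long count = 0;
for (Entry<Key,Value> entry : scanner) {
  count++;
}
System.out.println("foo has " + count + " entries");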

Related

Bringing Active Directory Users using JNDI in multiple threads

I have designed an application which imports users from Active Directory into a MySQL database and shows them in a GUI. It also imports the groups each user is a member of.
So, my program works this way:
for (String domain : allConfiguredADomains) {
  LdapContext domainCtx = getDomainCtx(domain);
  // Bring all users from this domain and store them in DB
  getAllUsersForDomain(domain, domainCtx);
  // Bring all the groups for every user
  getAllGroupsForUsersInTheDomain(domain, domainCtx);
}

void getAllUsersForDomain(String domain, LdapContext domainCtx) {
  String filter = "(objectClass=User)";
  NamingEnumeration<SearchResult> result = domainCtx.search(domain, filter, ..);
  while (result.hasMoreElements()) {
    SearchResult searchResult = (SearchResult) result.nextElement();
    // Process and store in database
    storeUserInDatabase(searchResult);
  }
}

void getAllGroupsForUsersInTheDomain(String domain, LdapContext domainCtx) {
  List<String> userDistinguishedNames = getAllUsersFromDatabase("distinguishedName");
  for (String userDn : userDistinguishedNames) {
    String filter = "(&(objectClass=Group)(distinguishedName=" + userDn + "))";
    NamingEnumeration<SearchResult> result = domainCtx.search(domain, filter, ..);
    List<String> allGroupsOfUser = new ArrayList<String>();
    while (result.hasMoreElements()) {
      SearchResult searchResult = (SearchResult) result.nextElement();
      String groupDistinguishedName = (String) searchResult.getAttributes().get("distinguishedName").get();
      allGroupsOfUser.add(groupDistinguishedName);
    }
    // Store them in database
    storeAllGroupsOfUserInDatabase(userDn, allGroupsOfUser);
  }
}
This application, however, takes a lot of time when there are many users in Active Directory. So, I decided to introduce parallelism (using threading). I divided the work using a search filter on the distinguishedName of a user:
String filter = "(&(objectClass=User)(distinguishedName=a*))";
and so on, in each thread, while fetching users.
I got better performance, but still not good enough. Can someone suggest a better way?
Also, I have no idea how I can introduce parallelism while fetching groups.
If someone has suggestions for doing this better with PowerShell or C#, please share them; I am open to any technology.
Please note: reading the user attribute memberOf does not provide all groups, hence I am fetching groups separately.
I'm not an Active Directory expert - just wanted to share some thoughts.
Threading by alphabet letter allows a maximum of 26 threads. Have you considered creating search threads by some other attribute - group membership, etc.? This might let you create more threads.
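For example, here is a minimal sketch of fanning the per-prefix searches out over a bounded thread pool (the pool size, prefix list, and searchAndStoreUsers helper are illustrative, not from the question; getDomainCtx is the question's own helper):
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.naming.ldap.LdapContext;

// One task per filter "bucket"; the fixed pool caps concurrent LDAP searches.
ExecutorService pool = Executors.newFixedThreadPool(8);
for (String prefix : Arrays.asList("a", "b", "c" /* ... */)) {
  String filter = "(&(objectClass=User)(distinguishedName=" + prefix + "*))";
  pool.submit(() -> {
    // LdapContext is not thread-safe, so each task opens its own context.
    LdapContext ctx = getDomainCtx(domain);
    searchAndStoreUsers(domain, ctx, filter); // hypothetical wrapper around the search-and-store loop
  });
}
pool.shutdown();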
Review the Active Directory docs to see whether there is a way to improve search performance (for example, with a database we could create an index).

How to get all assets from an Azure Media Services account

I am trying to get all assets from an Azure Media Services account. Here is my code:
MediaContract mediaService = MediaService.create(MediaConfiguration.configureWithOAuthAuthentication(
    mediaServiceUri, oAuthUri, AMSAccountName, AMSAccountKey, scope));
List<AssetInfo> info = mediaService.list(Asset.list());
However, this only gives me 1000 of them, and there are definitely more than that in the account.
In an Azure table query, there is a continuation token that can be used to get more entries when there are more than 1000 of them.
Does anybody know how I can get all assets from Azure Media Services?
Thanks,
With Alex's help, I was able to hack the Java SDK the same way as this PHP implementation.
Here is the code:
List<AssetInfo> allAssets = new ArrayList<>();
int skip = 0;
while (true) {
  List<AssetInfo> curAssets = mediaService.list(getAllAssetPage(skip));
  if (curAssets.size() > 0) {
    allAssets.addAll(curAssets);
    if (curAssets.size() == 1000) {
      System.out.println(String.format("Got %d assets.", allAssets.size()));
      skip += 1000;
    } else {
      break;
    }
  } else {
    break;
  }
}

private static DefaultListOperation<AssetInfo> getAllAssetPage(int skip) {
  return new DefaultListOperation<AssetInfo>("Assets",
      new GenericType<ListResult<AssetInfo>>() {
      }).setSkip(skip);
}
It is a built-in limit, for performance reasons (and REST v2), I believe. I think there is no way to retrieve all of them with one query.
It is possible, however, to use take and skip, 1000 by 1000, etc.
But I see that you use the MediaContract class, and I could not find it in the .NET repository - I guess it is the Java one? I can't comment on that, but I believe the approach should be the same as described in the article (skip/take).
I have found the PHP implementation, maybe it will be helpful:
https://msdn.microsoft.com/library/gg309461.aspx#BKMK_skip

Processing an email list async in MVC4

I'm trying to make my MVC4 website check whether people should be alerted with an email because they haven't done something.
I'm having a hard time figuring out how to approach this. I checked whether the shared hosting platform would let me activate some sort of cronjob, but this is not available.
So now my idea is to perform this check on each page request, which already seems suboptimal (because of the overhead). But I thought that by using an async thread it would not be in the way of people just visiting the site.
I first tried to do this in the Application_BeginRequest method in Global.asax, but that gets called multiple times per page request, so that didn't work.
Next I found that I can make a global filter which executes on OnResultExecuted, which seemed promising, but it's still no go.
The problem I get there is that I'm using MVCMailer to send the mails, and when I execute it I get the error: {"Value cannot be null.\r\nParameter name: httpContext"}
This probably means that the mailer needs the context.
The code I now have in my global filter is the following:
public override void OnResultExecuted(ResultExecutedContext filterContext)
{
    base.OnResultExecuted(filterContext);
    HandleEmptyProfileAlerts();
}

private void HandleEmptyProfileAlerts()
{
    new Thread(() =>
    {
        bool active = false;
        new UserMailer().AlertFirst("bla@bla.com").Send();
        DB db = new DB();
        DateTime CutoffDate = DateTime.Now.AddDays(-5);
        var ProfilesToAlert = db.UserProfiles.Where(x => x.CreatedOn < CutoffDate && !x.ProfileActive && x.AlertsSent.Where(y => y.AlertType == "First").Count() == 0).ToList();
        foreach (UserProfile up in ProfilesToAlert)
        {
            if (active)
            {
                new UserMailer().AlertFirst(up.UserName).Send();
                up.AlertsSent.Add(new UserAlert { AlertType = "First", DateSent = DateTime.Now, UserProfileID = up.UserId });
            }
            else
                System.Diagnostics.Debug.WriteLine(up.UserName);
        }
        db.SaveChanges();
    }).Start();
}
So my question is, am I going about this the right way, and if so, how can I make sure that MVCMailer gets the right context?
The usual way to do this kind of thing is to have a single background thread that periodically does the checks you're interested in.
You would start the thread from Application_Start(). It's common to use a database to queue and store work items, although it can also be done in memory if it's better for your app.
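A minimal sketch of that pattern using a System.Threading.Timer started from Application_Start (the interval and method names are illustrative, and note that IIS can recycle the app pool, so this is not a guaranteed scheduler):
using System;
using System.Threading;

public class MvcApplication : System.Web.HttpApplication
{
    // Keep a static reference so the timer isn't garbage-collected.
    private static Timer _alertTimer;

    protected void Application_Start()
    {
        // ... the usual MVC route/filter registration ...

        // Run the check periodically on a background thread, outside any request.
        _alertTimer = new Timer(_ => HandleEmptyProfileAlerts(),
                                null,
                                TimeSpan.Zero,
                                TimeSpan.FromMinutes(15));
    }

    private static void HandleEmptyProfileAlerts()
    {
        // Query for profiles to alert and send the emails here. Because this
        // runs outside a request there is no HttpContext, so the mail must be
        // composed without request-bound state.
    }
}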

Azure Storage Table does not return the whole partition

I ran into a situation in production where
CloudContext.TableData.Where( A => A.PartitionKey == "MYKEY").ToList();
where TableData is
public DataServiceQuery<T> TableData { get { return CreateQuery<T>( _TableName ); } }
does not return the whole partition (I have fewer than 1000 records there).
In my case it returns 367 records, while in the VS2010 Server Explorer or in Azure Storage Explorer I get 414 records (the condition is the same).
Did anyone experience the same problem?
Also, if I change the query and add the RowKey into the condition, I get the required record with no problem.
It helps to understand the Table service better. In the official documentation here, other conditions are listed which affect the number of records returned. If you want to retrieve the whole partition, you have to inspect the TableResult for a continuation token and use the provided continuation token to execute the same query over and over again, until all the results have come back.
You can use an approach similar to the following:
private IEnumerable<MyEntityType> GetAllEntities()
{
    // null means "no continuation token yet", i.e. start at the first segment.
    var result = this._tables.GetSegmentedEntities(100, null);
    while (true)
    {
        foreach (var ufs in result.Results)
        {
            yield return new MyEntityType(ufs.RowKey, ufs.WhateverOtherPropertyINeed);
        }
        if (result.ContinuationToken == null)
        {
            break; // no more segments
        }
        // A segment may legally be empty yet still carry a continuation token,
        // so loop on the token rather than on the segment size.
        result = this._tables.GetSegmentedEntities(100, result.ContinuationToken);
    }
}
Where GetSegmentedEntities(100, result.ContinuationToken) is defined as:
public TableQuerySegment<MyEntityType> GetSegmentedEntities(int pageSize, TableContinuationToken token)
{
    var partKey = "My_Desired_Partition_key_passed_via_Const_or_method_Param";
    TableQuery<MyEntityType> query = new TableQuery<MyEntityType>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partKey));
    query.TakeCount = pageSize;
    return this.azureTableReference.ExecuteQuerySegmented<MyEntityType>(query, token);
}
You can use and modify this code for your case.
This is a known and documented behavior. The Table service API will return either 1000 entities or as many entities as possible within 5 seconds. If the query takes longer than 5 seconds to execute, it returns a continuation token.
With the addition of the RowKey you are making the query more specific, and hence faster, and as a result you are getting all the entities.
See Timeouts and Pagination on MSDN for details.
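If you are on the 1.x storage client (which the DataServiceQuery in the question suggests), a minimal sketch of a fix is to wrap the query with the AsTableServiceQuery() extension method from Microsoft.WindowsAzure.StorageClient; the resulting CloudTableQuery follows continuation tokens for you:
// Enumerating a CloudTableQuery<T> transparently re-issues the request
// with each continuation token until the partition is exhausted.
var allRows = CloudContext.TableData
    .Where(a => a.PartitionKey == "MYKEY")
    .AsTableServiceQuery()
    .ToList();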
If you are getting partial result sets, there are three possible factors:
i) You have more than 1000 records matching the filter.
ii) The query took more than 5 seconds.
iii) The query crosses a partition boundary.
As you have fewer than 1000 records, the first factor won't be an issue, and as you are retrieving based on PartitionKey equality, the third won't cause any problem either. You are facing this problem because of the second factor.
To handle this, you need to work with the continuation token. You can refer to this link for more info.

Add or replace entity in Azure Table Storage

I'm working with Windows Azure Table Storage and have a simple requirement: add a new row, overwriting any existing row with that PartitionKey/RowKey. However, saving the changes always throws an exception, even if I pass in the ReplaceOnUpdate option:
tableServiceContext.AddObject(TableName, entity);
tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);
If the entity already exists it throws:
System.Data.Services.Client.DataServiceRequestException: An error occurred while processing this request. ---> System.Data.Services.Client.DataServiceClientException: <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<code>EntityAlreadyExists</code>
<message xml:lang="en-AU">The specified entity already exists.</message>
</error>
Do I really have to manually query for the existing row first and call DeleteObject on it? That seems very slow. Surely there is a better way?
As you've found, you can't just add another item that has the same row key and partition key, so you will need to run a query to check to see if the item already exists. In situations like this I find it helpful to look at the Azure REST API documentation to see what is available to the storage client library. You'll see that there are separate methods for inserting and updating. The ReplaceOnUpdate only has an effect when you're updating, not inserting.
While you could delete the existing item and then add the new one, you could just update the existing one (saving you one round trip to storage). Your code might look something like this:
var existsQuery = from e in tableServiceContext.CreateQuery<MyEntity>(TableName)
                  where e.PartitionKey == objectToUpsert.PartitionKey
                     && e.RowKey == objectToUpsert.RowKey
                  select e;

MyEntity existingObject = existsQuery.FirstOrDefault();

if (existingObject == null)
{
    tableServiceContext.AddObject(TableName, objectToUpsert);
}
else
{
    existingObject.Property1 = objectToUpsert.Property1;
    existingObject.Property2 = objectToUpsert.Property2;
    tableServiceContext.UpdateObject(existingObject);
}

tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);
EDIT: While correct at the time of writing, with the September 2011 update Microsoft has updated the Azure Table API to include two upsert commands, Insert or Replace Entity and Insert or Merge Entity.
In order to operate on an existing object NOT managed by the TableContext, with either DeleteObject or SaveChanges with the ReplaceOnUpdate option, you need to call AttachTo to attach the object to the TableContext, instead of calling AddObject, which instructs the TableContext to attempt to insert it.
http://msdn.microsoft.com/en-us/library/system.data.services.client.dataservicecontext.attachto.aspx
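A minimal sketch of that approach (assuming entity already carries the right PartitionKey/RowKey and already exists server-side; the "*" wildcard ETag makes the replace unconditional):
// Attach the unmanaged entity instead of adding it, then mark it updated;
// SaveChanges then issues a replace rather than an insert.
tableServiceContext.AttachTo(TableName, entity, "*");
tableServiceContext.UpdateObject(entity);
tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);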
In my case it was not possible to remove it first, so I do it like this. It results in one transaction to the server, which first removes the existing object and then adds the new one, removing the need to copy property values:
var existing = from e in _ServiceContext.AgentTable
               where e.PartitionKey == item.PartitionKey
                  && e.RowKey == item.RowKey
               select e;

_ServiceContext.IgnoreResourceNotFoundException = true;
var existingObject = existing.FirstOrDefault();
if (existingObject != null)
{
    _ServiceContext.DeleteObject(existingObject);
}

_ServiceContext.AddObject(AgentConfigTableServiceContext.AgetnConfigTableName, item);
_ServiceContext.SaveChangesWithRetries();
_ServiceContext.IgnoreResourceNotFoundException = false;
Insert/Merge or Update was added to the API in September 2011. Here is an example using Storage API 2.0, which is easier to understand than the way it is done in the 1.7 API and earlier.
public void InsertOrReplace(ITableEntity entity)
{
    retryPolicy.ExecuteAction(
        () =>
        {
            try
            {
                TableOperation operation = TableOperation.InsertOrReplace(entity);
                cloudTable.Execute(operation);
            }
            catch (StorageException e)
            {
                string message = "InsertOrReplace entity failed.";
                if (e.RequestInformation.HttpStatusCode == 404)
                {
                    message += " Make sure the table is created.";
                }
                // do something with message
            }
        });
}
The Storage API does not allow more than one operation per entity (delete+insert) in a group transaction:
An entity can appear only once in the transaction, and only one operation may be performed against it.
See MSDN: Performing Entity Group Transactions.
So, in fact, you need to read first and then decide whether to insert or update.
You may use the UpsertEntity and UpsertEntityAsync methods of the TableClient in the official Microsoft Azure.Data.Tables package.
A fully working example is available at https://github.com/Azure-Samples/msdocs-azure-data-tables-sdk-dotnet/blob/main/2-completed-app/AzureTablesDemoApplicaton/Services/TablesService.cs:
public void UpsertTableEntity(WeatherInputModel model)
{
    TableEntity entity = new TableEntity();
    entity.PartitionKey = model.StationName;
    entity.RowKey = $"{model.ObservationDate} {model.ObservationTime}";

    // The other values are added like items to a dictionary
    entity["Temperature"] = model.Temperature;
    entity["Humidity"] = model.Humidity;
    entity["Barometer"] = model.Barometer;
    entity["WindDirection"] = model.WindDirection;
    entity["WindSpeed"] = model.WindSpeed;
    entity["Precipitation"] = model.Precipitation;

    _tableClient.UpsertEntity(entity);
}
