Why custom .net activity in azure data factory never finishes - azure

I have followed several tutorials, and actually had other activities running in azure data factory. Now, this one in particular doesn't perform any action and yet it never finishes processing. In the activity window, attempts, it shows the status: Running (0% complete).
I am looking at a reason and also, at understanding how does one know what is going on at this stage for the activity. Is there a way to debug this things? I will include the source code, i'm sure is possible I am missing something:
public class MoveBlobsToSQLActivity : IDotNetActivity
{
public IDictionary<string, string> Execute(
IEnumerable<LinkedService> linkedServices, IEnumerable<Dataset> datasets, Activity activity, IActivityLogger logger)
{
logger.Write("Start");
//Get extended properties
DotNetActivity dotNetActivityPipeline = (DotNetActivity)activity.TypeProperties;
string sliceStartString = dotNetActivityPipeline.ExtendedProperties["SliceStart"];
//Get linked service details
Dataset inputDataset = datasets.Single(dataset => dataset.Name == activity.Inputs.Single().Name);
Dataset outputDataset = datasets.Single(dataset => dataset.Name == activity.Outputs.Single().Name);
/*
DO STUFF
*/
logger.Write("End");
return new Dictionary<string, string>();
}
}
Update 1:
After finding this post and following the instructions on the github repo I was able to debug my activity.
It was erroring out here Dataset inputDataset = datasets.Single(dataset => dataset.Name == activity.Inputs.Single().Name); and I would have expected it to finish execution with an error, but in the debugger, it kept going and going to the same result until the pipeline timed out. Weird.
Removed the error but still the pipeline never finishes although the debugger does now :(.
Update 2:
Not sure that the data factory is using in any way my code from the custom activity. I made changes: nothing, I deleted the zip file with the code: nothing, just says the activity is running. Nothing seems to change even if the supposed code is no longer there. I am assuming it is cached somewhere.

Could you share me the ADF runId, we can take a look what happen there.
For your local test (#1), it is also strange for me. It should not hang there.
Btw, I think re-deploy pipeline will cancel the run and start new run with new properties. :)

Related

Microsoft.TeamFoundation.Build.WebApi Get Build Status Launched by PR policy

In our pipeline we programmatically create a pull request (PR). The branch being merged into has a policy on it that launches a build. This build takes a variable amount of time. I need to query the build status until it is complete (or long timeout) so that I can complete the PR, and clean up the temp branch.
I am trying to figure out how to get the build that was kicked off by the PR so that I can inspect the status by using Microsoft.TeamFoundation.Build.WebApi, but all overloads of BuildHttpClientBase.GetBuildAsync require a build Id which I don't have. I would like to avoid using the Azure Build REST API. Does anyone know how I might get the Build kicked off by the PR without the build ID using BuildHttpClientBase?
Unfortunately the documentation doesn't offer a lot of detail about functionality.
Answering the question you asked:
Finding a call that provides the single deterministic build id for a pull request doesn't seem to be very readily available.
As mentioned, you can use BuldHttpClient.GetBuildsAsync() to filter builds based on branch, repository, requesting user and reason.
Adding the BuildReason.PullRequest value in the request is probably redundant according to the branch you will need to pass.
var pr = new GitPullRequest(); // the PR you've received after creation
var requestedFor = pr.CreatedBy.DisplayName;
var repo = pr.Repository.Id.ToString();
var branch = $"refs/pull/{pr.PullRequestId}/merge";
var reason = BuildReason.PullRequest;
var buildClient = c.GetClient<BuildHttpClient>();
var blds = await buildClient.GetBuildsAsync("myProject",
branchName: branch,
repositoryId: repo,
requestedFor: requestedFor,
reasonFilter: reason,
repositoryType: "TfsGit");
In your question you mentioned wanting the build (singular) for the pull request, which implies that you only have one build definition acting as the policy gate. This method can return multiple Builds based on the policy configurations on your target branch. However, if that were your setup, it would seem logical that your question would then be asking for all those related builds for which you would wait to complete the PR.
I was looking into Policy Evaluations to see if there was a more straight forward way to get the id of the build being run via policy, but I haven't been able to format the request properly as per:
Evaluations are retrieved using an artifact ID which uniquely identifies the pull request. To generate an artifact ID for a pull request, use this template:
vstfs:///CodeReview/CodeReviewId/{projectId}/{pullRequestId}
Even using the value that is returned in the artifactId field on the PR using the GetById method results in a Doesn't exist or Don't have access response, so if someone else knows how to use this method and if it gives exact build ids being evaluated for the policy configurations, I'd be glad to hear it.
An alternative to get what you actually desire
It sounds like the only use you have for the branch policy is to run a "gate build" before completing the merge.
Why not create the PR with autocomplete.
Name - autoCompleteSetBy
Type - IdentityRef
Description - If set, auto-complete is enabled for this pull request and this is the identity that enabled it.
var me = new IdentityRef(); // you obviously need to populate this with real values
var prClient = connection.GetClient<GitHttpClient>();
await prClient.CreatePullRequestAsync(new GitPullRequest()
{
CreatedBy = me,
AutoCompleteSetBy = me,
Commits = new GitCommitRef[0],
SourceRefName = "feature/myFeature",
TargetRefName = "master",
Title = "Some good title for my PR"
},
"myBestRepository",
true);

Azure function CosmosDbTrigger (Start from the beginning option)

I have a azure function with cosmos db trigger which makes some calculations and write results to db. If something goes wrong i want to have a possibility to start from the first item or specific item make calculations again. Is it possible? Thanks
public static void Run([CosmosDBTrigger(
databaseName: "db",
collectionName: "collection",
ConnectionStringSetting = "DocDbConnStr",
CreateLeaseCollectionIfNotExists = true,
LeaseCollectionName = "leases")]IReadOnlyList<Document> input, TraceWriter log)
{
...
}
Right now, the StartFromBeginning option is not exposed to the Cosmos DB Trigger. The default behavior is to start receiving changes from the moment the Function starts running, leases/checkpoints will be generated in case the Host/Runtime shutsdown so when the Host/Runtime is back up it will pickup from the last checkpointed item.
The Trigger does not implement dead-lettering or error handling as it might generate infinite-loops / unexpected billing / multiple processing of the same batch if the error is not related to the batch itself (for example, you process the documents and then send an email and the email fails, the entire batch would be re-processed for an error not related to the feed itself), so we recommend users to implement their own try/catch or error handling logic inside the Function's code. It's the same approach as the Event Hub Trigger.
That being said, we are in the process of exposing several new options on the Trigger and there is a contributor working on an advanced retrying mechanism.
As #Matias Quaranta and #Pankaj Rawat say in the comments, the accept answer is old and is no longer true. You can use StartFromTheBeginning as a C# attribute within your azure function's parameter list like so:
[FunctionName(nameof(MyAzureFunction))]
public async Task RunAsync([CosmosDBTrigger(
databaseName: "myCosmosDbName",
collectionName: "myCollectionName",
ConnectionStringSetting = "cosmosConnectionString",
LeaseCollectionName = "leases",
CreateLeaseCollectionIfNotExists = true,
MaxItemsPerInvocation = 1000,
StartFromBeginning = true)]IReadOnlyList<Document> documents)
{
....
}
Please change the accepted answer.
The current offsets (positions in Cosmos DB change feed) are managed by clients, Azure Functions runtime in this case.
Functions store the offsets in lease collection (it's called leases in your example).
To restart from a specific item, you would have to make a snapshot of documents in leases collection at some point, and then restore your current collection to that snapshot when needed.
I am not familiar with a tool that automates that for you, other than generic tools working with Cosmos DB collections.
Check startFromBeginning option available in Function v2. Unfortunately, I'm still using V1 and not able to verify.
When set, it tells the Trigger to start reading changes from the beginning of the history of the collection instead of the current time. This only works the first time the Trigger starts, as in subsequent runs, the checkpoints are already stored. Setting this to true when there are leases already created has no effect.

Optimistic concurrency despite lock

I have multiple threads running a batch job. When each thread finishes it calls this method of mine:
private static readonly Object lockVar = new Object();
public void UserIsDone(int batchId, int userId)
{
//Get the batch user
var batchUser = context.ScheduledUsersBatchUsers.SingleOrDefault(x => x.User.Id == userId && x.Batch.Id == batchId);
if (batchUser != null)
{
lock (lockVar)
{
context.ScheduledUsersBatchUsers.Remove(batchUser);
context.SaveChanges();
//Try to get the batch with the assumption it has no users left. If we do get the batch back, it means there are no users left.
var dbBatch = context.ScheduledUsersBatches.SingleOrDefault(x => x.Id == batchId && !x.Users.Any());
//So this must have been the last user, the batch is empty, so we fetch it and remove it.
if (dbBatch != null)
{
context.ScheduledUsersBatches.Remove(dbBatch);
context.SaveChanges();
}
}
}
}
What this method does is very simple, it looks up the "BatchUser" to remove him from the queue, which it does. That part works swell.
However, after removing the user I want to check if that was the last user in the whole batch. But since this is multithreaded a race condition can happen.
So I put the removing of the batch user within a lock, after I remove the user, I check if the batch has no more batch users.
But here is my problem... even tho I have a lock, and the query to get the "dbBatch" clearly requires it to have no users to return the object... even so, I sometimes get it back with users like so:
When I do get that, I also get the following error on SaveChanges()
However, at other times I get the dbBatch object back correctly with no children, like so:
And when I do, it all works great, no exceptions.
With debugger I can catch the error by setting a breakpoint on the lock statement (see screenshot one). Then all threads get to the lock (while one goes in). Then I always get the error.
If I only have a breakpoint inside the if-statement it's more random.
With the lock in place, I don't see how this happens.
Update
I Ninject my context, and this is my ninject code
kernel.Bind<MyContext>()
.To<MyContext>()
.InRequestScope()
.WithConstructorArgument("connectionStringOrName", "MyConnection");
kernel.Bind<DbContext>().ToMethod(context => kernel.Get<MyContext>()).InRequestScope();
Update 2
I also tried this solution https://msdn.microsoft.com/en-us/data/jj592904.aspx
But strangely I don't get a DbUpdateConcurrencyException but rather I get a DbUpdateException that has an InnerException that is OptimisticConcurrencyException.
But neither DbUpdateException or OptimisticConcurrencyException contains a Entries property so I can't do ex.Entries.Single().Reload();
I'm also adding the exception in text form here
Here in text also, The outer exception of type DbUpdateException: {"An error occurred while saving entities that do not expose foreign key properties for their relationships. The EntityEntries property will return null because a single entity cannot be identified as the source of the exception. Handling of exceptions while saving can be made easier by exposing foreign key properties in your entity types. See the InnerException for details."}
The InnerException of type OptimisticConcurrencyException: {"Store update, insert, or delete statement affected an unexpected number of rows (0). Entities may have been modified or deleted since entities were loaded. See http://go.microsoft.com/fwlink/?LinkId=472540 for information on understanding and handling optimistic concurrency exceptions."}

Azure table entity existence/synchronisation

I'm using an azure table query to retrieve all error entities assigned to a user.
Afther that I change a property of the entity to state that the entity is in processing mode.
After I have processed the entity I remove the entity from the table.
When I do parallel tests it can happen that during the query, an entity was already processed and deleted by another thread. So I get the error 404 ResourceNotFound when I want to Replace the entity.
Is there a way to test, if the entity was changed outside of the thread or if it still exists? Is it better to catch error 404 and ignore it or should I query for the entity again (seems all not right for me)?
TableQuery<ErrorObjectTableEntity> query = new TableQuery<ErrorObjectTableEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, user));
List<ErrorObjectTableEntity> queryResult = table.ExecuteQuery(query).OrderBy(x => x.action).ToList();
foreach (ErrorObjectTableEntity entity in queryResult)
{
entity.inProcess = true;
try
{
TableOperation updateOperation = TableOperation.Replace(entity);
table.Execute(updateOperation);
}
catch
{
//..some logging here
//catch error 404?
}
//do some action
try
{
TableOperation deleteOperation = TableOperation.Delete(entity);
table.Execute(deleteOperation);
}
catch{...}
}
There are a couple of issues here as far as best practice. Your code as written could simply ignore the exception assuming another worker removed it but this could end up masking other classes of errors. One solution would be to use Queues to insert messages per user query, and then have various workers retrieve a message and process the query for a specific user. This way if a node goes down the app would absorb the fault and continue on. Additionally, this would keep your workers from duplicating work which would optimize the entire application. Lastly, if you don't care about the state of the entity and the keys are predictable you can use the Merge semantic to simply update a given property of an Entity without replacing the entire thing.
You should just catch the 404 error. Although they're represented as exceptions in .NET, HTTP 4xx error codes are more informational than exceptional. (5xx error codes are exceptional.)
Even if you checked that the entity existed before doing the replace, you would still need to catch the NotFound error in case it had been deleted between the check and the replace call. So you might as well skip the check.

MVC 3 EF Code-first to webhost database trouble

Im fairly new to ASP.NET MVC 3, and to coding in general really.
I have a very very small application i want to upload to my webhosting domain.
I am using entity framework, and it works fine on my local machine.
I've entered a new connection string to use my remote database instead however it dosen't really work, first of all i have 1 single MSSQL database, which cannot be de dropped and recreated, so i cannot use that strategy in my initializer, i tried to supply null in the strategy, but to no avail, my tables simply does not get created in my database and thats the problem, i don't know how i am to do that with entity framework.
When i run the application, it tries to select the data from the database, that part works fine, i just dont know how to be able to create those tabes in my database through codefirst.
I could probaly get it to work through manually recreating the tables, but i want to know the solution through codefirst.
This is my initializer class
public class EntityInit : DropCreateDatabaseIfModelChanges<NewsContext>
{
private NewsContext _db = new NewsContext();
protected override void Seed(NewsContext context)
{
new List<News>
{
new News{ Author="Michael Brandt", Title="Test News 1 ", NewsBody="Bblablabalblaaaaa1" },
new News{ Author="Michael Brandt", Title="Test News 2 ", NewsBody="Bblablabalblaaaaa2" },
new News{ Author="Michael Brandt", Title="Test News 3 ", NewsBody="Bblablabalblaaaaa3" },
new News{ Author="Michael Brandt", Title="Test News 4 ", NewsBody="Bblablabalblaaaaa4" },
}.ForEach(a => context.News.Add(a));
base.Seed(context);
}
}
As i said, im really new to all this, so excuse me, if im lacking to provide the proper information you need to answer my question, just me know and i will answer it
Initialization strategies do not support upgrade strategies at the moment.
Initialization strategies should be used to initialise a new database. all subsequent changes should be done using scripts at the moment.
the best practice as we speak is to modify the database with a script, and then adjust by hand the code to reflect this change.
in future releases, upgrade / migration strategies will be available.
try to execute the scripts statement by statement from a custom IDatabaseInitializer
then from this you can read the database version in the db and apply the missing scripts to your database. simply store a db version in a table. then level up with change scripts.
public class Initializer : IDatabaseInitializer<MyContext>
{
public void InitializeDatabase(MyContext context)
{
if (!context.Database.Exists() || !context.Database.CompatibleWithModel(false))
{
context.Database.Delete();
context.Database.Create();
var jobInstanceStateList = EnumExtensions.ConvertEnumToDictionary<JobInstanceStateEnum>().ToList();
jobInstanceStateList.ForEach(kvp => context.JobInstanceStateLookup.Add(
new JobInstanceStateLookup()
{
JobInstanceStateLookupId = kvp.Value,
Value = kvp.Key
}));
context.SaveChanges();
}
}
}
Have you tried to use the CreateDatabaseOnlyIfNotExists
– Every time the context is initialized, database will be recreated if it does not exist.
The database initializer can be set using the SetInitializer method of the Database class.If nothing is specified it will use the CreateDatabaseOnlyIfNotExists class to initialize the database.
Database.SetInitializer(null);
-
Database.SetInitializer<NewsContext>(new CreateDatabaseOnlyIfNotExists<NewsContext>());
I'm not sure if this is the exact syntax as I have not written this in a while. But it should be very similar.
If you are using a very small application, you maybe could go for SQL CE 4.0.
The bin-deployment should allow you to run SQL CE 4.0 even if your provider doesn't have the binaries installed for it. You can read more here.
That we you can actually use whatever initializer you want, since you now don't have the problem of not being able to drop databases and delete tables.
could this be of any help?

Resources