Parallel foreach ConcurrentDictionary<string, string> add - c#-4.0

I have entries like in a phone book: name + address.
The source is on a web site, the count is over 1K records.
The question is:
How do I use ConcurrentDictionary with Parallel.ForEach?
I might as well also ask which will perform better:
ConcurrentDictionary & Parallel.ForEach
vs
Dictionary & foreach
The name is the key, so it must not have duplicates, and if I understood correctly, ConcurrentDictionary has a built-in method (TryAdd) that adds an entry only if the key does not already exist.
So the problem of rejecting duplicate keys is already taken care of, and from that point of view the balance tips towards ConcurrentDictionary rather than a standard, sequential Dictionary.
So how do I take names and addresses from any given data source and load them into a ConcurrentDictionary via Parallel.ForEach?

the count is over 1K records.
How much over 1K? Because 1K records would be added in the blink of an eye, without any need for parallelization.
Additionally, if you're fetching the data over the network, that cost will vastly dwarf the cost of adding to a dictionary. So unless you can parallelize fetching the data, there's going to be no point in making the code more complicated to add the data to the dictionary in parallel.

This is quite an old question, but this might help someone:
If you are trying to chunk through the ConcurrentDictionary and do some processing:
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Collections.Concurrent;
namespace ConcurrenyTests
{
    public class ConcurrentExample
    {
        ConcurrentExample()
        {
            ConcurrentDictionary<string, string> ConcurrentPairs = new ConcurrentDictionary<string, string>();
            Parallel.ForEach(ConcurrentPairs, (KeyValuePair<string, string> pair) =>
            {
                // Do Stuff with
                string key = pair.Key;
                string value = pair.Value;
            });
        }
    }
}
I don't think you can use Parallel.ForEach to insert into a new dictionary unless you already have a collection of the same length to iterate over, e.g. a list of the URLs of text documents you want to download and insert into the dictionary. If that is the case, you could use something along the lines of:
using System.Collections.Concurrent;
using System.Net;
using System.Threading.Tasks;
namespace ConcurrenyTests
{
    public class ConcurrentExample
    {
        ConcurrentExample()
        {
            ConcurrentDictionary<string, string> ConcurrentPairs = new ConcurrentDictionary<string, string>();
            ConcurrentBag<string> WebAddresses = new ConcurrentBag<string>();
            Parallel.ForEach(WebAddresses, new ParallelOptions { MaxDegreeOfParallelism = 4 }, (string webAddress) =>
            {
                // Fetch from webaddress (assumption: WebClient is just one way to do this)
                string webText;
                using (var client = new WebClient())
                {
                    webText = client.DownloadString(webAddress);
                }
                // TryAdd: adds the entry only if the key is not already present
                ConcurrentPairs.TryAdd(webAddress, webText);
                // AddOrUpdate: adds the entry, or overwrites the value if the key exists
                ConcurrentPairs.AddOrUpdate(webAddress, webText, (string key, string oldValue) => webText);
            });
        }
    }
}
If you are fetching from a web server, you may want to raise or lower MaxDegreeOfParallelism so that you do not saturate your bandwidth.
Parallel.ForEach: https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netcore-2.2
ParallelOptions: https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.paralleloptions?view=netcore-2.2

Related

Get the Scan Operator when releasing documents

When releasing documents the scan operator should get logged to a file. I know this is a kofax system variable but how do I get it from the ReleaseData object?
Maybe this value is held by the Values collection? What is the key then? I would try to access it using
string scanOperator = documentData.Values["?scanOperator?"].Value;
Kofax's weird naming convention strikes again - during setup, said items are referred to as BatchVariableNames. However, during release they are KFX_REL_VARIABLEs (an enum named KfxLinkSourceType).
Here's how you can add all available items during setup:
foreach (var item in setupData.BatchVariableNames)
{
    setupData.Links.Add(item, KfxLinkSourceType.KFX_REL_VARIABLE, item);
}
The following sample iterates over the DocumentData.Values collection, storing each BatchVariable in a Dictionary<string, string> named BatchVariables.
foreach (Value v in DocumentData.Values)
{
    switch (v.SourceType)
    {
        case KfxLinkSourceType.KFX_REL_VARIABLE:
            BatchVariables.Add(v.SourceName, v.Value);
            break;
    }
}
You can then access any of those variables by key. For example, BatchVariables["Scan Operator's User ID"] yields the scan user's domain and name.

How do I configure Hazelcast read-through Map when only part of the nodes are able to populate the Map data?

Let's say I have two types of Hazelcast nodes running on cluster:
"Leader" nodes – these are able to load and populate Hazelcast map M. Leaders will also update values in M from time to time (based on external resource).
"Follower" nodes – these will need to read from M
My intent is for Follower nodes to trigger the loading of missing elements into M (the loading itself thus needs to be done on the Leader side).
Roughly, the steps made to get an element from map could look like this:
IMap m = hazelcastInstance.getMap("M");
if (!m.containsKey(k)) {
    if (iAmLeader()) {
        Object fresh = loadByKey(k); // loading from external resource
        return m.put(k, fresh);
    } else {
        makeSomeLeaderPopulateValueForKey(k);
    }
}
return m.get(k);
What approach could you suggest?
Notes
I want Followers to act as nodes, not just clients, because there are going to be far more Follower instances than Leaders and I would like them to participate in load distribution.
I could just build another level of service, that would run only on Leader nodes and provide interface to populate map with requested keys. But that would mean adding extra layer of communication and configuration, and I was hoping that the kind of requirements stated above could be solved within single Hazelcast cluster.
I think I may have found an answer in the form of MapLoader (EDIT: since originally posting, I have confirmed this is indeed the way to do this).
final Config config = new Config();
config.getMapConfig("MY_MAP_NAME").setMapStoreConfig(
    new MapStoreConfig().setImplementation(new MapLoader<KeyType, ValueType>() {
        @Override
        public ValueType load(final KeyType key) {
            // When a client asks for the data for a key of type
            // KeyType that isn't already loaded,
            // this function is invoked and gives you a chance
            // to load it and return it.
            ValueType rv = ...;
            return rv;
        }
        @Override
        public Map<KeyType, ValueType> loadAll(
                final Collection<KeyType> keys) {
            // Similar to MapLoader#load(KeyType), except this is
            // a batched version of it for performance gains.
            // This gets called on first access to the cache,
            // where MapLoader#loadAllKeys() is called to get
            // the keys parameter for this function.
            Map<KeyType, ValueType> rv = new HashMap<>();
            keys.forEach((key) -> {
                rv.put(key, /*figure out what key means*/);
            });
            return rv;
        }
        @Override
        public Set<KeyType> loadAllKeys() {
            // Prepopulate all the keys. My understanding is that
            // this is an initialization step, to give you a chance
            // to load data on startup so an initial set of data
            // will be available to anyone using the cache. Any keys
            // returned here are sent to MapLoader#loadAll(Collection).
            Set<KeyType> rv = new HashSet<>();
            // Figure out what keys need to be in the return value
            // to load a key into the cache at first access to this map,
            // named "MY_MAP_NAME" in this example.
            return rv;
        }
    }));
config.getGroupConfig().setName("MY_INSTANCE_NAME").setPassword("my_password");
final HazelcastInstance hazelcast = Hazelcast
    .getOrCreateHazelcastInstance(config);
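To make the load-on-miss idea easier to see without a running cluster, here is a minimal plain-Java sketch of the same read-through pattern. It does not use Hazelcast at all: ReadThroughCache, its loader function and loadCount() are hypothetical names invented for illustration, and a local ConcurrentHashMap stands in for the distributed map. It only shows that a loader runs on the first request for a key and is skipped afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a map with a loader attached; not a Hazelcast class.
class ReadThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // On a miss the loader is invoked and its result is cached;
    // on a hit the cached value is returned and the loader is skipped.
    V get(K key) {
        return store.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Number of times the loader was actually invoked.
    int loadCount() {
        return loads.get();
    }
}
```

In this sketch the loader function plays roughly the role that MapLoader#load plays in the configuration above; where the load actually executes in a real cluster is decided by Hazelcast, not by anything shown here.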

Document not available in query direct after store

I'm trying to store a "Role" object and then get a list of Roles, as shown here:
public class Role
{
    public Guid RoleId { get; set; }
    public string RoleName { get; set; }
    public string RoleDescription { get; set; }
}
// Store function:
private void StoreRole(Role role)
{
    using (var docSession = docStore.OpenSession())
    {
        docSession.Store(role);
        docSession.SaveChanges();
    }
}
// Then it returns, and another function calls this:
public List<Role> GetRoles()
{
    using (var docSession = docStore.OpenSession())
    {
        var Roles = from roles in docSession.Query<Role>() select roles;
        return Roles.ToList();
    }
}
However, GetRoles is missing the last inserted document. If I wait 200 ms and then call the function, the item is there.
So the query is out of sync with the store?
How can I solve this, or alternatively, how can I know when the result is available in the document store for querying?
I've used transactions, but cannot figure this out. Update and delete work fine, but after an insert I need to delay my 'List' call.
You are treating RavenDB as if it is a relational database, and it isn't. Load and Store are ACID operations in RavenDB, Query is not. Indexes (necessary for queries) are updated asynchronously, and in fact, temporary indexes may have to be built from scratch when you do a session.Query<T>() without a durable index specified. So, if you are trying to query for information you JUST stored, or if you are doing the FIRST query that requires a temporary index to be created, you probably won't get the data you expect.
There are ways to customize your query to wait for non-stale results, but you shouldn't lean on them too much, because needing them is indicative of a bad design. It is better to find a way to do the same thing that embraces eventual consistency: either change your model so you get consistency via Load/Store (perhaps you could have one document that defines ALL of the roles in a list?), or change the application flow so you don't need to Store and then immediately Query.
An additional way of solving this is to query the index with WaitForNonStaleResultsAsOfLastWrite() turned on inside the save function. That way when the save is completed the index will be updated to at least include the change you just made.
You can read more about this here

Extract paging from IQueryable

I'm using a function that allows query composition from the Web UI, and I would like to implement paging that works with data-bound controls such as ObjectDataSource, GridView, etc.:
public class MyClass<TEntity> where TEntity : class
{
    FakeEntities xxx = new FakeEntities();
    public IEnumerable<TEntity> Get(Func<IQueryable<TEntity>, IQueryable<TEntity>> queryExpression)
    {
        var query = xxx.Set<TEntity>();
        return queryExpression(query).ToList();
    }
    public int Count()
    {
        // What Can I return?
    }
}
// **** USAGE ****
MyClass<User> u = new MyClass<User>();
var all = u.Get(p => p.Where(z => z.Account == "Smith").OrderBy(order => order.IdOther).Skip(1).Take(2));
The above query uses Take and Skip, so how can I get the real count of my entities? Obviously I must return the query count without modifying the filter expression.
I found this solution: Get count of an IQueryable<T>
However, I get a TargetInvocationException with the inner message {"This method supports the LINQ to Entities infrastructure and is not intended to be used directly from your code."}
I know my request may be unusual, because best practice is to move "presentation needs" into a wrapper class, and that is what I'll do. So I no longer need to get the entity count in my business logic class.
That's a UI concern only.
Thank you all the same.

Orchard CMS: Do I have to add a new layer for each page when the specific content for each page is spread in different columns?

Let's say I want a different main image for each page, situated above the page title. I also need to place page-specific images in the left bar and page-specific text in the right bar. In the right and left bars, I also want layer-specific content.
I can't see how I can achieve this without creating a layer for each and every page in the site, but then I end up with a glut of layers that only serve one page which seems too complex.
What am I missing?
If there is a way of doing this using Content parts, it would be great if you can point me at tutorials, blogs, videos to help get my head round the issue.
NOTE:
Sitefinity does this sort of thing well, but I find Orchard much simpler for creating modules, and it is built on MVC, which I find much easier.
Orchard is free, I understand (and appreciate) that. Just hoping that as the product evolves this kind of thing will be easier?
In other words, I'm hoping for the best of all worlds...
There is a feature in the works for 1.5 to make that easier, but in the meantime, you can already get this to work quite easily with just a little bit of code. You should first add the fields that you need to your content type. Then, you are going to send them to top-level layout zones using placement. Out of the box, placement only targets local content zones, but this is what we can work around with a bit of code by Pete Hurst, a.k.a. randompete. Here's the code:
ZoneProxyBehavior.cs:
=====================
using System;
using System.Collections.Generic;
using System.Linq;
using ClaySharp;
using ClaySharp.Behaviors;
using Orchard.Environment.Extensions;
namespace Downplay.Origami.ZoneProxy.Shapes {
    [OrchardFeature("Downplay.Origami.ZoneProxy")]
    public class ZoneProxyBehavior : ClayBehavior {
        public IDictionary<string, Func<dynamic>> Proxies { get; set; }
        public ZoneProxyBehavior(IDictionary<string, Func<dynamic>> proxies) {
            Proxies = proxies;
        }
        public override object GetMember(Func<object> proceed, object self, string name) {
            if (name == "Zones") {
                return ClayActivator.CreateInstance(new IClayBehavior[] {
                    new InterfaceProxyBehavior(),
                    new ZonesProxyBehavior(()=>proceed(), Proxies, self)
                });
            }
            // Otherwise proceed to other behaviours, including the original ZoneHoldingBehavior
            return proceed();
        }
        public class ZonesProxyBehavior : ClayBehavior {
            private readonly Func<dynamic> _zonesActivator;
            private readonly IDictionary<string, Func<dynamic>> _proxies;
            private object _parent;
            public ZonesProxyBehavior(Func<dynamic> zonesActivator, IDictionary<string, Func<dynamic>> proxies, object self) {
                _zonesActivator = zonesActivator;
                _proxies = proxies;
                _parent = self;
            }
            public override object GetIndex(Func<object> proceed, object self, IEnumerable<object> keys) {
                var keyList = keys.ToList();
                var count = keyList.Count();
                if (count == 1) {
                    // Here's the new bit
                    var key = System.Convert.ToString(keyList.Single());
                    // Check for the proxy symbol
                    if (key.Contains("#")) {
                        // Find the proxy!
                        var split = key.Split('#');
                        // Access the proxy shape
                        return _proxies[split[0]]()
                            // Find the right zone on it
                            .Zones[split[1]];
                    }
                    // Otherwise, defer to the ZonesBehavior activator, which we made available
                    // This will always return a ZoneOnDemandBehavior for the local shape
                    return _zonesActivator()[key];
                }
                return proceed();
            }
            public override object GetMember(Func<object> proceed, object self, string name) {
                // This is rarely called (shape.Zones.ZoneName - normally you'd just use shape.ZoneName)
                // But we can handle it easily also by deference to the ZonesBehavior activator
                return _zonesActivator()[name];
            }
        }
    }
}
And:
ZoneShapes.cs:
==============
using System;
using System.Collections.Generic;
using Orchard.DisplayManagement.Descriptors;
using Orchard;
using Orchard.Environment.Extensions;
namespace Downplay.Origami.ZoneProxy.Shapes {
    [OrchardFeature("Downplay.Origami.ZoneProxy")]
    public class ZoneShapes : IShapeTableProvider {
        private readonly IWorkContextAccessor _workContextAccessor;
        public ZoneShapes(IWorkContextAccessor workContextAccessor) {
            _workContextAccessor = workContextAccessor;
        }
        public void Discover(ShapeTableBuilder builder) {
            builder.Describe("Content")
                .OnCreating(creating => creating.Behaviors.Add(
                    new ZoneProxyBehavior(
                        new Dictionary<string, Func<dynamic>> { { "Layout", () => _workContextAccessor.GetContext().Layout } })));
        }
    }
}
With this, you will be able to address top-level layout zones using Layout# in front of the zone name you want to address, for example Layout#BeforeContent:1.
ADDENDUM:
I have used Bertrand Le Roy's code (make that Pete Hurst's code) and created a module with it, then added 3 content parts that are all copies of the bodypart in Core/Common.
In the same module I have created a ContentType and added my three custom ContentParts to it, plus autoroute and bodypart and tags, etc, everything to make it just like the Orchard Pages ContentType, only with more Parts, each with their own shape.
I have called my ContentType a View.
So you can now create pages for your site using Views. You then use the ZoneProxy to shunt the custom ContentPart shapes (Parts_MainImage, Parts_RightContent, Parts_LeftContent) into whatever Zones I need them in. And job done.
Not quite Sitefinity, but as Bill would say, Good enough.
The reason you have to create your own ContentParts that copy BodyPart, instead of just using a TextField, is that all TextFields have the same Shape, so if you use ZoneProxy to place them, they all end up in the same Zone. In other words, you build the custom ContentParts JUST so that you get the Shapes, because it is the shapes that you place with the ZoneProxy code.
Once I have tested this, I will upload it as a module onto the Orchard Gallery. It will be called Wingspan.Views.
I am away on holiday until 12th June 2012, so don't expect it before the end of the month.
But essentially, with Pete Hurst's code, that is how I have solved my problem.
EDIT:
I could have got the same results by just creating the three content parts (LeftContent, RightContent, MainImage, etc), or whatever content parts are needed, and then adding them to the Page content type.
That way, you only add what is needed.
However, there is some advantage in having a standard ContentType that can be just used out of the box.
Using placement (the Placement.info file) you could use the MainImage content part for a footer, for example. So the names should probably be part 1, part 2, etc.
None of this would be necessary if there were a way of giving the shape produced by the TextField a custom name. That way, you could add as many TextFields as you liked, and then place them using the ZoneProxy code. I'm not sure if this would be possible.

Resources