Is it possible to seed Guid.NewGuid()? Eg. from GetHashCode()? [duplicate] - c#-4.0

In our application we are creating Xml files with an attribute that has a Guid value. This value needed to be consistent between file upgrades. So even if everything else in the file changes, the guid value for the attribute should remain the same.
One obvious solution was to create a static dictionary with the filename and the Guids to be used for them. Then whenever we generate the file, we look up the dictionary for the filename and use the corresponding guid. But this is not feasible because we might scale to 100's of files and didnt want to maintain big list of guids.
So another approach was to make the Guid the same based on the path of the file. Since our file paths and application directory structure are unique, the Guid should be unique for that path. So each time we run an upgrade, the file gets the same guid based on its path. I found one cool way to generate such 'Deterministic Guids' (Thanks Elton Stoneman). It basically does this:
private Guid GetDeterministicGuid(string input)
{
//use MD5 hash to get a 16-byte hash of the string:
MD5CryptoServiceProvider provider = new MD5CryptoServiceProvider();
byte[] inputBytes = Encoding.Default.GetBytes(input);
byte[] hashBytes = provider.ComputeHash(inputBytes);
//generate a guid from the hash:
Guid hashGuid = new Guid(hashBytes);
return hashGuid;
}
So given a string, the Guid will always be the same.
Are there any other approaches or recommended ways to doing this? What are the pros or cons of that method?

As mentioned by #bacar, RFC 4122 ยง4.3 defines a way to create a name-based UUID. The advantage of doing this (over just using a MD5 hash) is that these are guaranteed not to collide with non-named-based UUIDs, and have a very (very) small possibility of collision with other name-based UUIDs.
There's no native support in the .NET Framework for creating these, but I posted code on GitHub that implements the algorithm. It can be used as follows:
Guid guid = GuidUtility.Create(GuidUtility.UrlNamespace, filePath);
To reduce the risk of collisions with other GUIDs even further, you could create a private GUID to use as the namespace ID (instead of using the URL namespace ID defined in the RFC).

This will convert any string into a Guid without having to import an outside assembly.
public static Guid ToGuid(string src)
{
byte[] stringbytes = Encoding.UTF8.GetBytes(src);
byte[] hashedBytes = new System.Security.Cryptography
.SHA1CryptoServiceProvider()
.ComputeHash(stringbytes);
Array.Resize(ref hashedBytes, 16);
return new Guid(hashedBytes);
}
There are much better ways to generate a unique Guid but this is a way to consistently upgrading a string data key to a Guid data key.

As Rob mentions, your method doesn't generate a UUID, it generates a hash that looks like a UUID.
The RFC 4122 on UUIDs specifically allows for deterministic (name-based) UUIDs - Versions 3 and 5 use md5 and SHA1(respectively). Most people are probably familiar with version 4, which is random. Wikipedia gives a good overview of the versions. (Note that the use of the word 'version' here seems to describe a 'type' of UUID - version 5 doesn't supercede version 4).
There seem to be a few libraries out there for generating version 3/5 UUIDs, including the python uuid module, boost.uuid (C++) and OSSP UUID. (I haven't looked for any .net ones)

You need to make a distinction between instances of the class Guid, and identifiers that are globally unique. A "deterministic guid" is actually a hash (as evidenced by your call to provider.ComputeHash). Hashes have a much higher chance of collisions (two different strings happening to produce the same hash) than Guid created via Guid.NewGuid.
So the problem with your approach is that you will have to be ok with the possibility that two different paths will produce the same GUID. If you need an identifier that's unique for any given path string, then the easiest thing to do is just use the string. If you need the string to be obscured from your users, encrypt it - you can use ROT13 or something more powerful...
Attempting to shoehorn something that isn't a pure GUID into the GUID datatype could lead to maintenance problems in future...

MD5 is weak, I believe you can do the same thing with SHA-1 and get better results.
BTW, just a personal opinion, dressing a md5 hash up as a GUID does not make it a good GUID. GUIDs by their very nature are non Deterministic. this feels like a cheat. Why not just call a spade a spade and just say its a string rendered hash of the input. you could do that by using this line, rather than the new guid line:
string stringHash = BitConverter.ToString(hashBytes)

Here's a very simple solution that should be good enough for things like unit/integration tests:
var rnd = new Random(1234); // Seeded random number (deterministic).
Console.WriteLine($"{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}-{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}-{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}-{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}-{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}{rnd.Next(0, 255):x2}");

Related

CRM 2011 Plugin - Does using early bound entities for attribute names cause memory issues?

In my plugin code, I use early bound entities (generated via the crmsvcutil). Within my code, I am using MemberExpression to retrieve the name of the property. For instance, if I want the full name of the user who initiated the plugin I do the following
SystemUser pluginExecutedBy = new SystemUser();
pluginExecutedBy = Common.RetrieveEntity(service
, SystemUser.EntityLogicalName
, new ColumnSet(new string[] {Common.GetPropertyName(() => pluginExecutedBy.FullName)})
, localContext.PluginExecutionContext.InitiatingUserId).ToEntity<SystemUser>();
The code for GetPropertyName is as follows
public static string GetPropertyName<T>(Expression<Func<T>> expression)
{
MemberExpression body = (MemberExpression)expression.Body;
return body.Member.Name.ToLower();
}
The code for RetrieveEntity is as follows
public static Entity RetrieveEntity(IOrganizationService xrmService, string entityName, ColumnSet columns, Guid entityId)
{
return (Entity)xrmService.Retrieve(entityName, entityId, columns);
}
My solution architect's comments:
Instead of writing the code like above, why not write it like this (hardcoding the name of the field - or use a struct).
SystemUser pluginExecutedBy = null;
pluginExecutedBy = Common.RetrieveEntity(service
, SystemUser.EntityLogicalName
, new ColumnSet(new string[] {"fullname"})
, localContext.PluginExecutionContext.InitiatingUserId).ToEntity<SystemUser>();
Reason:
Your code unnecessarily creates an object before it requires it (as you instantiate the object with the new keyword before the RetrieveEntity in order to use it with my GetProperty method) which is bad programming practice. In my code, I have never used the new keyword, but merely casting it and casting does not create a new object. Now, I am no expert in C# or .NET, but I like to read and try out different things. So, I looked up the Microsoft.Xrm.Sdk.dll and found that ToEntity within Sdk, actually did create a new Entity using the keyword new.
If the Common.Retrieve returns null, your code has unnecessarily allocated memory which will cause performance issues whereas mine would not?
A managed language like C# "manages the memory" for me, does it not?
Question
Is my code badly written? If so, why? If it is better - why is it? (I believe it is a lot more cleaner and even if a field name changes as long as as the early bound class file is regenerated, I do not have to re-write any code)
I agree that cast does not create a new object, but does my code unnecessarily create objects?
Is there a better way (a completely different third way) to write the code?
Note: I suggested using the GetPropertyName because, he was hard-coding attribute names all over his code and so in a different project which did not use early bound entities I used structs for attribute names - something like below. I did this 3 weeks into my new job with CRM 2011 but later on discovered the magic of MemberExpression. He was writing a massive cs file for each of the entity that he was using in his plugin and I told him he did not have to do any of this as he could just use my GetPropertyName method in his plugin and get all the fields required and that prompted this code review comments. Normally he does not do a code review.
public class ClientName
{
public struct EntityNameA
{
public const string LogicalName = "new_EntityNameA";
public struct Attributes
{
public const string Name = "new_name";
public const string Status = "new_status";
}
}
}
PS: Or is the question / time spent analyzing just not worth it?
Early Bound, Late Bound, MemberExpression, bla bla bla :)
I can understand the "philosophy", but looking at your code a giant alarm popup in my head:
public static Entity RetrieveEntity(IOrganizationService xrmService, string entityName, ColumnSet columns, Guid entityId)
{
return (Entity)xrmService.Retrieve(entityName, entityId, columns);
}
the Retrieve throws an exception if the record is not found.
About the other things, the GetPropertyName is ok, but are always choices, for example I try to use always late bound in plugins, maybe in a project I prefer to use early bound, often there is more than one way to resolve a problem.
Happy crm coding!
Although GetPropertyName is a quite a clever solution I don't like it, and that's entirely to do with readability. To me its far easier to understand what is going on with: new ColumnSet(new string[] {"fullname"}).
But that's pretty much personal preference, but its important to remember that your not just writing code for yourself you are writing it for your team, they should be able to easily understand the work you have produced.
As a side a hardcoded string probably performs better at runtime. I usually hardcode all my values, if the entity model in CRM changes I will have to revisit to make changes in any case. There's no difference between early and late bound in that situation.
I don't understand the point of this function,
public static Entity RetrieveEntity(IOrganizationService xrmService, string entityName, ColumnSet columns, Guid entityId)
{
return (Entity)xrmService.Retrieve(entityName, entityId, columns);
}
It doesn't do anything (apart from cast something that is already of that type).
1.Your code unnecessarily creates an object before it requires it (as you instantiate the object with the new keyword before the
RetrieveEntity in order to use it with my GetProperty method) which is
bad programming practice. In my code, I have never used the new
keyword, but merely casting it and casting does not create a new
object.
I believe this refers to; SystemUser pluginExecutedBy = new SystemUser(); I can see his/her point here, in this case new SystemUser() doesn't do much, but if the object you were instantiating did something resource intensive (load files, open DB connections) you might be doing something 'wasteful'. In this case I would be surprised if changing SystemUser pluginExecutedBy = null; actually yielded any significant performance gain.
2.If the Common.Retrieve returns null, your code has unnecessarily allocated memory which will cause performance issues
I would be surprised if that caused a performance issue, and anyway as Guido points out that function wont return null in any case.
Overall there is little about this code I strongly feel needs changing - but things can be always be better and its worth discussing (e.g. the point of code review), although it can be hard not to you shouldn't be precious about your code.
Personally I would go with hardcoded attribute names and dump the Common.RetrieveEntity function as it doesn't do anything.
pluginExecutedBy = service.Retrieve(SystemUser.EntityLogicalName, localContext.PluginExecutionContext.InitiatingUserId, new ColumnSet(new String[] {"fullname"} ));

How to compare 2 xsd schema files for equivalent functionality

I would like to compare 2 XSD schemas A and B to determine that all instance documents valid to schema A would also be valid to schema B. I hope to use this to prove that even though schema A and B are "different" they are effectively the same. Examples of differences this would not trigger would be Schema A uses types and Schema B declares all of it's elements inline.
I have found lots of people talking about "smart" diff type tools but these would claim the two files are different because they have different text but the resulting structure is the same. I found some references to XSOM but I'm not sure if that will help or not.
Any thoughts on how to proceed?
Membrane SOA Model - Java API for WSDL and XML Schema
package sample.schema;
import java.util.List;
import com.predic8.schema.Schema;
import com.predic8.schema.SchemaParser;
import com.predic8.schema.diff.SchemaDiffGenerator;
import com.predic8.soamodel.Difference;
public class CompareSchema {
public static void main(String[] args) {
compare();
}
private static void compare(){
SchemaParser parser = new SchemaParser();
Schema schema1 = parser.parse("resources/diff/1/common.xsd");
Schema schema2 = parser.parse("resources/diff/2/common.xsd");
SchemaDiffGenerator diffGen = new SchemaDiffGenerator(schema1, schema2);
List<Difference> lst = diffGen.compare();
for (Difference diff : lst) {
dumpDiff(diff, "");
}
}
private static void dumpDiff(Difference diff, String level) {
System.out.println(level + diff.getDescription());
for (Difference localDiff : diff.getDiffs()){
dumpDiff(localDiff, level + " ");
}
}
}
After executing you get the output shown in listing 2. It is a List of
differences between the two Schema documents.
ComplexType PersonType has changed: Sequence has changed:
Element id has changed:
The type of element id has changed from xsd:string to tns:IdentifierType.
http://www.service-repository.com/ offers an online XML Schema Version Comparator tool that displays a report of the differences between two XSD that appears to be produced from the Membrane SOA Model.
My approach to this was to canonicalize the representation of the XML Schema.
Unfortunately, I can also tell you that, unlike canonicalization of XML documents (used, as an example, to calculate a digital signature), it is not that simple or even standardized.
So basically, you have to transform both XML Schemas to a "canonical form" - whatever the tool you build or use thinks that form is, and then do the compare.
My approach was to create an XML Schema set (could be more than one file if you have more namespaces) for each root element I needed, since I found it easier to compare XSDs authored using the Russian Doll style, starting from the PSVI model.
I then used options such as auto matching substitution group members coupled with replacement of substitution groups with a choice; removal of "superfluous" XML Schema sequences, collapsing of single option choices or moving minOccurs/maxOccurs around for single item compositors, etc.
Depending on what your XSD-aware comparison tool's features are, or you settle to build, you might also have to rearrange particles under compositors such as xsd:choice or xsd:all; etc.
Anyway, what I learned after all of it was that it is extremely hard to build a tool that would work nice for all "cool" XSD features out there... One test case I remember fondly was to deal with various xsd:any content.
I do wonder though if things have changed since...

groovy domain objects in Db4O database

I'm using db4o with groovy (actually griffon). I'm saving dozen of objects into db4o objectSet and see that .yarv file size is about 11Mb. I've checked its content and found that it stores metaClass with all nested fields into every object. It's a waste of space.
Looking for the way to avoid storing of metaClass and therefore reduce the size of result .yarv file, since I'm going to use db4o to store millions of entities.
Should I try callConstructors(true) db4o configuration? Think it would help?
Any help would be highly appreciated.
As an alternative you can just store 'Groovy'-beans instances. Those are compiled down to regular Java-ish classes with no special Groovy specific code attached to them.
Just like this:
class Customer {
// properties
Integer id
String name
Address address
}
class Address{
String street;
}
def customer = new Customer(id:1, name:"Gromit", address:new Address(street:"Fun"))
I don't know groovy but based on your description every groovy object carries metadata and you want to skip storing these objects.
If that is the case installing a "null translator" (TNull class) will cause the "translated" objects to not be stored.
PS: Call Constructor configuration has no effect on what gets stored in the db; it only affects how objects are instantiated when reading from db.
Hope this helps

Code Contracts and Auto Generated Files

When I enabled code contracts on my WPF control project I ran into a problem with an auto generated file which was created at compile time (XamlNamespace.GeneratedInternalTypeHelper). Note, the generated file is called GeneratedInternalTypeHelper.g.cs and is not the same as the GeneratedInternalTypeHelper.g.i.cs which there are several obsolete blog posts about.
I'm not exactly sure what its purpose is, but I am assuming it is important for some internal reflection to resolve XAML. The problem is that it does not have code contracts, nor is the code contract system smart enough to recognize it as an auto generated file. This leads to a bunch of errors from the static checker.
I tried searching for a solution to this problem, but it seems like nobody is developing WPF controls and using code contracts. I did come across an interesting attribute, ContractVerificationAttribute, which takes a boolean value to set whether the assembly or class is to be verified. This allows you to decorate a class as not verified. Sadly the GeneratedInternalTypeHelper is regenerated with every compile, so it is not possible to exclude just this one class. The inverse scenario is possible though, decorate the assembly as not verified and then opt in for every class.
To mitigate the obvious hack I wanted to create a test that would at least verify that the exposed classes have code contract verification with a test like the following to ensure that own classes were at least being verified:
[Fact]
public void AllAssemblyTypesAreDecoratedWithContractVerificationTrue()
{
var assembly = typeof(someType).Assembly;
var exposedTypes = assembly.GetTypes().Where(t=>!string.IsNullOrWhiteSpace(t.Namespace) && t.Namespace.StartsWith("MyNamespace") && !t.Name.StartsWith("<>"));
var areAnyNotContractVerified = exposedTypes.Any(t =>
{
var verificationAttribute = t.GetCustomAttributes(typeof(ContractVerificationAttribute), true).OfType<ContractVerificationAttribute>();
return verificationAttribute.Any() && verificationAttribute.First().Value;
});
Assert.False(areAnyNotContractVerified);
}
As you can see it takes all classes in the controls assembly and finds the one from the company namespace which are not also auto generated anonymous types (<>WeirdClassName).
(I also need to exclude Resources and settings, but I hope you get the idea).
I'm not loving the solution since there are ways of avoiding contract verification, but currently it's the best I can come up with. If anyone has a better solution, please let me know.
So you can treat this class exactly like you would treat any other "3rd party" class or library. I'm sure certain assumptions would hold with the interaction with this generated class so at the interaction points, decorate your own code with Contract.Assume(result != null) or similar.
var result = new GennedClass().GetSomeValue();
Contract.Assume(result != null);
What this does is translate into an assertion that is checked at run time, but it allows the static analyzer to reason about the rest of the code that you do control.

Are string constants overrated?

It's easy to lose track of odd numbers like 0, 1, or 5. I used to be very strict about this when I wrote low-level C code. As I work more with all the string literals involved with XML and SQL, I find myself often breaking the rule of embedding constants in code, at least when it comes to string literals. (I'm still good about numeric constants.)
Strings aren't the same as numbers. It feels tedious and a little silly to create a compile-time constant that has the same name as its value (E.g. const string NameField = "Name";), and although the repetition of the same string literal in many locations seems risky, there's little chance of a typo thanks to copying and pasting, and when I refactor I'm usually doing a global search that involves changing more than just the name of the thing, like how it's treated functionally in relation to the things around it.
So, let's say you don't have a good XML serializer (or aren't in the mood to set one up). Which of these would you personally use (if you weren't trying to bow to peer pressure in some code review):
static void Main(string[] args)
{
// ...other code...
XmlNode node = ...;
Console.WriteLine(node["Name"].InnerText);
Console.WriteLine(node["Color"].InnerText);
Console.WriteLine(node["Taste"].InnerText);
// ...other code...
}
or:
class Fruit
{
private readonly XmlNode xml_node;
public Fruit(XmlNode xml_node)
{
this.xml_node = xml_node;
}
public string Name
{ get { return xml_node["Name"].InnerText; } }
public string Color
{ get { return xml_node["Color"].InnerText; } }
public string Taste
{ get { return xml_node["Taste"].InnerText; } }
}
static void Main(string[] args)
{
// ...other code...
XmlNode node = ...;
Fruit fruit_node = new Fruit(node);
Console.WriteLine(fruit_node.Name);
Console.WriteLine(fruit_node.Color);
Console.WriteLine(fruit_node.Taste);
// ...other code...
}
A defined constant is easier to refactor. If "Name" ends up being used three times and you change it to "FullName", changing the constant is one change instead of three.
For something like that it depends on how often the constant is used. If it's just in one place as per your example, then hard-coding is fine. If it's used in many different places, definitely use a constant. One typo could lead to hours of debugging if you're not careful, because your compiler isn't going to notice that you typed "Tsate" instead of "Taste", while it WILL notice that you typed fruit_node.Tsate instead of fruit_node.Taste.
Edit:
I see now that you mentioned copying and pasting, but if you're doing that you may also be losing the time you save by not creating a constant in the first place. With intellisense and auto-completion, you could have the constant out there in a few keystrokes, instead of going through the trouble of copy/paste.
As you probably guessed. The answer is: it depends on the context.
It depends on what the example code is part of. If it's just part of a small throw away system then hard coding the constants may be acceptable.
If it's part of a large, complex system and the constants will be used in mulitple files, I'd be more drawn to the second option.
As in many matters of programming, this is a matter of taste. The "laws" of proper programming were created from experience -- many people have been burned by global variables causing namespace or clarity problems, so Global Variables Are Evil. Many have used magic numbers, only to later discover that the number was wrong or needed changing. Text search is ill-suited to changing these values, so Constants In Code Are Evil.
But both are permitted, because sometimes they aren't evil. You need to make the decision yourself -- which leads to clearer code? Which is going to be better for maintainers? Does the reasoning behind the original rule apply to my situation? If I had to read or maintain this code later, how would I rather that it were written?
There is no absolute law of good coding style, because no two programmers' minds works exactly alike. The rule is to write the clearest, cleanest code that you can.
Personally, I'd load the fruit from the XML file in advance - something like:
public class Fruit
{
public Fruit(string name, Color color, string taste)
{
this.Name = name; this.Color = color; this.Taste = taste;
}
public string Name { get; private set; }
public Color Color { get; private set; }
public string Taste { get; private set; }
}
// ... In your data access handling class...
public static FruitFromXml(XmlNode node)
{
// create fruit from xml node with validation here
}
}
That way, the "fruit" isn't really tied to the storage.
I'd go with the constants. It is a little more work, but there is no performance impact. And even if you usually copy/paste the values, I've certainly had instances where I changed code when I typed and didn't realize that Visual Studio had focus. I'd much prefer these resulted in compile errors.
For the example given, where the Strings are used as keys to a map or dictionary, I would lean toward use of an enum (or other object) instead. You can often do much more with an enum than with a constant string. In addition, if some code is commented out, IDE's will often miss that when doing a refactor. Also, references to a String constant that are in comments may or may not be included in a refactor.
I will make a constant for a string when the string will be used in many locations, the string is long or complicated (such as a regex), or when a properly-named constant will make the code more obvious.
I prefer my typos, incomplete refactorings, and other bugs of this sort to fail to compile rather than to just fail to operate properly.
Like many other refactorings, it's an arguably optional additional step that leaves you with code that's less risky to maintain and is more easily grokked by the "next guy". If you're in a situation that rewards that kind of thing (most that I'm in do), go for it.
Yeah, pretty much.
I think developers in statically typed languages have an unhealthy fear of anything at all dynamic. Pretty much every line of code in a dynamically typed language is effectively a string literal, and they've been fine for years. For instance, in JavaScript technically this:
var x = myObject.prop1.prop2;
Is equivalent to this:
var x = window["myObject"]["prop1"]["prop2"]; // assuming global scope
But it is definitely not a standard practice in JavaScript to do this:
var OBJ_NAME = "myObject";
var PROP1_NAME = "prop1";
var PROP2_NAME = "prop2";
var x = window[OBJ_NAME][PROP1_NAME][PROP2_NAME];
That would just be ridiculous.
It still depends though, like if a string is used in numerous places and it's rather cumbersome/ugly to type ("name" vs. "my-custom-property-name-x"), then it's probably worth making a constant, even within a single class (at which point it's probably good to be internally consistent within the class and make all the other strings constants too).
Also, if you actually intend for other external users to interact with your library using these constants, then it's also a good idea to define publicly accessible constants and document that users should use those to interact with your library. However, a library which interacts via magic string constants is usually a bad practice and you should consider designing your library in such a way that you don't need to use magic constants to interact with it in the first place.
I think in the specific example you gave, where the strings are relatively simple to type and there are presumably no external users of your API who would expect to work with it using those string values (i.e. they're just for internal data manipulation), readable code is far more valuable than refactorable code, so I would just put the literals directly inline. Again, this is assuming I understand your exact use case specifically.
One thing nobody seemed to notice is that as soon as you define a constant, its scope becomes something to maintain and think about. This actually does have a cost, it's not free like everyone seems to think. Consider this:
Should it be private or public in my class? What if some other namespace/package has a need for the same value, should I now extract the constant to some global static class of constants? What if I now need it in other assemblies/modules, do I extract it further? All these things make the code less and less readable, harder to maintain, less pleasant to work with, and more complicated. All in the name of refactorability?
Usually, these "great refactorings" never occur, and when they do they require a complete rewrite anyway, with all new strings. And if you had been using some shared module before this great refactoring (as in the above paragraph) which didn't have these new strings which you now need, what then? Do you add them to the same shared module of constants (what if you don't have access to the code for this shared module)? Or do you keep them local to you, in which case there are now multiple scattered repositories of string constants, all at different levels, running the risk of duplicated constants all over the code? Once you get to this point (and believe me I've seen it), refactoring becomes moot, because while you'll get all your usages of your constants, you'll miss other people's usages of their constants, even though these constants have the same logical value as your constants and you're actually trying to change all of them.

Resources