As for the property names 'isPackageOmitted' and 'packageIsOmitted', which should I choose?
Could a native speaker help me?
TL;DR: don't put "is" in the middle of the name; it's hard to spot at a glance. Use isPackageOmitted.
The standard convention is to prefix accessors, mutators and predicates with get, set and is, respectively.
So you would have methods like getPackage(), setPackage(), and isPackageOmitted().
Even though packageIsOmitted reads closer to normal English, it is a really good idea to follow the prefixing convention. It lets you see instantly that this is a method that retrieves a boolean value.
Compare anIncrediblyLongAnacondaIsAbleToEatSheep with isAnIncrediblyLongAnacondaAbleToEatSheep. The second one instantly tells you that this is a boolean value, while in the first example you have to carefully look through the whole name to figure it out.
Now, if this is just the property, it would probably be best to drop the "is" altogether. Does it really provide any new information? I'd say, generally, it would be best to have a property called packageOmitted and a method isPackageOmitted() to retrieve the property value.
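In Java, that split (a packageOmitted field with an isPackageOmitted() accessor) might look like the following minimal sketch; the class name PackageSettings is made up for illustration. JavaBeans introspection recognizes the is prefix for boolean properties, so tools would see a property named packageOmitted.

```java
// Hypothetical class: the field drops the "is", the predicate accessor adds it,
// and the mutator uses the usual "set" prefix.
class PackageSettings {
    private boolean packageOmitted;

    // "is" prefix: a reader knows at a glance this returns a boolean.
    public boolean isPackageOmitted() {
        return packageOmitted;
    }

    // "set" prefix plus the property name.
    public void setPackageOmitted(boolean packageOmitted) {
        this.packageOmitted = packageOmitted;
    }
}
```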
packageIsOmitted is better since "Package is omitted." is an assertion which is either True xor False, whereas "Is Package omitted?" is a question whose answer is either Yes xor No.
Apparently if you add any parse action and return a result in that action, the result will always be encapsulated into a list 'deepening' the output tree.
I suppose this is for returning multiple values but it makes casual use of the library far harder because you then have to remember which parts of the tree you replaced and call result.normalstruct.replaced[0] (or even worse result.normalstruct['replaced'][0])
This is a bit strange and makes refactoring harder, so I'd like a way to avoid it. Any tips?
The problem here is that the token argument of setParseAction is already a list. Instead of operating on str(token_argument), I should operate on str(token_argument[0]) and return that. No more deepening.
Edit: apparently it doesn't always happen. It happened to me with a word action, but when I tried to 'unwrap' element zero of an 'And' expression result from a setParseAction functor, it gave me the first subexpression.
Man, I'd like some consistency here.
I came across an instance where a solution to a particular problem was to use a variable whose value when zero or above meant the system would use that value in a calculation but when less than zero would indicate that the value should not be used at all.
My initial thought was that I didn't like the dual use of the variable's value: a) as a range to be used in a formula; b) as a form of control logic.
What is this kind of misuse of a variable called? Meta-'something', or is there a classic antipattern that this fits?
It feels a bit like a database field that is set to null to mean "don't use a value", while a non-null value means "use the value in this field".
Update:
For example, if a variable's value is > 0, I would use the value; if it's <= 0, I would not use the value and would instead perform some other logic.
Values such as these are often called "distinguished values". By far the most common distinguished value is null for reference types. A close second is the use of distinguished values to indicate unusual conditions (e.g. error return codes or search failures).
The problem with distinguished values is that all client code must be aware of the existence of such values and their associated semantics. In practical terms, this usually means that some kind of conditional logic must be wrapped around each call site that obtains such a value. It is far too easy to forget to add that logic, obtaining incorrect results. It also promotes copy-and-paste code as the boilerplate code required to deal with the distinguished values is often very similar throughout the application but difficult to encapsulate.
Common alternatives to the use of distinguished values are exceptions, or distinctly typed values that cannot be accidentally confused with one another (e.g. Maybe or Option types).
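To make the contrast concrete, here is a minimal Java sketch using the rule from the question (a value > 0 is used, anything <= 0 means "do something else"). The class and method names are hypothetical, and java.util.OptionalInt stands in for a Maybe/Option type.

```java
import java.util.OptionalInt;

// Hypothetical example contrasting a distinguished value with an Option type.
class RangeExample {
    // Distinguished-value style: a value <= 0 secretly means "don't use me",
    // so every caller must repeat this check or silently compute with it.
    static String describeSentinel(int range) {
        return range > 0 ? "scaled: " + (range * 10) : "fallback logic";
    }

    // Convert the legacy encoding once, at the boundary.
    static OptionalInt fromLegacy(int raw) {
        return raw > 0 ? OptionalInt.of(raw) : OptionalInt.empty();
    }

    // Option-type style: "no value" is part of the type, so the caller is
    // pushed to handle the empty case (getAsInt() throws if there is no value).
    static String describe(OptionalInt range) {
        return range.isPresent()
                ? "scaled: " + (range.getAsInt() * 10)
                : "fallback logic";
    }
}
```

The sentinel still exists, but only fromLegacy needs to know about it; everything downstream sees an explicit "present or absent".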
Having said all that, distinguished values may still play a valuable role in environments with extremely tight memory availability or other stringent performance constraints.
I don't think what you're describing is a pure magic number, but it's close. It's similar to the situation before .NET 2.0, where you'd use Int32.MinValue to indicate a null value. .NET 2.0 introduced Nullable<T>, which largely alleviated this issue.
So you're describing the use of a variable whose value really means something other than its value: -1 means essentially the same thing as Int32.MinValue in the situation I described above.
I'd call it a magic number.
Hope this helps.
Using different ranges of the possible values of a variable to invoke different functionality was very common when RAM and disk space for data and program code were scarce. Nowadays, you would use a function or an additional, accompanying value (boolean, or enumeration) to determine the action to take.
Current OSes recommend 1 GiB of RAM to operate correctly, whereas not many years ago 256 KiB was considered a lot. Cheap disk space has gone from hundreds of MiB to multiple TiB in a matter of years. Not too long ago I wrote programs for 640 KiB of RAM and 10 MiB of disk, and you would probably hate them.
I think it's fine to put up with code like that if it's a few years old (though you should refactor it!), but you should denounce it as bad practice if it's recent.
I know that NULL isn't necessary in a programming language, and I recently made the decision not to include NULL in my programming language. Declaration is done by initialization, so it is impossible to have an uninitialized variable. My hope is that this will eliminate the NullPointerException in favor of more meaningful exceptions or simply not having certain kinds of bugs.
Of course, since the language is implemented in C, there will be NULLs used under the covers.
My question is, besides using NULL as an error flag (this is handled with exceptions) or as an endpoint for data structures such as linked lists and binary trees (this is handled with discriminated unions) are there any other use-cases for NULL for which I should have a solution? Are there any really important implications of not having NULL which could cause me problems?
There's a recent article referenced on LtU by Tony Hoare titled Null References: The Billion Dollar Mistake which describes a method to allow the presence of NULLs in a programming language, but also eliminates the risk of referencing such a NULL reference. It seems so simple yet it's such a powerful idea.
Update: here's a link to the actual paper that I read, which talks about the implementation in Eiffel: http://docs.eiffel.com/book/papers/void-safety-how-eiffel-removes-null-pointer-dereferencing
Borrowing a page from Haskell's Maybe monad, how will you handle the case of a return value that may or may not exist? For instance, if you tried to allocate memory but none was available. Or maybe you've created an array to hold 50 foos, but none of the foos have been instantiated yet -- you need some way to be able to check for these kinds of things.
I guess you can use exceptions to cover all these cases, but does that mean that a programmer will have to wrap all of those in a try-catch block? That would be annoying at best. Or everything would have to return its own value plus a boolean indicating whether the value was valid, which is certainly not better.
FWIW, I'm not aware of any language that doesn't have some sort of notion of NULL: you've got null in all the C-style languages and Java; Python has None; Scheme, Lisp, Smalltalk, Lua and Ruby all have nil; VB uses Nothing; and Haskell has a different kind of Nothing.
That doesn't mean a language absolutely has to have some kind of null, but if all of the other big languages out there use it, surely there was some sound reasoning behind it.
On the other hand, if you're only making a lightweight DSL or some other non-general language, you could probably get by without null if none of your native data types require it.
The one that immediately comes to mind is pass-by-reference parameters. I'm primarily an Objective-C coder, so I'm used to seeing things kind of like this:
NSError *error;
[anObject doSomething:anArgumentObject error:&error];
// Error-handling code follows...
After this code executes, the error object has details about the error that was encountered, if any. But say I don't care if an error happens:
[anObject doSomething:anArgumentObject error:nil];
Since I don't pass in any actual value for the error object, I get no results back, and I don't really worry about parsing an error (since I don't care in the first place if it occurs).
You've already mentioned you're handling errors a different way, so this specific example doesn't really apply, but the point stands: what do you do when you pass something back by reference? Or does your language just not do that?
I think it's useful for a method to return NULL: for example, a search method that is supposed to return some object can return the found object, or NULL if it wasn't found.
I'm starting to learn Ruby, and Ruby has a very interesting concept for NULL; maybe you could consider implementing something similar. In Ruby, NULL is called nil, and it's an actual object just like any other object. It happens to be implemented as a global singleton object (the only instance of NilClass). Ruby also has an object false, and both nil and false evaluate to false in boolean expressions, while everything else evaluates to true (even 0, for example, evaluates to true).
In my mind there are two uses cases for which NULL is generally used:
The variable in question doesn't have a value (Nothing)
We don't know the value of the variable in question (Unknown)
Both are common occurrences and, honestly, using NULL for both can cause confusion.
Worth noting is that some languages that don't support NULL do support the notion of Nothing/Unknown. Haskell, for instance, supports "Maybe a", which can contain either a value of type a or Nothing. Thus, functions can return (and accept) a type that they know will always have a value, or they can return/accept "Maybe a" to indicate that there may not be a value.
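Java has no Maybe, but java.util.Optional<T> (Java 8+) fills a similar role. Here is a minimal sketch of the search case, with a made-up UserDirectory class. Note that Java still has null, so Optional is a convention the compiler does not enforce, unlike Haskell's Maybe.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical lookup showing Optional<T> in the role of Haskell's "Maybe a".
class UserDirectory {
    private final List<String> names;

    UserDirectory(List<String> names) {
        this.names = names;
    }

    // The return type itself says "there may be no result"; the empty case
    // is Optional.empty() rather than a null reference.
    Optional<String> findByPrefix(String prefix) {
        return names.stream()
                .filter(name -> name.startsWith(prefix))
                .findFirst();
    }
}
```

A caller then decides explicitly what "no value" means, e.g. directory.findByPrefix("z").orElse("nobody").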
I prefer the concept of having non-nullable pointers be the default, with nullable pointers a possibility. You can almost do this in C++ by using references (&) rather than pointers, but it can get quite gnarly and irksome in some cases.
A language can do without null in the Java/C sense, for instance Haskell (and most other functional languages) have a "Maybe" type which is effectively a construct that just provides the concept of an optional null pointer.
It's not clear to me why you would want to eliminate the concept of 'null' from a language. What would you do if your app requires you to do some initialization 'lazily' - that is, you don't perform the operation until the data is needed? Ex:
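// ResourceObject and DatabaseNetworkResourceThatTakesForeverToLoad are placeholder types.
interface ResourceObject {}
class DatabaseNetworkResourceThatTakesForeverToLoad implements ResourceObject {}
public class ImLazy {
    public ImLazy() {
        // I can't initialize resources in my constructor, because I'm lazy.
        // Maybe I don't have a network connection available yet, or maybe I'm
        // just not motivated enough.
    }
    private ResourceObject lazyObject; // stays null until first requested
    public ResourceObject getLazyObject() { // initialize on first use, then return
        if (lazyObject == null) {
            lazyObject = new DatabaseNetworkResourceThatTakesForeverToLoad();
        }
        return lazyObject;
    }
    public boolean isObjectLoaded() { // just report whether it has been loaded
        return (lazyObject != null);
    }
}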
In a case like this, without null, what would lazyObject hold before getLazyObject() is first called? We could come up with one of two things:
-require the user to initialize lazyObject in the declaration. The user would then have to fill in some dummy object (UselessResourceObject), which requires them to write all of the same error-checking code (if (lazyObject.equals(UselessResourceObject))...), or:
-come up with some other value, which works the same as null, but has a different name
For any complex/OO language you need this functionality, or something like it, as far as I can see. It may be valuable to have a non-null reference type (for example, in a method signature, so that you don't have to do a null check in the method code), but the null functionality should be available for cases where you do use it.
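For comparison, the second option (a value that works the same as null but has a different name) might look like this minimal sketch, where Optional.empty() marks "not loaded yet". LazyHolder is a hypothetical name, the code is not thread-safe, and Optional is more commonly recommended for return types than for fields, so treat it as an illustration of the idea rather than a recommended pattern.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical lazily loaded holder. Optional.empty() is the "not loaded yet"
// marker: a value that behaves like null but under a different name.
class LazyHolder<T> {
    private final Supplier<T> loader;
    private Optional<T> value = Optional.empty();

    LazyHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!value.isPresent()) {
            // Load on first access only (Optional.of rejects a null result).
            value = Optional.of(loader.get());
        }
        return value.get();
    }

    boolean isLoaded() {
        return value.isPresent();
    }
}
```

As the answer predicts, the check has not gone away; it has just moved from "== null" to "isPresent()".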
Interesting discussion happening here.
If I was building a language, I really don't know if I would have the concept of null. I guess it depends on how I want the language to look. Case in point: I wrote a simple templating language whose main strength is nested tokens and ease of making a token a list of values. It doesn't have the concept of null, but then it doesn't really have the concept of any types other than string.
By comparison, the language it is built in, Icon, uses null extensively. Probably the best thing the Icon language designers did with null is make it synonymous with an uninitialized variable (i.e. you can't tell the difference between a variable that doesn't exist and one that currently holds the value null). They then created two prefix operators to check for null and non-null.
In PHP, I sometimes use null as a 'third' boolean value. This is good in "black-box" type classes (e.g. ORM core) where a state can be True, False or I Don't Know. Null is used for the third value.
Of course, neither of these languages has pointers the way C does, so null pointers do not exist.
We use nulls all the time in our application to represent the "nothing" case. For example, if you are asked to look up some data in the database given an id, and no record matches that id: return null. This is very handy because we can store nulls in our cache, which means we don't have to go back to the database if someone asks for that id again in a few seconds.
The cache itself has two different kinds of responses: null, meaning there was no such entry in the cache, or an entry object. The entry object might have a null value, which is the case when we cached a null db lookup.
Our app is written in Java, but even with unchecked exceptions doing this with exceptions would be incredibly annoying.
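The same two-level distinction (no cache entry vs. a cached "not found") can also be encoded without storing null, for example with Optional values in a HashMap. This is a minimal sketch and a swapped-in alternative, not how the application above does it; LookupCache and the database function are hypothetical stand-ins for the real data layer. It relies on Map.computeIfAbsent, which only runs the loader when the key has no (non-null) value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical read-through cache. No map entry means "never looked up";
// an Optional.empty() entry means "looked up, and no record matched".
class LookupCache {
    private final Map<Integer, Optional<String>> entries = new HashMap<>();
    // The database function must return an Optional, never null.
    private final Function<Integer, Optional<String>> database;
    int databaseCalls = 0;

    LookupCache(Function<Integer, Optional<String>> database) {
        this.database = database;
    }

    Optional<String> find(int id) {
        // Negative results are cached too, so a repeated miss skips the database.
        return entries.computeIfAbsent(id, key -> {
            databaseCalls++;
            return database.apply(key);
        });
    }
}
```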
If one accepts the propositions that powerful languages should have some sort of pointer or reference type (i.e. something which can hold a reference to data which does not exist at compile time), and some form of array type (or other means of having a collection of storage slots which are addressable sequentially via integer index), and that slots of the latter should be able to hold the former, and one accepts the possibility that one may have to read some slots of an array of pointers/references before sensible values exist for all of them, then there will be programs which, from a compiler's perspective, will read an array slot before a sensible value has been written to it (trying to ascertain in the general case whether an array slot could be read before it is written would be equivalent to the Halting Problem).
While it would be possible for a language to require that all array slots be initialized with some non-null reference before any of them could be read, in many situations there isn't really anything that could be stored which would be better than null: if an attempt is made to read an as-yet-unwritten array slot and dereference the (non)item contained there, that represents an error, and it would be better to have the system trap that condition than to access some arbitrary object whose sole purpose for existence is to give the array slots some non-null thing they can reference.
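Java behaves the way this answer describes: a newly allocated array of references starts with every slot set to null, and dereferencing an unwritten slot throws NullPointerException rather than quietly using a placeholder object. A small sketch with hypothetical names:

```java
// Every slot of a new reference array starts out null in Java.
class UnwrittenSlots {
    static String firstChar(String[] slots, int index) {
        // If slots[index] was never written, this dereference throws
        // NullPointerException: the error is trapped, not hidden.
        return slots[index].substring(0, 1);
    }

    static boolean trapsOnUnwrittenSlot() {
        String[] slots = new String[3];
        slots[0] = "written";
        try {
            firstChar(slots, 1); // slot 1 was never written
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```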
Suppose I have an object representing a person, with getter and setter methods for the person's email address. The setter method definition might look something like this:
public void setEmailAddress(String emailAddress)
{
    this.emailAddress = emailAddress;
}
Calling person.setEmailAddress(0), then, would generate a type error, but calling person.setEmailAddress("asdf") would not - even though "asdf" is in no way a valid email address.
In my experience, so-called strings are almost never arbitrary sequences of characters, with no restriction on length or format. URIs come to mind - as do street addresses, as do phone numbers, as do first names ... you get the idea. Yet these data types are most often stored as "just strings".
Returning to my person object, suppose I modify setEmailAddress() like so
public void setEmailAddress(EmailAddress emailAddress)
// ...
where EmailAddress is a class ... whose constructor takes a string representation of an email address. Have I gained anything?
OK, so an email address is kind of a bad example. What about a URI class that takes a string representation of a URI as a constructor parameter, and provides methods for managing that URI - setting the path, fetching a query parameter, etc. The validity of the source string becomes important.
So I ask all of you, how do you deal with strings that have structure? And how do you make your structural expectations clear in your interfaces?
Thank you.
"Strings with structure" are a symptom of the common code smell "Primitive Obsession".
The remedy is to watch closely for duplication in code that validates or manipulates parts of these structures. At the first hint of duplication - but not before - extract a class that encapsulates the structure and locate validations and queries there.
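Once that duplication shows up, the extracted class might start out like this minimal Java sketch. EmailAddress here is hypothetical, and its check (exactly one '@' with something on both sides) is deliberately crude; it is not an RFC 5322 validator. The point is only that validation and queries such as domain() live in one place. A real value class would also override equals() and hashCode().

```java
import java.util.Objects;

// Hypothetical value class: validation happens once, in the constructor,
// so any EmailAddress instance that exists has passed the check.
final class EmailAddress {
    private final String value;

    EmailAddress(String value) {
        Objects.requireNonNull(value, "email address must not be null");
        int at = value.indexOf('@');
        boolean exactlyOneAt = at > 0 && at == value.lastIndexOf('@');
        if (!exactlyOneAt || at == value.length() - 1) {
            throw new IllegalArgumentException("not an email address: " + value);
        }
        this.value = value;
    }

    // Structural queries live here instead of in an "EmailHelper" class.
    String domain() {
        return value.substring(value.indexOf('@') + 1);
    }

    @Override
    public String toString() {
        return value;
    }
}
```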
Welcome to the world of programming!
I don't think your question is a symptom of an error on your part. Rather, it is a basic problem which appears in many guises throughout the programming world. Strings that have some structure and meaning are passed around between different subsystems of an application, and each subsystem can only do so much parsing and validation.
The problem of verifying an email address, for example, is quite tricky. The regular expressions various people offer for accepting an email address are generally either "too tight" (they reject some valid addresses) or "too loose" (they accept illegal ones). The first Google hit for 'regex "email address"', for example, says:
The regular expression I receive the most feedback, not to mention "bug" reports on, is the one you'll find right on this site's home page:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
This regular expression, I claim, matches any email address. Most of the feedback I get refutes that claim by showing one email address that this regex doesn't match.
The fact is that what is or isn't a valid email address is a complex problem, one that a given program might or might not want to solve. The problem of URLs is even worse, especially given the possibility of malicious URLs.
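To see both failure modes, here is the quoted regex compiled in Java. Compiling it case-insensitively is an assumption on my part, since its character classes only list uppercase letters; the test addresses are made up.

```java
import java.util.regex.Pattern;

// The regex quoted above, compiled case-insensitively (assumption: its
// character classes list only uppercase letters).
class QuotedEmailRegex {
    static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b",
            Pattern.CASE_INSENSITIVE);

    static boolean accepts(String candidate) {
        // matches() requires the whole string to fit the pattern.
        return EMAIL.matcher(candidate).matches();
    }
}
```

With this pattern, john.doe@example.com is accepted; user@example.museum is rejected because its top-level domain is longer than four letters (too tight); and john..doe@example.com is accepted even though consecutive dots are not allowed in an unquoted local part (too loose).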
Ideally, you can use a library or system call that solves problems of this sort instead of doing everything yourself (Microsoft Windows provides a standard dialog box that lets the user select or create a file, since validating file names is another tricky problem). But you can't always count on having an appropriate system call for a given "meaningful string" either.
I would say that there is no generic solution to the problem of strings with structure. Rather, it is a basic problem that appears as soon as you design your application. While gathering requirements, you should determine what data the application will take in and how meaningful that data will be to the application. And this is where things get tricky, since you may notice that the app could grow in ways your boss or customer hasn't thought of, or it may in fact grow in ways none of you thought of. Thus the application needs to be a little more flexible than what seems like the minimum, BUT only a little; it should not be so flexible that you get bogged down.
Now, if you decide that you need to validate, interpret, etc. a given string, putting that string into an object or a hash can be a good approach; this is one way I know to make sure your interface is clear. But the tricky thing is deciding just how much validation or interpretation you need.
Making these decisions is thus an art - there are no dogmatic answers that work here.
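For the URI case in the question, Java's standard library already takes the "put it in an object" route: the java.net.URI constructor parses the string and throws URISyntaxException if it is malformed, and accessors such as getScheme(), getHost(), getPath() and getQuery() expose the parts. A small sketch (the describe helper is hypothetical); note that URI only checks syntax, not whether the resource exists or is safe.

```java
import java.net.URI;
import java.net.URISyntaxException;

// java.net.URI as a ready-made "string with structure" class.
class UriExample {
    static String describe(String text) {
        try {
            URI uri = new URI(text); // parsing and validation happen here
            return uri.getScheme() + " | " + uri.getHost() + " | " + uri.getPath();
        } catch (URISyntaxException e) {
            // Rejected at construction, so no malformed URI object ever exists.
            return "invalid: " + e.getReason();
        }
    }
}
```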
This is a pretty common problem falling under the title 'validation' - there are many ways to validate textual user input, one of the most common being Regular Expressions.
You might also consider using the built-in System.Net.Mail.MailAddress class for this, as it provides validation for email addresses.
Strings are strings. If you need your strings to be smarter than average strings then parsing them into a structural object like you describe would be a good idea. I would use a regex to do that.
Regular expressions are your friend when it comes to formatting strings. You could also store each part separately in a struct to avoid going through the trouble of using regular expressions every time you want to use them, e.g.
struct EMail
{
    public string BeforeAt;   // e.g. "johndoe123"
    public string AfterAt;    // e.g. "gmail.com"
}
struct URL
{
    public string Protocol;   // e.g. "http"
    public string Domain;     // e.g. "sub.example.com"
    public string Path;       // e.g. "stuff/example.html"
}
Well, if you want to do several different kinds of things with an EmailAddress object, those other actions do not have to check if it is a valid email address since the EmailAddress object is guaranteed to have a valid string. You could throw an exception in the constructor or use a factory method or whatever "One True Methodology" approach you're using.
Personally, I like the idea of strong typing, so if I were still working in such languages I'd go with the style of your second example. The only thing I'd change might be to use a more "cast-like" structure, like EmailAddressFromString(String), that generated a new EmailAddress object (or pitched a fit if the string wasn't right), as I'm a bit of a fan of application Hungarian notation.
This whole problem, incidentally, is covered pretty well by Joel in http://www.joelonsoftware.com/articles/Wrong.html if you're interested.
I agree with the calls to strongly type the object, but for those cases where you're parsing from a string to an object, the answer is simple: error handling.
There are two general ways to handle errors: exceptions and return conditions. Generally, if you expect to receive badly formed data, you should return an error message. For cases where the input is not expected, I would throw an exception. For example, you might be passed an ill-formed email address, such as 'bob' instead of 'bob@gmail.com'; that is the kind of bad data to expect, so return an error. However, for null values you might throw an exception, as you shouldn't try to form an email address out of null.
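A minimal Java sketch of both styles, using a hypothetical Email class: a tryParse factory returns Optional.empty() for expected bad input like "bob", and throws for null. The '@' check is deliberately simplistic and only stands in for real validation.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical parser illustrating exceptions vs. return conditions.
final class Email {
    final String local;
    final String domain;

    private Email(String local, String domain) {
        this.local = local;
        this.domain = domain;
    }

    static Optional<Email> tryParse(String text) {
        // null is a programming error rather than bad user input: throw.
        Objects.requireNonNull(text, "text must not be null");
        int at = text.indexOf('@');
        // Ill-formed input such as "bob" is expected, so it is reported
        // through the return value instead of an exception.
        if (at <= 0 || at != text.lastIndexOf('@') || at == text.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(new Email(text.substring(0, at), text.substring(at + 1)));
    }
}
```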
Returning to your question, I do think you gain something by encoding a structure into an object. Specifically, you only need to validate that the string represents a valid email address in one specific place, such as the constructor. Elsewhere, your code is free to assume that an EmailAddress object is valid, and you don't have to rely upon dodgy classes with names like 'EmailHelper' or some such.
I personally do not think strong-typing the email address string as EmailAddress is necessary, in this case.
To create your email address you will, sooner or later, have to do something like:
EmailAddress(String email)
or a setter
SetEmailAddress(String email)
In both cases, you'll have to validate the email string input, which puts you back into your initial validation problem.
I would, as others pointed out, use regular expressions.
Having an EmailAddress class would be useful if you plan on having to perform specific operations on your stored information later on (say get domain name only, stuff like that).
I suppose there could be historical reasons for this naming, and other languages have similar features, but it seems to me that parameters have always had names in C#; it's the arguments that were unnamed. Or is there a particular reason why this terminology was chosen?
Oh, you wanted arguments! Sorry, this is parameters - arguments are two doors down the hall on the left.
Yes, you're absolutely right (to my mind, anyway). Ironically, although I'm usually picky about these terms, I still use "parameter passing" when I should probably talk about "argument passing". I suppose one could argue that prior to C# 4.0, if you're calling a method you don't care about the parameter names, whereas the names become part of the significant metadata when you can specify them on the arguments as well.
I agree that it makes a difference, and that terminology is important.
"Optional parameters" is definitely okay though - that's adding metadata to the parameter when you couldn't do so before :) (Having said that, it's not going to be optional in terms of the generated IL...)
Would you like me to ask the team for their feedback?
I don't think so. The names are quite definitely the names of parameters, as they are defined and given a specific meaning in the method definition, where they are properly called the parameters to the method. At the call site, arguments can now be tagged with the name of the parameter that they supply a value for.
The new term refers to the perspective of the method caller - which is logical because that's where the feature applies. Previously, callers only had to think of parameters as being "positioned parameters". Now they can optionally treat them as "named parameters" - hence the name.
I don't know if it's worth adding now, but Microsoft calls them named arguments anyway; see "Named and Optional Arguments" in the C# Programming Guide.