Check if string is Alphanumeric

Is there a standard function in D to check if a string is alphanumeric? If not what'd be the most efficient way to do it? I'm guessing there are better ways than looping through the string and checking if the character is in between a range?

I don't think there's a single pre-made function for it, but you could compose two phobos functions (which imo is just as good!):
import std.algorithm, std.ascii;
bool good = all!isAlphaNum(your_string);
I think that does unnecessary utf decoding, so it wouldn't be maximally efficient but that's likely irrelevant for this anyway since the strings are surely short. But if that matters to you perhaps using .representation (from std.string iirc) or foreach(char c; your_string) isAlphaNum(c); yourself would be a bit faster.

I think Adam D. Ruppe's solution may be a better one, but this can also be done using regular expressions. You can view an explanation of the regular expression here.
import std.regex;
import std.stdio;
void main()
{
    // Compile-time regexes are preferred
    // auto alnumRegex = regex(`^[A-Za-z][A-Za-z0-9]*$`);
    // Backticks represent raw strings (convenient for regexes)
    enum alnumRegex = ctRegex!(`^[A-Za-z][A-Za-z0-9]*$`);
    auto testString = "abc123";
    auto matchResult = match(testString, alnumRegex);
    if(matchResult)
    {
        writefln("Match(es) found: %s", matchResult);
    }
    else
    {
        writeln("Match not found");
    }
}
Note that this pattern requires the first character to be a letter, so it rejects a string such as "123abc"; to accept any non-empty alphanumeric string, use ^[A-Za-z0-9]+$ instead. Of course, this only works for ASCII as well.

Related

Is there a better way to convert a stream of code points into a string in Kotlin?

I have a sequence of code points as Sequence<Int>.
I want to get this into a String.
What I currently do is this:
val string = codePoints
    .map { codePoint -> String(intArrayOf(codePoint), 0, 1) }
    .joinToString("")
But it feels extremely hairy to create a string for each code point just to concatenate them immediately after. Is there a more direct way to do this?
So far the best I was able to do was something like this:
val string2 = codePoints.toList().toIntArray()
    .let { codePoints -> String(codePoints, 0, codePoints.size) }
The amount of code isn't really any better, and it has a toList().toIntArray() which I'm not completely fond of. But it at least avoids the packaging of everything into dozens of one-code-point strings, and the logic is still written in the logical order.

You can either go for the simple:
val string = codePoints.joinToString("") { Character.toString(it) }
// or
val string = codePoints.joinToString("", transform = Character::toString)
Or use a string builder:
fun Sequence<Int>.codePointsToString(): String = buildString {
    this@codePointsToString.forEach { cp ->
        appendCodePoint(cp)
    }
}
This second one expresses exactly what you want, and may benefit from future optimizations in the string builder.
"it feels extremely hairy to create a string for each code point just to concatenate them immediately after"
Did you really measure a performance issue with the extra string objects created here? Using toList() would also create a bunch of object arrays behind the scenes (one for each resize), which is a bit less, but not tremendously better. And as you pointed out toIntArray on top of that is yet another array creation.
Unless you know the number of elements in the sequence up front, I don't believe there is much you can do about that (the string builder approach will also likely use a resizable array behind the scenes, but at least you don't need extra array copies).

val result = codePoints.map { Character.toString(it) }.joinToString("")
Edit, based on Joffrey's comment below:
val result = codePoints.joinToString("") { Character.toString(it) }
Additional edit, full example:
val codePoints: Sequence<Int> = sequenceOf(
    'a'.code,
    Character.toCodePoint(0xD83D.toChar(), 0xDE03.toChar()),
    Character.toCodePoint(0xD83D.toChar(), 0xDE04.toChar()),
    Character.toCodePoint(0xD83D.toChar(), 0xDE05.toChar())
)
val result = codePoints.joinToString("") { Character.toString(it) }
println(result)
This will print: a😃😄😅

Short-circuiting in functional Groovy?

"When you've found the treasure, stop digging!"
I'm wanting to use more functional programming in Groovy, and thought rewriting the following method would be good training. It's harder than it looks because Groovy doesn't appear to build short-circuiting into its more functional features.
Here's an imperative function to do the job:
fullyQualifiedNames = ['a/b/c/d/e', 'f/g/h/i/j', 'f/g/h/d/e']
String shortestUniqueName(String nameToShorten) {
    def currentLevel = 1
    String shortName = ''
    def separator = '/'
    while (fullyQualifiedNames.findAll { fqName ->
        shortName = nameToShorten.tokenize(separator)[-currentLevel..-1].join(separator)
        fqName.endsWith(shortName)
    }.size() > 1) {
        ++currentLevel
    }
    return shortName
}
println shortestUniqueName('a/b/c/d/e')
Result: c/d/e
It scans a list of fully-qualified filenames and returns the shortest unique form. There are potentially hundreds of fully-qualified names.
As soon as the method finds a short name with only one match, that short name is the right answer, and the iteration can stop. There's no need to scan the rest of the name or do any more expensive list searches.
But turning to a more functional flow in Groovy, neither return nor break can drop you out of the iteration:
return simply returns from the present iteration, not from the whole .each so it doesn't short-circuit.
break isn't allowed outside of a loop, and .each {} and .eachWithIndex {} are not considered loop constructs.
I can't use .find() instead of .findAll() because my program logic requires that I scan all elements of the list, not just stop at the first.
There are plenty of reasons not to use try..catch blocks, but the best I've read is from here:
Exceptions are basically non-local goto statements with all the
consequences of the latter. Using exceptions for flow control
violates the principle of least astonishment, make programs hard to read
(remember that programs are written for programmers first).
Some of the usual ways around this problem are detailed here including a solution based on a new flavour of .each. This is the closest to a solution I've found so far, but I need to use .eachWithIndex() for my use case (in progress.)
Here's my own poor attempt at a short-circuiting functional solution:
fullyQualifiedNames = ['a/b/c/d/e', 'f/g/h/i/j', 'f/g/h/d/e']
def shortestUniqueName(String nameToShorten) {
    def found = ''
    def final separator = '/'
    def nameComponents = nameToShorten.tokenize(separator).reverse()
    nameComponents.eachWithIndex { String _, int i ->
        if (!found) {
            def candidate = nameComponents[0..i].reverse().join(separator)
            def matches = fullyQualifiedNames.findAll { String fqName ->
                fqName.endsWith candidate
            }
            if (matches.size() == 1) {
                found = candidate
            }
        }
    }
    return found
}
println shortestUniqueName('a/b/c/d/e')
Result: c/d/e
Please shoot me down if there is a more idiomatic way to short-circuit in Groovy that I haven't thought of. Thank you!

There's probably a cleaner looking (and easier to read) solution, but you can do this sort of thing:
String shortestUniqueName(String nameToShorten) {
    // Split the name to shorten, and make a list of all sequential combinations of elements
    nameToShorten.split('/').reverse().inject([]) { agg, l ->
        if(agg) agg + [agg[-1] + l] else agg << [l]
    }
    // Starting with the smallest element
    .find { elements ->
        fullyQualifiedNames.findAll { name ->
            name.endsWith(elements.reverse().join('/'))
        }.size() == 1
    }
    ?.reverse()
    ?.join('/')
    ?: ''
}
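For comparison, here is the same "stop at the first unique candidate" idea as a minimal sketch in Python rather than Groovy. The function name and parameters are made up for illustration. It relies on a generator being lazy: next() pulls candidates one at a time, shortest first, and stops as soon as one matches exactly one fully-qualified name. Like the Groovy versions, it compares with a plain suffix test.

```python
from typing import Iterable


def shortest_unique_name(name: str, all_names: Iterable[str], sep: str = "/") -> str:
    """Return the shortest trailing part of `name` that matches exactly one entry."""
    names = list(all_names)
    parts = name.split(sep)
    # Lazy generator: candidate suffixes are built on demand, shortest first.
    candidates = (sep.join(parts[-n:]) for n in range(1, len(parts) + 1))
    # next() stops consuming the generator at the first unique candidate;
    # the default "" is returned when no candidate is unique.
    return next(
        (c for c in candidates if sum(fq.endswith(c) for fq in names) == 1),
        "",
    )


names = ["a/b/c/d/e", "f/g/h/i/j", "f/g/h/d/e"]
print(shortest_unique_name("a/b/c/d/e", names))  # c/d/e
```

Each candidate still needs one full pass over the list, as the question requires, but no longer candidates are built or checked once a unique one is found.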

Simplest nested block parser

I want to write a simple parser for a nested block syntax, just hierarchical plain-text. For example:
Some regular text.
This is outputted as-is, foo{but THIS
is inside a foo block}.
bar{
Blocks can be multi-line
and baz{nested}
}
What's the simplest way to do this? I've already written 2 working implementations, but they are overly complex. I tried full-text regex matching, and streaming char-by-char analysis.
I have to teach the workings of it to people, so simplicity is paramount. I don't want to introduce a dependency on Lex/Yacc Flex/Bison (or PEGjs/Jison, actually, this is javascript).

The good choices probably boil down as follows:
Given your constraints, it's going to be recursive descent. That's a fine way to go even without constraints.
You can either parse char-by-char (traditional) or write a lexical layer that uses the local string library to scan for { and }. Either way, you might want to return three terminal symbols plus EOF: BLOCK_OF_TEXT, LEFT_BRACE, and RIGHT_BRACE.

char c;  // last character read, shared by both routines
boolean ParseNestedBlocks(InputStream i)
{ if ParseStreamContent(i)
  then { if c=="}" then return false   // stray "}" at top level
         else return true
       }
  else return false;
}
boolean ParseStreamContent(InputStream i)
{ loop:
    c = GetCharacter(i);
    if c=="}" then return true;        // caller decides whether "}" was expected
    if c==EOF then return true;
    if c=="{"
    { if ParseStreamContent(i)
      { if c!="}" return false; }      // nested block hit EOF before its "}"
      else return false;
    }
    goto loop
}
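To make the recursive-descent idea concrete, here is a minimal runnable sketch in Python (the question is about JavaScript, but the structure carries over directly). It uses the lexical-layer variant: a regex splits the input into text runs, "{" and "}". The names parse and parse_content and the nested-list output are assumptions for illustration, and block names such as foo in foo{...} are simply left at the end of the preceding text run.

```python
import re

# Lexical layer: a run of non-brace text, or a single "{" or "}".
TOKEN = re.compile(r"[^{}]+|[{}]")


def parse(text):
    """Return a nested list: strings for text runs, lists for {...} blocks."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_content(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "{":
                items.append(parse_content(depth + 1))  # recurse into the block
            elif tok == "}":
                if depth == 0:
                    raise ValueError("unmatched '}'")
                return items  # closes the current block
            else:
                items.append(tok)
        if depth > 0:
            raise ValueError("missing '}'")
        return items

    return parse_content(0)


print(parse("a foo{b baz{c}d}e"))  # ['a foo', ['b baz', ['c'], 'd'], 'e']
```

One function handles both the top level and nested blocks; only the depth argument distinguishes them, which keeps the error cases (stray close, missing close) in one place.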

Recently, I've been using parser combinators for some projects in pure Javascript. I pulled out the code into a separate project; you can find it here. This approach is similar to the recursive descent parsers that @DigitalRoss suggested, but with a clearer split between code that's specific to your parser and general parser-bookkeeping code.
A parser for your needs (if I understood your requirements correctly) would look something like this:
var open = literal("{"),                    // matches only '{'
    close = literal("}"),                   // matches only '}'
    normalChar = not1(alt(open, close));    // matches any char but '{' and '}'
var form = new Parser(function() {});       // forward declaration for mutual recursion
var block = node('block',
    ['open', open],
    ['body', many0(form)],
    ['close', close]);
form.parse = alt(normalChar, block).parse;  // set 'form' to its actual value
var parser = many0(form);
and you'd use it like this:
// assuming 'parser' is the parser
var parseResult = parser.parse("abc{def{ghi{}oop}javascript}is great");
The parse result is a syntax tree.
In addition to backtracking, the library also helps you produce nice error messages and threads user state between parser calls. The latter two I've found very useful for generating brace error messages, reporting both the problem and the location of the offending brace tokens when: 1) there's an open brace but no close; 2) there's mismatched brace types -- i.e. (...] or {...); 3) a close brace without a matching open.

Why does Processing think I'm passing an int into the color() function at the end of this code?

Preface: I'm working with Processing and I've never used Java.
I have this Processing function, designed to find and return the most common color among the pixels of the current image that I'm working on. The last line complains that "The method color(int) in the type PApplet is not applicable for the arguments (String)." What's up?
color getModeColor() {
    HashMap colors = new HashMap();
    loadPixels();
    for (int i=0; i < pixels.length; i++) {
        if (colors.containsKey(hex(pixels[i]))) {
            colors.put(hex(pixels[i]), (Integer)colors.get(hex(pixels[i])) + 1);
        } else {
            colors.put(hex(pixels[i]),1);
        }
    }
    String highColor;
    int highColorCount = 0;
    Iterator i = colors.entrySet().iterator();
    while (i.hasNext()) {
        Map.Entry me = (Map.Entry)i.next();
        if ((Integer)me.getValue() > highColorCount) {
            highColorCount = (Integer)me.getValue();
            highColor = (String)me.getKey();
        }
    }
    return color(highColor);
}
The Processing docs that I'm looking at are pretty sparse on the HashMap so I'm not really sure what's going on inside it, but I've been augmenting what's available there with Java docs they point to. But I'm not really grokking what's happening with the types. It looks like the key in the HashMap needs to be a string and the value needs to be an integer, but they come out as objects that I have to cast before using. So I'm not sure whether that's causing this glitch.
Or maybe there's just a problem with color() but the docs say that it'll take a hex value which is what I was trying to use as the key in the HashMap (where I'd rather just use the color itself).
Now that I've talked through this, I'm thinking that the color() function sees the hex value as an int but the hex() function converts a color to a string. And I don't seem to be able to convert that string to an int. I guess I could parse the substrings and reconstruct the color, but there must be some more elegant way to do this that I'm missing. Should I just create a key-value-pair class that'll hold a color and a count and use an arraylist of those?
Thanks in advance for any help or suggestions you can provide!

I'll dig deeper into this, but an initial thought is to employ Java generics so that the compiler will complain about type issues (and you won't get runtime errors):
HashMap<String,Integer> colors = new HashMap<String,Integer>();
So the compiler will know that keys are Strings and values are Integers. Thus, no casting will be necessary.

I didn't figure it out, but I did work around it. I'm just making my own string from the color components like:
colors.put(red(pixels[i]) + "," + green(pixels[i]) + "," + blue(pixels[i]), 1);
and then letting the function drop a color out like this:
String[] colorConstituents = split(highColor, ",");
return color(int(colorConstituents[0]), int(colorConstituents[1]), int(colorConstituents[2]));
This doesn't really seem like the best way to handle it -- if I'm messing with this long-term I guess I'll change it to use an arraylist of objects that hold the color and count, but this works for now.
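The question mentions preferring to use the color itself as the key. A Processing color is an int under the hood, so the counts can be keyed by the pixel value directly, with no trip through a hex string. The sketch below shows that idea in plain Python rather than Processing; mode_color and the sample pixel values are made up for illustration.

```python
from collections import Counter


def mode_color(pixels):
    """Return the most frequent value in `pixels` (packed ARGB ints), or None if empty."""
    counts = Counter(pixels)  # keys are the ints themselves, no string conversion
    if not counts:
        return None
    color, _count = counts.most_common(1)[0]
    return color


pixels = [0xFF112233, 0xFF445566, 0xFF112233]
print(hex(mode_color(pixels)))  # 0xff112233
```

In the Processing code, the rough equivalent would be a HashMap<Integer,Integer> keyed by pixels[i], so the winning key can be returned as the color without any conversion.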

Best way to build object from delimited string (hopefully not looped case)

This question feels like it would have been asked already, but I've not found anything so here goes...
I have a constructor which is handed a delimited string. From that string I need to populate an object's instance variables. I can easily split the string by the delimiter to give me an array of strings. I know I can simply iterate through the array and set my instance variables using ifs or a switch/case statement based on the current array index - however that just feels a bit nasty. Pseudo code:
String[] tokens = <from generic string tokenizer>;
for (int i = 0; i < tokens.length; i++) {
    switch (i) {
        case 0: instanceVariableA = tokens[i]; break;
        case 1: instanceVariableB = tokens[i]; break;
        ...
    }
}
Does anyone have any ideas of how I do this better/nicer?
For what it's worth, I'm working in Java, but I guess this is language independent.

Uhm... "nasty" is in the way the constructor handles the parameters. If you can't change that then your code snippet is as good as it may be.
You could get rid of the for loop, though...
instanceVariableA = tokens[0];
instanceVariableB = tokens[1];
and then introduce constants (for readability):
instanceVariableA = tokens[VARIABLE_A_INDEX];
instanceVariableB = tokens[VARIABLE_B_INDEX];
NOTE: if you could change the string parameter syntax you could introduce a simple parser and, with a little bit of reflection, handle this thing in a slightly more elegant way:
String inputString = "instanceVariableA=some_stuff|instanceVariableB=some other stuff";
String[] tokens = inputString.split("\\|"); // split() takes a regex, so the pipe must be escaped
for (String token : tokens)
{
    String[] elements = token.split("=");
    String propertyName = elements[0];
    String propertyValue = elements[1];
    invokeSetter(this, propertyName, propertyValue); // TODO write method
}

Could you not use a "for-each" loop to eliminate much of the clutter?

I really think the way you are doing it is fine, and Manrico makes a good suggestion about using constants as well.
Another method would be to create a HashMap with integer keys and string values where the key is the index and the value is the name of the property. You could then use a simple loop and some reflection to set the properties. The reflection part might make this a bit slow, but in another language (say, PHP for example) this would be much cleaner.

just an untested idea,
keep the original token...
String[] tokens = <from generic string tokenizer>;
then create
int instanceVariableA = 0;
int instanceVariableB = 1;
if you need to use it, then just
tokens[instanceVariableA];
hence no more loops, no more VARIABLE_A_INDEX...
maybe JSON might help?

Python-specific solution:
Let's say params = ["instanceVariableA", "instanceVariableB"]. Then:
self.__dict__.update(dict(zip(params, tokens)))
should work; that's roughly equivalent to
for k, v in zip(params, tokens):
    setattr(self, k, v)
depending on the presence/absence of accessors.
In a non-dynamic language, you could accomplish the same effect building a mapping from strings to references/accessors of some kind.
(Also beware that zip stops when either list runs out.)
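Put together, the zip approach might look like the following sketch. The Record class, its field names and the "|" delimiter are all hypothetical. Because zip silently stops at the shorter list, the sketch checks the token count first and raises instead of leaving attributes unset.

```python
class Record:
    # Hypothetical field names, in the order they appear in the delimited string.
    FIELDS = ("name", "city", "country")

    def __init__(self, line: str, sep: str = "|"):
        tokens = line.split(sep)
        # zip() would silently drop extras or skip missing fields, so check first.
        if len(tokens) != len(self.FIELDS):
            raise ValueError(
                f"expected {len(self.FIELDS)} fields, got {len(tokens)}"
            )
        for field, token in zip(self.FIELDS, tokens):
            setattr(self, field, token)


r = Record("Ada|London|UK")
print(r.name, r.city, r.country)  # Ada London UK
```

The field order lives in one tuple, so adding or reordering fields does not touch the constructor body.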
