Can ArchUnit check for certain string patterns in method calls? - archunit

In our code we again and again have the issue that somebody forgot to adapt the usage of placeholders when switching between the use of the logger and String.format(...) methods.
For log statements one has to use '{}' as placeholders, like so:
logger.info("File {} successfully opened: {} bytes read, {} objects created", file, nrBytes, nrObjects);
But when using String.format(...) to compose a message one has to use '%s' as placeholders for strings and the statement has to read:
logger.info(String.format("File %s successfully opened: %s bytes read, %s objects created", file, nrBytes, nrObjects));
The second form is often used when logging an error, where the second argument is the Throwable that one wants to log.
Too often people forget about these details, and we end up with broken log statements that output nothing useful.
I know and agree that this is absolutely not an architecture issue but rather a simple programming error, but it would be great if one could (ab-)use ArchUnit to check for the use of '%s' (or the absence of '{}') in the first String argument of the String.format()-method. Is something like that possible?

ArchUnit, currently at version 0.16.0, does not analyze the parameter values of method calls.
The Sonar rule "Printf-style format strings should be used correctly" might, however, catch these bugs.

As already noted, ArchUnit can't do this. PMD's InvalidLogMessageFormat rule is useful though (and I find PMD easier to deal with than Sonar).
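To make concrete what such a rule looks for, here is a minimal sketch in plain Java. It is not how ArchUnit, PMD, or Sonar work internally; the class and method names are hypothetical, and the printf pattern is deliberately simplified (it ignores %% and rarer conversions). It only inspects a template string you hand to it, so it illustrates the mismatch rather than replacing a static analyzer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of ArchUnit, PMD, or Sonar. */
final class PlaceholderStyleCheck {

    // Logger-style placeholder: the literal two characters "{}".
    private static final Pattern BRACES = Pattern.compile("\\{\\}");

    // Simplified printf-style specifier such as %s, %d, %5.2f or %x.
    // Assumption: escapes like %% and rarer conversions are ignored here.
    private static final Pattern PRINTF = Pattern.compile("%[-0-9.]*[sdfx]");

    private static int count(Pattern pattern, String template) {
        Matcher m = pattern.matcher(template);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** True if a template meant for String.format(...) contains only "{}" placeholders. */
    static boolean suspiciousForStringFormat(String template) {
        return count(BRACES, template) > 0 && count(PRINTF, template) == 0;
    }

    /** True if a template passed directly to the logger contains only printf-style placeholders. */
    static boolean suspiciousForLogger(String template) {
        return count(PRINTF, template) > 0 && count(BRACES, template) == 0;
    }

    public static void main(String[] args) {
        // "{}" inside String.format(...) is never substituted.
        System.out.println(suspiciousForStringFormat("File {} successfully opened: {} bytes read"));
        // "%s" handed straight to the logger is never substituted either.
        System.out.println(suspiciousForLogger("File %s successfully opened: %s bytes read"));
        // Correct logger usage is not flagged.
        System.out.println(suspiciousForLogger("File {} successfully opened: {} bytes read"));
    }
}
```

A check like this could run in a unit test over message constants, but for string literals scattered through the code base the PMD or Sonar rules mentioned above are the more realistic option.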

Related

Perl critic policy violation in checking index of substring in a string

for my $item (@array) {
if (index($item, '$n') != -1) {
print "HELLO\n";
}
}
The problem is that Perl::Critic reports the following policy violation:
String may require interpolation at line 168, near '$item, '$n''. (Severity: 1)
Please advise how I can fix this.
In this case the analyzer either found a bug or is plain wrong in flagging your code.
Are you looking for a literal "$n" in $item, or for what $n variable evaluates to?
If you want to find the literal $n characters, then there is nothing wrong with your code.
If you expect $item to contain the value stored in the $n variable, then allow it to be evaluated:
if (index($item, $n) != -1)
If this is indeed the case, but $n may also contain other escape sequences or encodings that you need as literal characters (so their evaluation must be suppressed), then you may need to do a bit more, depending on what exactly may be in that variable.
In case you do need to find the character $ followed by n (which would explain the deliberate act of putting single quotes around a variable), you need to handle the warning.
For the particular policy that is violated see Perl::Critic::Policy::ValuesAndExpressions
This policy warns you if you use single-quotes or q// with a string that has unescaped metacharacters that may need interpolation.
To satisfy the policy you'd need to use double quotes and escape the $, for example qq(\$n). In my opinion this would change the fine original code segment into something strange to look at.
If you end up wanting to simply silence the warning, see the documentation, in the section Bending The Rules.
A comment. The tool perlcritic is useful but you have to use it right. It's a static code analyzer and it doesn't know what your program is doing, so to say; it can catch bad practices but can't tell you how to write programs. Many of its "policies" are unsuitable for particular code.
The book that it is based on says all this very nicely in its introduction. Use sensibly.
When I look at the question this comes from, it appears that you are looking for the index at which substrings were matched, so you need the content of the $n variable, not the literal "$n". In that case perlcritic identified a bug in the code, a good return for using it!

Having command line arguments to flex as search strings

I use flex (the Linux/Unix tool, not the Adobe product) to generate small scanners. In the past I have always used static search strings. I now want to supply the search string on the command line, read it via getopt, and use it for searching.
The old way of searching was:
.*"_"\n ECHO;
To find lines that ended with an underscore.
Now I want to search this way:
.*<arbitrary string>.*\n ECHO;
I don't know how to get flex to accept the <arbitrary string>. I can get it via getopt, but I haven't been able to get flex to accept my syntax.
What I am doing is a special purpose very limited grep for a special problem I am having.
Any help would be appreciated.
.*\n { if(strstr(yytext, "arbitrary string")) ECHO; else REJECT; }
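The REJECT statement will skip to the next rule if yytext doesn't contain "arbitrary string"; note that without a later rule that discards the line (for example .*\n ;), flex's default rule will echo the rejected text anyway. To use a command-line value, pass the variable filled in via getopt to strstr() instead of the literal. This will of course not provide the same performance as if the search string was known at compile time. regcomp()/regexec() in glibc might be faster than flex if you are implementing your own grep program.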

Groovy how to multi line GStrings for exception messages

What is the standard (or best practice) for Groovy error messages that shouldn't exceed a certain number of characters per line, e.g., 80 characters?
Consider the following (which is working fine)
throw new IOException("""\
A Jenkins configuration for the given version control
system (${vcs.name}) does not exist."""
.stripIndent()
.replaceAll('\n', ' '))
This will result in a one-line error message with no indentation characters (which is what I want). But is there some other way ("the Groovy way of doing it") to achieve this? If not, how could you add such a method to the GString class in a standalone Groovy application (I found hints regarding a Bootstrap.groovy file, but it seems to be related to Grails)?
Example: """Consider a multi line string as shown above""".toSingleLine()
You could use the String continuation character then strip multiple spaces:
throw new IOException( "A Jenkins configuration for the given version control \
system (${vcs.name}) does not exist.".replaceAll( /( )\1+/, '$1' ) )
Or you could wrap this in a function and add it to the String.metaClass as I believe the answers you've seen point to.
You're right in thinking that Bootstrap.groovy is a Grails thing, but if you just set the metaClass early on in your application's lifecycle, you should get the same result...
String.metaClass.stripRepeatedWhitespace = { delegate.replaceAll( /( )\1+/, '$1' ) }
Having said all this, however, I'd probably just keep the message on a single line.
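The toSingleLine() idea from the question can also be sketched as a plain static helper. The version below is written in Java, which Groovy code can call; the class and method names are hypothetical, and "git" merely stands in for ${vcs.name}. Unlike the /( )\1+/ pattern above, it collapses every run of whitespace, newlines included, so whitespace inside interpolated values would be collapsed as well.

```java
final class Messages {

    /** Hypothetical helper: trims the text and collapses each whitespace run (including newlines) to one space. */
    static String toSingleLine(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String message = "A Jenkins configuration for the given version control\n"
                + "        system (git) does not exist.";
        // Prints the message on one line, with single spaces between words.
        System.out.println(toSingleLine(message));
    }
}
```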

Protection from Format String Vulnerability

What exactly is a "Format String Vulnerability" in a Windows System, how does it work, and how can I protect against it?
A format string attack, at its simplest, is this:
char buffer[128];
gets(buffer);
printf(buffer);
There's a buffer overflow vulnerability in there as well, but the point is this: you're passing untrusted data (from the user) to printf (or one of its cousins) that uses that argument as a format string.
That is: if the user types in "%s", you've got an information-disclosure vulnerability, because printf will treat the user input as a format string, and will attempt to print the next thing on the stack as a string. It's as if your code said printf("%s");. Since you didn't pass any other arguments to printf, it'll display something arbitrary.
If the user types in "%n", you've got a potential elevation of privilege attack (or at least a denial of service attack), because the %n format specifier causes printf to write the number of characters printed so far to the address it takes from the next argument slot on the stack. Since you didn't give it a place to put this value, it'll write to somewhere arbitrary.
This is all bad, and is one reason why you should be extremely careful when using printf and cousins.
What you should do is this:
printf("%s", buffer);
This means that the user's input is never treated as a format string, so you're safe from that particular attack vector.
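In Visual C++, you can use the _Printf_format_string_ SAL annotation (__format_string in older versions) to tell it to validate the arguments to printf-style functions. %n is disallowed by default. In GCC, you can use __attribute__((format(printf, m, n))) for the same thing, where m is the position of the format string parameter and n is the position of the first variadic argument.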
In this pseudo code the user enters some characters to be printed, like "hello"
string s=getUserInput();
write(s)
That works as intended. But write can also format strings, so for example
int i=getUnits();
write("%02d units",i);
outputs: "03 units". But what if the user entered "%02d" in the first place? Since there are no parameters on the stack, something else will be fetched. What that is, and whether that is a problem or not, depends on the program.
An easy fix is to tell the program to output a string:
write("%s",s);
or use another method that doesn't try to format the string:
output(s);
See the Wikipedia article on format string attacks for more information.
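The same rule (never let untrusted input act as the format string) carries over to other languages. As a rough analogue of the C fix above, here is a minimal Java sketch; the class and method names are hypothetical. In Java the consequence is an exception rather than an arbitrary memory read or write, but the cure is identical: keep the format string constant and pass the input as data.

```java
import java.util.IllegalFormatException;

final class SafeFormatDemo {

    /** Unsafe: the untrusted text itself is interpreted as a format string. */
    static String unsafe(String userInput) {
        return String.format(userInput);
    }

    /** Safe: the format string is a constant, so the input is only ever treated as data. */
    static String safe(String userInput) {
        return String.format("%s", userInput);
    }

    public static void main(String[] args) {
        String input = "100%s sure";
        // The specifier inside the input is printed literally.
        System.out.println(safe(input));
        try {
            System.out.println(unsafe(input));
        } catch (IllegalFormatException e) {
            // "%s" has no matching argument, so String.format refuses the input.
            System.out.println("rejected: " + e);
        }
    }
}
```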

How to make this Groovy string search code more efficient?

I'm using the following groovy code to search a file for a string, an account number. The file I'm reading is about 30MB and contains 80,000-120,000 lines. Is there a more efficient way to find a record in a file that contains the given AcctNum? I'm a novice, so I don't know which area to investigate, the toList() or the for-loop. Thanks!
AcctNum = 1234567890
if (testfile.exists())
{
lines = testfile.readLines()
words = lines.toList()
for (word in words)
{
if (word.contains(AcctNum)) { done = true; match = 'YES' ; break }
chunks += 1
if (done) { break }
}
}
Sad to say, I don't even have Groovy installed on my current laptop - but I wouldn't expect you to have to call toList() at all. I'd also hope you could express the condition in a closure, but I'll have to refer to Groovy in Action to check...
Having said that, do you really need it split into lines? Could you just read the whole thing using getText() and then just use a single call to contains()?
EDIT: Okay, if you need to find the actual line containing the record, you do need to call readLines() but I don't think you need to call toList() afterwards. You should be able to just use:
for (line in lines)
{
if (line.contains(AcctNum))
{
// Grab the results you need here
break;
}
}
When you say efficient, you usually have to decide which direction you mean: whether it should run quickly, or use as few resources (memory, ...) as possible. Often the two lie on opposite sides and you have to pick a trade-off.
If you want the search to be memory-friendly, I'd suggest reading the file line by line instead of reading it all at once, which I suspect readLines does (I could be wrong there, but in other languages something like readLines reads the whole file into an array of strings).
If you want it to run quickly I'd suggest, as already mentioned, reading in the whole file at once and looking for the given pattern. Instead of just checking with contains you could use indexOf to get the position and then read the record as needed from that position.
I should have explained it better: if I find a record with the AcctNum, I extract other information from the record, so I thought I needed to split the file into multiple lines.
If you control the format of the file you are reading, the solution is to add an index.
In fact, this is how databases are able to locate records so quickly.
But for 30MB of data, I think a modern computer with a decent hard drive should cope with a plain scan, so there is no need to overcomplicate the program.
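The memory-friendly, line-by-line variant suggested above can be sketched as follows. It is written in Java (callable from Groovy); the class and method names are hypothetical, and UTF-8 is an assumed file encoding. It stops at the first matching line and never holds more than one line in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

final class AccountLineFinder {

    /** Returns the first line containing acctNum, reading one line at a time; closes the reader when done. */
    static Optional<String> findFirstLine(Reader source, String acctNum) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(acctNum)) {
                    return Optional.of(line); // stop at the first match
                }
            }
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience overload for a file on disk (UTF-8 is an assumption). */
    static Optional<String> findFirstLine(Path file, String acctNum) {
        try {
            return findFirstLine(Files.newBufferedReader(file, StandardCharsets.UTF_8), acctNum);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the account number is passed as a String here; the matching line is returned whole, so the other fields of the record can be extracted from it afterwards.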
