PexObserve only records 255 characters - visual-studio-2012

I am using Pex from the command line to find input values for test case generation.
I use PexObserve to record certain values during execution.
One of the values that I want to record is an XML-String.
However, when parsing the XML I receive "malformed XML" exceptions, since Pex only writes the first 255 characters into the log.
Is there a way to record the full XML string, or does PexObserve have a different type that will let me record longer texts?
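For anyone wondering why the cut-off shows up as a parse error, the effect can be reproduced outside Pex. The following is only a Python sketch with a made-up XML document (the `log` and `item` elements are hypothetical, not what Pex writes): cutting a well-formed string at 255 characters drops the closing tags, so the parser rejects it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, long enough to exceed the 255-character limit.
items = "".join(f"<item id='{i}'>value {i}</item>" for i in range(20))
full_xml = f"<log>{items}</log>"


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Simulate what a 255-character cut-off would leave behind.
truncated = full_xml[:255]

print(len(full_xml), is_well_formed(full_xml))
print(len(truncated), is_well_formed(truncated))
```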

Leaving this here, in case somebody at any point has the same issue.
I've found a solution that helped me.
Unfortunately the 255 character limit is set internally in static readonly fields.
Therefore I needed to use reflection.
My solution works by including the following line in the PUT:
typeof(Microsoft.Pex.Framework.PexObserve.ValueWriterManager).GetField("MaxWrittenElements").SetValue(null, 1000);
Replace the 1000 with any value you like.
BUT: remember that this is a quick fix that might not work for you.
It may have unwanted side effects: you're also changing the number of List elements that are written, and perhaps other things.


Way to find a number at the end of a string in Smalltalk

I have different commands my program is reading in (e.g., print, count, min, max). These words can also include a number at the end of them (e.g., print3, count1, min2, max6). I'm trying to figure out a way to extract the command and the number so that I can use both in my code.
I'm struggling to figure out a way to find the last element in the string in order to extract it, in Smalltalk.
You didn't say which Smalltalk dialect you use, so I will explain what I would do in Pharo, the one I'm familiar with.
As someone who has been playing with Pharo for a few months at most, I can tell you the sheer number of classes and methods available can feel overwhelming at first, but the environment actually makes it easy to find things. For example, when you know the exact input and output you want but don't know whether a method already exists somewhere, or what it is called, the Finder allows you to search by giving an example. You can open it from the World menu.
By default it looks for selectors (method names) matching your input terms.
But this default is not what we need right now, so change the option in the upper-right box to "Examples" and type into the search field an example of the input followed by the output you want, separated by a ".". The input example I used was the string 'max6', followed by the desired result, the number 6. Pharo then lists the methods that match.
To find what would return the text part, run a new search, changing the example output from the number 6 to the string 'max'.
Fortunately there are several built-in methods matching the description of your problem.
There are more elegant ways, I suppose, but you can make use of the fact that String>>#asNumber only parses the part it can recognize. So you can do
'print31' reversed asNumber asString reversed asNumber
to give you 31. That only works if there actually is a number at the end.
This is one of those cases where we can presume the input data has a specific form, i.e., the only digits appear at the end of the string, and you want all of them. In that case it's not too hard to do, really, just:
numText := 'Kalahari78' select: [ :each | each isDigit ].
num := numText asInteger. "78"
To get the rest of the string without the digits, you can just use this:
'Kalahari78' withoutTrailingDigits. "Kalahari"
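For readers who are not using Pharo, the same split can be sketched in Python. This is only an illustration of the idea (peel the trailing digits off a command word), not Smalltalk code, and the function name `split_command` is made up for the example. Unlike the select: approach, it only picks up digits at the very end of the string.

```python
import re


def split_command(text):
    """Split 'print31' into ('print', 31); the number is None if absent."""
    # Lazy prefix, then as many trailing digits as possible.
    match = re.fullmatch(r"(.*?)(\d*)", text)
    name, digits = match.group(1), match.group(2)
    return name, int(digits) if digits else None


for command in ["print31", "max6", "count"]:
    print(command, split_command(command))
```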
As some of the Pharo "OGs" pointed out, you can take a look at the String class (just type CMD-Return, type in String, hit Return) and you will find an amazing number of methods for all kinds of things. Usually you can get some ideas from those. But then there are times when you really just need an answer!

logstash custom patterns not parsing

I am facing an issue parsing the pattern below.
The log file marks log importance with one of ==, <=, >=, << or >>.
I am trying the custom pattern below. Some of the log messages may not have this marker, so I am using *.
(?<Importance>(=<>)*)
But the log messages are not parsed and I get 'grokparsefailure'.
Kindly check and suggest whether the above pattern is wrong. Thanks much.
The pattern below works fine:
(?<Importance>[=<>]*)
The one I used earlier, which was failing, is:
(?<Importance>(=<>)*)
One thing to note, there is a better way to handle the "some do, some don't" aspect of your log-data.
(?<Importance>(=<>)*)
That will match more than you want. To get the sense of 'sometimes':
((?<Importance>(=<>)*)|^)
This says, match these three characters and define the field Importance, or leave the field unset.
Second, you're matching specifically two characters, in combinations:
((?<Importance>(<|>|=){2})|^)
This should match two instances of any of the trio of characters you're looking for.
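The difference between the variants discussed in this thread can be checked with a small Python sketch. Note the assumptions: Python writes named groups as (?P<name>...) rather than grok's (?<name>...), the sample log lines are invented, and the optional group in TWO_CHARS is my stand-in for the "|^" alternative above, not the same expression. The parenthesised form repeats the literal three-character text "=<>", while the bracketed form repeats any one of the three characters.

```python
import re

# Repeats the literal sequence "=<>" zero or more times.
SEQUENCE = re.compile(r"(?P<Importance>(=<>)*)")
# Repeats any single character out of =, <, > zero or more times.
CHAR_CLASS = re.compile(r"(?P<Importance>[=<>]*)")
# Exactly two of those characters, or the group stays unset.
TWO_CHARS = re.compile(r"(?P<Importance>[=<>]{2})?")


def importance(pattern, line):
    """Return the Importance capture at the start of the line, if any."""
    match = pattern.match(line)
    return match.group("Importance") if match else None


# Hypothetical log lines, one with a marker and one without.
for line in ["<= disk usage high", "plain message"]:
    for pattern in (SEQUENCE, CHAR_CLASS, TWO_CHARS):
        print(repr(line), pattern.pattern, repr(importance(pattern, line)))
```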

How to represent a missing xsd:dateTime in RDF?

I have a dataset with data collected from a form that contains various date and value fields. Not all fields are mandatory, so blanks are possible and in many cases expected, like a DeathDate field for a patient who is still alive.
How do I best represent these blanks in the data?
I represent DeathDate using xsd:dateTime. Blanks or empty spaces are not allowed. All of these are flagged as invalid when validating using Jena RIOT:
foo:DeathDate_1
    a foo:Deathdate ;
    time:inXSDDatetime " "^^xsd:dateTime .
foo:DeathDate_2
    a foo:Deathdate ;
    time:inXSDDatetime ""^^xsd:dateTime .
foo:DeathDate_3
    a foo:Deathdate ;
    time:inXSDDatetime "--"^^xsd:dateTime .
I prefer to not omit the triple because I need to know if it was blank on the source versus a conversion error during construction of my RDF.
What is the best way to code these missing values?
You should represent this by just omitting the triple. That's the meaning of a triple that's "not present": it's information that is (currently) unknown.
Alternatively, you can choose to give it the value "unknown"^^xsd:string when there's no death date. The solution in this case is to not datatype it as an xsd:dateTime, but just as a simple string. It doesn't have to be a string of course, you could use any kind of "special" value for this, e.g. a boolean false - just as long as it's a valid literal value that you can distinguish from actual death dates. This will solve the parsing problem, but IMHO if you do this, you are setting yourself up for headaches in processing the data further down the line (because you will need to ask queries over this data, and they will have to take two different types of values into account, plus the possibility that the field is missing).
"I prefer to not omit the triple because I need to know if it was blank on the source versus a conversion error during construction of my RDF."
This sounds like an XY problem. If there are conversion errors, your application should signal that in another way, e.g. by logging an error. You shouldn't try to solve this by "corrupting" your data.
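A rough sketch of what that could look like in a conversion script, written in Python purely for illustration. The function name `death_date_triples` and the logger name are made up, the property spelling is copied from the question, and parsing with `datetime.fromisoformat` is my assumption about the source format. Blank input produces no triple and no error; unparseable input produces no triple but is logged, so the two cases stay distinguishable without putting invalid literals into the data.

```python
import logging
from datetime import datetime

log = logging.getLogger("rdf_conversion")  # hypothetical logger name


def death_date_triples(subject, raw_value):
    """Return Turtle lines for one field; an empty list means 'omit'."""
    value = (raw_value or "").strip()
    if not value:
        # Blank on the source form: assert nothing, report nothing.
        return []
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        # Conversion problem: report it out of band, keep the data clean.
        log.error("Cannot convert death date %r for %s", raw_value, subject)
        return []
    return [
        f"{subject} a foo:Deathdate ;",
        f'    time:inXSDDatetime "{parsed.isoformat()}"^^xsd:dateTime .',
    ]


print("\n".join(death_date_triples("foo:DeathDate_4", "2011-03-04T10:00:00")))
```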

sas generate all possible miss spelling

Does anyone know how to generate the possible misspellings of a word?
Example : unemployment
- uemployment
- onemploymnet
-- etc.
If you just want to generate a list of possible misspellings, you might try a tool like this one. Otherwise, in SAS you might be able to use a function like COMPGED to compute a measure of the similarity between the string someone entered, and the one you wanted them to type. If the two are "close enough" by your standard, replace their text with the one you wanted.
Here is an example that computes the Generalized Edit Distance between "unemployment" and a variety of plausible misspellings.
data misspell;
input misspell $16.;
length misspell string $16.;
retain string "unemployment";
GED=compged(misspell, string,'iL');
datalines;
nemployment
uemployment
unmployment
uneployment
unemloyment
unempoyment
unemplyment
unemploment
unemployent
unemploymnt
unemploymet
unemploymen
unemploymenyt
unemploymenty
unemploymenht
unemploymenth
unemploymengt
unemploymentg
unemploymenft
unemploymentf
blahblah
;
proc print data=misspell label;
label GED='Generalized Edit Distance';
var misspell string GED;
run;
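If you want to see the "close enough" idea outside SAS, here is a small Python sketch. Be aware that it computes the plain Levenshtein distance (every insertion, deletion or substitution costs 1), which is a simpler relative of COMPGED and will not reproduce its weighted scores. The function name `edit_distance` is mine.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


target = "unemployment"
for candidate in ["uemployment", "unemploymenty", "blahblah"]:
    print(candidate, edit_distance(candidate, target))
```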
Essentially you are trying to develop a list of text strings based on some rule of thumb, such as one letter missing from the word, a letter placed in the wrong spot, or one letter mistyped. The problem is that these rules have to be explicitly defined before you can write the code, in SAS or any other language (this is what Chris was referring to). If your requirement is reduced to this one-wrong-letter scenario then this might be manageable; otherwise, the commenters are correct and you can easily create massive lists of incorrect spellings (after all, all combinations except "unemployment" constitute a misspelling of that word).
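To make the "explicit rules" point concrete, here is a Python sketch that generates every string one edit away from a word under four assumed rules: one letter dropped, two neighbouring letters swapped, one letter replaced, one letter added. The rule set, the lowercase alphabet and the function name are my choices for illustration, and even this restricted case already yields several hundred candidates for a twelve-letter word.

```python
def one_edit_misspellings(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return all strings one edit away from word (word itself excluded)."""
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replacements = {l + c + r[1:] for l, r in splits if r for c in alphabet}
    insertions = {l + c + r for l, r in splits for c in alphabet}
    return (deletions | swaps | replacements | insertions) - {word}


candidates = one_edit_misspellings("unemployment")
print(len(candidates))
print(sorted(candidates)[:5])
```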
Having said that, there are many ways in SAS to accomplish this text manipulation (rx functions, some combination of other text-string functions, macros); however, there are probably better ways to accomplish this. I would suggest an external Perl process to generate a text file that can be read into SAS, but other programmers might have better alternatives.
If you are looking for a general spell checker, SAS does have proc spell.
It will take some tweaking to get it working for your situation; it's very old and clunky. It doesn't work well in this case, but you may get better results with another dictionary. A Google search will show other examples.
filename name temp lrecl=256;
options caps;
data _null_;
file name;
informat name $256.;
input name &;
put name;
cards;
uemployment
onemploymnet
;
proc spell in=name
dictionary=SASHELP.BASE.NAMES
suggest;
run;
options nocaps;

How to make this Groovy string search code more efficient?

I'm using the following groovy code to search a file for a string, an account number. The file I'm reading is about 30MB and contains 80,000-120,000 lines. Is there a more efficient way to find a record in a file that contains the given AcctNum? I'm a novice, so I don't know which area to investigate, the toList() or the for-loop. Thanks!
AcctNum = 1234567890
if (testfile.exists())
{
    lines = testfile.readLines()
    words = lines.toList()
    for (word in words)
    {
        if (word.contains(AcctNum)) { done = true; match = 'YES' ; break }
        chunks += 1
        if (done) { break }
    }
}
Sad to say, I don't even have Groovy installed on my current laptop - but I wouldn't expect you to have to call toList() at all. I'd also hope you could express the condition in a closure, but I'll have to refer to Groovy in Action to check...
Having said that, do you really need it split into lines? Could you just read the whole thing using getText() and then just use a single call to contains()?
EDIT: Okay, if you need to find the actual line containing the record, you do need to call readLines() but I don't think you need to call toList() afterwards. You should be able to just use:
for (line in lines)
{
    if (line.contains(AcctNum))
    {
        // Grab the results you need here
        break;
    }
}
When you say efficient, you usually have to decide which direction you mean: whether it should run quickly or use as few resources (memory, ...) as possible. Often the two goals pull in opposite directions and you have to pick a trade-off.
If you want the search to be memory-friendly, I'd suggest reading the file line by line instead of reading it all at once, which I suspect readLines does (I could be wrong there, but in other languages something like readLines reads the whole file into an array of strings).
If you want it to run quickly, I'd suggest, as already mentioned, reading in the whole file at once and looking for the given pattern. Instead of just checking with contains, you could use indexOf to get the position and then read the record as needed from that position.
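The memory-friendly variant can be sketched as follows. This is Python rather than Groovy, shown only to illustrate the shape of the loop: the file is read one line at a time and the scan stops at the first hit. The function name `find_record` is made up, and the account number is turned into a string before the substring test.

```python
def find_record(path, account_number):
    """Return the first line containing the account number, or None."""
    needle = str(account_number)
    with open(path, encoding="utf-8") as handle:
        # Iterating over the file object keeps one line in memory at a time.
        for line in handle:
            if needle in line:
                return line.rstrip("\n")  # stop at the first match
    return None
```

The returned line still contains the rest of the record, so the other fields can be extracted from it without loading the whole file.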
I should have explained it better, if I find a record with the AcctNum, I extract out other information on the record...so I thought I needed to split the file into multiple lines.
If you control the format of the file you are reading, the solution is to add an index.
In fact, this is how databases are able to locate records so quickly.
But for 30MB of data, I think a modern computer with a decent hard drive can handle a plain scan, so an index may just overcomplicate the program.
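For completeness, a minimal Python sketch of what such an index could look like. It assumes, hypothetically, that each line starts with the account number followed by a comma; the real file layout is not given in the question, and both function names are made up. The index maps each account number to the byte offset of its line, so later lookups seek straight to the record instead of scanning.

```python
def build_offset_index(path):
    """Map each account number to the byte offset where its line starts."""
    index = {}
    offset = 0
    with open(path, "rb") as handle:
        for line in handle:
            # Assumed layout: account number is the first comma-separated field.
            key = line.split(b",", 1)[0].decode("utf-8")
            index[key] = offset
            offset += len(line)
    return index


def read_record(path, index, account_number):
    """Jump directly to the record for one account, or return None."""
    offset = index.get(str(account_number))
    if offset is None:
        return None
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.readline().decode("utf-8").rstrip("\r\n")
```

Building the index costs one full pass over the file, so it only pays off when many lookups are made against the same file.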
