Running QuickCheck properties in parallel - multithreading

In my project, I have several QuickCheck properties, most of which I collect using forAllProperties, defined in Test.QuickCheck.All. I am trying to run all my properties in parallel, which is proving troublesome: at the end of a run, the output is printed in the terminal, and counterexamples and property names are often scattered, making it difficult to match properties with their counterexamples.
I see that the purpose of the pqc library is to run properties in parallel, but it neither provides a replacement for forAllProperties nor a way of combining forAllProperties with its parallel test driver.
It feels like all I would need is for forAllProperties to pass the property name to the function it gets as an argument.
I have also looked into redirecting stdout on a per-thread basis, first using system-posix-redirect (which is not thread-safe), then by studying Test.QuickCheck.State, especially the terminal field. The latter didn't pan out because I found no way of rewriting the terminal field.
Is there a way for me to output the counterexamples together with the property names without copy-pasting the Test.QuickCheck.All module and making the changes I need?
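The usual way around interleaved output is to have each parallel task build its whole report as a string and print it, together with the property name, only once the task has finished. The sketch below shows that pattern in Python rather than Haskell; the property functions, their names, and the report format are hypothetical stand-ins, not QuickCheck's API. In QuickCheck itself the same idea would presumably mean silencing terminal output per property and reading the printed text back from the returned result (the chatty option and the output field of the result look relevant; check the documentation for your QuickCheck version).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# Hypothetical property type: returns None if the property holds, otherwise
# a string describing the counterexample that was found.
Property = Callable[[], Optional[str]]


def run_one(name: str, prop: Property) -> Tuple[str, str]:
    """Run one property and build its whole report as a string, instead of
    writing to the terminal that all worker threads share."""
    counterexample = prop()
    if counterexample is None:
        return name, f"+++ {name}: OK\n"
    return name, f"*** {name}: Failed! Counterexample: {counterexample}\n"


def run_all(props: List[Tuple[str, Property]], workers: int = 4) -> List[Tuple[str, str]]:
    """Run the properties in parallel and return (name, report) pairs in the
    order the properties were given, so each name stays next to its output."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, name, prop) for name, prop in props]
        return [future.result() for future in futures]


# Two toy properties for illustration; the second one is deliberately false.
def prop_reverse_twice() -> Optional[str]:
    xs = [3, 1, 2]
    return None if list(reversed(list(reversed(xs)))) == xs else repr(xs)


def prop_already_sorted() -> Optional[str]:
    xs = [3, 1, 2]
    return None if sorted(xs) == xs else repr(xs)


if __name__ == "__main__":
    named = [("prop_reverse_twice", prop_reverse_twice),
             ("prop_already_sorted", prop_already_sorted)]
    # Print only from the main thread, one complete report at a time.
    for _, report in run_all(named):
        print(report, end="")
```

The design choice is simply that no worker ever touches stdout; ordering and pairing are handled by collecting the futures' results.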

Related

Writing DRY Spock tests

Assume that I have a Spock specification that, given a city and state, tests for the correct zip code. Assume also that I have a text file of cities and states that is used to drive the tests in the where clause.
Now assume that I want to split the tests so that I can run them for "Virginia" or "Maryland". The approach that I have taken is to create a new VirginiaSpec and a new MarylandSpec, and in each spec I modify the where clause.
This works, but seems inefficient because every other part of the VirginiaSpec and MarylandSpec is exactly the same. In addition, if the logic changes, then I would need to change it in every spec that I have.
So what I am looking for is an approach that allows me to have one StateSpec in which the where clause can be parameterized.
I realize I have not included a code example; if my question is not clear, I can provide one. Thanks for your help.
-Dan
You have a couple of options. You could put the basic setup and structure, and even the test itself, in a base test class, then extend that class with your VirginiaSpec and MarylandSpec. Your spec classes would be very small, probably just defining a constant that is the state for the spec.
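A rough Python/unittest analogue of that base-class option, since the Groovy/Spock version isn't shown here: the shared test logic and the reading of the data file live in one base class, and each state-specific class only sets a constant. The inline data, the zip values, and lookup_zip are hypothetical placeholders for the real data file and the code under test.

```python
import csv
import io
import unittest

# Hypothetical stand-in for the text file of cities and states
# (values are illustrative only).
CITY_DATA = """city,state,zip
Richmond,Virginia,23219
Norfolk,Virginia,23510
Baltimore,Maryland,21201
"""

# Hypothetical stand-in for the code under test.
_ZIPS = {("Richmond", "Virginia"): "23219",
         ("Norfolk", "Virginia"): "23510",
         ("Baltimore", "Maryland"): "21201"}


def lookup_zip(city: str, state: str) -> str:
    return _ZIPS[(city, state)]


class StateZipTest(unittest.TestCase):
    """All of the test logic lives here; subclasses only choose STATE."""
    STATE = None

    def rows(self):
        # Plays the role of the where: block, filtered by the chosen state.
        reader = csv.DictReader(io.StringIO(CITY_DATA))
        return [row for row in reader if row["state"] == self.STATE]

    def test_zip_code(self):
        if self.STATE is None:
            self.skipTest("base class: no state selected")
        for row in self.rows():
            with self.subTest(city=row["city"]):
                self.assertEqual(lookup_zip(row["city"], row["state"]), row["zip"])


class VirginiaTest(StateZipTest):
    STATE = "Virginia"


class MarylandTest(StateZipTest):
    STATE = "Maryland"
```

If the test logic changes, only StateZipTest needs editing; the per-state classes stay one line each.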
But that seems needless. If both the cities and states are in this file, you could just read in the file in the where section of your test.
https://snekse.github.io/test-often-and-prosper-slides/#/42
If you cannot get the where section working, you could always read in your file during setupSpec, store the data in some kind of data structure, and then loop through it.
Spock: Reading Test Data from CSV File
But in general, using the where label is going to be the right answer.

confused about render pass in Vulkan API

I recently started learning the Vulkan API, and some topics confuse me. What is a render pass, and why is it used together with command buffer recording? And finally, what are subpasses, subpass dependencies, and attachments, which are commonly related to render passes?
It's the only way to get something drawn (draw commands can only be inside a render pass), so don't overthink it. As a beginner you only need to create one render pass with one (mandatory) subpass, and that's it. You can understand the depths of it later.
Also, you should give a chance to all those videos and tutorials, which are written at length and with more care than anything someone will write here in the short SO format.
Give the spec a chance (it's not so bad, though it avoids redundant semantic and conceptual information). Try reading an intro such as the one by AMD, vulkan-tutorial.com, Vulkan in 30 minutes (this one helped me get started, though there was not much more available at the time), or API without secrets, and watch e.g. the Vulkan GDC session Part1 and Part2.
Once you have heard from some of the people behind it and seen some of the commands, you should get back to us with more specific aspects of it you do not understand.
OK, I am just gonna add some conceptual description of it here to formally answer the question.
A render pass is sort of a description, map, or scheme of a graphics job (which revolves around a particular organization/use of Image resources). But it describes neither the actual commands nor the actual resources (that is done in command buffer recording for a render pass instance, between vkCmdBeginRenderPass() and vkCmdEndRenderPass()).
Maybe a "black box" or "C++ like declaration" for which you provide implementation later is a good analogy.
A render pass has some set of attachments. Let's think of them as descriptions of needed frame image outputs and temporaries (but not the specific frame images themselves).
A render pass has some set of subpasses. A subpass describes how an attachment will be treated during its execution (e.g. as a color buffer in a color image layout).
A render pass has some set of subpass dependencies. Dependencies describe the execution order between subpasses (they form a dependency DAG). A dependency also describes the equivalent of a pipeline barrier between two subpasses, or between a subpass and the outside of the whole render pass (a VK_SUBPASS_EXTERNAL dependency). Subpasses may execute in any order and can overlap (at the leisure of the driver), except for what you describe in the dependencies (or otherwise synchronize).
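To make the "any order except what the dependencies say" rule concrete, here is a toy Python model. It is not the Vulkan API, and it ignores overlap and VK_SUBPASS_EXTERNAL: it lists every sequential execution order that respects a set of subpass dependencies, each written as a hypothetical (src, dst) pair meaning dst must wait for src.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Toy model only: subpasses are numbered 0..n-1 and a dependency (src, dst)
# means subpass dst may not start before subpass src has finished.
Dependency = Tuple[int, int]


def allowed_orders(n_subpasses: int, deps: Sequence[Dependency]) -> List[Tuple[int, ...]]:
    """Return every sequential execution order consistent with the DAG."""
    orders = []
    for order in permutations(range(n_subpasses)):
        position = {subpass: i for i, subpass in enumerate(order)}
        if all(position[src] < position[dst] for src, dst in deps):
            orders.append(order)
    return orders


# Subpass 2 reads what subpass 0 wrote; subpass 1 is independent of both.
# Commands are still recorded in the order 0, 1, 2, but 1 may run anywhere.
print(allowed_orders(3, [(0, 2)]))  # [(0, 1, 2), (0, 2, 1), (1, 0, 2)]
```

Adding a dependency (0, 1) and (1, 2) would leave only the recording order itself, which is what a fully serialized chain of subpasses amounts to.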
In a command buffer, using vkCmdBeginRenderPass() you create a render pass instance (you provide actual Images for the attachments with a VkFramebuffer, and actual commands that write to them).
The things that are part of the render pass description are executed automagically (the image layout transitions, barriers, and MSAA resolutions).
For the rest, you record the commands for the subpasses of the render pass instance in the current command buffer. You do it sequentially for subpass 0, 1, 2, 3, 4, ..., but that is not necessarily the actual execution order: you have described that with the subpass dependencies (and otherwise it is at the leisure of the driver).
Then the command buffer with such render pass instance(s) is submitted to a queue and actually executed.
It is perhaps these indirections that make it harder to grasp. Commands are recorded before they are even executed. And render pass is created before it is even recorded. :)

How do I create an array of resources using Jena?

I am using Jena and Java, and am reading a CSV file. For each line of the file there is a subject resource. Two subject resources on adjacent lines might share the same value of a field (e.g., both lines have the same process id). In this case, I need to combine the two subject resources, as each one represents a sub-process in production (for example).
My question is: how can I reference those two resources dynamically so that I can combine them? My idea is that, when I find that they share the same property, I store them in an array of subject resources. Is that the right approach?
This question would be a lot easier to answer if you could show some sample data. As it is, I think you're focusing on the wrong part of the question. If you can decide clearly what it means to have two rows in your CSV with an identical process id, and then decide how you're going to encode that meaning in your RDF model, then the question of how to write the code (as an array or whatever) will be much clearer.
For example (and I'm going to make up some data here - as I said, it would be easier if you show an actual example), suppose your CSV contains:
processId,startTime,endTime
123,15:22:00,15:23:00
123,16:22:00,16:25:00
So process 123 apparently has two start and end time pairs. If you model this naively in RDF, you'll end up with a confusing model:
process:process123
    a :Process;
    process:start "15:22:00"^^xsd:time;
    process:end "15:23:00"^^xsd:time;
    process:start "16:22:00"^^xsd:time;
    process:end "16:25:00"^^xsd:time;
    .
which would suggest that one process had two start times (and two end times), which looks nonsensical. However, it might be that in reality you have a single process with multiple episodes, suggesting one way to model it, or a periodic process that occurs at different times, or, as you suggested, sub-processes of a parent process. Or something else entirely (I'm only guessing; I don't know your domain). Once you are clear about what the data means, you can produce a suitable RDF model. For example, an episodic process might be:
process:process123
    a :Process;
    process:episode [
        a process:Episode;
        process:start "15:22:00"^^xsd:time;
        process:end "15:23:00"^^xsd:time;
    ];
    process:episode [
        a process:Episode;
        process:start "16:22:00"^^xsd:time;
        process:end "16:25:00"^^xsd:time;
    ]
    .
Once the modelling is clear in your mind, I think you can see that the question of how to produce the desired RDF triples from Java code, and whether or not you need an array, becomes much clearer. Equally importantly, you can think in terms of the JUnit tests you would write to check whether your code is behaving correctly.
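As a sketch of where the modelling leads, here is the episodic reading applied to the made-up CSV above, written in Python only to keep it short: rows are grouped by processId in a plain dictionary (no array of resources needed), and each row becomes one episode. Emitting Turtle text is purely for illustration; with Jena you would instead add the triples to a Model, with one blank node per episode (e.g. via Model.createResource()). The function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

Episode = Tuple[str, str]  # (startTime, endTime)


def episodes_by_process(csv_text: str) -> Dict[str, List[Episode]]:
    """Collect every (start, end) pair under its processId; rows sharing an
    id, adjacent or not, end up attached to the same process."""
    episodes: Dict[str, List[Episode]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        episodes.setdefault(row["processId"], []).append((row["startTime"], row["endTime"]))
    return episodes


def to_turtle(episodes: Dict[str, List[Episode]]) -> str:
    """Render the episodic model (one blank node per episode) as Turtle text."""
    lines = []
    for pid, eps in episodes.items():
        lines.append(f"process:process{pid}")
        lines.append("    a :Process;")
        for i, (start, end) in enumerate(eps):
            lines.append("    process:episode [")
            lines.append("        a process:Episode;")
            lines.append(f'        process:start "{start}"^^xsd:time;')
            lines.append(f'        process:end "{end}"^^xsd:time;')
            lines.append("    ];" if i < len(eps) - 1 else "    ]")
        lines.append("    .")
    return "\n".join(lines)


SAMPLE = """processId,startTime,endTime
123,15:22:00,15:23:00
123,16:22:00,16:25:00
"""

print(to_turtle(episodes_by_process(SAMPLE)))
```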

How to test a program processing large amounts of data stored in an unpredictable format

What I have to do
I'm trying to manipulate some rather large amounts of data stored in Excel files (one of the workbooks has as many as 150 spreadsheets). The result of these manipulations may be approximately 800,000 rows in a database table.
The problem
Data stored in the spreadsheets has an unpredictable format. The company that generated these spreadsheets had no fixed/documented format for exporting these files, and sometimes erroneous data appear. For example, most of the years are represented like "2009", but there are cases where a year is represented as "20". Another example: data is not really normalized in these files, so I use separators to split the values of certain cells. Sometimes these separators change.
There are things like these that I couldn't predict, and I only discovered them after running an already evolved version of my program over a fairly large part of the available data.
The question
How can one test the correctness of a program in such a situation? Or rather, how to achieve a pretty stable version of the product without running it over the whole available data?
Should I take a defensive approach and throw exceptions whenever some unexpected issue arises? The main loop of the program could then catch and log them and continue with the available data. This would yield some processed data, but it means that on a subsequent run of the program I would have to check what's already in the database from previous runs (which I don't really like).
What's your opinion? How would you tackle this problem?
If there is no specification for what the format of the data is, then anything is acceptable.
If not, then there is either an explicit or implicit specification of the data. I would try to nail this down right now. If you can't get an explicit enough definition of the data to write your program so that it can be expected to run without error, then I would say you are taking a very large risk of causing some serious damage, depending on how this data is being used.
You should write your program so that it either throws an exception or logs an error whenever it runs across data that does not meet the specification. Then run the program on PART of the available data until it runs without exception. This can be viewed as a training set for the development of your program. Then use some of the held-back data as a TEST set. This will give you an estimate of how many exceptions/errors your program will generate in production.
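A minimal sketch of the training/test idea in Python, assuming a parse function that raises ValueError on data outside the specification (here simply int, as a placeholder for the real parser): hold part of the rows back, develop against the rest, then measure the failure rate on the held-back part. The function names are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple


def split_train_test(rows: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly and hold back a fraction of the rows as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def error_rate(parse: Callable, rows: Sequence) -> float:
    """Fraction of rows the parser rejects: an estimate of the production error rate."""
    failures = 0
    for row in rows:
        try:
            parse(row)
        except ValueError:
            failures += 1
    return failures / len(rows) if rows else 0.0


# Placeholder data and parser: int() rejects the "x" cell with ValueError.
cells = ["2009", "2010", "x", "2011"]
print(error_rate(int, cells))  # 0.25
```

Fixing the seed keeps the split stable between runs, so the test set stays unseen while the parser evolves against the training set.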
Overfitting is a common machine learning concept, but it is also useful for other tasks such as this one, program development. It is surprising to me how developers can write a bunch of unit tests, code their application to perform well on them, and then expect similar or bug-free performance in production.
If you're not willing to take all these steps (i.e. run your code on essentially all of the data -- since the test set is also making use of the data) then I would say the task is too large to do.
As an aside, rather than creating a definition of a format that is very strange and peculiar to account for all the "errors" in the current data, you might want to create a new, normalized (in the sense these things are simplified away) specification for the data, and then write a "faulty document patcher" that can be run on faulty documents to fix the data.
If the application generating the data is still in production, then you might need to go to the developers of this application to get buy-in on the new spec. Once you have that, you can start logging bugs against their application, so hopefully the faulty document patcher can be retired.
More likely, I'm guessing, the software developers are long gone and no one understands the code anymore, if it is even still running at all.
For every single data type I would set reasonable constraints on the values that it is allowed to be.
If a cell violates these constraints then throw an exception containing the piece of data it failed on and its data type. If a piece of data violated its constraints you can modify the source to include the additional constraints required for that piece of data, and a conversion method to make it uniform.
To give an example with the year you mentioned, a year would initially have the constraint that it could only be four digits. When the program came across "20", it would throw an exception.
Then you could go and allow two-digit years, with a method to convert them into four-digit ones to allow further processing.
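The year example could look like this in Python. ConstraintViolation and parse_year are hypothetical names, and mapping a two-digit year to 20xx is an assumption that would need checking against the real data (a "20" could just as well be a truncated value).

```python
class ConstraintViolation(ValueError):
    """Raised with the offending value and the data type it was checked as."""

    def __init__(self, value: str, data_type: str) -> None:
        super().__init__(f"{data_type} constraint violated by {value!r}")
        self.value = value
        self.data_type = data_type


def parse_year(cell: str) -> int:
    cell = cell.strip()
    if cell.isdigit() and len(cell) == 4:
        return int(cell)  # original constraint: exactly four digits
    if cell.isdigit() and len(cell) == 2:
        # Added after "20" turned up in the data. ASSUMPTION: two-digit
        # years mean 20xx; verify against the real data.
        return 2000 + int(cell)
    raise ConstraintViolation(cell, "year")


print(parse_year("2009"), parse_year("20"))  # 2009 2020
```

Anything else, such as a three-digit value, still raises, so the next surprise in the data is reported with its value and type rather than silently accepted.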
One question is: will you run your program more than once? From your question it sounds possible that you only want to run it once and then work with the data in the database.
In which case you can be very defensive - throw exceptions whenever unexpected data appears. Run the program repeatedly on ever-larger sets of the data. Initially, solve any exceptions by altering the code, as it's a good rule of thumb that the exceptions you find first are going to be common. You might want to empty the output database between runs.
Later on, you will be finding rare exceptions that might only occur a couple of times in the input. Just solve these by hand and insert the corresponding rows in the database yourself. Or write another small program that reads your exception information and inserts the new rows, rather than running your whole big program again.
Typically, for this sort of thing I do as MarkJ suggested, and I encode the whole thing in unit tests.
So I compose a small datafile that at first contains only a few rows of normal data. That's unit test number 1.
Then I take a quick visual scan of some of the data to spot any obvious exceptions. Unit tests 2 through n.
Finally, I write parser code until it passes all unit tests and throws and logs exceptions for all unmanaged data.
I then use these oddball bits of data to make new unit tests, and improve the parser until it can pass those too.
Although sometimes accommodating some really strange bit of data adds more parser complexity than it's worth, and I'll just log the exception, dump it, and move on. This is a matter of professional judgment.
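A small Python version of that workflow using unittest: the parser splits multi-value cells, each oddball found in the data gets its own test, and anything unmanaged raises ValueError so it can be logged. split_cell, the separator list, and the allowed value characters are all hypothetical.

```python
import re
import unittest
from typing import List

# Hypothetical: separators discovered so far. Each new one found in the data
# is added here and gets its own unit test.
KNOWN_SEPARATORS = [";", "|"]
# Hypothetical: characters allowed inside a single value.
VALUE_RE = re.compile(r"[\w .-]+")


def split_cell(cell: str) -> List[str]:
    """Split a multi-value cell; raise ValueError on anything unmanaged."""
    pattern = "|".join(re.escape(sep) for sep in KNOWN_SEPARATORS)
    parts = [part.strip() for part in re.split(pattern, cell)]
    for part in parts:
        if not VALUE_RE.fullmatch(part):
            raise ValueError(f"unmanaged data in {cell!r}: {part!r}")
    return parts


class SplitCellTests(unittest.TestCase):
    def test_1_normal_row(self):
        self.assertEqual(split_cell("red;green;blue"), ["red", "green", "blue"])

    def test_2_pipe_separator(self):
        # Oddball spotted in the data: a different separator, plus spaces.
        self.assertEqual(split_cell("red | green"), ["red", "green"])

    def test_3_unmanaged_data_raises(self):
        # Unknown separator: rejected so it can be logged, not silently accepted.
        with self.assertRaises(ValueError):
            split_cell("red/green")
```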
How about processing every piece of data (so you don't have to check for dupes). Those that pass go into the database. The exceptions go into an exception file. The user can open the exception file and make corrections/modifications to the data. Then they can run your program on the exception file.
This will isolate unhandled data for the user to correct and prevent you from processing the same data twice (or more).
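A sketch of that flow in Python with in-memory stand-ins: db for the database table and an io.StringIO for the exception file. parse_row and its (name, year) row shape are hypothetical. Rows are written to the exception file unchanged, so a corrected file can be fed straight back into process.

```python
import csv
import io
from typing import Callable, Iterable, List, TextIO, Tuple


def parse_row(row: List[str]) -> Tuple[str, int]:
    """Hypothetical parser for (name, year) rows; raises ValueError on bad data."""
    name, year = row  # a wrong number of fields also raises ValueError
    if not (len(year) == 4 and year.isdigit()):
        raise ValueError(f"bad year {year!r}")
    return name, int(year)


def process(rows: Iterable[List[str]], insert: Callable, exceptions_out: TextIO) -> List[str]:
    """Process every row exactly once: good rows are inserted, bad rows are
    copied verbatim to the exception file. Returns the error messages."""
    writer = csv.writer(exceptions_out)
    errors = []
    for row in rows:
        try:
            insert(parse_row(row))
        except ValueError as exc:
            writer.writerow(row)
            errors.append(str(exc))
    return errors


db = []                     # stand-in for the database table
exceptions = io.StringIO()  # stand-in for the exception file
errors = process([["a", "2009"], ["b", "20"]], db.append, exceptions)

# The user corrects the exception file ("20" -> "2020") and runs it again.
corrected = exceptions.getvalue().replace("20\r\n", "2020\r\n")
process(csv.reader(io.StringIO(corrected)), db.append, io.StringIO())
print(db)  # [('a', 2009), ('b', 2020)]
```

Because the second run only sees the exception file, nothing already in the database is processed twice and no duplicate check is needed.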

What is meant by the term "hook" in programming?

I recently heard the term "hook" while talking to some people about a program I was writing. I'm unsure exactly what this term implies although I inferred from the conversation that a hook is a type of function. I searched for a definition but was unable to find a good answer. Would someone be able to give me an idea of what this term generally means and perhaps a small example to illustrate the definition?
Essentially, it's a place in code that allows you to tap into a module to either provide different behavior or react when something happens.
A hook is functionality provided by software for users of that software to have their own code called under certain circumstances. That code can augment or replace the current code.
In the olden days when computers were truly personal and viruses were less prevalent (I'm talking the '80s), it was as simple as patching the operating system software itself to call your code. I remember writing an extension to the Applesoft BASIC language on the Apple II which simply hooked my code into the BASIC interpreter by injecting a call to my code before any of the line was processed.
Some computers had pre-designed hooks, one example being the I/O stream on the Apple II. It used such a hook to inject the whole disk sub-system (Apple II ROMs were originally built in the days when cassettes were the primary storage medium for PCs). You controlled the disks by printing the ASCII code 4 (CTRL-D), followed by the command you wanted to execute and then a CR, and it was intercepted by the disk sub-system, which had hooked itself into the Apple ROM print routines.
So for example, the lines:
PRINT CHR$(4);"CATALOG"
PRINT CHR$(4);"IN#6"
would list the disk contents then re-initialize the machine. This allowed such tricks as protecting your BASIC programs by setting the first line as:
123 REM XIN#6
then using POKE to insert the CTRL-D character in where the X was. Then, anyone trying to list your source would send the re-initialize sequence through the output routines where the disk sub-system would detect it.
That's often the sort of trickery we had to resort to, to get the behavior we wanted.
Nowadays, with the operating system more secure, it provides facilities for hooks itself, since you're no longer supposed to modify the operating system "in-flight" or on the disk.
They've been around for a long time. Mainframes had them (called exits) and a great deal of mainframe software uses those facilities even now. For example, the free source code control system that comes with z/OS (called SCLM) allows you to entirely replace the security subsystem by simply placing your own code in the exit.
In a generic sense, a "hook" is something that will let you, a programmer, view and/or interact with and/or change something that's already going on in a system/program.
For example, the Drupal CMS provides developers with hooks that let them take additional action after a "content node" is created. If a developer doesn't implement a hook, the node is created per normal. If a developer implements a hook, they can have some additional code run whenever a node is created. This code could do anything, including rolling back and/or altering the original action. It could also do something unrelated to the node creation entirely.
A callback could be thought of as a specific kind of hook. By implementing callback functionality into a system, that system is letting you call some additional code after an action has completed. However, hooking (as a generic term) is not limited to callbacks.
Another example: sometimes web developers will refer to class names and/or IDs on elements as hooks. That's because by placing the ID/class name on an element, they can then use JavaScript to modify that element, or "hook in" to the page document. (This is stretching the meaning, but it is commonly used and worth mentioning.)
Simply put:
A hook is a means of executing custom code (a function) either before, after, or instead of existing code. For example, a function may be written to "hook" into the login process in order to execute a CAPTCHA function before continuing on to the normal login process.
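A minimal Python sketch of "before, after, or instead of": with_hooks wraps an existing function, and the login and CAPTCHA functions are hypothetical placeholders for the real login process.

```python
from typing import Callable, List, Optional


def with_hooks(func: Callable, before: Optional[Callable] = None,
               after: Optional[Callable] = None,
               instead: Optional[Callable] = None) -> Callable:
    """Return a version of func that runs custom code before it, after it,
    or in place of it."""
    def wrapper(*args, **kwargs):
        if before is not None:
            before(*args, **kwargs)  # may raise to stop the original from running
        target = instead if instead is not None else func
        result = target(*args, **kwargs)
        if after is not None:
            after(result)
        return result
    return wrapper


def login(user: str, password: str) -> str:  # hypothetical existing code
    return f"{user} logged in"


solved_captcha = {"alice"}  # hypothetical CAPTCHA state


def check_captcha(user: str, password: str) -> None:
    if user not in solved_captcha:
        raise PermissionError("CAPTCHA not solved")


audit_log: List[str] = []
guarded_login = with_hooks(login, before=check_captcha, after=audit_log.append)
print(guarded_login("alice", "secret"))  # alice logged in
```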
Hooks are a category of function that allows base code to call extension code. This can be useful in situations in which a core developer wants to offer extensibility without exposing their code.
One usage of hooks is in video game mod development. A game may not allow mod developers to extend base functionality, but hooks can be added by core mod library developers. With these hooks, independent developers can have their custom code called upon any desired event, such as game loading, inventory updates, entity interactions, etc.
A common method of implementation is to give a function an empty list of callbacks, then expose the ability to extend the list of callbacks. The base code will always call the function at the same and proper time but, with an empty callback list, the function does nothing. This is by design.
A third party, then, has the opportunity to write additional code and add their new callback to the hook's callback list. With nothing more than a reference of available hooks, they have extended functionality at minimal risk to the base system.
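The empty-callback-list pattern described above, sketched in Python. The event name "inventory_updated", update_inventory, and the mod callback are hypothetical; with nothing registered, firing the hook does nothing, by design.

```python
from typing import Callable, Dict, List


class HookRegistry:
    """Maps event names to callback lists that stay empty until someone registers."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callable]] = {}

    def register(self, event: str, callback: Callable) -> None:
        self._callbacks.setdefault(event, []).append(callback)

    def fire(self, event: str, *args) -> None:
        # Called by the base code at a fixed point; a no-op while nobody listens.
        for callback in self._callbacks.get(event, []):
            callback(*args)


hooks = HookRegistry()


def update_inventory(player: dict, item: str) -> None:
    """Base code: does its own work, then always fires the same hook."""
    player.setdefault("inventory", []).append(item)
    hooks.fire("inventory_updated", player, item)


player = {}
update_inventory(player, "sword")  # empty callback list: nothing extra happens

# A third party extends the behaviour without touching update_inventory.
mod_messages: List[str] = []
hooks.register("inventory_updated",
               lambda p, item: mod_messages.append(f"picked up {item}"))
update_inventory(player, "shield")
print(player["inventory"], mod_messages)  # ['sword', 'shield'] ['picked up shield']
```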
Hooks don't allow developers to do anything that can't be done with other structures and interfaces. They are a choice to be made with consideration to the task and users (third-party developers).
For clarification: a hook allows the extension and may be implemented using callbacks. Callbacks are generally nothing more than a function pointer; the computed address of a function. There appears to be confusion in other answers/comments.
Hooking in programming is a technique employing so-called hooks to make a chain of procedures as an event handler.
Hook denotes a place in the code where you dispatch an event of certain type, and if this event was registered before with a proper function to call back, then it would be handled by this registered function, otherwise nothing happens.
Hooks can be executed when some condition is encountered, e.g. when some variable changes, some action is called, or some event happens. Hooks can step into the process and change things or react to changes.
Oftentimes hooking refers to Win32 message hooking or the Linux/OSX equivalents, but more generically hooking is simply notifying another object/window/program/etc that you want to be notified when a specified action happens. For instance: Having all windows on the system notify you as they are about to close.
As a general rule, hooking is somewhat hazardous, since doing it without understanding how it affects the system can lead to instability or, at the very least, unexpected behaviour. It can also be VERY useful in certain circumstances, though. For instance, FRAPS uses it to determine which windows it should show its FPS counter on.
A chain of hooks is a set of functions in which each function calls the next. What is significant about a chain of hooks is that a programmer can add another function to the chain at run time. One way to do this is to look for a known location where the address of the first function in a chain is kept. You then save the value of that function pointer and overwrite the value at the initial address with the address of the function you wish to insert into the hook chain. The function then gets called, does its business and calls the next function in the chain (unless you decide otherwise). Naturally, there are a number of other ways to create a chain of hooks, from writing directly to memory to using the metaprogramming facilities of languages like Ruby or Python.
An example of a chain of hooks is the way that an MS Windows application processes messages. Each function in the processing chain either processes a message or sends it to the next function in the chain.
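A Python sketch of the save-and-overwrite technique described two paragraphs up. handler_table plays the role of the known location holding the first function of the chain, and both hooks are hypothetical; each receives the next function and decides whether to call it.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]

# The "known location" that holds the first function of the chain (hypothetical).
handler_table: Dict[str, Handler] = {"on_message": lambda msg: f"default handled {msg}"}


def install_hook(table: Dict[str, Handler], slot: str,
                 hook: Callable[[str, Handler], str]) -> None:
    """Save the current first function, then overwrite the slot with a new
    function that runs hook and hands it the saved one as 'next'."""
    previous = table[slot]

    def chained(msg: str) -> str:
        return hook(msg, previous)

    table[slot] = chained


log: List[str] = []


def logging_hook(msg: str, next_fn: Handler) -> str:
    log.append(msg)      # do its business...
    return next_fn(msg)  # ...then pass the message down the chain


def spam_filter(msg: str, next_fn: Handler) -> str:
    if msg == "spam":
        return "dropped"  # decides not to call the rest of the chain
    return next_fn(msg)


install_hook(handler_table, "on_message", logging_hook)
install_hook(handler_table, "on_message", spam_filter)  # now first in the chain

print(handler_table["on_message"]("hi"))    # default handled hi
print(handler_table["on_message"]("spam"))  # dropped
print(log)                                  # ['hi']
```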
In the Drupal content management system, 'hook' has a relatively specific meaning. When an internal event occurs (like content creation or user login, for example), modules can respond to the event by implementing a special "hook" function. This is done via naming convention -- [your-plugin-name]_user_login() for the User Login event, for example.
Because of this convention, the underlying events are referred to as "hooks" and appear with names like "hook_user_login" and "hook_user_authenticate()" in Drupal's API documentation.
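A Python analogue of that naming-convention dispatch (Drupal itself is PHP, and this is not its API): invoke_all looks for a function called module-name_hook-name in each module and calls the ones it finds. The module names and the user_login handler are hypothetical.

```python
import types
from typing import Any, List, Sequence


def invoke_all(hook: str, modules: Sequence[types.ModuleType], *args: Any) -> List[Any]:
    """Call <module name>_<hook> in every module that defines it; modules
    without such a function are skipped and the event proceeds normally."""
    results = []
    for module in modules:
        impl = getattr(module, f"{module.__name__}_{hook}", None)
        if callable(impl):
            results.append(impl(*args))
    return results


# Two hypothetical modules: one implements the user_login hook, one doesn't.
welcome = types.ModuleType("welcome")
welcome.welcome_user_login = lambda account: f"Welcome back, {account}!"
stats = types.ModuleType("stats")

print(invoke_all("user_login", [welcome, stats], "dan"))  # ['Welcome back, dan!']
```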
Many answers, but no examples, so here is a dummy one: the following complicated_func offers two hooks to modify its behavior:
from typing import Callable, List, Optional


def complicated_func(
    lst: List[int],
    hook_modify_element: Callable[[int], int],
    hook_if_negative: Optional[int] = None,
) -> int:
    # Hook 1: caller-supplied function applied to every element.
    res = sum(hook_modify_element(x) for x in lst)
    # Hook 2: caller-supplied fallback value used when the sum is negative.
    if res < 0 and hook_if_negative is not None:
        print("Returning negative hook")
        return hook_if_negative
    return res


def my_hook_func(x: int) -> int:
    return x * 2


if __name__ == "__main__":
    res = complicated_func(
        lst=[1, 2, -10, 4],
        hook_modify_element=my_hook_func,
        hook_if_negative=0,
    )
    print(res)  # doubled sum is -6, so this prints "Returning negative hook", then 0
A function that allows you to supply another function rather than merely a value as an argument, in essence extending it.
In VERY short, you can change the code of an API call such as MessageBox so that it runs a different function written by you (a global hook works system-wide, a local one process-wide).
