Enhanced Python Multiprocessing Data Pipeline Wrapper
This is the goal...
Objective
This is a piece of a big project I'm working on. This is an important part that will massively simplify report transmission in my program. The program tests a function against millions of inputs and uses multiprocessing to speed thing up. Source code on Pastebin.
Goals and Benefit
Put simply, multiprocessing.Pipe() is inadequate. It should be able to handle massive strings and switch process execution between a sender and receiver. I wrote this to implement:
Automatic error handling
Transmission error categorization
Data transmission chunking and reassembly
Unlimited data transmission size
Process synchronization
Simple abstraction to enhance usability
Former Problem
It has a weird bug I can't find. Days and plenty of documentation later, it's not fixed. I've left in a good many debug lines. Try entering "hi": you don't see "Receiver.Test: Output: hi" but should. Try a second time, it just hangs: Sample output.
Fixed by a dear friend.
Tests
The GPE works. Both of these first two tests work. For test 1, this source code outputs these results correctly and consistently. For test 2, this source code outputs something like these results correctly. For test 3, this source code outputs something like these results correctly.
Plea!
It's time to ask for help. It is part of a larger project. To be fair, there are a good many lines of code. This should be part of the multiprocessing module. I'm humbled. Can someone tell me what up? PLEASE? ANYONE??
Nobody answered...
Problem
In receive_oscillate, yield from looks like it never clears.
Notes
Also, deeply nested functions and one liner compound conditionals are not idiosyncratic to Python. Furthermore, breaking up your deeply nested functions and adding automated unit tests will help reduce bugs and facilitate maintenance.
Related
I've got two different scenarios that use the same example block. I need to run the example block for two different times of the day and I'm looking for a succinct way to do this (without copy+pasting my example block).
I'm replacing the yyymmdd with an actual date in my stepdef.
I'd like to reuse my Example block because in real life it's a MUCH longer list.
Scenario Outline: File arrives in the morning
Given a file <file> arrives in the morning
When our app runs
Then The file should be moved to <newFile>
And the date should be today
Examples:
|Filename|NewFilename|
|FileA|NewFileA_yyyymmdd|
|FileB|NewFileB_yyyymmdd|
Scenario Outline: File arrives in the evening
Given a file <file> arrives in the evening
When our app runs
Then The file should be moved to <newFile>
And the date should be tomorrow
Examples:
|Filename|NewFilename|
|FileA|NewFileA_yyyymmdd|
|FileB|NewFileB_yyyymmdd|
I'm implementing this in java, though I don't know if that's a relevant detail.
No, this is not supported in the Gherkin syntax. I don't often advised copy-and-paste, but this is one case where it is warranted due to a missing feature of the language.
Generally this should not be a big deal, as the example size should be small. It you really need a large number of examples then recreating this test in code only (Java, Python, C#, etc.) might be the best idea. Most unit test libraries offer some form of data driven tests that might provide a DRYer, more maintainable solution than Gherkin.
This is something thats better tested at a lower level. What you are testing here is your file renaming algorithm. You could write a unit test to do this which would
run much much faster (100 1000 or even 10K times faster is perfectly realistic)
be much more expressive
deal with edge cases better
Once you have that done I would write a single scenario that deals with the whole end to end process, and just ensures that the file is moved and renamed e.g.
Given a file has arrived
When our app runs
Then the file should be moved
And it should be renamed
And the new name should contain the current date
Cukes are expensive to create and particularly to run, so you need to get lots of functionality exercised for each one. When you use outlines and create identical scenarios you are just wasting loads of runtime and adding complexity for little benefit.
I have been trying to understand how ACO optimization can be implemented with data parallelism. I have read some content after searching in Google. I only need the basic idea in simple way. Most of the papers are talking about everything else instead of the main thing in simple words.
What I understood so far is, we will make it work parallel by using multi-tasking(threading). But am not sure what each thread would do or how we could separate it into threads without causing trouble.
Does it means that we should create separate thread for each ants? But that would cause lots of threads to be created! So if there are 200 ants, then 200 threads?
Am still having confusion at this data parallelism topic in ACO. I would really love to hear in simple words on how we would implement it parallely.
A few simple ideas to run ACO in parallel
Since you have already read up on ACO, here are a few simple ideas on ways to run ACO in parallel. Rather than getting caught up in multiple-threads and mutli-tasking, it might be helpful to think in terms of 'parallel compute resources' at your disposal.
ACO is one case of Agent-Based Simulation (ABS), and ABS lends itself particularly well to parellization.
Simple Options
Option 1. Run a full version of ACO in each of the parallel resources.
Code your ACO algorithm, run it in parallel fashion. (Since there is a stochastic element to the algorithm, you can then look for the 'best' solution for your problem.)
Option 2. To explore effects of varying ACO parameters
Like any simulation approach, any ACO implementation has a large number of runtime parameters: Number of vertices, time to run, Number of ants, Pheromone evaporation rates, probability functions to choose path options and many more. When you mutliply these options, they add up to some large number of cases to be run. Divide up the work among your parallel compute resources.
The two options mentioned above are sometimes referred to as 'embarrassingly' parallel. Very easy to implement (think of it as a Design of Experiments) and you get back a whole matrix of results, and you can make conclusions by studying what effect the changes in the parameters had on the solution.
Option with solution sharing
Option 3: Master-Slave approach, with Partial Solution sharing
Going up one more level in complexity, we can use each node to contribute its 'knowledge/findings' to the overall problem solution. This is sometimes called a master-slave approach. The master is trying to solve the overall problem (Could be TSP, or some similar complex problem) and each 'slave' is solving some aspect of it, but with some fairly simple algorithm. The idea is that when combined they produce powerful results.
After a certain number of iterations, the solutions are passed back and forth, with 'bad' solutions thrown out. Some variant of the Map-Shuffle-Reduce paradigm would do that. The master evaluates the current best solution, and that is transferred back to each 'slave' node (Example: the latest overall pheromone levels are given to all the slave nodes). The next round of solving resumes.
Option 3 has tons of nuanced variations, and some people spend their entire lives improving various aspects of it.
Hope some of these ideas help.
I am looking for a stable, free, easy to use, tool for generating decision table. TestCaseGenerator is exactly what I'm looking for, but is far from being stable, and if I have thousands of test cases it stops generating the test case. DecisionTableCreator is another example, but is not working if you have too many conditions.
I spent long time searching for such tool which I am sure must exist (I don't think TDD can do without such tool).
10x,
Sharon
The decision-table code generator, CCIDE ( http://twysf.users.sourceforge.net/ ) might be what you're looking for.
TestCaseGenerator is exactly what I'm looking for, but is far from being stable, > and if I have thousands of test cases it stops generating the test case.
How much cases are you need? I'm using http://decision-table.com - it could generate 16300 cases on my low-end computer. I guess more RAM could give you more cases.
By the way, why it is need so much test cases? I mean, 15+ conditions could possibly be splitted in two/tree/four test suits, so your decision tables will be smaller. May be it'll be helpfull to look at pairwise testing - technique of reducing number of test cases without leveraging decreasing of test coverage
I recently inherited a large codebase and am having to read it. The thing is, I've usually been the dev starting a project. As a result, I don't have a lot of experience reading code.
My reaction to having to read a lot of code is, well, umm to rewrite it. But I need to bring myself up to speed quickly and build on top of an existing system.
Do other people have techniques they've learned to absorb a code base? At this point, I'm just reading through the code. I've tried generating UML diagrams using UModel. They're so big they won't print cleanly and when I zoom in, I really do lose the perspective of seeing all the relationships.
How have other people dealt with this problem?
Wow - I literally just finished listening to a podcast on reading code!!!
http://www.pluralsight-training.net/community/blogs/pluralcast/archive/2010/03/01/pluralcast-10-reading-code-with-alan-stevens.aspx?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+pluralcast+%28Pluralcast+by+Pluralsight%29
I would recommend listening to this. One interesting point that was made that I found radical and may be something you could try (I know I'm going to!). Download the entire source code base. Start editing and refactoring the code then...throw that version away!!! I think with all the demands that we have with deadlines that doing this would not even occur to most developers.
I am in a similar position to you in my own work and I have found the following has worked for me:
- Write test cases on existing code. To be able to write the test case you need to be able to understand the cde base.
- If it is available, look at the bug\issues that have been documented through the life cycle of the product and see how they were resolved.
- Try and refactor some of the code - you'll probably break it, but that's fine you can throw it away and start again. By decomposing the code into smaller problems you'll understand it bettter
You don't need to make drastic changes when refactoring though. When your reading the code and you understand something, rename the variable or the method names so the better reflect the problem the are trying to solve.
Oh and if you can, please get a copy of Working Effectively with Legacy Code by Michael C. Feathers - I think you'll find it invaluable in your situation.
Good luck!
This article provides a guideline
Visualization: a visual representation of the application's design.
Design Violations: an understanding of the health of the object
model.
Style Violations: an understanding of the state the code is currently
in.
Business Logic Review: the ability to test the existing source.
Performance Review: where are the bottlenecks in the source code?
Documentation: does the code have adequate documentation for people
to understand what they're working on?
In general I start at the entry point of the code (main function, plugin hook, etc) and work through the basic execution flow. If its a decent code base it should be broken up into decent size chunks, and you can then go through and figure out what each chunk of the code is responsible for. Looking back at when in the execution flow of the system its called.
For the package/module/class exploration I use whatever doxygen will generate once its been run over the sources. It generates some nice class relation diagrams, inheritance hierarchies and file dependencies graphs. The benefit of these is they are each focused on a single class and how it ties it with its neighbors, siblings and parents, so the graphs are usually of manageable size and easy to understand.
As you understand what different, classes, functions and sub-systems do I like to add comments to fill what sounds like obviously missing documentation. This helps you when you re-read through the code the second time.
I would recommend another podcast and resources:
SE-Radion episode on Software Archeology
What are some ways in which I can write code that is easily modified?
The one I have learned from experience is that I almost always need to write one to throw away. That way I have developed a sense of the domain knowledge and program structure required before coding the actual application.
The general guidelines are offcourse
High cohesion, low coupling
Dont repeat yourself
Recognize design patterns and implement them
Dont recognize design patterns where they are not existing or necassary
Use a coding standard, stick to it
Comment everyting that should be commented, when in doubt : comment
Use unit tests
Write comments and tests before implementation, that way you know exactly what you want to do
And when it goes wrong : refactor, refactor, refactor. With good tests you can be sure nothing breaks
And oh yeah:
read this : http://www.pragprog.com/the-pragmatic-programmer
Everything (i think) above and more is in it
I think your emphasis on modifiability is more important than readability. It is not hard to make something easy to read, but the real test of how well it is understood comes when someone else (or you) has to modify it in repsonse to changing requirements.
What I try to do is assume that modifications will be necessary, and if it is not really clear how to do them, leave explicit directions in the code for how to do them.
I assume that I may have to do some educating of the reader of the code to get him or her to know how to modify the code properly. This requires energy on my part, and it requires energy on the part of the person reading the code.
So while I admire the idea of literate programming, that can be easily read and understood, sometimes it is more like math, where the only way to do it is for the reader to buckle down, pay close attention, re-read it a few times, and make sure they understand.
Readability helps a lot: If you do something non-obvious, or you are taking a shortcut, comment. Comments are places where you can go back and refactor if you have time later. Use sensible names for everything, makes it easier to understand what is going on.
Continuous revision will let you move from that first draft to a better one without throwing away (too much) work. Any time you rewrite from scratch you may lose lessons learned. As you code, use refactoring tools to eliminate code representing areas of exploration that are no longer needed, and to make obvious things that were obscure. The first one reduces the amount that you need to maintain; the second reduces the effort per square foot. (Sqft makes about as much sense as lines of code, really.)
Modularize appropriately and enforce encapsulation and separation of logic between your modules. You don't want too many dependencies on any one part of the code or that part becomes inherently harder to understand.
Considering using tried and true methods over cutting edge ones. You give up some functionality for predictability.
Finally, if this is code that people will be using before and after modification, you need(ed) to have an appropriate API insulating your code from theirs. Having a strong API lets you change things behind the scenes without needing to alert all your consumers. I think there's a decent article on Coding Horror about this.
Hang Your Code Out to D.R.Y.
I learned this early when assigned the task of changing the appearance of a web-interface. The code was in C, which I hated, and was compiled to a CGI executable. And, worse, it was built on a library that was abandoned—no updates, no support, and too many man-hours put into its use to change it. On top of the framework was a disorderly web of code, consisting of various form and element builders, custom string implementations, and various other arcane things (for a non-C programmer to commit suicide with).
For each change I made there were several, sometimes many, exceptions to the output HTML. Each one of these exceptions required a small change or improvement in the form builder, thanks to the language there's no inheritance and therefore only functions and structs, and instead of putting the hours in the team instead wrote these exceptions frequently.
In my inexperience I was forced to change the output of each exception, rather than consolidate the changes in an improved form builder. But, trawling through 15,000 lines of code for several hours after ineffective changes would induce code-burn, and a fogginess that took a night's sleep to cure.
Always run your code through the DRY-er.
The easiest way to modify a code is NOT to write code. Write pseudo code not just for algo but how your code should be structured if you are unsure.
Designing while writing code never works...for me :-)
Here is my current experience: I'm working (Java) with a kind of database schema that might often change (fields added/removed, data types modified). My strategy is to parse this schema and to generate the code with apache velocity. The BaseClass generated is never modified by the programmer. Else, a MyClass extends BaseClass is created and the logical components of this class (e.g. toString() ! )are implemented using the 'getters' and the 'setters' of the super class.