visiting all nodes/subnodes of an Ecore model generated by a DSL

I know the basics of Xtext, and I have written a grammar (which works fine). I know that an Ecore model is generated, but I have trouble visiting all the rules/sub-rules (nodes) of the grammar in the AST programmatically.
Whenever I apply validation checks (or anything else), I find it difficult to navigate to the particular rule the check should be applied to (to some extent I manage by trial and error).
I want to ask: is there any way to print all the nodes and sub-nodes of your grammar programmatically, just to clarify in my mind how each node is accessed?
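For reference, EMF (which Xtext-generated models are based on) gives every EObject an eAllContents() tree iterator over its containment tree, so printing every node amounts to a depth-first walk. Here is a sketch of that walk in JavaScript, purely to illustrate the traversal; the node shape and type names are invented:

```javascript
// Depth-first walk over a tree of model nodes, printing each node's type
// indented by depth; the same shape as iterating EObject.eAllContents() in EMF.
function printAllNodes(node, depth = 0, out = []) {
  out.push("  ".repeat(depth) + node.type);
  for (const child of node.children || []) {
    printAllNodes(child, depth + 1, out);
  }
  return out;
}

// Hypothetical miniature AST:
const model = {
  type: "Model",
  children: [
    { type: "Rule", children: [{ type: "Terminal" }] },
    { type: "Rule" },
  ],
};
printAllNodes(model);
// → ["Model", "  Rule", "    Terminal", "  Rule"]
```

In real Xtext validation code you would instead iterate the typed getters that the generated Ecore model gives you, or use the generic tree iterator when you just want to see everything.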


Best practices for creating a customized report based on user form input?

My Question
What are the best practices for creating a customized report based on user form input? Specifically, how do I create an easy-to-maintain system that takes user input collected in a form and generates multiple paragraphs explaining the results of an analysis?
Background
I am working on a very large multiyear project with a startup (my client). My job is to program the analysis and generate reports for users. The data pipeline looks like this:
Users enter information into a form -> results are calculated based on user input -> reports are displayed to users that share analysis.
It is really important to my client that some of the analysis results are displayed as paragraphs in an informal, user-friendly tone. The challenge is that the form and the analysis are quite complex and will only get more complex over time. An example of the template for such paragraphs looks something like this:
resultsParagraphText=`Hi ${userName}. We found that the best ice cream flavour for you is ${bestIceCreamFlavor}. These other flavors ${otherFlavors} might be good for you. Here are the reasons why you might enjoy these flavors: ${reasonsWhyGoodFlavors}.
However we would not recommend these other flavors ${badFlavors}. Here are the reasons you should avoid these bad flavors: ${reasonsWhyBadFlavors}.`
These results paragraphs, of which there are many, have several minor problems which, combined, are significant:
If there is a bug in the code, minor visual errors become visible to end users (capitalization errors, missing/extra commas, and so on).
A lot of string comparisons (e.g. if (answers.previousFlavors.includes("Vanilla"))) are required to generate the results paragraphs. Minor inconsistencies in the forms (e.g. vanilla in the form is not capitalized, so answers.previousFlavors.includes("Vanilla") returns false even when the user enters vanilla) can cause errors in the results paragraph.
Changes in different parts of the project (form, analysis) directly affect how the results paragraph is built. Bad types, differences in string values, and null or undefined values that are not caught have a direct impact on it.
There are many edge cases (e.g. what if the user has no other suitable flavors? Then the sentence These other flavors ${otherFlavors} might be good for you. needs to be excluded).
It is hard to write paragraphs that use templates yet keep a non-formal tone.
and so on.
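Two of these problems (fragile string comparisons and edge-case sentences) can at least be contained by normalizing values once and assembling the paragraph from optional parts. A sketch, reusing the field names from the example above:

```javascript
// Normalize once so "Vanilla" vs "vanilla " can't silently break comparisons.
const norm = (s) => s.trim().toLowerCase();

function flavorParagraph({ userName, bestFlavor, otherFlavors, badFlavors }) {
  const parts = [
    `Hi ${userName}.`,
    `We found that the best ice cream flavour for you is ${bestFlavor}.`,
  ];
  // Edge case: skip the sentence entirely when there are no other flavors.
  if (otherFlavors.length > 0) {
    parts.push(`These other flavors might be good for you: ${otherFlavors.join(", ")}.`);
  }
  if (badFlavors.length > 0) {
    parts.push(`However, we would not recommend: ${badFlavors.join(", ")}.`);
  }
  return parts.join(" ");
}

// All comparisons go through one helper instead of ad-hoc .includes() calls.
const hasTried = (answers, flavor) =>
  answers.previousFlavors.map(norm).includes(norm(flavor));
```

This does not solve the maintainability problem by itself, but it centralizes the two failure modes so a bug has one place to live instead of one per paragraph.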
I have charts and other ways to display results, and I have explained to the client the challenges of sharing the information in paragraph form.
What I am looking for
I need examples, how-tos, and best practices on how to build a maintainable system for generating customized paragraphs based on user input. I know how to solve each individual issue (they are fairly simple), but in a large project this will become very hard to maintain.
Notes
I have no clue what tags to use for the post. Feel free to edit/add tags if you know more appropriate ones.
The project plans to use machine learning in other parts in the future. If there is an ML/AI solution that would be useful, please tell me.
I am working primarily in JavaScript, Python, C, and R, but if there is a library or tool in any other language, please tell me. Finding a solution is very important to me, and I would be willing to learn a lot to find the best one.
To avoid this question being removed, I have rephrased it to ask for existing examples or how-tos instead of personal opinion. I can also imagine that others might find a solution fairly useful. If you can edit it to make the question less subjective, please do so.
If you have any questions or need clarification feel free to ask. Any help is appreciated.

Handle different document layouts using Kofax

I am new to the Kofax TotalAgility solution, but I am well aware of OCR, OMR and recognition mechanisms.
I have two forms in one folder, A and B.
Both of them are identical, but due to manual scanning there is a slight axis shift, say a 20-pixel right shift, so the layout differs slightly.
The layouts of image A and image B are different; the position of the form on the page is not fixed.
I know other solutions, like ABBYY FineReader, provide FlexiLayout, where we can handle this by finding text and setting left/right/top/bottom anchors to automatically identify zones.
As I have just started learning Kofax TotalAgility, I am unaware of all the options provided by Kofax Transformation Designer.
My question is: which locator should I use? I am currently working with the Advanced Zone Locator, and for the document I set as the reference (image A) extraction works properly. But for the other (image B), due to the layout mismatch, the text/box fields are not extracted.
Can anyone point me in the right direction for handling this case properly?
I know I am asking for a direct option/solution; any help is highly appreciated.
In general, Kofax Transformations has two groups of locators:
Deterministic. You tell the locator precisely what to do, and how to do it (similar to an imperative approach when programming)
Probabilistic. You just tell your locator what to extract, and it works out the rest (based on AI).
Here's a (non-exhaustive) diagram I created the other day:
When working with forms, you might be tempted to rely on forms-specific locators such as the Advanced Zone Locator. While this locator can account for fields "moving around", for example due to images being jolted, zoomed, or distorted, there are certain limitations. Other locators don't have these limitations - the format locator for example allows you to define a certain pattern (a Regular Expression) that should be matched along with a keyword that has to be found somewhere around that pattern.
For your example, you could create a regex like M|F|X, and then define "Gender" as the keyword that needs to be present on the left.
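Outside of Kofax (the locator itself is configured in Transformation Designer, not coded), the pattern-plus-keyword idea can be illustrated in a few lines of JavaScript:

```javascript
// Sketch of the format-locator idea: accept a value matching a pattern
// only when the given keyword appears immediately to its left.
function locateByFormat(lineText, keyword, pattern) {
  const re = new RegExp(`${keyword}\\s*:?\\s*(${pattern})\\b`);
  const m = lineText.match(re);
  return m ? m[1] : null; // the captured value, or null when not found
}

locateByFormat("Gender: F  Name: Doe", "Gender", "M|F|X"); // → "F"
locateByFormat("Name: Doe", "Gender", "M|F|X");            // → null
```

The real locator works on recognized OCR text and positional data rather than a single string, but the matching logic it is configured with has this shape.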
However, any locator that's ruled by determinism follows Murphy's law - at some point that keyword might change. There could be different languages. And maybe additional letters for certain genders might be added; ultimately breaking your extraction logic.
Enter AI - while Murphy's law still applies when using Group Locators, the difference here is that users can train the system to pick up the new data. Said locator will automatically work out the best way to extract that piece of data. If you used a format locator, the customer would need to get back to you to add additional expressions, or have the keywords changed.
In your particular case, I'd try to use a Trainable Group Locator first. If you already know what you're looking for - for example SSNs that you have somewhere in a database, go for the Database Locator. Use Format Locators as a last resort, as tempting as they may be. Advanced Zone Locators are useful when you deal with forms, but I find myself using them almost exclusively for handprint or checkbox recognition.

Tired of web development, looking for extreme ways to avoid boilerplate code - a meta-framework? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Having written quite a few apps with modern web frameworks (Symfony2, Laravel, Rails, expressjs, angularjs...) I can't help but think that about 90% of the time I spend developing is spent on writing CRUDs. Then I spend 10% of the time doing the interesting part: basically defining how models should behave.
This is demeaning, I want to reverse the ratio.
The above mentioned frameworks almost all go out of their way to make the boilerplate tasks easier, but still, it is not as easy as I feel it should be, and developer brain time should be devoted to application logic, not to writing boilerplate code.
I'll try to illustrate the problem with a Rails example:
Scaffold a CRUD (e.g. rails generate scaffold user name:string email:string password:string)
Change a few lines in the scaffolded views (maybe the User needs a Role, but scaffold doesn't know how to handle it)
... do other things ...
Realize you wanted to use twitter bootstrap after all, add the most magical gems I can find to help me and...
Re-scaffold my User CRUD
Re-do the various edits I performed on the views, now that they've been overridden by scaffold
...
And this will go on and on for a while.
It seems to me that most magic tools such as rails generate will only help you with initial setup. After that, you're on your own. It's not as DRY as it seems.
I'll be even more extreme: I, as a developer, should be able to almost build an entire project without worrying about the UI (and without delegating the task to someone else).
If in a project I need Users with Roles, I would like to be able to write just, say, a .json file containing something along the lines of:
{
    "Schema": {
        "User": {
            "name": "string",
            "email": "email (unique)",
            "password": "string",
            "role": "Role"
        },
        "Role": {
            "name": "string (unique)"
        }
    }
}
I would then let the framework create the database tables, corresponding views and controllers. If I want to use bootstrap for the UI, there would be a setting to toggle this in the master .json file. And at no point would I edit a single view. If later I add a field to User or want to change the UI style, I just edit the master .json file.
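As a toy illustration of the first step (there is no real framework behind this; the type mapping and the "unknown type means reference" convention are invented for the sketch), a few lines of JavaScript can already turn such a schema object into table definitions:

```javascript
// Toy generator: derive CREATE TABLE statements from the schema above.
const typeMap = { string: "VARCHAR(255)", email: "VARCHAR(255)" };

function generateTables(schema) {
  return Object.entries(schema).map(([table, fields]) => {
    const cols = Object.entries(fields).map(([name, spec]) => {
      const base = spec.split(" ")[0];                      // "email (unique)" -> "email"
      const unique = spec.includes("(unique)") ? " UNIQUE" : "";
      // Unknown types are treated as references to another entity.
      const sqlType = typeMap[base] || "INTEGER REFERENCES " + base.toLowerCase() + "(id)";
      return `  ${name} ${sqlType}${unique}`;
    });
    return `CREATE TABLE ${table.toLowerCase()} (\n  id SERIAL PRIMARY KEY,\n${cols.join(",\n")}\n);`;
  });
}
```

The hard part, as the question notes, is not this generation step but keeping hand-written customizations alive across regenerations.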
This would open the way for a creative branch of UX design. I could, for instance, assign an importance flag to each field of the User model and let a clever designer write a plugin that designs forms whose layout is optimized by the relative importance of fields. This is not my job, and it is not the UX designers' jobs to rewrite the same thing a 100 times over for different projects, they should write "recipes" that work on general, well specified cases.
I have a feeling that the MVC pattern, with all its virtues, has decoupled the view from the model too much. How many times have you had to write validation code twice: once server-side and once client-side, because you wanted richer feedback? There is much information to be gotten from the model, and you should be able to get client-side validation just by setting a property on the model telling the framework to do so.
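A hedged sketch of that single-source idea: one declarative rule set, shared between server and client, run by the same validate function (all names here are invented for illustration):

```javascript
// One rule set, usable on both server and client (e.g. shared as a module).
const userRules = {
  name:  { required: true, maxLength: 50 },
  email: { required: true, pattern: /^[^@\s]+@[^@\s]+$/ },
};

// Returns a map of field name -> error message; empty object means valid.
function validate(rules, data) {
  const errors = {};
  for (const [field, rule] of Object.entries(rules)) {
    const value = data[field];
    if (value === undefined || value === "") {
      if (rule.required) errors[field] = "required";
    } else if (rule.maxLength && value.length > rule.maxLength) {
      errors[field] = "too long";
    } else if (rule.pattern && !rule.pattern.test(value)) {
      errors[field] = "invalid format";
    }
  }
  return errors;
}
```

The server treats the result as authoritative; the client reuses the same rules purely for instant feedback, so the two can never drift apart.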
You may say that Rails scaffold is close to what I'm imagining, sure. But scaffolding is only good at the beginning of the project. Even with scaffolding, I need to rewrite many things to, say, only allow an Admin to change a User's role.
The ideal framework would provide me with a simple hook to define whether the action of changing the Role field on a User is allowed or not, and the UI would automagically display the error to the user if I return it properly from the hook.
Writing a User system should only take a few minutes. Even with things like devise for Rails or FOSUserBundle for Symfony2 it takes a huge, unnecessary amount of configuring and tuning.
The meta-framework I have in mind could be, in theory, used to generate the boilerplate code for any modern web framework.
I would like my development workflow to look like this:
Create app.json file defining the models and their relationships, + the hooks I want to specialize
Run a command like generate_app app.json --ui=bootstrap --forms=some_spectacular_plugin --framework=rails4
Implement the hooks I need
Done.
The resulting app would then update itself whenever app.json changes, and adding entities, fields, custom logic should never be harder than writing a few lines of JSON and implementing the interesting parts of the logic in whatever the target language is.
I strongly believe that the vast number of frameworks out there address the wrong question: how to write little bits of unconnected code more efficiently. Frameworks should ask: what is an application and how can I describe it?
Do you know of any projects that would be going in this direction? Any pointers to literature on this?
I have been confronted a couple of times with similar development fatigue: boilerplate over and over. For me, boilerplate is everything that does not bring any added business value (static: project setup; contextual: CRUD (backend, frontend), drop-down lists, sub-use-cases for assignment, etc.).
The approach you present (command-line generation of Rails scaffold artifacts) has the following pitfalls: it is incomplete and brittle under maintenance.
Generation spreads the same (redundant) information over different types of artifacts (DB storage, presentation layer, persistence layer, etc.).
Furthermore, consecutive generations override your changes.
To solve this inconvenience, I see only two solutions:
Do not use a generator, but a generic framework that manages persistence and presentation aspects in one central place. In Java there is OpenXava, which is designed for exactly that. It works with annotations (persistence and presentation), and it also answers your validation question with stereotypes.
Use a generator with updatable generated code.
Minuteproject has an updatable-generated-code feature: you can modify generated parts, and the next generation keeps your modifications.
Meanwhile, neither of those solutions matches your technology target (Rails).
Minuteproject can be extended to generate for any text-based language (Java, Ruby, Python, C, JS, ...).
Minuteproject generates from a DB structure, query statements, or transfer-object definitions; see the productivity facet for analysts.
Remark: Minuteproject also provides a reverse-engineering solution for OpenXava.
Regarding the sequence you propose:
Create model, relationships + hooks: I would rather consider the DB model as the central place and add the hooks (not yet present) through model enrichment (Minuteproject proposes a dedicated way to enrich the model with conventions).
I would rather go for reverse engineering than forward engineering, for the following reasons:
Correct DB storage is too crucial to be generated:
Forward engineering cannot generate views, stored procedures, functions, etc.
Forward engineering may not tune your persistence model (tablespaces) correctly.
Generate by picking your technology target.
Implement the hooks in updatable-generated-code sections, so that at the next generation (when the model structure has changed or new hooks are to be implemented), your previous hook implementations are kept.
I have created some to-be-implemented (hook) examples for OpenXava (http://minuteproject.blogspot.be/2013/08/transient-definition-as-productivity.html).
But the Minuteproject approach is technology-agnostic, so artifacts could just as well be generated for other frameworks.

about semantic search

I am a "rookie" in the Semantic Web, so a lot of things confuse me right now. I am going to build a semantic search for a website, but I am not sure what the workflow should be.
I just have a basic idea.
Please correct me:
Use a web spider to get web resources, and put those resources in files.
Parse those resource files (lexical analysis) and use the RDF format to describe those resources (at this point, the RDF contains the ontologies, which are about the resources).
Parse the RDF files (containing the resources) and use OWL (combined with an inference mechanism) to describe the ontologies in the RDF files.
Semantically analyze the user input (from the search text box), match it against the OWL files, then against the RDF resource files, and provide the related results.
Please give me suggestions and correct me.
See this resource for your engine.
You should learn to search for and use existing resources (ontologies and, more generally, APIs) that allow you to reuse semantic annotations on data (Linked Data, see here). Also, if you fetch web resources, don't copy them into files; reference the origin instead, because copying changes the semantics of the links. Knowledge evolves over time...
As for the semantic analysis, it can be a difficult task. Before you start implementing it yourself, check whether there is an API out there that fits your bill.

Hackproofing the site?

I don't know how to make my site hackproof. I have inputs where people can enter information that gets published on the site. What should I filter, and how?
Should I disallow script tags? (The issue is: how would they embed YouTube videos on the site then?)
Iframes? (People can put inappropriate sites in iframes...)
Please let me know some ways I can prevent issues.
First of all, run the user's input through a strict XML parser.
Reject any invalid markup.
You should use a whitelist of HTML tags and attributes (in the parsed XML).
Do not allow <script> tags, <iframe>s, or style attributes.
Run all URLs (href and src attributes) through a URI parser (e.g. .NET's Uri class), and ensure that the protocol is http, https, or perhaps mailto. Again, reject any invalid URLs.
If you want to allow YouTube embedding, add your own <youtube> tag that takes a URL or video ID as a parameter (content or attribute), and transform it into a script on the server (after validating the parameter).
After you finish, make sure that you're blocking everything on this giant list.
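The URL-validation step above can be sketched with the WHATWG URL class (available in browsers and modern Node); the protocol whitelist matches the one suggested:

```javascript
// Accept only http, https, or mailto URLs; reject anything unparsable.
const SAFE_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

function isSafeUrl(raw) {
  let url;
  try {
    url = new URL(raw); // throws on invalid or relative URLs
  } catch {
    return false;
  }
  return SAFE_PROTOCOLS.has(url.protocol);
}

isSafeUrl("https://example.com/page"); // true
isSafeUrl("javascript:alert(1)");      // false
```

Note that relative URLs are rejected too, because new URL throws without a base; for user-supplied links that is usually what you want.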
There is no such thing as hacker-proof. You want to do everything you can to decrease the possibility of being hacked. The most obvious weaknesses to cover are XSS (cross-site scripting) and SQL injection attacks. There are easy ways to avoid both, most notably using newer technologies that guard against them by design (text output that is encoded by default, parameterized queries, etc.).
If you need to go beyond that level, there is a whole range of services, from automated scans (mostly fuzzy numbers you can hand your sales guys once everything is "good") to hard-core analysts who will pick apart your system in various audits.
Other than the basics mentioned above (XSS and SQL injection), the level of security you should try to attain really depends on your market.
Didn't see this mentioned explicitly, but also use fuzzers ( http://en.wikipedia.org/wiki/Fuzz_testing ).
It basically shoves random data (strings of varying characters and lengths) into your input fields; it's used in industry practice because it finds lots of bugs (e.g. overflows).
http://www.fuzzing.org/ has a list of great fuzzers for you to try.
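A minimal random-string fuzzer of the kind described (a toy sketch, not a substitute for the tools listed above):

```javascript
// Feed random strings into a handler and collect every input that makes it throw.
function fuzz(handler, { runs = 1000, maxLen = 100 } = {}) {
  const failures = [];
  for (let i = 0; i < runs; i++) {
    const len = Math.floor(Math.random() * maxLen);
    let input = "";
    for (let j = 0; j < len; j++) {
      // Random code units, including control characters and non-ASCII.
      input += String.fromCharCode(Math.floor(Math.random() * 0xffff));
    }
    try {
      handler(input);
    } catch (err) {
      failures.push({ input, error: String(err) });
    }
  }
  return failures;
}
```

Real fuzzers add mutation of known-good inputs, coverage feedback, and crash deduplication; this only shows the core loop.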
You can check out a penetration testing framework like ISAAF. It gives you a checklist and a methodology for testing the important security aspects of your application.
