Liferay build language issue (Automatic copy)

I am trying to build languages in Liferay. Language_en.properties works fine, but for other languages the key values are not translated into the respective languages; it only shows the English values prefixed with "(Automatic copy)".
Any help on this is appreciated.
Thanks.

Liferay's translation is done via Microsoft Translator. You'll need to sign up for an account on Azure Marketplace and configure it according to the documentation. Note that this translation is often only a first step and will require you to correct the results manually: the typically small snippets often don't have enough context for the algorithm to choose the appropriate translation.
Any automatic translation will be flagged as "(Automatic Translation)" (the flag appears in the properties files but is not shown in the UI). You can use it to find and double-check entries that have only been machine-translated, deleting the flag once you've corrected the translation.
Note that some languages have deliberately been disabled from automatic translation, as the algorithm was deemed to produce more incorrect, rude, hilarious, or offensive translations than one would like to have. For those languages, Liferay relies on manual translation even when you have set up an account for automatic translation. You have the option to change those settings, of course, but you'll have to willingly change the code that I'm linking to.
My personal opinion: don't go with automatic translation. It's rarely correct, and more often rude, hilarious, or plain wrong.
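To illustrate the flag mechanics described above (the key and the German strings here are invented for the example, not taken from Liferay's actual bundles), a machine-translated entry in a Language_xx.properties file looks roughly like this, and removing the suffix marks it as manually verified:

```properties
## Language_de.properties -- hypothetical key, for illustration only
## As generated by the automatic translation:
save-changes=Änderungen speichern (Automatic Translation)
## After a human has reviewed and corrected the entry, drop the flag:
save-changes=Änderungen speichern
```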

Related

Tools for Domain Specific Language/Functions

Our users can enter questions that get answered by students. Our users need an extensible, flexible way to define the correct answers to these questions (which are stored as a simple string).
I would like to expose a library of domain specific functions that users can call on to describe the correct answer. Eg:
exact_match("puppy") // means the correct answer is the string 'puppy'
or
contains("yesterday") // means any answer with the word 'yesterday' is correct
The naive implementation would involve eval'ing user supplied strings in a sandboxed runtime (like a javascript vm or ruby vm). But I'd like to go further and only allow specific functions to be called. Any other scripting would be discarded. Such that:
puts("foo"); contains("yesterday")
would be illegal. Since we don't expose or allow puts().
How can I constrain the execution environment to only run a whitelist of functions? Or is there a different approach to build this kind of external-facing DSL instead of trying to constrain an existing language to a subset of functions?
I would check out MPS by JetBrains if I were you; it's an open-source DSL creation tool. I have never used it myself, but from everything I have seen of it, it's very intuitive, and all of their other products are incredibly powerful.
Just because you're creating a DSL, that doesn't necessarily mean that you have to give the user the ability to enter the code in text.
The key to this is providing a list of method names and your special keyword for them, the "FunCode" tag in the code example below:
Create a mapping from keyword to code, let the users define everything they need, and then use it. I would actually build my own XML parser so that it's not hackable, at least not via the usual list of zero-day exploits.
<strDefs>
<strDef><strNam>sickStr</strNam>
<strText>sick</strText><strNum>01</strNum></strDef>
<strDef><strNam>pupStr</strNam>
<strText>puppy</strText><strNum>02</strNum></strDef>
</strDefs>
<funDefs>
<funDef><funCode>pfContainsStr</funCode><funLabel>contains</funLabel>
<funNum>01</funNum></funDef>
<funDef><funCode>pfXact</funCode><funLabel>exact_match</funLabel>
<funNum>02</funNum></funDef>
</funDefs>
<queries>
<query><fun>01</fun><str>02</str>
</query>
</queries>
The XML above represents the idea and structure of what to do, but it would be realized through a user interface, so the user is constrained. The user-interface code that allows data entry of the above data should run on your server, and users only interact with it. Any code that runs in their browser is hackable, because they can just save the page, edit the HTML (and/or JavaScript), and run that; it is their code now, not yours anymore.
You can't really open the door (Pandora's box) and allow just anyone to write just any code and have it evaluated/interpreted by the language parser, because some hacker is going to exploit it. You must lock down the strings, probably by having users enter them into your database in an earlier step, where each string gets its own token that YOU generate (a SQL Server primary key is simple, usable, and secure), but give them a display representation so it's readable to them.
Then give them a list of methods / functions they can use, along with a token (a primary key can also serve here, perhaps with a kind of table prefix) and also a display representation (label).
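As a sketch of this whitelist idea in Python (the function names mirror the question's DSL; `SAFE_FUNCS` and `compile_rule` are hypothetical names, not an existing library): parse the user's expression, reject anything that isn't a single whitelisted call with string-literal arguments, and dispatch to implementations you control.

```python
import ast

# Safe implementations, keyed by the DSL names from the question.
SAFE_FUNCS = {
    "exact_match": lambda expected: lambda answer: answer == expected,
    "contains":    lambda word:     lambda answer: word in answer.split(),
}

def compile_rule(source):
    """Parse a one-call DSL expression and return an answer-checking predicate."""
    # Statement sequences like 'puts("foo"); contains("x")' fail to parse
    # at all in "eval" mode, so they never reach the whitelist check.
    tree = ast.parse(source, mode="eval")
    call = tree.body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("only a single function call is allowed")
    name = call.func.id
    if name not in SAFE_FUNCS:
        raise ValueError(f"function {name!r} is not whitelisted")
    args = []
    for a in call.args:
        if not (isinstance(a, ast.Constant) and isinstance(a.value, str)):
            raise ValueError("arguments must be string literals")
        args.append(a.value)
    return SAFE_FUNCS[name](*args)

rule = compile_rule('contains("yesterday")')
print(rule("it rained yesterday"))   # True
print(rule("it rained today"))       # False
```

Because the expression is parsed rather than eval'd, nothing the user types ever executes directly: an unknown call such as `puts("foo")` is rejected by the whitelist check, and only your own implementations run.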
If you have them put all of their labels into yet another table, you can have SQL make sure that all of their labels are unique to each other in the whole "language", and then you can allow them to try to define their expressions in the language they want to use. This has the advantage that foreign languages can be used, but you don't have to do anything terribly special.
An important piece would be the verify button, that would translate their expression into unique tokens and back again, checking that the round-trip was successful. If it wasn't successful, there's some kind of ambiguity, and you might be able to allow them an option to use the list of tokens as the source in that case.
If you heavily rely on set-based logic for the underlying foundation of the language and your tables, you should be able to produce a coherent DSL that works. Many DSL creation problems are ones of integrity, where there are underlying assumptions that are contradictory, unintentionally mutually exclusive, or nonsensical. Truth is an unshakeable foundation. Anything else has a lie somewhere -- that you're trying to build on.
Sudoku is illustrative here. When you screw up a Sudoku, you often don't know that you have done so, and you keep building on that false foundation, until you get to the completion of the puzzle, and one whole string of assumptions disagrees with a different string of assumptions. They can't both be true. But you can't tell where you went wrong because you're too far away from the mistake and can not work backwards (easily). All steps taken look correct. A DSL, a database schema, and code, are all this way. Baby steps, that are double- and even triple-checked, and hopefully "correct by inspection", are the best way to "grow" a DSL, slowly, piece-by-piece. The best way to not have flaws is to not add them in the first place.
You don't want bugs in your DSL. Keep it spartan. KISS: Keep it simple, Spartacus! And I have personally found that keeping it set-based, if not overtly, then under the covers, accomplishes this very well.
Finally, to be able to think this way, I've studied languages for a long time, and have cultivated a curiosity about how languages have come to be. Books are a good quality source of information, as they have a higher quality level than the internet, which is nevertheless also an indispensable source. Some of my favorite languages: Forth, Factor, SETL, F#, C#, Visual FoxPro (especially for its embedded SQL), T-SQL, Common LISP, Clojure, and probably my favorite, Dylan, an INFIX Lisp without parentheses that Apple experimented with and abandoned, with a syntax that seems to me reminiscent of Pascal, which I sort of liked. The language list is actually much longer than that (and I haven't written code for many of them -- just studied them or their genesis), but that's enough for now.
One of my favorite books, and immensely interesting for the "people" side of it, is "Masterminds of Programming: Conversations with the Creators of Major Programming Languages" by Federico Biancuzzi and Chromatic (O'Reilly, Theory in Practice series).
By the way, don't let them compromise the integrity of your DSL -- require that it is expressible set-based, and things should go well (IMHO). I hope it works out well for you. Add a comment to my answer telling me how it worked out, if you think of it. And don't forget to choose my answer if you think it's the best! We work hard for the money! ;-)

Translation - Standard Software Menus and Terms Implementation

Most applications have quite a few predictable elements: "Home", "File", "About"...etc.
I was wondering if there is a standard "Helper" or fast implementation of translated common terms for applications.
Example:
Standard Software Menu Terms
-Parent Term 1
---French: Transliterated 1.
---Spanish: Transliterated 1.
---Chinese: Transliterated 1.
-Parent Term 2
---French: Transliterated 2.
---Spanish: Transliterated 2.
---Chinese: Transliterated 2.
/Standard Software Menu Terms
I thought some sort of object or XML could be utilized at installation or initialization. Given the evolution of modern software, I'd be surprised if some standard library of this sort didn't exist.
I searched quite extensively on this and can't seem to find anything.
Actually, there are standard translation "libraries". They are called Translation Memories or Terminology Databases.
Microsoft has a default terminology DB for all its applications, and it's open source.
https://www.microsoft.com/Language/en-US/Terminology.aspx
I think it's in the TMX or TBX format. Any real translation tool can handle them. I think OmegaT, which is open source, should be able to handle them too. Or you could try Transifex.
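As a sketch of how little code is needed to consume such a file (the sample TMX content below is invented; real terminology exports like Microsoft's follow the same tu/tuv/seg structure, just with far more entries):

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml:lang attribute to this namespaced form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny in-memory TMX sample for illustration.
TMX = """<?xml version="1.0"?>
<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>File</seg></tuv>
      <tuv xml:lang="fr"><seg>Fichier</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Home</seg></tuv>
      <tuv xml:lang="fr"><seg>Accueil</seg></tuv>
    </tu>
  </body>
</tmx>"""

def load_translations(tmx_text, target_lang):
    """Map English segments to target-language segments from a TMX document."""
    table = {}
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if "en" in segs and target_lang in segs:
            table[segs["en"]] = segs[target_lang]
    return table

print(load_translations(TMX, "fr"))  # {'File': 'Fichier', 'Home': 'Accueil'}
```

With a table like this loaded at startup, looking up standard menu terms is just a dictionary access.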
But if you work with a professional translation agency like e.g. Supertext they can do this for you.
Let me know if you have additional questions.
As far as I know, there is no standard library; probably because applications usually share some common menu elements but differ in others.
Nevertheless, there are many tools which ease the translation of culture-dependent content. For Microsoft .NET applications, for example, RESX Manager (my project) may help you. It builds up your own translation storage by extracting content from existing projects, so you can reuse common translations across several projects.

Command line software for testing accessibility

I've done some searching around and I can't seem to find any command line utilities out there that will allow me to evaluate accessibility on web pages.
Essentially I want to automate the process of wget'ing a large number of websites and evaluating their accessibility.
So I would have a cron job that would get all of the necessary pages and then run the evaluation software on them. The output would then be parsed into a website ranking accessibility.
Does anyone know of anything that may work for this purpose?
Thanks!
If only accessibility evaluation were that simple... Unfortunately, what you're looking for isn't reasonably possible.
The main issue is that it's not possible to evaluate accessibility by programmatic/automated means alone. There are certainly some things you can check for and flag, but it's rare that you can say they are either in error or correct with 100% accuracy.
As an example, take the issue of determining whether an IMG has suitable ALT text. It's impossible for a tool to determine whether the ALT text is actually meaningful in the overall context of the page: you need someone to look at the page to make that determination. But a tool can help somewhat: it can flag IMGs that don't have ALT attributes; or perhaps even flag those that have ALT attributes that look like filenames instead of descriptive text (a common error). But if there is already ALT text, it can't say for sure whether the ALT is correct and meaningful or not.
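A minimal sketch of that kind of flagging, using only Python's standard library (the heuristics and the `ImgAltChecker` name are my own for illustration, not from any existing accessibility tool):

```python
from html.parser import HTMLParser

class ImgAltChecker(HTMLParser):
    """Flag <img> tags with a missing ALT attribute, or ALT text that
    looks like a filename. An empty alt="" is deliberately not flagged,
    since it is valid for purely decorative images."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        alt = attrs.get("alt")
        src = attrs.get("src", "?")
        if alt is None:
            self.findings.append(f"{src}: missing alt attribute")
        elif alt.lower().endswith((".jpg", ".jpeg", ".png", ".gif")):
            self.findings.append(f"{src}: alt text looks like a filename")

checker = ImgAltChecker()
checker.feed('<p><img src="a.png"><img src="b.png" alt="b.png">'
             '<img src="c.png" alt="Chart of 2023 sales"></p>')
for finding in checker.findings:
    print(finding)
```

Note that the third image passes both checks even though a tool has no way to know whether "Chart of 2023 sales" actually describes the picture, which is exactly the limitation described above.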
It's similar with determining whether a page uses semantic markup correctly. If a tool sees that a page is not using any H1 or similar headers and only uses styles for formatting, that's a potential red flag. But if there are H1s and other headers present, it can't determine whether they are in the right meaningful order.
And those are just two of the many issues that pertain to web page accessibility!
The issue gets even more complicated with pages that use AJAX and JavaScript: it may be impossible to determine via code whether a keyboard user can access everything they need on a page, or whether a screenreader user will understand the controls that are used. At the end of the day, while automated tools can help somewhat, the only way to really verify accessibility in these cases is by actual testing: by attempting to use the site with a keyboard and also with a screenreader.
You could perhaps use some of the existing accessibility tools to generate a list of potential issues on a page, but this would make for a very poor rating system: these lists of potential issues can be full of false positives and false negatives, and are really only useful as a starting point for manual investigation - using them as a rating system would likely be a bad idea.
--
For what it's worth, there are some tools out there that may be useful starting points. There is an Accessibility Evaluation Toolbar Add-On for Firefox, but it doesn't actually do any evaluation itself; rather it pulls information from the web page to make it easier for a human to evaluate it.
There's also the Web Accessibility Evaluation Tool (WAVE); again, it focuses on making accessibility-relevant information in the page more visible so that the tool user can more easily perform the evaluation.
Also worth checking out is Cynthia Says, which does more of what could be called 'evaluation' in that it generates a report from a web page, but again it's only useful as a starting point for manual investigation. For example, if an IMG tag has empty ALT text, which is recommended practice if the image is purely cosmetic or is a spacer, then the generated report states "contains the 'alt' attribute with an empty value. Please verify that this image is only used for spacing or design and has no meaning." So it's flagging potential errors, but could flag things that are not errors, or possibly miss some things that are errors.
For other information on Web Accessibility in general, I can recommend the WebAIM (Accessibility In Mind) site as a good starting point for everything Web Accessibility related.
+1 to @BrendanMcK's answer... and for the part that can (and should (*)) be automated, I know of Tanaguru and Opquast.
Tanaguru is both a SaaS and free software. Based on the Accessiweb 2.1 checklist (which closely follows WCAG 2.0), it can audit single pages or thousands of them. You can try it for free here: http://my.tanaguru.com/home/contract.html?cr=17 > Pages audit
I never installed it on a server; there's a mailing list if you have difficulties installing this huge Java thing.
Opquast is a service that you can try for free for a one-page audit, but otherwise it isn't free.
It'll let you audit your site with a quality checklist (the Opquast one), a brand new "Accessibility first step" checklist (aimed at problems so obvious that they should be corrected before contacting an accessibility expert), and also the Accessiweb and RGAA accessibility checklists (both are based on WCAG 2.0, but I don't think RGAA has been translated from French to English).
EDIT 2014: Tenon.io is a fairly new API by K. Groves that is very promising.
(*) because it's tedious work, like checking whether images, area elements, and input[type="image"] lack an alt attribute... That is work better done by computers than by people. What computers can't do is evaluate whether alt text, when present, is poorly written or OK.

Open source spell check

I was evaluating adding spell check to a product I own. As per my research, the major decisions that need to be made are:
The library to use.
Dictionary (this can be region-specific: British English, American English, etc.).
Exclusion lists. Any time a typo is detected, it's possible that it's not a typo but verbiage specific to the user. At this point the user should be given the ability to add it to their custom exclusion list.
Besides a per-user custom list, a list of exclusions based on the user space of the tool's clients; that is, terms/acronyms in the users' work domain. For example, FX will not be a typo for currency traders.
The open questions I had are listed below and if I could get input into them that would be very useful.
For 1, I was thinking of Hunspell, which is the open-source library offered under the MPL and is used by Firefox and the OpenOffice family of products. Any horror stories out there using this?
Any grey areas with the licensing? The spell checking will happen on a windows client.
Dictionaries are available from a variety of sources, some free under the MPL, while some are not. Any suggestions on good sources for free dictionaries?
Multilingual support, and what needs to be worked out to support it?
For 4, how are custom dictionaries kept in sync between the server side and the client side? The spell check needs to happen on the client side, so are they pushed down with the initial launch every time, or are they synced up every so often?
As already mentioned, Hunspell is a state-of-the-art spell checker. It is the spell checker in OpenOffice, Thunderbird, Firefox, and Google Chrome. Ports to all major programming languages are available. It works with the OpenOffice dictionaries, so a lot of languages are supported.
I've used Hunspell for a few things, and I don't really have any horror stories with it. I've only used it with English (American) though, but it claims to work with other languages.
As for licensing, it offers a choice of GPL, LGPL, and MPL. If you don't like the MPL, you can always choose to use the LGPL.
There are several popular options that are widely used: MySpell and Aspell. Check them out.
http://en.wikipedia.org/wiki/MySpell
http://en.wikipedia.org/wiki/GNU_Aspell
Here is a good demonstration by Peter Norvig; I find this simple explanation much more intuitive. Follow the links in the doc as well for more in-depth analysis.
http://norvig.com/spell-correct.html
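The core of Norvig's demonstration fits in a few lines. This condensed sketch uses a tiny made-up word-frequency table; Norvig derives his from a large text corpus:

```python
# Generate every string one edit away from the typo, keep only the ones
# found in the dictionary, and pick the most frequent. WORD_FREQ is a toy
# table for illustration; in practice it comes from corpus word counts.
WORD_FREQ = {"puppy": 120, "poppy": 30, "happy": 400, "spell": 80}
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces   = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts    = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, or word itself."""
    if word in WORD_FREQ:
        return word
    candidates = [w for w in edits1(word) if w in WORD_FREQ] or [word]
    return max(candidates, key=lambda w: WORD_FREQ.get(w, 0))

print(correct("pupy"))   # puppy
```

Norvig's full version also considers words two edits away and weighs candidates by corpus frequency, but the idea above is the whole trick.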

Tracking changes to a (functional) design document

I am looking for a good way to keep a design document up to date with the latest decisions.
We are a small team (two developers, game designer, graphic designer, project manager, sales guy). Most of our projects last a couple of months. At the start of the project a design is made but we generally find ourselves making changes or new decisions throughout the project. Most of these changes are improvements, so we want to keep our process like that. (If the changed design results in more time needed this is generally taken care of, so that part is OK)
However, at the moment we have no nice way of capturing the changes to the initial design document and this results in the initial design quickly being abandoned as a source while coding. This is of course a waste of effort.
Currently our documents are OpenOffice/Word, and the best way to track changes in those documents would probably be adding a changelist to the top of the document and making the changes in the text in parallel; not an option I'd consider ideal.
I've looked at requirements management software, but that looks way too specialized. The documents could be stored in Subversion, but I think that is a bit too low-level to give insight into the changes.
Does anyone know a good way to track changes like these and keep the design document a valuable resource throughout the project?
EDIT: At the moment we mostly rely on changes to the original design being put in the bugtracker, that way they are at least somewhere.
EDIT: Related question
Is version control (ie. Subversion) applicable in document tracking?
I've found a wiki with revision logging works well as a step-up from Word documents, provided the number of users is relatively small. Finding one that makes it easy to make quick edits is helpful in ensuring it's kept up to date.
Both OpenOffice and Word include capabilities for showing/hiding edits to your document. Assuming there's resistance to changing tools, that's your best option; either that, or export to text and put it into any source control software.
Alternatively, maintain a separate (diffable using the appropriate tool) document for change-description text, and save archive versions at appropriate points in time.
This problem has been a long-standing issue in our programming shop too. The funny thing is that programmers tend to look at this from the wrong optimization angle: "keep everything in one place". In my opinion, you have two main issues:
The changes' descriptions must be easy to read ("So what's new?")
The process should be optimized for writing of the specification to agree upon, and then get to work already!
Imagine how this problem is solved in another environment: government law making. The lawbook is not rewritten with "track changes" turned on every time the government adds another law, or changes one...
The best way is to never touch a released document. Don't stuff everything into the same file, or you'll get:
the dreaded version history table,
the eternal status of "draft",
scattered inconsistencies,
horribly rushed sentences, and
a foul-smelling blend of authors' styles.
Instead, release an addendum, describing only the changes in detail, and possibly replacing full paragraphs/pages of the original.
With the size of our project, this can never work, can it?
In my biggest project so far, I released one base spec, and 5 consecutive addenda. Each of around 5 pages. Worked like a charm!
I don't know any good, free configuration management tools, but why not place your design under source control? Just add it to SVN, CVS, or whatever you are using. This is good because:
1) It is always up to date (if you check it in, of course)
2) It is centralized
3) You can keep track of changes by using the built-in compare feature, available in almost any source control system
It may not be the 'enterprisish' solution you'd want, but you are a small team of developers anyway, so for that situation, it is more than perfect.
EDIT: I see now that you already mentioned a source control system, my mistake. Still, I think it should work well.
Use Google Docs. It's free, web-based, multi-user in real time; you can choose who has access to your documents, and it keeps versioning. You can also upload all your Word documents and it will transform them for you.
For more information: http://www.google.com/google-d-s/intl/en/tour2.html
