Multi Language Approaches with ExpressionEngine

So it's been two years since I last built a multi-language site and I'm starting a new one straight away.
For the last site I built, I used the Biber multi-language module, which seems to have had a name change since then and is now called Multi Language Support. With that build I used Matrix and a dropdown to set the language for each row, thus ensuring that each page was all in one entry and that it would be easy to add additional languages just by adding another row to the Matrix field.
My question is twofold. First, are there other approaches to consider at this point, since it's been two years since I last built a multi-language site?
Secondly, with the last site all of the SEO (meta description, meta keywords and title tags) was in English only. The new site will need all of this in both English and French. I typically use the SEO Lite module to handle SEO, but I don't see a way to have multiple languages available with it. Is there a simple way to set this up, or will I need to go with Matrix fields as mentioned above?
I'm open to any and all approaches that are available for me to evaluate.
** UPDATE **
Will not be using Structure so that does not need to be a factor.

I don't have a long-winded answer, but Publisher from Boldminded looks quite nice and full of features. And it's on sale.
Brian is a great dev with great support. Based on past experiences with support I highly recommend taking a look.
http://boldminded.com/add-ons/publisher

I have been using a few different approaches:
Transcribe is what I have been using recently. It works very well with template rewriting and language variables. It's pretty easy to set up, but is also sometimes a bit too powerful. It's supposed to work well with Structure, but I have not yet really investigated that. One of the benefits is that you are not required to have all pages in all languages.
Structure
With Structure and Low Variables you can easily set up a tree like:
home
/about us
/products
/nl/home
/nl/over ons
/nl/producten
You can reuse your field groups, and not all content has to exist in the different languages.

I would say it depends on the size of the website. For a small site, building out two columns in Matrix, like you have done, would be perfect. I have done this with a mobile site that simply had a column for English and one for Spanish for all content elements, as well as the meta tags. Then I simply passed a variable based on what language the visitor initially selected.
But I would say going with Multi Site Manager is probably your best bet. It's true that you will need to purchase some additional add-ons if you're using them on your 'default' site, but it's very easy to manage. Plus, won't you want your URL structures to be different? For SEO purposes you'd really want Spanish/French or whatever language to have its own unique URLs.
I typically will use MSM and the NSM Better Meta addon. I would say this offers the most flexibility no matter the size of the site.

My approach on a couple of sites, regardless of size, has been:
Low Variables for in-template language variables
strategic planning of template group and template names (since it's the one thing that's a bit more difficult to handle translation on with anything other than .htaccess)
a "languages" channel for extensibility
each channel has a language field which uses a radio button drawn from the languages channel so the language is assigned to the entry (and then used in the entries loop as a limiter)
Playa to relate the entry "equivalents" together with one another (i.e. allow a one to one language switch WITHOUT all the languages being in the same entry) - a relationship field is technically the better fix, but Playa has searchability where relationship fields do not
SEO handled with custom fields in the channels, with conditional fall back to a default set in Low Variables
This approach is what I preferred (I have used Transcribe but I did find it a bit resource intensive in terms of queries, where this approach is a bit lighter, especially with caching). It allows me to maintain consistent enforcement and validation of required fields, for example, and allows for a variety of other benefits: language-specific entry URL titles (without having to force the content editor to create them manually), asynchronous content, asynchronous content translation (which always seems to be an issue - English is available before French is, awaiting translation, for example), potentially separate workflows, etc.
You can see this approach in action at www.cps.ca

I've done a number of multi-language sites with EE, albeit never with the Bieber module. My preference is to use Republic Variables for creating a variable matrix for labels (then it's just a simple flexible tag on the page). There's a bit of set-up that needs to be done, but once you've done it a couple of times it's only 5 minutes work:
A basic overview of steps (I've begun documenting them on my very old EE site):
1) Use the .htaccess method to remove index.php from URLs (making them clean) and in EE, set the system to use the title as the article link
2) Create ANSI directories for each language and move copies of index.php and .htaccess into them, with the system path corrected:
$system_path = '../system';
and in the language directory, create a .htaccess to rewrite requests to include the current language:
RewriteEngine on
RewriteCond $1 !^(index\.php) [NC]
RewriteRule ^(.*)$ /ru/index.php/$1 [L]
(check my site for more detailed directions)
3) Install the wonderful Republic Variables and set it up for the number of languages you need (_en for English, _ru for Russian, _es for Spanish, etc.), and make the default match the default language in your index.php. Under configuration, I prefer to set it to use a Language Postfix. Add a variable "teaser" for testing, and fill it in for all languages.
4) On the page, drop in a tag with this format: {variable_{language}}, e.g.
{teaser_{language}}
and you should see the default language variable. Insert the language in the URL before the template/page (e.g., www.sitename.com/ru/directory/template) and the language will switch on the fly. I'll be documenting this in a follow-up post this weekend.

I think MSM could also be considered as an option for multilingual sites, because of the following factors:
For content-rich sites, MSM is really easier for content contributors, who are usually responsible for a single language. The one language == one site equation is an easy one to understand.
Multilingual websites tend to cater for various audiences. Those audiences' needs usually evolve over the 3 or 4 years your site will last. MSM offers you the possibility to add sections or functionalities for a specific language down the line, which is a lot more difficult with a more entangled data structure.
As a developer, an MSM setup offers you a lot of flexibility. Adding or deleting languages is as easy as copying or deleting an MSM site, and SEO tends to be easier since you can translate your templates / template groups / url_titles quickly. If your template structure is clean and you use add-ons like Low Variables or even global variables to avoid having content in your templates, maintaining various sets of templates is really easy. It can also be a bonus if one day you need to deal with right-to-left languages or something like that.
Here is an example of a site I maintain

I also want to point out the Multi-Language Episodes on the EE Podcast:
http://ee-podcast.com/episodes/tag/multi-lingual
Ep 49 actually has Transcribe's Tom Jaeger chatting about multi-language process and gotchas.
Ep 54 has Nicolas Bottari chat about his processes

Related

Localisation practices: how to handle the difference between language and region

I'm developing a website that will require different content for different regions.
Different regions also have a preferred language (defaulting to English), but may have multiple.
For example, Taiwan and Hong Kong pages have different content, despite having the same preferred language (Traditional Chinese). Each region may have targeted content, but a lot of the content would overlap with each other and other regions. Furthermore, Hong Kong would also want Hong Kong content to be able to display in English.
As someone new to localisation: do existing l10n libraries typically handle these kinds of cases and demarcate between region and language? Would you have to copy the language-specific content multiple times for each region? Or would you just create one language file (or however the language strings are stored) for each language, and have different regions pull the relevant content from the same language files depending on what each region needs?
I am likely going to be using a CMS (currently thinking Silverstripe), but I haven't decided yet as I am still figuring out the requirements (including localisation).
Thanks!
It's an interesting, but very broad question.
I would draw a distinction here between content (different pages, sections, menus..) and translations (different versions of the content).
In the sense that i18n libraries display translations within a common template, the locale (language + region) tends to be treated as one and the same thing. Taking your example, you described three locales: zh-HK, en-HK and zh-TW (possibly en-TW too, although that wasn't clear).
The question is whether these locales are merely translations - is it enough to simply create three versions of the same bits? If so, a common approach to the overlap factor is fallback, i.e. you might allow your zh-TW locale to fall back to zh-HK in cases where they can be identical.
If that approach works, then check if your chosen CMS supports per-language fallback (as opposed to a single global fallback to English, as is common).
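As a rough illustration of what per-language fallback means in practice, here is a minimal Python sketch (the locale chains, keys and strings are made up for the example; a real i18n library would provide this lookup for you):
FALLBACKS = {
    "zh-TW": ["zh-TW", "zh-HK", "en"],  # Taiwan may fall back to Hong Kong Chinese, then English
    "zh-HK": ["zh-HK", "en"],
    "en-HK": ["en-HK", "en"],
}
TRANSLATIONS = {
    "zh-HK": {"greeting": "你好"},
    "en": {"greeting": "Hello"},
}
def translate(key, locale):
    # Walk the locale's fallback chain and return the first translation found.
    for candidate in FALLBACKS.get(locale, ["en"]):
        value = TRANSLATIONS.get(candidate, {}).get(key)
        if value is not None:
            return value
    return key  # last resort: show the key itself
print(translate("greeting", "zh-TW"))  # prints the zh-HK string via fallback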
However, if the content differs wildly between regions (totally different pages, menus etc..) then I would say it's typical to run separate instances of your CMS. hk.example.com and tw.example.com both available in whatever language translations you see fit. This will probably prevent sharing of content when it overlaps. That's the case in every CMS I've worked with but perhaps someone else can tell you differently.

Smart search for acronyms in Salesforce

In Salesforce's Service Cloud one can enable the out-of-the-box search function, where the user enters a term and the system searches all parts of the database for a match. I would like to enable smart searching of acronyms, so that if I spell out an organization's name, the search functionality will also search for the associated acronyms in the database. For example, if I type in "American Automobile Association", I would also get results that contain both "American Automobile Association" and "AAA".
I imagine such a script would involve declaring that if the term being searched contains one or more spaces or periods, take the first letter of the first word and concatenate it with the letters that follow subsequent spaces or periods.
I have unsuccessfully tried to find scripts for this or articles on enabling this functionality in Salesforce. Any guidance would be appreciated.
Interesting question! I don't think there's a straightforward answer but as it's standard search functionality, not 100% programming related - you might want to cross-post it to salesforce.stackexchange.com
Let's start with searchable fields list: https://help.salesforce.com/articleView?id=search_fields_business_accounts.htm&type=0
In Setup there's standard functionality for Synonyms, quite easy to use. It's not a silver bullet though; it applies only to certain objects like Knowledge Base (if you use it). Still, it claims to work on Cases too, so if there's "AAA" in the Case description it should still be good enough?
You could also check out the trick with marking a text field as indexed and/or external ID and adding all your variations / acronyms there: https://success.salesforce.com/ideaView?id=08730000000H6m2 This is more work to prepare / sanitize your data upfront, but it's not a bad idea.
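For that data-preparation step, here is a minimal sketch of the acronym rule described in the question (plain Python purely for illustration; inside Salesforce you would port the same logic to Apex or a batch job, and Search_Variants__c is a hypothetical custom field):
def acronym(name):
    # First letter of the first word plus the letters that follow each space or period.
    parts = [p for p in name.replace(".", " ").split() if p]
    return "".join(p[0].upper() for p in parts)
accounts = [{"Name": "American Automobile Association"}]
for acc in accounts:
    # Pre-compute the variants and store them in an indexed text field.
    acc["Search_Variants__c"] = acc["Name"] + ";" + acronym(acc["Name"])
print(accounts[0]["Search_Variants__c"])  # American Automobile Association;AAA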
A similar idea would be to use Tags, although that could explode in size very quickly; it's ridiculous to create a tag for every single company.
You can do some really smart things in data deduplication rules. Too much to write it all here, check out the trailhead: https://trailhead.salesforce.com/en/modules/sales_admin_duplicate_management/units/sales_admin_duplicate_management_unit_2 No idea if it impacts search though.
If you suffer from bad address data there are State & Country picklists, no more mess with CA / California / SoCal... https://resources.docs.salesforce.com/204/latest/en-us/sfdc/pdf/state_country_picklists_impl_guide.pdf Might not help with Name problem...
Data.com cleanup might help. Paid service I think, no idea if it affects search too. But if enabling it can bring these common abbreviations into your org - might be better than reinventing the wheel.

Hackproofing the site?

I don't know how to make my site hackproof at all. I have inputs where people can enter information that get published on the site. What should I filter and how?
Should I not allow script tags? (issue is, how will they put YouTube embed code on the site?)
iFrame? (People can put inappropriate sites in iFrames...)
Please let me know some ways I can prevent issues.
First of all, run the user's input through a strict XML parser.
Reject any invalid markup.
You should use a whitelist of HTML tags and attributes (in the parsed XML).
Do not allow <script> tags, <iframe>s, or style attributes.
Run all URLs (href and src attributes) through a URI parser (e.g., .NET's Uri class), and ensure that the protocol is http, https, or perhaps mailto. Again, reject any invalid URLs.
If you want to allow YouTube embedding, add your own <youtube> tag that takes a URL or video ID as a parameter (content or attribute), and transform it into a script on the server (after validating the parameter).
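A minimal sketch of those last two steps (Python with only the standard library; the regexes, scheme list and <youtube> transform are illustrative only, and the tag/attribute whitelisting itself should be done with a proper parser or a maintained sanitizer library):
import re
from urllib.parse import urlparse
ALLOWED_SCHEMES = {"http", "https", "mailto"}
def url_is_allowed(url):
    # Reject any URL whose scheme is not explicitly whitelisted.
    try:
        return urlparse(url).scheme in ALLOWED_SCHEMES
    except ValueError:
        return False
def expand_youtube_tags(markup):
    # Replace a custom <youtube id="..."/> tag with server-generated embed markup,
    # after validating that the id looks like a video id; users never supply raw iframes.
    def repl(match):
        video_id = match.group(1)
        if not re.fullmatch(r"[A-Za-z0-9_-]{6,20}", video_id):
            return ""  # drop anything suspicious
        return '<iframe src="https://www.youtube.com/embed/%s"></iframe>' % video_id
    return re.sub(r'<youtube id="([^"]*)"\s*/>', repl, markup)
print(url_is_allowed("javascript:alert(1)"))               # False
print(expand_youtube_tags('<youtube id="dQw4w9WgXcQ"/>'))  # safe, server-generated iframe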
After you finish, make sure that you're blocking everything on this giant list.
There is no such thing as hacker-proof. You want to do everything you can to decrease the possibility of being hacked. The most obvious weaknesses are going to be XSS (cross-site scripting) hacks and SQL injection attacks. There are easy ways to prevent both, most notably by using newer technologies that seek to ward them off by design (text output that is encoded by default, queries that are converted/parameterized before execution, etc.).
If you need to go beyond those levels, there are a number of options, ranging from automated services that will "test" your system (mostly producing fuzzy numbers you can give your sales guys once everything comes back "good") down to hard-core analysts who will pick apart your system for various audits.
Other than the basics mentioned above (XSS & SQL injection), the level of security you should try to attain will really depend on your market.
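For instance, those two basics look roughly the same in any stack; a minimal Python sketch (the table and column names are made up):
import sqlite3
from html import escape
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")
user_input = "<script>alert('xss')</script>"
# SQL injection: never build the query by string concatenation;
# let the driver bind the value as a parameter instead.
conn.execute("INSERT INTO comments (body) VALUES (?)", (user_input,))
# XSS: encode output by default when rendering user content into HTML.
for (body,) in conn.execute("SELECT body FROM comments"):
    print("<p>%s</p>" % escape(body))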
Didn't see this mentioned explicitly, but also use fuzzers ( http://en.wikipedia.org/wiki/Fuzz_testing ).
It basically shoves random junk (strings of varying characters and length) into your input fields; it's used in industry practice because it finds lots of bugs (i.e. overflows).
http://www.fuzzing.org/ has a list of great fuzzers for you to try.
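As a toy illustration of the idea (handle_input below is just a stand-in for whatever actually processes your form fields):
import random
import string
def handle_input(value):
    # Stand-in for your real form handling / validation code.
    return value.strip()[:255]
def random_payload(max_len=2000):
    # Random strings of varying characters and length, as described above.
    chars = string.printable
    return "".join(random.choice(chars) for _ in range(random.randint(0, max_len)))
for i in range(5000):
    payload = random_payload()
    try:
        handle_input(payload)
    except Exception as exc:  # any crash here is a bug worth investigating
        print("input of length %d raised %r" % (len(payload), exc))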
You can check out a penetration-testing framework like ISAAF. It gives you a checklist and a methodology to test the important security aspects of your application.

Any flexible CMS perfect for restaurant website’s back-end?

I’m building a website for a restaurant which consists of several static pages like ‘About us’ and an editable menu.
I need a CMS flexible enough to be able to add items individually (by individually, I mean adding items doesn’t equal pasting an HTML list of n products into another static page).
Each item should contain its name, description, price and category. The list of added items should be displayed using templates, laid out the way I want.
Can you suggest any lightweight CMS which can provide similar conditions?
There are tons of options for simple page creation. Have you considered just using one of the many free website builders out there? Then you don't even have to worry about finding hosting, just make it happen quickly and easily with one of them. For instance, take a look at Weebly (review here) or Wix. Both allow for free pages and both are incredibly easy to use. Squarespace (review here) is another solid option (and one of my favorites) but charges a small fee (which I personally think is worth it).
Weebly allows for some slick drag and drop of page elements into place as does Wix. They are what I would classify as the easiest of the batch while Squarespace provides for an excellent user interface experience.
Other options, if you'd prefer something hosted on your own, would depend on your experience level. I am a huge fan of ProcessWire, and ImpressPages has come along nicely and is a great little CMS too.
These are exceptions to the typical Top Three that everyone tends to recommend, I know, but I like to spread the word about other projects instead of the usual ones.
Cheers!
Mike
Sounds like a job for WordPress 3.0 plus the Custom Post Types UI and Verve Meta Boxes plugins. WordPress will handle the static pages, and the other two plugins will allow you to make a Menu Item post type with custom fields.
It is not exactly lightweight, but you could do it with Drupal. You can define your own content type "product", use the CCK module to add your fields (price, ...) and use the Views module to display it how you want.
Drupal has a relatively steep learning curve, so it may be overkill for this project. It is definitely flexible enough for this, though.

Developing a crawler and scraper for a vertical search engine

I need to develop a vertical search engine as part of a website. The data for the search engine comes from websites in a specific category. I guess for this I need to have a crawler that crawls several (a few hundred) sites in a specific business category and extracts the content and URLs of products and services. Other types of pages may be irrelevant. Most of the sites are tiny or small (a few hundred pages at the most). The products have 10 to 30 attributes.
Any ideas on how to write such a crawler and extractor? I have written a few crawlers and content extractors using the usual Ruby libraries, but not a full-fledged search engine. I guess the crawler, from time to time, wakes up and downloads the pages from the websites. Usual polite behavior, like checking robots exclusion rules, will be followed, of course, while the content extractor can update the database after it reads the pages. How do I synchronize the crawler and extractor? How tightly should they be integrated?
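For what it's worth, the loose coupling described above can be as simple as a shared queue between a fetch loop and an extract loop. A minimal sketch (Python rather than Ruby, with a placeholder seed URL and a stubbed-out extractor; robots.txt checks and throttling are omitted):
import queue
import threading
import urllib.request
seeds = ["https://example.com/"]  # placeholder seed URLs
pages = queue.Queue()             # hand-off point between crawler and extractor
database = []                     # stand-in for your real product database
def crawler():
    # Fetch pages and hand the raw HTML to the extractor via the queue.
    for url in seeds:  # a real crawler would also follow discovered links
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        pages.put((url, html))
    pages.put(None)    # sentinel: crawl finished
def extractor():
    # Pull pages off the queue and turn them into structured records.
    while True:
        item = pages.get()
        if item is None:
            break
        url, html = item
        # Real extraction (product name, price, attributes...) would go here.
        database.append({"url": url, "length": len(html)})
t1 = threading.Thread(target=crawler)
t2 = threading.Thread(target=extractor)
t1.start(); t2.start()
t1.join(); t2.join()
print(database)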
Nutch builds on Lucene and already implements a crawler and several document parsers.
You can also hook it to Hadoop for scalability.
In the enterprise-search context that I am used to working in,
crawlers,
content extractors,
search engine indexes (and the loading of your content into these indexes),
being able to query that data efficiently and with a wide range of search operators,
programmatic interfaces to all of these layers,
optionally, user-facing GUIs
are all separate topics.
(For example, while extracting useful information from an HTML page vs. a PDF vs. MS Word files is conceptually similar, the actual programming for these tasks is still very much a work in progress for any general solution.)
You might want to look at the Lucene suite of open-source tools, understand how those fit together, and possibly decide that it would be better to learn how to use those tools (or others like them) than to reinvent the very big, complicated wheel.
I believe in books, so thanks to your query, I have discovered this book and have just ordered it. It looks like a good take on one possible solution to the search-tool conundrum.
http://www.amazon.com/Building-Search-Applications-Lucene-LingPipe/product-reviews/0615204252/ref=cm_cr_pr_hist_5?ie=UTF8&showViewpoints=0&filterBy=addFiveStar
Good luck and let us know what you find out and the approach you decide to take.
