Calculate Translation Repetitions - string

I have looked around the web for the standard formula for calculating repetitions in a document to be translated, but I have not found it. For those who don't know what "repetitions" means in translation, this gives a good description of it.
I first tried something like this:
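using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
<snip>
Dictionary<string, int> _dict = new Dictionary<string, int>();
int CalculateRepetitions(string plainTextDoc) {
    // Split on runs of non-letters and count how often each word occurs.
    foreach (string item in Regex.Split(plainTextDoc, "\\P{L}+"))
    {
        if (item.Length == 0)
            continue; // Regex.Split yields "" when the text starts or ends with a separator
        if (_dict.ContainsKey(item))
            _dict[item]++; // the value counts occurrences after the first one
        else
            _dict.Add(item, 0);
    }
    // Number of distinct words that occur more than once.
    return _dict.Count(pair => pair.Value > 0);
}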
but that was not close to the sample number from Trados for the same document, and it was the wrong definition of repetitions anyway. Does anyone have a good example of calculating translation repetitions? I'm not expecting only C# answers; Java and C++ answers are fine as well.

The GMX/V standard might be your answer and there seems to be a C# implementation.
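For illustration, here is a minimal Java sketch of segment-level repetition counting. It assumes the convention common in CAT tools: the document is split into segments, the first occurrence of a segment counts as new text, and every later identical occurrence counts as a repetition, measured in words. The naive sentence-splitting regex, the whitespace-only normalization, the word definition and the class name `RepetitionCounter` are all assumptions; real tools use SRX segmentation rules and their own word-count rules, so this will not reproduce Trados numbers exactly. It uses a record, so it needs Java 16+.

```java
import java.util.HashSet;
import java.util.Set;

public class RepetitionCounter {

    /** Repeated segments (beyond their first occurrence) and the words they contain. */
    public record Result(int repeatedSegments, int repeatedWords) {}

    public static Result count(String text) {
        // Naive segmentation (assumption): split after . ! or ? followed by whitespace.
        String[] segments = text.split("(?<=[.!?])\\s+");
        Set<String> seen = new HashSet<>();
        int repeatedSegments = 0;
        int repeatedWords = 0;
        for (String raw : segments) {
            // Collapse whitespace so layout differences do not hide a repetition.
            String segment = raw.trim().replaceAll("\\s+", " ");
            if (segment.isEmpty()) {
                continue;
            }
            // Set.add returns false if the segment was seen before: that is a repetition.
            if (!seen.add(segment)) {
                repeatedSegments++;
                repeatedWords += wordCount(segment);
            }
        }
        return new Result(repeatedSegments, repeatedWords);
    }

    // Counts runs of letters/digits as words (an assumption; tools differ here).
    static int wordCount(String segment) {
        int count = 0;
        for (String token : segment.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "Press the red button. Wait for the beep. Press the red button. Done!";
        Result r = count(doc);
        System.out.println(r.repeatedSegments() + " repeated segment(s), "
                + r.repeatedWords() + " word(s)");
    }
}
```

Running `main` prints `1 repeated segment(s), 4 word(s)`: only the second "Press the red button." is a repetition.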

Related

Using a lookup table with Google Guava

I am new to Java. I have a requirement to hold a lookup table in memory (abbreviations and their expansions). I was thinking of using a Java HashMap, but I want to know if that really is the best approach.
Also, are there any equivalent libraries in Google Guava for the same requirement?
I want it to be optimized and very efficient with respect to time and memory.
Using Maps
Maps are indeed fine for this, as used below.
Apparently, it's a bit early for you to care that much about performance or memory consumption though, and we can't really help you if we don't have more context on the actual use case.
In Pure Java
import java.util.HashMap;
import java.util.Map;

final Map<String, String> lookup = new HashMap<>();
lookup.put("IANAL", "I Ain't A Lawyer");
lookup.put("IMHO", "In My Humble Opinion");
Note that there are several implementations of the Map interface, and that you can write your own.
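As one example of choosing a different Map implementation, a `TreeMap` built with `String.CASE_INSENSITIVE_ORDER` lets "imho" and "IMHO" find the same expansion and keeps the keys sorted, at the cost of O(log n) lookups instead of a HashMap's expected O(1). Whether case-insensitive lookup is what you want is an assumption about your use case; the class name is just for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class CaseInsensitiveLookup {

    public static Map<String, String> build() {
        // The comparator decides key equality, so "imho" and "IMHO" are the same key.
        Map<String, String> lookup = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        lookup.put("IANAL", "I Ain't A Lawyer");
        lookup.put("IMHO", "In My Humble Opinion");
        return lookup;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = build();
        System.out.println(lookup.get("imho"));   // In My Humble Opinion
        System.out.println(lookup.keySet());      // [IANAL, IMHO] (sorted)
    }
}
```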
Using Google Guava
If you want an immutable map:
import com.google.common.collect.ImmutableMap;

final Map<String, String> lookup = ImmutableMap.<String, String>builder()
.put("IANAL", "I Ain't A Lawyer")
.put("IMHO", "In My Humble Opinion")
.build();
Retrieving Data
Then, to use it to look up an abbreviation:
// retrieval:
if (lookup.containsKey("IMHO")) {
final String value = lookup.get("IMHO");
/* do stuff */
}
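Since `Map.get` already returns `null` for a missing key, the `containsKey` check costs a second lookup. Below is a sketch of single-lookup alternatives using the JDK's `getOrDefault` (Java 8+) and `Optional.ofNullable`; the `expand` helper and the fallback text are hypothetical. These behave like the `containsKey` version only if the map never stores `null` values (Guava's `ImmutableMap` rejects them anyway).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LookupDemo {

    // Hypothetical helper: one lookup, with a fallback for unknown abbreviations.
    static String expand(Map<String, String> lookup, String abbreviation) {
        return lookup.getOrDefault(abbreviation, "<unknown abbreviation>");
    }

    public static void main(String[] args) {
        Map<String, String> lookup = new HashMap<>();
        lookup.put("IMHO", "In My Humble Opinion");

        // get() already returns null for a missing key, so one lookup is enough.
        String value = lookup.get("IMHO");
        if (value != null) {
            System.out.println(value);
        }

        System.out.println(expand(lookup, "AFAIK"));

        // Optional wraps the possibly-null result instead of an explicit null check.
        Optional.ofNullable(lookup.get("IMHO")).ifPresent(System.out::println);
    }
}
```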
Using Enums
I was speaking of alternatives...
If you know at coding time what the key/value pairs will be, you may very well be better off using a Java enum:
enum Abbreviations {
    IANAL("I Ain't A Lawyer"),
    IMHO("In My Humble Opinion");

    private final String value;

    private Abbreviations(final String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }
}
You can then look up values directly, i.e. either by doing this:
Abbreviations.IMHO.getValue()
Or by using:
Abbreviations.valueOf("IMHO").getValue()
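Keep in mind that `valueOf` throws `IllegalArgumentException` for a name that is not one of the constants (and `NullPointerException` for `null`), so for user input a non-throwing lookup may be more convenient. A sketch, repeating the enum so it compiles on its own; the `find` method is a hypothetical addition, not part of the answer above.

```java
import java.util.Optional;

public class EnumLookupDemo {

    enum Abbreviations {
        IANAL("I Ain't A Lawyer"),
        IMHO("In My Humble Opinion");

        private final String value;

        Abbreviations(final String value) {
            this.value = value;
        }

        public String getValue() {
            return value;
        }

        // Hypothetical helper: empty Optional instead of an exception for unknown names.
        public static Optional<Abbreviations> find(String name) {
            for (Abbreviations a : values()) {
                if (a.name().equals(name)) {
                    return Optional.of(a);
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(Abbreviations.valueOf("IMHO").getValue());
        System.out.println(Abbreviations.find("LOL")
                .map(Abbreviations::getValue)
                .orElse("unknown"));
    }
}
```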
Considering where you seem to be in your learning process, I'd recommend you follow the links and read through the Java tutorial and implement the examples.

Configuring Solr for Suggestive/Predictive Auto Complete Search

We are working on integrating Solr 3.6 into an eCommerce site. We have indexed our data, and search is performing really well.
We are having some difficulty figuring out how to use predictive search / autocomplete search suggestions. We are also interested in learning the best practices for implementing this feature.
Our goal is to offer predictive search similar to http://www.amazon.com/, but we don't know how to implement it with Solr. More specifically, I want to understand how to build those terms from Solr, or whether they are managed by something external to Solr. How should the dictionary be built to offer these kinds of suggestions? Moreover, for some fields, search should offer to search within a category. Try typing "xper" into the Amazon search box: apart from xperia, xperia s and xperia p, it also lists "xperia s in Cell phones & accessories", which is a category.
Using a custom dictionary, this would be difficult to manage. Or maybe we don't know how to do it correctly. We are looking to you to guide us on how best to use Solr to achieve this kind of suggestive search.
I would suggest a couple of blog posts:
This one, which shows a really nice, complete solution that works well but requires some additional work, and uses a dedicated Lucene index (Solr core) for that specific purpose.
I used the highlight approach because the facet.prefix one is too heavy for a big index, and the other ones had little or unclear documentation (I'm a stupid programmer).
So let's suppose the user has just typed "aaa bbb ccc"
Our autocomplete function (Java/JavaScript) will call Solr using the following params:
q="aaa bbb"~100 ...base query, all the typed words except the last
fq=ccc* ...suggest word filter using last typed word
hl=true
hl.q=ccc* ...highlight word will be the one to suggest
fl=NONE ...return empty docs in result tag
hl.simple.pre=### ...marker chars to locate the highlighted word in the response
hl.simple.post=### ...see above
You can also control the number of suggestions with the 'rows' and 'hl.fragsize' parameters.
The highlighted words in each document are the right candidates to suggest after the "aaa bbb" string.
More suggestion words are the ones before/after the highlighted words and, of course, you can implement more filters to extract valid words, avoid duplicates and limit suggestions.
If you're interested, I can send you some examples.
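The split between the base query and the last typed word can be sketched in plain Java. This only builds the query string from the parameters listed above (`q`, `fq`, `hl`, `hl.q`, `fl`, `rows`), using `hl.simple.pre`/`hl.simple.post` (the simple-formatter highlight markers in Solr 3.x) for the markers. The field name `text`, the `###` marker, the `~100` slop and the class name are assumptions; sending the request and parsing the response are left out. `URLEncoder.encode(String, Charset)` needs Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SuggestQueryBuilder {

    // Builds Solr params for the highlight-based autocomplete approach.
    static Map<String, String> params(String typed, int rows) {
        String input = typed.trim();
        int lastSpace = input.lastIndexOf(' ');
        String last = lastSpace < 0 ? input : input.substring(lastSpace + 1);
        String base = lastSpace < 0 ? "" : input.substring(0, lastSpace).trim();

        Map<String, String> p = new LinkedHashMap<>();
        if (base.isEmpty()) {
            // Single word: the prefix query itself drives matching.
            p.put("q", "text:" + last + "*");
        } else {
            // Base phrase (sloppy phrase if several words) plus a prefix filter on the last word.
            p.put("q", base.contains(" ") ? "text:\"" + base + "\"~100" : "text:" + base);
            p.put("fq", "text:" + last + "*");
        }
        p.put("hl", "true");
        p.put("hl.q", "text:" + last + "*");
        p.put("hl.simple.pre", "###");
        p.put("hl.simple.post", "###");
        p.put("fl", "NONE"); // a field that does not exist, so documents come back empty
        p.put("rows", Integer.toString(rows));
        return p;
    }

    // URL-encodes each value and joins the parameters into a query string.
    static String toQueryString(Map<String, String> p) {
        StringJoiner joiner = new StringJoiner("&");
        p.forEach((k, v) -> joiner.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params("aaa bbb ccc", 10)));
    }
}
```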
EDITED: Some further details about the approach
The example code I give assumes the 'autocomplete' mechanism provided by jQuery: we invoke a JSP (or a servlet) inside a web application, passing the words just typed by the user as the request param 'q'.
This is the code of the JSP:
ByteArrayInputStream is=null; // Used to manage Solr response
try{
StringBuffer queryUrl=new StringBuffer("putHereTheUrlOfSolrServer"); // placeholder: base URL of your Solr server
queryUrl.append("/select?wt=xml");
String typedWords=request.getParameter("q");
String base="";
if(typedWords.indexOf(" ")<=0) {
// No space typed by user: the 'easy case'
queryUrl.append("&q=text:");
queryUrl.append(URLEncoder.encode(typedWords+"*", "UTF-8"));
queryUrl.append("&hl.q=text:"+URLEncoder.encode(typedWords+"*", "UTF-8"));
} else {
// Space chars present
// we split the search in base phrase and last typed word
base=typedWords.substring(0,typedWords.lastIndexOf(" "));
queryUrl.append("&q=text:");
if(base.indexOf(" ")>0)
queryUrl.append("\""+URLEncoder.encode(base, "UTF-8")+"\"~1000");
else
queryUrl.append(URLEncoder.encode(base, "UTF-8"));
typedWords=typedWords.substring(typedWords.lastIndexOf(" ")+1);
queryUrl.append("&fq=text:"+URLEncoder.encode(typedWords+"*", "UTF-8"));
queryUrl.append("&hl.q=text:"+URLEncoder.encode(typedWords+"*", "UTF-8"));
}
// The additional parameters to control the solr response
queryUrl.append("&rows="+suggestPageSize); // Number of results returned, a parameter to control the number of suggestions
queryUrl.append("&fl=A_FIELD_NAME_THAT_DOES_NOT_EXIST"); // Interested only in highlights section, Solr return a 'light' answer
queryUrl.append("&start=0"); // Use only first page of results
queryUrl.append("&hl=true"); // Enable highlights feature
queryUrl.append("&hl.simple.pre=***"); // Use *** as 'highlight border'
queryUrl.append("&hl.simple.post=***"); // Use *** as 'highlight border'
queryUrl.append("&hl.fragsize="+suggestFragSize); // Another parameter to control the number of suggestions
queryUrl.append("&hl.fl=content,title"); // Look for result only in some fields
queryUrl.append("&facet=false"); // Disable facets
/* Omitted section: use a new URL(queryUrl.toString()) to get the solr response inside a byte array */
is=new ByteArrayInputStream(solrResponseByteArray);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(is);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//response/lst[@name=\"highlighting\"]/lst/arr[@name=\"content\"]/str");
NodeList valueList = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
Vector<String> suggestions=new Vector<String>();
for (int j = 0; j < valueList.getLength(); ++j) {
Element value = (Element) valueList.item(j);
String[] result=value.getTextContent().split("\\*\\*\\*");
for(int k=0;k<result.length;k++){
String suggestedWord=result[k].toLowerCase();
if((k%2)!=0){
//Highlighted words management
if(suggestedWord.length()>=typedWords.length() && !suggestions.contains(suggestedWord)) // skip words shorter than the typed prefix and duplicates
suggestions.add(suggestedWord);
}else{
/* Words before/after highlighted words
we can put these words inside another vector
and use them if not enough suggestions */
}
}
}
/* Finally we build a Json Answer to be managed by our jquery function */
out.print(request.getParameter("json.wrf")+"({ \"suggestions\" : [");
boolean firstSugg=true;
for(String suggestionW:suggestions) {
out.print((firstSugg?" ":" ,"));
out.print("{ \"suggest\" : \"");
if(base.length()>0) {
out.print(base);
out.print(" ");
}
out.print(suggestionW+"\" }");
firstSugg=false;
}
out.print(" ]})");
}catch (Exception x) {
System.err.println("Exception during main process: " + x);
x.printStackTrace();
}finally{
//Gracefully close streams//
try{is.close();}catch(Exception x){;}
}
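The post-processing step in the JSP (split each highlighted fragment on the marker, keep the odd-indexed pieces, lowercase and de-duplicate them) can be isolated into a small, testable method. A sketch that mirrors that loop; the minimum-length filter against the typed prefix is an assumption about what the original length check intended, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class HighlightSuggestions {

    // Extracts highlighted words from fragments like "new ***Xperia*** S phone".
    static List<String> extract(List<String> fragments, String marker, String typedPrefix, int max) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<String> suggestions = new LinkedHashSet<>();
        String splitRegex = Pattern.quote(marker);
        for (String fragment : fragments) {
            String[] parts = fragment.split(splitRegex);
            // Text between an opening and a closing marker lands at odd indices.
            for (int k = 1; k < parts.length; k += 2) {
                String word = parts[k].toLowerCase();
                if (word.length() >= typedPrefix.length()) {
                    suggestions.add(word);
                }
            }
        }
        List<String> result = new ArrayList<>(suggestions);
        return result.size() > max ? new ArrayList<>(result.subList(0, max)) : result;
    }

    public static void main(String[] args) {
        List<String> fragments = List.of(
                "new ***Xperia*** S phone",
                "the ***xperia*** P and ***xperience***");
        System.out.println(extract(fragments, "***", "xper", 5)); // [xperia, xperience]
    }
}
```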
Hope this is helpful,
Nik
This might help you out. I am trying to do the same.
http://solr.pl/en/2010/10/18/solr-and-autocomplete-part-1/

Resharper Code Pattern for IDisposable not inside using

I'd like to know how to build a ReSharper (6.1) code pattern to search for and replace the following issues:
var cmd = new SqlCommand();
cmd.ExecuteNonQuery();
and turn it into this:
using (var cmd = new SqlCommand())
{
cmd.ExecuteNonQuery();
}
and:
StreamReader reader = new StreamReader("myfile.txt");
string line = reader.ReadLine();
Console.WriteLine(line);
becomes:
using (StreamReader reader = new StreamReader("myfile.txt"))
{
string line = reader.ReadLine();
Console.WriteLine(line);
}
EDIT: Thanks for the answers, but I'm looking for a pattern that matches anything that implements IDisposable.
Search pattern:
var $cmd$ = $sqlcommand$;
$cmd$.ExecuteNonQuery();
Replace pattern:
using (var $cmd$ = $sqlcommand$)
{
$cmd$.ExecuteNonQuery();
}
where $cmd$ is an identifier placeholder
and $sqlcommand$ is an expression placeholder of type System.Data.SqlClient.SqlCommand
It looks like what you're really after is an inspection mechanism that goes off looking for IDisposable objects and ensures they are disposed. If that's the case, I doubt custom patterns would be the right approach - after all, what if you do call Dispose() a few lines later?
One way to implement this is by using the ReSharper SDK. In fact, one of the examples the SDK comes with is a PowerToy which implements IDisposable on a particular class, so you could take that code as a foundation for possible analysis of usage.
Use the Search with Pattern tool under the ReSharper | Find menu.
In the Search pattern make sure you have C# selected and enter the code you're searching for in the box. Click the Replace button in the top-right, and enter the code you want to replace it with in the Replace pattern box.
You can save the search and replace pattern and R# will use it for subsequent code analysis should you so desire. You can also add additional patterns in R# Options under Code Inspection | Custom Patterns.

ArrayList changing after sorting function

I have just started utilizing ArrayLists in some C# code and am having some problems when sorting.
First I create an ArrayList object in my class:
ArrayList cutList = new ArrayList();
Then I fill the array list and sort it to find the minimum:
cutList.AddRange(new[] { "2200", "1800", "1200", "1" });
int minList = int.Parse((string)GetMinValue(cutList));
Using the function:
public static object GetMinValue(ArrayList arrList)
{
ArrayList sortArrayList = arrList;
sortArrayList.Sort();
return sortArrayList[0];
}
Later, when I read cutList[2] I expect "1200", but I get "1800" because the function also sorted cutList. I have had the same problem in the past: I set a variable to an application setting, and then the application setting changed when I modified the variable. How do I correctly fix these problems? I have been learning C# on my own and am guilty of skipping around a little bit. Is there a lesson on objects that I am missing?
The issue in your code is that ArrayList sortArrayList = arrList; does not copy arrList to sortArrayList: the assignment merely creates a new alias for the existing object. To make your code work, use
ArrayList sortArrayList = (ArrayList)arrList.Clone();
I must add that this is probably the most inefficient way of looking up the min element in a list (sorting is O(n log n), while a single pass is O(n)), and ArrayList is also a rather archaic container. I would prefer using List<string> instead of ArrayList, and using LINQ's Min() function to get the minimum element.
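The same reference-versus-copy behavior exists in Java, so here is a Java sketch (not C#) of the three points above: assignment only aliases a list, copying before sorting leaves the original alone, and the minimum needs no sort at all. Parsing the values to integers is a deliberate choice in this sketch: comparing them as strings only works by coincidence for this data, since as strings "900" sorts after "1200".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AliasingDemo {

    // One O(n) pass, no sorting, and the caller's list is not modified.
    // Values are compared as integers, not as strings.
    static int minValue(List<String> values) {
        return values.stream()
                .mapToInt(Integer::parseInt)
                .min()
                .getAsInt(); // throws NoSuchElementException for an empty list
    }

    public static void main(String[] args) {
        List<String> cutList = new ArrayList<>(List.of("2200", "1800", "1200", "1"));

        List<String> alias = cutList;                  // same object: sorting alias would sort cutList
        List<String> copy = new ArrayList<>(cutList);  // independent (shallow) copy
        Collections.sort(copy);                        // only the copy is reordered

        System.out.println(cutList.get(2));      // 1200
        System.out.println(alias == cutList);    // true
        System.out.println(copy.get(0));         // 1
        System.out.println(minValue(cutList));   // 1
    }
}
```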

CATextLayer with AttributedString in MonoTouch

I am trying to create a "label" with different styles on different words, similar to what is described here.
The problem is that, as far as I can see, the MonoTouch implementation of CATextLayer does not accept assigning an NSAttributedString to the String property, since the String property has the type string.
Is this an error in the implementation or is there another way of doing this?
(Yes, I am aware I can add separate labels - but I would rather not when there is a better solution).
EDIT (in response to the answer from Miguel):
After changing to GetHandle and correcting "void_objc_msgSend_IntPrt" to "void_objc_msgSend_IntPtr", the code in the answer compiles and runs, but it still doesn't work (I was a bit quick to mark it as the answer).
No errors are thrown, but the text doesn't show.
Code:
string _text="Example string";
if(_textLayer==null) {
_textLayer = new CATextLayer();
_textLayer.Frame = new RectangleF(50,698,774,50);
_textLayer.Wrapped=true;
_textLayer.ForegroundColor=UIColor.White.CGColor;
_textLayer.BackgroundColor=UIColor.Clear.CGColor;
Layer.AddSublayer(_textLayer);
}
//_textLayer.String=_text;
CTFont _font=new CTFont("MarkerFelt-Thin",48);
CTStringAttributes _attrs=new CTStringAttributes();
_attrs.Font=_font;
_attrs.ForegroundColor = UIColor.White.CGColor;
var nsa = new NSAttributedString(_text);
Messaging.void_objc_msgSend_IntPtr(
_textLayer.Handle,
Selector.GetHandle("string"),
nsa.Handle);
If I uncomment the _textLayer.String=_text I see the text (but without attributes of course), so the problem is not with the layer.
For now, you can try:
using MonoTouch.ObjCRuntime;
var caTextLayer = new CATextLayer ();
var nsa = new NSAttributedString ();
[..]
// "setString:" is the property setter; "string" is the getter and ignores the argument
Messaging.void_objc_msgSend_IntPtr (
caTextLayer.Handle,
Selector.GetHandle ("setString:"),
nsa.Handle);
Alternatively, you can download this preview of the upcoming version:
http://tirania.org/tmp/monotouch.dll
It implements a property AttributedString in CATextLayer that you can set.

Resources