Best structure for a text file to translate on Crowdin

I'm working on a fan-made localization of a video game, and for now I have a TXT file with this structure:
<ID1>\t<ID2>\t<ID3>\t<ID4>\t"<stringToTranslate>"
<ID1>\t<ID2>\t<ID3>\t<ID4>\t"<stringToTranslate>"
<ID1>\t<ID2>\t<ID3>\t<ID4>\t"<stringToTranslate>"
<ID1>\t<ID2>\t<ID3>\t<ID4>\t"<stringToTranslate>"
I need to create a formatted file that I can translate on the Crowdin platform.
But I don't know what structure to use (JSON, INI, XML...), because I will then have to write a script to convert my TXT file into the new format.
Thanks a lot for your help.

I believe any key-value file format would do the trick. JSON is a good choice: it looks like you have multiple string IDs, so nested JSON will fit the structure perfectly.
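A minimal sketch of such a conversion script, in Java to match the other snippet on this page and using no external library. It assumes every line has exactly five tab-separated fields, that the text is wrapped in one pair of double quotes, and that the four IDs form a consistent hierarchy, nesting them as ID1, then ID2, then ID3, with ID4 as the key of the string. The class name and the command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TsvToNestedJson {

    // Builds {ID1: {ID2: {ID3: {ID4: "text"}}}} from lines of the form
    // ID1<TAB>ID2<TAB>ID3<TAB>ID4<TAB>"text".
    @SuppressWarnings("unchecked")
    public static Map<String, Object> parse(List<String> lines) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) continue; // skip empty lines
            // Limit 5 keeps any tab inside the translatable text in the last field.
            String[] parts = line.split("\t", 5);
            if (parts.length != 5) {
                throw new IllegalArgumentException("Expected 5 tab-separated fields: " + line);
            }
            Map<String, Object> node = root;
            for (int i = 0; i < 3; i++) { // walk or create the ID1..ID3 levels
                node = (Map<String, Object>) node.computeIfAbsent(
                        parts[i], k -> new LinkedHashMap<String, Object>());
            }
            node.put(parts[3], unquote(parts[4]));
        }
        return root;
    }

    // Removes one pair of surrounding double quotes, if present.
    private static String unquote(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Serializes nested maps of strings as JSON.
    @SuppressWarnings("unchecked")
    public static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<String, Object> e : ((Map<String, Object>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append(quote(e.getKey())).append(':').append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        return quote(String.valueOf(value));
    }

    // Wraps a string in quotes, escaping the characters JSON requires.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Usage: java TsvToNestedJson strings.txt strings.json
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TsvToNestedJson <input.txt> <output.json>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.UTF_8);
        Files.writeString(Path.of(args[1]), toJson(parse(lines)), StandardCharsets.UTF_8);
    }
}
```

Nesting by ID keeps strings that share the first IDs grouped together in the JSON file; if some IDs are not really hierarchical, joining all four into a single flat key is a simpler alternative.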

Related

Parsing an XML file and handling missing tags

I'm trying to parse an XML file in order to retrieve its data into a list.
I need to extract TITRE_N, AUTEURS_N and RESUME_N. I know how to do that, but my problem is that some references have no data for AUTEURS_N: the tag is simply absent, and as you can imagine, all the data after it ends up shifted! Do you know how I can parse this document and handle the fact that a tag I usually rely on is sometimes missing?
Thanks a lot!
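A common way to avoid the shift is to iterate over the element that wraps each reference and look up its three children inside that element, recording null when one is missing, instead of collecting three separate flat lists. A minimal Java sketch using the JDK's built-in DOM parser; since the question doesn't show the XML, the wrapper element name (passed in as recordTag) and the class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ReferenceParser {

    // One entry per reference; a field is null when its tag is missing.
    public static final class Reference {
        public final String titre;
        public final String auteurs;
        public final String resume;

        Reference(String titre, String auteurs, String resume) {
            this.titre = titre;
            this.auteurs = auteurs;
            this.resume = resume;
        }
    }

    // recordTag is the element that wraps one reference (its real name is an assumption).
    public static List<Reference> parse(String xml, String recordTag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = doc.getElementsByTagName(recordTag);
            List<Reference> result = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                Element rec = (Element) records.item(i);
                // Each lookup is scoped to this record, so a missing tag cannot shift later data.
                result.add(new Reference(
                        textOf(rec, "TITRE_N"), textOf(rec, "AUTEURS_N"), textOf(rec, "RESUME_N")));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first descendant element named tag, or null if there is none.
    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```

The same pattern (loop over records, then query inside each record) works with any DOM or ElementTree-style API in other languages.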

Using a list for a feature in an ML model

I want to run a machine learning algorithm on some data, so I'm exporting the data into a file first.
But one of my features for the text I'm classifying is a list of tags,
and each text can have multiple tags, e.g. ["mystery", "thriller"].
When I write the data to my CSV file for export, is it recommended to write that entire list as a single feature (the "tags" feature)?
Or is it better to make a separate feature for each tag? The only problem then is that most examples have only one tag, so the other feature columns for those would be blank.
So it seems like writing the list of tags as one feature makes the most sense, but then, when parsing it for training, would I still treat every element of that list as its own feature or not?
If you do it as a single feature, just make sure to separate the tags with a delimiter that doesn't occur in any of the tags and isn't a comma (a comma would clash with the CSV format); something like | would probably do fine. When you build your models and read in that list of tags, you can then split it on that delimiter. In Java this would look like:
// String.split takes a regex: an unescaped "|" would split the string into single characters
String[] tagList = inputString.split("\\|");
I'm sure most languages will have a similar method to do this.
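For the last part of the question, a common approach (not part of the answer above) is to split the stored cell back into tags and turn them into a multi-hot vector: one 0/1 column per distinct tag, built at training time, so each tag does end up as its own feature without blank columns in the CSV. A minimal Java sketch; the class and method names are hypothetical, and it uses Pattern.quote so the | delimiter is matched literally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class TagFeatures {
    // Pattern.quote makes "|" a literal delimiter rather than regex alternation.
    private static final Pattern DELIMITER = Pattern.compile(Pattern.quote("|"));

    // Splits a "mystery|thriller" cell into individual tags, ignoring empty pieces.
    public static List<String> splitTags(String cell) {
        List<String> tags = new ArrayList<>();
        for (String t : DELIMITER.split(cell)) {
            String trimmed = t.trim();
            if (!trimmed.isEmpty()) tags.add(trimmed);
        }
        return tags;
    }

    // Assigns each distinct tag a column index, in first-seen order.
    public static Map<String, Integer> buildVocabulary(List<String> cells) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String cell : cells) {
            for (String tag : splitTags(cell)) {
                vocab.putIfAbsent(tag, vocab.size());
            }
        }
        return vocab;
    }

    // Multi-hot vector: 1 at the column of each tag present, 0 elsewhere.
    public static int[] encode(String cell, Map<String, Integer> vocab) {
        int[] vector = new int[vocab.size()];
        for (String tag : splitTags(cell)) {
            Integer idx = vocab.get(tag);
            if (idx != null) vector[idx] = 1; // tags unseen when building vocab are dropped
        }
        return vector;
    }
}
```

Building the vocabulary from the training data only, and reusing it for test data, keeps the column layout stable between the two.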

Indexing SVG files with Solr

I am new to Solr, but I assume there is an easy way to index SVG files with it. I have installed Solr 6.3.0 and am using the example 'files' core. It works well, but it seems to parse the SVG files as plain text.
Is there an easy way to index only the text inside the <text> elements?
Ideally, I want to combine some meta data from a JSON file with the text from the SVG files. The JSON file looks like:
{
"id":"000001",
"title":"Some diagram",
...
} ...
The associated SVG file is 000001.svg. Is there a way to create a schema in Solr that takes the fields from the JSON and merges them with a field containing the text from the SVG file?
The most flexible way to do what you want is to write a custom indexing utility that parses your JSON, picks up the matching SVG, extracts the relevant elements, and then submits the complete document to Solr. Depending on your programming language of choice, you'd do this with SolrJ, SolrNet or another client library.
This is far more flexible and maintainable than integrating it directly into Solr, but if you want to do custom SVG indexing (without the additional JSON), you could use the XSLT support in the regular update handler, or an XPathEntityProcessor in a DataImportHandler configuration.
My choice would be the custom indexing code.
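A minimal sketch of the extraction-and-merge step of such a utility, using only the JDK's DOM parser. It assumes the JSON metadata has already been parsed into a Map by some JSON library (the JDK has none built in), collects the text of every SVG <text> element, and stores it under a hypothetical multi-valued field named svg_text, which would need a matching field in your schema. Submitting the result to Solr (for example by copying each entry into a SolrJ SolrInputDocument) is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SvgIndexPrep {

    // Collects the text content of every <text> element (including nested <tspan>s).
    public static List<String> extractSvgText(byte[] svgBytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Avoid fetching the external SVG DTD that many files reference in their DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(svgBytes));
            // "*" matches <text> whatever namespace (usually the SVG one) it is in.
            NodeList nodes = doc.getElementsByTagNameNS("*", "text");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String t = nodes.item(i).getTextContent().trim();
                if (!t.isEmpty()) texts.add(t);
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse SVG", e);
        }
    }

    // Merges already-parsed JSON metadata with the SVG text into one flat document.
    public static Map<String, Object> buildDocument(Map<String, Object> metadata, byte[] svgBytes) {
        Map<String, Object> doc = new LinkedHashMap<>(metadata);
        doc.put("svg_text", extractSvgText(svgBytes)); // hypothetical multi-valued field
        return doc;
    }
}
```

For 000001.json you would read 000001.svg from disk, call buildDocument, and index the resulting map as one Solr document with id 000001.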

Application.LoadFromText: load from a string instead?

I was wondering whether it is possible to take the code saved to a .txt file with Application.SaveAsText, store it in a table, and then use Application.LoadFromText to build the object from a string rather than from a .txt file.
Does that make sense? Basically, I want to store the code for all the objects in a table, one per row, and allow users to select the relevant row and build the object without having to import a .txt file.
Yes and no. You would have to write the field content to a (temp) file, then use LoadFromText to read in the object.
But it doesn't make much sense, and I think you are on the wrong track. You could just as well have the objects ready-made in the application.

Extract parameters from ExifTool into Haskell

I'm writing a program in Haskell that needs metadata from media files, such as runtime, artist, size, name, copyright, height, and so on.
Basically, I need to get this information and create some PDFs with it, but I can't find a way to get values like "60s", "AC/DC", "5000", "Thunderstruck", "copyright"...
Any ideas how to parse the information exiftool outputs? Which exiftool options are best to use? Should I use Text.Regex?
Since exiftool can produce XML or JSON output (the -X and -j options), you can pick one format and parse the output accordingly. Haskell has Text.XML.Light (and a bunch of others) for parsing XML, and aeson for JSON.
As for which tags are available in EXIF, take a look at this convenient list.
