Reading/Editing XLIFF using C# - c#-4.0

I need to parse an XLIFF file using C#, but I'm having some trouble. These files are fairly complex, containing a huge number of nodes.
Basically, all I need to do is read the source node from each trans-unit node, do some processing on it, and insert the processed text into the corresponding target node (which will always be present, but empty).
An example of one of the nodes I need to parse would be (the whole file may contain hundreds of these):
<trans-unit id="0000000002" datatype="text" restype="string">
<source>Windows Update is not installed</source>
<target/>
<iws:segment-metadata tm_score="0.00" ws_word_count="6" max_segment_length="0">
<iws:status target_content="placeholders_only"/>
</iws:segment-metadata>
<iws:boundary-seg sequence="bs20721"/>
<iws:markup-seg sequence="0000000001">
</trans-unit>
The trans-unit nodes can be buried deep in the file, and the header section contains a lot of data. I'd like to use LINQ to XML to read the data, but I'm not having any luck getting it to work. Here's my current code (just trying to read and output the source nodes from the file):
XDocument doc = XDocument.Load(path);
Console.WriteLine("Before loop");

foreach (var transUnitNode in doc.Descendants("trans-unit"))
{
    Console.WriteLine("In loop");
    XElement sourceNode = transUnitNode.Element("source");
    XElement targetNode = transUnitNode.Element("target");
    Console.WriteLine("Source: " + sourceNode.Value);
}
I never see 'In loop' and I don't know why. Can someone tell me what I'm doing wrong here, or suggest a better way to achieve what I'm trying to do?
Thanks.

Try
XNamespace df = doc.Root.Name.Namespace;
foreach (XElement transUnitNode in doc.Descendants(df + "trans-unit"))
{
    XElement sourceNode = transUnitNode.Element(df + "source");
    // and so on; use the df namespace object to qualify any element names
}
See also http://msdn.microsoft.com/en-us/library/bb387093.aspx.
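If it helps, here is a minimal sketch of the full round trip the question describes (read each source, process it, write the result into the empty target, and save). ProcessText is a placeholder for your own processing and is not part of the original code:

XDocument doc = XDocument.Load(path);
XNamespace df = doc.Root.Name.Namespace;

foreach (XElement transUnitNode in doc.Descendants(df + "trans-unit"))
{
    XElement sourceNode = transUnitNode.Element(df + "source");
    XElement targetNode = transUnitNode.Element(df + "target");

    // Skip any trans-unit that is missing either element.
    if (sourceNode == null || targetNode == null)
        continue;

    // ProcessText stands in for whatever processing you need to do.
    targetNode.Value = ProcessText(sourceNode.Value);
}

doc.Save(path);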

Related

Transforming large array of objects to csv using json2csv

I need to transform a large array of JSON objects (it can have over 100k entries) into a CSV.
This array is created directly in the application; it's not the result of an uploaded file.
Looking at the documentation, I've thought of using the Parser, but it says that:
For that reason is rarely a good reason to use it until your data is very small or your application doesn't do anything else.
Because the data is not small and my app will do other things besides creating the CSV, I don't think it's the best approach, but I may be misunderstanding the documentation.
Is it possible to use the other options (async parser or transform) with already created data (and not a stream of data)?
FYI: it's a NestJS application, but I'm using this Node.js lib.
Update: I've tried it with an array of over 300k entries, and it went smoothly.
Why do you need any external modules?
Converting JSON into a JavaScript array of objects is a piece of cake with the native JSON.parse() function.
let jsontxt = await fs.readFile('mythings.json', 'utf8');
let mythings = JSON.parse(jsontxt);
if (!Array.isArray(mythings)) throw "Oooops, stranger things happen!";
And then converting a JavaScript array into a CSV is very straightforward.
The most obvious (and absurd) case is just mapping every element of the array to a string that is the JSON representation of that element, and then joining the resulting array of strings into a single string separated by newlines (\n). You end up with a useless CSV with a single column containing every element of your original array. It's good for nothing but, heck, it's a CSV!
let csvtxt = mythings.map(JSON.stringify).join("\n");
await fs.writeFile("mythings.csv",csvtxt,"utf8");
Now you can feel that you are almost there. Replace the useless mapping function with your own
let csvtxt = mythings.map(mapElementToColumns).join("\n");
and choose a good mapping between the fields of the objects of your array, and the columns of your csv.
function mapElementToColumns(element) {
  return `${JSON.stringify(element.id)},${JSON.stringify(element.name)},${JSON.stringify(element.value)}`;
}
or, in a more thorough way
function mapElementToColumns(fieldnames) {
  return function (element) {
    let fields = fieldnames.map(n => element[n] ? JSON.stringify(element[n]) : '""');
    return fields.join(',');
  };
}
that you may invoke in your map
mythings.map(mapElementToColumns(["id", "name", "value"])).join("\n");
Finally, you might decide to use an automated "all fields in all objects" approach, which requires that all the objects in the original array share a similar field schema.
You extract all the fields of the first object of the array and use them both as the header row of the CSV and as the template for extracting the rest of the elements.
let fieldnames = Object.keys(mythings[0]);
and then use this field-names array as the parameter of your map function
let csvtxt = mythings.map(mapElementToColumns(fieldnames)).join("\n");
and also prepend them as the CSV header row
csvtxt = fieldnames.join(',') + "\n" + csvtxt;
Putting all the pieces together...
import fs from 'fs/promises';

function mapElementToColumns(fieldnames) {
  return function (element) {
    let fields = fieldnames.map(n => element[n] ? JSON.stringify(element[n]) : '""');
    return fields.join(',');
  };
}

let jsontxt = await fs.readFile('mythings.json', 'utf8');
let mythings = JSON.parse(jsontxt);
if (!Array.isArray(mythings)) throw "Oooops, stranger things happen!";
let fieldnames = Object.keys(mythings[0]);
let csvtxt = mythings.map(mapElementToColumns(fieldnames)).join("\n");
csvtxt = fieldnames.join(',') + "\n" + csvtxt;
await fs.writeFile("mythings.csv", csvtxt, "utf8");
And that's it. Pretty neat, uh?
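One caveat about the assembled snippet above: top-level await only works in an ES module (a .mjs file or a package with "type": "module"). In a plain CommonJS script you would wrap the same steps in an async function; here is a minimal sketch, reusing the mapElementToColumns function from above and a hypothetical jsonFileToCsv name:

const fs = require('fs/promises');

async function jsonFileToCsv(inputPath, outputPath) {
  const mythings = JSON.parse(await fs.readFile(inputPath, 'utf8'));
  if (!Array.isArray(mythings)) throw new Error('Expected a JSON array');

  const fieldnames = Object.keys(mythings[0]);
  const rows = mythings.map(mapElementToColumns(fieldnames)); // mapElementToColumns as defined above
  await fs.writeFile(outputPath, [fieldnames.join(','), ...rows].join('\n'), 'utf8');
}

jsonFileToCsv('mythings.json', 'mythings.csv').catch(console.error);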

How to add object to file in nodejs?

The title pretty much explains it all. I'm trying to add objects to a JSON file from Node.js and can't seem to get it working.
Each file essentially looks like this
[{"name":name,"date":date},{"name":name,"date":date}] (in simplest terms)
I want to be able to add an object to the array in that file. Here is the code I came up with:
for (o in collections) {
    fs.readFile(__dirname + "/HowIsCollections/" + collections[o].mintDate, 'utf8', function (err, data) {
        const dat = JSON.parse(data)
        const existedData = []
        //console.log(existedData)
        for (i in dat) {
            existedData.push(JSON.stringify(dat[i]))
        }
        const project = JSON.stringify(collections[o])
        if (!existedData.includes(project)) {
            console.log("?")
            dat.push(project)
        }
        fs.writeFileSync(__dirname + "/HowIsCollections/" + collections[o].mintDate, JSON.stringify(dat))
        console.log("????")
    })
}
It's pretty self-explanatory. From the top, it reads the file, gets the data, and puts all of the objects found in the file into an array.
The second half of the code stringifies each object and compares it against that array (existedData, the data from the file) to see if the object already exists. If it doesn't, it adds it. Then at the end I'm just re-saving the file.
dat is the array from the file, so dat.push(project) adds the new object to it.
I have similar setups like this in other parts of my code, which work. This one, however, does not: I get no errors, nothing, it just doesn't work. All of my console.logs show, but that's it.
I tried looking on here for solutions, but most of them were just about stringifying an object in fs.writeFile, which isn't what I need here.
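For comparison, here is a minimal sketch of the read-compare-append-write flow described above, assuming the same file layout (a JSON array of plain objects) and the collections structure from the question. It keeps everything as objects and only stringifies when writing; the addIfMissing helper name is my own:

const fs = require('fs');
const path = require('path');

function addIfMissing(filePath, project) {
    // Read and parse the existing array of objects.
    const dat = JSON.parse(fs.readFileSync(filePath, 'utf8'));

    // Compare by stringified value, as the original code does.
    const exists = dat.some(item => JSON.stringify(item) === JSON.stringify(project));

    // Append the object itself (not its stringified form) and save.
    if (!exists) {
        dat.push(project);
        fs.writeFileSync(filePath, JSON.stringify(dat));
    }
}

// Hypothetical usage with the collections object from the question.
for (const o in collections) {
    const filePath = path.join(__dirname, 'HowIsCollections', String(collections[o].mintDate));
    addIfMissing(filePath, collections[o]);
}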

Why would you use the spread operator to spread a variable onto itself?

In the Google Getting started with Node.js tutorial they perform the following operation
data = {...data};
in the code for sending data to Firestore.
You can see it on their Github, line 63.
As far as I can tell this doesn't do anything.
Is there a good reason for doing this?
Is it potentially future proofing, so that if you added your own data you'd be less likely to do something like data = {data, moreData}?
@Manu's answer details what the line of code is doing, but not why it's there.
I don't know exactly why the Google code example uses this approach, but I would guess at the following reason (and would do the same myself in this situation):
Because objects in JavaScript are passed by reference, it becomes necessary to rebuild the 'data' object from its constituent parts to avoid the original data object being further modified by the ref.set(data) call on line 64 of the example code:
await ref.set(data);
For example, in MongoDB, when you pass an object into a write or update method, Mongo will actually modify the object to add extra properties, such as the datetime it was inserted into a collection or its ID within the collection. I don't know for sure whether Firestore does the same, but if it doesn't now, it's possible that it may in future. If it does, and if the code that calls this update method goes on to further manipulate the data object it originally passed in, that object would now have extra properties that may cause unexpected problems. Therefore, it's prudent to rebuild the data object from the original object's properties to avoid contaminating the original object elsewhere in the code.
I hope that makes sense - the more I think about it, the more I'm convinced that this must be the reason and it's actually a great learning point.
I include the full original function from Google's code here in case others come across this in future, since the code is subject to change (copied from https://github.com/GoogleCloudPlatform/nodejs-getting-started/blob/master/bookshelf/books/firestore.js at the time of writing this answer):
// Creates a new book or updates an existing book with new data.
async function update(id, data) {
let ref;
if (id === null) {
ref = db.collection(collection).doc();
} else {
ref = db.collection(collection).doc(id);
}
data.id = ref.id;
data = {...data};
await ref.set(data);
return data;
}
It's making a shallow copy of data; let's say you have a third-party function that mutates the input:
const foo = input => {
input['changed'] = true;
}
And you need to call it, but don't want to get your object modified, so instead of:
data = {life: 42}
foo(data)
// > data
// { life: 42, changed: true }
You may use the Spread Syntax:
data = {life: 42}
foo({...data})
// > data
// { life: 42 }
Not sure if this is the particular case with Firestore, but the thing is: spreading an object gives you a shallow copy of that object.
===
Related: Object copy using Spread operator actually shallow or deep?
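To illustrate the 'shallow' caveat with a generic example (not from the Firestore code): top-level properties are copied, but nested objects are still shared between the original and the copy:

const data = { life: 42, nested: { status: 'ok' } };
const copy = { ...data };

copy.life = 0;              // top-level property: the original is untouched
copy.nested.status = 'bad'; // the nested object is shared, so this mutates data too

console.log(data.life);          // 42
console.log(data.nested.status); // 'bad'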

Text Reader Classes in Hadoop

I have a directory OUTPUT where I have the output files from a Map Reduce job. The output files are Text files written with a TextOutputFormat.
Now I want to read the key/value pairs from the output files. How can I do so using some existing classes in Hadoop? One way I could do it is as follows:
FileSystem fs = FileSystem.get(conf);
FileStatus[] files = fs.globStatus(new Path(OUTPUT + "/part-*"));
for (FileStatus file : files) {
    if (file.getLen() > 0) {
        FSDataInputStream in = fs.open(file.getPath());
        BufferedReader bin = new BufferedReader(new InputStreamReader(in));
        String s = bin.readLine();
        while (s != null) {
            System.out.println(s);
            s = bin.readLine();
        }
        in.close();
    }
}
This approach would work, but it adds a lot of extra work, since I now need to manually parse the key/value pairs out of each individual line. I am looking for something more convenient that lets me read the key and value directly into variables.
Are you forced to use TextOutputFormat as your output format in the previous job?
If not, consider using SequenceFileOutputFormat; then you can use a SequenceFile.Reader to read the file back as key/value pairs. You can also still 'view' the file using hadoop fs -text path/to/output/part-r-00000
EDIT: You can also use the KeyValueLineRecordReader class; you'll just need to pass a FileSplit to the constructor.
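If it helps, a rough sketch of reading a SequenceFile back as key/value pairs with SequenceFile.Reader could look like this (written against the older Hadoop API, reusing conf and OUTPUT from the question's snippet; adjust for your Hadoop version):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

FileSystem fs = FileSystem.get(conf);
Path path = new Path(OUTPUT + "/part-r-00000");

SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
try {
    // Instantiate key/value objects of whatever Writable types the file was written with.
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

    while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
    }
} finally {
    reader.close();
}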

Dynamic data structures in C#

I have data in a database, and my code is accessing it using LINQ to Entities.
I am writing some software where I need to be able to create a dynamic script. Clients may write the scripts, but it is more likely that they will just modify them. The script will specify stuff like this:
Dataset data = GetDataset("table_name", "field = '1'");
if (data.Read())
{
    string field = data["field"];
    while (cway.Read())
    {
        // do some other stuff
    }
}
So the script above is going to read data from the table called 'table_name' into a list of some kind, based on the filter I have specified ("field = '1'"). It will read particular fields and perform normal comparisons and calculations.
The most important thing is that this has to be dynamic: I can specify any table in our database and any filter, and I must then be able to access any field.
I am using a script engine, which means the script I am writing has to be written in C#. DataSets are outdated and I would rather keep away from them.
Just to reiterate, I am not set on keeping the above format, and I can define any method I want behind the scenes for my C# script to call. The above could end up like this, for instance:
var data = GetData("table_name", "field = '1'");
while (data.ReadNext())
{
    var value = data.DynamicField;
}
Could I use reflection, for instance, or would that perhaps be too slow? Any ideas?
If you want to read a DataReader dynamically, it's pretty easy:
ArrayList al = new ArrayList();
SqlDataReader dataReader = myCommand.ExecuteReader();
if (dataReader.HasRows)
{
    while (dataReader.Read())
    {
        string[] fields = new string[dataReader.FieldCount];
        for (int i = 0; i < dataReader.FieldCount; ++i)
        {
            fields[i] = dataReader[i].ToString();
        }
        al.Add(fields);
    }
}
This returns an ArrayList in which each entry is a string array sized to however many fields the reader has.
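If you want something closer to the data.DynamicField style sketched in the question, one option (a sketch of my own, not part of the answer above) is to project each row into an ExpandoObject so the column names become dynamic members; C# 4.0's dynamic support makes this possible:

using System.Collections.Generic;
using System.Data;
using System.Dynamic;

// Reads every row from any IDataReader into a list of dynamic objects,
// so callers can write row.FieldName instead of indexing by ordinal.
static List<dynamic> ReadDynamic(IDataReader reader)
{
    var rows = new List<dynamic>();
    while (reader.Read())
    {
        IDictionary<string, object> row = new ExpandoObject();
        for (int i = 0; i < reader.FieldCount; i++)
        {
            row[reader.GetName(i)] = reader.IsDBNull(i) ? null : reader.GetValue(i);
        }
        rows.Add(row);
    }
    return rows;
}

// Hypothetical usage from the script:
// var data = ReadDynamic(myCommand.ExecuteReader());
// foreach (dynamic row in data) { var value = row.DynamicField; }

Accessing a member that doesn't match a column name will throw a RuntimeBinderException at runtime, which is the usual trade-off with dynamic.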
