I can create an arrow array with a builder:
extern crate arrow;
use arrow::array::Int16Array;
// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);
// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]).unwrap();
// Build the array
let finished_array = builder.finish();
But once I have finished building the array (i.e. called .finish()), is there any option to create a new builder with the data of finished_array without copying the data into a new builder?
What I basically want is a cheap append operation.
After reading some more, I found out that Arrow arrays are always immutable, so an append operation on an existing array is not possible. If you want zero-copy, append-like behavior, you can write or use a chunked array (this is not yet available in Rust, but it is supported in pyarrow, for instance).
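To illustrate the chunked idea, here is a minimal sketch of what such a wrapper could look like in Rust. ChunkedInt16 is a made-up name for illustration and not a type provided by the arrow crate; appending just stores the finished array as another chunk, so no element data is copied.
use arrow::array::{Array, Int16Array};

// Hypothetical chunked container: a list of finished, immutable arrays.
struct ChunkedInt16 {
    chunks: Vec<Int16Array>,
}

impl ChunkedInt16 {
    fn new() -> Self {
        Self { chunks: Vec::new() }
    }

    // "Appending" only pushes another chunk; existing chunks are untouched.
    fn append(&mut self, array: Int16Array) {
        self.chunks.push(array);
    }

    // Total number of values across all chunks.
    fn len(&self) -> usize {
        self.chunks.iter().map(|chunk| chunk.len()).sum()
    }
}

// Usage with the finished_array from above:
let mut chunked = ChunkedInt16::new();
chunked.append(finished_array);
If a single contiguous array is eventually needed, the chunks can be concatenated once at the end, paying the copy only one time.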
I need to delete an item in the middle of an Everscale Solidity mapping that contains a struct:
struct Example {
string data;
uint64 validFrom;
uint64 validUntil;
}
mapping(uint64 => Example) example;
example[1668161798] = Example("Start", 1668161798, 1668162798);
...
example[1668163798] = Example("Middle", 1668163798, 1668164798); // <-- Need to delete this one
...
example[1668165798] = Example("End", 1668165798, 1668166798);
Question 1
What is the best way to do this in terms of:
Gas consumption?
Storage?
Does the delete instruction from the Ethereum example work, or is it better to rebuild and reassign the mapping?
delete example[1668163798];
Question 2
What happens to the data contained in the mapping's item after using delete? Is there any garbage collector that wipes them out to minimize the storage?
What will happen if I reassign new data on the same index after deletion?
delete example[1668163798];
is the right way to do it. "delete" assigns the default value of the type for the variable it is applied to. For the mapping key, it removes the pair from the dictionary, thus freeing the storage space.
Assigning a new value to a previously deleted key is no different from adding any other (key, value) pair to the dictionary; it works just fine.
I need to transform a large array of JSON (that can have over 100k positions) into a CSV.
This array is created directly in the application, it's not the result of an uploaded file.
Looking at the documentation, I've thought about using the parser, but it says that:
For that reason, it is rarely a good idea to use it unless your data is very small or your application doesn't do anything else.
Because the data is not small and my app will do other things besides creating the CSV, I don't think it'll be the best approach, but I may be misunderstanding the documentation.
Is it possible to use the other options (async parser or transform) with already-created data (and not a stream of data)?
FYI: It's a NestJS application, but I'm using this Node.js lib.
Update: I've tried it with an array with over 300k positions, and it went smoothly.
Why do you need any external modules?
Converting JSON into a javascript array of javascript objects is a piece of cake with the native JSON.parse() function.
// assumes the promise-based fs API, e.g. const fs = require('fs').promises
let jsontxt = await fs.readFile('mythings.json', 'utf8');
let mythings = JSON.parse(jsontxt);
if (!Array.isArray(mythings)) throw "Oooops, stranger things happen!"
And, then, converting a javascript array into a CSV is very straightforward.
The most obvious and absurd case is just mapping every element of the array into a string that is the JSON representation of the object element. You end up with a useless CSV with a single column containing every element of your original array. And then joining the resulting strings array into a single string, separated by newlines \n. It's good for nothing but, heck, it's a CSV!
let csvtxt = mythings.map(JSON.stringify).join("\n");
await fs.writeFile("mythings.csv",csvtxt,"utf8");
Now, you can feel that you are almost there. Replace the useless mapping function with your own
let csvtxt = mythings.map(mapElementToColumns).join("\n");
and choose a good mapping between the fields of the objects of your array, and the columns of your csv.
function mapElementToColumns(element) {
return `${JSON.stringify(element.id)},${JSON.stringify(element.name)},${JSON.stringify(element.value)}`;
}
or, in a more thorough way
function mapElementToColumns(fieldNames) {
    return function (element) {
        // explicit undefined check so falsy values like 0 or "" are still kept
        let fields = fieldNames.map(n => element[n] !== undefined ? JSON.stringify(element[n]) : '""');
        return fields.join(',');
    };
}
that you may invoke in your map
mythings.map(mapElementToColumns(["id","name","value"])).join("\n");
Finally, you might decide to use an automated "all fields in all objects" approach, which requires that all the objects in the original array share a similar field schema.
You extract all the fields of the first object of the array, and use them as the header row of the csv and as the template for extracting the rest of the elements.
let fieldnames = Object.keys(mythings[0]);
and then use this field names array as the parameter of your map function
let csvrows = mythings.map(mapElementToColumns(fieldnames));
and, also, prepend them as the CSV header before joining the rows into the final text
csvrows.unshift(fieldnames.join(','));
let csvtxt = csvrows.join("\n");
Putting all the pieces together...
function mapElementToColumns(fieldNames) {
    return function (element) {
        let fields = fieldNames.map(n => element[n] !== undefined ? JSON.stringify(element[n]) : '""');
        return fields.join(',');
    };
}
// assumes this runs inside an async function, with const fs = require('fs').promises
let jsontxt = await fs.readFile('mythings.json', 'utf8');
let mythings = JSON.parse(jsontxt);
if (!Array.isArray(mythings)) throw "Oooops, stranger things happen!";
let fieldnames = Object.keys(mythings[0]);
let csvrows = mythings.map(mapElementToColumns(fieldnames));
csvrows.unshift(fieldnames.join(','));
let csvtxt = csvrows.join("\n");
await fs.writeFile("mythings.csv", csvtxt, "utf8");
And that's it. Pretty neat, uh?
I'm not sure if this is the right place to ask this kind of question, but I feel like the way I'm doing things now is the 'dumb way' and there's room for improvement in my code.
I'm trying to build a stock data website as my side project, and I'm using Rust for the backend. One microservice I'm writing is responsible for scraping data from the web and then saving it in a database. The result of the web scraping is a 2D vector where each row holds one attribute of the struct I'll later construct. Then I save the rows to variables.
Then I use the izip! macro from itertools to iterate over all those attributes and create the struct.
izip!(
publication_dates,
quarter_dates,
income_revenue,
...
)
.for_each(
|(
publication_date,
quarter_date,
income_revenue,
...
)| {
Financials {
ticker: self.ticker.to_owned(),
publication_date,
quarter_date,
...
},
})
My issue is the fact that one data table can have more than 40 attributes, so saving the data from just one page can take over 250 lines of code, and I'd have a total of about 2000 lines just to store the webscraped data, most of it repetitive (parsing rows to the correct data types). I'm pretty sure that's not the correct approach, since any change I'd like to make would have to be done in many places.
One of my ideas to make it better was to create an enum with the desired types, then create a vector of those enums like vec![DataType::QuarterDate, DataType::Int32, DataType::Int32, ...], iterate over both the rows and the new vector, and use a match statement to pick the right function for data processing. That would shorten the row-allocation part a bit, but probably not by much.
Do you have any advice? Any hint would be a great help, I just need a direction that I can later explore by myself :-)
If you want to only reduce the code duplication, I would recommend using a macro for that. A simple example is this (playground):
macro_rules! create_financials {
($rows:ident, $($fun:ident > $column:ident),+) => {{
$(
let $column = $rows
.next()
.ok_or("None")?
.into_iter()
.flat_map($fun);
)+
itertools::izip!($($column,)+).map(
|($($column,)+)| {
Financials {
$($column,)+
}
}
).collect::<Vec<_>>()
}}
}
Note that I removed the intermediate .collect::<Vec<_>>() calls when building each column; they are not needed and allocate additional memory.
I also replaced the for_each with map, so the macro returns a Vec that can be used outside of the macro.
The macro can be used simply like this:
let financials: Vec<Financials> = create_financials!(
rows,
quarter_string_date_to_naive_date > quarter_date,
publish_date_string_to_naive_date > publication_date,
income_revenue > income_revenue
);
To remove the code duplication when parsing to the different data types, check whether the data types implement FromStr, From or TryFrom. Otherwise, you could define your own trait that does the conversion and implement it for each data type.
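As a sketch of that last suggestion (the names FromCell and parse_column are made up here, the date format is an assumption, and chrono is assumed for the date type), such a conversion trait could look like this:
use chrono::NaiveDate;

// Hypothetical trait: one place that defines how a scraped cell becomes a typed value.
trait FromCell: Sized {
    fn from_cell(cell: &str) -> Result<Self, String>;
}

impl FromCell for i32 {
    fn from_cell(cell: &str) -> Result<Self, String> {
        cell.trim()
            .parse()
            .map_err(|e| format!("bad i32 {cell:?}: {e}"))
    }
}

impl FromCell for NaiveDate {
    fn from_cell(cell: &str) -> Result<Self, String> {
        // assumes the scraped dates look like "2023-03-31"
        NaiveDate::parse_from_str(cell.trim(), "%Y-%m-%d")
            .map_err(|e| format!("bad date {cell:?}: {e}"))
    }
}

// A whole column of raw strings can then be parsed generically:
fn parse_column<T: FromCell>(raw: &[String]) -> Result<Vec<T>, String> {
    raw.iter().map(|s| T::from_cell(s)).collect()
}
Each new data type then only needs one small impl block, while the per-column parsing code stays the same.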
How could I append an element to an array like that?
Adding an item to an array structure like that requires three steps:
Read the existing data.
Determine the key of the new item.
Write the new item.
In code that'd be something like:
const ref = admin.database().ref("javascript");
ref.once("value").then((snapshot) => {
let numChildren = parseInt(snapshot.numChildren());
ref.child(""+(numChildren+1)).set(4);
});
Note that this type of data structure is fairly non-idiomatic when it comes to Firebase, and I recommend reading:
Best Practices: Arrays in Firebase.
I have a list of valid values that I am storing in a data store. This list is about 20 items long now and will likely grow to around 100, maybe more.
I feel there are a variety of reasons it makes sense to store this in a data store rather than just storing in code. I want to be able to maintain the list and its metadata and make it accessible to other services, so it seems like a micro-service data store.
But in code, we want to make sure only values from the list are passed, and they can typically be hardcoded. So we would like to create an enum that can be used in code to ensure that valid values are passed.
I have created a simple Node.js script that can generate a JS file with the enum right from the data store. This could be regenerated any time the list changes, or maybe on a schedule. But sharing the enum file with any Node.js applications that use it would not be trivial.
Has anyone done anything like this? Any reason why this would be a bad approach? Any feedback is welcome.
Piggy-backing off of this answer, which describes a way of creating an "enum" in JavaScript: you can grab the list of constants from your server (via an HTTP call) and then generate the enum in code, without the need for creating and loading a JavaScript source file.
Given that you have loaded your enumConstants from the back-end (here I hard-coded them):
const enumConstants = [
'FIRST',
'SECOND',
'THIRD'
];
const temp = {};
for (const constant of enumConstants) {
temp[constant] = constant;
}
const PlaceEnum = Object.freeze(temp);
console.log(PlaceEnum.FIRST);
// Or, in one line
const PlaceEnum2 = Object.freeze(enumConstants.reduce((o, c) => { o[c] = c; return o; }, {}));
console.log(PlaceEnum2.FIRST);
It is not ideal for code analysis or when using a smart editor, because the object is not explicitly defined and the editor will complain, but it will work.
Another approach is just to use an array and look for its members.
const members = ['first', 'second', 'third'...]
// then test for the members
members.indexOf('first') // 0
members.indexOf('third') // 2
members.indexOf('zero') // -1
members.indexOf('your_variable_to_test') // does it exist in the "enum"?
Any result that is >= 0 means the value is a member of the list; -1 means it is not. This doesn't "lock" the object like freeze (above), but I find it suffices for most of my similar scenarios.