Reducing duplicates across arrays in Twig

Essentially, due to crappy circumstances, I need to do this in native Twig if possible (I know this shouldn't be done in a view template language):
loop over object.key
object.key["key1"] = ["val1","val2-a"]
object.key["key2"] = ["val1","val2-b"]
object.key["key3"] = ["val1","val2-c"]
manipulate as needed, into a new array or object or whatever and get
object.key["key1"] = ["val2-a"]
object.key["key2"] = ["val2-b"]
object.key["key3"] = ["val2-c"]
As you can see, I need to reduce duplicate values across different keys.
I'm having a hard time finding a way to do this without adding a custom filter or changing back-end architecture, which, basically, the deadline doesn't leave time for. Any thoughts?
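One pure-Twig sketch (untested, and assuming a Twig version with the merge filter and the in operator): make a first pass to collect every value that appears under more than one key, then a second pass to rebuild the map without those values. Note that each accumulator has to be declared before the loop that modifies it, or Twig's loop scoping will discard the changes.

{# pass 1: find values that occur under more than one key #}
{% set seen = [] %}
{% set dupes = [] %}
{% for values in object.key %}
    {% for value in values %}
        {% if value in seen %}
            {% set dupes = dupes|merge([value]) %}
        {% else %}
            {% set seen = seen|merge([value]) %}
        {% endif %}
    {% endfor %}
{% endfor %}

{# pass 2: rebuild the map, keeping only the non-shared values #}
{% set reduced = {} %}
{% for key, values in object.key %}
    {% set kept = [] %}
    {% for value in values %}
        {% if value not in dupes %}
            {% set kept = kept|merge([value]) %}
        {% endif %}
    {% endfor %}
    {% set reduced = reduced|merge({ (key): kept }) %}
{% endfor %}

{# reduced now maps key1 to ["val2-a"], key2 to ["val2-b"], and so on #}

It is ugly, but it keeps everything in the template, which seems to be the constraint here.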

Related

In Gatling, how can I generate a random number each time a call is executed? (not using feeder)

I need to find a way to generate a random number each time the REST call is executed.
I have the following GET call:
exec(http("Random execution")
.get("/randomApi")
.queryParam("id", getRandomId()))
}
Obviously it doesn't work, as the random number is only generated once and I end up with the same number whenever this call is executed. I can't use the feeder option as my feeder is already huge and is generated by a 3rd party for each test.
.queryParam takes Expressions as its arguments, and since Expression is an alias for a session function, you can just do...
.queryParam("id", session => getRandomId())
You could also define a second feeder that uses a function to generate the values - no need to update your existing feeder or add another csv file. This would be useful if you had more complicated logic for getting / generating an Id
import scala.util.Random

val idFeeder = Iterator.continually(Map("id" -> Random.nextInt(999999)))

// in your scenario...
.feed(idFeeder)
.exec(http("Random execution")
    .get("/randomApi")
    .queryParam("id", "${id}")
)
In the spirit of having options, another approach is to store an object in the session that supports toString, which generates whatever you need. It's a nifty trick that you can use for all kinds of things.
object RANDOM_ID {
    override def toString: String = getRandomId().toString
}
...
exec(_.set("RANDOM_ID", RANDOM_ID))
...
.exec(
    http("Random execution")
        .get("/randomApi")
        .queryParam("id", "${RANDOM_ID}")
)
You can apply the same principle to generating random names, addresses, telephone numbers, you name it.
So, which is the better solution? The feeder, or the object in session?
Most of the time it'll be the feeder, because you control when it is updated. The object in session will be different every time it is accessed, whereas with the feeder solution you control when the value updates, and you can reference it multiple times before you change it.
But there may be instances where the stored object solution results in easier to read code, provided you are good with the value changing every time it is accessed. So it's good to know that it is an option.

Exposing the current combo selection index for the CGridCellCombo class

For several years I have been using the CGridCellCombo class. It is designed to be used with the CGridCtrl.
Several years ago I did make a request in the comments section for an enhancement but I got no replies.
The basic concept of the CGridCellCombo is that it works with the text value of the cell. Thus, when you present the drop list it will have that value selected. Under normal circumstances this is fine.
But I have places where I am using the combo as a droplist. In some situations it is perfectly fine to continue to use the text value as the go-between.
But in some situations it would be ideal to know the actual selected index of the combo. When I have a droplist that is translated into 30 languages and I need to know the index, I have no choice but to load the possible options for that translation, examine the cell value, and work out the index from where that value falls in the array.
It works, but it is not very elegant. I did spend a bit of time trying to keep track of the selected index by adding a variable to CInPlaceList and setting it; I then added a wrapper method to CGridCellCombo to return that value. But it didn't work.
I wondered if anyone here has a good understanding of the CGridCellCombo class and might be able to advise me on exposing the CComboCell::GetCurSel value.
I know that the CGridCtrl is very old, but I am not aware of another flexible grid control that is designed for MFC.
The value that is transferred back to the CGridCtrl is chosen in CInPlaceList::EndEdit. The internal message GVN_ENDLABELEDIT is used, and this message always uses text to set the value into the grid.
The value is taken there via GetWindowText from the control. Feel free to override this behaviour.
The handler CGridCtrl::OnEndInPlaceEdit in turn calls OnEndEditCell. All of them take the string sent with GVN_ENDLABELEDIT.
If you want to make a distinction between the internal value and the selected value, you have to manage this by rewriting the drawing and selection code. The value in the grid would be the GetCurSel value and you would have to display something different... There isn't much handling for this in the current code to change.
More information
The key is CInPlaceList::EndEdit(). There is a call to GetWindowText (CInPlaceList is derived from CComboBox); just grab the index at that point as well. Also, in CGridCellCombo::EndEdit you have access to m_pEditWnd, which is the CInPlaceList object and derived from CComboBox, so you have access there too.
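A minimal sketch of that second idea (m_iLastSel is a hypothetical member you would add to CGridCellCombo; adapt the casts and names to your version of the grid control):

// Capture the selection while the in-place list is still alive.
void CGridCellCombo::EndEdit()
{
    if (m_pEditWnd)
    {
        // m_pEditWnd is the CInPlaceList, which derives from CComboBox
        m_iLastSel = ((CComboBox*) m_pEditWnd)->GetCurSel();
        ((CInPlaceList*) m_pEditWnd)->EndEdit(); // original behaviour
    }
}

// Wrapper to expose the captured index.
int CGridCellCombo::GetLastSelectedIndex() const
{
    return m_iLastSel; // initialise to CB_ERR in the constructor
}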
I have found this to be the simplest solution:
int CGridCellCombo::GetSelectedIndex()
{
    int iSelectedIndex = CB_ERR;
    CString strText = GetText();

    for (int iOption = 0; iOption < m_Strings.GetSize(); iOption++)
    {
        if (strText.CollateNoCase(m_Strings[iOption]) == 0) // Match
        {
            iSelectedIndex = iOption;
            break;
        }
    }
    return iSelectedIndex;
}

Referencing external doc in CouchDB view

I am scraping a 90K-record database using JSON-RPC and I am trying to put in some basic error checking. I want to start by scraping the database twice using two different settings and adding a prefix to the second scrape. This way I can check that the two settings are not producing different records (due to dropped updates, etc.). I wanted to implement the comparison using a view which compares each document from the first scrape with its twin produced by the second scrape, and then emits the names of records with a difference between them.
However, I cannot quite figure out how to pull another doc into the view; everything I have read only discusses external docs using the emit() function, which is too late to permit the comparison. In the example below, the (imaginary) lookup() function would grab the referenced document.
Is this just not possible?
function (doc) {
    if (doc._id.slice(0, 1) !== '$' && doc._id.slice(0, 1) !== '_') {
        var otherDoc = lookup('$test' + doc._id); // imaginary lookup()
        if (otherDoc) {
            var keys = Object.keys(doc);
            var same = true;
            keys.forEach(function (key) {
                if ((key.slice(0, 1) !== '_') && (key.slice(0, 1) !== '$') && (key !== 'expires')) {
                    // crude deep compare; order-sensitive
                    if (JSON.stringify(otherDoc[key]) !== JSON.stringify(doc[key])) {
                        same = false;
                    }
                }
            });
            if (!same) {
                emit(doc._id, 1);
            }
        }
    }
}
Context
You are correct that this is not possible in CouchDB. The whole point of the map function is that it must be idempotent, otherwise you lose all the other nice benefits of a pre-calculated index.
This is why you cannot access external resources in the map function, whether they be other records or the clock. Any time you run a map you must always get the same result if you put the same record into it. Since there are no relationships between records in CouchDB, you cannot promise that this is possible.
Solution
However, you can still achieve your end goal, just by different means. Some possibilities...
Assuming there is some meaningful numeric value in each doc, you could use a view to take the sum of all those values and group them by which import you did ({key: <batch id>, value: <meaningful number>}). Then compare the two numbers in your client or the browser to see if they match.
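For example, the map half of that idea could look like this ('batch' and 'amount' are hypothetical field names):

function (doc) {
    emit(doc.batch, doc.amount);
}

Pair it with the built-in _sum reduce and query with group=true to get one total per batch.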
A brute force approach would be to use a view to pair the docs that should match. Each doc is on a different row, but they're grouped by a common field. Then iterate through the entire index comparing the pairs. This would certainly be the quickest to code and doesn't depend on your application or data.
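A sketch of such a pairing view ('source_id', shared by twin docs, and 'batch' are hypothetical fields):

function (doc) {
    emit([doc.source_id, doc.batch], null);
}

Query it with include_docs=true; twins sort next to each other, so the client can walk the rows and compare each pair.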
Implement a validation function to enforce a schema on your data. Just be warned that this will reduce your write throughput since each written record will be piped out of Erlang and into the JS engine. Also, this is only applicable if you're worried about properly formed records instead of their precise content, which might not be the case.
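A trivial validate_doc_update sketch, just to show the shape (the 'name' requirement is hypothetical):

function (newDoc, oldDoc, userCtx) {
    if (!newDoc._deleted && !newDoc.name) {
        throw({ forbidden: 'records must have a name field' });
    }
}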
Instead of your different batch jobs creating different docs, have them place their data into the same doc. The structure might look like this:
{
    "_id": "something meaningful",
    "batch_one": { ..data.. },
    "batch_two": { ..data.. }
}
Then your validation function could compare them, or you could create a view that indexes all the docs that don't match. It all depends on where in your pipeline you want to do the error checking and correction.
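A sketch of that "docs that don't match" view (the JSON.stringify comparison is crude and order-sensitive; treat it as illustrative):

function (doc) {
    if (doc.batch_one && doc.batch_two) {
        if (JSON.stringify(doc.batch_one) !== JSON.stringify(doc.batch_two)) {
            emit(doc._id, 1);
        }
    }
}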
Personally I like the last option better, but only if you don't plan to use the database as-is in production, i.e. you wouldn't want to carry around all that extra data in each record.
Hope that helps.
Cheers.

Creating a pagination index in CouchDB?

I'm trying to create a pagination index view in CouchDB that lists the doc._id for every Nth document found.
I wrote the following map function, but the pageIndex variable doesn't reliably start at 1 - in fact it seems to change arbitrarily depending on the emitted value or the index length (e.g. 50, 55, 10, 25 - all start with a different file, though I seem to get the correct number of files emitted).
function (doc) {
    if (doc.type == 'log') {
        if (!pageIndex || pageIndex > 50) {
            pageIndex = 1;
            emit(doc.timestamp, null);
        }
        pageIndex++;
    }
}
What am I doing wrong here? How would a CouchDB expert build this view?
Note that I don't want to use the "startkey + count + 1" method that's been mentioned elsewhere, since I'd like to be able to jump to a particular page or the last page (user expectations and all), I'd like to have a friendly "?page=5" URI instead of "?startkey=348ca1829328edefe3c5b38b3a1f36d1e988084b", and I'd rather CouchDB did this work instead of bulking up my application, if I can help it.
Thanks!
View functions (map and reduce) are purely functional. Side-effects such as setting a global variable are not supported. (When you move your application to BigCouch, how could multiple independent servers with arbitrary subsets of the data know what pageIndex is?)
Therefore the answer will have to involve a traditional map function, perhaps keyed by timestamp.
function (doc) {
    if (doc.type == 'log') {
        emit(doc.timestamp, null);
    }
}
How can you get every 50th document? The simplest way is to add a skip=0, skip=50, or skip=100 parameter. However, that is not ideal (see below).
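For example, page 3 of 50-row pages would be fetched with something like this (hypothetical design doc and view names):

GET /db/_design/logs/_view/by_timestamp?limit=50&skip=100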
A way to pre-fetch the exact IDs of every 50th document is a _list function which only outputs every 50th row. (In practice you could use Mustache.JS or another template library to build HTML.)
function (head, req) {
    var pageIndex = 0,
        emitted = 0,
        row;
    send("[");
    while ((row = getRow())) {
        if (pageIndex % 50 == 0) {
            if (emitted > 0) {
                send(","); // keep the output valid JSON
            }
            send(JSON.stringify(row));
            emitted += 1;
        }
        pageIndex += 1;
    }
    send("]");
}
This will work for many situations, however it is not perfect. Here are some considerations I am thinking--not showstoppers necessarily, but it depends on your specific situation.
There is a reason the pretty URLs are discouraged. What does it mean if I load page 1, then a bunch of documents are inserted within the first 50, and then I click to page 2? If the data is changing a lot, there is no perfect user experience, the user must somehow feel the data changing.
The skip parameter and example _list function have the same problem: they do not scale. With skip you are still touching every row in the view starting from the beginning: finding it in the database file, reading it from disk, and then ignoring it, over and over, row by row, until you hit the skip value. For small values that's quite convenient but since you are grouping pages into sets of 50, I have to imagine that you will have thousands or more rows. That could make page views slow as the database is spinning its wheels most of the time.
The _list example has a similar problem; however, you front-load all the work, running through the entire view from start to finish, and (presumably) sending the relevant document IDs to the client so it can quickly jump around the pages. But with hundreds of thousands of documents (you call them "log" so I assume you will have a ton of them) that will be an extremely slow query which is not cached.
In summary, for small data sets you can get away with the page=1, page=2 form; however, you will bump into problems as your data set gets big. With the release of BigCouch, CouchDB is even better for log storage and analysis, so (if that is what you are doing) you will definitely want to consider how high to scale.

Insert/update Doctrine object from Excel

On the project I am currently working on, I have to read an Excel file (with over 1000 rows), extract them all, and insert/update them into a database table.
In terms of performance, it is better to add all the records to a Doctrine_Collection and insert/update them afterwards using the fromArray() method, right? One other possible approach is to create a new object for each row (an Excel row becomes an object) and then save it, but I think that is worse in terms of performance.
Every time the Excel file is uploaded, it is necessary to compare its rows to the existing objects in the database. If a row does not exist as an object it should be inserted, otherwise updated. My first approach was to turn both objects and rows into arrays (or Doctrine_Collections) and then compare both arrays before performing the needed operations.
Can anyone suggest me any other possible approach?
We did a bit of this in a project recently, with CSV data. It was fairly painless. There's a symfony plugin, tmCsvPlugin, but we have extended it quite a bit since, so the version in the plugin repo is pretty out of date. Must add that to the #TODO list :)
Question 1:
I don't explicitly know about performance, but I would guess that adding the records to a Doctrine_Collection and then calling Doctrine_Collection::save() would be the neatest approach. I'm sure it would be handy if an exception were thrown somewhere and you had to roll back on your last save.
Question 2:
If you can use a row field as a unique identifier (let's assume a username), then you can search for an existing record. If you find one, and assuming that your imported row is an array, use Doctrine_Record::synchronizeWithArray() to update that record, then add it to a Doctrine_Collection. When complete, just call Doctrine_Collection::save().
A fairly rough 'n' ready implementation:
// set up a new collection
$collection = new Doctrine_Collection('User');

// assuming $row is an associative
// array representing one imported row.
foreach ($importedRows as $row) {

    // try to find an existing record
    // based on a unique identifier.
    $user = Doctrine_Core::getTable('User')
        ->findOneByUsername($row['username']);

    // create a new user record if
    // no existing record is found.
    if (!$user instanceof User) {
        $user = new User();
    }

    // sync record with current data.
    $user->synchronizeWithArray($row);

    // add to collection.
    $collection->add($user);
}

// done. save collection.
$collection->save();
Pretty rough but something like this worked well for me. This is assuming that you can use your imported row data in some way to serve as a unique identifier.
NOTE: be wary of synchronizeWithArray() if you're using sf1.2/Doctrine 1.0 - if I remember correctly it was not implemented correctly. It works fine in Doctrine 1.2, though.
I have never worked with Doctrine_Collections, but I can answer in terms of database queries and code logic in a broader sense. I would apply the following logic:
1. Fetch all the existing rows from the database table in a single query and store them in an array $storedSheet.
2. Create a single array of all the rows of the uploaded Excel sheet; call it $uploadedSheet. The structures of $uploadedSheet and $storedSheet should be similar (both two-dimensional, so rows and cells can be identified and compared).
3. Run foreach loops on $uploadedSheet as follows, and only identify which rows need to be inserted and which updated (do the actual queries later):
$rowsToBeUpdated = array();
$rowsToBeInserted = array();

foreach ($uploadedSheet as $row => $eachRow)
{
    if (is_array($storedSheet[$row]))
    {
        foreach ($eachRow as $column => $value)
        {
            if ($value != $storedSheet[$row][$column])
            {
                // difference detected - mark this row for update
                $rowsToBeUpdated[$row] = true;
                break; // no need to check this row anymore
            }
        }
    }
    else
    {
        // no stored counterpart - mark this row for insertion
        $rowsToBeInserted[$row] = true;
    }
}
4. This way you have two arrays. Now perform two database queries:
bulk insert all the rows of $uploadedSheet whose numbers are stored in the $rowsToBeInserted array;
bulk update all the rows of $uploadedSheet whose numbers are stored in the $rowsToBeUpdated array.
These bulk queries are the key to faster performance.
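Tying that back to Doctrine, a rough sketch of the final step (untested; the 'User' model and the 'username' lookup key are hypothetical, and Doctrine 1 has no true bulk-update API, so the gain comes from each collection saving inside a single transaction):

$insertCollection = new Doctrine_Collection('User');
foreach (array_keys($rowsToBeInserted) as $rowIndex) {
    $user = new User();
    $user->fromArray($uploadedSheet[$rowIndex]);
    $insertCollection->add($user);
}
$insertCollection->save(); // one transaction for all inserts

$updateCollection = new Doctrine_Collection('User');
foreach (array_keys($rowsToBeUpdated) as $rowIndex) {
    // row was classified as an update, so a record should exist
    $user = Doctrine_Core::getTable('User')
        ->findOneByUsername($uploadedSheet[$rowIndex]['username']);
    $user->synchronizeWithArray($uploadedSheet[$rowIndex]);
    $updateCollection->add($user);
}
$updateCollection->save(); // one transaction for all updates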
Let me know if this helped, or if you want to know something else.
