I have one collection named "Total Amount" with a column named "Amount". I am fetching some amounts from one application and putting them into the above collection under that column, and some of those amounts are negative. Ideally my robot should recognize a negative amount under the "Amount" column and, if one exists, stop the bot.
It's not clear to me whether you want to loop through the 'Total Amount' collection and filter the negative amounts out of there, or skip appending negative amounts to the collection when you're filling it.
Also it's not clear why you would want to stop the robot if you can just remove the negative values from the collection.
What I would suggest is to use the 'Filter Collection' action in the 'Utility - Collection Manipulation' object.
This action basically checks every item in the collection and matches it to your filter query (in this case "Amount < 0").
If the result is True, the item will be put into the output collection; if not, it will be omitted.
Another way of doing it is to loop through the collection and program the action you want to take with a decision stage when you come across a negative number, as Esqew already said in his/her comment.
Hope this helps :).
You can filter the collection to check whether there are negative values in a specific column. The Filter action under Utility - Collection Manipulation will let you save the filtered data into another collection. Check the count of the generated collection: if it is greater than zero, the collection contains negative values; otherwise it does not hold any negative value.
(Screenshot showing the Filter Collection action setup omitted.)
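In case it helps to see the same check outside Blue Prism, here is a minimal Python sketch of the logic; the amounts list is made up for illustration:

# Hypothetical list standing in for the "Amount" column of the "Total Amount" collection.
amounts = [120.50, 75.00, -14.99, 300.00]

# Collect the negative values; if any exist, stop processing.
negatives = [a for a in amounts if a < 0]
if negatives:
    raise SystemExit("Stopping: found %d negative amount(s): %s" % (len(negatives), negatives))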
Hi, I am new to PySpark and want to create a function that takes a table of duplicate rows and a dict of {field_name: [{"source": ..., "approach": ...}, ...]} as input and creates a new record. The new record will be equal to the first non-null value in the priority list, where each "approach" is a function.
For example, the input table looks like this for a specific component:
And given this priority dict:
The output record should look like this:
The new record looks like this because, for each field, the selected function dictates how the value is chosen (e.g. phone is equal to 0.75: Amazon's most complete record is null, so you coalesce to the next approach in the list, which is the value of phone for Google's most complete record, 0.75).
Essentially, I want to write a PySpark function that groups by component and then applies the appropriate function to each column to get the correct value. While I have a function that "works", the time complexity is terrible, as I am naively looping through each component, then each column, then each approach in the list to build the record.
Any help is much appreciated!
I think you can solve this using pyspark.sql.functions.when. See this blog post for some complicated usage patterns. You're going to want to group by id and then use when statements to implement your logic. For example, 'title': {'source': 'Google', 'approach': 'first record'} can be implemented as
from pyspark.sql.functions import col, first, lit, when

(df.groupBy('id').agg(
    # take the 'title' of the first Google row in each group; rows from other sources become null and are skipped
    first(when(col("source") == lit("Google"), col("title")), ignorenulls=True).alias("title")
))
'Most recent' and 'most complete' are more complicated and may require some self-joins, but you should still be able to use when clauses to get the aggregates you need.
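For 'most recent', one alternative to a self-join is a window function. Here is a rough sketch, assuming a hypothetical updated_at column on the input DataFrame:

from pyspark.sql import Window
from pyspark.sql.functions import col, row_number

# Rank the rows within each id by recency (updated_at is an assumed column name).
w = Window.partitionBy("id").orderBy(col("updated_at").desc())
most_recent = (df.withColumn("rn", row_number().over(w))
                 .filter(col("rn") == 1)
                 .drop("rn"))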
Imagine a situation where a client has a feed of objects with a limit of 10.
When the next 10 are required, it sends a request with skip 10 and limit 10.
But what if some new objects were added to (or deleted from) the collection since the 1st request with offset == 0?
Then on the 2nd request (with offset == 10) the response may return objects in the wrong order.
Sorting on creation time does not work here, because I have some feeds which are sorted by some numeric field.
You can add a time field like created_at or updated_at. It must be updated whenever the document is created or modified, and the field must be unique.
Then query the DB for the range of time using $gte and $lte along with a sort on this time field.
This ensures that any changes made outside the time window will not get reflected in the pagination, provided that the time field does not have duplicates. Most probably, if you include microtime, duplicates won't happen.
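A minimal pymongo sketch of that idea; the database, collection, and field names are made up:

from datetime import datetime
from pymongo import ASCENDING, MongoClient

# Hypothetical names: database "mydb", collection "feed", field "created_at".
coll = MongoClient()["mydb"]["feed"]

# Capture the upper bound of the time window once, when the first page is requested,
# and reuse it for every subsequent page so later inserts are excluded.
snapshot_time = datetime.utcnow()

def get_page(skip, limit=10):
    return list(coll.find({"created_at": {"$lte": snapshot_time}})
                .sort("created_at", ASCENDING)
                .skip(skip)
                .limit(limit))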
It really depends on what you want the result to be.
If you want the original objects in their original order regardless of Delete and Add operations then you need to make a copy of the list (or at least of the order) and then page through that. Copy every Id to a new collection that doesn't change once the page has loaded and then paginate through that.
Alternatively, and perhaps more likely, what you want is to see the next 10 after the last one in the current set, including any Delete or Add operations that have taken place since. For this, you can use the sorted order in which you are viewing them and a filter, $gt whatever the last item was. BUT that doesn't work when there are duplicates in the field on which you are sorting. To get around that you will need to index on that field PLUS some other field which is unique per record, for example the _id field. Now you can take the last record in the first set and look for records that are $eq the indexed value and $gt the _id, OR are simply $gt the indexed value.
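A rough pymongo sketch of that keyset approach; the numeric sort field is assumed to be called score, and the other names are made up as well:

from pymongo import ASCENDING, MongoClient

coll = MongoClient()["mydb"]["feed"]  # hypothetical database/collection

def next_page(last_score, last_id, limit=10):
    # Take documents strictly after the last one already shown,
    # breaking ties on the unique _id field.
    query = {"$or": [
        {"score": {"$gt": last_score}},
        {"score": last_score, "_id": {"$gt": last_id}},
    ]}
    return list(coll.find(query)
                .sort([("score", ASCENDING), ("_id", ASCENDING)])
                .limit(limit))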
We have an XPages application and recently discovered an issue where several Notes documents have duplicates, but the duplicates are PARENT documents too and NOT response documents. Is it possible to create a Notes view that will show duplicates where all the duplicates are parents? I know the formula for showing conflicts is the following, but what about where they are all parents?
SELECT @IsAvailable($Conflict)
Expounding on my comment:
Create a view which is categorized on the first column
In the first column formula, put in criteria that you would use to determine a duplicate. This may be the Document Unique ID, or maybe another field or combination of fields.
Add a second column that contains the number 1. Then enable column totals on this column.
Now look at the view you created. With the view categories collapsed, look for any number greater than 1 to determine which documents are duplicates.
I think what you are asking is not how to identify the duplicates - but how to find out which of them are parent documents. So basically you would create a view as Steve suggests - but instead of putting a constant of 1 into the second column I would suggest putting either @DocChildren (for immediate responses) or @DocDescendants (for all responses and responses to responses).
If I understand your logic, then all the ones returning 0 (zero) are child documents, and those returning 1 or higher would be parent documents. Of course you could also use an item on the document in your view formula - if it only exists on the parent doc (or its value can tell that it is a parent doc).
View selection formulas act on only one document at a time. They cannot perform lookups. They have no way to compare two documents. There is therefore no possible way for a view to identify duplicates.
A view can, as per the other answers, categorize documents based on common values. If there is a single field that is supposed to be unique across all documents, you can categorize on that field. That will give you a visualization of the duplicates, but it won't filter them in or out.
The only way for a view to filter duplicates - either to show only duplicates, or to exclude duplicates - would be if you run an agent that reads all documents, looks for those that are duplicates, and marks them with a special field value - e.g., IsDuplicate = 1. Once you do that, you can create a view that selects all documents with IsDuplicate = 1, or a view that excludes IsDuplicate = 1.
There may be a better way to do this but my client has a list of books they want displayed. A number of them are by the society itself and they want them to display first before any of the other books. I set up a custom type called "books for sale" and I added a boolean field indicating that this was a society book.
I created a query "all books" in which I sorted by the title and then by the boolean field; however, it does not sort the way I want - it just displays titles starting with numbers first, followed by an alphabetical listing.
Is there a way to do this so those books that have been flagged as "true" display first in the list?
Kind Regards
Simon
You should sort by boolean field first and the title second, not the other way around.
Sorting by multiple columns works like prioritizing: the collection is sorted by the first column, and only if there are rows with the same value in that column are those rows then sorted by the second column. If some rows have the same values in those two columns, the sort is performed by the third sorting column, if there is one. And so on.
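For illustration, the same idea in a minimal Python sketch; the book data and field names are made up:

# Hypothetical records; society books (flag True) should come first.
books = [
    {"title": "Birds of the Region", "is_society_book": False},
    {"title": "Annual Proceedings", "is_society_book": True},
    {"title": "Local History", "is_society_book": True},
]

# Sort by the flag first (True before False), then alphabetically by title.
books.sort(key=lambda b: (not b["is_society_book"], b["title"]))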
Here is the case:
I need to display a list of records on a page, and paging is necessary. The issue is that whether a record should be displayed depends on a validation result computed in memory, after the records are selected from the database.
For example, with 50 records for one page:
Select 50 records from the database
30 records are left after validation
The solution I have now is to get all records from the database, do the validation, and then get the valid list of records. Paging is based on this list.
Is there any other good solution for this?
In the optimal case pagination can be divided into two steps. In the first step, according to the selection query, a set of rows is selected from the database - all the rows that could be displayed. Instead of fetching the actual rows, just retrieve a list of their identifiers. This list, however large, can usually be kept in memory. The second step is paging through the list by asking for the n-th page of m items. Then only those m rows are fully retrieved from the database using their ids.
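A rough sketch of that two-step approach in Python; sqlite is used only for illustration, and the table and column names are made up:

import sqlite3

conn = sqlite3.connect("app.db")  # assumed database

# Step 1: fetch only the identifiers of all rows that could be displayed.
ids = [row[0] for row in conn.execute(
    "SELECT id FROM records WHERE status = ? ORDER BY created_at", ("active",))]

# Step 2: fetch the full rows for the requested page only.
def get_page(page, per_page=50):
    page_ids = ids[page * per_page:(page + 1) * per_page]
    if not page_ids:
        return []
    placeholders = ",".join("?" * len(page_ids))
    return conn.execute(
        "SELECT * FROM records WHERE id IN (%s)" % placeholders, page_ids).fetchall()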
An additional step of in-memory computation negates the idea of pagination, which relies on having a list of identifiers for the whole result set.
What I can think of now, without seeing the computation, is to store the results of the computation in the database whenever a displayed row is inserted/updated in the db. Since the computation results depend on input parameters, for every row and for every range of input parameters you could have a different result.
This would then make paging possible. The first step of paging could now include the precomputed validation results and provide much faster retrieval of the list of row ids.
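Continuing the sketch above, if the validation result were precomputed and stored on each row (a hypothetical is_valid column), the step 1 query could already exclude invalid records:

# Hypothetical is_valid column holding the precomputed validation result.
ids = [row[0] for row in conn.execute(
    "SELECT id FROM records WHERE is_valid = 1 ORDER BY created_at")]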