Considering I have Core Data objects stored like this:
|Name | ActionType | Content | Date |
|-----|------------|---------|-----------|
|Abe | Create | "Hello" | 2014-07-01|
|Cat | Create | "Well" | 2014-07-01|
|Abe | Create | "Hi" | 2014-07-02|
|Bob | Edit | "Yo" | 2014-07-03|
|Cat | Delete | "What" | 2014-07-04|
|Abe | Edit | "Haha" | 2014-07-05|
I would like to get the last action of each user, so the results would be:
|Abe | Edit | "Haha" | 2014-07-05|
|Cat | Delete | "What" | 2014-07-04|
|Bob | Edit | "Yo" | 2014-07-03|
Does anyone know how to do that with an NSFetchRequest? So far, from what I've gathered, if you want to use "group by" you can only retrieve the values in the group-by clause (it will return "Abe, Cat, Bob" without the rest of the data in the Core Data object). Similarly with "returnsDistinctResults": it will not return the whole object.
I have a feeling that Core Data is not equipped for this; any help and hints would be appreciated!
Core Data is an object graph, not a database. Core Data itself has no concept of uniqueness; it's up to you to implement that in your application. This is most typically done using the find-or-create pattern, which helps you prevent duplicate objects from being stored.
That said, you CAN return distinct results from Core Data using the NSDictionaryResultType. This will not prevent duplicates from being stored, but can be used to return distinct results from a fetch. There is an example of this in the programming guide. You can give this request all properties for a given entity by working with the NSEntityDescription of the managed object you are fetching.
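A minimal sketch of such a distinct fetch, assuming an entity named "Action" that holds the rows above and a managed object context called context (both names are placeholders for your own model):
import CoreData
// Sketch only: a distinct, dictionary-typed fetch against a hypothetical "Action" entity.
let distinctRequest = NSFetchRequest<NSDictionary>(entityName: "Action")
distinctRequest.resultType = .dictionaryResultType
distinctRequest.returnsDistinctResults = true // only honoured for dictionary result types
// Read every attribute name off the entity description so the whole "row" comes back.
if let entity = NSEntityDescription.entity(forEntityName: "Action", in: context) {
    distinctRequest.propertiesToFetch = Array(entity.attributesByName.keys)
}
let distinctRows = try context.fetch(distinctRequest)
Each element of distinctRows is a dictionary keyed by attribute name, so you get full rows back rather than just the grouped column.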
For getting the object with the "last" timestamp for each user, you actually want the object with the maximum value for that key path. That can be done in a number of ways: a subquery, key path operators, expressions, etc.
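For example, here is a hedged sketch of the expression-plus-group-by variant, again assuming an "Action" entity with "name" and "date" attributes and a context (adjust the names to your model). It returns the latest date per user; a second fetch with a predicate is then needed if you want the full managed objects:
import CoreData
// Sketch only: latest "date" per "name" on a hypothetical "Action" entity.
let latestRequest = NSFetchRequest<NSDictionary>(entityName: "Action")
latestRequest.resultType = .dictionaryResultType
// An expression that evaluates to max(date) within each group.
let maxDate = NSExpressionDescription()
maxDate.name = "maxDate"
maxDate.expression = NSExpression(forFunction: "max:",
                                  arguments: [NSExpression(forKeyPath: "date")])
maxDate.expressionResultType = .dateAttributeType
// Group by user and fetch the user plus the latest date in that group.
latestRequest.propertiesToGroupBy = ["name"]
latestRequest.propertiesToFetch = ["name", maxDate]
let latestPerUser = try context.fetch(latestRequest)
// e.g. ["name": "Abe", "maxDate": 2014-07-05], ["name": "Cat", "maxDate": 2014-07-04], ...
Matching each (name, maxDate) pair back to its managed object can then be done with a normal fetch and a compound predicate, or with a SUBQUERY.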
In a Python notebook on Databricks "Community Edition", I'm experimenting with the City of San Francisco open data about emergency calls to 911 requesting firefighters. (The old 2016 copy of the data used in "Using Apache Spark 2.0 to Analyze the City of San Francisco's Open Data" (YouTube) and made available on S3 for that tutorial.)
After mounting the data and reading it with the explicitly defined schema into a DataFrame fire_service_calls_df, I aliased that DataFrame as an SQL table:
sqlContext.registerDataFrameAsTable(fire_service_calls_df, "fireServiceCalls")
With that and the DataFrame API, I can count the call types that occurred:
fire_service_calls_df.select('CallType').distinct().count()
Out[n]: 34
... or with SQL in Python:
spark.sql("""
SELECT count(DISTINCT CallType)
FROM fireServiceCalls
""").show()
+------------------------+
|count(DISTINCT CallType)|
+------------------------+
| 33|
+------------------------+
... or with an SQL cell:
%sql
SELECT count(DISTINCT CallType)
FROM fireServiceCalls
Why do I get two different count results? (It seems like 34 is the correct one, even though the talk in the video and the accompanying tutorial notebook mention "35".)
To answer the question
Can Spark SQL not count correctly or can I not write SQL correctly?
from the title: I can't write SQL correctly.
Rule <insert number> of writing SQL: Think about NULL and UNDEFINED. In SQL, count(DISTINCT CallType) ignores NULL values, while the DataFrame's distinct() keeps the NULL row as a value of its own, which is where the 33 versus 34 comes from.
%sql
SELECT count(*)
FROM (
SELECT DISTINCT CallType
FROM fireServiceCalls
)
34
Also, I apparently can't read:
pault suggested in a comment:
With only 30 something values, you could just sort and print all the distinct items to see where the difference is.
Well, I actually thought of that myself (minus the sorting). Except there wasn't any difference: there were always 34 call types in the output, whether I generated it with SQL or DataFrame queries. I simply didn't notice that one of them was ominously named null:
+--------------------------------------------+
|CallType |
+--------------------------------------------+
|Elevator / Escalator Rescue |
|Marine Fire |
|Aircraft Emergency |
|Confined Space / Structure Collapse |
|Administrative |
|Alarms |
|Odor (Strange / Unknown) |
|Lightning Strike (Investigation) |
|null |
|Citizen Assist / Service Call |
|HazMat |
|Watercraft in Distress |
|Explosion |
|Oil Spill |
|Vehicle Fire |
|Suspicious Package |
|Train / Rail Fire |
|Extrication / Entrapped (Machinery, Vehicle)|
|Other |
|Transfer |
|Outside Fire |
|Traffic Collision |
|Assist Police |
|Gas Leak (Natural and LP Gases) |
|Water Rescue |
|Electrical Hazard |
|High Angle Rescue |
|Structure Fire |
|Industrial Accidents |
|Medical Incident |
|Mutual Aid / Assist Outside Agency |
|Fuel Spill |
|Smoke Investigation (Outside) |
|Train / Rail Incident |
+--------------------------------------------+
Suppose I have a Cucumber scenario like:
Scenario Outline: do something
Given do something with "<data1>"
And done some process on "<data2>"
When again done some experiment on "<data3>"
Then checking "<result>"
Examples:
| data1  | data2  | data3  | result  |
| value1 | value2 | value3 | result1 |
This scenario is completely fine, but imagine this scenario with 5 more steps, each with new data. That looks very annoying. Is there any way I can split this Examples table into columns? If a column split is not possible, is there any other suggestion?
Each row of examples should have a reason behind it. If two rows have the same reason behind them, then you are just wasting runtime by repeating yourself.
Let's take a simple example:
Scenario Outline: register an account
When I register as <account> with <password>
Then I should be <result>
Examples:
| account | password | result |
| free | too_short | unregistered |
| taken | ok | unregistered |
...
You can easily replace this complex scenario with two much simpler ones:
Scenario: register with too short a password
When I register with too short a password
Then I should be told I need a longer password
Scenario: register with existing account
When I register with an existing account
Then I should be told the account is taken
There are several reasons to prefer doing things this way:
Each scenario is simpler to read
Each scenario tells you WHAT the behaviour is and WHY it's important (with the examples you have to infer that from the data)
Each step definition is much simpler to implement
By making concrete the specifics of the example, you invite writing more scenarios around a particular subject.
You can apply this pattern to every Scenario Outline, and doing so will
fix your problem with too many examples
help you write better scenarios and code. Finding out the reason behind each example helps you write better code.
You do not have any other solution; the whole purpose of the Examples table is to have several lines:
Examples:
| data1   | data2   | data3   | result  |
| value1  | value2  | value3  | result1 |
| value11 | value12 | value13 | result2 |
| value21 | value22 | value23 | result3 |
If you do not have several lines, you can put the data directly in your scenario (a plain Scenario, not a Scenario Outline):
Scenario: do something
Given do something with "value1"
And done some process on "value2"
When again done some experiment on "value3"
Then checking "result1"
I am having trouble understanding what you want to do, or what you mean by 'column split', so I'm not sure whether this answers your question or not.
You can split your examples up into different groups where the examples share a common theme, and you can even tag those groups:
#golden_path
Examples: valid values
| data1 | data2 | data3 | result |
| value1 | value2 | value3 | result1 |
| bar | foo | fizz | buzz |
#low_boundary #negative_test
Examples: low boundaries
| data1 | data2 | data3 | result |
| 0 | smag | cruft | resultx |
| bar | 0 | fizz | resulty |
That allows you to better understand the purpose of a given group of example values, and even run subsets of the scenario outline by using tags.
The key thing to remember is that the header row has to be repeated for each set of examples (I always try to split them up and leave it out, which results in an error).
I'm new to the world of SSAS and cubes, and this question/title might be way off (as I have no idea how to formulate it; apologies if that's the case).
Anyway, here goes. I was asked to take a look at a cube (not made by me, which makes this more complex) that allows user-uploaded *.csv files to limit the data in the cube.
The setup seems to match the Dynamic Security used here: Analysis Services Dynamic Security
Three tables are in play:
+-----------------+
| User |
+-----------------+
| (PK) DW_EK_User |
| User |
+-----------------+
+--------------+
| UserUpload |
+--------------+
| DW_EK_Upload |
| DW_EK_User |
| DW_EK_Person |
| GroupNo |
| GroupLabel |
+--------------+
+-------------------+
| Person |
+-------------------+
| (PK) DW_EK_Person |
| __ |
| __ |
| __ |
+-------------------+
The user now uploads a *.csv with IDs of interest, including a label. These are temporarily stored in the fact table UserUpload and used to filter, so that only results for the included IDs are shown.
My question is whether it's possible to include the uploaded GroupLabels as a filter.
If my *.csv looks like this:
ID1 GroupA
ID5 GroupA
ID2 GroupA
ID2 GroupB
I would like to be able to see the measures for the individual groups; right now I see the measures for all IDs combined.
I'm looking into Named Sets, but the data is in the "wrong" table to do something like this:
Exists(
StrToSet("[User].[User].[All].[" + UCase(Mid(Username, InStr(1, Username, "\") + 1)) + "]"),
[Person].[DW EK Person].[All].Children,
"Measure")
This will return the username from the User dimension.
We are creating Gherkin feature files for our application to create executable specifications. Currently we have files that look like this:
Given product <type> is found
When the product is clicked
Then detailed information on the product appears
And the field text has a value
And the field price has a value
And the field buy is available
We are wondering whether this whole list of And keywords validating that fields are visible on the screen is the way to go, or whether we should shorten it to something like 'validate input'.
We have a similar case, in that our service can return many tens of elements for each case that we could validate. We do not validate every element for each interaction; we only test the elements that are relevant to the test case.
To make it easier to maintain and switch which elements we are using, we use scenario outlines and tables of examples.
Scenario Outline: PO Boxes correctly located
When we search in the USA for "<Input>"
Then the address contains
| Label | Text |
| PO Box | <PoBox> |
| City name | <CityName> |
| State code | <StateCode> |
| ZIP Code | <ZipCode> |
| +4 code | <ZipPlus4> |
Examples:
| ID | Input | PoBox | CityName | StateCode | ZipCode |
| 01 | PO Box 123, 12345 | PO Box 123 | Boston | MA | 12345 |
| 02 | PO Box 321, Whitefish | PO Box 321 | Whitefish | MN | 54321 |
By doing it this way, we have a generic step "the address contains" that uses the 'Label' and 'Text' columns to test the individual elements. It is a neat and tidy way to test a lot of potential combinations, but it probably depends on your individual use case and how important all of the fields are.
You only need to validate the ones that provide business value, which is probably all of them. I would avoid using tech terms like "field" because they aren't related to a behavior. Al Mills is right on for using the tables.
I'd word it like this:
Scenario Outline: Review product details
Given I find the product <Type>
When I select the product
Then detailed information on the product appears including
| Description | <Description> |
| Price | <Price> |
And I can buy the product
Examples:
| Type | Description | Price |
| Hose | Rubber Hose | 31.99 |
| Sprinkler | Rotating Sprinkler | 12.99 |
The words I chose are behaviors or whats, not technical implementations or hows.
Is there any way to reuse data in SpecFlow feature files?
E.g. I have two scenarios, which both use the same data table:
Scenario: Some scenario 1
Given I have a data table
| Field Name | Value |
| Name | "Tom" |
| Age | 16 |
When ...
Scenario: Some scenario 2
Given I have a data table
| Field Name | Value |
| Name | "Tom" |
| Age | 16 |
And I have another data table
| Field Name | Value |
| Brand | "Volvo" |
| City | "London" |
When ...
In these simple examples the tables are small and not a big problem; however, in my case the tables have 20+ rows and will be used in at least 5 tests each.
I'd imagine something like this:
Having data table "Employee"
| Field Name | Value |
| Name | "Tom" |
| Age | 16 |
Scenario: Some scenario 1
Given I have a data table "Employee"
When ...
Scenario: Some scenario 2
Given I have a data table "Employee"
And I have another data table
| Field Name | Value |
| Brand | "Volvo" |
| City | "London" |
When ...
I couldn't find anything like this in the SpecFlow documentation. The only suggestion for sharing data was to put it into *.cs files. However, I can't do that because the feature files will be used by non-technical people.
The Background is the place for common data like this until the data gets too large and your Background section ends up spanning several pages. It sounds like that might be the case for you.
You mention the tables having 20+ rows each and having several data tables like this. That would be a lot of Background for readers to wade through before they get to the Scenarios. Is there another way you could describe the data? When I had tables of data like this in the past, I put the details into a fixtures class in the automation code and then described just the important aspects in the Feature file.
Assuming, for the sake of an example, that "Tom" is a potential car buyer and you're running some sort of car showroom, his data table might include:
| Field | Value |
| Name | Tom |
| Age | 16 |
| Address | .... |
| Phone Number | .... |
| Fav Colour | Red |
| Country | UK |
Your Scenario 2 might be "Under 18s shouldn't be able to buy a car" (in the UK at least). Given that scenario, we don't care about Tom's address or phone number, only his age. We could write that scenario as:
Scenario: Under 18s shouldn't be able to buy a car
Given there is a customer "Tom" who is under 18
When he tries to buy a car
Then I should politely refuse
Instead of keeping that table of Tom's details in the Feature file, we just reference the significant parts. When the Given step runs, the automation can look up "Tom" from our fixtures. The step references his age so that a) it's clear to the reader of the Feature file who Tom is, and b) we can check the fixture data is still valid.
A reader of that scenario will immediately understand what's important about Tom (he's 16), and they don't have to keep cross-referencing between the Scenario and the Background. Other Scenarios can also use Tom, and if they are interested in other aspects of his information (e.g. his address) then they can specify the relevant detail: Given there is a customer "Tom" who lives at 10 Downing Street.
Which approach is best depends on how much of this data you've got. If it's a small number of fields across a couple of tables then put it in the Background, but once it gets to 10+ fields or large numbers of tables (presumably we have many potential customers) then I'd suggest moving it outside the Feature file and just describing the relevant information in each Scenario.
Yes, you use a Background, as described at https://github.com/cucumber/cucumber/wiki/Background:
Background:
Given I have a data table "Employee"
| Field Name | Value |
| Name | "Tom" |
| Age | 16 |
Scenario: Some scenario 1
When ...
Scenario: Some scenario 2
Given I have another data table
| Field Name | Value |
| Brand | "Volvo" |
| City | "London" |
If you're ever unsure, I find http://www.specflow.org/documentation/Using-Gherkin-Language-in-SpecFlow/ a great resource.