I'm using a 2-D ArrayList to store the query results of sql.rows in SoapUI Groovy. Outputrows, in the code below, is an ArrayList.
Outputrows = sql.rows("""select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency
    from BENEFICIARY CORR
    JOIN LOCATION LOC on CORR.UID = LOC.UID""")
The problem with the ArrayList is that I'm unable to update the value of a particular cell with the set method; set is not a valid method on the GroovyRowResult class.
Outputrows.get(row).set(col,categoryValue)
So I am just wondering if I can store the query results (Outputrows) in a 2-D map instead and, if so, how I can update the value of any particular row with the given map key.
[{'preferred': 'N', 'category': 'Commercial', 'currency': 'USD'}, ...] and so on.
If I want to update Currency for the 3rd row, how can I do that?
Data in the output:
Preferred | Category | Currency |
----------------------------------
    N     |   CMP    |   USD    |
----------------------------------
    Y     |   RTL    |   GBP    |
----------------------------------
    N     |   CMP    |   JPY    |
----------------------------------
    Y     |   RTL    |   USD    |
----------------------------------
Now here in 'outputrows' the values are stored starting from the first row (N, CMP, USD) as an ArrayList. I would like to store the values of the query result 'outputrows' as maps instead of an ArrayList, so I can easily access any value in 'outputrows' with a map key.
Hope this makes sense.
I need to use the column name with put instead of the column number.
Outputrows.get(row).put("currency",categoryValue) .. this is correct
Outputrows.get(row).put(2,categoryValue).. adds a new column with name "2", instead of column reference to currency
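For the original question (update Currency for the 3rd row), a minimal Groovy sketch, assuming an existing groovy.sql.Sql connection named sql; 'EUR' is just an example value:

// sql.rows returns a List<GroovyRowResult>; GroovyRowResult implements Map,
// so each row can be read and updated by column name.
def outputrows = sql.rows("""select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency
    from BENEFICIARY CORR
    JOIN LOCATION LOC on CORR.UID = LOC.UID""")

// Update Currency for the 3rd row (index 2):
outputrows[2].put('currency', 'EUR')
// or, equivalently, Groovy's property-style access:
outputrows[2].currency = 'EUR'

assert outputrows[2].currency == 'EUR'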
I am using Application Insights to record custom measurements about our application. I have a customEvent that has data stored in the customMeasurements object. The object contains 4 key-value pairs. I have many of these customEvents and I am trying to average the key-value pairs from all the events and display the results in a 2-column table.
I want to have one table that has 2 columns. The first column is the key name, and the second column is the value of that key averaged across all the events.
For example, event1 has key1's value set to 2. event2 has key1's value set to 6. If those are the only two events I received in the last 7 days, I want my table to show the number 4 in the row containing data for key1.
I can only average 1 key per query, since I cannot put multiple summarizes inside of 1 query. Here is what I have for averaging the first key in the customMeasurements object:
customEvents
| where name == "PerformanceMeasurements"
| where timestamp > ago(7d)
| summarize key1average=avg(toint(customMeasurements.key1))
| project key1average
But I need to average all the keys inside of this object and build 1 table as described above.
For reference, I have attached a screenshot of the layout of a customEvent customMeasurements object.
If the number of keys is limited and known beforehand, then I'd recommend using multiple aggregations within the | summarize operator, separated by commas:
| summarize key1average = avg(toint(customMeasurements.key1)),
            key2average = avg(toint(customMeasurements.key2)),
            key3average = avg(toint(customMeasurements.key3))
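Combined with the filters from the question, the whole query would presumably be (key1..key3 stand in for the actual measurement names):

customEvents
| where name == "PerformanceMeasurements"
| where timestamp > ago(7d)
| summarize key1average = avg(toint(customMeasurements.key1)),
            key2average = avg(toint(customMeasurements.key2)),
            key3average = avg(toint(customMeasurements.key3))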
If the keys may vary, then you'd have to flatten out the custom dimensions first with the | mvexpand operator:
customEvents
| where timestamp > ago(1h)
| where name == "EventName"
| project customDimensions
| mvexpand bagexpansion=array customDimensions
| extend Key = customDimensions[0], Value = customDimensions[1]
| summarize avg(toint(Value)) by tostring(Key)
In this case, each Key-Value pair from customDimensions will become its own row and you will be able to operate on those with the standard query language constructs.
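The question's data lives in customMeasurements rather than customDimensions, so presumably the same pattern applies with that property swapped in (an untested sketch):

customEvents
| where timestamp > ago(7d)
| where name == "PerformanceMeasurements"
| project customMeasurements
| mvexpand bagexpansion=array customMeasurements
| extend Key = customMeasurements[0], Value = customMeasurements[1]
| summarize avg(toint(Value)) by tostring(Key)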
We are dealing with a situation where we store items with a variable number of properties (it is a SaaS solution and every instance has a different number of properties). What we are struggling with is the dimension of time.
What would be the best way to store the data if we want to be able to:
Quickly get individual items.
Get the value of a property with a certain timestamp (ie, historic info).
Note: we do not want to search for property values, we want speed :-) We will have many items with many properties, with many timestamps, that we should be able to fetch as fast as possible.
Example use case of the SaaS solution: we have a ship with 10,000 sensors that collect the temperature every minute. This means that we have 10,000 "items" with "temperature" as one of the properties. They will be updated every minute and we want to store the history.
Option 1. Store all in maps (Id = Primary Key)
------------------------------------------------
Id | Name | Props
------------------------------------------------
1 | Foo | map<timestamp, map<name, text>>
------------------------------------------------
2 | Bar | map<timestamp, map<name, text>>
------------------------------------------------
In the map we will have something like:
{
    "1518023285": {
        "Prop A": "Value A"
    },
    "1518011111": {
        "Prop A": "Value B",
        "Prop B": "Value C"
    }
}
Prop A and Prop B were created at the same time (1518011111); Prop A was later updated.
We will collect the complete item and use our application to find the right value at the right time.
Option 2. Store time in maps and props as rows (Id = Primary Key)
-----------------------------------------------------------
Id | Name | Prop_A | Prop_B
-----------------------------------------------------------
1 | Foo | map<timestamp, text> | map<timestamp, text>
-----------------------------------------------------------
2 | Bar | map<timestamp, text> | map<timestamp, text>
-----------------------------------------------------------
In the column Prop_A we will have something like:
{
"1518023285": "Value B",
"1518011111": "Value A"
}
Meaning that Prop_A got created with Value A and updated later with Value B.
We will collect the complete item and use our application to find the right value at the right time.
Option 3. Properties in a map and time in a row (Id = Primary Key, ItemId has index, Time has index)
-------------------------------------------------
Id | ItemId | Name | Time | Props
-------------------------------------------------
1 | 1 | Foo | 1518011111 | map<name, text>
-------------------------------------------------
2 | 2 | Bar | 1518011111 | map<name, text>
-------------------------------------------------
3 | 2 | Bar | 1518023285 | map<name, text>
-------------------------------------------------
A map will look like:
{
"Prop A": "Value A",
"Prop B": "Value B"
}
We will collect all rows of an item and find the right time in our application.
Option 4. Properties and time in a row (Id = Primary Key, ItemId has index, Time has index)
----------------------------------------------------
Id | ItemId | Name | Time | Prop_A | Prop_B
----------------------------------------------------
1 | 1 | Foo | 1518011111 | Value A | Value B
----------------------------------------------------
2 | 2 | Bar | 1518011111 | Value A | Value B
----------------------------------------------------
3 | 2 | Bar | 1518023285 | Value A | Value C
----------------------------------------------------
Row 3 got updated.
We create two CQL queries: one to find the latest version and a second to collect the props.
CQL collections are (with some exceptions) completely deserialized into memory, which could be really bad long term. Especially from a perf perspective it's less than ideal; they exist for convenience with smaller maps, not performance.
I would actually recommend something like Option 4: ((id, item_id), name, time, prop), where prop can just be "A" or "B", with a value field for its value. If "prop" is really limited to just A-C or so, you can switch time and prop so you can query for timelines of each property and just merge a few queries together. Be sure to change the ordering of time so that the most recent data is at the beginning of the partition, for more efficient reads when getting the latest value. If there are a ton of inserts you will want to break up the partitions more, maybe by including a "year-month" in your partition key.
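A sketch of that layout in CQL (the column types and the DESC ordering are my assumptions, not from the answer):

CREATE TABLE item_props (
    id      int,
    item_id int,
    name    text,        -- item name, e.g. 'Foo'
    time    timestamp,
    prop    text,        -- property name, e.g. 'A'
    value   text,
    PRIMARY KEY ((id, item_id), name, time, prop)
) WITH CLUSTERING ORDER BY (name ASC, time DESC, prop ASC);

-- Newest values sit first in the partition, so getting the latest is cheap:
SELECT time, prop, value FROM item_props
WHERE id = 1 AND item_id = 1 AND name = 'Foo' LIMIT 2;

Swapping prop ahead of time in the clustering columns would instead give one contiguous timeline per property.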
I would go for option 3, but with a similar change to what Chris is proposing:
((id, item_id), time, name, map)
If the maps don't change within a timestamp (meaning they are read-only for that timestamp), I don't see a downside to taking advantage of the collection. It will also save you some disk space to have all the properties in one map, instead of having them in separate columns.
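That variant in CQL might look like this (again a sketch; types and the DESC ordering on time are assumptions):

CREATE TABLE items_by_time (
    id      int,
    item_id int,
    time    timestamp,
    name    text,
    props   map<text, text>,   -- e.g. {'Prop A': 'Value A', 'Prop B': 'Value B'}
    PRIMARY KEY ((id, item_id), time, name)
) WITH CLUSTERING ORDER BY (time DESC, name ASC);

-- Latest snapshot for an item:
SELECT time, name, props FROM items_by_time
WHERE id = 2 AND item_id = 2 LIMIT 1;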
I've been playing with leveldb and it's really good at what it's designed to do--storing and getting key/value pairs based on keys.
But now I want to do something more advanced and find myself immediately stuck. Is there no way to find a record by value? The only way I can think of is to iterate through the entire database until I find an entry with the value I'm looking for. This becomes worse if I'm looking for multiple entries with that value (basically a "where" query), since I have to iterate through the entire database every time I run this type of query.
Am I trying to do something leveldb isn't designed for, and should I be using another database instead? Or is there a nice way to do this?
You are right. Basically, what you need to know about is key composition.
Keep in mind that even in SQL you don't query by the value itself: the WHERE clause expresses a boolean predicate over it, like age = 42.
To answer your particular question, imagine you have a first key-value namespace in leveldb where you store your objects, with the value serialized as JSON, for instance:
key             | value
------------------------------------------------
namespace | uid | value
================================================
users     | 1   | {"name": "amz", "age": 32}
------------------------------------------------
users     | 2   | {"name": "abki", "age": 42}
In another namespace, you index user uids by age:
key                      | value
----------------------------------
namespace    | age | uid | value
==================================
users-by-age | 32  | 1   | empty
----------------------------------
users-by-age | 42  | 2   | empty
Here the value is empty because the key must be unique. What we would otherwise think of as the value of these rows, the uid, is composed
into the key to make each row's key unique.
In that second namespace, every key that starts with (users-by-age, 32) matches a record that answers the query age = 32.
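A sketch of this scheme using the plyvel Python bindings (the '!' separator and the zero-padding are my conventions, not part of leveldb):

import json
import plyvel  # Python LevelDB bindings

db = plyvel.DB('/tmp/users.db', create_if_missing=True)

# Primary namespace: users!<uid> -> JSON-serialized object
db.put(b'users!1', json.dumps({'name': 'amz', 'age': 32}).encode())
db.put(b'users!2', json.dumps({'name': 'abki', 'age': 42}).encode())

# Index namespace: users-by-age!<age>!<uid> -> empty value.
# Zero-pad numeric key parts so they sort correctly as bytes.
db.put(b'users-by-age!032!1', b'')
db.put(b'users-by-age!042!2', b'')

# "WHERE age = 32": scan every key with the (users-by-age, 32) prefix,
# then fetch each record from the primary namespace by uid.
for key, _ in db.iterator(prefix=b'users-by-age!032!'):
    uid = key.rsplit(b'!', 1)[-1]
    print(json.loads(db.get(b'users!' + uid)))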
I'm looking for some thoughts on how you might recreate a 'vlookup' that I currently do in excel.
I have two tables: Data contains a list of datetime values; DateConverter contains a list of calendar dates and their associated "network dates." Imagine a business where not every day is a workday: if I want to calculate differences between dates, I'm most interested in the number of work days that elapsed between my two dates.
Here is what the data might look like:
Data Table           DateConverter Table
=================    =================================
| Datetime      |    | Calendar date | Network date |
| ------------- |    | ------------- | ------------ |
| 6-1-15 8:00a  |    | 6-1-15        | 1000         |
| 6-2-15 1:00p  |    | 6-2-15        | 1001         |
| 6-3-15 7:00a  |    | 6-3-15        | 1002         |
| 6-10-15 3:00p |    | 6-4-15        | 1003         |
| 6-15-15 1:00p |    | 6-5-15        | 1004         |
| 6-12-15 2:00a |    | 6-8-15        | 1005         | // Skips the weekend
| ...           |    | ...           | ...          |
In excel, I can easily map in the network date for each date in the Datetime field with a variant of vlookup:
// Assume that Datetime values are in Column A, Calendar date values in
// Column C, Network date values in Column D - this formula fills Column B
// Headers are in row 1 - first values are in row 2
B2=OFFSET($D$1,COUNTIFS($C:$C,"<"&A2),)
The formula counts the dates that are less than the lookup value (using COUNTIFS because the values in the search array are dates, and the search value is a datetime) and returns the associated network date.
Is there a way to do this in Tableau? Will it require a calculated field or can I do this with some kind of join?
Thanks in advance for the help! Let me know if there is anything I can clarify.
If the tables are on the same data server, you have the option to use joins, which is usually the most efficient way to combine information from different tables. If the tables are on different servers or platforms, then you can't use a single query to join them.
In either case, you can use Tableau data blending, which is sort of like a client-side join of aggregated results from multiple queries. It's a pretty useful technique, but a little more complex and restricted, and also usually less efficient than a server-side join.
So if you have the option to have both tables on the same server, start with that. It will be simpler and likely faster.
Note: if you are going to use a date as a join key, you probably want to define it as a date and not a datetime.
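For the same-server case, the join itself could be sketched in SQL like this (table and column names are illustrative, not from the question):

-- Cast the datetime down to a calendar date so it matches the
-- DateConverter key, then pick up the network date.
SELECT d.datetime_value,
       c.network_date
FROM   data_table d
JOIN   date_converter c
  ON   CAST(d.datetime_value AS DATE) = c.calendar_date;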
@alex-blakemore's response would normally be adequate, but if you can change the schema, you could simply add the network date to the Data table. The hourly granularity should not cause excessive growth, and you don't need to navigate the join.
Then, instead of counting rows and requiring a sorted table, simply subtract the network dates from each other and add 1. For example, 6-1-15 maps to network date 1000 and 6-3-15 to 1002, so the inclusive span is 1002 - 1000 + 1 = 3 work days.
I am pretty new to NoSQL and Cassandra, but I was told by my architecture committee to use it. I just want to understand how to convert an RDBMS model to NoSQL.
I have a database where the user needs to import data from an Excel or CSV file into the database. This file may have different columns each time.
For example in the excel file data might look something like this:
Name| AName| Industry| Interest | Pint |Start Date | End date
x | 111-121 | IT | 2 | 1/1/2011 | 1/2/2011
x | 111-122 | hotel | 1 | "" | ""
y| 111-1000 | IT | 2 | 1/1/2011 | 1/2/2011
After we upload this, the next Excel file might look like:
Name| AName| Industry| Interest | Pint |Start Date | isTrue | isNegative
x | 111-121 | IT | 2 | 1/1/2011 | 1/2/2011 | yes | no
x | 111-122 | hotel | 1 | "" | no | no
y| 111-1000 |health | 2 | 1/1/2010 | yes|""
I would not know in advance what columns I am going to create when importing data. I am totally confused with NoSQL and unable to understand how to handle this: how do I import data when I don't know the table structure?
Start with the basic fact that a column family (Cassandra's equivalent of a "table") is made up of rows. Each row has a row key and some number of key/value pairs (called columns). For a particular column in a row, the name of the column is the key of the pair and the value of the column is the value of the pair. Just because you have a column by some name in one row does not necessarily mean you'll have a column by that name in any other row.
Internally, row keys, column names and column values are stored as byte arrays and you'll need to use serializers to convert program data to the byte arrays and back again.
It's up to you as to how you define the row key, column name and column value.
One approach would be to have a row in the CF correspond to a row from Excel. You'd have to identify the one Excel column that will provide a unique id and store that in the row key. The remainder of the Excel columns can get stored in Cassandra columns, one-to-one. This lets you be very flexible on most column names, but you have to have a unique key value somewhere. The unique key requirement will always hold for any storage scheme you use.
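In CQL terms, one way to render that scheme is a wide row keyed by the unique Excel value, with one (column name, column value) pair per Excel cell (the table and column names here are illustrative, not from the answer):

CREATE TABLE imported_rows (
    row_key      text,   -- the unique Excel column chosen as the id, e.g. AName
    column_name  text,   -- e.g. 'Industry', 'isTrue'
    column_value text,
    PRIMARY KEY (row_key, column_name)
);

-- Each Excel cell becomes one insert:
INSERT INTO imported_rows (row_key, column_name, column_value)
VALUES ('111-121', 'Industry', 'IT');

This keeps the column set completely open per row, at the cost of storing every value as text and re-assembling rows at read time.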
There are other storage schemes, but they all boil down to you defining in the Excel what your row key is and how you break the Excel data into key/value pairs.
Check out some NoSQL patterns; I highly suggest reading "Building on Quicksand" by Pat Helland.
Some good patterns (with or without using PlayOrm):
http://buffalosw.com/wiki/Patterns-Page/