How can I merge 2 spotfire tables by a regex match? - spotfire

I am working on a Spotfire tool, and I am using a calculated column in my main data table to group data rows into 'families' through a regex match. For example, one row might have a 'name' of ABC1234xyz, so it would be part of the ABC family because it contains the string 'ABC'. Another row could be something like AQRST31x2af and belong to the QRST family. The main point is that the 'family' is decided by matching a substring in the name, but that substring could be any length and isn't necessarily at the beginning of the name string.
Right now I am doing this with a large nested If statement in a calculated column. However, this is tedious for adding new families and maintaining the current list of families. What I would like to do is create a table with 2 columns: the string to match and the family name. Then I would like to match against this table to determine the family instead of using the nested If. So, it might look like the tables below:
Match Table:
id_string | family
----------------------
ABC | ABC
QRST | QRST
SUP | Super
Main Data Table:
name | data | family
---------------------------------------
ABC1234 | 1.02342 | ABC
ABC1215 | 1.23749 | ABC
AQRST31x2af | 1.04231 | QRST
BQRST32x2ac | 1.12312 | QRST
1903xSUP | 1.51231 | Super
1204xSUP | 1.68123 | Super
If you have any suggestions, I would appreciate it.
Thanks.

@wcase6 - As far as I know, you cannot add columns from one table to another based on an expression. When you add columns, the value in one matching column has to match the other one exactly.
Instead, you can try the below solution on your 'Main Data Table'.
Note: This solution is based on the scenarios posted. If there are more/different scenarios, you might have to tweak the custom expressions provided.
Step 1: Add a calculated column 'ID_string' which strips out lower-case letters and digits.
Trim(RXReplace([Name],"[a-z0-9]","","g"))
Step 2: Add a calculated column 'family'.
If([ID_string]="SUP","Super",If(Len([ID_string])>3,right([ID_string],4),[ID_string]))
Final Output:
Hope this helps!

As @ksp585 mentioned, it doesn't seem like Spotfire can do exactly what I want, so I have come up with a solution using IronPython. Essentially, here is what I have done:
Created a table called FAMILIES, with the columns IDString and Family, which looks like this (using the same example strings above):
IDString | Family
------------------------
ABC | ABC
SUP | Super
QRST | QRST
Created a table called NAMES, as a pivot off of my main data table, with the only column being NAME. This just creates a list of unique names (since the data table has many rows for each name):
NAME
------------------------
ABC1234
ABC1215
AQRST31x2af
BQRST32x2ac
...
Created a Text Area with a button labeled Match Families, which calls an IronPython script. That script reads the NAMES and FAMILIES tables, compares each name against the IDString column with a regex, and assigns each name a family from the results; any name that matches no IDString gets the family name 'Other'. It then generates a new table called NAME_FAMILY_MAP, with the columns NAME and FAMILY (a rough sketch of such a script is at the end of this answer).
With this new table, I can then add a column back to the original data table using a left outer join from NAME_FAMILY_MAP, matching on NAME. Because NAME_FAMILY_MAP is not directly linked to the NAMES table (generated by a button press), it does not create a circular dependency.
I can then add families to the FAMILIES table using another script, or by just replacing the FAMILIES table with an updated list. It's slightly more tedious than what I was hoping, but it works, so I'm happy.
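For reference, here is a rough sketch of the kind of button script described above. It assumes the NAMES table (column NAME) and FAMILIES table (columns IDString, Family) from this answer, and the exact Spotfire API calls may need adjusting for your version:

# Sketch of the 'Match Families' button script (IronPython in Spotfire).
from System.IO import MemoryStream, StreamWriter, SeekOrigin
from System.Text.RegularExpressions import Regex
from Spotfire.Dxp.Data import DataValueCursor
from Spotfire.Dxp.Data.Import import TextFileDataSource, TextDataReaderSettings

def read_column(table, column_name):
    # Read one column of a data table as a list of formatted strings
    cursor = DataValueCursor.CreateFormatted(table.Columns[column_name])
    return [cursor.CurrentValue for row in table.GetRows(cursor)]

names = read_column(Document.Data.Tables["NAMES"], "NAME")
families_table = Document.Data.Tables["FAMILIES"]
id_strings = read_column(families_table, "IDString")
family_names = read_column(families_table, "Family")

# Map each name to the family of the first IDString it matches; 'Other' if nothing matches
lines = ["NAME\tFAMILY"]
for name in names:
    family = "Other"
    for id_string, fam in zip(id_strings, family_names):
        if Regex.IsMatch(name, id_string):
            family = fam
            break
    lines.append(name + "\t" + family)

# Write the mapping as tab-delimited text and load it as the NAME_FAMILY_MAP table
stream = MemoryStream()
writer = StreamWriter(stream)
writer.Write("\r\n".join(lines))
writer.Flush()
stream.Seek(0, SeekOrigin.Begin)

settings = TextDataReaderSettings()
settings.Separator = "\t"
settings.AddColumnNameRow(0)
source = TextFileDataSource(stream, settings)

if Document.Data.Tables.Contains("NAME_FAMILY_MAP"):
    Document.Data.Tables["NAME_FAMILY_MAP"].ReplaceData(source)
else:
    Document.Data.Tables.Add("NAME_FAMILY_MAP", source)

The same script can simply be rerun from the button whenever the FAMILIES table changes, since it replaces NAME_FAMILY_MAP in place.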

Related

How does Cucumber handle a list of non-parameterised strings with pipe characters in feature file

I have a scenario where I used pipe characters to introduce a list of items for better readability:
Scenario: Search users
Then I should see the user list with the following columns:
| Name |
| Age |
| DOB |
| Address|
The items in the list are non-parameterised, so the scenario will only run once.
I created the step definition for the step like this:
#Then("I should see the user list with the following columns:")
On execution, the test was not found and I got the error: io.cucumber.core.gherkin.FeatureParserException: Failed to parse resource at: classpath:features
If I remove pipe characters and condense the list like this, then the test works fine:
Then I should see the user list with the following columns: Name, Age, DOB, Address
I am not sure how step definitions handle a step with a list of non-parameterised items containing pipe characters, without Cucumber thinking the step has a parameter.
The first convention
Then I should see the user list with the following columns:
| Name |
| Age |
| DOB |
| Address|
resembles a step with a data table as a parameter. For this to work, your step definition method must have a DataTable argument.
#Given ("I should see the user list with the following columns:")
public void verifyUserList(DataTable table) {
// Your logic here
}
Depending on the table, you will have to convert it to some kind of list. A single-column table like the one here should convert to a List<String>; a table with multiple columns should convert to a List<List<String>>. It is up to you to code the conversion correctly, and a quick search on Cucumber data tables should help with that; a minimal sketch follows.
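For the single-column table in this question, something like the following should work (the class name and the assertion are placeholders; DataTable.asList() flattens a one-column table into a List<String>):

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Then;

import java.util.List;

public class UserListSteps {

    @Then("I should see the user list with the following columns:")
    public void verifyUserList(DataTable table) {
        // A single-column data table flattens into a List<String> of its cells
        List<String> expectedColumns = table.asList();
        // ...compare expectedColumns with the columns actually displayed
    }
}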
Another similar convention is Scenario Outline. For scenario outlines to be valid, the test must be tagged with Scenario Outline and the table with Examples. For instance:
Scenario Outline: Search users
  Then I should see the user list with the following columns: <col1> <col2> <col3> <col4>

  Examples:
    | col1 | col2 | col3 | col4    |
    | Name | Age  | DOB  | Address |
The examples table should contain at least two rows: the top row for the variable names, and the second and subsequent rows for the values. So, if you want to check the column names for 5 database tables, you will have 5 rows of values and each row will be processed in its own test run. For this case, the step definition should look something like this:
#Given ("I should see the user list with the following columns: <col1> <col2> <col3> <col4>")
public void verifyUserList(String col1, String col2, String col3, String col4) {
// Your logic here
}
The names of the method parameters don't need to match the identifiers inside the angle brackets; the identifiers, however, must match the column headers in the Examples table.
The last convention (the inline list in your question) is a single test step with four values; in this case the values are part of the step text itself and are handled in a single test run.
Basically, each format has distinct requirements. Now, to answer why the first one didn't work, it should all boil down to the method mapped to the Cucumber step. My guess is that it doesn't contain a data table as an argument.

Need initial N characters of column in Postgres where N is unknown

I have one column in my table in Postgres, let's say employeeId. We do some modification based on the employee type and store it in the DB. Basically, we append strings from these 4 strings ('ACR', 'AC', 'DCR', 'DC'), and any combination of them can be appended after the employeeId - for example EMPIDACRDC, EMPIDDCDCRAC etc. These are valid combinations. I need to retrieve the EMPID from this. The EMPID length is not fixed, and the column is of a varying-length type. How can this be done in Postgres?
I am not entirely sure I understand the question, but regexp_replace() seems to do the trick:
with sample (employeeid) as (
    values
        ('1ACR'),
        ('2ACRDCR'),
        ('100DCRAC')
)
select employeeid,
       regexp_replace(employeeid, 'ACR|AC|DCR|DC.*$', '', 'gi') as clean_id
from sample
returns:
employeeid | clean_id
-----------+---------
1ACR       | 1
2ACRDCR    | 2
100DCRAC   | 100
The regular expression matches any of those codes (the DC alternative also swallows any characters after it up to the end of the string), and each match is then replaced with nothing. This however won't work if the actual empid itself contains any of those codes that are appended.
It would be much cleaner to store this information in two columns: one for the empid and one for those "codes".
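If restructuring the data is an option, a sketch of that two-column layout might look like this (table and column names are purely illustrative):

-- keep the plain id and the appended codes separately
create table employee (
    empid      text not null,
    type_codes text[]          -- e.g. '{ACR,DC}'
);

insert into employee (empid, type_codes) values ('EMPID', '{ACR,DC}');

-- the combined value can still be reconstructed when needed
select empid || coalesce(array_to_string(type_codes, ''), '') as employeeid
from employee;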

Excel - Help Needed with Formulas

I'm looking to do the following:
I want to have say 3 columns.
Transaction | Category | Amount
I want to be able to enter a certain name in Transaction, say for argument's sake "Tesco", then have a returned result in the Category column, say "Groceries", and then enter a specific amount myself in the Amount column.
The thing is, I will need to have quite a lot of different transactions, all in pre-determined categories, so that each time I type in a transaction it will automatically display the category for me.
All help much appreciated.
I know a simple IF statement won't suffice. I can get it to work no problem using a simple IF statement, but as each transaction is different I don't know how to take it further.
Thanks.
Colin
Use a lookup table. Let's say it's on a sheet called "Categories" and it looks like this:
| A | B
1 | Name | Category
2 | Tesco | Groceries
3 | Shell | Fuel
Then, in the table you describe, use =VLOOKUP(A2, Categories!$A$2:$B$3, 2, FALSE) in your "Category" field, assuming it's in B2.
I do this a fair bit using Data Validation and tables.
In this case I would have two tables containing my pick lists on a lookup sheet.
Transaction table: [Name] = "loTrans" - just the list of transactions, sorted
Category table: [Name] = "loCategory" - two columns in the table, sorted by both columns - Trans and Category
Header1 : Transactions
Header2 : Category
The Details Table:
the transaction field will have a simple data validation, using a named range "trans", that selects from the table loTrans.
the category field will also use data validation, using a named range, but the source of that named range ("selCat") will be a little more complex. It will be something like:
=OFFSET(loCategory[Trans],MATCH(Enter_Details!A3,loCategory[Trans],0)-1,1,COUNTIF(loCategory[Trans],Enter_Details!A3),1)
As you enter details and select different transactions, the data validation will be limited to the categories of your selected transaction.
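In other words: the MATCH locates the first row of loCategory for the chosen transaction, the COUNTIF counts how many rows that transaction spans (which is why the table must be sorted), and the OFFSET then shifts one column across to return exactly that block from the Category column as the validation list.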
An example file

Usage of cqlsh is similar to mysql, what's the difference?

cqlsh create table:
CREATE TABLE emp(
    emp_id int PRIMARY KEY,
    emp_name text,
    emp_city text,
    emp_sal varint,
    emp_phone varint
);
insert data
INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal)
VALUES (1, 'ram', 'Hyderabad', 9848022338, 50000);
select data
SELECT * FROM emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | Hyderabad | robin | 9848022339 | 40000
3 | Chennai | rahman | 9848022330 | 45000
This looks just the same as MySQL - where is the column family, and where is the column?
A column family is a container for an ordered collection of rows. Each row, in turn, is an ordered collection of columns.
A column is the basic data structure of Cassandra with three values, namely key or column name, value, and a time stamp.
so table emp is a column family?
INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES(1,'ram', 'Hyderabad', 9848022338, 50000); is a row which contains columns?
column here is something like emp_id=>1 or emp_name=>ram ??
In Cassandra, although the column families are defined, the columns are not. You can freely add any column to any column family at any time.
what does this mean?
I can have something like this?
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | Hyderabad | robin | 9848022339 | 40000 | asdfasd | asdfasdf
3 | Chennai | rahman | 9848022330 | 45000
A super column is a special column, therefore, it is also a key-value pair. But a super column stores a map of sub-columns.
Where is super column, how to create it?
Column family is an old name; now it's just called a table.
Super column is also an old term; today you have the "map" data type, for example, or user-defined types for more complex structures.
About freely adding columns - in the old days, Cassandra worked with an unstructured-data paradigm, so you didn't have to define columns before you inserted them. That isn't possible now, since the Cassandra team moved to being "structured" only (as much of the DB industry came to the conclusion that unstructured data causes more problems than it is worth).
Anyway, Cassandra's data representation at the storage level is very different from MySQL's, and it indeed stores data only for the columns that aren't empty. It may look like the same row when you run a select from cqlsh, but it is stored and queried in a very different way.
The name column family is an old term for what's now simply called a table, such as "emp" in your example. Each table contains one or more columns, such as "emp_id" and "emp_name".
When the documentation talks about being able to freely add columns at any time, it means that you're always able to omit values for columns (they will be null) or add new columns using the ALTER TABLE statement.
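To make those two points concrete, here is a small CQL sketch (the added column and map names are made up for illustration):

-- adding a column later is the modern way of "freely adding columns"
ALTER TABLE emp ADD emp_dept text;

-- a map column covers much of what super columns / sub-columns were used for
ALTER TABLE emp ADD emp_contacts map<text, text>;

UPDATE emp
SET emp_contacts = { 'home': '040-1234567', 'mobile': '9848022338' }
WHERE emp_id = 1;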

Excel Index Match with Multiple Criteria and Multiple Match Types: Variant 2

This question is similar to this one, but with a different variable.
I want to extract the "Stage" from data B using two criteria ("Date of Activity" and "Opportunity Name") from data A; the two criteria from data A will have different match types. The question I am trying to answer is, "At what stage did an activity occur?", and I believe some form of INDEX/MATCH to be part of the answer.
If the two criteria from data A were both match type = exact, I know I could use array formula:
=MATCH(lookup_value_1&lookup_value_2, lookup_array_1&lookup_array_2, match_type).
Unfortunately, the "Date of Activity" needs to use the, "Less Than" match type and the "Opportunity Name" needs to use the, "Exact" match type.
Data A
Assigned|Date of Activity|Type of Activity|Opportunity Name
-----------------------------------------------------------
John |11/15/2016 |CheckIn |Ford
Peter |11/15/2016 |Review |Chevy
Data B
Last Modified|Opportunity Name|Stage
------------------------------------
11/1/2016 |Ford |0
11/1/2016 |Chevy |0
11/10/2016 |Ford |1
11/10/2016 |Chevy |1
11/20/2016 |Ford |2
11/20/2016 |Chevy |2
...
It can be done with a formula, using a match that ringfences one of the parameters into a range. This does not seem possible with the Last Modified date, but it can be done with the Opportunity Name. If the Data B table is sorted by Opportunity Name, then this formula will extract the Stage for that opportunity name at the latest Last Modified date that is less than or equal to the Date of Activity.
=INDEX(INDEX(Table1[Stage],MATCH(H2,Table1[Opportunity Name],0))
:INDEX(Table1[Stage],MATCH(H2,Table1[Opportunity Name],1)),
MATCH(F2,
INDEX(Table1[Last Modified],MATCH(H2,Table1[Opportunity Name],0))
:INDEX(Table1[Last Modified],MATCH(H2,Table1[Opportunity Name],1))))
The formula uses structured referencing for the table references, which helps with readability, but you can replace them with regular references, of course.
Again, the data table in columns A to C must be sorted by Opportunity name, ascending, for this to work.
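To illustrate with the sample data: for John's Ford activity on 11/15/2016, the INDEX/MATCH pairs ringfence the three Ford rows of Data B, and the outer MATCH (approximate, since its third argument is omitted) finds 11/10/2016 as the latest Last Modified date not after 11/15/2016 - so the formula returns Stage 1.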
