Need initial N characters of column in Postgres where N is unknown

I have a column in my Postgres table, let's say employeeId. We do some modification based on the employee type and store the result in the DB. Basically, we append strings from these 4 codes ('ACR', 'AC', 'DCR', 'DC'). Any combination of these 4 codes can be appended after the employeeId. For example, EMPIDACRDC and EMPIDDCDCRAC are valid combinations. I need to retrieve EMPID from this. The EMPID length is not fixed, and the column is of varying length type. How can this be done in Postgres?

I am not entirely sure I understand the question, but regexp_replace() seems to do the trick:
with sample (employeeid) as (
    values
        ('1ACR'),
        ('2ACRDCR'),
        ('100DCRAC')
)
select employeeid,
       regexp_replace(employeeid, '(ACR|AC|DCR|DC).*$', '', 'i') as clean_id
from sample
returns:
employeeid | clean_id
-----------+---------
1ACR       | 1
2ACRDCR    | 2
100DCRAC   | 100
The regular expression says "any of those strings, followed by any characters up to the end of the string" - and that match is then replaced with nothing. This however won't work if the actual empid itself contains any of those codes.
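If the codes are only ever appended at the end, a safer variant (a sketch, reusing the sample CTE above) anchors the whole match to the end of the string, so a code buried inside the real empid is left alone:
select employeeid,
       regexp_replace(employeeid, '(ACR|AC|DCR|DC)+$', '', 'i') as clean_id
from sample
This still breaks if the real empid itself happens to end with one of the codes, which is one more argument for the two-column design below.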
It would be much cleaner to store this information in two columns: one for the empid and one for those "codes".
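A minimal sketch of that design (table and column names are illustrative, not from the original schema):
create table employee (
    empid      text not null,
    type_codes text[]  -- e.g. '{ACR,DC}'
);

-- the concatenated form can still be produced on demand:
select empid || array_to_string(type_codes, '') as employeeid
from employee;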

Related

How does Cucumber handle a list of non-parameterised strings with pipe characters in feature file

I have a scenario where I used pipe characters to introduce a list of items for a better readability:
Scenario: Search users
Then I should see the user list with the following columns:
| Name    |
| Age     |
| DOB     |
| Address |
The items in the list are non-parameterised, so the scenario will only run once.
I created the step definition for the step like this:
@Then("I should see the user list with the following columns:")
On execution, the test was not found and I got the error: io.cucumber.core.gherkin.FeatureParserException: Failed to parse resource at: classpath:features
If I remove pipe characters and condense the list like this, then the test works fine:
Then I should see the user list with the following columns: Name, Age, DOB, Address
I am not sure how step definitions can handle a step with a list of non-parameterised items containing pipe characters, without Cucumber thinking the step has a parameter.
The first convention
Then I should see the user list with the following columns:
| Name    |
| Age     |
| DOB     |
| Address |
resembles a scenario with a data table as a parameter. For this to work, your step definition method must have a DataTable argument.
@Then("I should see the user list with the following columns:")
public void verifyUserList(DataTable table) {
    // Your logic here
}
Depending on the table, you will have to convert it to some kind of list. A single-column table like the one here should convert to a List<String>; a table with multiple columns per row should convert to a List<List<String>>. It is up to you to code the conversion correctly. A quick search on Cucumber data tables should help you get this right.
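For illustration, a minimal sketch of that conversion using the standard io.cucumber.datatable.DataTable API (the class name and the commented assertion target are made up):
import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.Then;
import java.util.List;

public class UserListSteps {

    @Then("I should see the user list with the following columns:")
    public void verifyUserList(DataTable table) {
        // a single-column data table converts straight to List<String>
        List<String> expectedColumns = table.asList(String.class);

        // a multi-column table would convert with:
        // List<List<String>> rows = table.asLists(String.class);

        // hypothetical assertion against the page under test:
        // assertEquals(expectedColumns, page.getVisibleColumnNames());
    }
}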
Another similar convention is the Scenario Outline. For a scenario outline to be valid, the test must be tagged with Scenario Outline and the table with Examples. For instance:
Scenario Outline: Search users
Then I should see the user list with the following columns: <col1> <col2> <col3> <col4>
Examples:
| col1 | col2 | col3 | col4    |
| Name | Age  | DOB  | Address |
The examples table should contain at least two rows: the top row for the variable names, and the second and subsequent rows for the values. So, if you want to check the column names for 5 database tables, you will have 5 rows of values and each row will be processed in its own test run. For this case, the step definition should look something like this:
@Then("I should see the user list with the following columns: {word} {word} {word} {word}")
public void verifyUserList(String col1, String col2, String col3, String col4) {
    // Your logic here
}
The Java parameter names don't need to match the identifiers inside the angle brackets; it is those identifiers that must match the column headers in the Examples table. The substituted values are then captured, in order, by the {word} placeholders.
The last convention is a single step with the four values inline, as in the condensed version above. In this case the step runs once, and the list is simply part of the step text (or captured as a single string parameter).
Basically, each format has distinct requirements. As to why the first one didn't work: it should all boil down to the method mapped to the Cucumber step, and my guess is that it doesn't declare a DataTable argument.

Azure SQL: join of 2 tables with 2 unicode fields returns empty when matching records exist

I have a table with a few key columns created as nvarchar(80) => Unicode.
I can list the full dataset with a SELECT * statement (Table1) and can confirm the values I need to filter on are there.
However, I can't get any results from that table if I filter rows using alphabetic characters as input on any column.
The columns in Table1 store values in Cyrillic characters.
I know it must have to do with character encoding => what I see in the result list is not what I use as input characters.
The Unicode nvarchar type should resolve this character type mismatch automatically.
What do you suggest I do in order to get results?
Thank you very much.
Paulo
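For reference, the classic cause of this symptom is a non-Unicode string literal in the WHERE clause: without the N prefix, SQL Server converts the literal to the database's default code page before comparing, and Cyrillic characters outside that code page are silently replaced with '?'. A minimal illustration (the column name is hypothetical):
-- without N'...' the literal is varchar and is converted to the default
-- code page first, so the Cyrillic text may arrive as '??????':
SELECT * FROM Table1 WHERE KeyColumn = 'Иванов';   -- likely returns nothing
-- with N'...' the literal stays nvarchar and compares correctly:
SELECT * FROM Table1 WHERE KeyColumn = N'Иванов';  -- matches the stored rows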

How can I merge 2 spotfire tables by a regex match?

I am working on a Spotfire tool, and I am using a calculated column in my main data table to group data rows into 'families' through a regex match. For example, one row might have a 'name' of ABC1234xyz, so it would be part of the ABC family because it contains the string 'ABC'. Another row could be something like AQRST31x2af and belong to the QRST family. The main point is that the 'family' is decided by matching a substring in the name, but that substring could be any length and isn't necessarily at the beginning of the name string.
Right now I am doing this with a large nested If statement in a calculated column. However, this is tedious for adding new families and for maintaining the current list of families. What I would like to do is create a table with 2 columns, the string to match and the family name, and then match from this table to determine the family instead of using the nested If. So it might look like the tables below:
Match Table:
id_string | family
----------+-------
ABC       | ABC
QRST      | QRST
SUP       | Super
Main Data Table:
name        | data    | family
------------+---------+--------
ABC1234     | 1.02342 | ABC
ABC1215     | 1.23749 | ABC
AQRST31x2af | 1.04231 | QRST
BQRST32x2ac | 1.12312 | QRST
1903xSUP    | 1.51231 | Super
1204xSUP    | 1.68123 | Super
If you have any suggestions, I would appreciate it.
Thanks.
@wcase6 - As far as I know, you cannot add columns from one table to another based on an expression. When you add columns, the values in the matching column of one table must exactly match the values in the other.
Instead, you can try the below solution on your 'Main Data Table'.
Note: This solution is based on the scenarios posted. If there are more/different scenarios, you might have to tweak the custom expressions provided.
Step 1: Add a calculated column 'ID_string' which strips out lower-case letters and digits.
Trim(RXReplace([Name],"[a-z0-9]","","g"))
Step 2: Add a calculated column 'family'.
If([ID_string]="SUP","Super",If(Len([ID_string])>3,right([ID_string],4),[ID_string]))
Final Output:
Hope this helps!
As @ksp585 mentioned, it doesn't seem like Spotfire can do exactly what I want, so I have come up with a solution using IronPython. Essentially, here is what I have done:
Created a table called FAMILIES, with the columns IDString and Family, which looks like this (using the same example strings above):
IDString | Family
---------+-------
ABC      | ABC
SUP      | Super
QRST     | QRST
Created a table called NAMES, as a pivot off of my main data table, with the only column being NAME. This just creates a list of unique names (since the data table has many rows for each name):
NAME
------------------------
ABC1234
ABC1215
AQRST31x2af
BQRST32x2ac
...
Created a Text Area with a button labeled Match Families, which calls an IronPython script. That script reads the NAMES table, and the FAMILIES table, compares each name to the IDString column with a regex, and associates each name with a family from the results. Any names that don't match a single IDString get the family name 'Other'. Then, it generates a new table called NAME_FAMILY_MAP, with the columns NAME and FAMILY.
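A minimal IronPython sketch of such a script, using the standard Spotfire scripting API (Document is the built-in document global); the table and column names follow the ones above, but treat it as an outline rather than a drop-in script:
import re
from System.IO import MemoryStream, StreamWriter, SeekOrigin
from Spotfire.Dxp.Data import DataValueCursor, DataType
from Spotfire.Dxp.Data.Import import TextFileDataSource, TextDataReaderSettings

# read the FAMILIES table into (compiled regex, family) pairs
famTable = Document.Data.Tables["FAMILIES"]
idCur = DataValueCursor.CreateFormatted(famTable.Columns["IDString"])
famCur = DataValueCursor.CreateFormatted(famTable.Columns["Family"])
families = []
for row in famTable.GetRows(idCur, famCur):
    families.append((re.compile(idCur.CurrentValue), famCur.CurrentValue))

# map every unique NAME to its family ('Other' when nothing matches)
namesTable = Document.Data.Tables["NAMES"]
nameCur = DataValueCursor.CreateFormatted(namesTable.Columns["NAME"])
lines = ["NAME\tFAMILY"]
for row in namesTable.GetRows(nameCur):
    name = nameCur.CurrentValue
    family = next((f for rx, f in families if rx.search(name)), "Other")
    lines.append(name + "\t" + family)

# write the result out as a new (or replaced) NAME_FAMILY_MAP table
stream = MemoryStream()
writer = StreamWriter(stream)
writer.Write("\r\n".join(lines))
writer.Flush()
stream.Seek(0, SeekOrigin.Begin)
settings = TextDataReaderSettings()
settings.Separator = "\t"
settings.AddColumnNameRow(0)
settings.SetDataType(0, DataType.String)
settings.SetDataType(1, DataType.String)
source = TextFileDataSource(stream, settings)
if Document.Data.Tables.Contains("NAME_FAMILY_MAP"):
    Document.Data.Tables["NAME_FAMILY_MAP"].ReplaceData(source)
else:
    Document.Data.Tables.Add("NAME_FAMILY_MAP", source)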
With this new table, I can then add a column back to the original data table using a left outer join from NAME_FAMILY_MAP, matching on NAME. Because NAME_FAMILY_MAP is not directly linked to the NAMES table (generated by a button press), it does not create a circular dependency.
I can then add families to the FAMILIES table using another script, or by just replacing the FAMILIES table with an updated list. It's slightly more tedious than what I was hoping, but it works, so I'm happy.

Split a string of numbers passed to stored procedure and perform a lookup in a table [duplicate]

This question already has answers here:
Oracle LISTAGG() for querying use
(2 answers)
Closed 9 years ago.
I have to pass a string of numbers (like 234567, 678956, 345678) to a stored procedure; the SP will split that string on the comma delimiter, take each value (e.g. 234567), do a lookup in another table, get the corresponding value from another column, and build a string.
For instance, if I have a table, TableA, with 3 columns Column1, Column2, and Column3, with data as follows:
1 123456 XYZ
2 345678 ABC
I would pass a string of numbers to the stored procedure, for instance '123456', '345678'. It would then split this string of numbers, take the first number, 123456, do a lookup in TableA, and get the matching value from Column3, i.e. 'XYZ'.
I need to loop through the table with the split string of numbers ('123456', '345678') and return the concatenated string - like "XYZ ABC".
I am trying to do it in Oracle 11g.
Any suggestions would be helpful.
It's almost always more efficient to do everything in a single statement if at all possible, i.e. don't use a function if you can avoid it.
There is a little trick you can use to solve this using REGEXP_SUBSTR() to turn your string into something usable.
with the_string as (
    select '''123456'', ''345678''' as str
    from dual
), the_values as (
    select regexp_substr( regexp_replace(str, '[^[:digit:],]')
                        , '[^,]+', 1, level ) as val
    from the_string
    connect by regexp_substr( regexp_replace(str, '[^[:digit:],]')
                            , '[^,]+', 1, level ) is not null
)
select the_values.val, t1.c
from t1
join the_values
  on t1.b = the_values.val
This works by removing everything but the digits you require and something to split them on, the comma. You then split on the comma and use a hierarchical query to turn the string into rows, which you can then use in a join.
Here's a SQL Fiddle as a demonstration.
Please note that this is highly inefficient when used on large datasets. It would probably be better if you passed variables normally to your function...
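Since the duplicate target is LISTAGG(), the concatenated string the question actually asks for ("XYZ ABC") could be produced by aggregating the join result, e.g. (a sketch that reuses the the_values subquery and the Fiddle's t1(b, c) columns; LISTAGG() requires Oracle 11g Release 2):
select listagg(t1.c, ' ') within group (order by t1.b) as result
from t1
join the_values
  on t1.b = the_values.val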

Cassandra-secondary index on part of the composite key?

I am using a composite primary key consisting of 2 strings, Name1 and Name2, and a timestamp (e.g. 'Joe:Smith:123456'). I want to query a range of timestamps given an equality condition on either Name1 or Name2.
For example, in SQL:
SELECT * FROM testcf WHERE (timestamp > 111111 AND timestamp < 222222 AND Name2 = 'Brown');
and
SELECT * FROM testcf WHERE (timestamp > 111111 AND timestamp < 222222 AND Name1 = 'Charlie');
From my understanding, the first part of the composite key is the partition key, so the second query is possible, but the first query would require some kind of index on Name2.
Is it possible to create a separate index on a component of the composite key? Or am I misunderstanding something here?
You will need to manually create and maintain an index of names if you want to use your schema and support the first query. Given this requirement, I question your choice of data model. Your model should be designed with your read pattern in mind. I presume you are also storing some column values that you want to query by timestamp. If so, perhaps the following model would serve you better:
"[current_day]:Joe:Smith" {
123456:Field1 : value
123456:Field2 : value
123450:Field1 : value
123450:Field2 : value
}
With this model you can use the current day (or some known day) as a sentinel value, then filter on first and last names. You can also get a range of columns by timestamp using the composite column names.
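For illustration, a rough CQL 3 rendering of that model (the table and field names are made up; the day and both names form the partition key, with the timestamp as a clustering column):
CREATE TABLE events (
    day    text,
    name1  text,
    name2  text,
    ts     bigint,
    field1 text,
    field2 text,
    PRIMARY KEY ((day, name1, name2), ts)
);

-- range of timestamps for a known day and both names:
SELECT * FROM events
WHERE day = '2013-06-01' AND name1 = 'Joe' AND name2 = 'Smith'
  AND ts > 111111 AND ts < 222222;
Note that the partition key still requires both names; supporting equality on just one of them is exactly what forces the manually maintained index mentioned above.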
