I need to deal with the following and after searching I wasn't able to find exactly what I'm looking for:
Let's say I have a column whose values may or may not end with an alphanumeric string:
SKU
-----
12345ABC
12345-Abc
12345-Ab23
12345
Which I would like to break into
SKU | BATCH
------------------
12345 | ABC
12345 | Abc
12345 | Ab23
12345 | NULL
I'm using PostgreSQL 9.4+. I've tried the string and substring functions, but I'm not getting the results I'm looking for. Any ideas?
You can use the substring function.
with a (SKU) as (values('12345ABC'), ('12345-Abc'), ('12345-Ab23'), ('12345'))
select substring(sku from '^\d+'), substring(sku from '[a-zA-Z][a-zA-Z0-9]*$') from a;
substring | substring
-----------+-----------
12345 | ABC
12345 | Abc
12345 | Ab23
12345 |
(4 rows)
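To see what each of the two patterns picks out, the same split can be tried outside the database. This is only a sketch in Python's re module, not PostgreSQL itself; the helper name split_sku is made up for illustration.

```python
import re

skus = ["12345ABC", "12345-Abc", "12345-Ab23", "12345"]

def split_sku(sku):
    """Return (leading digits, trailing batch code or None)."""
    # Same idea as substring(sku from '^\d+'): digits at the start.
    number = re.search(r"^\d+", sku)
    # Same idea as the second substring call: a letter followed by
    # letters/digits, anchored at the end of the string.
    batch = re.search(r"[a-zA-Z][a-zA-Z0-9]*$", sku)
    return (number.group(0) if number else None,
            batch.group(0) if batch else None)

rows = [split_sku(s) for s in skus]
for row in rows:
    print(row)
```

A SKU with no trailing letters gives None for the batch, which corresponds to the empty (NULL) cell in the query output above.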
You can use regexp_matches:
with a (SKU) as (values('12345ABC'), ('12345-Abc'), ('12345-Ab23'), ('12345'))
select res[1], res[2]
from (
SELECT regexp_matches(SKU, '(\d+)[^[:alnum:]]*([[:alnum:]]+)?') res
FROM a
) y;
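The single-pattern approach can be sketched the same way in Python. Python's re module has no POSIX classes such as [[:alnum:]], so this sketch assumes plain ASCII letters and digits instead; the names PATTERN and split_sku_once are hypothetical.

```python
import re

# (digits) (optional separator characters) (optional alphanumeric batch)
PATTERN = re.compile(r"(\d+)[^A-Za-z0-9]*([A-Za-z0-9]+)?")

def split_sku_once(sku):
    """Return (sku_number, batch) from one match, or None if no digits."""
    m = PATTERN.search(sku)
    if m is None:
        return None
    # group(2) is None when the optional batch part did not take part.
    return (m.group(1), m.group(2))

for sku in ["12345ABC", "12345-Abc", "12345-Ab23", "12345"]:
    print(sku, "->", split_sku_once(sku))
```

Note that in PostgreSQL, regexp_matches returns no row at all for a value that does not match the pattern, which is the counterpart of the None returned here.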
I have a reference table on sheet1
| A | B |
|---------------|----------|
| dog | 10 |
|---------------|----------|
| cat | 20 |
|---------------|----------|
I then have a list with values on sheet 2
| D | E |
|-------------------|----------|
| wild dog 2 | |
|-------------------|----------|
| strange cat Willy | |
|-------------------|----------|
I would like E to contain the value of B from the reference table, using the first substring match.
I tried VLOOKUP and INDEX(MATCH(...)), but this is not getting me anywhere. Help or pointers appreciated.
With your current sample data the following formula will work, but I don't know what your actual data looks like.
=INDEX($B$1:$B$10,MATCH(TRIM(MID(SUBSTITUTE(D1," ", REPT(" ",100)),100,100)),$A$1:$A$10,0))
I ended up using the formula from Harun24HR and simplifying it.
=INDEX($B$1:$B$10;MATCH(1;COUNTIF(D1;"*" & $A$1:$A$10 & "*");0))
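The logic behind this kind of lookup (walk the reference table top to bottom and return the value for the first keyword found inside the cell) can be sketched in Python. The function name and the case-insensitive comparison are assumptions made for illustration; COUNTIF wildcards ignore case, so the sketch does too.

```python
# Reference table from sheet 1: keyword (column A) -> value (column B).
reference = [("dog", 10), ("cat", 20)]

def lookup_first_substring(text, table):
    """Return the value of the first keyword that occurs inside text."""
    lowered = text.lower()
    for keyword, value in table:
        if keyword.lower() in lowered:
            return value
    return None  # no keyword found, similar to #N/A in the sheet

for cell in ["wild dog 2", "strange cat Willy"]:
    print(cell, "->", lookup_first_substring(cell, reference))
```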
If I have a table test with values like :
id | value
----------------
1 | ABC 1-2-3
2 | AB 1-2-3-4-5
3 | ABC 1
4 | ABC 1-2
5 | ABC
and the input string I'm trying to match is ABC 1-2-3-4-5, then the closest substring match (if I could call it that) should be ABC 1-2-3. Row 2 should not match because it doesn't contain "ABC". I've only been able to search when the input string is shorter than the stored values, not when it's longer, e.g.
select * from test where value ilike 'ABC 1-2%';
but this also does not give me an exact record, but only those starting with ABC 1-2. How do I construct the proper sql statement to solve this?
You may be interested in pg_trgm extension:
create extension if not exists pg_trgm;
Standard similarities for your data are as follows:
select *, similarity(value, 'ABC 1-2-3-4-5')
from test
order by 3 desc;
id | value | similarity
----+--------------+------------
2 | AB 1-2-3-4-5 | 0.8
1 | ABC 1-2-3 | 0.714286
4 | ABC 1-2 | 0.571429
3 | ABC 1 | 0.428571
5 | ABC | 0.285714
(5 rows)
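Row 2 scores highest because similarity only counts shared trigrams. The following Python sketch approximates that calculation; it assumes, as the pg_trgm documentation describes, that words are runs of alphanumerics, are lowercased, and are padded with two spaces in front and one behind. It is an illustration, not the extension's actual code.

```python
import re

def trigrams(text):
    """Approximate the trigram set pg_trgm would build for text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

target = "ABC 1-2-3-4-5"
for value in ["AB 1-2-3-4-5", "ABC 1-2-3", "ABC 1-2", "ABC 1", "ABC"]:
    print(value, round(similarity(value, target), 6))
```

Under these assumptions, "AB 1-2-3-4-5" shares 12 of 15 distinct trigrams with the input (0.8), while "ABC 1-2-3" shares 10 of 14 (0.714286), which agrees with the table above.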
However, you can always add additional criteria in the WHERE clause:
select *, similarity(value, 'ABC 1-2-3-4-5')
from test
where value ilike 'abc%'
order by 3 desc;
id | value | similarity
----+-----------+------------
1 | ABC 1-2-3 | 0.714286
4 | ABC 1-2 | 0.571429
3 | ABC 1 | 0.428571
5 | ABC | 0.285714
(4 rows)
Reverse the comparison:
select * from test
where 'ABC 1-2-3-4-5' ilike value || '%'
order by length(value) desc
The best (i.e. longest) matches will be returned first.
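The reversed comparison amounts to asking which stored values are a prefix of the input, then preferring the longest one. A minimal Python sketch of that idea, assuming the stored values contain no LIKE wildcards (% or _); the function name is hypothetical:

```python
values = ["ABC 1-2-3", "AB 1-2-3-4-5", "ABC 1", "ABC 1-2", "ABC"]

def prefix_matches(needle, stored):
    """Stored values that the needle starts with, longest first."""
    lowered = needle.lower()  # ilike is case-insensitive
    hits = [v for v in stored if lowered.startswith(v.lower())]
    return sorted(hits, key=len, reverse=True)

matches = prefix_matches("ABC 1-2-3-4-5", values)
print(matches[0])  # ABC 1-2-3
```

"AB 1-2-3-4-5" is excluded because the input does not start with it, and taking only the first element corresponds to adding LIMIT 1 to the query.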
I find that string-values displayed using cqlsh are right-aligned. Is there a reason for this? And is there a way to left-align strings?
cqlsh:test> create table test (id int, a ascii, t text, primary key(id));
cqlsh:test> insert into test (id, a, t) values (1, 'ascii', 'text');
cqlsh:test> insert into test (id, a, t) values (2, 'a', 't');
cqlsh:test> select * from test;
id | a | t
----+-------+------
1 | ascii | text
2 | a | t
(2 rows)
I think this is mostly done for aesthetic reasons, however you can change it!
cqlsh is simply a python file that uses the python-driver. You can simply change the following code in the print_formatted_result method of cqlsh:
for row in formatted_values:
    line = ' | '.join(col.rjust(w, color=self.color) for (col, w) in zip(row, widths))
    self.writeresult(' ' + line)
You can change col.rjust to ljust, center, etc. or you can simply change it to 'col' to print the data as is.
Example using ljust:
cqlsh:friends> select * from group_friends;
groupid | friendid | time
--------------------------------------+----------+--------
13814000-1dd2-11b2-8080-808080808080 | 1 | 123456
13814000-1dd2-11b2-8080-808080808080 | 2 | 123456
13814000-1dd2-11b2-8080-808080808080 | 4 | 123456
13814000-1dd2-11b2-8080-808080808080 | 8 | 123456
13814000-1dd2-11b2-8080-808080808080 | 22 | 123456
13814000-1dd2-11b2-8080-808080808080 | 1002 | 123456
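The difference between the two alignments can be tried with plain Python strings. This sketch uses the built-in str.rjust and str.ljust rather than cqlsh's own colored-string class, and the helper format_row is made up for illustration.

```python
rows = [("1", "ascii", "text"), ("2", "a", "t")]

# Width of each column is the length of its longest value.
widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

def format_row(row, widths, align):
    """Join the padded columns with ' | ', like the cqlsh loop does."""
    return " | ".join(align(col, w) for col, w in zip(row, widths))

right = [format_row(r, widths, str.rjust) for r in rows]
left = [format_row(r, widths, str.ljust) for r in rows]

print("\n".join(right))
print("\n".join(left))
```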
Try using the shell's column program to align columns:
$CASSANDRA_HOME/bin/cqlsh <<EOF | grep -v '+--' | perl -pe 's{[ ]{4,}}{|}g' | column -t -s '|' | tee out.txt
select mycol1,mycol2 from mykeyspace.mytable;
EOF
Use a here document to send input to cqlsh.
Drop the header separator line with grep -v.
Remove excess spaces with your favorite regex tool (but be careful not to remove them in your data).
Align fields with the column program, using | as the separator/delimiter.
(Optional) Copy the output to a text file using tee.
I have a table like so (the first column):
| Table | What I want to achieve |
|--------|------------------------|
| 088888 | convert to number |
| 88888 | convert to number |
| 588888 | convert to number |
| 688888 | convert to number |
| V44100 | ignore and return text |
| W44101 | ignore and return text |
| S54001 | ignore and return text |
| V44102 | ignore and return text |
| BOLUTY | ignore and return text |
| SHOLIA | ignore and return text |
|--------|------------------------|
The table is generated from a database, so all numbers come formatted as text.
I want a formula that converts the text-formatted numbers to real numbers, like the first 4 values in the table above. The formula should be smart enough not to try to convert real text; when it encounters text, it should return that text unchanged.
I tried =VALUE(A1). It works for the first 4 values above, but it returns a #VALUE! error when it encounters real text (the last 6 values in column A of the table above).
I have another formula, =IF(OR(LEFT(A1)="1",LEFT(A1)="2"),VALUE(A1),A1), which works as desired, but it becomes too long once I test for every leading digit 0 through 9.
Is there a shorter/simpler way of achieving this without using the above unusually long formula?
Thanks.
As I mentioned in the comments, use the error checking to do a mass conversion, but if you still insist on a formula, here it is:
=IFERROR(INT(A1),A1)
EDIT: If you have decimal values in Col A then use VALUE instead of INT
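The formula follows a "try to convert, fall back to the original on error" pattern. The same pattern as a small Python sketch, with a hypothetical function name; unlike INT in Excel, Python's int() rejects strings with a decimal point, so this sketch only converts whole numbers.

```python
def to_number_or_text(cell):
    """Return the cell as an int if it parses as one, else unchanged."""
    try:
        return int(cell)
    except ValueError:
        return cell  # real text such as 'V44100' is returned as is

column = ["088888", "88888", "588888", "V44100", "BOLUTY"]
converted = [to_number_or_text(c) for c in column]
print(converted)
```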
So I've been searching for the information I need and have not really been able to find a simple solution, though it seems like there should be one. Basically, I have the following
John | Doe | 123 Wallaby Ln | 00123 | | |
John | Doe | | 00123 | xxx | yy |
Jane | Doe | | 01234 | | zz |
Jane | Doe | bleep blop ln | | xx | |
And I need
John | Doe | 123 Wallaby Ln | 00123 | xxx| yy |
Jane | Doe | bleep blop ln | 01234 | xx | zz |
Basically pretty simple, I need to merge cells with the same Column 1 & Column 2 data to get as comprehensive and concise a list of data. You'd think this would be readily available through google as a simple formula but I have only found VBA solutions (I have never used VBA before, or macros for that matter so I'm not sure how to use them or fix errors in them). Any help is greatly appreciated.
Thanks in advance!
The easiest option is to merge the contents of A and B into one column (insert a new column C):
C1 =A1&" "&B1
Fill the formula down.
Sort by column C.
Make sure row 1 has a descriptive name for each column.
Select the complete table.
Create a pivot table and drop column C into the Rows section to obtain the list of unique names.
Copy the list of unique names to a new sheet.
Then look at VLOOKUPALL, described in "Excel VLOOKUP with multiple results", to create your own function.
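For comparison, the merge itself (group rows by the first two columns and keep the first non-empty value seen in every other column) can be sketched in a few lines of Python. This is not the pivot-table method described above, only an illustration of the intended result; the function name and the key_cols parameter are assumptions.

```python
rows = [
    ["John", "Doe", "123 Wallaby Ln", "00123", "", ""],
    ["John", "Doe", "", "00123", "xxx", "yy"],
    ["Jane", "Doe", "", "01234", "", "zz"],
    ["Jane", "Doe", "bleep blop ln", "", "xx", ""],
]

def merge_rows(rows, key_cols=2):
    """Combine rows sharing the first key_cols columns into one row."""
    merged = {}
    for row in rows:
        key = tuple(row[:key_cols])
        if key not in merged:
            merged[key] = list(row)  # first row for this name
            continue
        target = merged[key]
        for i, cell in enumerate(row):
            # Fill a gap only; an existing value is never overwritten.
            if not target[i] and cell:
                target[i] = cell
    return list(merged.values())

result = merge_rows(rows)
for row in result:
    print(" | ".join(row))
```

If two rows for the same person hold different non-empty values in the same column, this sketch keeps the first one, so conflicting data would need a separate rule.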