Working with PostgreSQL.
I have a table with a column that contains values of the following format:
Set1/Set2/Set3/...
For each i, Seti can take one of a set of values; the parts are delimited by '/'.
I would like to show the distinct entries of the form Set1/Set2; that is, I would like to trim or truncate the rest of the string in those entries.
That is, I want all distinct options for:
Set1/Set2
A regular expression would work great: I want the substring matching the pattern .*/.*/
to be displayed without the rest of the string.
I got as far as:
select distinct column_name from table_name
but I have no idea how to make the trimming itself.
Tried looking on w3schools and other sites, as well as googling "SQL trim" and "SQL truncate", but didn't find what I'm looking for.
Thanks in advance.
mu is too short's answer is fine if the lengths of the strings between the forward slashes are always consistent. Otherwise you'll want to use a regex with the substring function.
For example:
=> select substring('Set1/Set2/Set3/' from '^[^/]+/[^/]+');
substring
-----------
Set1/Set2
(1 row)
=> select substring('Set123/Set24/Set3/' from '^[^/]+/[^/]+');
substring
--------------
Set123/Set24
(1 row)
So your query on the table would become:
select distinct substring(column_name from '^[^/]+/[^/]+') from table_name;
The relevant docs are http://www.postgresql.org/docs/8.4/static/functions-string.html
and http://www.postgresql.org/docs/8.4/static/functions-matching.html.
Why do you store multiple values in a single record? The preferred solution would be multiple values in multiple records; then your problem would not exist anymore.
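For example, a minimal sketch of such a normalized layout (the table and column names here are made up, not taken from the question):

-- one row per value, together with its position in the original string
CREATE TABLE entry_sets (
    entry_id  integer NOT NULL,  -- refers to the original row
    position  integer NOT NULL,  -- 1 for Set1, 2 for Set2, ...
    value     text    NOT NULL
);

-- the distinct Set1/Set2 combinations then become a simple self-join:
SELECT DISTINCT s1.value || '/' || s2.value
FROM entry_sets s1
JOIN entry_sets s2 ON s2.entry_id = s1.entry_id AND s2.position = 2
WHERE s1.position = 1;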
Another option would be to use an array of values, using the TEXT[] array datatype instead of TEXT. You can index an array column using a GIN index.
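A minimal sketch of that idea (again with made-up table and column names):

CREATE TABLE entries (sets TEXT[]);
CREATE INDEX entries_sets_gin ON entries USING GIN (sets);

-- the GIN index speeds up containment queries such as:
SELECT * FROM entries WHERE sets @> ARRAY['Set2'];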
SUBSTRING() (like mu_is_too_short showed you) can solve the current problem; using an array and the array functions is another option:
SELECT array_to_string(
(string_to_array('Set1/Set2/Set3/', '/'))[1:2], '/' );
This makes it rather flexible: there is no need for a fixed length of the values, the separator given to the array functions does the job. The [1:2] slice picks the first 2 elements of the array; using [1:3] would pick elements 1 to 3, which makes it easy to change.
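Applied to the table from the question (using the column and table names from the original query), the distinct query would look something like:

SELECT DISTINCT array_to_string(
    (string_to_array(column_name, '/'))[1:2], '/')
FROM table_name;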
If they really are that regular you could use substring; for example:
=> select substring('Set1/Set2/Set3/' from 1 for 9);
substring
-----------
Set1/Set2
(1 row)
There is also a version of substring that understands POSIX regular expressions if you need a little more flexibility.
The PostgreSQL online documentation is quite good BTW:
http://www.postgresql.org/docs/current/static/index.html
and it even has a usable index and sensible navigation.
If you want to use .*/.* then you'd want something like substring(s from '[^/]+/[^/]+'), for example:
=> select substring('where/is/pancakes/house?' from '[^/]+/[^/]+');
substring
-----------
where/is
(1 row)
I have a column containing a string value as shown in the example below:
ZAE/GER-ERT/HEZ/PDC
The idea is to extract the first trigraph (ZAE in this example) and a second one based on a rule.
The rule is: if a '-' separates two trigraphs, we don't extract them; we just take the first trigraph that follows a '/' and has no '-' after it.
We then use a '-' to separate the two results, so the expected output for the example is: ZAE-HEZ
I would like to get this value in a new calculated column.
I've tried to play with the indexes based on the Find() and ExtractRX() functions, but couldn't make it work.
Thanks in advance !
I am not sure this is the simplest way, but it works for your example (assuming the strings are always alphanumeric in chunks of 3).
You can do it via an intermediate column (for sanity, although you could put the [tmp] formula directly into the final column):
[tmp] as
RXReplace(RXReplace([your_column],'\\w{3}-\\w{3}','','g'),'/+','/','g')
This removes any double trigraph like GER-ERT and then removes any leftover double /
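For the sample value ZAE/GER-ERT/HEZ/PDC, the inner RXReplace should leave ZAE//HEZ/PDC and the outer one should collapse that to ZAE/HEZ/PDC.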
Then the final column splits [tmp] by '/' and concatenates the first and second items:
Concatenate(Split([tmp],'/',1),'-',Split([tmp],'/',2))
I would like to solve this either in Excel or in SPSS:
I have categorical data (each number representing a medical diagnosis) that are combined into single cells. In other words, a row (patient) has multiple diagnoses. However, I would like to know the frequencies of each diagnosis. What is the best way to go about this? (See picture for reference)
For SPSS:
First just creating some sample data to demonstrate on:
data list free/e_cerv_dis_state (a20).
begin data
"{1/2/3/6}" "{1/2/4}" "{2/4/5}" "{1/5/6}" "{4}" "{4/5/6}" "{1/2/3/4/5/6}"
end data.
Now the following code will create a separate variable for each possible diagnosis, and will put a 1 in it if the diagnosis exists in the original variable.
do repeat vr=diag1 to diag9/vl=1 to 9.
compute vr=char.index(e_cerv_dis_state, string(vl, f1) ) > 0.
end repeat.
freq diag1 to diag6.
Note this will only work for up to 9 diagnoses. If you have more than that the solution will have to be adapted to multiple digits.
Assuming that the number of columns is fairly regular, I would suggest using Text to Columns and then using COUNTIF on the cells to check whether they hold the value wanted. However, there is a more robust and reproducible solution that involves using SQL. You can download the free version of SQL Server Express here: https://www.microsoft.com/en-gb/sql-server/sql-server-downloads
Then you can import your table of data; here's how to do that: How to import an Excel file into SQL Server?
Then you could use the more friendly SQL database to get the answers you want. For example, you could use a select statement like:
SELECT count(e_cerv_dis_state)
FROM imported_table                   -- hypothetical name for the imported table
WHERE e_cerv_dis_state LIKE '%6%'     -- rows whose cell contains diagnosis 6
It would also be possible to use a CASE WHEN statement to add in the names of the diagnoses.
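One way to sketch that (using a derived VALUES list to get one row per single-digit diagnosis code; the table name and diagnosis names below are placeholders, not from the question):

SELECT CASE d.code WHEN '1' THEN 'Diagnosis one'   -- placeholder names
                   WHEN '2' THEN 'Diagnosis two'
                   WHEN '6' THEN 'Diagnosis six'
                   ELSE 'Other' END AS diagnosis,
       count(*) AS frequency
FROM imported_table t
JOIN (VALUES ('1'), ('2'), ('3'), ('4'), ('5'), ('6')) AS d(code)
  ON t.e_cerv_dis_state LIKE CONCAT('%', d.code, '%')
GROUP BY d.code;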
I'm trying to translate 2 types of data using only PostgreSQL.
I've got a column "type" that contains the kind of data.
type is a string column and may have "ACTUAL" or "OLD" values:
+--------+
|  type  |
+--------+
| ACTUAL |
| OLD    |
+--------+
When I show the list with a lot of other joins, I would like to show only "A" or "O" values.
I couldn't find any other way to do this than:
SELECT replace(replace(mytable.type, 'ACTUAL', 'A'), 'OLD', 'O') FROM mytable;
With that I can replace the text the way I need,
but I was looking for a function that takes arrays as parameters,
something like a simple cross-reference function:
translate(['ACTUAL','OLD'], ['A','O'])
Does anyone know a way to do this that doesn't use SQL views and doesn't need another table (i.e. joining this value against some lookup table)?
Thanks in advance,
Andre
I would use something like CASE...
SELECT
  (CASE type
     WHEN 'ACTUAL' THEN 'A'
     WHEN 'OLD' THEN 'O'
     ELSE '?' END)
FROM mytable
Using this method, you can return whatever you want based on whatever criteria you want, not just substringing.
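If you want something reusable that looks more like the cross-reference function from the question, one option (just a sketch, not part of the original answer) is to wrap the CASE in a small SQL function:

-- hypothetical helper; the name and the '?' fallback are placeholders
CREATE FUNCTION type_abbrev(t text) RETURNS text AS $$
    SELECT CASE t WHEN 'ACTUAL' THEN 'A'
                  WHEN 'OLD'    THEN 'O'
                  ELSE '?' END;
$$ LANGUAGE sql IMMUTABLE;

SELECT type_abbrev(mytable.type) FROM mytable;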
What's the most straightforward way of finding the index of a substring in a varchar column? charindex doesn't exist in the stock version of SQLite3 -- which is still a little surprising to me.
Specifically, I have a column with values like 010000, 011000, 010110, etc. I want to find the index of the first occurrence of 11. For the examples I gave, I would expect something like NULL (or -1), 1, and 3.
I have a hacked together idea that uses length and ltrim, but it seems like a lot of work for something I need to do several times.
This is now possible using the built-in instr function. Note that instr uses 1-based positions and returns 0 when the substring is not found:
sqlite> select instr('010000', '11');
0
sqlite> select instr('011000', '11');
2
sqlite> select instr('010110', '11');
4
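Applied to a table, something like the following (column and table names are hypothetical) also turns the 0 into the NULL the question asks for:

select nullif(instr(flags, '11'), 0) from my_table;  -- NULL when '11' does not occur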
Unfortunately, I think you've found the only answer that currently works. SQLite does not have a charindex equivalent function. You can make your own with length and trim, but nothing is built in :(
I have this number extracting problem.
I want to get all matches that don't have a certain number in them,
e.g. 125501874, 125001873.
Every number that has 55 at position 2 is not to be considered.
The first number's range is 0 to 9 and the second is 1 to 9, so the real range is [01-99]
(we cannot have 00 as the first two numbers).
With Lucene I wanted to add NOT field:[01-99]55*
But it doesn't seem to work. Is there an easy way to find ??55* and disregard it in a Search("NOT field:[01-99]55*")?
Thank you Lucene guru
Lucene can do this very efficiently if one creates an "index-only" field with only the third and fourth digits in it. The complete value can be "stored" (or stored and indexed if other queries use the whole number) in the original field.
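For example, for 125501874 that extra field would contain just 55, and the exclusion could then be expressed as a simple term query along the lines of NOT digits34:55 (the field name digits34 is made up here for illustration).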
Update: A followup comment asked, "Is [there] a way to create a temporary index on only the second digit?"
Using a ParallelReader "vertically partitions" the fields of an index. One partition could hold the current index, with its fields, while the other is a temporary index with the new field, possibly stored in a RAMDirectory.
Assuming the number is "stored" in the original index, iterate over each document in the original index, retrieve the stored field, parse out the key digits, and add a Document to the temporary index with the new field. As the ParallelReader documentation states, it is imperative that the document numbers match in both indexes.
Thank you erickson. Your ParallelReader solution is probably the best, if only I could use temporary indexes; because we cache the search query, we will need those later.
But like you said before, it's better to start with an index on the relevant digits straightaway.
I have another solution.
NOT field:0?55*
NOT field:1?55*
...
NOT field:9?55*
It is efficient enough for the search I'm doing and it bypasses the first-character wildcard limitation. I wouldn't use that if there were more digits to check or if they were farther from the start.
Now I'm testing this on a million rows and it's pretty efficient for our needs.