Split a mixed input string of single values and ranges into single values

I have an input which allows multiple IDs.
They can be entered like this:
[ 1000, 1001, 1050-1060, 1100 ]
Out of this input string I want to get all the single IDs.
I already found how to split at each comma, so the part with 1000, 1001 already works:
DATA: itab TYPE TABLE OF string.
SPLIT l_bukrs_string AT ',' INTO TABLE itab.
My problem is the self-built range. Any idea how I could combine this with the case above to split 1050-1060 into single values?
I want to get 1050 | 1051 | 1052 | ... | 1060 out of it.
Appreciate every hint :) Thank you so much!

The easiest solution would be to use a real range/select-option for user (?) input instead. Then you would use that range to select every value from the database table.
If you cannot use a real range/select-option, then you could convert the string to one as shown below.
DATA: bukrs_string TYPE string,
split_bukrs TYPE TABLE OF string,
bukrs TYPE bukrs,
bukrs_between TYPE TABLE OF bukrs,
bukrs_range TYPE RANGE OF bukrs,
bukrs_rline LIKE LINE OF bukrs_range,
bukrs_table TYPE TABLE OF bukrs.
FIELD-SYMBOLS: <string> TYPE string,
<bukrs> TYPE bukrs,
<bukrs_from> TYPE bukrs,
<bukrs_to> TYPE bukrs.
bukrs_string = '1000, 1001, 1050-1060, 1100'.
CONDENSE bukrs_string NO-GAPS.
SPLIT bukrs_string AT ',' INTO TABLE split_bukrs.
LOOP AT split_bukrs ASSIGNING <string>.
bukrs_rline-sign = 'I'.
IF <string> CA '-'.
SPLIT <string> AT '-' INTO TABLE bukrs_between.
bukrs_rline-option = 'BT'.
READ TABLE bukrs_between INDEX 1 ASSIGNING <bukrs_from>.
bukrs_rline-low = <bukrs_from>.
READ TABLE bukrs_between INDEX 2 ASSIGNING <bukrs_to>.
bukrs_rline-high = <bukrs_to>.
ELSE.
bukrs_rline-option = 'EQ'.
bukrs = <string>.
bukrs_rline-low = bukrs.
ENDIF.
APPEND bukrs_rline TO bukrs_range.
CLEAR bukrs_rline.
ENDLOOP.
SELECT bukrs
FROM t001
INTO TABLE bukrs_table
WHERE bukrs IN bukrs_range.
Before you split the string, you would condense it to remove all spaces. Then you would loop over the resulting parts and check whether each one contains a '-'. If it does, you split it again and create a BETWEEN entry in your range (consider adding a check that the second number is actually higher than the first). If there is no '-', you just create an EQUAL entry.
After you have your real range, you use it to select from the database. This is because not every bukrs in that range has to exist; you may only have 1000, 1050, 1055 and 1060, for example.
Edit: The reason there is no command, function module or class to convert a range into individual values is that what needs to be done changes heavily depending on what data the range is for and whether (and how many) values need to be verified.
If you have an integer range, all you need to do is take the from-value and add 1 to it until you reach the to-value. But what about a range of binary floating point numbers? What about a range of colours? What about your range of company codes, where not all of them necessarily exist? That's why the conversion has to be done manually.
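For illustration only, here is a minimal sketch of the same split-and-expand logic in Python (a hypothetical helper, not part of the ABAP solution above; the range-plus-SELECT approach is still preferable because it lets the database drop non-existent company codes):
def expand_ids(id_string):
    # Split on commas, expand "from-to" parts, keep single values as they are
    ids = []
    for part in id_string.replace(" ", "").split(","):
        if "-" in part:
            low, high = part.split("-")
            ids.extend(range(int(low), int(high) + 1))
        else:
            ids.append(int(part))
    return ids
print(expand_ids("1000, 1001, 1050-1060, 1100"))
# prints 1000, 1001, then 1050 through 1060, then 1100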

Provided you are given a string with a list of mixed values, both single BUKRS values and dash-separated intervals, and the list items are separated by a comma and a space, then:
DATA: input TYPE string VALUE '1000, 1001, 1050-1060, 1100, 1300-1340',
itab TYPE TABLE OF char10,
r_bukrs TYPE RANGE OF bukrs.
SPLIT input AT `, ` INTO TABLE itab.
r_bukrs = VALUE #( FOR GROUPS bukrs OF <bukrs> IN itab WHERE ( table_line+4(1) NE '-' ) GROUP BY <bukrs> WITHOUT MEMBERS ( sign = 'I' option = 'EQ' low = bukrs ) ).
DATA(ranges) = VALUE ddtest_ttyp_char( FOR GROUPS bukrs OF <bukrs> IN itab WHERE ( table_line+4(1) EQ '-' ) GROUP BY <bukrs> WITHOUT MEMBERS ( bukrs ) ).
LOOP AT ranges ASSIGNING FIELD-SYMBOL(<range>).
r_bukrs = VALUE #( BASE r_bukrs FOR j = CONV i( <range>(4) ) UNTIL j = CONV i( <range>+5(4) ) + 1 ( sign = 'I' option = 'EQ' low = j ) ).
ENDLOOP.
The first table expression (the one assigned to r_bukrs) fills r_bukrs with the unique single values from the initial table.
The second table expression (the one assigned to ranges) fills the ranges table with the dash intervals found in the initial table, 1050-1060 and 1300-1340 in our case.
In the loop over the ranges table, <range>(4) is the lower bound of the interval and <range>+5(4) is the upper bound, e.g. 1300 and 1340 respectively for the last interval.

Related

MATLAB: Count string occurrences in table columns

I'm trying to find the number of words in this table:
Download Table here: http://www.mediafire.com/file/m81vtdo6bdd7bw8/Table_RandomInfoMiddle.mat/file
Words are indicated by the "Type" criterion being "letters". The key thing to notice is that not everything in the table is a word, and that the entry "<missing>" also registers as a word. In other words, I need to determine the number of words by only counting "letters", except when the entry is a "<missing>".
Here is my attempt (Yet unsuccessful - Notice the two mentions of "Problem area"):
for col=1:size(Table_RandomInfoMiddle,2)
column_name = sprintf('Words count for column %d',col);
MiddleWordsType_table.(column_name) = nnz(ismember(Table_RandomInfoMiddle(:,col).Variables,{'letters'}));
MiddleWordsExclusionType_table.(column_name) = nnz(ismember(Table_RandomInfoMiddle(:,col).Variables,{'<missing>'})); %Problem area
end
%Call data from table
MiddleWordsType = table2array(MiddleWordsType_table);
MiddleWordsExclusionType = table2array(MiddleWordsExclusionType_table); %Problem area
%Take out zeros where "Type" was
MiddleWordsTotal_Nr = MiddleWordsType(MiddleWordsType~=0);
MiddleWordsExclusionTotal_Nr = MiddleWordsExclusionType(MiddleWordsExclusionType~=0);
%Final answer
FinalMiddleWordsTotal_Nr = MiddleWordsTotal_Nr-MiddleWordsExclusionTotal_Nr;
Any help will be appreciated. Thank you!
You can get the unique values from column 1 when column 2 satisfies some condition using
MiddleWordsType = numel( unique( ...
Table_RandomInfoMiddle{ismember(Table_RandomInfoMiddle{:,2}, 'letters'), 1} ) );
<missing> is a keyword in a categorical array, not literally the string "<missing>". That's why it appears blue and italicised in the workspace. If you want to check specifically for missing values, you can use this instead of ismember:
ismissing( Table_RandomInfoMiddle{:,1} )
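The same counting logic, sketched in Python/pandas with made-up column names (the real table is only available via the download link, so treat this as a rough analogue rather than a drop-in solution):
import pandas as pd
# Hypothetical stand-in for Table_RandomInfoMiddle: "Entry" holds the values,
# "Type" holds the category label
df = pd.DataFrame({
    "Entry": ["cat", "dog", None, "cat", "42"],
    "Type": ["letters", "letters", "letters", "letters", "digits"],
})
# Count unique entries whose Type is 'letters', excluding missing values
n_words = df.loc[df["Type"].eq("letters") & df["Entry"].notna(), "Entry"].nunique()
print(n_words)  # 2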

SELECT statement returning the column name instead of the VALUE for that column

I'm trying to pass information into a SELECT statement using the two column names 'id' and 'easy_high_score', so I can manipulate the values of those two columns in my program. But when trying to get the value of the column 'easy_high_score', which should be an integer like 46 or 20, it instead returns the string ('easy_high_score',).
Even though there is no mention of [('easy_high_score',)] in the table, it still prints this out. In the table, id 1 has the proper values and information I'm trying to get, but to no avail. I am fairly new to SQLite3.
if mode == "Easy":
    mode = "easy_high_score"
if mode == "Normal":
    mode = "normal_high_score"
if mode == "Hard":
    mode = "hard_high_score"
incrementor = 1  # This is used in a for loop but not necessary for this post
c.execute("SELECT ? FROM players WHERE id=?", (mode, incrementor))
allPlayers = c.fetchall()
print(allPlayers)  # This is printing [('easy_high_score',)], when it should be printing an integer.
Expected Result: 20 (or an integer which represents the high score for easy mode)
Actual Result: [('easy_high_score',)]
A column name cannot be specified using a parameter; it has to be present verbatim in the query. Modify the line that executes the query like this:
c.execute("SELECT %s FROM players WHERE id=?" % mode, (incrementor,))
A possible cause of this is double quotes vs single quotes.
'SELECT "COLUMN_NAME" FROM TABLE_NAME' # will give values as desired
"SELECT 'COLUMN_NAME' FROM TABLE_NAME" # will give column name like what you got

How to convert MATLAB table entries like [Inf] and '' to char strings

I have a MATLAB table and want to create an SQL INSERT statement from this line (or these lines).
K>> obj.ConditionTable
obj.ConditionTable =
Name Data Category Description
________________ ____________ _________________ ___________
'Layout' 'STR' '' ''
'Radius' [ Inf] 'Radius_2000_inf' ''
'aq' [ 0] '0' ''
'VehicleSpeed' [ 200] 'Speed_160_230' ''
Errors when conditionTable = obj.ConditionTable(1,:);
K>> char(conditionTable.Data)
Error using char
Cell elements must be character arrays.
K>> char(conditionTable.Description)
ans =
Empty matrix: 1-by-0
problem: the [Inf] entry
problem: possibly [123] number entries
problem: '' entries
Additionally, the following commands are also of no use here:
K>> length(conditionTable.Data)
ans =
1
K>> isempty(conditionTable.Description)
ans =
0
Target Statement would be something like this:
INSERT INTO `ConditionTable` (`Name`, `Data`, `Category`, `Description`, `etfmiso_id`) VALUES ("Layout", "STR", "", "", 618);
Yes, num2str accepts a single variable of any type and will return a string, so all these operations are valid:
>> num2str('123')
ans =
123
>> num2str('chop')
ans =
chop
>> num2str(Inf)
ans =
Inf
However, while it can deal with purely numeric arrays (e.g. num2str([5 456]) is also valid), it will bomb out if you try to throw a cell array at it (even if all your cells are numeric).
There are 2 possible ways to work around that and convert all your values to character arrays:
1) Use an intermediate cell array
I recreated a table [T] with the same data as in your example. Then running:
%% Intermediate Cell array
T3 = cell2table( cellfun( @num2str , table2cell(T) , 'uni',0) ) ;
T3.Properties.VariableNames = T.Properties.VariableNames
T3 =
Name Data Category Description
______________ _____ _________________ ___________
'Layout' 'STR' '' ''
'Radius' 'Inf' 'Radius_2000_inf' ''
'aq' '0' '0' ''
'VehicleSpeed' '200' 'Speed_160_230' ''
produces a new table containing only strings. Notice that we had to recreate the column names (copied from the initial table), as these are not transferred into the cell array during conversion.
This method is suitable for relatively small tables, as the round trip table/cell array/table plus the call to cellfun will probably be quite slow for larger tables.
2) Use the varfun function
varfun is for tables what cellfun is for cell arrays. You'd think that a simple
T2 = varfun( @num2str , T )
would do the job then ... well, no. This will error too. If you look at the varfun code at the line indicated by the error, you'll notice that internally, the data in your table are converted to cell arrays and the function is applied to those. As we saw above, num2str errors when met with a cell array. The trick to overcome that is to send a customised version of num2str which will accept cell arrays. For example:
cellnum2str = @(x) cellfun(@num2str,x,'uni',0)
Armed with that, you can now use it to convert your table:
%% Use "varfun"
cellnum2str = @(x) cellfun(@num2str,x,'uni',0) ;
T2 = varfun( cellnum2str , T ) ;
T2.Properties.VariableNames = T.Properties.VariableNames ;
This will produce the same table as in example 1 above. Notice that again we had to reassign the column headers on the newly created table (the irony is that varfun choked trying to apply the function on the column headers, but does not re-use or return them in the output ... go figure).
Discussion: Initially I tried to make the varfun solution work (hence the T2 name of the result) and wanted to recommend it, because I didn't like the table/cell/table conversion of the other solution. Now that I have seen what goes on inside varfun, I am not so sure it will be faster. It might be slightly more readable in a semantic way, but if speed is a concern you'll have to try both versions and choose which one gives you the best results.
For the record: num2str(cell2mat(conditionTable.Data)) works, apparently independent of whether the content is 'abc', [Inf], [0] or [123.123].

SELECT part of string between symbol and space

I need to create a subquery (or view) column with values pulled from part of a long string. Values will appear like this:
"Recruiter: Recruiter Name Date:..."
I need to select the recruiter name after the colon, ending at the space after the recruiter name. I understand that normalizing would be better, but we only have query access, not database setup access, in this case.
Ideas appreciated!
You can use a regex for this. A regex will let you express that you want to search for the text Recruiter followed by a colon, a space, and a series of characters followed by a space, and that you want it to extract those characters.
The expression might look a bit like this (untested)
Recruiter: (.+) Date:
This would look for 'Recruiter: ' literally, followed by a string of any characters (.) of length 1 or larger (+), which is to be extracted (the parentheses form a capture group), followed by the literal string ' Date:'.
How you use this with SQL depends on your vendor.
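As a quick illustration outside of SQL, the same pattern can be tested in Python (the text uses the example from the question, with a made-up date appended):
import re
text = "Recruiter: Recruiter Name Date: 01/01/2020"  # the date value is made up
match = re.search(r"Recruiter: (.+) Date:", text)
if match:
    print(match.group(1))  # prints: Recruiter Name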
I would create a function that pulls out the value for a given key. You would use it like:
select [dbo].[GetValue]('recruiter',
'aKey: the a value Recruiter: James Bond cKey: the c value')
This returns 'James Bond'
Here is the function:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create function [dbo].[GetValue](@Key varchar(50), @Line varchar(max))
returns varchar(max)
as
begin
    declare @posStart int, @posEnd int
    select @posStart = charindex(@Key, @Line) -- the start of the key
    if (@posStart = 0)
        return '' -- key not found
    set @posStart = @posStart + len(@Key) + 1 -- the start of the value
    select @Line = substring(@Line, @posStart, 1000) -- start @Line at the value
    select @posEnd = charindex(':', @Line) -- find the next key
    if (@posEnd > 0)
    begin
        -- shorten @Line to next ":"
        select @Line = substring(@Line, 0, @posEnd)
        -- take off everything after the value
        select @posEnd = charindex(' ', reverse(@Line));
        if (@posEnd > 0)
            select @Line = substring(@Line, 0, len(@Line) - @posEnd + 1)
    end
    return rtrim(ltrim(@Line))
end
go
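For readers outside SQL Server, here is a rough Python analogue of the GetValue logic above (illustrative only, and not equivalent in every edge case):
import re
def get_value(key, line):
    # Grab everything after "<key>:" up to the next "<word>:" token,
    # or to the end of the string if no further key follows
    match = re.search(rf"{re.escape(key)}:\s*(.*?)(?=\s+\S+:|$)", line, re.IGNORECASE)
    return match.group(1).strip() if match else ""
print(get_value("recruiter", "aKey: the a value Recruiter: James Bond cKey: the c value"))
# prints: James Bond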

Replace empty strings with null values

I am rolling up a huge table by counts into a new table, where I want to change all the empty strings to NULL and typecast some columns as well. I read through some of the posts and could not find a query that would let me do it across all the columns in a single query, without using multiple statements.
Let me know if it is possible to iterate across all columns and replace cells containing empty strings with NULL.
Ref: How to convert empty spaces into null values, using SQL Server?
To my knowledge there is no built-in function to replace empty strings across all columns of a table. You can write a plpgsql function to take care of that.
The following function replaces empty strings in all basic character-type columns of a given table with NULL. You can then cast to integer if the remaining strings are valid number literals.
CREATE OR REPLACE FUNCTION f_empty_text_to_null(_tbl regclass, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- ARRAY of all basic character types
_sql text;
BEGIN
SELECT INTO _sql -- build SQL command
'UPDATE ' || _tbl
|| E'\nSET ' || string_agg(format('%1$s = NULLIF(%1$s, '''')', col), E'\n ,')
|| E'\nWHERE ' || string_agg(col || ' = ''''', ' OR ')
FROM (
SELECT quote_ident(attname) AS col
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND NOT attnotnull -- exclude columns defined NOT NULL!
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) sub;
-- RAISE NOTICE '%', _sql; -- test?
-- Execute
IF _sql IS NULL THEN
updated_rows := 0; -- nothing to update
ELSE
EXECUTE _sql;
GET DIAGNOSTICS updated_rows = ROW_COUNT; -- Report number of affected rows
END IF;
END
$func$;
Call:
SELECT f_empty_text_to_null('mytable');
SELECT f_empty_text_to_null('myschema.mytable');
To also get the number of updated rows (returned in the OUT column updated_rows):
SELECT * FROM f_empty_text_to_null('mytable');
Major points
Table name has to be valid and visible and the calling user must have all necessary privileges. If any of these conditions are not met, the function will do nothing - i.e. nothing can be destroyed, either. I cast to the object identifier type regclass to make sure of it.
The table name can be supplied as is ('mytable'), then the search_path decides. Or schema-qualified to pick a certain schema ('myschema.mytable').
Query the system catalog to get all (character-type) columns of the table. The provided function uses these basic character types: text, bpchar and varchar (see the _typ array). Only relevant columns are processed.
Use quote_ident() or format() to sanitize column names and safeguard against SQLi.
The updated version uses the basic SQL aggregate function string_agg() to build the command string without looping, which is simpler and faster. And more elegant. :)
Has to use dynamic SQL with EXECUTE.
The updated version excludes columns defined NOT NULL and only updates each row once in a single statement, which is much faster for tables with multiple character-type columns.
Should work with any modern version of PostgreSQL. Tested with Postgres 9.1, 9.3, 9.5 and 13.
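If you call the function from application code, a minimal sketch with psycopg2 might look like this (the connection string is a placeholder):
import psycopg2
conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder connection details
with conn, conn.cursor() as cur:
    # The text argument is coerced to regclass by Postgres, as in the SQL calls above
    cur.execute("SELECT * FROM f_empty_text_to_null(%s)", ("mytable",))
    print("rows updated:", cur.fetchone()[0])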
