Table comprehensions: get a subset from an internal table into another one

As stated in the title, I want to get a conditional subset of an internal
table into another internal table.
Let us first look at what this may look like the old-fashioned way.
DATA: lt_hugeresult    TYPE tty_mytype,
      lt_reducedresult TYPE tty_mytype.

SELECT "whatever" FROM "wherever"
  INTO CORRESPONDING FIELDS OF TABLE lt_hugeresult
  WHERE "any_wherecondition".

IF sy-subrc = 0.
  lt_reducedresult[] = lt_hugeresult[].
  DELETE lt_reducedresult WHERE col1 EQ 'a value'
                            AND col2 NE 'another value'
                            AND col3 EQ 'third value'.
  ...
ENDIF.
We all may know this.
Now I was reading about the table comprehension stuff that was introduced
with ABAP 7.40, apparently SP8:
Table Comprehensions – Building Tables Functionally
Table-driven:
VALUE tabletype( FOR line IN tab WHERE ( … )
( … line-… … line-… … )
)
For each selected line in the source table(s), construct a line in the result table. Generalization of value constructor from static to dynamic number of lines.
I was experimenting with that, but the results do not really fit;
perhaps I am doing it wrong, or I might even need the condition-driven approach.
So, what would the above statement look like written with table comprehension techniques?
Until now I have this, which does not deliver what I need, and it seems
as if the "not equal" is not possible...
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
                                  WHERE ( col1 = 'a value' )
                                  ( col2 = 'another value' )
                                  ( col3 = space )
                                ).
Anyone having some hints ?
EDIT: It still seems not to work. Here is how I do it (the executable line, the debugger results and the wrong reduced table were shown as screenshots).
And what now?

You could use the FILTER operator with the EXCEPT WHERE addition to filter out any rows that match the where clause:
lt_reducedresult = FILTER #( lt_hugeresult EXCEPT WHERE col1 =  'a value'
                                                    AND col2 <> 'another value'
                                                    AND col3 =  'a third value' ).
Note that lt_hugeresult would have to be a sorted table, and the col1/col2/col3 need to be key components (you can specify a secondary key using the USING KEY addition).
The documentation for FILTER explicitly notes that:
Table filtering can also be performed using a table comprehension or a table reduction with an iteration expression for table iterations with FOR. The operator FILTER provides a shortened format for this special case and is more efficient to execute.
A table filter constructs the result row by row. If the result contains almost all rows in the source table, this method can be slower than copying the source table and deleting the surplus rows from the target table.
So your approach of using DELETE might actually be appropriate depending on the size of the table.
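If you do go the FILTER route, a sorted table type along the following lines would satisfy the key requirement mentioned above; the structure ty_mytype and the choice of key columns are assumptions here, adjust them to your real line type.

TYPES: BEGIN OF ty_mytype,                       " assumed line type
         col1 TYPE string,
         col2 TYPE string,
         col3 TYPE string,
       END OF ty_mytype,
       " sorted table type so FILTER can evaluate the WHERE condition against key components
       tty_mytype_sorted TYPE SORTED TABLE OF ty_mytype
                         WITH NON-UNIQUE KEY col1 col2 col3.

DATA lt_hugeresult TYPE tty_mytype_sorted.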

Table iterations can be quite confusing when you use WHERE, because of the parenthesis groups.
The "NOT EQUAL" condition is very well supported, as I show below in the solution of your first example. The issue you observe is due to improper use of parenthesis groups.
You must define the whole logical expression after WHERE inside ONE parenthesis group (one or several elementary conditions separated by the logical operators AND, OR, etc.).
After the parenthesis group for WHERE, you usually define only one parenthesis group, which corresponds to the line to be added to the target internal table. You may define subsequent parenthesis groups if, for each line in the source internal table, you want to add several lines to the target internal table.
In your code, only the first parenthesis group applies to WHERE (either col1 = 'a value' in your first example, or insplot = _ilnum in your second example).
The subsequent parenthesis groups correspond to the lines to be added, i.e. 2 lines are added for each source line in the first example (one line with col2 = 'another value', and one line with col3 = space), and 3 lines are added for each source line in the second example (one line with inspoper = i_evaluation-inspoper, one line with inspchar = i_evaluation-inspchar, and one line corresponding to the line of _single_results).
So, you should write your code as follows.
First example :
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
                                  WHERE (     col1 =  'a value'
                                          AND col2 <> 'another value'
                                          AND col3 =  'third value' )
                                  ( checkline )
                                ).
Second example :
DATA(singres) = VALUE tbapi2045d4( FOR checkline IN _single_results
                                   WHERE (     insplot  = _ilnum
                                           AND inspoper = i_evaluation-inspoper
                                           AND inspchar = i_evaluation-inspchar )
                                   ( checkline )
                                 ).

I compared the old-fashioned syntax of your example above with the table comprehension technique and got exactly the same result.
Actually, your sample is not functional because it lacks a row specification for the constructed table reduced.
Try this one, which worked for me.
DATA(reduced) = VALUE tty_mytype( FOR checkline IN lt_hugeresult
                                  WHERE ( col1 = 'a value' AND
                                          col2 = 'another value' AND
                                          col3 = space )
                                  ( checkline )
                                ).
In the above sample we have the most basic type of result row specification, where it is identical to the source table. More sophisticated examples, where new table rows are evaluated with table iterations, can be found here.

Related

What is the industry standard Deduping method in Dataflows?

Deduping is one of the basic and important data-cleaning techniques.
There are a number of ways to do that in a dataflow.
For example, I do deduping with the help of an Aggregate transformation: I put the key columns that need to be unique (consider "Firstname" and "LastName" as the columns) into Group by, and in the Aggregates tab I use a column pattern like name != 'Firstname' && name != 'LastName' with $$ as the column and first($$) as the expression.
The problem with this method is that if 200 out of 300 columns have to be treated as unique columns, it is very tedious to include those 200 columns in my column pattern.
Can anyone suggest a better and more optimised deduping process in a dataflow for the above situation?
I tried to reproduce the deduplication process using a dataflow. Below is the approach.
The list of columns that need to be grouped by is given in a dataflow parameter.
In this repro, three columns are given. This can be extended as per requirements.
Parameter Name: Par1
Type: String
Default value: 'col1,col2,col3'
The source is taken as in the image below (group-by columns: col1, col2, col3; aggregate column: col4).
Then an Aggregate transform is added. In Group by, sha2(256,byNames(split($Par1,','))) is given as the column expression and it is named groupbycolumn.
In Aggregates, click + Add column pattern near column1 and then delete Column1. Enter true() in the matching condition, then click on the undefined column expression and enter $$ in the column name expression and first($$) in the value expression.
Output of the Aggregate transformation:
the data is grouped by col1, col2 and col3, and the first value of col4 is taken for every col1, col2 and col3 combination.
Then, using a Select transformation, groupbycolumn from the above output can be removed before copying to the sink.
Reference: MS document on Mapping data flow script - Azure Data Factory | Microsoft Learn

Reduce results to first match for each pattern with spark sql

I have a spark sql query, where I have to search for multiple identifiers:
SELECT * FROM my_table WHERE identifier IN ('abc', 'cde', 'efg', 'ghi')
Now I get hundreds of results for each of these matches, where I am only interested in the first match for each identifier, i.e. one row with identifier == 'abc', one where identifier == 'cde' and so on.
What is the best way to reduce my result to only the first row for each match?
The best approach certainly depends a bit on your data and also on what you mean by first. Is that any random row that happens to be returned first? Or first by some particular sort order?
A general flexible approach is using window functions. row_number() allows you to easily filter for the first row by window.
SELECT * FROM (
  SELECT *, row_number() OVER (PARTITION BY identifier ORDER BY ???) as row_num
  FROM my_table
  WHERE identifier IN ('abc', 'cde', 'efg', 'ghi')) tmp
WHERE row_num = 1
That said, aggregations like first or max_by are often more efficient, but they quickly get inconvenient when dealing with lots of columns.
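For example, a max_by based version could look like the sketch below; the ordering column ts is an assumption standing in for whatever defines "first" in your data, and you still have to list the columns you want:

SELECT identifier,
       max_by(col1, ts) AS col1,   -- value of col1 from the row with the largest ts per identifier
       max_by(col2, ts) AS col2
FROM my_table
WHERE identifier IN ('abc', 'cde', 'efg', 'ghi')
GROUP BY identifier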
You can use the first() aggregation function (after grouping by identifier) to only get the first row in each group.
But I don't think you'll be able to select * with this approach. Instead, you can list every individual column you want to get:
SELECT identifier, first(col1), first(col2), first(col3), ...
FROM my_table
WHERE identifier IN ('abc', 'cde', 'efg', 'ghi')
GROUP BY identifier
Another approach would be to fire a query for each identifier value with a limit of 1 and then union all the results.
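Roughly, that could be written as the sketch below (without an ORDER BY inside each branch, which row survives per identifier is arbitrary):

SELECT * FROM (SELECT * FROM my_table WHERE identifier = 'abc' LIMIT 1) a
UNION ALL
SELECT * FROM (SELECT * FROM my_table WHERE identifier = 'cde' LIMIT 1) b
UNION ALL
SELECT * FROM (SELECT * FROM my_table WHERE identifier = 'efg' LIMIT 1) c
UNION ALL
SELECT * FROM (SELECT * FROM my_table WHERE identifier = 'ghi' LIMIT 1) d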
With the DataFrame API, you can use your original query and then use .dropDuplicates(["identifier"]) on the result to only keep a single row for each identifier value.
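A minimal PySpark sketch of that, assuming my_table is available as a table (again, without a prior ordering the surviving row per identifier is arbitrary):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# filter to the identifiers of interest, then keep one row per identifier value
df = spark.table("my_table").where(col("identifier").isin("abc", "cde", "efg", "ghi"))
first_matches = df.dropDuplicates(["identifier"])
first_matches.show()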

SQL Server: use all the words of a string as separate LIKE parameters (and all words should match)

I have a string containing a certain number of words (it may vary from 1 to many) and I need to find the records of a table which contains ALL those words in any order.
For instance, suppose that my input string is 'yellow blue red' and I have a table with the following records:
1 yellow brown white
2 red blue yellow
3 black blue red
The query should return the record 2.
I know that the basic approach should be something similar to this:
select * from mytable where colors like '%yellow%' and colors like '%blue%' and colors like '%red%'
However, I am not able to figure out how to turn the words of the string into separate LIKE parameters.
I have this code that splits the words of the string into a table, but now I am stuck:
DECLARE @mystring varchar(max) = 'yellow blue red';
DECLARE @terms TABLE (term varchar(max));

INSERT INTO @terms
SELECT Split.a.value('.', 'NVARCHAR(MAX)') term
FROM (SELECT CAST('<X>' + REPLACE(@mystring, ' ', '</X><X>') + '</X>' AS XML) AS String) AS A
CROSS APPLY String.nodes('/X') AS Split(a);

SELECT * FROM @terms;
Any idea?
First, put that XML junk in a function:
CREATE FUNCTION dbo.SplitThem
(
    @List      NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN ( SELECT Item = y.i.value(N'(./text())[1]', N'nvarchar(4000)')
         FROM ( SELECT x = CONVERT(XML, '<i>'
                           + REPLACE(@List, @Delimiter, '</i><i>')
                           + '</i>').query('.')
              ) AS a CROSS APPLY x.nodes('i') AS y(i));
Now you can extract the words in the table, join them to the words in the input string, and discard any that don't have the same count:
DECLARE @mystring varchar(max) = 'red yellow blue';

;WITH src AS
(
  SELECT t.id, t.colors, fc = f.c, tc = COUNT(t.id)
  FROM dbo.mytable AS t
  CROSS APPLY dbo.SplitThem(t.colors, ' ') AS s
  INNER JOIN (SELECT Item, c = COUNT(*) OVER()
              FROM dbo.SplitThem(@mystring, ' ')) AS f
          ON s.Item = f.Item
  GROUP BY t.id, t.colors, f.c
)
SELECT * FROM src
WHERE fc = tc;
Output:

id   colors            fc   tc
2    red blue yellow   3    3
Example db<>fiddle
This disregards any possibility of duplicates on either side and ignores the larger overarching issue that this is the least optimal way possible to store sets of things. You have a relational database, use it! Surely you don't think the tags on this question are stored somewhere as the literal string
string sql-server-2012 sql-like
Of course not, these question:tag relationships are stored in a, well, relational table. Splitting strings is for the birds and those with all kinds of CPU and time to spare.
If you are storing a delimited list in a single column then you really need to normalize it out into a separate table.
But assuming you actually want to just do multiple free-form LIKE comparisons, you can do them against a list of values:
select *
from mytable t
where not exists (select 1
from (values
('%yellow%'),
('%blue%'),
('%red%')
) v(search)
where t.colors not like v.search
);
Ideally you should pass these values through as a Table Valued Parameter; then you just put that into your query (a sketch of the TVP route is at the end of this answer):
select *
from mytable t
where not exists (select 1
                  from @tmp v
                  where t.colors not like v.search
                 );
If you want to simulate an OR semantic rather than AND, then change not exists to exists and not like to like.
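For completeness, a rough sketch of what the table-valued parameter route might look like; the type and procedure names here are made up for illustration:

-- hypothetical table type holding the LIKE patterns
CREATE TYPE dbo.SearchPatterns AS TABLE (search varchar(100) NOT NULL);
GO
CREATE PROCEDURE dbo.FindMatchingRows
    @patterns dbo.SearchPatterns READONLY
AS
BEGIN
    SELECT *
    FROM mytable t
    WHERE NOT EXISTS (SELECT 1
                      FROM @patterns v
                      WHERE t.colors NOT LIKE v.search);
END;
GO
-- build the pattern list (on the client, or ad hoc like this) and call the procedure
DECLARE @p dbo.SearchPatterns;
INSERT INTO @p VALUES ('%yellow%'), ('%blue%'), ('%red%');
EXEC dbo.FindMatchingRows @patterns = @p;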

(Power) Pivot - show true/false and calculate at the same time

I have the following data source:
My pivot rows are Team => Project Name, with the "Value" column in Values. I am calculating the % ratio of all projects that have the value "True" compared to all projects that have a value (disregarding those without values). Here's the formula I use in PowerPivot:
=CALCULATE(COUNTROWS(),'Table'[Value]=TRUE()) / CALCULATE(COUNTROWS(), ('Table'[Value]=FALSE() || 'Table'[Value]=TRUE()), ISLOGICAL('Table'[Value]))
The formula works; however, I only need to see this percentage at the "Team" level, while the expanded projects should still show their "True/False" values. Is this possible? Preferably without VBA.
Format your code. If you like reading very long lines, that's fine, but use DAX Formatter for the rest of us.
True vs All =
CALCULATE(
COUNTROWS( 'Table' ) // It's considered a best practice
// to explicitly name the table in
// COUNTROWS()
,'Table'[Value]=TRUE()
) / CALCULATE(
COUNTROWS( 'Table' )
// You can remove the test for [Value] = TRUE() ||
// [Value] = FALSE()
,ISLOGICAL('Table'[Value])
)
ConditionalDisplay =
IF(
ISFILTERED( 'Table'[Project] )
&& HASONEVALUE( 'Table'[Project] )
,VALUES( 'Table'[Value] )
,[True vs All]
)
[True vs All] is a cleaned up version of your existing measure.
[ConditionalDisplay] does what its name says. Displays a different value based on conditions.
We check for ISFILTERED() to cover the edge case where a given value of [Team] has only a single project. We check for HASONEVALUE() to cover the case where an explicit filter (slicer or filter) exists on [Project], but more than one is in context (grand total level).
When the two are true, we return VALUES( 'Table'[Value] ), the column made up of the distinct values in [Value]. This is only evaluated when we already know there's exactly one distinct value. A 1x1 table is implicitly converted to scalar in DAX.
When there's more than one distinct value of [Value] or it's not filtered, then we return your original measure.
[ConditionalDisplay] will fail if you have two rows for the same value of [Project] with multiple values of [Value].

How to insert column name in the destination table in ssis?

As shown in the image, I have an Excel sheet which contains 32 tables one after the other (I have shown 2 tables in the image; the table count may grow), but the metadata is the same for all the tables. Each table has two columns: one is constant (Name) and the other one changes (TPA, TPB, etc.), but there is no change in the column position.
Now the problem is: how do I hold the header and insert it as the T_type value into the destination table?
The number of rows in each table is not fixed (so we can't go for cell references).
The problem as I understand it
I believe you have data in Excel that looks approximately like this:
Name | TPA
abc | x
...
Name | TPB
acz | p
The data could be described as blocks of data. A block is bounded by a starting row with the value Name in it. The next cell on that row will contain a value that applies to all subsequent rows.
After the header row, you will need to pull out the key value pairs and write them plus the table name into your destination.
The meta data remains consistent, it's just the source data is all banjaxed.
Resolution
This is exactly the problem I had to overcome when I wrote about handling an Excel source via SSIS. We had to source our data feeds from reports instead of clean tabular data. Using that approach, you would simply define your equivalent of the ParseSample method, and there, in the foreach loop (line 71 of ExcelParser), you'd put in the logic: a block is everything from a field with a value of 'Name' until you encounter an empty row.
Approximate pseudocode:
# enumerate through all my source data
foreach row in source data
    # assign values to local variables
    col0 = row[0]
    col1 = row[1]
    # test for the header row that starts a new block
    if col0 == "Name"
        tableName = col1
    else if col0 == string.Empty
        # blank row between blocks: do nothing
    else
        # data row: emit the Name value, the table name (T_type) and the value
        newRow = dataTable.NewRow()
        newRow[0] = col0
        newRow[1] = tableName
        newRow[2] = col1
        dataTable.Add(newRow)
If you want to simplify the matter, you can have all the parsing logic in the ScriptMain and dispense with all the data table nonsense.
Upside is there'd be less code, downside is that debugging scripts is the devil in SSIS pre-2012. It's still kludgey in 2012 but it's better than the nothing that came before it.
