Is it possible to skip loading a row using the kiba-etl gem? - kiba-etl

Is there a way to skip loading certain rows with the kiba-etl gem if I deem them invalid?
For example, a row may need to pass a validation before I load it into the system, or errors may occur where I still need to push the data into the system regardless while logging the problem.

Author of Kiba here! To remove a row from the pipeline, simply return nil at the end of a transform:
transform do |row|
  row_valid = some_custom_operation
  row_valid ? row : nil
end
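The filtering rule can be modeled without the gem. The plain-Ruby sketch below is not Kiba's implementation. It only assumes the behaviour described above (a transform that returns nil removes the row) and uses a hypothetical row_valid? check with made-up rows:

```ruby
# Plain-Ruby model of the idea (not Kiba's source): each transform takes a row
# and returns a row or nil; nil drops the row from the rest of the pipeline.

# Hypothetical validation rule, for illustration only.
def row_valid?(row)
  !row[:email].to_s.strip.empty?
end

TRANSFORMS = [
  ->(row) { row_valid?(row) ? row : nil },           # drop invalid rows
  ->(row) { row.merge(email: row[:email].downcase) } # only sees surviving rows
].freeze

# Runs every row through the transforms in order and returns what would be
# handed to the destination.
def run_pipeline(rows, transforms)
  rows.filter_map do |row|
    transforms.each do |transform|
      row = transform.call(row)
      break if row.nil? # stop at the first transform that returns nil
    end
    row
  end
end

rows = [
  { id: 1, email: "A@EXAMPLE.COM" },
  { id: 2, email: "" },
  { id: 3, email: "c@example.com" }
]
puts run_pipeline(rows, TRANSFORMS).map { |r| r[:id] }.inspect
```

Row 2 never reaches the second transform, which is why the downcasing step can safely assume an email is present.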
You could also "write down" the offending rows and report on them later using a post_process block like this (this assumes a low to moderate number of bogus rows, since their ids are kept in memory):
@bogus_row_ids = []
transform do |row|
  # SNIP
  if row_valid(row)
    row
  else
    @bogus_row_ids << row[:id]
    nil # remove from pipeline
  end
end
post_process do
  # do something with @bogus_row_ids: send an email, write a file, etc.
end
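A plain-Ruby sketch of the same bookkeeping, assuming rows are hashes with :id and :amount keys and a hypothetical valid_row? rule. It is not Kiba code: filter_map stands in for the transform, and the logging after the loop plays the part of post_process, using Ruby's standard Logger.

```ruby
require "logger"
require "stringio"

# Hypothetical rule: amounts must be non-negative numbers.
def valid_row?(row)
  row[:amount].is_a?(Numeric) && row[:amount] >= 0
end

# Transform step plus a post_process-like report, modeled in plain Ruby
# (Kiba itself runs the transform per row and post_process once at the end).
def process(rows, logger)
  bogus_row_ids = []
  loaded = rows.filter_map do |row|
    if valid_row?(row)
      row
    else
      bogus_row_ids << row[:id]
      nil # remove from pipeline
    end
  end

  # Equivalent of post_process: runs once, after every row has been seen.
  unless bogus_row_ids.empty?
    logger.warn("Skipped #{bogus_row_ids.size} row(s): #{bogus_row_ids.join(', ')}")
  end
  [loaded, bogus_row_ids]
end

log = StringIO.new
loaded, bogus = process(
  [{ id: 1, amount: 5 }, { id: 2, amount: -1 }, { id: 3, amount: "n/a" }],
  Logger.new(log)
)
puts log.string
```

Writing the report once at the end (rather than per row) keeps the transform itself free of I/O, which is the point of deferring it to post_process.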
Let me know if this properly answers your question, or if you need a more refined answer.

I'm dumb. I realized you can just catch your errors within the transformation/loading process and return nil.
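Catching the error inside the transform can cover both cases from the question. A minimal sketch, assuming a hypothetical cast_quantity step and on_error option: rescue the error, log it with Ruby's standard Logger, then return nil to skip the row or return the row unchanged to load it anyway.

```ruby
require "logger"
require "stringio"

# Sketch: do the risky work inside the method, log what went wrong, then
# either drop the row (nil) or keep it unchanged so it is still loaded.
# cast_quantity and its on_error option are hypothetical names.
def cast_quantity(row, logger, on_error: :skip)
  # Integer() raises ArgumentError for "abc" and TypeError for nil.
  row.merge(qty: Integer(row[:qty]))
rescue ArgumentError, TypeError => e
  logger.error("row #{row[:id]}: #{e.message}")
  on_error == :keep ? row : nil
end

log = StringIO.new
logger = Logger.new(log)
p cast_quantity({ id: 1, qty: "3" }, logger)                 # converted row
p cast_quantity({ id: 2, qty: "abc" }, logger)               # nil, so the row is dropped
p cast_quantity({ id: 3, qty: nil }, logger, on_error: :keep) # original row, still loaded
```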

Related

Python pandas DataFrame warning "Simple Issue!": "A value is trying to be set on a copy of a slice from a DataFrame"

First post, and I'm a total Python novice, so please be patient with my slow understanding!
I have a dataframe containing a list of transactions by order of transaction date.
I've appended a new column, ["DB/CR"], which is populated with 'Debit' if the ["Amount"] field contains "-" and with 'Credit' otherwise.
Since the transactions are in date order, I've added another new column, ["Top x"], which should hold an incrementing number (starting at 1) counted separately for debits and for credits.
To do this, I wrote a simple loop with an 'if'/'elif' statement (I could probably use 'else', since the choice is binary). It loops through the DataFrame from row 0 to the last row and, depending on whether the row is a "Debit" or a "Credit", increments a separate counter for each: 'i' for debits and 'ii' for credits.
The code works as expected in terms of the 'Top x' output; however, I always receive the warning "A value is trying to be set on a copy of a slice from a DataFrame".
I'm trying to perfect my script so it runs without any warnings, but I can't work out what I'm doing incorrectly in my use case.
I'd appreciate it if someone could shed light on how the code needs to be refactored to avoid this warning.
Code (the df source data is an imported csv):
#top x debits/credits
i = 0
ii = 0
for ind in df.index:
    if df["DB/CR"][ind] == "Debit":
        i = i+1
        df["Top x"][ind] = i
    elif df["DB/CR"][ind] == "Credit":
        ii = ii+1
        df["Top x"][ind] = ii
Interpreter output:
df["Top x"][ind] = i
G:\Finances Backup\venv\Statementsv.03.py:173: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Top x"][ind] = ii
Many thanks :)
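You should use .loc with the row label first and the column label second, e.g. df.loc[ind, "Top x"] = i, rather than chained indexing like df["Top x"][ind] = i, which is what triggers the warning.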
You can use iterrows() to iterate over the DataFrame. However, updating a DataFrame while iterating over it is not recommended. From the iterrows() documentation:
You should never modify something you are iterating over. This is not
guaranteed to work in all cases. Depending on the data types, the
iterator returns a copy and not a view, and writing to it will have no
effect.

Cognos Report Studio: CASE and IF Statements

I'm very new to Cognos Report Studio and am trying to filter some values and replace them with others.
Some values are coming out blank, and I want to replace them with the string "Property Claims".
What I'm trying to use in my main query is:
CASE WHEN [Portfolio] is null
then 'Property Claims'
ELSE [Portfolio]
which gives me an error. I also have a different filter I want to add that replaces windscreen flags with a string value rather than a number. For example, if the flag is 1, I want to show it as 'Windscreen Claims'.
if [Claim Windscreen Flag] = 1
then ('Windscreen')
Else [Claim Windscreen Flag]
Neither of these works; both give the same error. Can someone give me a hand?
Your first CASE statement is missing the END. The error message should be pretty clear. But there is a simpler way to do that:
coalesce([Portfolio], 'Property Claims')
The second problem is similar: Your IF...THEN...ELSE statement is missing a bunch of parentheses. But after correcting that you may have problems with incompatible data types. You may need to cast the numbers to strings:
case
when [Claim Windscreen Flag] = 1 then ('Windscreen')
else cast([Claim Windscreen Flag], varchar(50))
end
In future, please include the error messages.
It might be syntax:
IS NULL (instead of = null)
NULL is not blank. You might also want = ' '
CASE might need an ELSE and END at the bottom
Comparing a data type with a value of a different type can cause errors, for example a numeric compared with a string, as in [Sales] = 'Jane Doe'
For example (assuming the result is a string and data item 2 is also a string),
case
when([data item 1] IS NULL)Then('X')
when([data item 1] = ' ')Then('X')
else([data item 2])
end
Also, if you want to show a data item as a different type, you can use CAST.

Ruby Selenium send_keys_characters not sending the strings I want

xlsx = Roo::Excelx.new($docs_dir + '/mytestsheet.xlsx')
xlsx.each_row_streaming do |row|
  send_keys_characters(row)
  step %[I wait for 2 sec]
end
I've been struggling to pull values from an xlsx file and send each cell, via send_keys_characters, into a manual input field on my website. I have two issues I can't figure out:
First, it does not pull ONLY the value of the cell I want (e.g. "test1", "test2"); instead it writes out the whole cell object, one after the other:
[#<Roo::Excelx::Cell::String:0x0000000008076b30 @cell_value="test1", @style=0, @coordinate=[1, 1], @value="test1">]
[#<Roo::Excelx::Cell::String:0x0000000006399800 @cell_value="test2", @style=0, @coordinate=[2, 1], @value="test2">]
How can I pull only the value of the cell and mention it on my input field and then Submit?
Second, it needs to pick up the first value, submit, and continue the scenario, then return and pick the second value, the third, and so forth. How can I pick one value at a time and then go to the next step?
This is a Ruby/Selenium question.
So, in essence, you are looking for code that enumerates over your objects and sends the values you want.
I'm not familiar with Roo, but if it uses standard readers, you can just do:
array.each do |excel_string|
  input_field.send_keys(excel_string)
end
This will concatenate all of the values, so your input field will say test1test2, etc.
If you need to type these into a set of fields (i.e. input_field 1/2/3), I would use #zip to create an array of pairs:
text_values = array.map(&:value)
zipped_array = input_fields.zip(text_values)
zipped_array.each do |input_field, value|
  input_field.send_keys(value)
end
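The output in the question (an array holding a Roo::Excelx::Cell::String) suggests that each streamed row is an array of cell objects, so calling .value on a cell gives the plain string. The sketch below shows one-value-at-a-time submission; FakeCell and FakeInput are hypothetical stand-ins for a Roo cell and a Selenium element, so it runs without either gem.

```ruby
# Stand-ins for illustration only: FakeCell mimics a Roo cell's #value and
# FakeInput records what would be typed into a Selenium input element.
FakeCell = Struct.new(:value)

class FakeInput
  attr_reader :submissions

  def initialize
    @submissions = []
    @typed = +""
  end

  def send_keys(text)
    @typed << text
  end

  def submit
    @submissions << @typed
    @typed = +""
  end
end

# Each streamed row is an array of cells; take the first cell's value (column A)
# and submit it before moving on to the next row.
def enter_each_value(rows, input)
  rows.each do |row|
    value = row.first&.value
    next if value.nil? # skip empty rows
    input.send_keys(value.to_s)
    input.submit
  end
end

rows = [[FakeCell.new("test1")], [FakeCell.new("test2")], []]
input = FakeInput.new
enter_each_value(rows, input)
p input.submissions
```

In a real step definition, the remaining scenario steps for each value would go where submit is called, so each iteration completes before the next cell is read.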

In Ruby, how would one create new CSVs conditionally from an original CSV?

I'm going to use this as sample data to simplify the problem:
[image: data_set_1]
I want to split the contents of this CSV according to Column A (DEPARTMENT) and place the rows in new CSVs named after each department.
If it were done in the same workbook (so it fits in one image), it would look like:
[image: data_set_2]
My initial thought was something pretty simple, like:
CSV.foreach('test_book.csv', headers: true) do |asset|
  CSV.open("/import_csv/#{asset[1]}", "a") do |row|
    row << asset
  end
end
I expected that to take care of the logic for me. However, from looking into it, CSV#foreach does not accept file access rights as a second parameter, and I get an error when I run it. Any help would be appreciated, thanks!
I don't see why you would need to pass file access rights to CSV#foreach. That method just reads the CSV. Here is how I would do it:
require 'csv'

# Parse the entire CSV into a CSV::Table of rows.
orig_rows = CSV.parse(File.read('test_book.csv'), headers: true)
# Group the rows by department.
# row[1] is the second column; adjust the index (or use the header name) to match the department column.
# This becomes { 'deptA' => [<rows>], 'deptB' => [<rows>], etc }
groups = orig_rows.group_by { |row| row[1] }
# Write each group of rows to its own file
groups.each do |dept, rows|
  CSV.open("/import_csv/#{dept}.csv", "w") do |csv|
    rows.each do |row|
      csv << row.fields # CSV::Row#fields returns the row's values as an array
    end
  end
end
A caveat, though: this approach loads the entire CSV into memory, so if your file is very large it may not be practical. In that case, the line-by-line "streaming" approach you show in your question would be preferable.
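For large files, a streaming variant could look like the sketch below. It reads one row at a time with CSV.foreach and keeps one open writer per department, so memory grows with the number of departments rather than the number of rows. It assumes the department column has the header DEPARTMENT and that department names are safe to use as file names.

```ruby
require "csv"
require "tmpdir"

# Streaming sketch: one pass over the source, one open CSV writer per
# department. The "DEPARTMENT" header name is an assumption about the file.
def split_by_department(source_path, out_dir, key: "DEPARTMENT")
  writers = {}
  CSV.foreach(source_path, headers: true) do |row|
    dept = row[key]
    writers[dept] ||= begin
      csv = CSV.open(File.join(out_dir, "#{dept}.csv"), "w")
      csv << row.headers # write the header once per output file
      csv
    end
    writers[dept] << row # CSV#<< accepts a CSV::Row
  end
  writers.keys
ensure
  writers&.each_value(&:close) # flush and close every output file
end

# Example call (paths from the question):
# split_by_department("test_book.csv", "/import_csv")
```

Opening each output with "w" once per run avoids the duplicate rows that repeated runs in append mode ("a") would produce.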

Sybase cursor declared FOR READ ONLY becomes an infinite loop

I have a simple cursor over 260 records. Inside the loop, I not only print but also update some tables. The code below includes comments explaining each step:
/*Declare cursor for read only*/
DECLARE crsr_one CURSOR FOR
SELECT a,b,c,d
FROM table_name
WHERE a>=100
for read only
OPEN crsr_one /*open cursor*/
DECLARE @a,@b,@c,@d /*declare loop variables*/
while (1=1) /*start while loop*/
BEGIN
  FETCH crsr_one into @a,@b,@c,@d /*fetch into variable */
  IF (@@sqlstatus = 2) /*Break if no more records*/
    break
  /*some other code with select and update table*/
  print "%1! %2! %3! %4!", @a,@b,@c,@d /*Print variables*/
END
The problem is:
The while loop becomes infinite and keeps fetching the same data.
Any idea why, and how to correct it?
Your code looks good (except that the syntax for the DECLARE is invalid: each variable needs a data type, e.g. DECLARE @a int).
If the loop doesn't break on @@sqlstatus = 2, then the obvious question is: what value does it have? It can also be 1, indicating an error. To find out, print the value.
To be fully correct, you should therefore also test for @@sqlstatus = 1. The easiest way to do this is to test for @@sqlstatus != 0, which covers both 1 and 2.
