In Bigquery, How can I convert Struct of Struct of String to Columns - struct

So, in the table there are 3 columns as per the image; the 3rd one is a Record (Struct) containing 2 structs, old and new. Inside those structs there are columns and values.
I can access each final column like this - change.old.name - but I want to convert them into normal columns and create another table with that.
I tried UNNEST, but it doesn't work as it's not an array.
Data structure image
UPDATE :
Finally got it sorted. Select all of the nested fields, set each alias to whatever name you want (e.g. replace the dot with an underscore), then create a table from that.
create table abc as
select
  ID
  ,Created_on
  ,Change.old.add as Change_old_add
  ,Change.old.name as Change_old_name
  ,Change.old.count_people as Change_old_count_people
  ,Change.new.add as Change_new_add
  ,Change.new.name as Change_new_name
  ,Change.new.count_people as Change_new_count_people
from `project.Table`
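The same "flatten nested structs into dot-to-underscore column names" idea can be sketched in Python with pandas, which does the renaming automatically. The row data below is a hypothetical stand-in for the table in the question:

```python
import pandas as pd

# Hypothetical rows mirroring the table's shape: ID, Created_on, and a
# nested Change record containing old and new structs.
rows = [
    {
        "ID": 1,
        "Created_on": "2023-01-01",
        "Change": {
            "old": {"add": "A St", "name": "Ann", "count_people": 2},
            "new": {"add": "B St", "name": "Ann", "count_people": 3},
        },
    }
]

# json_normalize flattens nested structs into top-level columns, joining
# each path with an underscore - the same renaming the SQL aliases do by hand.
flat = pd.json_normalize(rows, sep="_")
print("Change_old_name" in flat.columns)  # True
```

The resulting frame has columns like Change_old_name and Change_new_count_people, matching the aliases in the SQL above.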


Related

How to Flatten a semicolon Array properly in Azure Data Factory?

Context: I have a data flow that extracts data from a SQL DB. The data comes in as just one column with a string separated by tabs, so in order to manipulate it properly, I've tried to separate every single column with its corresponding data:
Firstly, to 'rebuild' the table properly, I used a 'Derived Column' activity to replace the tabs with semicolons instead (1)
dropLeft(regexReplace(regexReplace(regexReplace(descripcion,'\t',';'),'\n',';'),'\r',';'),1)
So, after that, I used the split() function to get an array and build the columns (2)
split(descripcion, ';')
Problem: When I try to use the 'Flatten' activity (as here https://learn.microsoft.com/en-us/azure/data-factory/data-flow-flatten), it's just not working: the data flow gives me just one column, or if I add an additional column in the 'Flatten' activity, I just get another column with the same data as the first one:
Expected output:
column1   column2                              column3
2000017   ENVASE CORONA CLARA 24/355 ML GRAB   PC13
2004297   ENVASE V FAM GRAB 12/940 ML USADO    PC15
Could you tell me what I'm doing wrong, guys? Thanks, by the way.
You can use the Derived Column activity itself; try as below.
After the first derived column, what you have is a string array, which can just be split again using the derived column schema modifier.
Here firstc represents the source column equivalent to your column descripcion:
Column1: split(firstc, ';')[1]
Column2: split(firstc, ';')[2]
Column3: split(firstc, ';')[3]
Optionally, you can select only the columns you need to write to the SQL sink.
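The two steps above (normalize the delimiters, then split and pick columns by position) can be sketched in plain Python. Note that ADF data flow arrays are 1-based while Python lists are 0-based; the sample string is illustrative:

```python
# Hypothetical single-column source value, tab-separated as in the question.
descripcion = "2000017\tENVASE CORONA CLARA 24/355 ML GRAB\tPC13"

# Step 1: replace tab / newline / carriage-return delimiters with semicolons.
normalized = (descripcion.replace("\t", ";")
                         .replace("\n", ";")
                         .replace("\r", ";"))

# Step 2: split on the semicolon and index out each column
# (0-based here, whereas ADF's split(...)[1] is 1-based).
parts = normalized.split(";")
column1, column2, column3 = parts[0], parts[1], parts[2]
print(column1, column3)  # 2000017 PC13
```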

How to get the data from previous row in Azure data factory

I am working on transforming data in Azure data factory
I have a source file that contains data like this:
ABC Code-01
DEF
GHI
JKL Code-02
MNO
I need to make the data look like this in the sink file:
ABC Code-01
DEF Code-01
GHI Code-01
JKL Code-02
MNO Code-02
You can achieve this using the fill-down concept available in Azure Data Factory. The code snippet is available here.
Note: The code snippet assumes that you have already added source transformation in data flow.
Steps:
Add source and link it with the source file (I have generated file with your sample data).
Edit the data flow script available on the right corner to add code.
Add the code snippet after the source as shown.
source1 derive(dummy = 1) ~> DerivedColumn
DerivedColumn keyGenerate(output(sk as long),
    startAt: 1L) ~> SurrogateKey
SurrogateKey window(over(dummy),
    asc(sk, true),
    Rating2 = coalesce(Rating, last(Rating, true()))) ~> Window1
After adding the code in the script, the data flow generates 3 transformations:
a. A Derived Column transformation with a new dummy column with the constant “1”.
b. A SurrogateKey transformation to generate a key value for each row, starting with value 1.
c. A Window transformation to perform window-based aggregation. Here the code adds the predefined last() function to take the previous row's non-null value when the current row value is NULL.
For more information on Window transformation refer - https://learn.microsoft.com/en-us/azure/data-factory/data-flow-window
As I am getting the values as a single column in the source, I added additional columns in the Derived Column transformation to split the single source column into 2 columns.
Substitute NULL values if the column value is blank; if it is blank, the last() function will not recognize it as NULL and will not substitute previous values.
case(length(dropLeft(Column_1,4)) >1, dropLeft(Column_1,4), toString(null()))
Preview of Derived column: Column_1 is the Source raw data, dummy is the column generated from the code snippet added with constant 1, Column1Left & Column1Right are to store the values after splitting (Column_1) raw data.
Note: Column1Right blank values are replaced with NULLs.
In windows transformation:
a. Over – this partitions the source data based on the column provided. As there are no other columns to use as a partition column, add the dummy column generated using the derived column.
b. Sort – sorts the source data based on the sort column. Add the surrogate key column to sort the incoming source data.
c. Window Column – here, provide the expression to copy the non-null value from previous rows only when the current value is null:
coalesce(Column1Right, last(Column1Right,true()))
d. Data preview of the window transformation: here, Column1Right null values are replaced by previous non-null values based on the expression added in Window Columns.
A second derived column is added to concatenate Column1Left and Column1Right into a single column.
Second derived column preview:
A Select transformation is added to pass only the required columns to the sink and remove unwanted columns (this is optional).
Sink data output after the fill-down process:
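The fill-down logic built above (dummy partition, ordered rows, coalesce with the last non-null value) can be sketched in plain Python, using the sample rows from the question. The splitting rule below is an illustrative stand-in for the derived-column expressions:

```python
# Sample rows from the question: some rows carry a code, others don't.
rows = ["ABC Code-01", "DEF", "GHI", "JKL Code-02", "MNO"]

filled = []
last_code = None  # plays the role of last(Column1Right, true())
for row in rows:
    parts = row.split(" ", 1)
    # Blank right-hand side becomes None, like toString(null()) in the
    # derived column; otherwise keep the code found on this row.
    code = parts[1] if len(parts) > 1 else None
    # coalesce(current, last non-null): carry the previous code forward.
    last_code = code if code is not None else last_code
    filled.append(f"{parts[0]} {last_code}")

print(filled)
# ['ABC Code-01', 'DEF Code-01', 'GHI Code-01', 'JKL Code-02', 'MNO Code-02']
```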

How to get the column names after a table is extracted from a PDF file using camelot? I'm new to this

Briefly, these are the steps I am doing:
tables = camelot.read_pdf(doc_file)
tables[0].df
I am using tables[0].df.columns to get the column names from the extracted table, but it does not give the column names.
Camelot-extracted tables have no alphabetic column names.
tables[0].df.columns returns, for example, for a three-column table:
RangeIndex(start=0, stop=3, step=1)
Instead, you can try to read the first row and get a list from it: tables[0].df.iloc[0].tolist().
The output could be:
['column1', 'column2', 'column3']
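Beyond reading the first row as a list, one common follow-up is to promote that row to the DataFrame header. A minimal pandas sketch, where df stands in for tables[0].df:

```python
import pandas as pd

# Stand-in for tables[0].df: Camelot returns a DataFrame with a RangeIndex
# for columns, and the real header text sits in row 0.
df = pd.DataFrame([["column1", "column2", "column3"],
                   ["a", "b", "c"]])

header = df.iloc[0].tolist()   # ['column1', 'column2', 'column3']
df = df[1:]                    # drop the header row from the data
df.columns = header            # promote it to the column names
print(list(df.columns))  # ['column1', 'column2', 'column3']
```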

Tableau: Multiple columns in a filter

I have three numeric fields named A, B, C and want them in a single filter in Tableau; based on the one selected in that filter, a line chart will be shown. For example, if column B is selected in the Stages filter, the line chart of B is shown. Had column A been selected, then the line chart of A would be displayed.
Pardon my way of asking the question by showing an image. I just picked up learning Tableau and am not getting this trick anywhere.
Here is the snapshot of data
Create a (list) parameter named 'ABC' with the values
A
B
C
Then create a calculated field
IF [ABC] = 'A' THEN [column_a]
ELSEIF [ABC] = 'B' THEN [column_b]
ELSEIF [ABC] = 'C' THEN [column_c]
END
Something like that should work for you. Check out Tableau training here. It's free, but you have to sign up for an account.
Another way, without creating a calculated field: just pivot the three columns to rows, and the field on which you can apply a filter is created. Let me show you.
This is a screenshot of the input data.
I converted the three columns to a pivot to get the data reshaped like this.
After renaming the pivoted-fields column to Stages, I can add it directly to the view and get my desired result.
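The columns-to-rows pivot above is the same reshaping as a pandas melt, which may make the idea concrete. The month column and values below are made up for illustration:

```python
import pandas as pd

# Illustrative data: one id column plus the three measures A, B, C.
df = pd.DataFrame({"month": ["Jan", "Feb"],
                   "A": [1, 2], "B": [3, 4], "C": [5, 6]})

# melt turns the three measure columns into one name column ("Stages")
# plus one value column - exactly what Tableau's pivot produces.
long = df.melt(id_vars="month", value_vars=["A", "B", "C"],
               var_name="Stages", value_name="value")
print(sorted(long["Stages"].unique()))  # ['A', 'B', 'C']
```

Filtering on the Stages column of the long frame now selects one measure at a time, mirroring the Tableau filter.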

Power Query: Split table column with multiple cells in the same row

I have a SharePoint list as a data source in Power Query.
It has an "AttachmentFiles" column that is a table; in that table I want the values from the column "ServerRelativeURL".
I want to split that column so each value in "ServerRelativeURL" gets its own column.
I can get the values if I use the expand table function, but it will split them into multiple rows; I want to keep it in one row.
I only want one row per unique ID.
Example:
I can live with a fixed number of columns as there are usually no more than 3 attachments per ID.
I'm thinking that I can add a custom column that refers to "AttachmentFiles ServerRelativeURL Value(1)" but I don't know how.
Can anybody help?
Try this code:
let
    fn = (x) => {x, #table({"ServerRelativeUrl"}, List.FirstN(List.Zip({{"a".."z"}}), x*2))},
    Source = #table({"id", "AttachmentFiles"}, {fn(2), fn(3), fn(1)}),
    replace = Table.ReplaceValue(Source, 0, 0, (a, b, c) => a[ServerRelativeUrl], {"AttachmentFiles"}),
    cols = List.Transform({1..List.Max(List.Transform(replace[AttachmentFiles], List.Count))}, each "url" & Text.From(_)),
    split = Table.SplitColumn(replace, "AttachmentFiles", (x) => List.Transform({0..List.Count(x)-1}, each x{_}), cols)
in
    split
I managed to solve it myself.
I added 3 custom columns like this:
CustomColumn1: [AttachmentFiles]{0}
CustomColumn2: [AttachmentFiles]{1}
CustomColumn3: [AttachmentFiles]{2}
And expanded them with only the "ServerRelativeURL" selected.
It would be nice to have a dynamic solution. But this will work fine for now.
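The same "one row per ID, one column per attachment" reshape can be sketched in pandas: a column holding a list of URLs per row fans out into url1..urlN columns. The paths and url* names below are illustrative:

```python
import pandas as pd

# Illustrative data: each row's AttachmentFiles holds a list of URLs,
# like the ServerRelativeURL values inside the nested table.
df = pd.DataFrame({
    "id": [1, 2],
    "AttachmentFiles": [["/a/x.pdf", "/a/y.pdf"], ["/b/z.pdf"]],
})

# Building a DataFrame from the list column fans each list out into one
# column per position; shorter lists are padded with None.
urls = pd.DataFrame(df["AttachmentFiles"].tolist())
urls.columns = [f"url{i + 1}" for i in range(urls.shape[1])]

wide = pd.concat([df[["id"]], urls], axis=1)
print(list(wide.columns))  # ['id', 'url1', 'url2']
```

Unlike the fixed three custom columns, this sizes the url columns to the longest attachment list, which gives the dynamic behavior the answer wished for.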
