How can I convert rows with three columns into SQL insert statements? - excel

I have a spreadsheet looking like this:
あう to meet
青 あお blue
青い あおい blue
Is there a way to convert the data in these columns into three SQL statements that I could then use to enter the data into a database? I am less concerned with how to form the statement itself; I would like to know whether there is a way to take data in columns and turn it into a script.
If I could convert this into:
col1: 青い col2: あおい col3: blue
I could add and modify this for the correct format
INSERT INTO JLPT (col1, col2, col3) VALUES ('青い', 'あおい', 'blue')
etc

Use the formula
="('"&A1&"', '"&B1&"', '"&C1&"'), "
in column D and copy the formula down for all the rows. Then prepend
Insert into JLPT (col1, col2, col3) values
and you are done.
Just don't forget to delete the last comma (and optionally exchange it for a semicolon) when you copy the data over from Excel.
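If the rows can be exported from the spreadsheet, the same statement can also be assembled in a short script. The following is only a minimal sketch in Python: the function names `sql_quote` and `build_insert` are made up for illustration, and the two sample rows are the complete rows from the question. It doubles embedded single quotes, which the spreadsheet formula above does not do.

```python
def sql_quote(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_insert(table: str, columns: list[str], rows: list[tuple[str, ...]]) -> str:
    """Build one multi-row INSERT statement from spreadsheet-like rows."""
    header = f"INSERT INTO {table} ({', '.join(columns)}) VALUES"
    tuples = ["(" + ", ".join(sql_quote(v) for v in row) + ")" for row in rows]
    # Joining with ",\n" avoids the trailing comma that otherwise
    # has to be deleted by hand.
    return header + "\n" + ",\n".join(tuples) + ";"


# Sample rows taken from the question (hypothetical export of columns A to C).
rows = [("青", "あお", "blue"), ("青い", "あおい", "blue")]
statement = build_insert("JLPT", ["col1", "col2", "col3"], rows)
print(statement)
```

For anything beyond a one-off import, parameterized queries through a database driver are usually the safer route than building SQL strings by hand.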

Related

how to delete double quotes from dataframe column?

I have a dataframe with one column that I want to split. A sample row looks as follows:
"Parameter:'river':Chainage"
Now if I split this row
test_line.split(':')
I receive:
['Parameter', "'river'", 'Chainage']
However, I don't want river to keep its inner single quotes ("'river'"); I want a plain string, hence:
['Parameter', 'river', 'Chainage']
How do I get there?
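One possible approach, sketched here for a single string rather than a whole dataframe, is to strip the single quotes from each part after splitting. `test_line` is the variable from the question; the name `parts` is only illustrative.

```python
test_line = "Parameter:'river':Chainage"

# Split on the colon, then remove leading/trailing single quotes from each part.
parts = [part.strip("'") for part in test_line.split(":")]
print(parts)  # ['Parameter', 'river', 'Chainage']
```

Note that `strip("'")` only removes quotes at the ends of each part; a quote in the middle of a value would be kept. For a pandas column, the same idea would presumably go through the `.str` accessor (for example `Series.str.replace` followed by `Series.str.split`), but that depends on how the column is stored.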

I want to create a computed column based off a substring of another column in SQL

I have a column called TAG_
Data in the TAG_ column could look like the below:
STV-123456
TV-12456
ME-666666
I want to create two computed columns
One that shows the first part of TAG_ before the hyphen
STV
TV
ME
One that shows the second part of TAG_ after the hyphen
123456
12456
666666
This shouldn't be hard but the light bulb is not on yet. Please help.
Try this:
SELECT SUBSTRING(TAG_, 0, CHARINDEX('-', TAG_, 0)) AS before,
       SUBSTRING(TAG_, CHARINDEX('-', TAG_, 0) + 1, LEN(TAG_)) AS after
FROM testtable
Hope this helps!
Example for MySQL, syntax is likely different for other vendors:
create table t
( tag_ text not null
, fst text generated always as (substr(tag_, 1, locate('-', tag_)-1)) stored
, snd text generated always as (substr(tag_, locate('-', tag_)+1)) stored
);
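To check the expected values outside the database, the same split at the first hyphen can be sketched in Python. The helper name `split_tag` is hypothetical, and the tags are the sample values from the question.

```python
def split_tag(tag: str) -> tuple[str, str]:
    """Split a tag such as 'STV-123456' at its first hyphen."""
    # partition returns (text before, separator, text after).
    before, _sep, after = tag.partition("-")
    return before, after


tags = ["STV-123456", "TV-12456", "ME-666666"]
pairs = [split_tag(t) for t in tags]
print(pairs)
```

One difference worth keeping in mind: when a tag has no hyphen, `partition` returns the whole tag as the first part and an empty second part, which is not necessarily what the SQL expressions above produce for such rows.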

In Bigquery, How can I convert Struct of Struct of String to Columns

In the table there are 3 columns, as shown in the image. The third one is a RECORD (STRUCT) containing two structs, old and new. Inside those structs there are columns and values.
I can access each final column like this: change.old.name. But I want to convert them into normal columns and create another table from them.
I tried UNNEST, but it doesn't work because it's not an array.
(Image: data structure)
UPDATE:
Finally got it sorted. Select every nested field explicitly and give each one the alias you want, for example by replacing the dots with underscores. Then create a table from that.
create table abc
as
select
ID
,Created_on
,Change.old.add as Change_old_add
,Change.old.name as Change_old_name
,Change.old.count_people as Change_old_count_people
,Change.new.add as Change_new_add
,Change.new.name as Change_new_name
,Change.new.count_people as Change_new_count_people
FROM `project.Table`
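When the struct has many fields, typing every alias by hand gets tedious. As a rough sketch, the statement could be generated from a list of field paths; the function `flatten_select` is hypothetical, and the field paths are the ones from the query above. It only builds a string and does not talk to BigQuery.

```python
def flatten_select(table: str, plain: list[str], nested: list[str], new_table: str) -> str:
    """Build a CREATE TABLE ... AS SELECT that aliases dotted paths with underscores."""
    select_items = list(plain)
    for path in nested:
        # Change.old.add -> Change_old_add
        alias = path.replace(".", "_")
        select_items.append(f"{path} AS {alias}")
    body = ",\n  ".join(select_items)
    return f"CREATE TABLE {new_table} AS\nSELECT\n  {body}\nFROM `{table}`"


nested_fields = [
    f"Change.{version}.{field}"
    for version in ("old", "new")
    for field in ("add", "name", "count_people")
]
sql = flatten_select("project.Table", ["ID", "Created_on"], nested_fields, "abc")
print(sql)
```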

How to Flatten a semicolon Array properly in Azure Data Factory?

Context: I have a data flow that extracts data from a SQL DB. The data arrives as a single column containing a tab-separated string. To manipulate the data properly, I tried to separate it into individual columns:
First, to rebuild the table, I used a 'Derived Column' activity to replace the tabs with semicolons (1)
dropLeft(regexReplace(regexReplace(regexReplace(descripcion,[\t],';'),[\n],';'),[\r],';'),1)
After that, I use the 'split()' function to get an array and build the columns (2)
split(descripcion, ';')
Problem: When I try to use the 'Flatten' activity (as described at https://learn.microsoft.com/en-us/azure/data-factory/data-flow-flatten), it does not work. The data flow returns just one column, or, if I add an additional column in the 'Flatten' activity, I get another column with the same data as the first one.
Expected output:
column2   column1                              column3
2000017   ENVASE CORONA CLARA 24/355 ML GRAB   PC13
2004297   ENVASE V FAM GRAB 12/940 ML USADO    PC15
Could you tell me what I'm doing wrong? Thanks in advance.
You can use the Derived Column activity itself; try the following.
After the first derived column, what you have is a semicolon-delimited string, which can be split again using the Derived Column schema modifier.
Here firstc represents the source column, equivalent to your column descripcion:
Column1: split(firstc, ';')[1]
Column2: split(firstc, ';')[2]
Column3: split(firstc, ';')[3]
Optionally, you can select only the columns you need to write to the SQL sink.
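The logic of the two derived columns can be tried out locally before wiring it into the data flow. The sketch below mirrors it in Python and is not Data Factory code. The sample string is an assumption: it is assembled from the first row of the expected output, with a leading tab assumed because the original expression drops the first character. Python lists are 0-indexed, whereas the `[1]`, `[2]`, `[3]` above are 1-based.

```python
import re


def split_fields(descripcion: str) -> list[str]:
    """Mirror the derived-column expressions: normalise separators, then split."""
    # Replace each tab, newline, or carriage return with a semicolon.
    cleaned = re.sub(r"[\t\n\r]", ";", descripcion)
    # Mirror dropLeft(..., 1): discard the first character.
    cleaned = cleaned[1:]
    return cleaned.split(";")


# Hypothetical input row; the leading tab is an assumption.
sample = "\t2000017\tENVASE CORONA CLARA 24/355 ML GRAB\tPC13"
fields = split_fields(sample)
first, second, third = fields[0], fields[1], fields[2]
print(first, second, third, sep=" | ")
```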

Selecting a column not in cube in Spark

I have a dataframe which has say 3 columns x,y and z.
I want to get all the three columns in result but I do not want to cube on column z.
Is there a way I can do it?
P.S. I have given an example with only 3 columns, but I have quite a long list of columns, so GROUPING SETS is not an option.
Example -
val df = Seq(("1","x","a"),("1","v","b"),("3","x","c")).toDF("col1","col2","col3")
val list = Seq("col1","col2").map(e=>col(e))
// Now I want to select col3 without cubing it (I do not want the combinations for it).
// The line below does not return col3 at all, because col3 is not part of the cube; returning it is what I want to achieve.
display(df.select($"col1",$"col2",$"col3").cube(list:_*).agg(sum("col1")))
Cube is an extension of groupBy that returns the aggregated result for every combination of the columns used for grouping.
Here is an example of what you can achieve using cube:
df.cube($"col1",$"col2").agg(first($"col3").as("col3")).show
Please share your expected result as suggested by Shaido.
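To make the cube semantics concrete, here is a small emulation in plain Python. It is only a sketch and does not use Spark: the function `cube_first` is hypothetical, the rows are the ones from the question, and `None` stands in for the nulls that Spark shows for columns left out of a grouping. It groups by every subset of the grouping columns and keeps the first col3 value seen for each group.

```python
from itertools import combinations

rows = [
    {"col1": "1", "col2": "x", "col3": "a"},
    {"col1": "1", "col2": "v", "col3": "b"},
    {"col1": "3", "col2": "x", "col3": "c"},
]


def cube_first(rows, group_cols, value_col):
    """Emulate cube(group_cols).agg(first(value_col)) on a list of dicts."""
    result = {}
    # A cube groups by every subset of the grouping columns,
    # including the empty subset (the grand total).
    for size in range(len(group_cols), -1, -1):
        for subset in combinations(group_cols, size):
            for row in rows:
                # Columns left out of the subset are reported as None.
                key = tuple(row[c] if c in subset else None for c in group_cols)
                # Keep only the first value seen for each group.
                result.setdefault(key, row[value_col])
    return result


cubed = cube_first(rows, ["col1", "col2"], "col3")
for key, value in cubed.items():
    print(key, value)
```

In this emulation "first" follows the input order. In Spark, the value returned by `first` after a shuffle is generally not guaranteed to be the first row of the original data, so the col3 values in a real run may differ.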
