Write set of sql files to same table - python-3.x

I have a set of *.sql files (more than 2,000), each containing a table creation script. My goal is to write the contents of all the sql files into one table. I want to accomplish this by creating the table first, then reading the files and filtering each one so that only the INSERT INTO lines remain.
I currently read the files as follows:
with open(file) as f:
    sql = f.read()
This returns a string which holds the table creation statement as well as the INSERT statements. Is there any simple string-filtering way to return only the lines containing the INSERT statements?
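A minimal sketch of one way to do that filtering, assuming each INSERT statement sits on its own line and that the *.sql files live in the working directory (adjust the glob pattern otherwise):

import glob

insert_lines = []
for path in glob.glob("*.sql"):
    with open(path) as f:
        for line in f:
            # keep only lines that start an INSERT statement
            if line.lstrip().upper().startswith("INSERT INTO"):
                insert_lines.append(line.rstrip("\n"))

The same check works on a string you have already read in full, e.g. [l for l in sql.splitlines() if l.lstrip().upper().startswith("INSERT INTO")].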

Related

Excel Power Query - Appending Tables With Fields From Main Table Only

Appending Tables in Power Query from same workbook
I've been trying to append two tables while keeping only the fields from the main table (dropping the secondary-table fields that do not exist in the main table), by appending queries within the same workbook.
The only approach I can think of is appending queries using sources outside of the Excel workbook: combine files from a folder (using the main table as a sample file) and then append any additional queries to that. This alternative is cumbersome and requires additional files.
However, I need to append the queries internally, within the same workbook, without saving separate files/workbooks to a folder.
Try
=Table.Combine({MainTable, Table.RemoveColumns(SecondaryTable,List.Difference(Table.ColumnNames(SecondaryTable),Table.ColumnNames(MainTable)))})
like in
let
    Source = Excel.CurrentWorkbook(){[Name="MainTableSource"]}[Content],
    z = Table.Combine({Source, Table.RemoveColumns(SecondaryTable, List.Difference(Table.ColumnNames(SecondaryTable), Table.ColumnNames(Source)))})
in
    z

Is it possible to create a view from external data?

I have some csv files in my data lake which are being quite frequently updated through another process. Ideally I would like to be able to query these files through spark-sql, without having to run an equally frequent batch process to load all the new files into a spark table.
Looking at the documentation, I'm unsure as all the examples show views that query existing tables or other views, rather than loose files stored in a data lake.
You can do something like this if your csv is in S3 under the location s3://bucket/folder:
spark.sql(
"""
CREATE TABLE test2
(a string, b string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
LOCATION 's3://bucket/folder'
"""
)
You will have to adapt the field names and the field separator, though.
To test it, you can first run:
Seq(("1","a"), ("2","b"), ("3","a"), ("4","b")).toDF("num", "char").repartition(1).write.mode("overwrite").csv("s3://bucket/folder")

Add column to CSV File from another CSV File (Azure Data Factory)

For example:
Persons.csv
name, last_name
-----------------------
jack, jack_lastName
luc, luc_lastname
FileExample.csv
id
243
123
Result:
name, last_name, exampleId
-------------------------------
jack, jack_lastName, 243
luc, luc_lastname, 123
I want to add any number of columns from another data source and insert the final result into a file or a database table.
I have been trying many ways but can't get it to work.
You can try to make use of the merge-files behaviour of an Azure Data Factory pipeline to merge the two csv files.
Select a Copy Data activity and, in the source settings, use the wildcard entry *.csv to pick up the csv files in storage (configure the linked storage service for ADF as part of this process).
Then create an output csv in the same container if required, as in my case, to hold the merged files, naming it something like examplemerge.csv.
Check the option to treat the first row as a header.
Validate and debug the pipeline.
You should then be able to see the merged data in the resulting merged file in the output folder.
You can check the video Merge Multiple CSV files to single CSV for more details, and also the video Load Multiple CSV Files to a Table in Azure Data Factory if required.
But if you want to join the files, there must be some common column to join on.
Also check this Q&A thread: Azure Data Factory merge 2 csv files with different schema.

Sorting records in a CSV file

How can I sort the records in a CSV file by Id using TypeScript + Node.js? A file can contain up to 1 million records.
Here's a conceptual solution:
1. Create a new SQLite db with a table having the appropriate schema for your columns.
2. Stream the data from the source CSV file, reading one line at a time; parse the line and insert its data into the db table from step 1.
3. Create the output CSV file and append the header line.
4. Iterate over the db table entries in the desired sort order, one at a time; convert each entry back into a CSV line in the correct column order, and append it to the CSV file from step 3.
5. Cleanup: (optionally validate your new CSV file, and then) delete the SQLite db.
If you can fit the entire parsed CSV data in memory at once, you can push each line into an array instead of using a db, then simply sort the array in place and iterate over its elements.
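A rough sketch of the db-backed approach (the question asks for TypeScript/Node, but the flow is identical; the file names and the assumption that the first column is the Id are mine):

import csv
import sqlite3

conn = sqlite3.connect("sort_scratch.db")                 # throwaway scratch database
conn.execute("CREATE TABLE rows (id INTEGER, line TEXT)")

with open("input.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    for row in reader:                                    # stream one record at a time
        conn.execute("INSERT INTO rows VALUES (?, ?)", (int(row[0]), ",".join(row)))
conn.commit()

with open("sorted.csv", "w", newline="") as dst:
    dst.write(",".join(header) + "\n")                    # header line first
    for _, line in conn.execute("SELECT id, line FROM rows ORDER BY id"):
        dst.write(line + "\n")

conn.close()                                              # then delete sort_scratch.db

Note that the naive ",".join assumes fields contain no embedded commas or quotes; a real implementation would store the individual fields and re-emit them with a proper CSV writer.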

Copying Data into Cassandra table

Can we import/copy multiple files into a Cassandra table when the files and the table have the same column names?
COPY table1(timestamp ,temp ,total_load ,designl) FROM 'file1', 'file2' WITH HEADER = 'true';
I tried the above syntax, but it says
Improper COPY command.
What I mean is: suppose we have hundreds of delimited files with the same columns, and I want to load all of them into a single Cassandra table using a single CQL query.
Is this possible?
When I tried running a separate COPY command for each file into the table, the data was overwritten.
Please help me!
You can specify multiple files with the following syntax:
COPY table1("timestamp", temp, total_load, designl) FROM 'file1, file2' WITH HEADER = 'true';
or you can also use wildcards:
COPY table1("timestamp", temp, total_load, designl) FROM 'folder/*.csv' WITH HEADER = 'true';
Two remarks however:
Timestamp is a type name in Cassandra; if your column has this name, you need to quote it, as I did in the examples above.
If your data is overwritten when executing several copy commands, then it will be overwritten even if you execute a single copy command. If you have several lines for the same PRIMARY KEY, then only the last row will win.
