I have an Excel file with about 20 columns of data that I need to upload into 4 SQL Server tables. The tables are related, and specific columns represent the ID for each table.
Is there an ETL tool that I can use to automate this process?
This query uses BULK INSERT to load the file into a #temptable
and then inserts the contents of that temp table into the target table in the database. Note that the file being imported must be a .csv; you can just save your Excel file as CSV before doing this.
CREATE TABLE #temptable (col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255)) -- placeholder column names/types; adjust to your file
BULK INSERT #temptable FROM 'C:\yourfilelocation\yourfile.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '0x0A'
)
INSERT INTO yourTableInDataBase (col1, col2, col3)
SELECT col1, col2, col3
FROM #temptable
To automate this, you can put the query inside a stored procedure and call the stored procedure from a batch file. Edit the code below, put it inside a text file, and save it as a .cmd file:
set MYDB=yourDBname
set MYUSER=youruser
set MYPASSWORD=yourpassword
set MYSERVER=yourservername
sqlcmd -S %MYSERVER% -d %MYDB% -U %MYUSER% -P %MYPASSWORD% -h -1 -s "," -W -Q "exec yourstoredprocedure"
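For reference, here is a minimal sketch of such a stored procedure, reusing the placeholder table, column, and file names from above (the procedure name usp_ImportCsv is just an example):
CREATE PROCEDURE dbo.usp_ImportCsv
AS
BEGIN
    -- staging table; placeholder column types, adjust to your data
    CREATE TABLE #temptable (col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255));

    BULK INSERT #temptable FROM 'C:\yourfilelocation\yourfile.csv'
    WITH
    (
        FIRSTROW = 2,
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '0x0A'
    );

    -- move the staged rows into the real table
    INSERT INTO yourTableInDataBase (col1, col2, col3)
    SELECT col1, col2, col3
    FROM #temptable;

    DROP TABLE #temptable;
END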
I am trying to extract the DDL of tables and store it in .sql files using pandas.
The code I have tried is:
query = "show table tablename"
df = pd.read_sql(query, connect)
df.to_csv('xyz.sql', index=False, header=False, quoting=None)
This creates a .sql file with the DDL like this -
" CREATE TABLE .....
.... ; "
How do I write the file without the quotes, like -
CREATE TABLE .....
.... ;
Given a string s, such as "CREATE ...", one can delete double-quote characters with:
s = s.replace('"', '')
And don't forget maketrans, which (with translate) is very good at efficiently deleting unwanted characters from very long strings.
I have created a new table from a CSV file with the following code:
%sql
SET spark.databricks.delta.schema.autoMerge.enabled = true;
create table if not exists catlog.schema.tablename;
COPY INTO catlog.schema.tablename
FROM (SELECT * FROM 's3://bucket/test.csv')
FILEFORMAT = CSV
FORMAT_OPTIONS ('mergeSchema' = 'true', 'header' = 'true')
But now I have a new file with additional data. How can I load that? Please guide.
Thanks.
I need to load the new data file into the Delta table.
I tried to reproduce the same in my environment and got the results below.
Make sure to check that the schema and the data types in the .csv file match; otherwise you will get an error.
Please follow the syntax below to insert data from a CSV file:
%sql
copy into <catalog>.<schema>.<table_name>
from "<file_loaction>/file_3.csv"
FILEFORMAT = csv
FORMAT_OPTIONS('header'='true','inferSchema'='True');
For example:
1) The file has these columns:
ID|Name|job|hobby|salary|hobby2
2) Data:
1|ram|architect|tennis|20000|cricket
1|ram|architect|football|20000|gardening
2|krish|teacher|painting|25000|cooking
3) Table:
Columns in table: ID-Name-Job-Hobby-Salary
Is it possible to load data into the table as below?
1-ram-architect-tenniscricketfootballgardening-20000
2-krish-teacher-paintingcooking-25000
Command: db2 "Load CLIENT FROM ABC.FILE of DEL MODIFIED BY coldel0x7x keepblanks REPLACE INTO tablename(ID,Name,Job,Hobby,salary) nonrecoverable"
You cannot achieve what you think you want in a single action with either LOAD CLIENT or IMPORT.
You are asking to denormalize, and I presume you understand the consequences.
Regardless, you can use a multi-step approach: first load/import into a temporary table, then in a second step use SQL to denormalize into the final table (as sketched below), before discarding the temporary table.
Or, if you are adept with awk and the data file is correctly sorted, you can pre-process the file outside the database before the load/import.
Or use an ETL tool.
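Here is a minimal sketch of that multi-step approach; the staging table staging_tab, its column types, and the use of LISTAGG for the concatenation are assumptions to adapt to your real schema:
-- Step 1: create a staging table matching the file layout and load the file into it
CREATE TABLE staging_tab (ID INT, Name VARCHAR(8), Job VARCHAR(20), Hobby VARCHAR(20), Salary INT, Hobby2 VARCHAR(20));
-- db2 "LOAD CLIENT FROM ABC.FILE OF DEL MODIFIED BY coldel0x7C keepblanks REPLACE INTO staging_tab NONRECOVERABLE"

-- Step 2: denormalize into the final table, concatenating all hobby values per ID
INSERT INTO tablename (ID, Name, Job, Hobby, Salary)
SELECT ID,
       MIN(Name),
       MIN(Job),
       LISTAGG(Hobby CONCAT Hobby2) WITHIN GROUP (ORDER BY Hobby),
       MIN(Salary)
FROM staging_tab
GROUP BY ID;

-- Step 3: discard the staging table
DROP TABLE staging_tab;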
You may use the INGEST command instead of LOAD.
You must first create the corresponding infrastructure for this command, for example:
CALL SYSINSTALLOBJECTS('INGEST', 'C', 'USERSPACE1', NULL);
Load your file afterwards with the following command:
INGEST FROM FILE ABC.FILE
FORMAT DELIMITED by '|'
(
$id INTEGER EXTERNAL
, $name CHAR(8)
, $job CHAR(20)
, $hobby CHAR(20)
, $salary INTEGER EXTERNAL
, $hobby2 CHAR(20)
)
MERGE INTO tablename
ON ID = $id
WHEN MATCHED THEN
UPDATE SET hobby = hobby CONCAT $hobby CONCAT $hobby2
WHEN NOT MATCHED THEN
INSERT (ID, NAME, JOB, HOBBY, SALARY) VALUES($id, $name, $job, $hobby CONCAT $hobby2, $salary);
I am working on copying data from a source Oracle database to a target SQL data warehouse using Data Factory.
When using the copy function in Data Factory, we are asked to specify the destination location and a table to copy the data to. There are multiple tables that need to be copied, so creating a table for each one in the destination is time-consuming.
How can I set up Data Factory to copy data from the source to a destination where it automatically creates the table at the destination, without having to create each one manually?
TIA
I came across the same issue last year. I used pipeline.parameters() for dynamic naming, plus a Data Factory stored procedure activity before the copy activity to first create the empty table from a template: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure.
CREATE PROCEDURE create_sql_table_proc @WindowStartYear NVARCHAR(30), @WindowStartMonth NVARCHAR(30), @WindowStartDay NVARCHAR(30)
AS
BEGIN
declare @strsqlcreatetable as [NVARCHAR](255)
declare @strsqldroptable as [NVARCHAR](255)
declare @tablename as [NVARCHAR](255)
declare @strsqlsetpk as [NVARCHAR](255)
-- build a dated table name from the pipeline parameters
select @tablename = 'TABLE_NAME_' + @WindowStartYear + @WindowStartMonth + @WindowStartDay
select @strsqldroptable = 'DROP TABLE IF EXISTS ' + @tablename
-- create the new table from the OUTPUT_TEMPLATE table
select @strsqlcreatetable = 'SELECT * INTO ' + @tablename + ' FROM OUTPUT_TEMPLATE'
select @strsqlsetpk = 'ALTER TABLE ' + @tablename + ' ADD PRIMARY KEY (CustID)'
exec (@strsqldroptable)
exec (@strsqlcreatetable)
exec (@strsqlsetpk)
END
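The stored procedure activity then calls it with the pipeline parameters, which is roughly equivalent to running the following (example values):
EXEC create_sql_table_proc
    @WindowStartYear  = '2021',
    @WindowStartMonth = '01',
    @WindowStartDay   = '15'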
Since then, we have started pushing the table to SQL from our PySpark scripts running on a cluster, where it is not necessary to create the empty table first: https://medium.com/@radek.strnad/tips-for-using-jdbc-in-apache-spark-sql-396ea7b2e3d3.
I'd like to switch an existing system that imports data from CSV files into a PostgreSQL 9.5 database to a more efficient approach.
I'd like to use the COPY statement because of its good performance. The problem is that I need to have one field populated that is not in the CSV file.
Is there a way to have the COPY statement add a static field to all the rows inserted?
The perfect solution would look like this:
COPY data(field1, field2, field3='Account-005')
FROM '/tmp/Account-005.csv'
WITH DELIMITER ',' CSV HEADER;
Do you know a way to have that field populated in every row?
My server is running Node.js, so I'm open to any cost-efficient solution that completes the files using Node before COPYing them.
Use a temp table to import into. This allows you to:
add/remove/update columns
add extra literal data
delete or ignore records (such as duplicates)
before inserting the new records into the actual table.
-- target table
CREATE TABLE data
( id SERIAL PRIMARY KEY
, batch_name varchar NOT NULL
, remote_key varchar NOT NULL
, payload varchar
, UNIQUE (batch_name, remote_key)
-- or::
-- , UNIQUE (remote_key)
);
-- temp table
CREATE TEMP TABLE temp_data
( remote_key varchar -- PRIMARY KEY
, payload varchar
);
COPY temp_data(remote_key,payload)
FROM '/tmp/Account-005'
;
-- The actual insert
-- (you could also filter out or handle duplicates here)
INSERT INTO data(batch_name, remote_key, payload)
SELECT 'Account-005', t.remote_key, t.payload
FROM temp_data t
;
BTW, it is possible to automate the above: put it into a function (or maybe a prepared statement), using the filename/literal as an argument.
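A minimal sketch of such a function, assuming the tables above (the function name import_batch and its parameter names are made up, and COPY from a server-side file needs the appropriate rights):
CREATE OR REPLACE FUNCTION import_batch(p_batch_name text, p_file_path text)
RETURNS void AS $$
BEGIN
    -- (re)create the staging table for this call
    DROP TABLE IF EXISTS temp_data;
    CREATE TEMP TABLE temp_data
    ( remote_key varchar
    , payload varchar
    );

    -- COPY cannot take the filename as a parameter, so build the statement dynamically
    EXECUTE format('COPY temp_data(remote_key, payload) FROM %L', p_file_path);

    -- insert into the target table, adding the literal batch name
    INSERT INTO data(batch_name, remote_key, payload)
    SELECT p_batch_name, t.remote_key, t.payload
    FROM temp_data t;
END;
$$ LANGUAGE plpgsql;

-- usage:
-- SELECT import_batch('Account-005', '/tmp/Account-005');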
Set a default for the column:
alter table data
alter column field3 set default 'Account-005'
Do not mention it in the COPY command:
COPY data(field1, field2) FROM...
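Put together, and assuming the same file as in the question, the sequence might look like this (dropping the default again afterwards is optional):
ALTER TABLE data
    ALTER COLUMN field3 SET DEFAULT 'Account-005';

COPY data(field1, field2)
FROM '/tmp/Account-005.csv'
WITH DELIMITER ',' CSV HEADER;

-- optional: remove the default so it does not apply to later inserts
ALTER TABLE data
    ALTER COLUMN field3 DROP DEFAULT;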