Databricks Magic SQL - Export Data

Is it possible to export the output of a "magic SQL" command cell in Databricks?
I like the fact that one doesn't have to escape the SQL command and that it can be easily formatted. But I can't seem to use the output in other cells. What I would like to do is export the data to a CSV file, but potentially finish some final manipulation of the dataframe before I write it out.
sql = "select * from calendar"
df = sqlContext.sql(sql)
display(df.limit(10))
vs. (Databricks formatted the following code):
%sql
select
*
from
calendar
But imagine what this looks like once you bring in escaped strings, nested joins, etc. I'm wondering if there is a better way to work with SQL in Databricks.

The simplest solution is the most obvious one that I didn't think of: create a view!
%sql
CREATE OR REPLACE TEMPORARY VIEW vwCalendar as
/*
Comments to make your future self happy!
*/
select
c.line1, -- more comments
c.line2, -- more comments
c.zipcode
from
calendar c
where
c.status <> 'just an example\'s' -- <<imagine escaping this
Now you can use the view vwCalendar in subsequent SQL cells just like any other table.
And if you want to use it in a Python cell:
df = spark.table("vwCalendar")
display(df.limit(3))
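And if the end goal is the CSV export from the original question, you can keep working on that DataFrame in Python and write it out at the end. A minimal sketch, assuming the view above exists; the derived column and output path are just placeholders:
%python
from pyspark.sql import functions as F

df = spark.table("vwCalendar")
# any final manipulation, e.g. a derived column (zip_prefix is only an illustration)
df_out = df.withColumn("zip_prefix", F.substring("zipcode", 1, 3))
# write a single CSV file with a header row; /tmp/calendar_export is a placeholder path
df_out.coalesce(1).write.mode("overwrite").option("header", True).csv("/tmp/calendar_export")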
https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-view.html
https://docs.databricks.com/spark/latest/spark-sql/udf-python.html#user-defined-functions---python

Related

Spark SQL - declaring and using variables in SQl Notebook

In Azure Databricks I created a SQL notebook. I am trying to declare variables and use them across multiple SQL statements, e.g. declare a fiscal year and use it in WHERE clauses. The intent is to avoid hardcoding.
It looks like I have to use Python / Scala. Is there any way to achieve this using pure SQL statements?
e.g.:
var #fiscalYear = 2018;
select * from table where fiscalyear = #fiscalyear
Check this link out:
https://forums.databricks.com/questions/176/how-do-i-pass-argumentsvariables-to-notebooks.html
Another way is to set the variable value with a widget:
%python
dbutils.widgets.text("var", "text")  # creates a text widget named "var" with the default value "text"
dbutils.widgets.remove("var")  # removes the widget again once you are done with it
Then you can go:
%sql
select * from table where value = '$var'
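To tie this back to the fiscal-year example, one possible pattern in a Python cell is the following sketch (the widget name, table name, and column name are assumptions, not anything from the original answer):
%python
# create the widget once, with a default value; it appears at the top of the notebook
dbutils.widgets.text("fiscalYear", "2018")
fiscal_year = dbutils.widgets.get("fiscalYear")
# use the value in a query; some_table and fiscalyear are placeholders
display(spark.sql(f"select * from some_table where fiscalyear = {fiscal_year}"))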

how to transfer data from a single csv/excel file to multiple oracle tables

I have an Excel file with 15 columns and thousands of records. I want to load that data into 3 different Oracle tables.
How can I do this?
Should I convert the file to CSV first?
Also, there is one more complication to the task: I need to perform some validations before inserting into the tables, e.g.
I have a column A in Excel; I want to use column A to derive a value B from an Oracle table, say 'tab', and then store B in table 'tab'.
Generally it is easier to work with raw data using SQL, so the first step is to get the raw data queryable in the easiest fashion.
The neatest solution is to use an external table. Convert the Excel spreadsheet into a CSV file then define an external table to query the file. Then you can use ...
INSERT INTO << table1 >> (...)
SELECT what_ever FROM << external_table >>
... or even ...
INSERT ALL
INTO << table1 >> (...)
INTO << table2 >> (...)
INTO << table3 >> (...)
SELECT * FROM << external_table >>
... depending on what rules you need to apply.
If your organization already uses external tables this should be easy to configure. However, some places are funny about allowing the database to interact with OS files, so you may not be able to use this approach; find out before you commit to it.
Alternatively you can build a staging table which matches the CSV file, and load the data into that using SQL*Loader. SQL*Loader is a client-side tool, so it requires fewer permissions to use.
If you don't want to build any new structures at all you can edit the CSV file to form a set of SELECT statements from DUAL (use UNION ALL) and insert into the target tables from that. Mastery of regex can really help with tasks like this.
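If the Excel-to-CSV conversion itself needs to be scripted rather than done by hand, a short pandas script is one option. This is only a sketch: the file names and sheet name are placeholders, and it assumes pandas plus an Excel engine such as openpyxl is installed.
import pandas as pd

# read the first worksheet of the workbook; adjust sheet_name if the data lives elsewhere
df = pd.read_excel("source_workbook.xlsx", sheet_name=0)
# write a plain CSV that SQL*Loader or an external table definition can consume
df.to_csv("source_data.csv", index=False)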
If it is a routine activity:
Use SQL*Loader (sqlldr) to load the raw data into a staging table.
Write a stored procedure with the business logic needed to transform the data from the staging table into the main tables.

Access TransferSpreadsheet Excel - Prevent Duplicates?

I am working on an application where there is a desire to automate data entry as much as possible. The wish is to add a button to such entry forms for choosing an Excel file to import. I have done this for one interface, and now I'm working on others. I'm looking for the best way to prevent duplicates from being imported into a table. For the one I am working on now, it is a simple 2-column import. One method I have used before is to import the spreadsheet into a temp table. Then I can utilize a query to insert where <>. I just wonder if this is the best method to use.
Any thoughts?
Thanks!
Something like this should work. I can tailor it more if you list some more details of your projects.
From "External Data" on the ribbon, link to the excel file.
Then write the following query:
INSERT INTO table1
(
field1,
field2
)
SELECT
a.field1,
a.field2
FROM tableExcel AS a
LEFT JOIN table1 AS b ON a.field1 = b.field1
WHERE (((b.field1) Is Null));
Then just attach a macro to the button running the query above.
I ended up finding the solution that will work best. I can put an index on the 2 fields that are getting imported from the spreadsheet into the table. Then, before I issue the TransferSpreadsheet command, I will set warnings to False, and set them back to True once it is done. This way, the user won't get errors from the indexes doing their job of rejecting duplicates.
Does anyone see any problem with that solution? The only bummer is that if I imported into a temp table, I could get a count of the items first and verify the count after the insert, so I could report some info to the user in the process. Other than that, this means I don't need a temp table, and I can go directly into the goal table without worrying about importing dupes.

Create a Volatile table in teradata

I have a SharePoint list which I have linked to in MS Access.
The information in this table needs to be compared to information in our data warehouse based on keys both sets of data have.
I want to be able to create a query which will upload the ishare data into our data warehouse under my login, run the comparison, and then export the details to Excel somewhere. MS Access seems to be the way to go here.
I have managed to link the ishare list (with difficulties due to the attachment fields) and then create a local table based on it.
I have managed to create the temp table in my Volatile space.
How do I append the newly created table that I created from the list into my temporary space?
I am using Access 2010 and SharePoint 2007.
Thank you for your time
If you can avoid using Access I'd recommend it since it is an extra step for what you are trying to do. You can easily manipulate or mesh data within the Teradata session and export results.
You can run the following types of queries using the standard Teradata SQL Assistant:
CREATE VOLATILE TABLE NewTable (
column1 DEC(18,0),
column2 DEC(18,0)
)
PRIMARY INDEX (column1)
ON COMMIT PRESERVE ROWS;
Change your assistant to Import Mode (File -> Import Data), then run:
INSERT INTO NewTable VALUES (?,?)
Browse for your file. This example would be a comma-delimited file with two numeric columns, with column one being the index.
You can now query or join this table to any information in the uploaded database.
When you are finished you can drop with:
DROP TABLE NewTable
You can export results using File->Export Data as well.
If this is something you plan on running frequently there are many ways to easily do these type of imports and exports. The Python module Pandas has simple functionality for reading a query directly into DataFrame objects and dropping those objects into Excel through the pandas.io.sql.read_frame() and .to_excel functions.
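In current pandas versions the equivalent calls are pandas.read_sql and DataFrame.to_excel. Here is a rough sketch assuming a DB-API connection via the teradatasql driver; the host, credentials, table name, and file name are placeholders, and writing to Excel needs an engine such as openpyxl installed:
import pandas as pd
import teradatasql  # assumed driver; any DB-API connection object can be passed to pandas.read_sql

# connection details are placeholders
conn = teradatasql.connect(host="tdhost", user="me", password="secret")
df = pd.read_sql("SELECT * FROM NewTable", conn)
conn.close()

df.to_excel("newtable_export.xlsx", index=False)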

how to work with csv files in vim

I want a CSV file to open in Vim the same way it opens in Microsoft Office: the data should be in column format, the commas should not be visible, and it should be easy to traverse. Is this possible in Vim with the help of any plug-ins?
I am probably a little bit late in answering this question, but for completeness I'll answer anyway. I have made the csv plugin that should make it possible to do what you want.
Among others, it allows:
Display on which column the cursor is as well as number of columns
Search for text within a column using :SearchInColumn command
Highlight the column on which the cursor is using :HiColumn command
Visually arrange all columns using :ArrangeColumn command
Delete a Column using :DeleteColumn command
Display a vertical or horizontal header line using :Header or :VHeader command
Sort a Column using :Sort command
Copy Column to register using :Column command
Move a column behind another column using :MoveCol command
Calculate the Sum of all values within a column using :SumCol command (you can also define your own custom aggregate functions)
Move through the columns using the normal mode commands (W forwards, H backwards, K upwards, J downwards)
Sets up nice syntax highlighting, concealing the delimiter if your Vim supports it
I've tried Christian's csv plugin, and it is useful for quick looks at csv files, especially when you need to look at many different files.
However, when I'm going to be looking at the same csv file more than a few times, I import the file into sqlite3, which makes further analysis much faster and easier to perform.
For instance, if my file looks like this:
file.csv:
field1name, field2name, field3name
field1data, field2data, field3data
field1data, field2data, field3data
I create a new sqlite db (from the command line):
commandprompt> sqlite3 mynew.db
Then create a table in the db to import the file into:
sqlite> create table mytable (field1name, field2name, field3name);
sqlite> .mode csv
sqlite> .headers ON
sqlite> .separator ,
sqlite> .import file.csv mytable
Now the new table 'mytable' contains the data from the file, but the first row is storing the header, which you don't typically want, so you need to delete it (use single quotes around the field value; if you use double quotes you'll delete all rows):
sqlite> delete from mytable where field1name = 'field1name';
Now you can easily look at the data, filter by complex formulas, sort by multiple fields, etc.
sqlite> select * from mytable limit 30;
(Sorry this turned into a sqlite tutorial but it seems like every time that I don't import into sqlite, I end up spending much more time using vim/less/grep/sort/cut than I would have had I just imported in the first place).
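If you would rather script that import than type it at the sqlite prompt, a small Python version of the same steps could look like this (a sketch; the file, database, and column names match the example above):
import csv
import sqlite3

conn = sqlite3.connect("mynew.db")
conn.execute("CREATE TABLE IF NOT EXISTS mytable (field1name, field2name, field3name)")

with open("file.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row instead of deleting it afterwards
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", reader)

conn.commit()
conn.close()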
There is also the rainbow_csv vim plugin. It will highlight csv/tsv file columns in different "rainbow" colors and allows you to write SQL-like SELECT and UPDATE queries using Python or JavaScript expressions.
You probably want to look at sc as an alternative. Have a look at this Linux Journal page.
Here are some tips for working with CSV files in vim:
http://vim.wikia.com/wiki/Working_with_CSV_files
I'm not sure if there's a way to display it in columns without commas, though the tips in that link allow vim to traverse and manipulate CSV very easily.
I use #chrisbra's plugin,
" depending on your package manager
dein#add('chrisbra/csv.vim')
and I add a quick command that runs when a CSV buffer is loaded;
this could be risky on large files.
autocmd BufRead *.csv :%ArrangeColumn
