I want to write a script to query data from Hive using ODBC. However, the column I have to filter on happens to have a white space in it. As a result, the query returned only column names but no rows. So, I'd like to know how I can escape the white space in the column name, "item id", so that I get the result I desire.
Here is the example of code I used.
import pyodbc
import pandas as pd
query = "SELECT * FROM tableA " \
"WHERE 'item id' RLIKE 'AB001.*' LIMIT 2 "
with pyodbc.connect("DSN=HIVE_ODBC", autocommit=True) as conn:
df = pd.read_sql(query, conn)
df
Thank you in advance
Try the following:
query = "SELECT * FROM tableA " \
"WHERE `item id` RLIKE 'AB001.*' LIMIT 2 "
e.i., instead single-quote use `.
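Putting it together, a minimal sketch (assuming the same DSN, and that the Hive ODBC driver supports "?" parameter binding; if it doesn't, inline the pattern as in the snippet above):

import pyodbc
import pandas as pd

# Backticks quote the identifier containing a space; the pattern is passed
# as an ordinary ODBC parameter instead of being spliced into the SQL.
query = "SELECT * FROM tableA WHERE `item id` RLIKE ? LIMIT 2"
with pyodbc.connect("DSN=HIVE_ODBC", autocommit=True) as conn:
    df = pd.read_sql(query, conn, params=["AB001.*"])
print(df)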
I'm using psycopg2 to connect to a Postgres DB and to export the data into a CSV file.
This is how I export the DB to CSV:
def export_table_to_csv(self, table, csv_path):
    sql = "COPY (SELECT * FROM %s) TO STDOUT WITH CSV DELIMITER ','" % table
    with open(csv_path, "w") as file:
        self.cur.copy_expert(sql, file)
But the data is just the rows - without the column names.
How can I export the data with the column names?
P.S. I am able to print the column names:
sql = '''SELECT * FROM test'''
self.cur.execute(sql)
column_names = [desc[0] for desc in self.cur.description]
for i in column_names:
    print(i)
I want the cleanest way to export the DB with the column names (i.e., I prefer to do this in one method, and not rename columns in retrospect).
As I said in my comment, you can add HEADER to the WITH clause of your SQL:
sql = "COPY (SELECT * FROM export_test) TO STDOUT WITH CSV HEADER"
By default, comma delimiters are used with the CSV option, so you don't need to specify one.
For future questions, you should submit a minimal reproducible example: that is, code we can directly copy, paste, and run. I was curious whether this would work, so I made one and tried it:
import psycopg2
conn = psycopg2.connect('host=<host> dbname=<dbname> user=<user>')
cur = conn.cursor()
# create test table
cur.execute('DROP TABLE IF EXISTS export_test')
sql = '''CREATE TABLE export_test
(
id integer,
uname text,
fruit1 text,
fruit2 text,
fruit3 text
)'''
cur.execute(sql)
# insert data into table
sql = '''BEGIN;
insert into export_test
(id, uname, fruit1, fruit2, fruit3)
values(1, 'tom jones', 'apple', 'banana', 'pear');
insert into export_test
(id, uname, fruit1, fruit2, fruit3)
values(2, 'billy idol', 'orange', 'cherry', 'strawberry');
COMMIT;'''
cur.execute(sql)
# export to csv
fid = open('export_test.csv', 'w')
sql = "COPY (SELECT * FROM export_test) TO STDOUT WITH CSV HEADER"
cur.copy_expert(sql, fid)
fid.close()
And the resultant file is:
id,uname,fruit1,fruit2,fruit3
1,tom jones,apple,banana,pear
2,billy idol,orange,cherry,strawberry
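If you want to keep the asker's one-method helper, here is a minimal sketch with the HEADER fix applied; psycopg2's sql module is used to quote the table name safely instead of "%" formatting:

from psycopg2 import sql

def export_table_to_csv(self, table, csv_path):
    # HEADER makes COPY emit the column names as the first CSV row;
    # sql.Identifier safely quotes the table name.
    query = sql.SQL("COPY (SELECT * FROM {}) TO STDOUT WITH CSV HEADER").format(
        sql.Identifier(table))
    with open(csv_path, "w") as file:
        self.cur.copy_expert(query, file)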
I am trying to split a full name into three columns (full name, first name, and last name), removing the value in parentheses and the asterisks, using Spark SQL. I tried with regex and split but to no avail. Can anyone help me with a code snippet?
example:
Fullname
Nayaka,Malya (Nayaka)
Toyoda,Masayuki *Toyoda *
Suzy,Thamas *FUNCTIONAL *
Expected output:

Fullname (input)             Firstname   Lastname   Fullname (cleaned)
Nayaka,Malya (Nayaka)        Malya       Nayaka     Nayaka.Malya
Toyoda,Masayuki *Toyoda *    Masayuki    Toyoda     Toyoda,Masayuki
Suzy,Thamas *FUNCTIONAL *    Thamas      Suzy       Suzy,Thamas
This should do the trick:
spark.sql(""" select split ( ltrim(rtrim(regexp_replace(split(fullname, ',')[1], '[^a-zA-Z ]', ''))), ' ')[0] as FIRSTNAME,
ltrim(rtrim(regexp_replace(split(fullname, ',')[0], '[^a-zA-Z ]', ''))) as LASTNAME from employee """).show(false)
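The snippet above is Scala and only yields the first and last names. If the cleaned fullname column is needed as well, here is a hedged PySpark variant; the employee view and sample rows are recreated from the question, and the cleaned fullname is rebuilt as "lastname,firstname":

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample rows taken from the question
spark.createDataFrame(
    [("Nayaka,Malya (Nayaka)",),
     ("Toyoda,Masayuki *Toyoda *",),
     ("Suzy,Thamas *FUNCTIONAL *",)],
    ["fullname"],
).createOrReplaceTempView("employee")

# The inner query strips the parenthesised/starred noise and splits on the
# comma; the outer query rebuilds the cleaned fullname.
spark.sql("""
    SELECT concat(lastname, ',', firstname) AS fullname, firstname, lastname
    FROM (
        SELECT split(trim(regexp_replace(split(fullname, ',')[1], '[^a-zA-Z ]', '')), ' ')[0] AS firstname,
               trim(regexp_replace(split(fullname, ',')[0], '[^a-zA-Z ]', '')) AS lastname
        FROM employee
    ) t
""").show(truncate=False)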
Sorry for this, but I'm really new to SQLite: I've created a database from an Excel sheet I had, and I can't seem to fetch the values of the column I need.
query = """ SELECT GNCR from table"""
cur.execute(query)
This actually works, but
query = """ SELECT ? from table"""
cur.execute(query, my_tuple)
doesn't
Here's my code:
import sqlite3

def print_col(to_print):
    db = sqlite3.connect('my_database.db')
    cur = db.cursor()
    query = " SELECT ? FROM my_table "
    cur.execute(query, to_print)
    results = cur.fetchall()
    print(results)

print_col(('GNCR',))
The result is:
[('GNCR',), ('GNCR',), ('GNCR',), ('GNCR',), [...]]
instead of the actual values
What's the problem? I can't figure it out.
the "?" character in query is used for parameter substitution. Sqlite will escape the parameter you passed and replace "?" with the send text. So in effect you query after parameter substitution will be SELECT 'GNCR' FROM my_table where GNCR will be treated as text so you will get the text for each row returned by you query instead of the value of that column.
Basically you should use the query parameter where you want to substitute the parameter with escaped string like in where clause. You can't use it for column name.
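One common workaround is to validate the requested name against the table's actual columns and then splice it into the SQL yourself. A minimal sketch, reusing the database and table names from the question:

import sqlite3

def print_col(column):
    db = sqlite3.connect('my_database.db')
    cur = db.cursor()
    # "?" only works for values, so the column name must go into the SQL
    # string itself; checking it against the table's real columns first
    # avoids SQL injection.
    valid = [row[1] for row in cur.execute("PRAGMA table_info(my_table)")]
    if column not in valid:
        raise ValueError("unknown column: %s" % column)
    cur.execute('SELECT "%s" FROM my_table' % column)
    print(cur.fetchall())

print_col('GNCR')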
I would like to generate a query that returns results for BeneficiaryID 'ABC123', along with some other inputs if they are also given. For instance, if a currency value is given, I would like to include the currency condition in the JOIN query as well, and likewise for the category. I have the following code snippet in a SOAP UI Groovy script.
query= " CORR.BeneficiaryID LIKE 'ABC123'"
if (currencyValue!=""){
query=query + " and CORR.Currency LIKE '${currencyValue}'"
}
if (CategoryValue!=""){
query=query + " and CORR.Category LIKE '${CategoryValue}'"
}
log.info("Query" + query)
Outputrows = sql.rows("select CORR.Preferred as preferred ,CORR.Category as category,CORR.Currency as currency\
from BENEFICIARY CORR \
JOIN LOCATION LOC on CORR.UID=LOC.UID and ${query}
log.info("Output rows size" + Outputrows.size())
When currency and category are not given, I would like to have the following query run and get me the results.
select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency
from BENEFICIARY CORR
JOIN LOCATION LOC on CORR.UID=LOC.UID and CORR.BeneficiaryID LIKE 'ABC123'
and when the currency and category are given (say USD & Commercial), then the following query:
select CORR.Preferred as preferred, CORR.Category as category, CORR.Currency as currency
from BENEFICIARY CORR
JOIN LOCATION LOC on CORR.UID=LOC.UID and CORR.BeneficiaryID LIKE 'ABC123' and CORR.Currency LIKE 'USD' and CORR.Category LIKE 'Commercial'
All I could see for the result of Outputrows.size() is zero (0).
Can you please point out where I am going wrong?
Thanks.
Here is the changed script.
Since the issue is just building the query, only that part is shown below; the SQL execution part is removed, as that is not really the issue.
//Define the values or remove if you get those value from somewhere else
//Just put them here to demonstrate
//You may also try by empty value to make sure you are getting the right query
def currencyValue = 'USD'
def categoryValue = 'Commercial'
def query = 'select CORR.Preferred as preferred, CORR.Category as category,CORR.Currency as currency from BENEFICIARY CORR JOIN LOCATION LOC on CORR.UID = LOC.UID and CORR.BeneficiaryID LIKE \'ABC123\''
currencyValue ? (query += " and CORR.Currency LIKE '${currencyValue}'") : query
categoryValue ? (query += " and CORR.Category LIKE '${categoryValue}'") : query
log.info "Final query is \n ${query}"
You can then pass query along to wherever you need to run the SQL, say sql.rows(query).
You may quickly try the Demo.
Anyone know of a handy function to search through column_names in Vertica? From the documentation, it seems like \d only queries table_names. I'm looking for something like MySQL's information_schema.columns, but can't find any information about a similar metadata table.
Thanks!
In 5.1, if you have enough permissions, you can do
SELECT * FROM v_catalog.columns;
to access column info; for some things you'll need to join with
v_catalog.tables
The answer may differ depending on the version of Vertica you are using.
In the latest version, 5.1, there is a COLUMNS system table. Just from looking at the online documentation, these seem to be the most useful columns, with their types:
TABLE_SCHEMA VARCHAR
TABLE_NAME VARCHAR
DATA_TYPE VARCHAR
That should give you what you need. If your version doesn't have the system table, let me know what version you're running and I'll see what we can do.
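For example, to mimic the information_schema.columns lookup from MySQL, something like this sketch should work; the vertica_python driver, the placeholder connection details, and the '%order%' pattern are my assumptions, not from the answer:

import vertica_python

# Placeholder connection details
conn_info = {'host': '<host>', 'port': 5433, 'user': '<user>',
             'password': '<password>', 'database': '<dbname>'}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # Search the COLUMNS system table by column-name pattern
    cur.execute("SELECT table_schema, table_name, column_name, data_type "
                "FROM v_catalog.columns WHERE column_name ILIKE '%order%'")
    for row in cur.fetchall():
        print(row)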
Wrap this Python script in a shell function and you'll be able to see all tables that contain any given set of columns:
import argparse

parser = argparse.ArgumentParser(description='Find Vertica attributes in tables')
parser.add_argument('names', metavar='N', type=str, nargs='+', help='attribute names')
args = parser.parse_args()

def vert_attributes(*names):
    # Build a query against v_catalog.columns that self-joins once per
    # extra column name, so only tables containing all the names survive.
    first_name = names[0].lower()
    first = "select root.table_name, root.column_name from v_catalog.columns root "
    last = " where root.column_name like '%s' " % first_name
    names = names[1:]
    if len(names) >= 1:
        return first + " ".join([" inner join (select table_name from v_catalog.columns where column_name like '%s') q%s on root.table_name = q%s.table_name " % (name.lower(), index, index) for index, name in enumerate(names)]) + last
    else:
        return first + last

print(vert_attributes(*tuple(args.names)))