Cast some columns and select all columns without explicitly writing column names - python-3.x

I want to cast some columns and then select all others
id, name, property, description = column("id"), column("name"), column("property"), column("description")
select([cast(id, String).label('id'), cast(property, String).label('property'), name, description]).select_from(events_table)
Is there any way to cast some columns and select all of them without mentioning every column name?
I tried
select([cast(id, String).label('id'), cast(property, String).label('property')], '*').select_from(events_table)
py_.transform(return_obj, lambda acc, element: acc.append(dict(element)), [])
But I get two extra columns (7 in total), the cast ones, and converting the rows to dictionaries then throws a KeyError.
I'm using FastAPI, SQLAlchemy, and databases (async).
Thanks

Pretty sure you can do
select_columns = []
for field in events_table.columns.keys():
    select_columns.append(getattr(events_table.c, field))
select(select_columns).select_from(events_table)
to select all fields from that table. You can also keep a list of the fields you actually want to select instead of events_table.columns.keys(), like
select_these = ["id", "name", "property", "description"]
select_columns = []
for field in select_these:
    select_columns.append(getattr(events_table.c, field))
select(select_columns).select_from(events_table)
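To combine both ideas and answer the original question, you can build the column list in one pass: cast the columns that need it and pass the rest through untouched. A minimal sketch, assuming the same events_table and the list-style select() used in the question (SQLAlchemy 1.x):
from sqlalchemy import String, cast, select

cast_these = {"id", "property"}  # columns to cast; everything else passes through unchanged
select_columns = [
    cast(c, String).label(c.name) if c.name in cast_these else c
    for c in events_table.c
]
query = select(select_columns).select_from(events_table)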


How to get the size of a list returned by column in pyspark

name   | contact                                                                                                            | address
"max"  | [{"email": "watson#commerce.gov", "phone": "650-333-3456"}, {"email": "emily#gmail.com", "phone": "238-111-7689"}] | {"city": "Baltimore", "state": "MD"}
"kyle" | [{"email": "johnsmith#yahoo.com", "phone": "425-231-8754"}]                                                        | {"city": "Barton", "state": "TN"}
I am working with a dataframe in PySpark that has a few columns, including the two mentioned above. I need to create columns dynamically based on the contact fields.
When I use the "." operator on contact, as contact.email, I get a list of emails. I need to create a separate column for each of the emails:
contact.email0, contact.email1, etc.
I found this code online, which partially does what I want, but I don't completely understand it.
employee_data.select(
    'name', *[col('contact.email')[i].alias(f'contact.email{i}') for i in range(2)]
).show(truncate=False)
The range is static in this case, but my range could be dynamic. How can I get the size of the list to loop through it? I tried size(col('contact.email')) and len(col('contact.email')) but got an error saying the col('column name') object is not iterable.
Desired output, something like:
+----+-------------------+---------------+
|name|contact.email0     |contact.email1 |
+----+-------------------+---------------+
|max |watson#commerce.gov|emily#gmail.com|
|kyle|johnsmith#yahoo.com|null           |
+----+-------------------+---------------+
You can get the desired output by using the pivot function:
from pyspark.sql.functions import col, concat, expr, first, lit, posexplode_outer

# convert the contact array of structs to an array of emails with transform,
# explode it with posexplode_outer to keep an index (pos) column,
# then pivot on the generated column names
df.select("name", posexplode_outer(expr("transform(contact, c -> c.email)"))) \
    .withColumn("email", concat(lit("contact.email"), col("pos"))) \
    .groupBy("name").pivot("email").agg(first("col")) \
    .show(truncate=False)
+----+-------------------+---------------+
|name|contact.email0 |contact.email1 |
+----+-------------------+---------------+
|kyle|johnsmith#yahoo.com|null |
|max |watson#commerce.gov|emily#gmail.com|
+----+-------------------+---------------+
To understand what the solution you found does, we can print the expression in a shell:
>>> [F.col('contact.email')[i].alias(f'contact.email{i}') for i in range(2)]
[Column<'contact.email[0] AS `contact.email0`'>, Column<'contact.email[1] AS `contact.email1`'>]
Basically, it creates two columns, one for the first element of the array contact.email and one for the second element. That's all there is to it.
SOLUTION 1
Keep this solution, but you need to find the max size of your array first:
import pyspark.sql.functions as F

max_size = df.select(F.max(F.size("contact"))).first()[0]
df.select('name',
          *[F.col('contact')[i]['email'].alias(f'contact.email{i}') for i in range(max_size)])\
    .show(truncate=False)
SOLUTION 2
Use posexplode to generate one row per element of the array + a pos column containing the index of the email in the array. Then use a pivot to create the columns you want.
df.select('name', F.posexplode('contact.email').alias('pos', 'email'))\
.withColumn('pos', F.concat(F.lit('contact.email'), 'pos'))\
.groupBy('name')\
.pivot('pos')\
.agg(F.first('email'))\
.show()
Both solutions yield:
+----+-------------------+---------------+
|name|contact.email0 |contact.email1 |
+----+-------------------+---------------+
|max |watson#commerce.gov|emily#gmail.com|
|kyle|johnsmith#yahoo.com|null |
+----+-------------------+---------------+
You can use the size function to get the length of the list in the contact column. Note that size returns a Column, not a Python integer, which is exactly why passing it to range fails; collect the maximum length first and then loop over it. (There is no array_length in pyspark.sql.functions, so size is the one to use.) Here's an example:
from pyspark.sql.functions import col, size
from pyspark.sql.functions import max as spark_max

# collect the largest contact-array length as a plain Python int
contact_size = employee_data.select(spark_max(size(col('contact')))).first()[0]
employee_data.select(
    'name',
    *[col('contact')[i]['email'].alias(f'contact.email{i}') for i in range(contact_size)]
).show(truncate=False)

How can I convert from SQLite3 format to dictionary

How can I convert my SQLite3 table to a Python dictionary, where the column names and values of the table become the keys and values of the dictionary?
I have made a package to solve this issue, in case anyone else runs into this problem:
aiosqlitedict
Here is what it can do
Easy conversion between sqlite table and Python dictionary and vice-versa.
Get values of a certain column in a Python list.
Order your list ascending or descending.
Insert any number of columns to your dict.
Getting Started
We start by connecting to our database and specifying the reference column:
from aiosqlitedict.database import Connect
countriesDB = Connect("database.db", "user_id")
Make a dictionary
The dictionary should be inside an async function.
async def some_func():
    countries_data = await countriesDB.to_dict("my_table_name", 123, "col1_name", "col2_name", ...)
You can pass any number of columns, or you can get them all by specifying
the column name as '*':
countries_data = await countriesDB.to_dict("my_table_name", 123, "*")
So you have now made some changes to your dictionary and want to
export it back to SQL?
Convert dict to sqlite table
async def some_func():
    ...
    await countriesDB.to_sql("my_table_name", 123, countries_data)
But what if you want a list of values for a specific column?
Select method
You can get a list of all values of a certain column:
country_names = await countriesDB.select("my_table_name", "col1_name")
To limit your selection, use the limit parameter:
country_names = await countriesDB.select("my_table_name", "col1_name", limit=10)
You can also sort your list with the ascending parameter and/or the
order_by parameter, specifying the column to order your list by:
country_names = await countriesDB.select("my_table_name", "col1_name", order_by="col2_name", ascending=False)
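Putting the pieces together, here is a small round trip built only from the calls shown above: read a row as a dict, change a value, and write it back (123 stands in for a user_id value; the table and column names are placeholders):
import asyncio
from aiosqlitedict.database import Connect

countriesDB = Connect("database.db", "user_id")

async def update_row():
    # read the whole row whose user_id is 123 as a dictionary
    countries_data = await countriesDB.to_dict("my_table_name", 123, "*")
    countries_data["col1_name"] = "new value"  # edit it like any dict
    # write the modified values back to the table
    await countriesDB.to_sql("my_table_name", 123, countries_data)

asyncio.run(update_row())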

extract array of arrays in presto

I have a table in Athena (Presto) with just one column, named individuals, and this is the type of the column:
array(row(individual_id varchar, ids array(row(type varchar, value varchar, score integer))))
I want to extract the values from inside the ids and return them as a new array. As an example, given:
[{individual_id=B1Q, ids=[{type=H, value=efd3, score=1}, {type=K, value=NpS, score=1}]}, {individual_id=D6n, ids=[{type=A, value=178, score=6}, {type=K, value=NuHV, score=8}]}]
and I want to return
ids
[efd3, NpS, 178, NuHV]
I tried multiple solutions, like
select * from "test"
CROSS JOIN UNNEST(individuals.ids.value) AS t(i)
but it always returns
Expression individuals is not of type ROW
select
array_agg(ids.value)
from test
cross join unnest(test.individuals) t(ind)
cross join unnest(ind.ids) t(ids)
result:
[efd3, NpS, 178, NuHV]
This returns all the id values in one row. The key point is that individuals must be unnested twice, once for the outer array and once for each inner ids array, before the value field can be referenced; that is why dereferencing individuals.ids directly fails. One row of all values may or may not be what you want;
if you want an array of values per individual_id instead:
select
ind.individual_id,
array_agg(ids.value)
from test
cross join unnest(test.individuals) t(ind)
cross join unnest(ind.ids) t(ids)
group by
ind.individual_id
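If you'd rather skip the unnest/aggregate round trip, nested transform calls plus flatten can produce the same array directly, one output row per input row. A sketch, assuming your Athena/Presto engine supports the transform and flatten array functions:
select flatten(transform(individuals, ind -> transform(ind.ids, i -> i.value))) as ids
from test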

How to get related field value from database in odoo 11 and postgresql?

I am trying to get a related field value from the database, but it shows column 'column_name' does not exist.
This happens when I try to read the value of product_id, or when I use a join to find the common data between the sale.order and product.product models.
In the sale.order model the field definition is:
product_id = fields.Many2one('product.product', related='order_line.product_id', string='Product')
But when I try to join the two tables to fetch all data per product, like below:
select coalesce(p.name,'Unassigned Product'), count(*) from sale_order o left join product_product p on o.product_id = p.id where o.state = 'sale' group by p.name;
it shows the below error:
column o.product_id does not exist
LINE 1: ... from sale_order o left join product_product p on o.product_...
When I try to get data from the sale_order table like below:
select product_id from sale_order;
it shows the below error:
column "product_id" does not exist
Can anyone help me get that value?
To access a related field from the database, you have to use the store=True keyword, so the value is actually stored in a column.
Rewrite your field definition as,
product_id = fields.Many2one('product.product', related='order_line.product_id', string='Product', store=True)
and then upgrade (or reinstall) the module so the column is created.
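If all you need is per-product counts, an alternative is to skip both the stored field and the raw SQL and let the ORM aggregate over order lines. A minimal sketch, assuming the standard sale module and an env (as in odoo shell; use self.env inside a model method):
# group order lines of confirmed orders by product and count them
result = env['sale.order.line'].read_group(
    domain=[('order_id.state', '=', 'sale')],
    fields=['product_id'],
    groupby=['product_id'],
)
for group in result:
    # group['product_id'] is an (id, display_name) pair, or False when unset
    print(group['product_id'], group['product_id_count'])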

Select Query containing tuple With mixed single as well as double quotes

I have a PostgreSQL select query whose tuple of input values contains single quotes as well as double quotes. When I give this tuple as input to the select query, it generates an error stating that a specific value is not present in the database.
I have tried converting that list of values to a JSON list with double quotes, but that doesn't help either.
list = ['mango', 'apple', "chikoo's", 'banana', "jackfruit's"]
query = """select category from "unique_shelf" where "Unique_Shelf_Names" in {}""" .format(list)
ERROR: column "chikoo's" doesn't exist
In fact chikoo's does exist,
but due to the double quotes it's not fetching the value.
Firstly, please don't use list as a variable name; list is a Python built-in and you don't want to shadow it.
Secondly, note that in PostgreSQL double quotes are for identifiers (table and column names), not for string values; string literals take single quotes. That is exactly why the error says column "chikoo's" doesn't exist: the double-quoted value is being parsed as a column name.
Thirdly, when you format a Python list straight into the query, it outputs as
select category from "unique_shelf"
where "Unique_Shelf_Names" in (['mango', 'apple', "chikoo's", 'banana', "jackfruit's"])
which is not valid SQL syntax.
Joining the values with commas is not enough either, because each value must also be single-quoted, with any single quote inside a value doubled:
l = ['mango', 'apple', "chikoo's", 'banana', "jackfruit's"]
list_with_quotes = ["'{}'".format(x.replace("'", "''")) for x in l]
query = """
select category from "unique_shelf"
where "Unique_Shelf_Names" in ({})""".format(','.join(list_with_quotes))
This will give you an output of
select category from "unique_shelf"
where "Unique_Shelf_Names" in ('mango','apple','chikoo''s','banana','jackfruit''s')
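Hand-rolled quoting like this is fragile and open to SQL injection, so the safer route is to let the driver do the quoting. A minimal sketch, assuming psycopg2 (which adapts a Python tuple to a parenthesized list for IN) and an existing connection conn:
fruits = ['mango', 'apple', "chikoo's", 'banana', "jackfruit's"]
query = 'select category from "unique_shelf" where "Unique_Shelf_Names" in %s'
with conn.cursor() as cur:
    # psycopg2 sends the values safely quoted: in ('mango', ..., 'chikoo''s', ...)
    cur.execute(query, (tuple(fruits),))
    rows = cur.fetchall()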
