Tools: Peewee 3, SQLite, Python 3
Official documentation for Peewee 3 recursive common table expressions (CTEs):
http://docs.peewee-orm.com/en/latest/peewee/querying.html#common-table-expressions
I am storing a family tree in a simple self-referencing table called Person.
Structure (see below): id, name, parent, custom_order
Notes:
- the parent field is null if this person is an ancestor / root item; otherwise it holds the id of the parent record (this person is a child)
- custom_order is a float (a score that determines who is the user's favourite person)
Objective:
I would like to retrieve the whole family tree and ORDER the results FIRST by parent and SECOND by custom_order.
Issue:
I managed to get the results list but the ORDER is wrong.
DB model
from peewee import CharField, FloatField, ForeignKeyField, Model

class Person(Model):
    name = CharField()
    parent = ForeignKeyField('self', backref='children', null=True)
    custom_order = FloatField()
Note: if parent field is null then it's a root item
Query code
# Define the base case of our recursive CTE. This will be people that have a null parent foreign-key.
Base = Person.alias()
base_case = (Base
.select(Base)
.where(Base.parent.is_null())
.cte('base', recursive=True))
# Define the recursive terms.
RTerm = Person.alias()
recursive = (RTerm
.select(RTerm)
.join(base_case, on=(RTerm.parent == base_case.c.id)))
# The recursive CTE is created by taking the base case and UNION ALL with the recursive term.
cte = base_case.union_all(recursive)
# We will now query from the CTE to get the people
query = cte.select_from(cte.c.id, cte.c.name, cte.c.parent_id, cte.c.custom_order).order_by(cte.c.parent_id, cte.c.custom_order)
print(query.sql())
Printed query syntax
('WITH RECURSIVE "base" AS
(
SELECT "t1"."id", "t1"."name", "t1"."parent_id", "t1"."custom_order" FROM "person" AS "t1" WHERE ("t1"."parent_id" IS ?)
UNION ALL
SELECT "t2"."id", "t2"."name", "t2"."parent_id", "t2"."custom_order" FROM "person" AS "t2" INNER JOIN "base" ON ("t2"."parent_id" = "base"."id")
)
SELECT "base"."id", "base"."name", "base"."parent_id" FROM "base"
ORDER BY "base"."parent_id", "base"."custom_order"',
[None])
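To check the ordering independently of Peewee, the same recursive query can be run with Python's built-in sqlite3 module instead. This is a minimal sketch, assuming a person table with the columns shown in the generated SQL above; the sample names and scores are hypothetical.

```python
import sqlite3

# In-memory database with the same columns Peewee generates for Person
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER REFERENCES person(id), custom_order REAL)"
)
# Hypothetical sample family: two roots, three children
conn.executemany(
    "INSERT INTO person (id, name, parent_id, custom_order) VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", None, 2.0),
        (2, "Bob", None, 1.0),
        (3, "Carol", 1, 0.5),
        (4, "Dave", 1, 0.1),
        (5, "Eve", 2, 3.0),
    ],
)

# Base case (roots) UNION ALL the recursive term (children of rows already found)
sql = """
WITH RECURSIVE base AS (
    SELECT id, name, parent_id, custom_order FROM person WHERE parent_id IS NULL
    UNION ALL
    SELECT p.id, p.name, p.parent_id, p.custom_order
    FROM person AS p JOIN base ON p.parent_id = base.id
)
SELECT id, name, parent_id, custom_order FROM base
ORDER BY parent_id, custom_order
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

In SQLite, NULL sorts before any other value in ascending order, so the root rows (null parent) come first, each group ordered by custom_order.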
Root of the problem
The code posted in the question works correctly. I verified it by printing the query results to the console:
import json

query = cte.select_from(cte.c.id, cte.c.name, cte.c.parent_id, cte.c.custom_order).order_by(cte.c.parent_id, cte.c.custom_order).dicts()
print(json.dumps(list(query), indent=4))
The problem came from the fact that I was putting the query results into a nested Python dictionary before printing them to the console, BUT a plain Python dict does not guarantee insertion order before Python 3.7 (the guarantee only exists from 3.7 onward). So, no wonder the printed results were in a different order than the database results.
Solution
Use a Python Ordered Dictionary if you want to store the query results in a fixed order:
import collections
treeDictionary = collections.OrderedDict()
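A minimal sketch of filling such a dictionary, assuming the rows arrive as dicts (as `.dicts()` returns them) already sorted by parent_id and then custom_order. The sample rows and the nested layout (children grouped under their parent id) are hypothetical, not the layout from the original code.

```python
import collections

# Hypothetical rows, already ordered by parent_id, then custom_order
rows = [
    {"id": 2, "name": "Bob", "parent_id": None, "custom_order": 1.0},
    {"id": 1, "name": "Alice", "parent_id": None, "custom_order": 2.0},
    {"id": 4, "name": "Dave", "parent_id": 1, "custom_order": 0.1},
    {"id": 3, "name": "Carol", "parent_id": 1, "custom_order": 0.5},
]

# Group children under their parent id; OrderedDict keeps insertion order
# at both levels, so the database ordering survives
treeDictionary = collections.OrderedDict()
for row in rows:
    siblings = treeDictionary.setdefault(row["parent_id"], collections.OrderedDict())
    siblings[row["id"]] = row["name"]

for parent_id, children in treeDictionary.items():
    print(parent_id, list(children.values()))
```

On Python 3.7+ a plain dict would keep the same order; OrderedDict makes the intent explicit.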
Is there a way to aggregate values in a column in sqlite in one-to-many relationship into array?
For example, I have 2 tables like this:
Artists:
ArtistId  Name
1         AC/DC
2         Accept
Albums:
AlbumId  ArtistId  Title
1        1         For Those About To Rock We Salute You
2        1         Let There Be Rock
3        2         Balls to the Wall
4        2         Restless and Wild
When I just do a query with a join:
SELECT
Name,
Title
FROM
artists
JOIN albums USING(ArtistId)
WHERE artists.ArtistId = 1;
I get one row per album, with the artist name repeated on each row.
I found out that I can do group_concat:
SELECT
Name,
GROUP_CONCAT(Title)
FROM
artists
JOIN albums USING(ArtistId)
WHERE artists.ArtistId = 1;
This concatenates all the titles into a single value, but I still have to parse the comma-separated string of titles (For Those About To Rock We Salute You,Let There Be Rock) in my code to get the array of titles for each artist.
I use Python and I'd prefer to get something like a tuple for each row:
(name, titlesArray)
A much easier approach for me would be to store all the "many" values as a JSON array in the same row, saving them with json.dumps and reading them back with json.loads, instead of the recommended database approach of saving them in a separate table and using joins to retrieve them. The "many" values are an array on the object, so saving and loading them takes just those two functions, compared to manually saving them into a separate table, linking them to the "one" value, using group_concat to concatenate them into a string, and then parsing that string to get my array back.
Is it possible to get an array of values, or do I have to do group_concat and parse the string?
You might not be able to receive an array from SQLite directly, but you can get the same result with a small edit to your query and a single line of Python.
group_concat supports a custom delimiter that you can use later to split the entries.
Let's assume you have something like this:
from typing import Tuple
import sqlite3

def connect(file: str) -> sqlite3.Connection:
    # sqlite3.connect raises sqlite3.Error itself if the database cannot be opened
    return sqlite3.connect(file)

def select(connection: sqlite3.Connection) -> Tuple[str, str]:
    sql = """
        SELECT
            Name,
            GROUP_CONCAT(Title)
        FROM artists
        JOIN albums USING(ArtistId)
        WHERE artists.ArtistId = 1;
    """
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # a single row: (name, comma-separated titles)
        return cursor.fetchone()
    finally:
        cursor.close()
that you can use to connect to the database and select from it like so:
connection = connect(r"/path/to/file.sqlite3")
if connection is not None:
entry = select(connection)
connection.close()
It does not matter whether your query is inside a function or not; the important point is that you run the query from Python, so you can add code that manipulates the values.
As described in the SQLite documentation, group_concat accepts an optional second argument, the separator, which you can use to separate the values however you like.
Your new select function could be something like:
def select(connection: sqlite3.Connection) -> Tuple[str, Tuple[str, ...]]:
    separator = "|"
    sql = """
        SELECT
            Name,
            GROUP_CONCAT(Title, ?)
        FROM artists
        JOIN albums USING(ArtistId)
        WHERE artists.ArtistId = 1;
    """
    cursor = connection.cursor()
    try:
        # bind the separator as a parameter so SQLite receives it as a quoted string
        cursor.execute(sql, (separator,))
        reply = cursor.fetchone()
    finally:
        cursor.close()
    if reply is not None and reply[1] is not None:
        # rows are immutable tuples, so build a new one with the split titles
        reply = (reply[0], tuple(reply[1].split(separator)))
    return reply
Without changing how you use this function, you would now have a tuple with all the titles.
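To try the separator approach end to end, here is a self-contained sketch assuming an in-memory database loaded with the sample rows from the question. The helper name select_titles and its artist_id parameter are hypothetical additions. Note that SQLite does not guarantee the order of the elements group_concat concatenates.

```python
import sqlite3
from typing import Optional, Tuple

def select_titles(connection: sqlite3.Connection, artist_id: int,
                  separator: str = "|") -> Optional[Tuple[str, Tuple[str, ...]]]:
    sql = """
        SELECT Name, GROUP_CONCAT(Title, ?)
        FROM artists
        JOIN albums USING(ArtistId)
        WHERE artists.ArtistId = ?
    """
    reply = connection.execute(sql, (separator, artist_id)).fetchone()
    # An aggregate without GROUP BY always returns one row; Name is NULL if nothing matched
    if reply is None or reply[0] is None:
        return None
    name, titles = reply
    return name, tuple(titles.split(separator))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, ArtistId INTEGER, Title TEXT);
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO albums VALUES
        (1, 1, 'For Those About To Rock We Salute You'),
        (2, 1, 'Let There Be Rock'),
        (3, 2, 'Balls to the Wall'),
        (4, 2, 'Restless and Wild');
""")
entry = select_titles(conn, 1)
print(entry)
```

Pick a separator that cannot occur inside a title; otherwise split() will break a title in two.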
Another idea you'd like to consider is to do a more specific select query, like:
select albums.Title
from albums
where albums.ArtistId = 1;
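In this case, you can get the titles with cursor.fetchall(), which returns a list of one-element tuples, so [row[0] for row in cursor.fetchall()] gives a plain list of titles.
Of course, the band name then has to be queried separately.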
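If you need (name, titles) pairs for every artist without group_concat, one option is to run the plain join ordered by artist and group the rows in Python with itertools.groupby. This is a sketch assuming the sample tables from the question; titles_by_artist is a hypothetical helper name.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

def titles_by_artist(connection: sqlite3.Connection) -> List[Tuple[str, List[str]]]:
    rows = connection.execute(
        """
        SELECT artists.ArtistId, Name, Title
        FROM artists
        JOIN albums USING(ArtistId)
        ORDER BY artists.ArtistId, AlbumId
        """
    ).fetchall()
    # groupby only merges adjacent rows, which is why the query sorts by ArtistId
    result = []
    for _, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        result.append((group[0][1], [title for _, _, title in group]))
    return result

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, ArtistId INTEGER, Title TEXT);
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO albums VALUES
        (1, 1, 'For Those About To Rock We Salute You'),
        (2, 1, 'Let There Be Rock'),
        (3, 2, 'Balls to the Wall'),
        (4, 2, 'Restless and Wild');
""")
print(titles_by_artist(conn))
```

This avoids any separator parsing, at the cost of transferring the artist name once per album row.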
How to make a multidimensional dictionary with multiple keys and values, and how to print its keys and values?
In this format:
main_dictionary = {Mainkey: {keyA: value,
                             keyB: value,
                             keyC: value}}
I tried to do it, but it gives me an error on the manufacturer key. Here is my code:
car_dict[manufacturer] [type]= [( sedan, hatchback, sports)]
Here is my error:
File "E:/Programming Study/testupdate.py", line 19, in campany
car_dict[manufacturer] [type]= [( sedan, hatchback, sports)]
KeyError: 'Nissan'
And my printing code is:
for manufacuted_by, type,sedan,hatchback, sports in cabuyao_dict[bgy]:
print("Manufacturer Name:", manufacuted_by)
print('-' * 120)
print("Car type:", type)
print("Sedan:", sedan)
print("Hatchback:", hatchback)
print("Sports:", sports)
Thank you! I'm new to Python.
I think you have a slight misunderstanding of how a dict works, and how to "call back" the values inside of it.
Here are two examples of how to create your data structure:
car_dict = {}
car_dict["Nissan"] = {"types": ["sedan", "hatchback", "sports"]}
print(car_dict) # Output: {'Nissan': {'types': ['sedan', 'hatchback', 'sports']}}
from collections import defaultdict
car_dict2 = defaultdict(dict)
car_dict2["Nissan"]["types"] = ["sedan", "hatchback", "sports"]
print(car_dict2) # Output: defaultdict(<class 'dict'>, {'Nissan': {'types': ['sedan', 'hatchback', 'sports']}})
In both examples above, I first create a dictionary, and then on the next line I add the values I want it to contain. In the first example, I give car_dict the key "Nissan" and set its value to a new dictionary containing some values.
In the second example, I use defaultdict(dict), which basically follows the logic "if a key is missing, call the factory (dict) to create a value for it".
Can you see the difference in how the values are initialized in the two methods?
When you called car_dict[manufacturer][type] in your code, you hadn't yet initialized car_dict["Nissan"], so looking it up raised a KeyError.
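Applied to the line from the question, one way to avoid the KeyError is dict.setdefault, which inserts an empty inner dict the first time a manufacturer is seen and returns the existing one afterwards. This is a sketch with hypothetical values for the question's variables; the inner key is named car_type to avoid shadowing the built-in type.

```python
car_dict = {}

# Hypothetical values standing in for the variables in the question
manufacturer = "Nissan"
car_type = "models"
sedan, hatchback, sports = "sedan model", "hatchback model", "sports model"

# setdefault returns the existing inner dict, or inserts and returns a new one
car_dict.setdefault(manufacturer, {})[car_type] = [sedan, hatchback, sports]
print(car_dict)
```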
As for printing out the values, you can do something like this:
for key in car_dict:
    manufacturer = key
    car_types = car_dict[key]["types"]
    print(f"The manufacturer '{manufacturer}' has the following types:")
    for t in car_types:
        print(t)
Output:
The manufacturer 'Nissan' has the following types:
sedan
hatchback
sports
When you loop over a dict, by default you loop over its keys only. That means we have to look up the value for each key inside the loop to print it.
Also, as a side note: avoid using built-in names such as type as variable names, because your variable then shadows the built-in, which can cause problems later when you need type() to check the types of variables.
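Alternatively, dict.items() yields key/value pairs directly, so no second lookup is needed inside the loop. A sketch for the two-level layout from the question, with hypothetical keys and values:

```python
# Hypothetical data in the main_dictionary layout from the question
main_dictionary = {"Nissan": {"keyA": 1, "keyB": 2, "keyC": 3}}

lines = []
for main_key, inner in main_dictionary.items():
    lines.append(f"Main key: {main_key}")
    # the inner dict is iterated the same way
    for key, value in inner.items():
        lines.append(f"  {key}: {value}")

print("\n".join(lines))
```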
I'm writing a script in python and am accessing an API. I can get some information I need but where I'm stuck is with nested queries. In my code below, first_name needs to equal what would essentially be result[customer->firstname] but I cannot figure out how to get that.
What is the proper syntax to get a nested query like that?
Orders-> customer -> firstname
for result in results['orders']:
    order_status_info = self_api.which_api('order_statuses/%d' % result['order_status_id'])
    for customer_blocked_reason in customer_blocked_reasons:
        if customer_blocked_reason in order_status_info['name']:
            customer_is_blocked = True
    order_id = 0
    order_date = result['ordered_at']
    first_name = result[??????]
Is result a dict?
You can access nested dictionaries like so:
>>> d = {"level1": {"level2": "somedata"}}
>>> d["level1"]["level2"]
'somedata'
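Assuming each order in the API response is a dict with a nested customer dict (the key names follow the path Orders -> customer -> firstname from the question; the sample data is hypothetical), the lookup chains the keys. Using .get() with a default avoids a KeyError when a level is missing.

```python
# Hypothetical API payload following the Orders -> customer -> firstname path
results = {
    "orders": [
        {"ordered_at": "2020-01-01", "customer": {"firstname": "Ada"}},
        {"ordered_at": "2020-01-02"},  # no customer data
    ]
}

first_names = []
for result in results["orders"]:
    # result["customer"]["firstname"] works too, but raises KeyError if a level is absent
    first_name = result.get("customer", {}).get("firstname")
    first_names.append(first_name)

print(first_names)
```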
I'm trying to create a MySQL INSERT query like this for inserting millions of records:
INSERT INTO mytable (fee, fi) VALUES
('data1',96)
,('data2',33)
,('boot',17)
My values are stored as tuples in a list:
datatuplst = [("data1",96), ("data2", 33),("data3", 17)]
My code:
c3 = con.cursor()
c3.execute("INSERT INTO `bigdataloadin` (`block_key`, `id`) VALUES %s" %','.join(datatuplst))
This is not working and I'm getting error:
TypeError: sequence item 0: expected str instance, tuple found
Need help on how to create the dynamic query with values stored in tuples list.
You can generate the string that you need. The error explains the problem: str.join only accepts strings, but ','.join(datatuplst) passes it tuples. So, using a list comprehension, you can convert each tuple to a string first:
','.join([str(el) for el in datatuplst])
The output for this statement is going to be: "('data1', 96),('data2', 33),('data3', 17)"
Then your actual INSERT statement will be interpreted as follows:
"INSERT INTO `bigdataloadin` (`block_key`, `id`) VALUES ('data1', 96),
('data2', 33),('data3', 17)"
Good luck!
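Formatting values into the SQL string with str() relies on Python's tuple repr for quoting, which breaks for values containing quotes and is open to SQL injection. The DB-API alternative is cursor.executemany with placeholders. MySQL drivers such as mysql-connector-python and PyMySQL use %s placeholders; the sketch below swaps in the standard library's sqlite3 (which uses ?) so it runs anywhere, reusing the table and column names from the question. The "O'Brien" row is a hypothetical value added to show quoting.

```python
import sqlite3

datatuplst = [("data1", 96), ("data2", 33), ("data3", 17), ("O'Brien", 5)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bigdataloadin (block_key TEXT, id INTEGER)")

c3 = con.cursor()
# One placeholder per column; the driver quotes each value itself
c3.executemany("INSERT INTO bigdataloadin (block_key, id) VALUES (?, ?)", datatuplst)
con.commit()

rows = con.execute("SELECT block_key, id FROM bigdataloadin ORDER BY rowid").fetchall()
print(rows)
```

For millions of rows, inserting in batches (for example a few thousand tuples per executemany call) keeps memory use bounded.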
I'm trying to build a string that contains all attributes of a class-object. The object name is jsonData and it has a few attributes, some of them being
jsonData.Serial,
jsonData.InstrumentSerial,
jsonData.Country
I'd like to build a string that has those attribute names in the format of this:
'Serial InstrumentSerial Country'
End goal is to define a schema for a Spark dataframe.
I'm open to alternatives, as long as I know the order of the string/object, because I need to map the schema to the appropriate values.
You'll have to be careful about filtering out unwanted attributes, but try this:
' '.join([x for x in dir(jsonData) if '__' not in x])
That filters out all the "magic methods" like __init__ or __new__.
To include those, do
' '.join(dir(jsonData))
These take advantage of Python's built-in dir() function, which returns an alphabetically sorted list of the object's attribute names (methods included).
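Because the order matters for the schema, it helps to compare dir() with vars(): dir() sorts names alphabetically and includes methods, while vars() returns the instance's __dict__, whose keys follow assignment order on Python 3.7+. A small comparison using a hypothetical class standing in for jsonData:

```python
class JsonData:
    # Hypothetical stand-in for the jsonData object in the question
    def __init__(self, Serial, InstrumentSerial, Country):
        self.Serial = Serial
        self.InstrumentSerial = InstrumentSerial
        self.Country = Country

    def describe(self):
        return f"{self.Serial} {self.Country}"

jsonData = JsonData(1, "TDD", "France")

from_dir = [x for x in dir(jsonData) if '__' not in x]  # sorted, includes methods
from_vars = list(vars(jsonData))                         # instance attributes only
print(" ".join(from_dir))
print(" ".join(from_vars))
```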
I don't quite understand why you want to group the attribute names into a single string.
You could simply keep a list of attribute names, since a Python list preserves its order.
attribute_names = [x for x in dir(jsonData) if '__' not in x]
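From there you can create your dataframe. If you don't need to specify the Spark types, you can just do the following, where spark is your SparkSession (createDataFrame is a SparkSession method, not a SparkContext one):
df = spark.createDataFrame(data, schema=attribute_names)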
You could also create a StructType and specify the types in your schema.
I guess that you are going to have a list of jsonData records that you want to treat as rows.
Let's consider it as a list of objects; the logic would still be the same.
You can do that as follows:
from operator import attrgetter

my_object_list = [
    jsonDataClass(Serial=1, InstrumentSerial='TDD', Country='France'),
    jsonDataClass(Serial=2, InstrumentSerial='TDI', Country='Suisse'),
    jsonDataClass(Serial=3, InstrumentSerial='TDD', Country='Grece')]

def build_record(obj, attr_names):
    return attrgetter(*attr_names)(obj)
So the data argument referred to previously would be constructed as:
data = [build_record(x, attribute_names) for x in my_object_list]
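The record-building step can be checked without Spark. This sketch assumes jsonDataClass is a dataclass (a hypothetical definition, since the question does not show one), which lets dataclasses.fields() supply the attribute names in declaration order instead of dir()'s alphabetical order.

```python
from dataclasses import dataclass, fields
from operator import attrgetter

@dataclass
class jsonDataClass:
    # Hypothetical definition; field order is the declaration order
    Serial: int
    InstrumentSerial: str
    Country: str

my_object_list = [
    jsonDataClass(Serial=1, InstrumentSerial='TDD', Country='France'),
    jsonDataClass(Serial=2, InstrumentSerial='TDI', Country='Suisse'),
    jsonDataClass(Serial=3, InstrumentSerial='TDD', Country='Grece'),
]

attribute_names = [f.name for f in fields(jsonDataClass)]

def build_record(obj, attr_names):
    # attrgetter with several names returns a tuple in the same order
    return attrgetter(*attr_names)(obj)

data = [build_record(x, attribute_names) for x in my_object_list]
print(attribute_names)
print(data)
```

With a SparkSession named spark, data and attribute_names can then be passed to spark.createDataFrame(data, schema=attribute_names).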