cs50 pset7 13.sql - remove kevin from results - cs50

UPDATE: I was able to easily and simply solve this problem by changing 2 things:
Add to the end of the code:
AND people.name != "Kevin Bacon"
Change nested query to movies.id instead of title.
Thanks to this thread for the answer: CS50 Pset 7 13.sql, I can't solve it, nested sqlite3 database
I'm at the very last step of PSET7 movies (https://cs50.harvard.edu/x/2020/psets/7/movies/)
The goal output should be all people that have starred in a movie with Kevin, but Kevin himself should NOT be included.
I have a query that returns a list of all the names of people who starred in movies with Kevin Bacon, but Kevin is there.
How can I remove him from the results easily?
I've looked into NOT IN, but I can't figure out how to get that to work with nested queries the way I have set this up. So any advice is greatly appreciated.
Here's my code:
SELECT DISTINCT people.name
FROM people
INNER JOIN stars ON stars.person_id = people.id
INNER JOIN movies ON movies.id = stars.movie_id
WHERE movies.title IN (
SELECT DISTINCT movies.title
FROM movies
INNER JOIN stars ON stars.movie_id = movies.id
INNER JOIN people ON people.id = stars.person_id
WHERE people.name = "Kevin Bacon" AND people.birth = "1958");

Try
SELECT name
FROM people
WHERE id IN (SELECT DISTINCT person_id
FROM stars
WHERE movie_id IN (SELECT movie_id
FROM stars
INNER JOIN people
ON stars.person_id = people.id
WHERE people.name = "kevin bacon"
AND people.birth = 1958))
AND NOT ( name = "kevin bacon"
AND birth = 1958 );
Explanation-
First SELECT all movie_ids where "kevin bacon" (the one who was born in 1958) has starred
Now SELECT all person_ids who've starred in the aforementioned movies
Now SELECT the names of all these people
Finally, exclude only "kevin bacon" (the one who was born in 1958) from this result

Related

django filter with related name [duplicate]

What is the difference between filter with multiple arguments and chain filter in django?
As you can see in the generated SQL statements the difference is not the "OR" as some may suspect. It is how the WHERE and JOIN is placed.
Example1 (same joined table): from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Blog.objects.filter(
entry__headline__contains='Lennon',
entry__pub_date__year=2008)
This will give you all the Blogs that have one entry with both (entry__headline__contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query.
Result:
Blog with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}
Example 2 (chained)
Blog.objects.filter(
entry__headline__contains='Lennon'
).filter(
entry__pub_date__year=2008)
This will cover all the results from Example 1, but it will generate slightly more result. Because it first filters all the blogs with (entry__headline__contains='Lennon') and then from the result filters (entry__pub_date__year=2008).
The difference is that it will also give you results like:
A single Blog with multiple entries
{entry.headline: '**Lennon**', entry.pub_date: 2000},
{entry.headline: 'Bill', entry.pub_date: **2008**}
When the first filter was evaluated the book is included because of the first entry (even though it has other entries that don't match). When the second filter is evaluated the book is included because of the second entry.
One table: But if the query doesn't involve joined tables like the example from Yuji and DTing. The result is same.
The case in which results of "multiple arguments filter-query" is different than "chained-filter-query", following:
Selecting referenced objects on the basis of referencing objects and relationship is one-to-many (or many-to-many).
Multiple filters:
Referenced.filter(referencing1_a=x, referencing1_b=y)
# same referencing model ^^ ^^
Chained filters:
Referenced.filter(referencing1_a=x).filter(referencing1_b=y)
Both queries can output different result:
If more then one
rows in referencing-modelReferencing1can refer to same row in
referenced-modelReferenced. This can be the case in Referenced:
Referencing1 have either 1:N (one to many) or N:M (many to many)
relation-ship.
Example:
Consider my application my_company has two models Employee and Dependent. An employee in my_company can have more than dependents(in other-words a dependent can be son/daughter of a single employee, while a employee can have more than one son/daughter).
Ehh, assuming like husband-wife both can't work in a my_company. I took 1:m example
So, Employee is referenced-model that can be referenced by more then Dependent that is referencing-model. Now consider relation-state as follows:
Employee: Dependent:
+------+ +------+--------+-------------+--------------+
| name | | name | E-name | school_mark | college_mark |
+------+ +------+--------+-------------+--------------+
| A | | a1 | A | 79 | 81 |
| B | | b1 | B | 80 | 60 |
+------+ | b2 | B | 68 | 86 |
+------+--------+-------------+--------------+
Dependenta1refers to employeeA, and dependentb1, b2references to employeeB.
Now my query is:
Find all employees those having son/daughter has distinction marks (say >= 75%) in both college and school?
>>> Employee.objects.filter(dependent__school_mark__gte=75,
... dependent__college_mark__gte=75)
[<Employee: A>]
Output is 'A' dependent 'a1' has distinction marks in both college and school is dependent on employee 'A'. Note 'B' is not selected because nether of 'B''s child has distinction marks in both college and school. Relational algebra:
Employee ⋈(school_mark >=75 AND college_mark>=75)Dependent
In Second, case I need a query:
Find all employees whose some of dependents has distinction marks in college and school?
>>> Employee.objects.filter(
... dependent__school_mark__gte=75
... ).filter(
... dependent__college_mark__gte=75)
[<Employee: A>, <Employee: B>]
This time 'B' also selected because 'B' has two children (more than one!), one has distinction mark in school 'b1' and other is has distinction mark in college 'b2'.
Order of filter doesn't matter we can also write above query as:
>>> Employee.objects.filter(
... dependent__college_mark__gte=75
... ).filter(
... dependent__school_mark__gte=75)
[<Employee: A>, <Employee: B>]
result is same! Relational algebra can be:
(Employee ⋈(school_mark >=75)Dependent) ⋈(college_mark>=75)Dependent
Note following:
dq1 = Dependent.objects.filter(college_mark__gte=75, school_mark__gte=75)
dq2 = Dependent.objects.filter(college_mark__gte=75).filter(school_mark__gte=75)
Outputs same result: [<Dependent: a1>]
I check target SQL query generated by Django using print qd1.query and print qd2.query both are same(Django 1.6).
But semantically both are different to me. first looks like simple section σ[school_mark >= 75 AND college_mark >= 75](Dependent) and second like slow nested query: σ[school_mark >= 75](σ[college_mark >= 75](Dependent)).
If one need Code #codepad
btw, it is given in documentation #Spanning multi-valued relationships I have just added an example, I think it will be helpful for someone new.
Most of the time, there is only one possible set of results for a query.
The use for chaining filters comes when you are dealing with m2m:
Consider this:
# will return all Model with m2m field 1
Model.objects.filter(m2m_field=1)
# will return Model with both 1 AND 2
Model.objects.filter(m2m_field=1).filter(m2m_field=2)
# this will NOT work
Model.objects.filter(Q(m2m_field=1) & Q(m2m_field=2))
Other examples are welcome.
This answer is based on Django 3.1.
Environment
Models
class Blog(models.Model):
blog_id = models.CharField()
class Post(models.Model):
blog_id = models.ForeignKeyField(Blog)
title = models.CharField()
pub_year = models.CharField() # Don't use CharField for date in production =]
Database tables
Filters call
Blog.objects.filter(post__title="Title A", post__pub_year="2020")
# Result: <QuerySet [<Blog: 1>]>
Blog.objects.filter(post__title="Title A").filter(post_pub_date="2020")
# Result: <QuerySet [<Blog: 1>, [<Blog: 2>]>
Explanation
Before I start anything further, I have to notice that this answer is based on the situation that uses "ManyToManyField" or a reverse "ForeignKey" to filter objects.
If you are using the same table or an "OneToOneField" to filter objects, then there will be no difference between using a "Multiple Arguments Filter" or "Filter-chain". They both will work like a "AND" condition filter.
The straightforward way to understand how to use "Multiple Arguments Filter" and "Filter-chain" is to remember in a "ManyToManyField" or a reverse "ForeignKey" filter, "Multiple Arguments Filter" is an "AND" condition and "Filter-chain" is an "OR" condition.
The reason that makes "Multiple Arguments Filter" and "Filter-chain" so different is that they fetch results from different join tables and use different conditions in the query statement.
"Multiple Arguments Filter" use "Post"."Public_Year" = '2020' to identify the public year
SELECT *
FROM "Book"
INNER JOIN ("Post" ON "Book"."id" = "Post"."book_id")
WHERE "Post"."Title" = 'Title A'
AND "Post"."Public_Year" = '2020'
"Filter-chain" database query use "T1"."Public_Year" = '2020' to identify the public year
SELECT *
FROM "Book"
INNER JOIN "Post" ON ("Book"."id" = "Post"."book_id")
INNER JOIN "Post" T1 ON ("Book"."id" = "T1"."book_id")
WHERE "Post"."Title" = 'Title A'
AND "T1"."Public_Year" = '2020'
But why do different conditions impact the result?
I believe most of us who come to this page, including me =], have the same assumption while using "Multiple Arguments Filter" and "Filter-chain" at first.
Which we believe the result should be fetched from a table like following one which is correct for "Multiple Arguments Filter". So if you are using the "Multiple Arguments Filter", you will get a result as your expectation.
But while dealing with the "Filter-chain", Django creates a different query statement which changes the above table to the following one. Also, the "Public Year" is identified under the "T1" section instead of the "Post" section because of the query statement change.
But where does this weird "Filter-chain" join table diagram come from?
I'm not a database expert. The explanation below is what I understand so far after I created the exact structure of the database and made a test with the same query statement.
The following diagram will show where this weird "Filter-chain" join table diagram comes from.
The database will first create a join table by matching the row of the "Blog" and "Post" tables one by one.
After that, the database now does the same matching process again but uses the step 1 result table to match the "T1" table which is just the same "Post" table.
And this is where this weird "Filter-chain" join table diagram comes from.
Conclusion
So two things make "Multiple Arguments Filter" and "Filter-chain" different.
Django create different query statements for "Multiple Arguments Filter" and "Filter-chain" which make "Multiple Arguments Filter" and "Filter-chain" result come from other tables.
"Filter-chain" query statement identifies a condition from a different place than "Multiple Arguments Filter".
The dirty way to remember how to use it is "Multiple Arguments Filter" is an "AND" condition and "Filter-chain" is an "OR" condition while in a "ManyToManyField" or a reverse "ForeignKey" filter.
The performance difference is huge. Try it and see.
Model.objects.filter(condition_a).filter(condition_b).filter(condition_c)
is surprisingly slow compared to
Model.objects.filter(condition_a, condition_b, condition_c)
As mentioned in Effective Django ORM,
QuerySets maintain state in memory
Chaining triggers cloning, duplicating that state
Unfortunately, QuerySets maintain a lot of state
If possible, don’t chain more than one filter
You can use the connection module to see the raw sql queries to compare. As explained by Yuji's, for the most part they are equivalent as shown here:
>>> from django.db import connection
>>> samples1 = Unit.objects.filter(color="orange", volume=None)
>>> samples2 = Unit.objects.filter(color="orange").filter(volume=None)
>>> list(samples1)
[]
>>> list(samples2)
[]
>>> for q in connection.queries:
... print q['sql']
...
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange AND `samples_unit`.`volume` IS NULL)
SELECT `samples_unit`.`id`, `samples_unit`.`color`, `samples_unit`.`volume` FROM `samples_unit` WHERE (`samples_unit`.`color` = orange AND `samples_unit`.`volume` IS NULL)
>>>
If you end up on this page looking for how to dynamically build up a django queryset with multiple chaining filters, but you need the filters to be of the AND type instead of OR, consider using Q objects.
An example:
# First filter by type.
filters = None
if param in CARS:
objects = app.models.Car.objects
filters = Q(tire=param)
elif param in PLANES:
objects = app.models.Plane.objects
filters = Q(wing=param)
# Now filter by location.
if location == 'France':
filters = filters & Q(quay=location)
elif location == 'England':
filters = filters & Q(harbor=location)
# Finally, generate the actual queryset
queryset = objects.filter(filters)
If requires a and b then
and_query_set = Model.objects.filter(a=a, b=b)
if requires a as well as b then
chaied_query_set = Model.objects.filter(a=a).filter(b=b)
Official Documents:
https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Related Post: Chaining multiple filter() in Django, is this a bug?
There is a difference when you have request to your related object,
for example
class Book(models.Model):
author = models.ForeignKey(Author)
name = models.ForeignKey(Region)
class Author(models.Model):
name = models.ForeignKey(Region)
request
Author.objects.filter(book_name='name1',book_name='name2')
returns empty set
and request
Author.objects.filter(book_name='name1').filter(book_name='name2')
returns authors that have books with both 'name1' and 'name2'
for details look at
https://docs.djangoproject.com/en/dev/topics/db/queries/#s-spanning-multi-valued-relationships

Orchard: In what table is the Blog post stored

I'm attempting to export data from an older Orchard db and am having problems finding which table the content of a blog post is stored. I've tried using a number of different 'Search all columns' spocs to search all tables and columns but am not finding text from the post itself.
If I have a blog post where the opening sentence is:
This sentence contains a unique word.
I would have expected at least one of the various 'Search all columns' examples to have turned up a table/column. But so far, none have.
thx
Orchard store data based on two tables, ContentItemRecord and ContentItemVersionRecord, which store meta data for content items like BlogPost, and these content items built from multiple parts, each part has it's table and the relation between the item and it's parts is based on Id (if not draftable) or ContentItemRecord_Id (if draftable) columns
if we take BlogPost type as example, which built from TitlePart, BodyPart, AutoroutePart and CommonPart, and you want to select all the data of post (id = 90), then you can find it's title in TitlePartRecord table (ContentItemRecord_Id = 90), and the body text of it in BodyPartRecord table with same relation as title part record, and the route part in AutorouteRecord table with same relation, and the common meta data in CommonPartRecord (Id = 90).
This is the way to extract data from Orchard database, hope this will help you.
Tnx to #mdameer...
and the related query of madmeer's answer is this:
SELECT * FROM dbo.default_Title_TitlePartRecord
inner join dbo.default_Orchard_Framework_ContentItemRecord on
dbo.default_Title_TitlePartRecord.ContentItemRecord_id=dbo.default_Orchard_Framework_ContentItemRecord.Id
inner join dbo.default_Common_BodyPartRecord on
dbo.default_Common_BodyPartRecord.ContentItemRecord_id=dbo.default_Orchard_Framework_ContentItemRecord.Id
where dbo.default_Title_TitlePartRecord.ContentItemRecord_id=90
and this is the rightsolution
Just in case it may be useful for others, the following is the actual SQL query used to migrate an Orchard instance to Umbraco. It is derived from the excellent answers by mdameerand and Iman Salehi:
SELECT t.Title, f.Data, b.Text FROM dbo.Title_TitlePartRecord t
inner join dbo.Orchard_Framework_ContentItemRecord f on
t.ContentItemRecord_id=f.Id
inner join dbo.Common_BodyPartRecord b on
b.ContentItemRecord_id=f.Id
AND b.Id = (
SELECT MAX(m2.Id)
FROM dbo.Common_BodyPartRecord m2
WHERE m2.ContentItemRecord_id = f.Id
)
AND t.Id = (
SELECT MAX(m2.Id)
FROM dbo.Title_TitlePartRecord m2
WHERE m2.ContentItemRecord_id = f.Id
)

Explicitly specify "FROM" table

I currently have the following query:
query = self.session.query(Student, School).join(
Person.student, aliased=True).join(
Student.school, aliased=True).filter(
Person.id == 1)
Which compiles to this SQL.
SELECT student.id AS student_id, student.school_id AS student_school_id, student.person_id AS student_person_id, school.id AS school_id, school.name AS school_name
FROM student, school, person JOIN student AS student_1 ON person.id = student_1.person_id JOIN school AS school_1 ON school_1.id = student_1.school_id
WHERE person.id = :id_1
I want the query to remain exactly as it is but I want the from statement to be exclusively from the Person model. So something like
SELECT * FROM person JOIN ... WHERE person.id = :id_1
I think the aliased kwarg is messing up the from condition. Removing the aliased kwarg fixes the behavior but I need the aliased kwarg for special use cases. How can I remove the student and school tables from the "FROM" statement.
The aliased argument to .join uses anonymous aliasing, meaning the Student and School you pass to session.query are different 'instances' of the table.
from sqlalchemy.orm import aliased
aliased_student = aliased(Student)
aliased_school = aliased(School)
query = (
session.query(aliased_student, aliased_school)
.select_from(Person)
.join(aliased_student, Person.student)
.join(aliased_school, Student.school)
.filter(Person.id == 1))
Here you can see that you can tell .join which alias to use when joining to a relationship.

Copy one field to another: "Subquery returns more than 1 row"

here we have the query:
UPDATE ps_product_lang_copy
SET description =
( SELECT description
FROM ( SELECT description
FROM `ps_product_lang_copy`
WHERE id_lang = 5
) t
)
WHERE id_lang IN (1,2,3,4);
I'm trying to do this thing:
i've a field on a table, called 'description'
every row in this table has an 'id_lang'
i wish to copy, for every row and not only one, the field 'description' from id_lang = 5 to 'description' of others rows with id_lang IN (1,2,3,4) but the system, with that query told me: Subquery returns more than 1 row
i read a lot here on stack overflow but really can't fix my code :(
EDIT:
SOLVED by myself, i'll post here to help each others
UPDATE ps_product_lang
INNER JOIN ps_product_lang_copy
ON ps_product_lang.id_product = ps_product_lang_copy.id_product
SET ps_product_lang.description = ps_product_lang_copy.description
WHERE ps_product_lang_copy.id_lang = 5
ciuss
SOLVED
UPDATE ps_product_lang
INNER JOIN ps_product_lang_copy
ON ps_product_lang.id_product = ps_product_lang_copy.id_product
SET ps_product_lang.description = ps_product_lang_copy.description
WHERE ps_product_lang_copy.id_lang = 5

Query Distinct Taxonomies of Custom Post Type

I have a custom post type called "coupons". These coupons can then be drilled down by "city". City is a custom field that is defined for each individual "coupon". I am then trying to modify a sidebar widget so it will only show the distinct categories related to the specific queried cities result set, as opposed to all coupon categories. I would then need to display a count that would show how many of the returned coupons belong to the specific category.
So far the query returns all coupon categories (and the coupon count), not just the ones that belong to the specific coupon category.
SELECT t.*, tt.*, tr.object_id FROM wp_terms AS t INNER JOIN wp_term_taxonomy AS tt ON tt.term_id = t.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('coupon_category', 'coupon_tag', 'stores', 'coupon_type') AND tr.object_id IN (430, 618) ORDER BY t.name ASC
Any help would be greatly appreciated.

Resources