Let's say I have a query something like this:
SELECT * FROM blabla WHERE id IN (SELECT id FROM bla WHERE name = 'Mr.Anderson')
Usually you write the main part of the query in sphinx.conf, the part that fetches the data so it can be indexed, and then with the PHP Sphinx API you set different filters to get what you want.
As I understand it, Sphinx can't work with subqueries. Something like SphinxSE exists for that purpose, or at least it can handle subqueries, but:
"SphinxSE is just an ordinary client that talks to searchd over the network. You need it when you do not have a native API for your language and want to work through MySQL, or if you want to optimize certain workloads that pull big Sphinx result sets directly into MySQL and additionally process them there (avoiding the overhead of pulling the Sphinx result set into the application and then immediately pushing it back to MySQL)."
So my question is: if I start using SphinxSE, will I still be able to use the PHP Sphinx API with those subqueries? And I would really appreciate tutorials or articles with examples about this whole story and how to do it.
And my second question is: what are my other options if I don't use SphinxSE?
For example, I could write two different sources and indexes, one for the subquery and one for the main query, and then just feed the subquery's results into the main query as a filter through the PHP Sphinx API. Is that a big overhead for something like this, or not?
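For concreteness, here is a minimal sketch of that two-index idea with the PHP Sphinx API (the index names, the 'id' attribute, and the port are assumptions on my side):

<?php
require_once 'sphinxapi.php'; // the client that ships with the Sphinx distribution

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312); // default searchd port

// First query: the "subquery", run against its own index
$sub = $cl->Query('Mr.Anderson', 'bla_index');
$ids = array();
if ($sub !== false && !empty($sub['matches'])) {
    $ids = array_keys($sub['matches']); // document ids of the matches
}

// Second query: restrict the main index to those ids
// ('id' must be declared as an attribute, e.g. sql_attr_uint, in the index)
$cl->SetFilter('id', $ids);
$main = $cl->Query('', 'blabla_index');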
Thx in advance...
I want to use raw SQL queries in my application, but I have some questions about how to structure it.
Some background:
I am writing a JSON API with Express and Postgres.
I am not currently using an ORM. I have used Sequelize before, but I don't believe the queries it generates are optimized, so I am hesitant to use it.
I am using camelCase in my code, but Postgres folds unquoted identifiers to lowercase, so for readability I have used under_scores in my DB tables. I constantly have to write queries like:
SELECT first_name AS "firstName" FROM users;
When the queries get larger, they become almost impossible to read, since there is no syntax highlighting for SQL inside JS template strings.
I feel there is too much repetition in my queries, but that is expected.
What I am thinking:
I was not able to find a Visual Studio Code extension that can highlight SQL inside JS files and strings. If there were one, I might get by.
I might write all my queries in .sql files so that I get syntax highlighting, and load them all into memory when my application starts to prevent too many IO operations, since doing file IO on every request would go against the reasoning for using raw SQL in the first place.
Anyone had this issue before? How do you structure your application when using raw SQL with Postgres and Express?
Definitely keep all the SQL scripts in corresponding .sql files.
Stick to a meaningful naming convention. Come up with one you feel comfortable with: in the future it will let you build helpful tools around your codebase that do a lot of the boring stuff automatically and make you much happier.
In case things get complicated quickly, generate at least some of the duplicated/commonly used SQL. Consider having simple placeholders in your files, like {{ firstName }}, which gets translated into first_name AS "firstName". Such a translation should happen only once, when you load the source .sql. This is more sophisticated and depends heavily on the kind of tasks you have; sometimes such an approach is useless, sometimes useful. A sketch of that loading step follows.
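A minimal sketch of the load-once idea in Node (the module layout and the placeholder syntax are assumptions):

// queries.js - load every .sql file in a directory into memory at startup
const fs = require('fs');
const path = require('path');

// turn {{ firstName }} into: first_name AS "firstName"
function expandPlaceholders(sql) {
  return sql.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, camel) => {
    const snake = camel.replace(/[A-Z]/g, c => '_' + c.toLowerCase());
    return `${snake} AS "${camel}"`;
  });
}

function loadQueries(dir) {
  const queries = {};
  for (const file of fs.readdirSync(dir)) {
    if (path.extname(file) === '.sql') {
      const raw = fs.readFileSync(path.join(dir, file), 'utf8');
      queries[path.basename(file, '.sql')] = expandPlaceholders(raw);
    }
  }
  return queries; // e.g. queries.getUsers holds the expanded SQL text
}

module.exports = { loadQueries };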
I am looking to write dynamic queries for an ArangoDB graph database and am wondering if there are best practices or standard approaches to doing it.
By 'dynamic queries' I mean that users would have the ability to build a query that is then executed on the dataset.
Ways in which ArangoDB could support this include:
Dynamically generate AQL queries, passing user input in via bind variables (see the sketch after this list)
Write Foxx functions to deliver on supported queries, and have another Foxx function bind those together to build a response.
Write a workflow which extracts data into a temporary collection and then invokes Foxx functions to filter/sort the data to the desired outcome.
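To illustrate the first option, the idea is that every user-supplied value enters the query through bind variables rather than string concatenation (the collection and attribute names here are invented):

FOR c IN countries
  FILTER c.population > @minPopulation
  SORT c.areaKm2 DESC
  LIMIT @top
  RETURN c

// executed with bind variables such as { "minPopulation": 10000000, "top": 10 }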
The queries would be very open ended, where someone would (for example):
Query all countries with population over 10,000,000
Sort countries by land in square kilometers
Pick the top 10 countries in land coverage
Select primary language spoken in each country
Count occurrences of each language.
That query alone is straightforward to execute, but if a user were able to check or select from a range of supported query options, order them in their own defined way, and receive the output, it's a little more involved.
Are there some supported or recommended approaches to doing this?
My current approach would be to write blocks of AQL that deliver each part, probably in a LET Q1 = (...), LET Q2 = (...) format, and then at the bottom of the query have a generic way of processing the blocks to generate a response.
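For example, a sketch of that LET-block format applied to the country example above (collection and attribute names are invented):

LET big = (
  FOR c IN countries
    FILTER c.population > 10000000
    RETURN c
)
LET top10 = (
  FOR c IN big
    SORT c.areaKm2 DESC
    LIMIT 10
    RETURN c
)
FOR c IN top10
  COLLECT language = c.primaryLanguage WITH COUNT INTO n
  RETURN { language: language, count: n }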
But I have a feeling that smart use of Foxx functions could help here as well: having Foxx-Query-Q1 and Foxx-Query-Q2 coded to support each query type, then an aggregation Foxx app that invokes the right queries in the right order to build the right response.
If anyone has seen best ways of doing this, it would be great to get some hints/advice.
Thanks!
I have a big query (in my query builder) and a lot of left joins, so I get articles with their comments and tags and so on.
Let's say I have the following DQL:
$dql = 'SELECT blogpost, comments, tags
FROM BlogPost blogpost
LEFT JOIN blogpost.comments comments
LEFT JOIN blogpost.tags tags';
Now let's say my database has more than 100 blog posts, but I only want the first 10, with all the comments on those 10 and all their tags, if they exist.
If I use setMaxResults, it limits the joined result rows, not the number of posts. So I might get the first two posts, but the last of those is missing some of its comments or tags. So the following doesn't work:
$result = $em->createQuery($dql)->setMaxResults(15)->getResult();
Using the barely documented pagination solution that ships with Doctrine 2.2 doesn't really work for me either, since it is so slow that I might as well load all the data.
I tried the solutions in the Stack Overflow article, but even that article is still missing a best practice, and the solution presented there is deadly slow.
Isn't there a best practice for how to do this?
Is nobody using Doctrine 2.2 in production?
Getting the proper results with a query like this is problematic. There is a tutorial on the Doctrine website explaining this problem.
Pagination
The tutorial is more about pagination rather than getting the top 5 results, but the overall idea is that you need to do a "SELECT DISTINCT a.id FROM articles a ... LIMIT 5" instead of a normal SELECT. It's a little more complicated than this, but the last 2 points in that tutorial should put you on the right track.
Update:
The problem here is not Doctrine, or any other ORM. The problem lies squarely with the database and whether it can return the results you're asking for. This is just how joins work.
If you do an EXPLAIN on the query, it will give you a more in depth answer of what is happening. It would be a good idea to add the results of that to your initial question.
Building on what is discussed in the Pagination article, it would appear that you need at least two queries to get your desired results. Adding DISTINCT to a query has the potential to dramatically slow it down, but it's only really needed if the query has joins in it. You could write one query that just retrieves the first 10 posts, ordered by creation date, without the joins. Once you have the IDs of those 10 posts, do another query with your joins and a WHERE blogpost.id IN (...) ORDER BY blogpost.created. This method should be much more efficient:
SELECT bp
FROM Blogpost bp
ORDER BY bp.created DESC

(DQL has no LIMIT clause, so the limit is applied with ->setMaxResults(10) on the query object.)
Since all you care about in the first query are the IDs, you could set Doctrine to use Scalar Hydration.
SELECT bp, c, t
FROM Blogpost bp
LEFT JOIN bp.comments c
LEFT JOIN bp.tags t
WHERE bp.id IN (...)
ORDER BY bp.created DESC
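Wired up with the EntityManager, the two-query approach might look like this (a sketch; the entity and field names are from the question, the rest is assumed):

// first query: just the ids of the 10 newest posts, via scalar hydration
$ids = $em->createQuery('SELECT bp.id FROM Blogpost bp ORDER BY bp.created DESC')
    ->setMaxResults(10)
    ->getScalarResult();
$ids = array_map('current', $ids); // each row is a single-column array

// second query: fetch-join the comments and tags for exactly those posts
$posts = $em->createQuery(
        'SELECT bp, c, t
         FROM Blogpost bp
         LEFT JOIN bp.comments c
         LEFT JOIN bp.tags t
         WHERE bp.id IN (:ids)
         ORDER BY bp.created DESC')
    ->setParameter('ids', $ids)
    ->getResult();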
You could also probably do it in one query using a correlated subquery. The myth that subqueries are always bad is NOT true. Sometimes they are faster than joins. You will need to experiment to find out what the best solution is for you.
Edit in light of the clarified question:
You can do what you want in native MySQL by using a subquery in the FROM clause, like this:
SELECT * FROM
(SELECT * FROM articles ORDER BY date LIMIT 5) AS limited_articles,
comments,
tags
WHERE
limited_articles.article_id = comments.article_id
AND limited_articles.article_id = tags.article_id
As far as I know, DQL does not support subqueries like this, so you can use the NativeQuery class.
OK, I'm totally new to Solr and Lucene, but I have Solr running out of the box under Tomcat 6.x and have just gone over some of the basic wiki entries.
I have a few questions, and require some suggestions too.
Solr can index data in files (XML, CSV) and it can also index DBs. Can you also just point it at a URI/domain and have it index a website the way Google would?
If I have a website with "Pages" data ("Page Name", "Page Content", etc.) and "Products" data ("Product Name", "SKU", etc.), do I need two different schema.xml files? And if so, does that mean two different instances of Solr?
Finally, if you have a project with a large relational and normalized database, what would you say is the best approach among the options below?
Have a middleware service running in the background, which mines the DB and manually creates the relevant XML files to then send to Solr
Have Solr index the DB directly. In this case, would it be best to just point Solr at views, which would abstract away all the table relationships?
Any other options I'm unaware of?
Context: We're running in a Windows 2003 environment, .NET 3.5, SQLServer 2005/2008
cheers!
No, you need a crawler for that, e.g. Nutch
Yes, you want two separate indexes (= two schema.xml files), since the datasets don't seem to be related. This doesn't mean two instances of Solr: you can manage the two indexes with cores.
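For the Solr versions of that era, a minimal solr.xml sketch serving both indexes from one instance (core names and directories are assumptions):

<!-- solr.xml: one Solr instance, two cores, each with its own schema.xml -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="pages" instanceDir="pages" />
    <core name="products" instanceDir="products" />
  </cores>
</solr>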
As for populating the Solr index, it depends on your particular project: for example, can it tolerate stale data, or does it have to be absolutely fresh?
Other options to index data include:
Database triggers
If you're using some sort of ORM, use its interception capabilities. For example, you can use NHibernate events to update the index on update, insert, or delete. If you use NHibernate and SolrNet, this is taken care of automatically.
I think Mauricio is dead on with his advice. The only point I would add is about deciding between a "middleware" indexer and using the database directly. If your database (or the views?) maps very closely to what a good Solr schema wants, then the DataImportHandler (DIH) is great. But if you are indexing from multiple sources of data, or if you have to munge the data in your database to match what Solr would like, then a dedicated middleware indexer is better.
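For reference, a minimal DIH data-config.xml sketch against SQL Server (the JDBC driver, connection details, and view/column names are all assumptions):

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=mydb"
              user="solr" password="..."/>
  <document>
    <!-- one Solr document per row of a flattening view -->
    <entity name="product" query="SELECT id, name, sku FROM products_view">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="sku" name="sku"/>
    </entity>
  </document>
</dataConfig>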
I'm new to CouchDB and I know my mindset is probably still too much in the relational DB sphere, but here goes:
It appears that querying on Couch is all done via Views. I read that temporary views are very inefficient and should be avoided in production.
So my question really is: how would one do effective querying with parameters (as the views do not accept them)? For example, if I were to use Couch to power a blog site, would I have to create a new view for each post, equivalent to 'SELECT post FROM posts WHERE id = 1'?
I understand that I can use Lucene alongside the querying to perform a full-text search on the results, but this is only really useful for textual content, not numbers.
I'm happy creating a boatload of static views, as they can be created very simply on the fly. My worry is that this is not how Couch was supposed to be used and that I'm missing something. Feel free to enlighten me.
Cheers, Chris.
Views do accept URL parameters, key being the one you're looking for. You can even limit how many rows you get back, and sort as well.
Your views can be indexed by arbitrary JSON keys. This means you can create a view that emits documents like so: [username, docid] => doc. Then you can query this view with http://url/to/view?key=["username","docid"].
You could create a view that emits [username, type, date] => doc. Now you can get all documents of a certain type between two dates, using the startkey and endkey URL parameters.
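A minimal sketch of such a view's map function and a query against it (the design doc, view, and field names are assumptions):

// map function of a hypothetical by_user_type_date view
function (doc) {
  if (doc.username && doc.type && doc.date) {
    emit([doc.username, doc.type, doc.date], doc);
  }
}

// then, to get chris's posts for 2009:
GET /blog/_design/app/_view/by_user_type_date?startkey=["chris","post","2009-01-01"]&endkey=["chris","post","2009-12-31"]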
Your example of the blog is one that CouchDB is particularly well suited for. In fact, I believe it's an example in the upcoming CouchDB book from O'Reilly.
That said, some kinds of queries are not easily handled by CouchDB alone. couchdb-lucene can help here. Don't assume that it's only good for full-text search; I've been using it to run general complex queries against the database to good effect.