Get number of rows from SELECT query _before_ any fetch - cx-oracle

With psycopg2, you have psycopg2.cursor.rowcount, which gives you the number of rows produced by your query before any fetch on the cursor.
It lets you quickly tell whether the cursor returned any results.
In cx_Oracle, cursor.rowcount has a different meaning: it gives you the number of rows that have been fetched so far.
What is the Oracle equivalent of psycopg2's rowcount?
Or do you have a method to quickly check whether a query returned any rows? (Google didn't help me on this point!)
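For illustration, one common workaround is to fetch the first row and test it for None; here is a minimal sketch (the connect string and query are placeholders):

import cx_Oracle

conn = cx_Oracle.connect("user/password@dsn")  # placeholder credentials
cursor = conn.cursor()
cursor.execute("SELECT * FROM employees")      # placeholder query
first = cursor.fetchone()                      # None means the query returned no rows
if first is None:
    print("no rows")
else:
    print("query returned data; first row:", first)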

Related

Will this QLDB SELECT query using an index be optimized or not?

I have one table with some indexes and several columns. I am running this query to get the result of a single user's data, not all of the user data.
The user field is defined by UID.
The time field is defined by TimeValue (I store time as a number so values can be compared).
The data I want is defined by the Cause column.
This is my query. Will it do a table scan, or will it only search the rows where UID is the given one?
UID, TimeValue, and Cause are indexed as well.
SELECT Cause, Cause_Amount, UID FROM Contribution WHERE UID = 'u5JvslEo9DbQ7hcq4vkM74dWlxr2' AND TimeValue > 1620414948000 AND ( Cause = 'cleanAir' OR Cause = 'safeWater')
One way to check what QLDB is doing is to look at the ReadIOs value that comes back in your query stats. If ReadIOs matches the total number of documents in the table, the query is likely doing a scan; if it's lower, then you are using the index.
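As a rough sketch of reading those stats from the Python driver (pyqldb), assuming a ledger named my-ledger and a driver version that exposes get_consumed_ios() on the cursor (verify both against your setup):

from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="my-ledger")  # placeholder ledger name

def read_contributions(executor):
    cursor = executor.execute_statement(
        "SELECT Cause, Cause_Amount, UID FROM Contribution "
        "WHERE UID = ? AND TimeValue > ? AND (Cause = 'cleanAir' OR Cause = 'safeWater')",
        'u5JvslEo9DbQ7hcq4vkM74dWlxr2', 1620414948000)
    rows = list(cursor)              # consume the result set first
    ios = cursor.get_consumed_ios()  # e.g. {'ReadIOs': 12}
    print("ReadIOs:", ios.get('ReadIOs'), "rows returned:", len(rows))
    return rows

driver.execute_lambda(read_contributions)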

Using ADODB SQL in VBA, why are Strings truncated [to 255] only when I use grouping?

I’m using ADODB to query Sheet1. If I fetch the data using the SQL query below, without grouping, I get all the characters from Comment.
However, if I use grouping, my characters are truncated to 255.
Note: my first row contains a comment 800 characters long, so the driver has identified the datatype correctly.
Here is my query without grouping:
Select Product, Value, Comment, len(comment) from [sheet1$A1:T10000]
With grouping:
Select Product, sum(value), Comment, len(comment) from [sheet1$A1:T10000] group by Product, Comment
Thanks for posting this! In my 20+ years of database development using ADO recordsets I had never faced this issue until this week. Once I traced the truncation to the recordset I was really scratching my head; I couldn't figure out how or why it was happening until I found your post, which got me focused on the GROUP BY. Sure enough, that was the cause (some kind of ADO bug, I guess). I was able to work around it by putting correlated scalar subqueries in the SELECT list instead of using JOIN and GROUP BY.
To elaborate...
At least 9 times out of 10 (in my experience) JOIN/GROUP BY syntax can be replaced with correlated scalar subquery syntax, with no appreciable loss of performance. That's fortunate in this case since there is apparently a bug with ADO recordset objects whereby GROUP BY syntax results in the truncation of text when the string length is greater than 255 characters.
The first example below uses JOIN/GROUP BY; the second uses a correlated scalar subquery. Both should provide the same results. However, if any comment is longer than 255 characters, these two queries will NOT return the same results when an ADODB recordset is involved.
Note that in the second example the last column in the SELECT list is itself a full select statement. It's called a scalar subquery because it will only return 1 row / 1 column. If it returned multiple rows or columns an error would be thrown. It's also known as a correlated subquery because it references something that is immediately outside its scope (e.emp_number in this case).
SELECT e.emp_number, e.emp_name, e.supv_comments, SUM(i.invoice_amt) As total_sales
FROM employees e INNER JOIN invoices i ON e.emp_number = i.emp_number
GROUP BY e.emp_number, e.emp_name, e.supv_comments
SELECT e.emp_number, e.emp_name, e.supv_comments,
(SELECT SUM(i.invoice_amt) FROM invoices i WHERE i.emp_number = e.emp_number) As total_sales
FROM employees e

Extracting the first 13,000 results of a search query with Custom Search Engine JSON API

I am developing an application (Python 3.x) in which I need to collect the first 13,000 results of a CSE query for a single search keyword (result indexes 1 to 13,000). With the free version of the CSE JSON API (which I have tried), I can only get the first 10 results per query, or 100 results per day by repeating the same query while incrementing the index; beyond that, it returns an error (HttpError 400.....returned Invalid Value) once the result index exceeds 100. Is there any option (paid or free) that I can deploy to achieve the objective?
The Custom Search JSON API is limited to a max depth of 100 results per query, so you'll need to find a different API or devise a way to split the query into smaller result sets.
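For reference, a minimal sketch of walking the first 100 results with the start parameter (the API key and engine ID are placeholders; the 100-result ceiling applies regardless of quota):

import requests

API_KEY = "YOUR_API_KEY"    # placeholder
CX = "YOUR_ENGINE_ID"       # placeholder

def cse_results(query, max_results=100):
    for start in range(1, max_results + 1, 10):  # the API returns 10 results per page
        resp = requests.get(
            "https://www.googleapis.com/customsearch/v1",
            params={"key": API_KEY, "cx": CX, "q": query, "start": start})
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            break
        yield from items

for item in cse_results("example query"):
    print(item["link"])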

Select all from column family not returning all

I'm getting inconsistent results when I attempt to select * from a column family. For fun, I ran the same query in a loop and counted the rows returned. No matter what I do, the count varies (sometimes by as much as +/- 150 rows out of every ~4,000).
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

rSet = session.execute(SimpleStatement('SELECT * FROM colFam',
                                       consistency_level=ConsistencyLevel.QUORUM))
Using this query results in different row counts each time. The table in question isn't going to hold millions of rows, so being able to accurately select * from it is important. Running the query directly in cqlsh yields similar weirdness.
Is there something inherent in CQL/Cassandra that I haven't learned about yet that prevents Cassandra from returning an accurate representation of *? Or is my Google fu just failing me?

How to iterate over a SOLR shard which has over 100 million documents?

I would like to iterate over all of these documents without loading the entire result set into memory, which appears to be what happens: QueryResponse.getResults() returns a SolrDocumentList, which is an ArrayList.
I can't find anything in the documentation. I am using Solr 4.
Note on the background of problem: I need to do this when adding a new SOLR shard to the existing shard cluster. In that case, I would like to move some documents from the existing shards to the newly added shard(s) based on consistent hashing. Our data grows constantly and we need to keep introducing new shards.
You can set the 'rows' and 'start' query params to paginate the result set: query first with start = 0, then start = rows, then start = 2*rows, etc., until you reach the end of the complete result set.
http://wiki.apache.org/solr/CommonQueryParameters#start
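As an illustration, a rough sketch of that start/rows loop against Solr's HTTP select handler (the URL, page size, and sort field are placeholders; very deep start values get slow on large shards):

import requests

SOLR_URL = "http://localhost:8983/solr/collection1/select"  # placeholder
PAGE = 1000

def iter_docs(query="*:*"):
    start = 0
    while True:
        resp = requests.get(SOLR_URL, params={
            "q": query, "wt": "json", "start": start, "rows": PAGE,
            "sort": "id asc",  # a stable sort keeps pages consistent
        })
        resp.raise_for_status()
        docs = resp.json()["response"]["docs"]
        if not docs:
            break
        yield from docs
        start += PAGE

for doc in iter_docs():
    pass  # process each document without holding the full set in memory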
I have a possible solution I'm testing, pasted from "Solr paging 100 Million Document result set":
I am trying to do deep paging of very large result sets (e.g., over 100 million documents) using a separate indexed integer field into which I insert a random value (between 0 and some known MAXINT). When querying large result sets, I do the initial field query with no rows returned; then, based on the count, I divide the range 0 to MAXINT so as to get on average PAGE_COUNT results, and run the query again over each sub-range of the random variable, grabbing all the rows in that range. Obviously the actual number of rows per range will vary, but it should follow a predictable distribution.
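A sketch of that random-field range idea, assuming each document was indexed with a random integer field (called rand_i here, an illustrative name) drawn uniformly from [0, MAXINT):

import requests

SOLR_URL = "http://localhost:8983/solr/collection1/select"  # placeholder
MAXINT = 2**31 - 1
PAGE_COUNT = 1000

def count_matches(query):
    # Initial query with no rows returned, just the total count.
    resp = requests.get(SOLR_URL, params={"q": query, "wt": "json", "rows": 0})
    return resp.json()["response"]["numFound"]

def iter_by_random_ranges(query="*:*"):
    total = count_matches(query)
    n_ranges = max(1, total // PAGE_COUNT)  # aim for ~PAGE_COUNT docs per range
    step = MAXINT // n_ranges + 1
    for lo in range(0, MAXINT, step):
        resp = requests.get(SOLR_URL, params={
            "q": query, "wt": "json",
            "rows": PAGE_COUNT * 2,  # headroom, since per-range counts vary
            "fq": "rand_i:[%d TO %d]" % (lo, lo + step - 1),
        })
        yield from resp.json()["response"]["docs"]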
