Google Site Search XML API Pagination

I am using the Google Site Search XML API and want to do pagination. I know that the result count it returns is considered inaccurate, but how does Google implement paging on the demo site at http://www.google.com/sitesearch/? It at least seems to know accurately whether there are more than 35 results to break into the 8 pages.

This is an old question but I've just implemented this myself so thought I should share.
Not sure what language you are using, but here's how I did it in PHP ($xml is, of course, the full XML result retrieved using curl or file_get_contents or whatever):
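$results_per_page = 8;
// M holds the estimated total number of results
$pages = ceil((int) $xml->RES->M / $results_per_page);
// offset of the page currently shown; defaults to the first page
$start = isset($_GET['s']) ? (int) $_GET['s'] : 0;
if ($pages > 1) {
    for ($i = 0; $i < $pages; $i++) {
        $class = '';
        if ($i * $results_per_page == $start) {
            $class = 'current-page';
        }
        // parenthesize the addition: before PHP 8, "." and "+" had equal precedence
        echo '<strong class="' . $class . '">' . ($i + 1) . '</strong>';
    }
}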
Note that $results_per_page should match the value of num in the XML URL that's fetched.
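If it helps, the page arithmetic above can be pulled out of the markup so it is easier to check. This is only a sketch: buildPages() is a hypothetical helper name, the 20-result total is made up for the demo, and the s parameter is the same offset the loop above reads from $_GET.

```php
<?php
// Hypothetical helper: lists the pages for a result set.
// $total is the estimated result count, $perPage matches num,
// $currentStart is the offset of the page being shown.
function buildPages(int $total, int $perPage, int $currentStart): array
{
    $perPage = max(1, $perPage); // guard against division by zero
    $pageCount = (int) ceil($total / $perPage);
    $pages = [];
    for ($i = 0; $i < $pageCount; $i++) {
        $start = $i * $perPage;
        $pages[] = [
            'label'   => $i + 1,
            'start'   => $start,
            'current' => $start === $currentStart,
        ];
    }
    return $pages;
}

// A missing or non-numeric s is treated as the first page.
$currentStart = isset($_GET['s']) ? max(0, (int) $_GET['s']) : 0;
foreach (buildPages(20, 8, $currentStart) as $page) {
    echo ($page['current'] ? '[' . $page['label'] . ']' : $page['label']) . ' ';
}
```

Because the total is only an estimate, the last page or two may turn out to be empty; the sketch does not try to correct for that.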

Related

Extracting all text from a website to build a concordance

How can I grab all the text in a website, and I don't just mean ctrl+a/c. I'd like to be able to extract all the text from a website (and all the pages associated) and use it to build a concordance of words from that site. Any ideas?
I was intrigued by this so I've written the first part of a solution to this.
The code is written in PHP because of the convenient strip_tags function. It's also rough and procedural, but I feel it demonstrates my ideas.
<?php
$url = "http://www.stackoverflow.com";
//To use this you'll need to get a key for the Readability Parser API http://readability.com/developers/api/parser
$token = "";
//I make an HTTP GET request to the Readability API and then decode the returned JSON
//(the target URL is encoded so it survives being passed as a query parameter)
$parserResponse = json_decode(file_get_contents("http://www.readability.com/api/content/v1/parser?url=" . urlencode($url) . "&token=$token"));
//I'm only interested in the content string in the json object
$content = $parserResponse->content;
//I strip the HTML tags for the article content
$wordsOnPage = strip_tags($content);
$wordCounter = array();
$wordSplit = explode(" ", $wordsOnPage);
//I then loop through each word in the article keeping count of how many times I've seen the word
foreach($wordSplit as $word)
{
incrementWordCounter($word);
}
//Then I sort the array so the most frequent words are at the end
asort($wordCounter);
//And dump the array
var_dump($wordCounter);
function incrementWordCounter($word)
{
global $wordCounter;
if(isset($wordCounter[$word]))
{
$wordCounter[$word] = $wordCounter[$word] + 1;
}
else
{
$wordCounter[$word] = 1;
}
}
?>
I needed to do this to configure PHP for the SSL the readability API uses.
The next step in the solution would be to search for links in the page and call this recursively in an intelligent way to handle the associated pages requirement.
Also, the code above just gives the raw data of a word count; you would want to process it some more to make it meaningful.
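As one possible take on that extra processing, here is a rough sketch that lowercases the text, splits on punctuation as well as whitespace, and sorts with the most frequent words first. buildWordCounts() is a hypothetical name, and strtolower() only handles ASCII letters, so treat it as a starting point rather than a finished concordance.

```php
<?php
// Hypothetical helper: turns raw text into a word => count map.
function buildWordCounts(string $text): array
{
    // Lowercase so "The" and "the" are counted together (ASCII only).
    $text = strtolower($text);
    // Split on anything that is not a letter, a digit, or an apostrophe.
    $words = preg_split('/[^\p{L}\p{N}\']+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // preg_split fails on invalid UTF-8 input
    }
    $counts = array_count_values($words);
    // Most frequent words first.
    arsort($counts);
    return $counts;
}

print_r(buildWordCounts("The cat and the hat. THE end!"));
```

Dropping very common words (a stop list) would be a natural next step, but which words to drop depends on the site.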

Propel Nested Set in combination with Pagination

$root = AnimeCommentQuery::create()->findRoot(2);
$html = "<ul>{$root->getComment()}";
foreach ($root->getDescendants() as $post)
{
$html .= '<li style="padding-left: '.$post->getLevel().'em;">';
$html .= $post->getComment();
$html .= ' by '.$post->getIbfMembersRelatedByInsertBy()->getName();
$html .= "</li>";
}
$html .= "</ul>";
echo $html;
I want to paginate the posts but I am not able to do this by:
$root = AnimeCommentQuery::create()->findRoot(2)->paginate(2, 1);
OR
$root = AnimeCommentQuery::create()->paginate(2, 1)->findRoot(2);
Can it be done with standard Pagination from propel? And how?
Don't know if this is too late....
First off, you can't use paginate() and find() in the same query; they're both termination methods.
I think what you need is something like this:
$comments = AnimeCommentQuery::create()->inTree(2)->orderByBranch()->paginate(2,1);
Then foreach your way through that Collection.
Now you'll have to be a bit clever with when to close and open lists, checking current Level etc. And top and bottom of page 2+ will take a bit of consideration too. Good Luck!
The Nested Set API is worth studying further (http://www.propelorm.org/behaviors/nested-set.html#complete_api); there's a fair bit in it.
Also consider using a ->joinWith() to get your getIbfMembersRelatedByInsertBy() prepopulated in the main query.
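The open/close bookkeeping mentioned above might look roughly like this. It is a sketch on plain arrays rather than Propel objects: renderNestedList() is a hypothetical helper, and each row is assumed to carry the values you would get from getLevel() and getComment(), already in branch order. A page that starts deep in the tree gets empty wrapper items so the markup stays balanced.

```php
<?php
// Hypothetical helper: renders branch-ordered rows of
// ['level' => int, 'text' => string] as nested <ul> markup.
function renderNestedList(array $rows): string
{
    if ($rows === []) {
        return '';
    }
    // Treat the shallowest level on this page as depth 0.
    $base = min(array_column($rows, 'level'));
    $html = '';
    $depth = -1; // depth of the innermost open list, -1 = none open yet
    foreach ($rows as $row) {
        $target = $row['level'] - $base;
        if ($target > $depth) {
            // Going deeper: open lists, adding empty wrapper items
            // for any levels that are skipped (e.g. at the top of page 2+).
            while ($depth < $target) {
                $html .= '<ul>';
                $depth++;
                if ($depth < $target) {
                    $html .= '<li>';
                }
            }
        } else {
            // Same level or shallower: close the previous item,
            // then any lists that are deeper than the target.
            $html .= '</li>';
            while ($depth > $target) {
                $html .= '</ul></li>';
                $depth--;
            }
        }
        $html .= '<li>' . htmlspecialchars($row['text']);
    }
    // Close whatever is still open.
    $html .= '</li>';
    while ($depth > 0) {
        $html .= '</ul></li>';
        $depth--;
    }
    return $html . '</ul>';
}

echo renderNestedList([
    ['level' => 0, 'text' => 'A'],
    ['level' => 1, 'text' => 'B'],
    ['level' => 1, 'text' => 'C'],
    ['level' => 0, 'text' => 'D'],
]);
```

With a paginated collection you would build the rows in the foreach over the page's results and pass them in.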

Can I use pagination with the $wpdb class?

Let's say I want to use the $wpdb class to retrieve image locations from the database so I can then create a gallery of images. I have this code, but when I press 'next', the link doesn't seem to go anywhere. Am I missing something?
<?php
global $wpdb;
$wpdb->show_errors();
$offset = 0;
if( isset($_GET['page']) && !empty($_GET['page']) ){
$offset = ($_GET['page']-1) * 10; // (page 2 - 1)*10 = offset of 10
}
$pics = $wpdb->get_col("SELECT pic_thumb_url FROM wp3_bp_album
WHERE owner_type = 'user' ORDER BY title DESC
LIMIT 10 OFFSET $offset"
);
//LIMIT shows 10 results per page
//OFFSET will 'skip' this number of results. On page 1 the offset is 0; on page 2 it is 10 (if 10 results per page)
foreach($pics as $pic) :
echo '<img src="' . esc_url($pic) . '" alt="" />';
endforeach;
/*
pagination
*/
?>
previous
next
Can I implement pagination with this?
You can't use the built-in pagination, but you can paginate by using an OFFSET and a LIMIT in your query. So, make your own pagination:
<?php
$offset = 0;
if(isset($_GET['page']) && !empty($_GET['page'])) {
$offset = ($_GET['page']-1) * 10; // (page 2 - 1)*10 = offset of 10
}
$wpdb->get_col("SELECT pic_thumb_url FROM wp3_bp_album
WHERE owner_type = 'user' ORDER BY title DESC
LIMIT 10 OFFSET $offset"
);
//LIMIT shows 10 results per page
//OFFSET will 'skip' this number of results. On page 1 the offset is 0; on page 2 it is 10 (if 10 results per page)
/*
pagination
*/
?>
previous
next
It's not flawless: it doesn't check whether you are on the first or last page, so those checks are something you'll have to build yourself.
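A minimal sketch of those first/last page checks, assuming you also know the total row count (in WordPress that could come from a SELECT COUNT(*) run through $wpdb->get_var()). paginationState() is a hypothetical helper and the total of 35 is made up for the demo.

```php
<?php
// Hypothetical helper: works out the offset and which links to show.
function paginationState(int $page, int $totalRows, int $perPage = 10): array
{
    $lastPage = max(1, (int) ceil($totalRows / $perPage));
    // Clamp the requested page into the valid range.
    $page = min(max(1, $page), $lastPage);
    return [
        'page'   => $page,
        'offset' => ($page - 1) * $perPage,
        'prev'   => $page > 1 ? $page - 1 : null,         // null on the first page
        'next'   => $page < $lastPage ? $page + 1 : null, // null on the last page
    ];
}

$state = paginationState(isset($_GET['page']) ? (int) $_GET['page'] : 1, 35);
if ($state['prev'] !== null) {
    echo '<a href="?page=' . $state['prev'] . '">previous</a> ';
}
if ($state['next'] !== null) {
    echo '<a href="?page=' . $state['next'] . '">next</a>';
}
```

The 'offset' value can then be used in the LIMIT ... OFFSET query in place of the hand-computed one.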

Zend_Search_Lucene - weird behaviour for Wildcard search and "numeric" string

Weird issue I get every time I search "11.11" or "22.22" etc... No problem if I search for "aa.aa" but when I put only integers into my string, I get the following exception:
Wildcard search is supported only for non-multiple word terms
My implementation of Zend search is as below (ZF 1.11):
Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive()
);
$index = Zend_Search_Lucene::open(APPLICATION_PATH.'/../var/search');
if(str_word_count($searchQuery) > 1){
$searchQuery = Zend_Search_Lucene_Search_QueryParser::escape($searchQuery);
$searchQueryArray = explode(' ', $searchQuery);
$query = new Zend_Search_Lucene_Search_Query_Phrase($searchQueryArray);
}else{
$searchQuery = Zend_Search_Lucene_Search_QueryParser::escape($searchQuery);
$query = Zend_Search_Lucene_Search_QueryParser::parse(
'title:*'.$searchQuery.'* OR
description:*'.$searchQuery.'* OR
content:*'.$searchQuery.'*'
);
}
$result = $index->find($query);
I can't find any related issue online, so please let me know if you've ever run into something similar. Thank you.

Searching only in undeleted documents using zend lucene

I am not new to Zend Lucene, but I am having trouble searching with it.
I search documents by number using the code below:
$term = new Zend_Search_Lucene_Index_Term($id, $idFieldName);
$docIds = $index->termDocs($term);
foreach ($docIds as $id) {
$doc = $index->getDocument($id);
echo $doc->artist_name;
}
$index->commit();
and I delete a document by number using the code below:
$term = new Zend_Search_Lucene_Index_Term($id, $idFieldName);
$docIds = $index->termDocs($term);
foreach ($docIds as $id) {
$doc = $index->getDocument($id);
$index->delete($doc->lyric_id);
}
$index->commit();
When I delete a document, $index->numDocs() shows that it has been deleted, because the returned value no longer equals the value returned by $index->count(). The problem is that after deleting the document I can still find it in searches and display the values of its fields.
I checked again after optimizing the index, but the problem persists. I need either to remove a document completely or to search only the documents that have not been deleted from the index.
Loop through the search results and check if the document is deleted. If it is, remove it from the search results.
Zend_Search_Lucene::isDeleted($id) method may be used to check if a document is deleted.
for ($count = 0; $count < $index->maxDoc(); $count++) {
    if ($index->isDeleted($count)) {
        // $count is the internal document id being checked
        echo "Document #$count is deleted.\n";
    }
}
via Building Indexes: Updating Documents
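Applied to the ids returned by termDocs(), the same check could be wrapped up as below. This is a sketch: filterUndeleted() is a hypothetical helper, and FakeIndex is a stand-in that exists only so the example runs on its own; with a real index you would pass the Zend_Search_Lucene object, which provides isDeleted().

```php
<?php
// Hypothetical helper: keeps only the ids the index does not report as deleted.
// $index is assumed to expose isDeleted($id), as Zend_Search_Lucene does.
function filterUndeleted($index, array $docIds): array
{
    $live = [];
    foreach ($docIds as $id) {
        if (!$index->isDeleted($id)) {
            $live[] = $id;
        }
    }
    return $live;
}

// Stand-in for a real index, used only to make the sketch runnable.
class FakeIndex
{
    private $deleted;

    public function __construct(array $deleted)
    {
        $this->deleted = $deleted;
    }

    public function isDeleted($id)
    {
        return in_array($id, $this->deleted, true);
    }
}

print_r(filterUndeleted(new FakeIndex([2]), [1, 2, 3]));
```

In the search loop from the question, that would mean calling getDocument() only for the ids that survive the filter.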

Resources