I am building a scraper for educational purposes only. I use the PHasher class to generate hexadecimal hashes, store them in the database, and then search the stored images for similar ones. A few days ago I wrote code to display the results of the similarity search, but I can't figure out why it shows results only on the first page. The other pages show nothing, and when I click back to page 1 it shows nothing either. The results themselves and the number of generated pages are correct. I am new to PHP and learning by doing, so any help is appreciated. Thanks in advance.
This is index.php
<?php include('header.php'); ?>
<nav class="navbar navbar-default">
<div class="container-fluid">
<div class="navbar-header">
<a class="navbar-brand" href="index.php">
<img alt="FBpp logo" src="images/logo.png">
<div class="container"><!--container-->
<h3>Search Facebook Profiles Pictures For Similar Pictures.</h3>
<p>Please upload a picture, Allowed extensions are (jpg, jpeg, pjpeg, png, x-png) and maximum size is 5 Mb...</p>
//Require config.php file to connect with mysql server and the db.
//Check if the database is empty or if there are hashed pictures then show the number of hashed pictures.
$check = mysqli_query($con, "SELECT id FROM images ORDER BY id DESC LIMIT 1;");
if(mysqli_num_rows($check) > 0){
$max_id = mysqli_fetch_row($check);
$id = $max_id[0];
echo 'We scraped '; echo '<span class="bg-info">'.$id.'</span>'; echo ' pictures...';
echo 'The database is empty you need to run scraper.php';
<br /><br />
<form action="search.php" method="post" class="form-inline reset-margin" enctype="multipart/form-data">
<div class="form-group">
<input type="file" name="image" class="file-input">
<button type="submit" name="submit" class="btn btn-primary"><span class="glyphicon glyphicon-search" aria-hidden="true"></span></button>
<br />
<?php include('footer.php'); ?>
This is search.php
//Require config.php file to connect with mysql server and the db.
$I = PHasher::Instance();
$limit = ( isset( $_GET['limit'] ) ) ? $_GET['limit'] : 10;
$page = ( isset( $_GET['page'] ) ) ? $_GET['page'] : 1;
$links = ( isset( $_GET['links'] ) ) ? $_GET['links'] : 7;
$allowedExts = array('jpg', 'jpeg', 'pjpeg', 'png', 'x-png');
$temp = explode(".", $_FILES["image"]["name"]);
$extension = end($temp);
//Check that the extension of the uploaded picture is allowed and that its size is at most 5*1024*1024 bytes (5 MB).
if((($_FILES["image"]["type"] == "image/jpg")
|| ($_FILES["image"]["type"] == "image/jpeg")
|| ($_FILES["image"]["type"] == "image/pjpeg")
|| ($_FILES["image"]["type"] == "image/png")
|| ($_FILES["image"]["type"] == "image/x-png"))
&& ($_FILES["image"]["size"] <= 5242880)
&& in_array($extension, $allowedExts)){
//Check if there is an error in the file, If not upload it to tmp folder then check db for similar pictures.
if($_FILES["image"]["error"] > 0){
echo "Return Code: " .$_FILES["image"]["error"]."<br />";
} else {
move_uploaded_file($_FILES["image"]["tmp_name"], dirname(__file__)."/tmp/".$_FILES["image"]["name"]);
$uploadedImage = dirname(__file__)."/tmp/".$_FILES["image"]["name"];
if($_FILES["image"]["size"] > 0){
$hash = $I->FastHashImage($uploadedImage);
$hex = $I->HashAsString($hash);
$query = "SELECT `fid`,`hash` FROM `images` WHERE `hash` LIKE '%".$hex."%'";
$queryResult = mysqli_query($con, $query);
$numrows = mysqli_num_rows($queryResult);
echo "<p>" .$numrows. " results found for " .$_FILES['image']['name']. "</p><br />";
$Paginator = new Paginator( $con, $query );
$results = $Paginator->getData( $limit, $page );
//Loop through result set.
/*while($row = mysqli_fetch_array($selectQuery)){
if($row['hash'] == $hex){
$fid = $row['fid'];
echo "<a href='https://www.facebook.com/$fid/' target='_blank'><img src='http://localhost/fbpp/test_pics/$fid.jpg' alt='' class='img-responsive'></a><br />";
// echo "<a href='https://www.facebook.com/$fid/' target='_blank'><img src='https://graph.facebook.com/$fid/picture?type=large' alt='' class='img-responsive'></a><br />";
echo '<div class="col-md-10 col-md-offset-1">
<table class="table table-striped table-condensed table-bordered table-rounded"><tbody>';
for( $i = 0; $i < count( $results->data ); $i++ ){
if($results->data[$i]["hash"] == $hex){
echo '<tr>';
$fid = $results->data[$i]['fid'];
echo "<td><a href='https://www.facebook.com/$fid/' target='_blank'><img src='http://localhost/fbpp/test_pics/$fid.jpg' alt='' class='img-responsive'></a></td>";
// echo "<td><a href='https://www.facebook.com/$fid/' target='_blank'><img src='https://graph.facebook.com/$fid/picture?type=large' alt='' class='img-responsive'></a></td>";
echo '</tr>';
if($numrows <= 10)
echo "";
} else {
echo '</tbody></table>';
echo $Paginator->createLinks( $links, 'pagination pagination-sm' );
echo '</div>';
//Else after checking the file size.
else {
echo "Picture is corrupted the size is 0";
} //Else after error check.
// This else runs when the picture extension or max size check fails.
else {
echo "<p>Please Upload A Picture, Max. size is 5 Mb.</p>";
This is pagination class if you want to look at it:
class Paginator {
private $_conn;
private $_limit;
private $_page;
private $_query;
private $_total;
public function __construct( $conn, $query ) {
$this->_conn = $conn;
$this->_query = $query;
$rs= $this->_conn->query( $this->_query );
$this->_total = $rs->num_rows;
public function getData( $limit = 10, $page = 1 ) {
$this->_limit = $limit;
$this->_page = $page;
if ( $this->_limit == 'all' ) {
$query = $this->_query;
} else {
$query = $this->_query . " LIMIT " . ( ( $this->_page - 1 ) * $this->_limit ) . ", $this->_limit";
$rs = $this->_conn->query( $query );
while ( $row = $rs->fetch_assoc() ) {
$results[] = $row;
$result = new stdClass();
$result->page = $this->_page;
$result->limit = $this->_limit;
$result->total = $this->_total;
$result->data = $results;
return $result;
public function createLinks( $links, $list_class ) {
if ( $this->_limit == 'all' ) {
return '';
$last = ceil( $this->_total / $this->_limit );
$start = ( ( $this->_page - $links ) > 0 ) ? $this->_page - $links : 1;
$end = ( ( $this->_page + $links ) < $last ) ? $this->_page + $links : $last;
$html = '<ul class="' . $list_class . '">';
$class = ( $this->_page == 1 ) ? "disabled" : "";
$html .= '<li class="' . $class . '">«</li>';
if ( $start > 1 ) {
$html .= '<li>1</li>';
$html .= '<li class="disabled"><span>...</span></li>';
for ( $i = $start ; $i <= $end; $i++ ) {
$class = ( $this->_page == $i ) ? "active" : "";
$html .= '<li class="' . $class . '">' . $i . '</li>';
if ( $end < $last ) {
$html .= '<li class="disabled"><span>...</span></li>';
$html .= '<li>' . $last . '</li>';
$class = ( $this->_page == $last ) ? "disabled" : "";
$html .= '<li class="' . $class . '">»</li>';
$html .= '</ul>';
return $html;
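The arithmetic behind getData() and createLinks() can be checked in isolation. The following is a minimal sketch in Python, not part of the original code; the function name page_window is made up for illustration, and it only mirrors the offset, last-page, and link-window calculations shown in the class above.

```python
import math

def page_window(total, limit, page, links):
    """Return (offset, last, start, end) for one page of results."""
    last = math.ceil(total / limit)   # number of pages, as in ceil(_total / _limit)
    offset = (page - 1) * limit       # first argument of "LIMIT offset, count"
    start = page - links if page - links > 0 else 1
    end = page + links if page + links < last else last
    return offset, last, start, end

# 95 matching rows, 10 per page, currently on page 3, 7 links on each side
print(page_window(total=95, limit=10, page=3, links=7))
```

If these numbers look right for a given page but the page is still empty, the query itself is probably different on that request, for example because the value used to build it is no longer available.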
For more information about this you can look at latest commit on my github account:
Thanks in advance I appreciate any help.
After a lot of thinking, reading, and debugging, I rewrote the code from scratch in OOP. I discovered that I need to use sessions, because the image path was empty or incomplete whenever I clicked any page link in the paginator. Here is the complete code, in case someone is looking for an answer to the same problem:
class Search{
function __construct()
* Upload posted image from index.php to tmp dir
* @return string
function uploadImage()
move_uploaded_file($_FILES['image']['tmp_name'], dirname(__file__).'/tmp/'.$_FILES['image']['name']);
$uploadedImage = dirname(__file__).'/tmp/'.$_FILES['image']['name'];
$_SESSION['image'] = $uploadedImage;
return $_SESSION['image'];
function imageHashing()
$I = PHasher::Instance();
$hash = $I->FastHashImage(Search::uploadImage());
$hex = $I->HashAsString($hash);
$query = "SELECT `fid`,`hash` FROM `images` WHERE `hash` LIKE '%".$hex."%'";
//echo $query;
return $query;
function imageResults()
$limit = ( isset( $_GET['limit'] ) ) ? $_GET['limit'] : 10;
$page = ( isset( $_GET['page'] ) ) ? $_GET['page'] : 1;
$links = ( isset( $_GET['links'] ) ) ? $_GET['links'] : 7;
$queryResults = mysqli_query($con, Search::imageHashing());
$numrows = mysqli_num_rows($queryResults);
echo "<p>" .$numrows. " results found.</p><br />";
$Paginator = new Paginator( $con, Search::imageHashing() );
$results = $Paginator->getData( $limit, $page );
for( $i = 0; $i < count( $results->data ); $i++ ){
echo '<tr>';
$fid = $results->data[$i]['fid'];
echo '<td>';
echo "<a href='https://www.facebook.com/$fid/' target='_blank'>https://www.facebook.com/$fid/</a>";
echo "<a href='https://www.facebook.com/$fid/' target='_blank'><img src='https://graph.facebook.com/$fid/picture?type=large' alt='' class='img-responsive'></a>";
$name = 'https://graph.facebook.com/'.$fid.'?fields=name&access_token=748352698603001|94fc98094ca42f974879c56f3229c5e4';
$response = file_get_contents($name);
$user = json_decode($response,true);
echo $user['name'];
echo '</td>';
echo '</tr>';
if($numrows <= 10){
echo "";
} else {
echo '</tbody></table>';
echo $Paginator->createLinks( $links, 'pagination pagination-sm' );
echo '</div>';
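The reason sessions help can be shown with a small language-neutral sketch, written here in Python. Everything in it is an assumption for illustration: fake_hash stands in for PHasher, a plain dict plays the role of $_SESSION, and the sketch stores the computed hash rather than the image path. The point is only that the upload exists on the first request, so whatever the later page requests need must be saved then.

```python
import hashlib

def fake_hash(image_bytes):
    # Hypothetical stand-in for PHasher: any deterministic hex digest works here.
    return hashlib.md5(image_bytes).hexdigest()[:16]

def handle_request(session, db, upload=None, page=1, limit=10):
    """Return the rows for `page`, reusing the hash saved by the first request."""
    if upload is not None:
        # First request (the form POST): the file is only available now.
        session["hex"] = fake_hash(upload)
    hex_hash = session.get("hex")
    if hex_hash is None:
        return []  # a page link was followed without any prior upload
    matches = [fid for fid, h in db if h == hex_hash]
    offset = (page - 1) * limit
    return matches[offset:offset + limit]

session = {}  # plays the role of $_SESSION
db = [(fid, fake_hash(b"cat")) for fid in range(25)]
first = handle_request(session, db, upload=b"cat", page=1)
second = handle_request(session, db, page=2)  # no upload on this request
print(len(first), len(second))
```

Without the saved value, the second call has nothing to search for, which matches the empty pages described in the question.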
I want to import data from an Excel file into a MySQL database using PHP. I have tried the approaches explained in other questions, but none of them worked for me. Please let me know how to import the data into the database using PHP.
Also, where should I place the Excel file to be uploaded? I mean its location on the system.
Method 1: use the LOAD DATA command.
Method 2: use Excel reader.
Method 3: use parseCSV.
Method 4: use fgetcsv (PHP 4, PHP 5).
Please refer to this PHP code:
//table Name
$tableName = "MyTable";
//database name
$dbName = "MyDatabase";
$conn = mysql_connect("localhost", "root", "") or die(mysql_error());
mysql_select_db($dbName) or die(mysql_error());
//get the first row fields
$fields = "";
$fieldsInsert = "";
if (($handle = fopen("test.csv", "r")) !== FALSE) {
if(($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$num = count($data);
$fieldsInsert .= '(';
for ($c=0; $c < $num; $c++) {
$fieldsInsert .=($c==0) ? '' : ', ';
$fieldsInsert .="`".$data[$c]."`";
$fields .="`".$data[$c]."` varchar(500) DEFAULT NULL,";
$fieldsInsert .= ')';
//drop table if exist
if(mysql_num_rows(mysql_query("SHOW TABLES LIKE '".$tableName."'"))>=1) {
mysql_query('DROP TABLE IF EXISTS `'.$tableName.'`') or die(mysql_error());
//create table
$sql = "CREATE TABLE `".$tableName."` (
`".$tableName."Id` int(100) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`".$tableName."Id`)
) ";
$retval = mysql_query( $sql, $conn );
if(! $retval )
die('Could not create table: ' . mysql_error());
else {
while(($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$num = count($data);
//get field values of each row
for ($c=0; $c < $num; $c++) {
$fieldsInsertvalues .=($c==0) ? '(' : ', ';
$fieldsInsertvalues .="'".$data[$c]."'";
$fieldsInsertvalues .= ')';
//insert the values to table
$sql = "INSERT INTO ".$tableName." ".$fieldsInsert." VALUES ".$fieldsInsertvalues;
echo 'Table Created';
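The same flow (first CSV row becomes the column list, each later row becomes one INSERT) can be sketched with bound parameters instead of string concatenation, so values containing quotes do not break the statement. This is a hedged illustration in Python using the standard csv and sqlite3 modules; SQLite stands in for MySQL here, and the function name import_csv and the sample data are made up.

```python
import csv
import io
import sqlite3

def import_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert the remaining rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Identifiers cannot be bound as parameters, so quote them explicitly.
    quote = lambda name: '"' + name.replace('"', '""') + '"'
    cols = ", ".join(quote(h) for h in header)
    conn.execute(f"DROP TABLE IF EXISTS {quote(table)}")
    conn.execute(
        f"CREATE TABLE {quote(table)} (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        + ", ".join(f"{quote(h)} TEXT" for h in header) + ")"
    )
    placeholders = ", ".join("?" for _ in header)
    sql = f"INSERT INTO {quote(table)} ({cols}) VALUES ({placeholders})"
    # Skip rows whose column count does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
sample = io.StringIO("name,city\nAnn,Oslo\nO'Brien,Cork\n")
print(import_csv(conn, "MyTable", sample))
print(conn.execute('SELECT name, city FROM "MyTable" ORDER BY id').fetchall())
```

An Excel workbook would first need to be saved as CSV (or read with a spreadsheet library) before a loop like this applies.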
Let's start with the basic background. We recently brought our web hosting in-house.
A few old websites still use Perl. I have no experience with Perl.
Let's begin. We have this sub-website on our main domain.
Public link : http://www.gatewayrehab.org/eap/
When you go to the website, you get the following error message:
"Software error:
Can't call method "display" on an undefined value at /var/www/www.gatewayrehab.org/app/webroot/eap/index.cgi line 47."
Looking at the EAP website directory, all files appear to be in place with proper permissions; again, I have no experience with Perl/CGI. Below is the index.cgi file:
#!/usr/bin/perl -w
### the main control file used in the system
BEGIN { unshift @INC, qw(./cgi-bin/include/); }
### send all fatal errors to the browser
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(:standard);
use Error_Handler;
use File_Handler;
use Cookie_Handler;
require "./cgi-bin/setup.cgi";
do "./cgi-bin/include/common.cgi";
### initialize the file handling module
my $File = new File_Handler;
### initialize the cookie handling module
my $Cookie = new Cookie_Handler;
$ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} else {
$buffer = $ENV{'QUERY_STRING'};
@pairs = split(/&/, $buffer);
foreach $pair (@pairs){
($name, $value) = split(/=/, $pair);
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$name =~ tr/+/ /;
$name =~ s/\breq\_//ig;
$name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$name =~ tr/A-Z/a-z/;
$name = trim($name);
$FORM{$name} = trim($value);
my %cookiedata = $Cookie -> get_cookies();
### read the summary database
my $summary_ref = $File -> read($login_summary)|| $Error -> display("$!". __LINE__);
my (@summary) = @$summary_ref;
### read the companies database
my $companies_ref = $File -> read($companies_db)|| $Error -> display("$!". __LINE__);
my (@companies) = @$companies_ref;
my %COMP = ();
foreach (@companies) {
$_ =~ s/\n|\r//g;
my ($c_num, $c_name) = split(/\t/, $_);
$COMP{$c_num} = $c_name;
if ( $cookiedata{'LOGIN'} != 1 ) {
my $found = 0;
my $company_number = $ENV{'REMOTE_USER'};
$company_number =~ s/s|e|w//g;
foreach (@summary) {
$_ =~ s/\n|\r//g;
my @field = split(/\t/, $_);
$field[0] = &trim($field[0]);
$field[2] = &trim($field[2]);
$field[3] = &trim($field[3]);
$field[4] = &trim($field[4]);
$field[5] = &trim($field[5]);
$field[6] = &trim($field[6]);
if ( $field[0] eq "$company_number" ) {
$found = 1;
my $firstletters = substr($ENV{'REMOTE_USER'}, 0, 2);
$firstletters = trim($firstletters);
if ( $firstletters ne "sw" && $firstletters ne "lf" ) {
$firstletters = substr($firstletters, 0, 1);
if ( lc($firstletters) eq "e" ) {
$field[3] = ($field[3] + 1);
} elsif ( lc($firstletters) eq "s" ) {
$field[2] = ($field[2] + 1);
} elsif ( lc($firstletters) eq "w" ) {
$field[4] = ($field[4] + 1);
} elsif ( lc($firstletters) eq "sw" ) {
$field[5] = ($field[2] + 1);
} elsif ( lc($firstletters) eq "lf" ) {
$field[6] = ($field[6] + 1);
} else {
$field[3] = ($field[3] + 1);
$_ = join("\t", @field);
if ( $found == 1 ) {
# write data back to file
# append to summary file
open(LOG, ">$login_summary") || $Error -> display("$!". __LINE__);
foreach (@summary) {
print LOG $_ ."\n";
#$File -> file($login_summary);
#$File -> data(\@summary);
#$File -> write() || $Error -> display("$!". __LINE__);
} else {
$e = 0;
$s = 0;
$w = 0;
$sw = 0;
$lf = 0;
my $firstletters = substr($ENV{'REMOTE_USER'}, 0, 2);
$firstletters = trim($firstletters);
if ( $firstletters ne "sw" && $firstletters ne "lf" ) {
$firstletters = substr($firstletters, 0, 1);
if ( lc($firstletters) eq "e" ) {
$e = 1;
} elsif ( lc($firstletters) eq "s" ) {
$s = 1;
} elsif ( lc($firstletters) eq "w" ) {
$w = 1;
} elsif ( lc($firstletters) eq "sw" ) {
#$sw = 1;
$s = 1;
} elsif ( lc($firstletters) eq "lf" ) {
$lf = 1;
} else {
$e = 1;
# append to summary file
open(LOG, ">>$login_summary") || $Error -> display("$!". __LINE__);
print LOG $company_number ."\t". $COMP{$company_number} ."\t". $s ."\t". $e ."\t". $w . "\t". $sw ."\t". $lf ."\n";
my (@login_logs) = ();
my $logline = "";
$login_logs[0] = $ENV{'REMOTE_USER'};
$login_logs[1] = $ENV{'REMOTE_ADDR'};
$login_logs[2] = time();
open(LOG, ">>$login_logs") || $Error -> display("$!". __LINE__);
print LOG $ENV{'REMOTE_USER'} ."\t". $ENV{'REMOTE_ADDR'} ."\t". time() ."\n";
print "Set-Cookie: LOGIN=1";
print "; path=$cookiepath; domain=$cookiedomain;\n";
my $firstletters = substr($ENV{'REMOTE_USER'}, 0, 2);
$firstletters = trim($firstletters);
if ( $firstletters ne "sw" && $firstletters ne "lf") {
$firstletters = substr($firstletters, 0, 1);
if ( lc($firstletters) eq "e" ) {
print "Location: http://www.gatewayrehab.org/eap/new/employee/member.htm\n\n";
} elsif ( lc($firstletters) eq "s" ) {
print "Location: http://www.gatewayrehab.org/eap/supervisor/\n\n";
} elsif ( lc($firstletters) eq "w" ) {
print "Location: http://www.gatewayrehab.org/eap/new/worklife/member.htm\n\n";
} elsif ( lc($firstletters) eq "sw" ) {
print "Location: http://www.gatewayrehab.org/eap/supervisor-wl/\n\n";
} elsif ( lc($firstletters) eq "lf" ) {
print "Location: http://www.gatewayrehab.org/eap/legalandfinancial/\n\n";
} else {
print "Location: http://www.gatewayrehab.org/eap/new/employee/member.htm\n\n";
#output html
print "Content-type: text/html\n\n";
print "<h1>hello world!</h1>";
$e = `perl -ver`;
$r = `whereis perl5`;
$z = `whereis sendmail`;#
$w = `top`;#
$d = `w`;
print "<pre>perl version:<br>$e<hr>perl path:<br>$r<hr>sendmail path:<br>$z<hr>top:<br>$w<hr>w:<br>$d<hr>environment vars:<br>";##
while (($key, $val) = each %ENV) {
print "$key = $val\n";
$x= 'lowercase';
print "<hr>path tranlsated(NT)<br>$ENV{'PATH_TRANSLATED'}</pre>";
#$x = uc($x);
print "<br>$x";
Please let me know what I am missing. If you need to look at more "included" files let me know.
Also here is the link for our cgi config. http://www.gatewayrehab.org/eap/cgi-bin/cgi.cgi
Thank You.
The error comes from this line: my $summary_ref = $File -> read($login_summary)|| $Error -> display("$!". __LINE__);. It means $Error doesn't exist or its value is undef. And indeed, I don't see such a variable being declared or initialised. Maybe it's supposed to be exported by Error_Handler?
This error is happening when trying to report another error. You could try replacing (if only temporarily) $Error -> display("$!". __LINE__); with die($!) and checking your server's error log for the error message. That said, it's surely "No such file or directory" or "Permission denied", so maybe it's not worth the time to find out the exact message. (Upd: Actually, I think the message will be "redirected" to your browser, so that makes things easier.)
I'm guessing here, but it looks like it's trying to read the file named by $login_summary. I have no idea where this is set (if at all!), so you might want to find out its value, and maybe where it's getting set.
As ikegami pointed out, the error you are seeing indicates that $Error is not being initialized, and looking at the rest of the script, I would guess that what is needed (first of all) is to initialize it in the same manner as the $File and $Cookie variables. Add this line after line 20 in your script:
my $Error = new Error_Handler;
That might give you a nicer error message, but it will probably just tell you what you already discovered when you added your die($!); line: 'No such file or directory'.
Your script also runs (via do) a file called ./cgi-bin/include/common.cgi. Check this file for your $login_summary variable to find out which file it's trying to access.
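Why the message talks about "display" rather than about the file can be illustrated with a language-neutral analogue, written here in Python rather than Perl. All names in it (ErrorHandler, read_file, load) are hypothetical; the sketch only mirrors the shape of `$File->read(...) || $Error->display(...)`, where the fallback runs only when the read fails and itself fails if the handler object was never created.

```python
class ErrorHandler:
    def display(self, message):
        raise SystemExit(f"Software error: {message}")

def read_file(path):
    # Hypothetical reader that returns None on failure, like a false Perl return.
    try:
        with open(path) as fh:
            return fh.readlines()
    except OSError:
        return None

def load(path, error):
    # Mirrors `$File->read(...) || $Error->display(...)`.
    return read_file(path) or error.display(f"cannot read {path}")

try:
    load("/no/such/file", None)            # handler never created
except AttributeError as exc:
    print("masked:", exc)

try:
    load("/no/such/file", ErrorHandler())  # handler created first
except SystemExit as exc:
    print(exc)
```

In the first call the real problem (the unreadable file) is hidden behind the failure of the error reporter; in the second call the underlying problem is reported.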
I have resolved the problem with a quick fix. I don't know why it works, but it does for me.
Here is what I did: after reading online, I found that adding "-w" to the first line (the #! header) of all .cgi files makes it work.
I do hope there is a better method that adds "-w" in one place rather than in every .cgi file.
In short, change #!/usr/bin/perl to #!/usr/bin/perl -w
Thanks all.
When writing queries or running through result sets, I'm constantly having to refer to fields as "field_id_X". I want to believe there is a saner way to go about this than defining a CONST for every field_id/name pair.
define('NAME_FIELD', 'field_id_3');
define('HEIGHT_FIELD', 'field_id_4');
foreach( $result as $row ){
$name = $row[NAME_FIELD]; // :(
Get an array of field IDs and names...
function getFieldReferences() {
$sql = "SELECT field_id, field_name
FROM exp_channel_fields
WHERE site_id = ".$this->EE->config->item('site_id');
$result = $this->EE->db->query($sql);
if ($result->num_rows() > 0) {
$result = $result->result_array();
$finalResult = array();
foreach ($result as $row)
$finalResult[$row["field_id"]] = $row["field_name"];
return $finalResult;
} else {
return false;
Example: converting the fields of a specific entry, $entry_id...
$sql = "SELECT exp_channel_data.*, exp_channel_titles.*, exp_channels.channel_name
FROM exp_channel_data, exp_channel_titles, exp_channels
WHERE exp_channel_data.entry_id = $entry_id
AND exp_channels.channel_id = exp_channel_titles.channel_id
AND exp_channel_titles.entry_id = $entry_id
LIMIT 1";
$result = $this->EE->db->query($sql);
if ($result->num_rows() > 0) {
$result = $result->result_array();
$result = $result[0];
//### Get Field Titles ###
$fieldReferences = getFieldReferences();
//### Replace Field ID reference with name ###
foreach ($result as $key => $value) {
if (substr($key,0,9) == "field_id_") {
$result[$fieldReferences[substr($key,9)]] = $value;
if (substr($key,0,9) == "field_ft_")
}//### End of foreach ###
Convert Member fields to names based on specified member $id...
$sql = "SELECT m_field_id, m_field_name
FROM exp_member_fields";
$result = $this->EE->db->query($sql);
if ($result->num_rows() > 0) {
$memberFields = $result->result_array();
$sql = "SELECT exp_member_data.*, exp_members.email
FROM exp_member_data, exp_members
WHERE exp_member_data.member_id = $id
AND exp_members.member_id = $id
$result = $this->EE->db->query($sql);
if ($result->num_rows() > 0) {
$result = $result->result_array();
$rawMemberDetails = $result[0];
//### Loop through each Member field assigning it the correct name ###
foreach($memberFields as $row)
$memberDetails[ $row['m_field_name'] ] = $rawMemberDetails['m_field_id_'.$row['m_field_id']];
You could look up exp_channel_fields.field_name (or field_label) by the field_id you have from exp_channel_data.
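The lookup idea boils down to loading the id-to-name map once and renaming the keys of each result row. Here is a minimal, language-neutral sketch in Python; the function name rename_fields and the sample values are assumptions for illustration, with the map imagined as coming from exp_channel_fields.

```python
def rename_fields(row, field_names, prefix="field_id_"):
    """Return a copy of `row` with `field_id_N` keys replaced by their names."""
    renamed = {}
    for key, value in row.items():
        if key.startswith(prefix):
            field_id = int(key[len(prefix):])
            # Fall back to the raw key when the id is unknown.
            renamed[field_names.get(field_id, key)] = value
        else:
            renamed[key] = value
    return renamed

field_names = {3: "name", 4: "height"}  # e.g. loaded once from exp_channel_fields
row = {"entry_id": 7, "field_id_3": "Ann", "field_id_4": "170"}
print(rename_fields(row, field_names))
```

Loading the map once per request keeps the extra cost to a single small query, no matter how many rows are renamed.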
Wikipedia uses an "HTML sitemap" to link to every single content page. The huge amount of pages has to be split into lots of groups so that every page has a maximum of ca. 100 links, of course.
This is how Wikipedia does it:
Special: All pages
The whole list of articles is divided into several larger groups which are defined by their first and last word each:
"AAA rating" to "early adopter"
"earth" to "lamentation"
"low" to "priest"
When you click one single category, this range (e.g. "earth" to "lamentation") is divided likewise. This procedure is repeated until the current range includes only ca. 100 articles so that they can be displayed.
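The procedure described above can be sketched as follows. This is a minimal illustration in Python over an in-memory sorted list, not MediaWiki's implementation; the names split_ranges and index_page and the limit of 100 links are assumptions taken from the description.

```python
def split_ranges(titles, max_links=100):
    """Split a sorted list into at most `max_links` contiguous groups."""
    if len(titles) <= max_links:
        return [titles]  # small enough to list directly
    size = -(-len(titles) // max_links)  # ceiling division
    return [titles[i:i + size] for i in range(0, len(titles), size)]

def index_page(titles, max_links=100):
    """Return either the article links or 'first to last' range labels."""
    groups = split_ranges(titles, max_links)
    if len(groups) == 1:
        return list(groups[0])
    return [f"{g[0]} to {g[-1]}" for g in groups]

titles = [f"article{i:05d}" for i in range(25000)]
top = index_page(titles)           # 100 ranges of 250 titles each
sub = index_page(titles[:250])     # clicking the first range splits it again
leaf = index_page(titles[:3])      # small enough: the titles themselves
print(len(top), top[0], len(sub), leaf)
```

Clicking a range simply calls index_page again on the titles inside that range, so the number of clicks needed grows only logarithmically with the number of articles.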
I really like this approach to link lists which minimizes the number of clicks needed to reach any article.
How can you create such an article list automatically?
So my question is how one could automatically create such an index page which allows clicks to smaller categories until the number of articles contained is small enough to display them.
Imagine an array of all article names is given. How would you start to program an index with automatic category splitting?
Array('AAA rating', 'abdicate', ..., 'zero', 'zoo')
It would be great if you could help me. I don't need a perfect solution but a useful approach, of course. Thank you very much in advance!
Edit: Found the part in Wikipedia's software (MediaWiki) now:
* Implements Special:Allpages
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
* http://www.gnu.org/copyleft/gpl.html
* @file
* @ingroup SpecialPage
* Implements Special:Allpages
* @ingroup SpecialPage
class SpecialAllpages extends IncludableSpecialPage {
* Maximum number of pages to show on single subpage.
protected $maxPerPage = 345;
* Maximum number of pages to show on single index subpage.
protected $maxLineCount = 100;
* Maximum number of chars to show for an entry.
protected $maxPageLength = 70;
* Determines, which message describes the input field 'nsfrom'.
protected $nsfromMsg = 'allpagesfrom';
function __construct( $name = 'Allpages' ){
parent::__construct( $name );
* Entry point : initialise variables and call subfunctions.
* @param $par String: becomes "FOO" when called like Special:Allpages/FOO (default NULL)
function execute( $par ) {
global $wgRequest, $wgOut, $wgContLang;
# GET values
$from = $wgRequest->getVal( 'from', null );
$to = $wgRequest->getVal( 'to', null );
$namespace = $wgRequest->getInt( 'namespace' );
$namespaces = $wgContLang->getNamespaces();
( $namespace > 0 && in_array( $namespace, array_keys( $namespaces) ) ) ?
wfMsg( 'allinnamespace', str_replace( '_', ' ', $namespaces[$namespace] ) ) :
wfMsg( 'allarticles' )
if( isset($par) ) {
$this->showChunk( $namespace, $par, $to );
} elseif( isset($from) && !isset($to) ) {
$this->showChunk( $namespace, $from, $to );
} else {
$this->showToplevel( $namespace, $from, $to );
* HTML for the top form
* @param $namespace Integer: a namespace constant (default NS_MAIN).
* @param $from String: dbKey we are starting listing at.
* @param $to String: dbKey we are ending listing at.
function namespaceForm( $namespace = NS_MAIN, $from = '', $to = '' ) {
global $wgScript;
$t = $this->getTitle();
$out = Xml::openElement( 'div', array( 'class' => 'namespaceoptions' ) );
$out .= Xml::openElement( 'form', array( 'method' => 'get', 'action' => $wgScript ) );
$out .= Html::hidden( 'title', $t->getPrefixedText() );
$out .= Xml::openElement( 'fieldset' );
$out .= Xml::element( 'legend', null, wfMsg( 'allpages' ) );
$out .= Xml::openElement( 'table', array( 'id' => 'nsselect', 'class' => 'allpages' ) );
$out .= "<tr>
<td class='mw-label'>" .
Xml::label( wfMsg( 'allpagesfrom' ), 'nsfrom' ) .
" </td>
<td class='mw-input'>" .
Xml::input( 'from', 30, str_replace('_',' ',$from), array( 'id' => 'nsfrom' ) ) .
" </td>
<td class='mw-label'>" .
Xml::label( wfMsg( 'allpagesto' ), 'nsto' ) .
" </td>
<td class='mw-input'>" .
Xml::input( 'to', 30, str_replace('_',' ',$to), array( 'id' => 'nsto' ) ) .
" </td>
<td class='mw-label'>" .
Xml::label( wfMsg( 'namespace' ), 'namespace' ) .
" </td>
<td class='mw-input'>" .
Xml::namespaceSelector( $namespace, null ) . ' ' .
Xml::submitButton( wfMsg( 'allpagessubmit' ) ) .
" </td>
$out .= Xml::closeElement( 'table' );
$out .= Xml::closeElement( 'fieldset' );
$out .= Xml::closeElement( 'form' );
$out .= Xml::closeElement( 'div' );
return $out;
* @param $namespace Integer (default NS_MAIN)
* @param $from String: list all pages from this name
* @param $to String: list all pages to this name
function showToplevel( $namespace = NS_MAIN, $from = '', $to = '' ) {
global $wgOut;
# TODO: Either make this *much* faster or cache the title index points
# in the querycache table.
$dbr = wfGetDB( DB_SLAVE );
$out = "";
$where = array( 'page_namespace' => $namespace );
$from = Title::makeTitleSafe( $namespace, $from );
$to = Title::makeTitleSafe( $namespace, $to );
$from = ( $from && $from->isLocal() ) ? $from->getDBkey() : null;
$to = ( $to && $to->isLocal() ) ? $to->getDBkey() : null;
if( isset($from) )
$where[] = 'page_title >= '.$dbr->addQuotes( $from );
if( isset($to) )
$where[] = 'page_title <= '.$dbr->addQuotes( $to );
global $wgMemc;
$key = wfMemcKey( 'allpages', 'ns', $namespace, $from, $to );
$lines = $wgMemc->get( $key );
$count = $dbr->estimateRowCount( 'page', '*', $where, __METHOD__ );
$maxPerSubpage = intval($count/$this->maxLineCount);
$maxPerSubpage = max($maxPerSubpage,$this->maxPerPage);
if( !is_array( $lines ) ) {
$options = array( 'LIMIT' => 1 );
$options['ORDER BY'] = 'page_title ASC';
$firstTitle = $dbr->selectField( 'page', 'page_title', $where, __METHOD__, $options );
$lastTitle = $firstTitle;
# This array is going to hold the page_titles in order.
$lines = array( $firstTitle );
# If we are going to show n rows, we need n+1 queries to find the relevant titles.
$done = false;
while( !$done ) {
// Fetch the last title of this chunk and the first of the next
$chunk = ( $lastTitle === false )
? array()
: array( 'page_title >= ' . $dbr->addQuotes( $lastTitle ) );
$res = $dbr->select( 'page', /* FROM */
'page_title', /* WHAT */
array ('LIMIT' => 2, 'OFFSET' => $maxPerSubpage - 1, 'ORDER BY' => 'page_title ASC')
$s = $dbr->fetchObject( $res );
if( $s ) {
array_push( $lines, $s->page_title );
} else {
// Final chunk, but ended prematurely. Go back and find the end.
$endTitle = $dbr->selectField( 'page', 'MAX(page_title)',
__METHOD__ );
array_push( $lines, $endTitle );
$done = true;
$s = $res->fetchObject();
if( $s ) {
array_push( $lines, $s->page_title );
$lastTitle = $s->page_title;
} else {
// This was a final chunk and ended exactly at the limit.
// Rare but convenient!
$done = true;
$wgMemc->add( $key, $lines, 3600 );
// If there are only two or less sections, don't even display them.
// Instead, display the first section directly.
if( count( $lines ) <= 2 ) {
if( !empty($lines) ) {
$this->showChunk( $namespace, $from, $to );
} else {
$wgOut->addHTML( $this->namespaceForm( $namespace, $from, $to ) );
# At this point, $lines should contain an even number of elements.
$out .= Xml::openElement( 'table', array( 'class' => 'allpageslist' ) );
while( count ( $lines ) > 0 ) {
$inpoint = array_shift( $lines );
$outpoint = array_shift( $lines );
$out .= $this->showline( $inpoint, $outpoint, $namespace );
$out .= Xml::closeElement( 'table' );
$nsForm = $this->namespaceForm( $namespace, $from, $to );
# Is there more?
if( $this->including() ) {
$out2 = '';
} else {
if( isset($from) || isset($to) ) {
global $wgUser;
$out2 = Xml::openElement( 'table', array( 'class' => 'mw-allpages-table-form' ) ).
<td>' .
$nsForm .
<td class="mw-allpages-nav">' .
$wgUser->getSkin()->link( $this->getTitle(), wfMsgHtml ( 'allpages' ),
array(), array(), 'known' ) .
</tr>" .
Xml::closeElement( 'table' );
} else {
$out2 = $nsForm;
$wgOut->addHTML( $out2 . $out );
* Show a line of "ABC to DEF" ranges of articles
* @param $inpoint String: lower limit of pagenames
* @param $outpoint String: upper limit of pagenames
* @param $namespace Integer (Default NS_MAIN)
function showline( $inpoint, $outpoint, $namespace = NS_MAIN ) {
global $wgContLang;
$inpointf = htmlspecialchars( str_replace( '_', ' ', $inpoint ) );
$outpointf = htmlspecialchars( str_replace( '_', ' ', $outpoint ) );
// Don't let the length runaway
$inpointf = $wgContLang->truncate( $inpointf, $this->maxPageLength );
$outpointf = $wgContLang->truncate( $outpointf, $this->maxPageLength );
$queryparams = $namespace ? "namespace=$namespace&" : '';
$special = $this->getTitle();
$link = $special->escapeLocalUrl( $queryparams . 'from=' . urlencode($inpoint) . '&to=' . urlencode($outpoint) );
$out = wfMsgHtml( 'alphaindexline',
return '<tr><td class="mw-allpages-alphaindexline">' . $out . '</td></tr>';
* @param $namespace Integer (Default NS_MAIN)
* @param $from String: list all pages from this name (default FALSE)
* @param $to String: list all pages to this name (default FALSE)
function showChunk( $namespace = NS_MAIN, $from = false, $to = false ) {
global $wgOut, $wgUser, $wgContLang, $wgLang;
$sk = $wgUser->getSkin();
$fromList = $this->getNamespaceKeyAndText($namespace, $from);
$toList = $this->getNamespaceKeyAndText( $namespace, $to );
$namespaces = $wgContLang->getNamespaces();
$n = 0;
if ( !$fromList || !$toList ) {
$out = wfMsgWikiHtml( 'allpagesbadtitle' );
} elseif ( !in_array( $namespace, array_keys( $namespaces ) ) ) {
// Show errormessage and reset to NS_MAIN
$out = wfMsgExt( 'allpages-bad-ns', array( 'parseinline' ), $namespace );
$namespace = NS_MAIN;
} else {
list( $namespace, $fromKey, $from ) = $fromList;
list( , $toKey, $to ) = $toList;
$dbr = wfGetDB( DB_SLAVE );
$conds = array(
'page_namespace' => $namespace,
'page_title >= ' . $dbr->addQuotes( $fromKey )
if( $toKey !== "" ) {
$conds[] = 'page_title <= ' . $dbr->addQuotes( $toKey );
$res = $dbr->select( 'page',
array( 'page_namespace', 'page_title', 'page_is_redirect' ),
'ORDER BY' => 'page_title',
'LIMIT' => $this->maxPerPage + 1,
'USE INDEX' => 'name_title',
if( $res->numRows() > 0 ) {
$out = Xml::openElement( 'table', array( 'class' => 'mw-allpages-table-chunk' ) );
while( ( $n < $this->maxPerPage ) && ( $s = $res->fetchObject() ) ) {
$t = Title::makeTitle( $s->page_namespace, $s->page_title );
if( $t ) {
$link = ( $s->page_is_redirect ? '<div class="allpagesredirect">' : '' ) .
$sk->linkKnown( $t, htmlspecialchars( $t->getText() ) ) .
($s->page_is_redirect ? '</div>' : '' );
} else {
$link = '[[' . htmlspecialchars( $s->page_title ) . ']]';
if( $n % 3 == 0 ) {
$out .= '<tr>';
$out .= "<td style=\"width:33%\">$link</td>";
if( $n % 3 == 0 ) {
$out .= "</tr>\n";
if( ($n % 3) != 0 ) {
$out .= "</tr>\n";
$out .= Xml::closeElement( 'table' );
} else {
$out = '';
if ( $this->including() ) {
$out2 = '';
} else {
if( $from == '' ) {
// First chunk; no previous link.
$prevTitle = null;
} else {
# Get the last title from previous chunk
$dbr = wfGetDB( DB_SLAVE );
$res_prev = $dbr->select(
array( 'page_namespace' => $namespace, 'page_title < '.$dbr->addQuotes($from) ),
array( 'ORDER BY' => 'page_title DESC',
'LIMIT' => $this->maxPerPage, 'OFFSET' => ($this->maxPerPage - 1 )
# Get first title of previous complete chunk
if( $dbr->numrows( $res_prev ) >= $this->maxPerPage ) {
$pt = $dbr->fetchObject( $res_prev );
$prevTitle = Title::makeTitle( $namespace, $pt->page_title );
} else {
# The previous chunk is not complete, need to link to the very first title
# available in the database
$options = array( 'LIMIT' => 1 );
if ( ! $dbr->implicitOrderby() ) {
$options['ORDER BY'] = 'page_title';
$reallyFirstPage_title = $dbr->selectField( 'page', 'page_title',
array( 'page_namespace' => $namespace ), __METHOD__, $options );
# Show the previous link if it's not the current requested chunk
if( $from != $reallyFirstPage_title ) {
$prevTitle = Title::makeTitle( $namespace, $reallyFirstPage_title );
} else {
$prevTitle = null;
$self = $this->getTitle();
$nsForm = $this->namespaceForm( $namespace, $from, $to );
$out2 = Xml::openElement( 'table', array( 'class' => 'mw-allpages-table-form' ) ).
<td>' .
$nsForm .
<td class="mw-allpages-nav">' .
$sk->link( $self, wfMsgHtml ( 'allpages' ), array(), array(), 'known' );
# Do we put a previous link ?
if( isset( $prevTitle ) && $pt = $prevTitle->getText() ) {
$query = array( 'from' => $prevTitle->getText() );
if( $namespace )
$query['namespace'] = $namespace;
$prevLink = $sk->linkKnown(
htmlspecialchars( wfMsg( 'prevpage', $pt ) ),
$out2 = $wgLang->pipeList( array( $out2, $prevLink ) );
if( $n == $this->maxPerPage && $s = $res->fetchObject() ) {
# $s is the first link of the next chunk
$t = Title::MakeTitle($namespace, $s->page_title);
$query = array( 'from' => $t->getText() );
if( $namespace )
$query['namespace'] = $namespace;
$nextLink = $sk->linkKnown(
htmlspecialchars( wfMsg( 'nextpage', $t->getText() ) ),
$out2 = $wgLang->pipeList( array( $out2, $nextLink ) );
$out2 .= "</td></tr></table>";
$wgOut->addHTML( $out2 . $out );
if( isset($prevLink) or isset($nextLink) ) {
$wgOut->addHTML( '<hr /><p class="mw-allpages-nav">' );
if( isset( $prevLink ) ) {
$wgOut->addHTML( $prevLink );
if( isset( $prevLink ) && isset( $nextLink ) ) {
$wgOut->addHTML( wfMsgExt( 'pipe-separator' , 'escapenoentities' ) );
if( isset( $nextLink ) ) {
$wgOut->addHTML( $nextLink );
$wgOut->addHTML( '</p>' );
/**
 * @param $ns Integer: the namespace of the article
 * @param $text String: the name of the article
 * @return array( int namespace, string dbkey, string pagename ) or NULL on error
 * @static (sort of)
 * @access private
 */
function getNamespaceKeyAndText($ns, $text) {
if ( $text == '' )
return array( $ns, '', '' ); # shortcut for common case
$t = Title::makeTitleSafe($ns, $text);
if ( $t && $t->isLocal() ) {
return array( $t->getNamespace(), $t->getDBkey(), $t->getText() );
} else if ( $t ) {
return null;
# try again, in case the problem was an empty pagename
$text = preg_replace('/(#|$)/', 'X$1', $text);
$t = Title::makeTitleSafe($ns, $text);
if ( $t && $t->isLocal() ) {
return array( $t->getNamespace(), '', '' );
} else {
return null;
Not a great approach, as you don't have a way of stopping when you get to the end of the list. You only want to split the items if there are more items than your maximum (although you may want to add some flexibility there, as you could get to the stage where you have only two items on a page).
I assume that the datasets would actually come from a database, but I am using your $items array for ease of display.
At its simplest, assuming the request comes from a web page that sends the index numbers of the start and end, and that you have checked that those numbers are valid and sanitised:
$itemsPerPage = 50; // constant
$itemStep = (int) floor(($end - $start) / $itemsPerPage); // whole-number step, so it can be used as an array index
if ($itemStep < 1) {
    for ($i = $start; $i < $end; $i++) {
        // display these as individual items
    }
} else {
    for ($i = $start; $i < $end; $i += $itemStep) {
        $to = $i + ($itemStep - 1); // find the end part
        if ($to > $end) {
            $to = $end;
        }
        display_to_from($items[$i], $items[$to]);
    }
}
where the display functions display the links as you want. However, one of the problems with doing it like that is that you may want to adjust the items per page, as you run the risk of having a set of (say) 51 items and ending up with one link from 1 to 49 and another from 50 to 51.
I don't understand why you are arranging it in groups in your pseudocode, as you are going from page to page doing further chops, so you only need the start and end of each section, until you get to the page where all the links will fit.
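A rough sketch of that idea, in case it helps. The function name split_ranges and the $maxLinks parameter are made up for illustration, and it assumes $start and $end are inclusive integer indexes that have already been validated. It rounds the step up so the number of links never exceeds the maximum, which means the last range may come out shorter than the others.

```php
<?php
// Hypothetical helper (not from the code above): split the inclusive index
// range [$start, $end] into at most $maxLinks [from, to] pairs.
function split_ranges(int $start, int $end, int $maxLinks = 50): array
{
    $count = $end - $start + 1;
    $ranges = [];

    if ($count <= $maxLinks) {
        // Everything fits on one page, so each item is its own "range".
        for ($i = $start; $i <= $end; $i++) {
            $ranges[] = [$i, $i];
        }
        return $ranges;
    }

    // Round up so that we never produce more than $maxLinks ranges.
    $step = (int) ceil($count / $maxLinks);
    for ($i = $start; $i <= $end; $i += $step) {
        // Clamp the last range to the end of the list.
        $ranges[] = [$i, min($i + $step - 1, $end)];
    }
    return $ranges;
}

// Top level: 1,000 items become 50 links covering 20 items each.
$top = split_ranges(0, 999);

// Following the first link calls the same function with that range's
// start and end; 20 items fit on one page, so they are listed individually.
$firstPage = split_ranges($top[0][0], $top[0][1]);
```

Each page only needs the start and end it was called with, so nothing has to be grouped up front.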
-- edit
The original was wrong. Now you divide the number of items you have to go through by the maximum number of items you want to display. If it is 1,000, this will be listing every 20 items; if it is 100,000, it will be every 2,000. If it is less than the number you show, you can show them all individually.
-- edit again - to add some more about the database
No, you are right, you don't want to load 2,000,000 data records, and you don't have to.
You have two options: you can make a prepared statement such as "select * from articles where article = ?" and loop through the results, getting one at a time, or, if you want to do it in one block, assuming a MySQL database and the code above:
$numberArray = "";
for ($i = $start; $i < $end; $i += $itemStep) {
    $to = $i + ($itemStep - 1); // find the end part
    if ($to > $end) {
        $to = $end;
    }
    // display_to_from($items[$i], $items[$to]);
    if ($i != $start) {
        $numberArray .= ", "; // .= appends to the string; += would attempt numeric addition
    }
    $numberArray .= $i . ", " . $to;
}
$sqlQuery = "Select * from articles where article_id in (" . $numberArray . ")";
... do the mysql select and go through the results, using alternate rows as the start and end
This gives you a query like 'Select * from articles where article_id in (1,49,50,99,100,149... etc)'
Then process that as a normal result set.
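Here is a sketch of the one-block option using placeholders instead of concatenating the numbers into the SQL. The helper name boundary_ranges and the $pdo variable mentioned in the comments are assumptions for illustration; only the articles table and the article_id column come from the query above. It does not touch a database, it only builds the id list and the SQL string.

```php
<?php
// Hypothetical helper: inclusive [from, to] id pairs covering $start..$end.
function boundary_ranges(int $start, int $end, int $step): array
{
    $step = max(1, $step); // guard against a zero step, which would loop forever
    $ranges = [];
    for ($i = $start; $i <= $end; $i += $step) {
        $ranges[] = [$i, min($i + $step - 1, $end)];
    }
    return $ranges;
}

$ranges = boundary_ranges(1, 200, 50);

// Flatten to a unique id list; a one-item range would otherwise repeat its id.
$ids = array_values(array_unique(array_merge(...$ranges)));

// One "?" per id, so the values are bound rather than pasted into the SQL.
$placeholders = implode(', ', array_fill(0, count($ids), '?'));
$sqlQuery = "SELECT * FROM articles WHERE article_id IN ($placeholders) ORDER BY article_id";

// Assuming a PDO connection in $pdo (not shown), it would run roughly as:
//   $stmt = $pdo->prepare($sqlQuery);
//   $stmt->execute($ids);
```

Because a range with a single item only returns one row, it is probably safer to key the fetched rows by article_id and look up each pair from $ranges than to rely on alternate rows.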
My approach in pseudo-code:
$items = array('air', 'automatic', 'ball', ..., 'yield', 'zero', 'zoo');
$itemCount = count($items);
$itemsPerPage = 50; // constant
$counter = 0;
foreach ($items as $item) {
    $groupNumber = floor($counter / $itemsPerPage);
    // assign $item to group $groupNumber
    $counter++;
}
// repeat this procedure recursively for each of the new groups
Do you think this is a good approach? Can you improve or complete it?