Shane Zentz PHP Programming - Sitemap Generator / Sitemap Maker

 

 

This PHP sitemap generator script by Shane Zentz is meant to be placed in the root directory of the server you want to map. It is in early beta and cannot be guaranteed to work correctly, so use it at your own risk. There are two ways to use it. One is to call it from a web page with include('sitemap-maker2.php'); and comment out the last two lines of the script (the lines that write one HTML and one XML sitemap file to the root directory). The other is to remove the echo statements from the script and simply run it, letting it create the HTML and XML sitemap files in the root directory.

The script body is displayed below and can be modified to fit your needs. Use it as you see fit, but please give me credit when you do.
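As a rough sketch of the first option, a page could pull the generator in along these lines. The page name sitemap.php, the renderSitemap() wrapper and the fallback message are assumptions made for illustration; only the include of sitemap-maker2.php comes from the description above.

```php
<?php
// sitemap.php (hypothetical page name, assumed to sit next to sitemap-maker2.php)

// Hypothetical wrapper: runs the generator if it is present and
// returns true, otherwise prints a short notice and returns false.
function renderSitemap($script)
{
    if (!is_file($script)) {
        echo '<p>Sitemap generator not found.</p>';
        return false;
    }
    include $script;  // the script echoes its headings and list of links here
    return true;
}

renderSitemap(__DIR__ . '/sitemap-maker2.php');
```

With this option, the two file_put_contents lines at the end of the script would be commented out so that the page only prints the links and writes no files.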

 

Here is the script:


<?php
/*
Shane Zentz (c) 2017
sitemap-maker2.php
PHP script that creates HTML and XML sitemaps by reading file names from the root directory and its subdirectories.

The idea is to add this to sitemap.php/html with include('sitemap-maker2.php');
so that it automatically outputs a div/ul with links to the pages on the site.
*/
   
/* KNOWN ISSUES:
   .      if a directory contains a file named file.php.old or file.html.bak, it may still be added to the sitemap
   ..     if a web page contains a blank title tag, it will be skipped by the sitemap maker
   ...    not sure what happens if a title tag is commented out
   ....   the output array needs to be sorted properly and eventually styled
*/

$recount = 0;
$out = getDirContents($_SERVER['DOCUMENT_ROOT']);
$rootDir = $_SERVER['DOCUMENT_ROOT'];

echo '<h1>'.$_SERVER['HTTP_HOST'].'</h1>';
echo '<h3>'.$_SERVER['SERVER_NAME'].'</h3>';
echo $rootDir.'<br>';

// read the contents of directory recursively and get file names, return array with all file names
function getDirContents($dir)
{
  $handle = opendir($dir);
  if ( !$handle ) return array();
  $contents = array();
  while ( ($entry = readdir($handle)) !== false )  // strict check so an entry named "0" does not end the loop early
  {
    if ( $entry=='.' || $entry=='..' ) continue;
    $entry = $dir.DIRECTORY_SEPARATOR.$entry;
    // only add files whose path contains .html or .php, and ignore anything containing .bak
    if ( is_file($entry) && (strpos($entry, '.html') !== false || strpos($entry, '.php') !== false) && strpos($entry, '.bak') === false )
    {
	  $output = str_replace($_SERVER['DOCUMENT_ROOT'], $_SERVER['SERVER_NAME'], $entry);  // replace doc_root name with domain name
	  $contents[] = $output;
    }
    else if ( is_dir($entry) )
    {
	  // recursively get files from directory folders here.....	
      $contents = array_merge($contents, getDirContents($entry));
    }
  }
  closedir($handle);
  return $contents;
}

// get the page title by reading the file (a non-empty title means it is a real web page; an empty string means it is not)
function getPageTitle($hFile)
{
  $titles = '';
  $lines = @file_get_contents($hFile);
  if ( $lines === false ) return $titles;  // unreadable file: treat it as having no title
  if ( strpos($lines, "<title>") !== false )  // check whether the file contains a title tag
  {
    $dom = new DOMDocument;
    @$dom->loadHTML($lines);  // @ suppresses warnings about malformed HTML
    $title = $dom->getElementsByTagName('title');
    if ( $title->length > 0 ) { $titles = $title->item(0)->textContent; }
  }
  // no fclose() here: $hFile is a path, and file_get_contents() opens and closes the file itself
  return $titles;  // return the title text as a string, or '' if there is none
}
  // TODO: sort the list before printing; uncommenting the next line orders the URLs alphabetically
  //sort($out);
  //echo "<!DOCTYPE html><html lang=\"en\"><head><style>body{ul li{list-style-type:none;}a {text-decoration:none;}}</style></head><body>";
  echo "<h1>HTML Site Map</h1><div><ul style=\"list-style-type:none;\">";
  $outString = "<h1>HTML SITE MAP</h1><div><ul style=\"list-style-type:none;\">";
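  $XMLoutString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";  // namespace required by the sitemap protocol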
  
foreach ( $out as $value )
{
  $url = 'https://www.'.$value;  // build the full https:// URL
  // strip the scheme and host again to get the file path relative to the root directory
  $ck = 'https://www.'.$_SERVER['SERVER_NAME'].'/';
  $relPath = str_replace($ck, '', $url);
  // read the file from the document root so the lookup does not depend on the current working directory
  $pageTitle = getPageTitle($_SERVER['DOCUMENT_ROOT'].'/'.$relPath);
  // a non-empty title means the file is a web page, so print it;
  // also make sure the title is shorter than 300 characters (fixes an error where this script itself is printed as a web page)
  if ( (strlen($pageTitle) > 0) && (strlen($pageTitle) < 300) )
  {
    $recount++;
    $safeUrl = htmlspecialchars($url);          // escape &, <, > and quotes for HTML and XML
    $safeTitle = htmlspecialchars($pageTitle);
    echo '<li><a href="'.$safeUrl.'" style="text-decoration:none">'.$safeTitle.'</a></li>';
    $outString .= '<li><a href="'.$safeUrl.'">'.$safeTitle.'</a></li>';
    $XMLoutString .= '<url><loc>'.$safeUrl.'</loc></url>'."\n";  // <loc> holds only the URL; one entry per line
  }
}
  echo "</ul></div>";  // close the list before printing the total
  echo "<h2>Total Files: ".$recount."</h2>";
  $outString .= "</ul></div>";
  $XMLoutString .= "</urlset>";
// now write the collected output to sitemap files in the root directory
file_put_contents($_SERVER['DOCUMENT_ROOT'].'/sitemap-test.html', $outString); // make the HTML sitemap
file_put_contents($_SERVER['DOCUMENT_ROOT'].'/sitemap-test.xml', $XMLoutString); // make the XML sitemap
?>
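
The first issue listed in the script's comments (files such as file.php.old or file.html.bak) could be approached by checking only the final extension of each file rather than searching the whole path. The following is a minimal sketch, not part of the original script; the function name hasPageExtension() and the example paths are made up for illustration.

```php
<?php
// Hypothetical helper (not in the original script): accept a file only when
// its final extension is in the allowed list, compared case-insensitively.
function hasPageExtension($path, array $allowed = array('html', 'php'))
{
    // pathinfo() returns the text after the last dot, or '' if there is none
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

var_dump(hasPageExtension('/var/www/index.html'));    // bool(true)
var_dump(hasPageExtension('/var/www/about.PHP'));     // bool(true)
var_dump(hasPageExtension('/var/www/file.php.old'));  // bool(false)
var_dump(hasPageExtension('/var/www/file.html.bak')); // bool(false)
```

If this approach were adopted, the extension test inside getDirContents could call such a helper instead of using strpos, which would also make the separate .bak check unnecessary.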