|
| For security reasons, you shouldn't include administration pages or any other sensitive pages in the sitemap file, because you don't want them indexed by search engines. GSiteCrawler therefore reads robots.txt and skips the URLs listed in that file. | |
| It's common nowadays for servers to have custom "file-not-found" pages. These can make it hard for search engines, or even GSiteCrawler, to tell whether a page really exists. Enabling this option checks for that issue. | |
| If your website has been set up for a while, Google may already know something about it or have indexed a few of its pages. GSiteCrawler can check with Google first, which makes the software run faster because it avoids extra work. |
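
To make the robots.txt option more concrete, here is a minimal Python sketch of the same idea: read the rules, then drop every crawled URL they disallow. It is not GSiteCrawler's own code. The `example.com` URLs, the sample rules and the `allowed_urls` helper are all assumptions made up for illustration; only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /user/login
"""


def allowed_urls(urls, robots_txt=ROBOTS_TXT, agent="*"):
    """Return only the URLs that the robots.txt rules allow for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


# Hypothetical crawl result for an imaginary site.
crawled = [
    "https://example.com/",
    "https://example.com/admin/settings",
    "https://example.com/blog/post-1",
    "https://example.com/user/login?destination=node/1",
]

for url in allowed_urls(crawled):
    print(url)
```

Only the home page and the blog post should survive the filter here, which is the behaviour you want before anything is written to the sitemap.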

6. The crawler is now running. You can manually set the number of crawlers working on your site so that you don't overload it. You can pause the crawler at any time and resume it later. If you want to abort the crawl, pause the crawler and clear the whole queue, then start it all over again.

7. After all crawlers have stopped and your site has been fully crawled, click Generate and choose either a Google sitemap file or a Yahoo URL list. Afterwards, upload the sitemap file manually with FTP software such as FileZilla.
Note: You shouldn't use GSiteCrawler if you are on Drupal or WordPress. Drupal has its own sitemap generator, the XMLSitemap module, which is easy and straightforward to use. I initially tried GSiteCrawler on my Drupal site, and by default it included unnecessary links such as login destinations and Quicktabs module links. Believe it or not, my site with just over 300 links ballooned to 10,000 links in GSiteCrawler because the Quicktabs module was installed.
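
If you are curious what the generated Google sitemap file looks like, or why those extra Drupal links matter, the rough Python sketch below builds a small sitemap and skips the noisy URLs first. It is only an illustration under a few assumptions: that login redirects carry a `destination` query parameter, that Quicktabs links carry a parameter starting with `quicktabs_`, and that the site is `example.com`. The `is_noise` and `build_sitemap` names are hypothetical helpers, not part of GSiteCrawler or Drupal.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def is_noise(url):
    """Assumed rule: login redirects and Quicktabs tab states are duplicates."""
    params = parse_qs(urlsplit(url).query)
    return any(
        key == "destination" or key.startswith("quicktabs_") for key in params
    )


def build_sitemap(urls):
    """Return sitemap XML as a string, leaving out the noisy URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if is_noise(url):
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl result: two real pages plus two duplicates.
crawled = [
    "https://example.com/",
    "https://example.com/node/12",
    "https://example.com/node/12?quicktabs_1=2",
    "https://example.com/user/login?destination=node/12",
]

print(build_sitemap(crawled))
```

Each extra query-string variant counts as a separate link to a crawler, which is how a 300-page site can turn into thousands of entries. A Yahoo URL list, by contrast, is just a plain text file with one URL per line.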

How to create sitemap.xml using GSiteCrawler



