Web FAQ 2.08. Penn State Search Engine

2.8c. How are Web sites indexed by the Penn State search engine?

The Penn State Search Engine periodically reads the Penn State home page, http://www.psu.edu/, indexes it, and follows all links that remain inside the .psu.edu domain. The pages it finds are indexed in turn, and the links on those pages are followed as well. The process continues until the search engine runs out of room in the index or all allowed pages have been found and indexed. The crawl runs continuously, and pages that are found to change often are checked and re-indexed more frequently than pages that change rarely.
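The crawl described above can be pictured as a breadth-first traversal that stops when the index is full or no new in-domain links remain. The following is only a minimal sketch under stated assumptions, not the search engine's actual implementation: the names `in_domain`, `crawl`, `get_links`, and `max_pages` are hypothetical, and `get_links` stands in for whatever fetches a page and extracts its links.

```python
from collections import deque
from urllib.parse import urlparse


def in_domain(url, suffix="psu.edu"):
    """Return True if the URL's host is psu.edu or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == suffix or host.endswith("." + suffix)


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl from start_url.

    get_links is a hypothetical callable mapping a URL to the list of
    links found on that page. max_pages is an assumed stand-in for
    "room in the index".
    """
    indexed = []                 # pages indexed so far, in crawl order
    seen = {start_url}           # URLs already queued, to avoid repeats
    queue = deque([start_url])
    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        indexed.append(url)      # "index" the page
        for link in get_links(url):
            # Follow only links that stay inside the .psu.edu domain.
            if link not in seen and in_domain(link):
                seen.add(link)
                queue.append(link)
    return indexed
```

In this sketch a link to a host outside .psu.edu is simply skipped, and the loop ends either when `max_pages` is reached or when the queue is empty. Re-checking frequently changing pages more often would need a scheduler on top of this, which is not shown.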

Some pages within the .psu.edu domain are not indexed. For example, pages that the server's robots.txt file excludes from crawling are ignored. A page may also carry a "robots" <meta> tag that indicates whether the page should be indexed and whether its links should be followed. Pages with restricted access settings, such as password protection or IP address restrictions, that prevent the search engine from crawling them are not indexed. URLs under a server's /cgi-bin/ folder, or URLs that contain a ?, are not indexed by default; site managers may request that such URLs on their own servers be indexed. Faculty, staff, and student personal pages (http://www.personal.psu.edu/) are not indexed by the Penn State search engine.
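The exclusion rules above can be sketched as two small checks: one on the URL and the server's robots.txt, and one on the page's "robots" <meta> tag. This is an illustrative sketch only, using Python's standard library. The function names `is_indexable` and `meta_robots`, the user-agent string `ExampleBot`, and the `allow_dynamic` flag (standing in for a site manager's approved request) are all assumptions, not part of the Penn State search engine. Access restrictions such as passwords are not modeled.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_indexable(url, robots_txt_lines, user_agent="ExampleBot",
                 allow_dynamic=False):
    """Apply the URL-level rules; robots_txt_lines is the file's lines."""
    parts = urlparse(url)
    # Personal pages are never indexed.
    if parts.hostname == "www.personal.psu.edu":
        return False
    # /cgi-bin/ and URLs containing "?" are skipped by default.
    if not allow_dynamic and (parts.path.startswith("/cgi-bin/") or "?" in url):
        return False
    # Honor the server's robots.txt.
    rules = RobotFileParser()
    rules.parse(robots_txt_lines)
    return rules.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(","))


def meta_robots(html):
    """Return (may_index, may_follow) for a page's HTML."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    d = parser.directives
    may_index = not ({"noindex", "none"} & d)
    may_follow = not ({"nofollow", "none"} & d)
    return may_index, may_follow
```

For example, with a robots.txt containing `Disallow: /private/` for all user agents, `is_indexable` rejects any URL under /private/ on that server, while a page whose tag reads `content="noindex, follow"` would be skipped by the index but still have its links followed.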

Sites outside the .psu.edu domain that are run by Penn State or on Penn State's behalf may be indexed upon approved request. Inquiries should be directed to the Penn State Search Engine Team.

For more information, see Introduction to the Penn State Search Engine.

Last modified 07-18-2007