Website Verification – Profile

The profile contains the settings for website verification and the results of the last verification. Select Action -> Profile from the main menu to open the profile window.

Starting URLs

Starting URL:
The web address that serves as the starting point for the verification process.

Note: Starting URLs can be imported from a text file with a list of URLs or from an HTML file. To import the starting URL from a file, click on the Import button, select the source type (List of URLs or HTML file), and then select the source file in the window that appears.

Retrieve links from the specified page only:
If this option is selected, Web Link Validator will check the links within the starting URL(s) only.

Verify external links:
This setting determines whether the program validates external links (links located on other servers or above the starting page in the directory tree). If this option is selected, the program will verify external links.

Case sensitivity:
Select this option when case-sensitivity is important. In general, activating this option is only necessary when the website is hosted on a Unix-style or similar operating system.

Spell checker:
Toggles the built-in spell checker, which checks the words on all detected internal pages. Any spelling errors found are reported in the link properties pane and in the report.

Related topic:
How to Check Spelling on a Website

Internal Links

Internal links are the links located within the Starting URL's folder or its subfolders.

Example:

Starting URL:
  http://www.example.com/folder/

The following links are considered external:
  http://www.relsoftware.com
  http://www.weblinkvalidator.com

And also:
  http://www.example.com
  http://www.example.com/page1.htm

These links are EXTERNAL since they are 'outside' the starting URL's area, ie they are not contained within the http://www.example.com/folder/ folder or its subfolders.

The following links are considered internal:
  http://www.example.com/folder/page1.htm
  http://www.example.com/folder/page2.htm
  ...and so on.

These links are INTERNAL since they are 'inside' the starting URL's area, ie they are contained within the http://www.example.com/folder/ folder or its subfolders.
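
The folder rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this section, not Web Link Validator's actual implementation: the function name is_internal is hypothetical, and it is assumed that the scheme and host must match the starting URL exactly.

```python
from urllib.parse import urlsplit


def is_internal(url, starting_url):
    """Return True if url lies in the starting URL's folder or a subfolder."""
    start = urlsplit(starting_url)
    target = urlsplit(url)
    # Folder of the starting URL: everything up to and including the last '/'.
    folder = start.path[: start.path.rfind("/") + 1] or "/"
    # Assumption: a different scheme or host always makes the link external.
    same_site = (start.scheme, start.netloc.lower()) == (
        target.scheme,
        target.netloc.lower(),
    )
    return same_site and (target.path or "/").startswith(folder)


start = "http://www.example.com/folder/"
for link in (
    "http://www.example.com/folder/page1.htm",
    "http://www.example.com/page1.htm",
    "http://www.relsoftware.com",
):
    print(link, "internal" if is_internal(link, start) else "external")
```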

Internal Links in Web Link Validator

Under the Internal Links setting, it is possible to label links and masks as internal.

The Starting URLs data defines the starting point for validation and the links to be validated. The program automatically processes the links located within the Starting URL's folder or its subfolders (internal links) and automatically skips full processing of the links located outside the Starting URL's folder or its subfolders (external links).

To validate links that are outside the Starting URL's folder or its subfolders, the links or masks should be listed in the internal links area, which is marked as 'Treat external link as internal if it matches one of the following masks (one per line)'.

To exclude specific links from the list of internal links (as specified above), it is necessary to list such links or their masks in the external links area. This is marked as '...but treat link as external anyway if it matches one of the following masks (one per line)'.

Notes:
Special characters can be used to specify masks, such as:
'*' - stands for any text
'?' - stands for any character

Example:

Starting URL:
  http://www.example.com/folder2/

Treat external link as internal if it matches one of the following masks (one per line):
  *www.example.com/folder/*

...but treat link as external anyway if it matches one of the following masks (one per line):
  *www.example.com/folder/temp/*

The program recognizes these links as internal:
  http://www.example.com/folder/
  http://www.example.com/folder2/
  http://www.example.com/folder/page.htm

However, links that contain www.example.com/folder/temp/ in their URL will be considered as external. Therefore, these links will not be fully processed by the program.
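
The way the two mask lists interact in this example can be sketched as follows. The code is a rough model under stated assumptions, not the program's own logic: the helper names are hypothetical, masks are assumed to be case-sensitive and to match the whole URL, and the 'treat as external anyway' list is assumed to override every internal match.

```python
import re


def mask_to_regex(mask):
    """Translate a mask: '*' is any text, '?' is any single character."""
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_any(url, masks):
    return any(mask_to_regex(m).fullmatch(url) for m in masks)


def classify(url, starting_folder, internal_masks, external_masks):
    # Internal by location, or promoted to internal by a mask.
    internal = url.startswith(starting_folder) or matches_any(url, internal_masks)
    # Assumption: the 'external anyway' masks take precedence.
    if internal and matches_any(url, external_masks):
        internal = False
    return "internal" if internal else "external"


start = "http://www.example.com/folder2/"
internal_masks = ["*www.example.com/folder/*"]
external_masks = ["*www.example.com/folder/temp/*"]
for link in (
    "http://www.example.com/folder/page.htm",
    "http://www.example.com/folder/temp/draft.htm",
):
    print(link, classify(link, start, internal_masks, external_masks))
```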

Authentication

If a site contains pages that require a user name and password, they should be entered here. When Web Link Validator encounters a page that requires authentication it will automatically use the user name and password provided.

These settings work only for web server-based authentication. If a website uses HTML-based authentication, it is possible to use the POST Method in the Starting URLs section.

Note: Only one user name and password can be used for the entire site.

Related topics:
How to Check Password Protected Websites
How to Check Password Protected Websites with HTML-based Authentication

HTML Analysis

Searching for links within the following tags - only the selected HTML tags will be analyzed. Links found in tags that are not marked will be ignored.

Script Analysis

Allows discovery of links inside scripts (eg JavaScript or CSS), and non-HTML documents (eg Word, PDF, Excel etc).

Search for links in JavaScript:
Attempts to find links in JavaScript inside the source code of each page.

Analyze DHTML Events - Attempts to find links in event-handling commands like onClick, onMouseOver etc.

Analyze parameter "VALUE=..." - Attempts to find links in the value parameter. Applicable to HTML tags: option, input, param.

Analyze <SCRIPT>...</SCRIPT> sections - Attempts to find links in code enclosed in the <SCRIPT> and </SCRIPT> tags. By default, the application analyzes both code contained within the main web page and code in external JavaScript files. To stop analyzing external JavaScript files, clear the 'Download .JS files and search for links' option.

Search for links in CSS, <STYLE>...</STYLE> section:
Attempts to find links in cascading style sheet code and in style formatting enclosed in the <STYLE> and </STYLE> tags. Please note: by default, the application analyzes both style code contained within the main web page and code in external style sheet files. To stop analyzing external CSS files, clear the 'Download .CSS files and search for links' option.

Search for links in Adobe Flash files:
Attempts to find links in detected Adobe Flash files.

Search for links in XML/RSS:
Attempts to find links in detected XML/RSS files.

Search for links in DOC:
Attempts to find links in detected Word documents.

Search for links in RTF:
Attempts to find links in detected rich-text files.

Search for links in PDF:
Attempts to find links in detected Adobe Acrobat files.

Search for links in XLS:
Attempts to find links in detected Excel spreadsheets.

Verify Links

In this section of the profile settings, it is possible to set additional link verification parameters.

Verify external links:
This setting determines the program's behavior in relation to external links verification (the links placed on other servers or above the starting page in the directory tree). If this item is marked, the program will verify the external links.

Mark links found in the <FORM> tag as "Unsupported":
This setting determines the program's behavior in relation to links for executable files placed in the <FORM> tag parameters. Such a link usually points to the script that processes the form. If this item is marked, the program will mark these links as Unsupported.

Waiving verification for links matching one of the following masks:
This setting specifies masks for links that the program will not verify. These links are still added to the list, with the status Not verified.

Verify links matching one of the following masks:
The program will verify links that match these masks.

Notes:
Special characters can be used to specify masks, such as:
'*' - stands for any text
'?' - stands for any character

Examples:

The mask "http://www.example.com/part/*.zip" excludes verification of the link "http://www.example.com/part/one.zip", but does not exclude verification of the link "http://www.example.com/part/one.pdf".

The mask "*/cgi-bin/enter.cgi" excludes the link "http://example.com/cgi-bin/enter.cgi", but does not exclude the link "http://example.com/enter.cgi".

Exclude Links

Use the Exclude Links section to tell Web Link Validator which links or sections of your website are to be ignored during verification.

Exclude all external links
Exclude all links that are outside the server specified in the starting page or placed in the higher level directories of the same server.

Example:

With the starting URL http://www.example.com/products/, links to all folders other than 'products' and other domains will be excluded from the verification.

Exclude links matching one of the following masks - exclude these links from the verification.

Example:

*/images/* - will exclude all links that contain '/images/' in their URL.

Do not exclude links matching one of the following masks - this field allows the inclusion of URLs that would normally be excluded from verification as specified in the 'Exclude links matching...' field.

Example:

According to the rule above, the "/images/" folder and all of its subfolders are to be excluded from verification. However, if verification is required for the "/images/new/" subfolder, the following mask can be entered in this field: */images/new/*.
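
The interplay of the two fields in this example can be sketched as below. This is an illustrative model only: the function name is_excluded is hypothetical, and Python's fnmatch module is used as a stand-in for the program's mask matching (it also treats '[' specially, which the masks described here do not).

```python
from fnmatch import fnmatchcase


def is_excluded(url, exclude_masks, do_not_exclude_masks):
    """A link is excluded only if it matches an 'Exclude' mask
    and no 'Do not exclude' mask."""
    excluded = any(fnmatchcase(url, m) for m in exclude_masks)
    kept = any(fnmatchcase(url, m) for m in do_not_exclude_masks)
    return excluded and not kept


exclude = ["*/images/*"]
do_not_exclude = ["*/images/new/*"]
for link in (
    "http://www.example.com/images/logo.gif",
    "http://www.example.com/images/new/banner.gif",
    "http://www.example.com/index.html",
):
    print(link, "excluded" if is_excluded(link, exclude, do_not_exclude) else "verified")
```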

Directory Index

Use settings in this section to avoid duplicate checking of an index file by removing its name from the URL. For example, the URLs "http://www.myhost.com/" and "http://www.myhost.com/index.html" have different syntax but link to exactly the same file. Therefore, it is possible to remove the 'index.html' substring from the second URL without losing any information.

Note: This feature applies to internal links only.

Examples:

index.html
index.php
default.asp
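
The effect of this setting can be illustrated with a short Python sketch. It assumes the index names are compared case-sensitively against the last path segment; the function name strip_directory_index is hypothetical and the code is not taken from the program.

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_NAMES = ("index.html", "index.php", "default.asp")


def strip_directory_index(url, index_names=INDEX_NAMES):
    """Remove a trailing index file name so both URL forms compare equal."""
    parts = urlsplit(url)
    folder, _, filename = parts.path.rpartition("/")
    if filename in index_names:
        parts = parts._replace(path=folder + "/")
    return urlunsplit(parts)


a = strip_directory_index("http://www.myhost.com/")
b = strip_directory_index("http://www.myhost.com/index.html")
print(a == b)  # both forms now refer to the same URL
```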

Error Pages

Parameters in this section help identify broken links in cases where the server displays an error page (or redirects to such a page) but returns an incorrect status code. This may happen when custom error pages are used.

Note: This feature applies to internal links only.

Mark link as broken if it redirects to a URL that matches one of the following masks (one per line) - In the field below, enter masks of files that are custom error pages. When the application hits a link that points to one of these files, it will mark the link as broken.

Example:

*example.com/error/* - Mark link as broken if it points or redirects to any file in the error folder on this domain and its sub-domains.

Mark link as broken if its HTML source code matches any of these masks - If the HTML source code of the page contains any phrases from the field below, the link will be marked as broken.

Example:

*404 Not Found* - mark link as broken if the HTML source code of the page contains this string.
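
The two checks described in this section can be sketched together as follows. The sketch assumes the final URL (after any redirects) and the page's HTML source are already available; the function name looks_broken is hypothetical, and fnmatch again stands in for the program's mask matching.

```python
from fnmatch import fnmatchcase


def looks_broken(final_url, html_source, url_masks, source_masks):
    """Flag a link as broken if its final URL or its HTML source
    matches one of the configured masks."""
    if any(fnmatchcase(final_url, m) for m in url_masks):
        return True
    return any(fnmatchcase(html_source, m) for m in source_masks)


url_masks = ["*example.com/error/*"]
source_masks = ["*404 Not Found*"]
print(looks_broken("http://www.example.com/error/missing.htm", "", url_masks, source_masks))
print(looks_broken("http://www.example.com/a.htm", "<h1>404 Not Found</h1>", url_masks, source_masks))
print(looks_broken("http://www.example.com/a.htm", "<h1>Welcome</h1>", url_masks, source_masks))
```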

Limits

This sets limits on the exploration range.

Maximum exploration depth - the number of levels below the starting page that the application is allowed to explore. A value of zero means the depth is unrestricted.

Maximum number of links - the total number of links the application is allowed to process.

Non-HTML files - the total number of non-HTML files (eg external JavaScript, Cascading Style Sheets, Flash animation etc) whose size does not exceed the download size defined in the settings.

Note: To edit the value, click on the edit box and then enter the necessary number. Alternatively, click on the up or down arrow by the edit box until the desired value is displayed.
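
How a depth limit and a link limit bound the exploration can be sketched with a small breadth-first traversal. Everything here is an assumption made for illustration: the site is modeled as a dictionary mapping each page to the pages it links to, the traversal order is breadth-first, and max_links=None stands for 'no limit' (the text above defines the zero value for the depth only).

```python
from collections import deque


def explore(site, start, max_depth=0, max_links=None):
    """Return pages in visiting order. max_depth=0 means unrestricted depth."""
    seen = {start}
    visited = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if max_links is not None and len(visited) >= max_links:
            break
        if max_depth and depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


site = {"start": ["a", "b"], "a": ["c"], "c": ["d"]}
print(explore(site, "start"))
print(explore(site, "start", max_depth=1))
print(explore(site, "start", max_links=2))
```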

Page Optimization

This group of settings defines which pages are to be marked as slow, small, new, old or deep.

Slow pages - pages whose total size (HTML code plus images) exceeds the set number of kilobytes will be marked as slow.

Small pages - pages that are smaller than the set number of kilobytes in size will be marked as small.

New pages - pages that were last modified within the set number of days will be marked as new.

Old pages - pages that were NOT modified within the set number of days will be marked as old.

Deep pages - pages with more than the set number of clicks from the starting page will be marked as deep.

Note: To edit the value, click on the edit box and enter the necessary number. Alternatively, click on the up or down arrow by the edit box until the desired value is displayed.
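
The five thresholds can be read as simple comparisons, sketched below. The Page fields, the function name page_flags and the threshold values in the usage example are all hypothetical, and the same size figure is assumed for both the slow and the small check.

```python
from dataclasses import dataclass


@dataclass
class Page:
    size_kb: float   # HTML code plus images, in kilobytes
    age_days: float  # days since the page was last modified
    depth: int       # clicks from the starting page


def page_flags(page, slow_kb, small_kb, new_days, old_days, deep_clicks):
    """Return the marks a page would receive for the given thresholds."""
    flags = []
    if page.size_kb > slow_kb:
        flags.append("slow")
    if page.size_kb < small_kb:
        flags.append("small")
    if page.age_days <= new_days:
        flags.append("new")
    if page.age_days > old_days:
        flags.append("old")
    if page.depth > deep_clicks:
        flags.append("deep")
    return flags


print(page_flags(Page(size_kb=250, age_days=2, depth=6),
                 slow_kb=100, small_kb=5, new_days=7, old_days=365, deep_clicks=4))
```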

Resource Utilization

Parameters in this section help to reduce the system resources required by the program (especially RAM) and to speed up program operation. These features become extremely valuable when checking very large sites or sites containing numerous reciprocal links.

These settings specify which data should be recorded in the process of website verification.

Save the information on bookmarks - if the bookmark information is not required, disable this option to speed up operation and save RAM.

Orphan Analysis

This section contains Orphan Analysis settings to identify files on your web server that are not in use. These files needlessly occupy valuable disk space and can be deleted.

The program compares files found in directories on the local/network computer or FTP server with URLs (traced in the process). Files that do not belong to any page are flagged as orphaned.

Files become orphaned when the links pointing to them are broken or missing. An orphaned file can be brought back into use either by fixing the links directed at it or by creating such links in other files.
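
The comparison described above amounts to a set difference between the files on the server and the URLs traced during verification. A minimal sketch, assuming file paths are given relative to the site root and map directly onto URLs (the function name find_orphans and the sample data are hypothetical):

```python
def find_orphans(server_files, traced_urls, base_url):
    """Return files whose URL was never reached during verification."""
    traced = set(traced_urls)
    return sorted(path for path in server_files if base_url + path not in traced)


server_files = ["index.html", "about.html", "old/banner.gif"]
traced_urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/about.html",
]
print(find_orphans(server_files, traced_urls, "http://www.example.com/"))
```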

Related topic:
How to Perform Orphan Analysis

Page Rules

Page Rules are designed for evaluating pages against certain conditions and confirming the absence or presence of specific display text, tag text, links, scripting, forms, etc generated by the code. For example, it is possible to test a website's pages to discover whether each contains contact information.

Page Rule Groups allow individual sets of page rules to be applied to different sections of a website. For instance, it is possible to verify that different product page titles contain the corresponding product names: the text 'Product One' must appear on pages with URLs that begin with http://website/prod1, and the text 'Product Two' must appear on pages with URLs that begin with http://website/prod2.

By default, page rules within a group will apply to all pages of the website. To limit a certain page rule group's applicable area, use the 'Exclude...' and 'Do not exclude...' settings.

Notes:

To quickly disable a group without deleting it, just add the '*' sign to the 'Exclude...' setting.

To apply a group's page rules to a specific page or section of the website only, first exclude all pages by adding '*' to the 'Exclude...' setting, and then add the pages to be evaluated to the 'Do not exclude...' setting, eg "http://www.relsoftware.com/wlv/*".

Related topics:
How to Find Specific Text on the Website
How to Use Web Link Validator as a Reciprocal Link Checking Tool

HTML Syntax

This set of options enables analysis of HTML tags and locates those that contain errors.

Enable the HTML Syntax check - enables verification of the validity of HTML tags. Other options on this page are only available when this option is enabled.

<IMG> tag without ALT attribute - finds all <IMG> tags without the ALT attribute.

<IMG> tag with blank ALT attribute - finds all <IMG> tags with a blank ALT attribute.

<IMG> tag without HEIGHT/WIDTH attribute - finds all <IMG> tags without the HEIGHT/WIDTH attribute.

<INPUT TYPE=IMAGE> without ALT attribute - finds all <INPUT TYPE=IMAGE> tags without the ALT attribute.

<INPUT TYPE=IMAGE> tag with blank ALT attribute - finds all <INPUT TYPE=IMAGE> tags with a blank ALT attribute.

<INPUT TYPE=IMAGE> tag without HEIGHT/WIDTH attribute - finds all <INPUT TYPE=IMAGE> tags without the HEIGHT/WIDTH attribute.

<A> tag without ALT attribute - finds all <A> tags without the ALT attribute.

<A> tag with blank ALT attribute - finds all <A> tags with a blank ALT attribute.

<A> tag without TITLE attribute - finds all <A> tags without the TITLE attribute.

<A> tag with blank TITLE attribute - finds all <A> tags with a blank TITLE attribute.

<A> tag with missing HREF/NAME attribute - finds all <A> tags with a missing HREF/NAME attribute.

<A> tag with blank HREF/NAME attribute - finds all <A> tags with a blank HREF/NAME attribute.

<A> tag with blank comment - finds all <A> tags with a blank comment; eg <a href="http://www.example.com/"></a>.

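As an illustration of what checks of this kind involve, the three <IMG> checks can be approximated with Python's standard html.parser module. This is a sketch under assumptions, not the program's checker: the class and function names are hypothetical, and an image is assumed to fail the HEIGHT/WIDTH check when either attribute is missing.

```python
from html.parser import HTMLParser


class ImgAltChecker(HTMLParser):
    """Collect (line, message) pairs for problematic <IMG> tags."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":  # the parser lower-cases tag and attribute names
            return
        attributes = dict(attrs)
        line, _ = self.getpos()
        if "alt" not in attributes:
            self.problems.append((line, "IMG without ALT"))
        elif not (attributes["alt"] or "").strip():
            self.problems.append((line, "IMG with blank ALT"))
        if "height" not in attributes or "width" not in attributes:
            self.problems.append((line, "IMG without HEIGHT/WIDTH"))


def check_images(html):
    checker = ImgAltChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


for line, message in check_images('<img src="a.gif">\n<img src="b.gif" alt="" width="1" height="1">'):
    print(line, message)
```
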
Replace Links

Use these settings to replace certain links before verification. For example, it may be required to replace links like

"http://www.example.com/redirect.aspx?redirURL=http://www.example.com/news.aspx" with

"http://www.example.com/news.aspx" to avoid redirection when validating the link.

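The effect of the replacement in this example can be reproduced with a short Python sketch. It does not show how replacement rules are entered in the program; it only demonstrates the result, assuming the target address is carried in a query parameter named redirURL as in the example. The function name unwrap_redirect is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(url, param="redirURL"):
    """Return the address carried in the redirect parameter, if present."""
    query = parse_qs(urlsplit(url).query)
    targets = query.get(param)
    return targets[0] if targets else url


print(unwrap_redirect(
    "http://www.example.com/redirect.aspx?redirURL=http://www.example.com/news.aspx"
))
```
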
Auto

These settings simplify working with the command line when the Auto mode is on. The Auto mode is activated when the "/auto" key is specified on the command line. The corresponding command line parameters take priority over their counterparts in these subsections.

Related topic:
How to Get the Report Emailed Automatically on a Daily Basis

Report

These options define the report filename, its location and the information to be included in the report.

Report filename - specifies the filename and the folder where the report is to be saved. The filename and folder path can be typed in, selected from the drop-down menu, or (easiest) browsed to by clicking the Browse button next to the report filename field.

Use personal reports list for this profile - allows selection of the information to be included in the report. This list can be adjusted on the report setup screen when generating a report.

Exclude
Exclude links from the report if their error descriptions match at least one mask specified in this box.
To edit masks in the exclude box, simply click on the box and edit the content as necessary.

Advanced

Use these options to fine-tune the selected profile's settings.

Load limit - defines how many links per second the application is allowed to process. Select the Use personal limit settings... option and then choose the desired limit on the drop-down menu below.

Disable cookies - select this option to emulate a browser with disabled cookies.

Session ID - if the website adds a session ID to its link addresses, it is likely to result in different URLs pointing to the same pages. To avoid this, enter the session identifier in this field, and Web Link Validator will ignore the session value when checking these pages.

Example:

For the URL http://www.example.com/page.asp?SID=12345, enter 'SID' in the Session ID field, and the software will read this link as http://www.example.com/page.asp.
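
The normalization in this example can be sketched in Python as follows. The sketch assumes the session identifier is passed as an ordinary query parameter; the function name strip_session_id is hypothetical, and re-encoding the remaining parameters may change their escaping slightly.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, session_param="SID"):
    """Drop the session parameter so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != session_param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_id("http://www.example.com/page.asp?SID=12345"))
print(strip_session_id("http://www.example.com/page.asp?id=7&SID=67890"))
```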

Setup URLs - the URLs to be visited for obtaining the necessary credentials to access the target website. For example, this applies if you need to get a cookie required for enabling links on the 'Starting URLs' pages. You may also need this feature if the site uses a complex HTML-based authentication scheme.

Notes: