Stars: 188
Forks: 22
Pull Requests: 55
Issues: 71
Watchers: 9
Last Updated: 2022-09-08 14:50:47
PHP Link Checker
License: MIT License
Languages: PHP, HTML, Dockerfile
Fink (pronounced "Phpink") is a command line tool, written in PHP, for checking HTTP links.
Install as a project dependency or as a stand-alone tool. To add it as a development dependency with Composer:
$ composer require dantleech/fink --dev
To use it stand-alone, download the PHAR from the Releases page.
You can build your own PHAR by cloning this repository and running:
$ ./vendor/bin/box compile
Run the command with a single URL to start crawling:
$ ./vendor/bin/fink https://www.example.com
Use --output=somefile to log verbose information for each URL in JSON format, including:
url: The tested URL.
status: The HTTP status code.
referrer: The page which linked to the URL.
referrer_title: The value (e.g. link title) of the referring element.
referrer_xpath: The path to the node in the referring document.
distance: The number of links away from the start document.
request_time: Number of microseconds taken to make the request.
timestamp: The time that the request was made.
exception: Any runtime exception encountered (e.g. a malformed URL).
The command accepts the following argument and options:
url: (multiple) Specify one or more base URLs to crawl (mandatory).
--client-max-body-size: Max body size for HTTP client (in bytes).
--client-max-header-size: Max header size for HTTP client (in bytes).
--client-redirects=5: Set the maximum number of times the client should redirect (0 to never redirect).
--client-security-level=1: Set the default SSL security level.
--client-timeout=15000: Set the maximum amount of time (in milliseconds) the client should wait for a response, defaults to 15,000 (15 seconds).
--concurrency: Number of simultaneous HTTP requests to use.
--display-bufsize=10: Set the number of URLs to consider when showing the display.
--display=+memory: Set, add or remove elements of the runtime display (prefix with - or + to modify the default set).
--exclude-url=logout: (multiple) Exclude URLs matching the given PCRE pattern.
--header="Foo: Bar": (multiple) Specify custom header(s).
--help: Display available options.
--include-link=foobar.html: Include given link as if it were linked from the base URL.
--insecure: Do not verify SSL certificates.
--load-cookies: Load cookies from a cookies.txt file.
--max-distance: Maximum allowed distance from base URL (if not specified then there is no limitation).
--max-external-distance: Limit the external (disjoint) distance from the base URL.
--no-dedupe: Do not filter duplicate URLs (can result in a non-terminating process).
--output=out.json: Output JSON report for each URL to given file (truncates existing content).
--publisher=csv: Set the publisher, either json or csv (defaults to json).
--rate: Set a maximum number of requests to make in a second.
--stdout: Stream to STDOUT directly, disables display and any specified outfile.
Crawl a single website (links to other sites are not followed):
$ fink http://www.example.com --max-external-distance=0
Crawl a single website and also check the status of the external links it points to:
$ fink http://www.example.com --max-external-distance=1
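To make the notions of distance and external distance more concrete, here is a minimal Python sketch of a breadth-first walk over a toy link graph. It is an illustration only, not Fink's implementation: the function name `crawl_distances`, the `links` dictionary, and the rule that a URL counts as external when its host differs from the base URL's host (resetting to 0 on returning to that host) are all assumptions made for this example.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_distances(links, base_url, max_external_distance=None):
    """Breadth-first walk over a toy link graph (hypothetical data).

    links maps each URL to the list of URLs it links to.
    Returns {url: (distance, external_distance)} for every URL visited.
    """
    base_host = urlparse(base_url).netloc
    seen = {base_url: (0, 0)}
    queue = deque([base_url])
    while queue:
        url = queue.popleft()
        distance, external = seen[url]
        for target in links.get(url, []):
            if target in seen:
                continue  # de-duplicate, otherwise cycles never terminate
            # Assumption: "external" means the host differs from the base host.
            if urlparse(target).netloc == base_host:
                target_external = 0
            else:
                target_external = external + 1
            if (max_external_distance is not None
                    and target_external > max_external_distance):
                continue  # too far outside the base site, skip it
            seen[target] = (distance + 1, target_external)
            queue.append(target)
    return seen
```

Under these assumptions, a limit of 0 keeps the walk on the base host, while a limit of 1 visits pages on other hosts that are linked directly from the site but goes no further from there.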
jq is a tool which can be used to query and manipulate JSON data. Use it to analyse the results, for example to list every URL that returned a 404 together with its referrer:
$ fink http://www.example.com -x0 -oreport.json
$ cat report.json | jq -c '. | select(.status==404) | {url: .url, referrer: .referrer}' | jq
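If jq is not available, the same filter can be sketched in a few lines of Python. This assumes the report contains one JSON object per line, which is what the jq invocation above suggests but is not stated explicitly; the function name `broken_links` and the sample entries are made up for illustration.

```python
import json


def broken_links(report_lines, status=404):
    """Yield {url, referrer} for every report entry with the given status.

    Assumes one JSON object per line (an assumption about the report layout).
    """
    for line in report_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry.get("status") == status:
            yield {"url": entry.get("url"), "referrer": entry.get("referrer")}


# Hypothetical sample entries using the field names listed earlier.
sample = [
    '{"url": "https://www.example.com/", "status": 200, "referrer": null}',
    '{"url": "https://www.example.com/missing", "status": 404, '
    '"referrer": "https://www.example.com/"}',
]
for item in broken_links(sample):
    print(item)
```

To process a real report, pass an open file handle instead of the sample list, since iterating over a file yields its lines.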
# create a cookies file for later re-use (simulate a login in this case via HTTP-POST)
$ curl -L --cookie-jar mycookies.txt -d username=myLogin -d password=MyP4ssw0rd https://www.example.org/my/login/url
# re-use the cookies file with your fink crawl command
$ fink https://www.example.org/myaccount --load-cookies=mycookies.txt
Note: a cookie jar created on one machine (e.g. computer A) may not work when it is copied to and read on another (e.g. a Linux server). Create the cookie file from the same IP address that will run the crawl; otherwise server-side session handling might not continue the HTTP session because of an IP mismatch.
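Before starting a long crawl it can be useful to confirm that the cookie file actually contains the expected session cookie. The sketch below uses Python's standard `http.cookiejar.MozillaCookieJar`, which reads the Netscape cookies.txt format that curl writes. The helper name `cookie_names` is hypothetical, and the check is independent of Fink itself.

```python
import http.cookiejar


def cookie_names(path):
    """Return a sorted list of (domain, name) pairs found in a cookies.txt file."""
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session cookies and already-expired cookies so nothing is hidden.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return sorted((cookie.domain, cookie.name) for cookie in jar)
```

For example, `cookie_names("mycookies.txt")` would list the cookies stored by the curl command above; an empty list suggests the login did not set a session cookie.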
Exit codes:
0: All URLs were successful.
1: Unexpected runtime error.
2: At least one URL failed to resolve successfully.
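In a CI job or wrapper script, these exit codes can be turned into readable messages. The following is a minimal sketch: the names `EXIT_MEANINGS`, `describe_exit_code`, and `run_fink` are hypothetical, and the default binary path simply mirrors the `./vendor/bin/fink` invocation shown earlier.

```python
import subprocess

# Meanings copied from the exit code list above.
EXIT_MEANINGS = {
    0: "All URLs were successful.",
    1: "Unexpected runtime error.",
    2: "At least one URL failed to resolve successfully.",
}


def describe_exit_code(code):
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unrecognised exit code: {code}")


def run_fink(url, binary="./vendor/bin/fink"):
    """Run fink for one URL and return (exit_code, description).

    The binary path is an assumption; adjust it to your installation.
    """
    completed = subprocess.run([binary, url], check=False)
    return completed.returncode, describe_exit_code(completed.returncode)
```

A caller could fail a build only on code 1 and treat code 2 as a warning, depending on how strict the link check should be.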