Link / URL Status Checker

Daz

New Member
What does this do?
For anyone who has seen or got my Image Status Checker hack, this is exactly the same, but works on links ([url=], [url]) instead.

Basically it scans all your posts, extracts all the url tags, and checks each of the links to see if they're still valid.

The rest of this is basically the description of that hack, reworded.


Why?
I had a look at all the links on my site and was alarmed at how many were now gone. Since the only way you can check the links on your board is to manually read every post and click them, I decided to come up with a better way... and this is it.


How does it work?
The first part: In the AdminCP, under Maintenance and Update Counters... right at the bottom is this hack. It works by looking up every url tag, then requesting the link and reading the HTTP status code. So code 200 means 'link OK', 404/410 means 'link gone', etc. That then gets stored in a database table. A server has 15 seconds to reply to the request, or the status is labelled as "Unknown".
The second part: The browsing element, linkstatuscheck.php (original filename, huh!). This allows you to browse all the links found in the last scan using some powerful filtering (statuses to display, search, order by).
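
For anyone curious what the first part boils down to, here is a rough sketch in Python (the hack itself is PHP, and this is not its code). The function names are made up for illustration, and treating every code other than 200, 404 and 410 as "Unknown" is an assumption, since the "etc." above doesn't say how other codes are handled.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

TIMEOUT_SECONDS = 15  # the reply window described in the post


def fetch_status(url: str, timeout: float = TIMEOUT_SECONDS) -> Optional[int]:
    """Return the HTTP status code for url, or None if no usable reply arrived."""
    try:
        parts = urlsplit(url)
        host, port = parts.hostname, parts.port
    except ValueError:
        return None  # malformed URL (bad port, bad IPv6 literal, ...)
    if parts.scheme != "http" or not host:
        return None  # like the hack, this sketch only handles http://
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None  # timeout, refused connection, unparsable reply
    finally:
        conn.close()


def classify_status(code: Optional[int]) -> str:
    """Map a status code to one of the three labels used in the browser."""
    if code == 200:
        return "Working"
    if code in (404, 410):
        return "Dead"
    # Assumption: no reply, or any other code, is reported as Unknown.
    return "Unknown"
```

A scan would then store `classify_status(fetch_status(link))` (or the raw code) for each link it finds.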


Hack features

* General
  * Fully phrased.
  * Templates are grouped. Who's online handled.
* Part 1 - Admin
  * Reads the post table, scans all the [url=] and [url] tags on demand and records the actual HTTP status code returned.
  * If it gets stuck during the scan, you can restart the section it's currently doing.
  * If a link appears in more than one post, it's only checked once.
  * Start from, per page and timeout options for scanning.
* Part 2 - Browser
  * Status codes are put into one of three descriptions for simplicity: Working, Dead, Unknown. Unknown is if the server didn't respond or similar - on the basis that a temporary timeout doesn't necessarily mean the link has gone.
  * In the browser, link urls are force wrapped. Unless people post using all caps, you have a low screen resolution, or the font size is big, the table should never stretch.
  * Filtering allows you to show just the working/dead/unknown links, and there's a search facility for a variety of fields.
  * Convenient link to edit the post (if a dead link is found). This works by can_moderate - edit links only appear for people who own the post, or can moderate the forum it's in.
  * Works by canview - if someone can't view a particular forum (e.g. staff forum) normally, they can't view the links within it.
  * Uses CSS for common stuff to reduce the size of the outputted pages.
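
The tag scanning and the "only checked once" behaviour from the admin part can be sketched roughly like this, again in Python rather than the hack's PHP. The regular expression, the function names and the support for optional quotes around the [url=] value are all assumptions for illustration, not the hack's actual pattern.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches [url]link[/url] and the opening [url=link] / [url="link"] tag.
URL_TAG = re.compile(
    r'\[url\](?P<plain>[^\[\]]+)\[/url\]'
    r'|\[url=(?P<quote>["\']?)(?P<option>[^\]"\']+)(?P=quote)\]',
    re.IGNORECASE,
)


def extract_links(text: str) -> List[str]:
    """Return every link found in [url] and [url=] tags, in order of appearance."""
    links = []
    for match in URL_TAG.finditer(text):
        link = match.group("plain") or match.group("option")
        links.append(link.strip())
    return links


def group_by_link(posts: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Map each distinct link to the ids of the posts that contain it."""
    seen: Dict[str, List[int]] = {}
    for post_id, text in posts:
        for link in extract_links(text):
            post_ids = seen.setdefault(link, [])
            if post_id not in post_ids:
                post_ids.append(post_id)
    return seen
```

Because the result is keyed by link, a scanner only needs one request per key, however many posts share that link.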



Bad Things
It's far from a perfect hack; there are many things left to do. Please be aware that I won't be doing them, but if anyone else wants a crack, feel free!

* Only supports http://, not https://
* Can only handle replies like: HTTP/1.x 200 as the first line.
* Only supports [url] and [url=] tags. If you have HTML turned on in any forums it won't see <a href=> links.
* Biggie: There's no way to update a single post or link without a full re-scan. That means if someone edits their post to update or remove a dead link, it will not change in the browser until a full re-scan is done. I did play with various update methods but most are flawed in one way or another.
* No cron job.
* No session variables. (People without cookies will be logged out a lot).
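
To illustrate the first-line limitation in the list above: a strict status-line check of this kind might look like the following Python sketch. The pattern and function name are guesses for illustration only; the hack's real PHP check isn't shown in this post.

```python
import re
from typing import Optional

# Accepts only "HTTP/1.<digit> <three-digit code>" at the start of the line.
STATUS_LINE = re.compile(r"^HTTP/1\.\d (\d{3})\b")


def parse_status_line(line: str) -> Optional[int]:
    """Return the status code from an HTTP/1.x status line, or None if it doesn't fit."""
    match = STATUS_LINE.match(line)
    return int(match.group(1)) if match else None
```

Anything that doesn't start that way (a different protocol version, a streaming server's non-standard reply, an empty line) yields None, which a scanner would presumably record as "Unknown".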



Footnotes
The code to the Image Status Check hack is very similar, so I pretty much copy pasted and adapted it. It is a little bigger due to handling the [url=] "option", but you may find references to images. Let me know if you do.


It will work on 3.6 and 3.5, though you'll need to remove the "executionorder=" from the .xml file to get it working on 3.5.


Installation
Upload linkstatuscheck.php to your vB directory. Install the product with overwrite set to yes.


Customizing

* By default it's set to only allow moderators, super-moderators and administrators to view the browser. This can be changed with the setting in AdminCP > vB Options.
* The phrases all start with usc_ if you want to change them.
* You can add a link to linkstatuscheck.php on the navbar (or anywhere) if you want your members to be able to view it.
 
Dude, does this really work? Because I get this:
Code:
Database error in vBulletin 3.6.8:

Invalid SQL:
INSERT INTO imagestatus VALUES (NULL, 700696, 4238, 'h77p://www.picsaway.com/thumbs/013-ff067b51cc.jpg', '');

MySQL Error  : User '*******' has exceeded the 'max_questions' resource (current value: 100000)
Error Number : 1226
Date         : Wednesday, January 9th 2008 @ 07:49:06 PM
Script       : http://***admincp/misc.php?do=check_image_status
Referrer     : http://***admincp/misc.php?do=chooser
IP Address   : ********
Username     : *********
Classname    : vB_Database
My board is full of images, so it would be really handy.
 