should we change dead links in source tree to archive.org?

Andreas Weber-6
Dear all,

there are approx. 500 links in the source tree (gnulib excluded) and
many of them are dead now, for example

inpolygon.m: http://local.wasp.uwa.edu.au/~pbourke/geometry/insidepoly/

but archive.org has a copy. The newest I've found is
https://web.archive.org/web/20070923104638/http://local.wasp.uwa.edu.au/~pbourke/geometry/insidepoly/

So my question is: Should we replace dead links with archive.org links?

-- Andy

Re: should we change dead links in source tree to archive.org?

siko1056
On 7/25/19 5:26 PM, Andreas Weber wrote:

> Dear all,
>
> there are approx. 500 links in the source tree (gnulib excluded) and
> many of them are dead now, for example
>
> inpolygon.m: http://local.wasp.uwa.edu.au/~pbourke/geometry/insidepoly/
>
> but archive.org has a copy. The newest I've found is
> https://web.archive.org/web/20070923104638/http://local.wasp.uwa.edu.au/~pbourke/geometry/insidepoly/
>
> So my question is: Should we replace dead links with archive.org links?
>
> -- Andy
>

Good catch.  To me this is indeed an unexpectedly large number of links
in the source tree.

Agreed, dead links are of little use, and if the link content can be
found on archive.org, why not point to that resource instead of
pointing nowhere?

Do you have some fancy script or Bash one-liner to do this work?  I
think your detection method is more sophisticated than mine ^^
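
One possible starting point for such a script, as a rough sketch only:
the helper names is_dead and wayback_url are made up for illustration,
and the archive link simply follows the pattern of the archive.org URL
quoted above (14-digit timestamp, then the original URL).  The liveness
check assumes curl is installed; some servers reject HEAD requests, so
a "dead" result would still need a manual look.

```shell
#!/bin/sh
# Sketch only: is_dead and wayback_url are hypothetical helper names.

# Succeed (exit status 0) when the URL cannot be fetched.
# Uses a HEAD request, follows redirects, gives up after 10 seconds.
is_dead() {
  ! curl --head --silent --fail --location --max-time 10 \
         --output /dev/null "$1"
}

# Build an archive.org link from a snapshot timestamp (YYYYMMDDhhmmss)
# and the original URL, following the pattern of the link quoted above.
wayback_url() {
  printf 'https://web.archive.org/web/%s/%s\n' "$1" "$2"
}

# Usage (needs network access, hence commented out):
# url='http://local.wasp.uwa.edu.au/~pbourke/geometry/insidepoly/'
# is_dead "$url" && wayback_url 20070923104638 "$url"
```

The timestamp still has to be looked up by hand (or by some other
means) for each dead link; this sketch does not try to find one.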

HTTPS and HTTP

$ grep --exclude-dir=gnulib --exclude-dir=libgnu --exclude-dir=.hg \
       --exclude-dir=autom4te.cache -R -E "https{0,1}:\/\/" \
    | grep -v https://www.gnu.org/licenses \
    | grep -v https://www.gnu.org/software/octave \
    | grep -v https://www.octave.org | wc -l
1767

HTTP

$ grep --exclude-dir=gnulib --exclude-dir=libgnu --exclude-dir=.hg \
       --exclude-dir=autom4te.cache -R -E "http:\/\/" \
    | grep -v https://www.gnu.org/licenses \
    | grep -v https://www.gnu.org/software/octave \
    | grep -v https://www.octave.org | wc -l
1293
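
Note that "grep ... | wc -l" counts matching lines, not distinct links,
so a URL that appears in several files is counted several times.  That
could be one reason these numbers are larger than the approx. 500
mentioned above.  A rough sketch for listing each URL only once
follows; the function name list_unique_urls is made up, and the
pattern for where a URL ends is a guess (it stops at whitespace,
quotes, angle brackets and a closing parenthesis, so trailing
punctuation such as "." may still be included).

```shell
#!/bin/sh
# Sketch only: list_unique_urls is a hypothetical helper name.

# Print each distinct http(s) URL found below directory $1, one per line.
#   -h  suppress file names      -o  print only the matching part
list_unique_urls() {
  grep --exclude-dir=gnulib --exclude-dir=libgnu --exclude-dir=.hg \
       --exclude-dir=autom4te.cache -R -h -o -E \
       "https?://[^[:space:]\"'<>)]+" "$1" \
    | LC_ALL=C sort -u
}

# Usage, from the top of the source tree:
# list_unique_urls . | wc -l
```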

We can also hunt for some old HTTP links.  This seems like a long-term
task.  A new contribution guideline might be to include only "permanent"
links to resources within documentation and source code.

Best,
Kai