./net/crawl [small and efficient HTTP crawler]

Version: 0.4, Package name: crawl-0.4
Maintained by: The OpenBSD ports mailing-list

Description
The crawl utility starts a depth-first traversal of the web at the
specified URLs. It stores all JPEG images that match the configured
constraints. Crawl is fairly fast and supports graceful termination:
a crawl that was stopped gracefully can be restarted at exactly the
spot where it left off. Crawl also keeps a persistent database of
visited URLs, so repeated crawls do not revisit sites.
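
The traversal and the visited-URL database described above can be
sketched as follows. This is a minimal illustration in Python, not
crawl's actual code: the names crawl_dfs and get_links are
hypothetical, get_links stands in for fetching a page and extracting
its links, and SQLite is only a stand-in for whatever database crawl
itself uses.

```python
import sqlite3

def crawl_dfs(start_urls, get_links, conn):
    """Visit URLs depth-first, skipping any URL already in the visited table."""
    conn.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")
    order = []
    # Explicit stack; reversed so the first start URL is popped first.
    stack = list(reversed(start_urls))
    while stack:
        url = stack.pop()
        row = conn.execute(
            "SELECT 1 FROM visited WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            continue  # already visited, in this run or an earlier one
        conn.execute("INSERT INTO visited (url) VALUES (?)", (url,))
        conn.commit()  # commit per URL so an interrupted run loses nothing
        order.append(url)
        # Push children reversed so the first link is explored first.
        stack.extend(reversed(get_links(url)))
    return order

# Hypothetical link graph used in place of real HTTP fetches.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
conn = sqlite3.connect(":memory:")
first = crawl_dfs(["a"], lambda u: LINKS.get(u, []), conn)
second = crawl_dfs(["a"], lambda u: LINKS.get(u, []), conn)
```

Opening the connection on a file instead of ":memory:" would let the
visited table survive between separate runs, which is the behaviour
the description attributes to the persistent database.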

The main reason for writing crawl was the lack of simple open source
web crawlers. Crawl is only a few thousand lines of code and fairly
easy to debug and customize.

Features

+ Saves encountered JPEG images
+ Image selection based on regular expressions and size constraints
+ Resume previous crawl after graceful termination
+ Persistent database of visited URLs
+ Very small and efficient code
+ Supports robots.txt
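
The image selection feature listed above can be pictured as a simple
predicate over a URL and a file size. The sketch below is an
assumption about how such a filter might look, not crawl's
configuration format or code: ImageRule, should_save and the example
thresholds are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageRule:
    url_pattern: str                  # regular expression the URL must match
    min_bytes: int = 0                # smallest size worth saving
    max_bytes: Optional[int] = None   # None means no upper limit

def should_save(url, size_bytes, rule):
    """Return True if the image passes both the regex and the size limits."""
    if not re.search(rule.url_pattern, url, re.IGNORECASE):
        return False
    if size_bytes < rule.min_bytes:
        return False
    if rule.max_bytes is not None and size_bytes > rule.max_bytes:
        return False
    return True

# Hypothetical rule: JPEG URLs between 1 KB and 1 MB.
rule = ImageRule(r"\.jpe?g$", min_bytes=1024, max_bytes=1_000_000)
```

With this rule, should_save("http://example.org/photo.JPG", 2048, rule)
accepts the image, while a PNG URL or a 10-byte file is rejected.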


Filesize: 108.48 KB
Version History
  • (2009-11-29) Updated to version: crawl-0.4p2
  • (2006-07-21) Package added to openports.se, version crawl-0.4p1 (created)

CVS Commit History:

   2015-01-17 20:16:09 by Christian Weisgerber | Files touched by this commit (3349)
Log message:
Drop remaining MD5/RMD160/SHA1 checksums.
   2011-04-16 15:13:44 by Stuart Henderson | Files touched by this commit (10)
Log message:
- remove unnecessary NULL casts
- sync WANTLIB
   2010-11-26 07:23:31 by Marc Espie | Files touched by this commit (5)
Log message:
db/v3 meets PKGSPEC
   2010-11-22 03:27:12 by Marc Espie | Files touched by this commit (3)
Log message:
last remaining old-style lib version numbers
   2010-11-19 15:31:39 by Marc Espie | Files touched by this commit (372)
Log message:
new depends
   2010-11-11 04:54:09 by Marc Espie | Files touched by this commit (20)
Log message:
WANTLIB conversion
   2010-10-18 12:37:00 by Marc Espie | Files touched by this commit (357)
Log message:
USE_GROFF=Yes
   2010-07-12 16:07:42 by Stuart Henderson | Files touched by this commit (244)
Log message:
use REVISION, checked with before/after make show=PKGNAMES (plus some
extra-careful checking where there are complicated PSEUDO_FLAVORS).
   2010-05-23 10:25:21 by Marc Espie | Files touched by this commit (5)
Log message:
__FUNCTION__ -> __func__
   2009-03-16 05:05:54 by Stuart Henderson | Files touched by this commit (8)
Log message:
fix pkgspec
   2007-09-15 16:37:00 by Michael Erdely | Files touched by this commit (333)
Log message:
Remove surrounding quotes in COMMENT*/PERMIT_*/BROKEN/ERRORS
Add $OpenBSD$ to p5-SNMP-Info/Makefile (ok kili@, simon@)
   2007-04-05 10:20:19 by Marc Espie | Files touched by this commit (912)
Log message:
base64 checksums.


   2005-11-01 09:24:15 by Marc Espie | Files touched by this commit (1)
Log message:
missing libevent


   2005-01-12 14:31:06 by Nikolay Sturm | Files touched by this commit (10)
Log message:
better and consistent LIB_DEPENDS on db; this fixes a few possible
build time failures, where the wrong version of db could be found


   2005-01-05 10:15:08 by Christian Weisgerber | Files touched by this commit (250)
Log message:
SIZE


   2004-12-15 17:31:27 by Aleksander Piotrowski | Files touched by this commit (179)
Log message:
Add WANTLIB markers


   2004-09-15 12:17:48 by Marc Espie | Files touched by this commit (262)
Log message:
new plists, kill a few INSTALL scripts.


   2004-04-16 09:06:48 by Christian Weisgerber | Files touched by this commit (1)
Log message:
fix build; from Robert Nagy <thuglife@bsd.ru>


   2004-04-07 16:51:03 by Brad Smith | Files touched by this commit (4)
Log message:
upgrade to crawl 0.4
--
From: Robert Nagy <thuglife at bsd dot hu>


   2004-01-10 01:33:12 by Nikolay Sturm | Files touched by this commit (4)
Log message:
fix db dependencies to ensure db/v3 is installed
ensure db/v3 is used
also fixes build on NO_SHARED_ARCHS
with conceptual help from brad@


   2003-12-08 10:42:34 by Nikolay Sturm | Files touched by this commit (5)
Log message:
use new databases/db layout
db update and these modifications by
Aleksander Piotrowski <aleksander dot piotrowski at nic dot com dot pl>


   2003-09-25 00:25:14 by Jolan Luff | Files touched by this commit (22)
Log message:
drop maintainership on some stuff i don't use anymore, lop off
WWW: ${HOMEPAGE} while touching 'em


   2003-05-12 02:23:48 by Jolan Luff | Files touched by this commit (34)
Log message:
update e-mail address, ok brad@


   2003-03-27 10:41:49 by Peter Valchev | Files touched by this commit (1)
Log message:
new maintainer Jolan Luff <jolan@cryptonomicon.org>


   2003-03-01 22:37:25 by David Krause | Files touched by this commit (18)
Log message:
fix more spelling errors/typos
ok pvalchev@


   2002-12-28 16:29:13 by Peter Valchev | Files touched by this commit (12)
Log message:
cast NULL sentinel to void * so it is 64bit on alpha & sparc64


   2002-12-09 08:37:24 by Brad Smith | Files touched by this commit (20)
Log message:
change e-mail address.
--
From: MAINTAINER


   2002-10-27 18:38:47 by Christian Weisgerber | Files touched by this commit (72)
Log message:
No regression tests available.