Version: 0.4, Package name: crawl-0.4
Maintained by: The OpenBSD ports mailing-list
Master sites:
Description:
The crawl utility starts a depth-first traversal of the web at the specified URLs. It stores all JPEG images that match the configured constraints. Crawl is fairly fast and allows for graceful termination. After terminating crawl, it is possible to restart it at exactly the same spot where it was terminated. Crawl keeps a persistent database that allows multiple crawls without revisiting sites.

The main reason for writing crawl was the lack of simple open source web crawlers. Crawl is only a few thousand lines of code and fairly easy to debug and customize.

Features:
+ Saves encountered JPEG images
+ Image selection based on regular expressions and size constraints
+ Resume previous crawl after graceful termination
+ Persistent database of visited URLs
+ Very small and efficient code
+ Supports robots.txt
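The behaviour described above (depth-first traversal, a record of visited URLs, resuming after a stop, and image selection by regular expression and size) can be illustrated with a minimal sketch. This is not crawl's actual code, which is written in C. The function names `crawl` and `select_image`, the in-memory `links` dictionary standing in for the web, and the `state` dictionary standing in for the persistent database are all assumptions made for illustration.

```python
import re


def select_image(url, size, pattern, min_size, max_size):
    """Return True if the URL matches the regex and the size is within limits.

    Hypothetical helper: crawl's real configuration options are not shown here.
    """
    return bool(re.search(pattern, url)) and min_size <= size <= max_size


def crawl(links, start_urls, state=None, max_steps=None):
    """Depth-first traversal over `links` (dict mapping a URL to its outgoing URLs).

    `state` carries the pending stack and the visited URLs, so a crawl that
    stopped early can continue by passing the returned state back in. It is
    an in-memory stand-in (assumption) for a persistent database.
    """
    if state is None:
        # Reverse so that the first start URL is popped first.
        state = {"stack": list(reversed(start_urls)), "visited": []}
    stack = list(state["stack"])
    visited = set(state["visited"])
    order = []
    steps = 0
    while stack and (max_steps is None or steps < max_steps):
        url = stack.pop()
        if url in visited:
            continue  # never revisit a URL
        visited.add(url)
        order.append(url)
        steps += 1
        # Push children in reverse so the first link is explored first.
        for child in reversed(links.get(url, [])):
            if child not in visited:
                stack.append(child)
    return order, {"stack": stack, "visited": sorted(visited)}
```

Because the returned state contains only lists of strings, it could be written to disk between runs; stopping after a fixed number of steps and resuming with the saved state visits the same URLs in the same order as an uninterrupted run.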
Filesize: 108.48 KB

Version History
2019-07-12 14:49:09 by Stuart Henderson | Files touched by this commit (854) |
Log message: replace simple PERMIT_PACKAGE_CDROM=Yes with PERMIT_PACKAGE=Yes |
2015-01-17 20:16:09 by Christian Weisgerber | Files touched by this commit (3349) |
Log message: Drop remaining MD5/RMD160/SHA1 checksums. |
2011-04-16 15:13:44 by Stuart Henderson | Files touched by this commit (10) |
Log message: remove unnecessary NULL casts; sync WANTLIB
2010-11-26 07:23:31 by Marc Espie | Files touched by this commit (5) |
Log message: db/v3 meets PKGSPEC |
2010-11-22 03:27:12 by Marc Espie | Files touched by this commit (3) |
Log message: last remaining old-style lib version numbers |
2010-11-19 15:31:39 by Marc Espie | Files touched by this commit (372) |
Log message: new depends |
2010-11-11 04:54:09 by Marc Espie | Files touched by this commit (20) |
Log message: WANTLIB conversion |
2010-10-18 12:37:00 by Marc Espie | Files touched by this commit (357) |
Log message: USE_GROFF=Yes |
2010-07-12 16:07:42 by Stuart Henderson | Files touched by this commit (244) |
Log message: use REVISION, checked with before/after make show=PKGNAMES (plus some extra-careful checking where there are complicated PSEUDO_FLAVORS). |
2010-05-23 10:25:21 by Marc Espie | Files touched by this commit (5) |
Log message: __FUNCTION__ -> __func__ |
2009-03-16 05:05:54 by Stuart Henderson | Files touched by this commit (8) |
Log message: fix pkgspec |
2007-09-15 16:37:00 by Michael Erdely | Files touched by this commit (333) |
Log message: Remove surrounding quotes in COMMENT*/PERMIT_*/BROKEN/ERRORS. Add $OpenBSD$ to p5-SNMP-Info/Makefile (ok kili@, simon@)
2007-04-05 10:20:19 by Marc Espie | Files touched by this commit (912) |
Log message: base64 checksums. |
2005-11-01 09:24:15 by Marc Espie | Files touched by this commit (1) |
Log message: missing libevent |
2005-01-12 14:31:06 by Nikolay Sturm | Files touched by this commit (10) |
Log message: better and consistent LIB_DEPENDS on db; this fixes a few possible build time failures, where the wrong version of db could be found |
2005-01-05 10:15:08 by Christian Weisgerber | Files touched by this commit (250) |
Log message: SIZE |
2004-12-15 17:31:27 by Aleksander Piotrowski | Files touched by this commit (179) |
Log message: Add WANTLIB markers |
2004-09-15 12:17:48 by Marc Espie | Files touched by this commit (262) |
Log message: new plists, kill a few INSTALL scripts. |
2004-04-16 09:06:48 by Christian Weisgerber | Files touched by this commit (1) |
Log message: fix build; from Robert Nagy <thuglife@bsd.ru> |
2004-04-07 16:51:03 by Brad Smith | Files touched by this commit (4) |
Log message: upgrade to crawl 0.4 -- From: Robert Nagy <thuglife at bsd dot hu> |
2004-01-10 01:33:12 by Nikolay Sturm | Files touched by this commit (4) |
Log message: fix db dependencies to ensure db/v3 is installed; ensure db/v3 is used; also fixes build on NO_SHARED_ARCHS; with conceptual help from brad@
2003-12-08 10:42:34 by Nikolay Sturm | Files touched by this commit (5) |
Log message: use new databases/db layout; db update and these modifications by Aleksander Piotrowski <aleksander dot piotrowski at nic dot com dot pl>
2003-09-25 00:25:14 by Jolan Luff | Files touched by this commit (22) |
Log message: drop maintainership on some stuff i don't use anymore, lop off WWW: ${HOMEPAGE} while touching 'em |
2003-05-12 02:23:48 by Jolan Luff | Files touched by this commit (34) |
Log message: update e-mail address, ok brad@ |
2003-03-27 10:41:49 by Peter Valchev | Files touched by this commit (1) |
Log message: new maintainer Jolan Luff <jolan@cryptonomicon.org> |
2003-03-01 22:37:25 by David Krause | Files touched by this commit (18) |
Log message: fix more spelling errors/typos ok pvalchev@ |
2002-12-28 16:29:13 by Peter Valchev | Files touched by this commit (12) |
Log message: cast NULL sentinel to void * so it is 64bit on alpha & sparc64 |
2002-12-09 08:37:24 by Brad Smith | Files touched by this commit (20) |
Log message: change e-mail address. -- From: MAINTAINER |
2002-10-27 18:38:47 by Christian Weisgerber | Files touched by this commit (72) |
Log message: No regression tests available. |