Copyright (C) 2000-2005 by Oswald Buddenhagen
based on puf 0.1.x (C) 1999,2000 by Anders Gavare

This program is FREE software in the sense of the GPL.
See COPYING for details.

Project homepage: http://puf.sourceforge.net/


What is puf?
------------

puf is a "parallel url fetcher" for UN*X systems. It has some
similarities to GNU wget. The most notable difference from wget is
that puf downloads files in parallel.

NOTE: If you are planning to use puf for massive downloads on a system
where multiple users are working, you might want to tell people what
you are doing, since puf can use up a lot of resources (mostly network
bandwidth, but also memory if left running for too long).


How to compile and install:
---------------------------

You need to have the Perl Compatible Regular Expressions (pcre)
library, including its development files, installed.

First, run "./configure", then "make". Then run "make install" as root.

On RPM based Linux systems you can use this:
rpm -ta puf-*.tar.gz && rpm -i /usr/src/redhat/RPMS/i386/puf*.rpm

Tested platforms (as of 0.93.2a) include Linux, Mac OS X, and even
Cygwin. Previously tested platforms included Solaris, OpenBSD, and
Digital UNIX 4.0, but recent puf versions have not been tested on
them. Ultrix is known not to work. If you (don't) manage to compile
puf on a platform which is not listed here, I'd appreciate an email
about it.


Usage:
------

Just run puf without any parameters and the fairly straightforward
syntax is printed to stdout. In general, the syntax looks like this:

puf [options] url [...]

I will not list all the options here. To get the list of options,
simply run "puf -h".

urls may be "real" urls, like this:
http://some.host.org/path/file
or partial, like:
www.blah.com
(http:// is automatically prepended)

(At the time of writing, only the http protocol is recognized.)

There are options available for recursive fetching and for fetching
images and frames associated with the specified url.
When running puf, you'll see a status display which looks something
like the following example:

  URLs                   Connections  Bytes       Time        Kbyte/s
done+ fail/ total errs   cur/max   done/total   pass  left    cur/avg
   1+    0/     1    0      0/20    7466/7466  00:00 00:00    364/364

The first numbers are the number of files downloaded, the number of
files which could not be retrieved, and the total number of files to
download. "errs" is the total number of network and file errors
encountered. Next comes the number of currently active connections;
puf tries to keep this at the configured maximum as much as possible.

The number of bytes downloaded and the total number of bytes go up and
down a bit, so you shouldn't trust them too much. :-) This is because
puf doesn't know beforehand how large the files are. Another problem
is that some servers don't send the total size of documents, and
dynamically created documents (CGI etc.) are obviously always of
unknown size.

The elapsed time should be correct, but the time left is calculated
from a rough speed estimate and the number of bytes left, which might
be unknown. Therefore the time left cannot be trusted unless your
connection to the server(s) is very stable (in terms of speed) and all
downloads are already running (if there are still urls in the queue,
the numbers will grow later).


Special features:
-----------------

Parallel fetching:
This is the main point of puf. It is also the feature which might make
it a bit unstable. Bringing a unix system down by using up its memory
is usually referred to as "thrashing"; I don't know what the
equivalent is called when the network resources are used up. Don't set
the number of open network connections too high if you don't want to
risk bringing your system down.

Recursion:
This makes puf act pretty much like the famous "wget" utility.
Combined with parallelism, this is a very powerful feature.
File handle deficiency management:
On systems where the kernel hasn't been compiled to allow a high
number of open file handles (or where harsh per-user limits are set),
this allows more files to be written to in parallel. (This is not good
performance-wise, though.)