Diabolically Smart Scans Scanner*

Need to illegally download tons of pictures from alt.binaries.pictures.comics? This software is for you. You'll be amazed at how outrageously smart DSSS is. Coupled with nget, DSSS is an intelligent script that:

- selects and downloads the scans you want,
- checks whether pages are missing,
- works out which images belong to which issue (and what the issue number is), and
- moves them into the right directory tree.

It saves you tedious hours of checking, downloading, moving, renaming and removing files, and probably goofing because you're tired and bored.

What does DSSS do?

This is a good question. Suppose that you save comic book scans on your computer, and you keep each issue's set of files in its own directory. You do this because file names vary a lot, so putting everything in the same place is totally confusing. You could rename the files instead, but that amounts to the same thing. So you may want to put Uncle Scrooge #1 into, for instance, scans/us/001.

DSSS checks for PAR files first and regenerates missing image files if necessary. While doing this, it also memorizes which images belong together. For the remaining images, it looks at how similar the file names are. This works well even when several issues of the same series were posted without PAR files. In that case DSSS looks for the numbers in the file names that seldom change, but do change from one set to the next: these are the issue numbers. Of course, if images are posted with totally obscure names, DSSS can't help much. Believe me, though, this seldom happens, and even then DSSS will still group those file names for you.
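To make the grouping idea concrete, here is a minimal Python sketch. It is not DSSS's Perl code. It uses the standard library's difflib ratio in place of String-Similarity. The 0.6 threshold and the function names are assumptions for illustration. So is the heuristic that the number taking the most distinct values is the page number and the last remaining number is the issue number.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def similar(a, b, threshold=0.6):
    """True when two file names look alike (case-insensitive difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_by_similarity(names):
    """Put each name into the first group whose first member looks alike."""
    groups = []
    for name in sorted(names):
        for group in groups:
            if similar(name, group[0]):
                group.append(name)
                break
        else:  # no similar group found: start a new one
            groups.append([name])
    return groups


def split_by_issue(group):
    """Split one similarity group into {issue_number: [file names]}.

    Assumption: the number taking the most distinct values is the page
    number (ties go to the later position), and the last of the remaining
    numbers is the issue number. Names with a different count of numbers
    than the first name are filed under '?'.
    """
    if not group:
        return {}
    fields = {name: re.findall(r"\d+", name) for name in group}
    width = len(fields[group[0]])
    if width == 0:
        return {"?": list(group)}
    usable = [n for n in group if len(fields[n]) == width]
    page = max(range(width),
               key=lambda i: (len({fields[n][i] for n in usable}), i))
    issues = defaultdict(list)
    for name in group:
        numbers = fields[name]
        if len(numbers) != width:
            issues["?"].append(name)
            continue
        rest = [v for i, v in enumerate(numbers) if i != page]
        issues[rest[-1] if rest else "?"].append(name)
    return dict(issues)
```

With file names like those in the sample run, Uncle_Scrooge_010_* and Uncle_Scrooge_131_* look alike and land in one similarity group. split_by_issue then separates them into issues 010 and 131. For Amazing Spider-man v2 049-00fc.jpg, the constant 2 and 049 are both kept, and the last one, 049, is taken as the issue number.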

You need to create a file named rules in the same directory as DSSS. Each line in this file is one rule. For instance, my rules file reads as follows:

donald,adventure		dda
donald				dd
scrooge,adventure		usa
scrooge				us
mickey				mm
roger,rabbit			rog
wdc				wdc
comics,stories			wdc
one-shot			os
four,color			os
us				us

Each line has the form keyw1,keyw2,...,keyw(n) directory, where the keywords and the directory are separated by TABs only. If a file name contains keyw1 AND keyw2 AND ... keyw(n), the scan is moved to the specified directory. Regular expressions are supported.

DSSS tries the rules from top to bottom and stops at the first one that matches. So put the more uncertain guesses at the end of the file, where they are used only when no other rule applies.
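The rule lookup can be sketched in a few lines of Python. This is an illustration, not DSSS's Perl code, and the function names are hypothetical. It assumes keywords match case-insensitively. The sample run files US_062_p01_fc.jpg under us, which suggests as much. It also assumes that a file matching no rule gets ?.

```python
import re


def parse_rules(text):
    """Parse rules text into [(compiled keyword patterns, directory), ...]."""
    rules = []
    for line in text.splitlines():
        # Keywords and directory are separated by one or more TABs.
        parts = [p for p in line.split("\t") if p.strip()]
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        keywords = [k.strip() for k in parts[0].split(",") if k.strip()]
        patterns = [re.compile(k, re.IGNORECASE) for k in keywords]
        rules.append((patterns, parts[-1].strip()))
    return rules


def find_directory(filename, rules):
    """Return the directory of the first rule whose keywords all match."""
    for patterns, directory in rules:
        if all(p.search(filename) for p in patterns):
            return directory
    return "?"  # no rule applies
```

Because the search stops at the first match, a name containing both "donald" and "adventure" goes to dda even though the plain donald rule would also match.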

At first, DSSS does not move anything: it only writes a file saying where it wants to move each image. You can edit this file to fix the file names that DSSS didn't catch, and then tell DSSS to process it.

Sample run

Now let's get to the real business. Here is what it looks like.

$ dsss.pl

23 PAR files found.

US_062_p01_fc.P01
        Verifying source files:
        All files are correct, repair is not required.
US_062_p01_fc.P02
        Verifying source files:
        All files are correct, repair is not required.
US_062_p01_fc.PAR
        Verifying source files:
        All files are correct, repair is not required.
US_062_p01_fc.P03
        Verifying source files:
        All files are correct, repair is not required.
Amazing Spider-man v2 049 HaCsA.par
        Verifying source files:
        Scanning extra files:
        Repair is required.
        22 file(s) are missing.
        1 file(s) are ok.
        Repair is not possible.
        You need 22 more recovery files to be able to repair.

--- OTHER PAR LINES SKIPPED ---	

Looking for image files.

Reading rules.
guessed <US_062_p01_fc.jpg> et al.
        is    us # 062
        pages 01-36 (counting images in PAR files)
guessed <WDC&S_102_01_FC.jpg> et al.
        is    wdc # 102
        pages 01-52 (counting images in PAR files)
guessed <Amazing Spider-man v2 049-00fc.jpg> et al.
        is    ? # 049
        pages 00-22 (counting images in PAR files)
guessed <Uncle_Scrooge_131_01_FC.jpg> et al.
        is    us # 131
        pages 01-36 (counting images in PAR files)
guessed <superman_320_01.jpg> et al.
        is    ? # 320
        pages 01-20 (counting images in PAR files)
guessed <Donald Duck 135-01-fc.jpg> et al.
        is    dd # 135
        pages 01-29, 31-36 (counting images in PAR files)
guessed <Mickey_Mouse067_17.jpg> et al.
        is    mm # 067
        pages 17-20 (counting images in PAR files)
guessed <Roger_Rabbit-Disney-12-00-FC.jpg> et al.
        is    rog # 012
        pages 00-34 (counting images in PAR files)
guessed <UNCLE_SCROOGE_cscan_078_01.jpg> et al.
        is    us # 078
        pages 01-36 (counting images in PAR files)
guessed <Uncle_Scrooge_010_01_FC.jpg> et al.
        is    us # 010
        pages 01-36 (counting images in PAR files)
guessed <VOY_11_01_fc.jpg> et al.
        is    ? # 011
        pages 01-04 (counting images in PAR files)

Images and directories written to <dir_guesses.txt>.
Run <dsss.pl mv> to move images (you may modify the file manually 
first).
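The pages lines summarize the page numbers found for each issue. A gap such as 01-29, 31-36 for Donald Duck #135 therefore points to a missing page 30. Here is a minimal Python sketch of that summary, assuming page numbers are plain zero-padded integers as in the file names above. The function names are hypothetical and this is not DSSS's own code. It can only spot gaps between the lowest and highest page it sees.

```python
def page_ranges(pages):
    """Collapse page numbers such as ['01', '02', '04'] into '01-02, 04'."""
    if not pages:
        return ""
    width = max(len(p) for p in pages)
    numbers = sorted({int(p) for p in pages})

    def fmt(n):
        return str(n).zfill(width)  # keep the zero-padding of the input

    runs = []
    start = prev = numbers[0]
    for n in numbers[1:]:
        if n != prev + 1:  # a gap ends the current run
            runs.append((start, prev))
            start = n
        prev = n
    runs.append((start, prev))
    return ", ".join(fmt(a) if a == b else f"{fmt(a)}-{fmt(b)}" for a, b in runs)


def missing_pages(pages):
    """Page numbers absent between the lowest and highest page found."""
    numbers = {int(p) for p in pages}
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]
```

For pages 01 to 36 without 30, page_ranges returns "01-29, 31-36" and missing_pages returns [30].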
To give you an idea, here is what the file dir_guesses.txt now looks like:
# directories with a ? are not processed.
# dsss only creates issue-number directories,
# you should create the issue-series directories by yourself.

FilesToBeMovedTo us/us/062
US_062_p01_fc.jpg
US_062_p02_ifc_Scrooge.jpg
US_062_p03_Scrooge.jpg
US_062_p04.jpg
US_062_p05.jpg
US_062_p06.jpg
(-- other lines skipped --)

FilesToBeMovedTo us/?/049
Amazing Spider-man v2 049-00fc.jpg
Amazing Spider-man v2 049-01.jpg
Amazing Spider-man v2 049-02.jpg
Amazing Spider-man v2 049-03.jpg
Amazing Spider-man v2 049-04.jpg
Amazing Spider-man v2 049-05.jpg
Amazing Spider-man v2 049-06.jpg
Amazing Spider-man v2 049-07.jpg
(-- other lines skipped --)

FilesToBeMovedTo us/dd/135
Donald Duck 135-01-fc.jpg
Donald Duck 135-02-ad.jpg
Donald Duck 135-03.jpg
Donald Duck 135-04.jpg
Donald Duck 135-05.jpg

Actually, I downloaded a few Superman and Spider-Man issues by mistake and had no rule for them, so the ? directories are expected. Note that files that were never decoded are also listed in this file. This does no harm, because moving a file that doesn't exist does nothing.

It is not shown here, but DSSS also warns if a directory already exists, which is a sign that you may already have those scans.

$ dsss.pl mv
Moving files to us/us/062.
Creating directory first.
Moving files to us/wdc/102.
Creating directory first.
(skipped directory) us/?/049.
Moving files to us/us/131.
(skipped directory) us/?/320.
Moving files to us/dd/135.
Creating directory first.
Moving files to us/mm/067.
Creating directory first.
Moving files to us/rog/012.
Creating directory first.
Moving files to us/us/078.
Creating directory first.
Moving files to us/us/010.
Creating directory first.
(skipped directory) us/?/011.
--- OTHER LINES SKIPPED ---
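The mv step can be sketched in Python as follows. The sketch assumes the dir_guesses.txt layout shown above:

- # comment lines,
- a FilesToBeMovedTo <directory> header,
- then one file name per line.

Like the messages above, it skips directories containing ?, creates only the last (issue-number) directory level, and ignores files that don't exist. The function names and the exact log wording are hypothetical; this is not DSSS's actual implementation.

```python
import os
import shutil

HEADER = "FilesToBeMovedTo "


def parse_guesses(text):
    """Return [(directory, [file, ...]), ...] from dir_guesses.txt-style text."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith(HEADER):
            sections.append((line[len(HEADER):].strip(), []))
        elif sections:
            sections[-1][1].append(line)
    return sections


def move_files(sections, root="."):
    """Move listed files under root into their directories; return a log."""
    log = []
    for directory, files in sections:
        if "?" in directory:
            log.append(f"(skipped directory) {directory}.")
            continue
        target = os.path.join(root, directory)
        log.append(f"Moving files to {directory}.")
        if not os.path.isdir(target):
            log.append("Creating directory first.")
            os.mkdir(target)  # only the issue-number level; parent must exist
        for name in files:
            src = os.path.join(root, name)
            if os.path.exists(src):  # files that were never decoded are ignored
                shutil.move(src, os.path.join(target, name))
    return log
```

Using os.mkdir rather than os.makedirs follows the note in dir_guesses.txt that you create the series directories yourself. A missing series directory therefore raises an error instead of being created silently.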

Coupling with nget

You can go even further by putting DSSS and nget in a cron job. Here is a script that generates a list of keywords to search for on the newsgroups with nget, downloads the relevant files, and processes them with DSSS. Simply put it in your crontab. To make it work, also add
 autopar = 0
 {galias
    comics=alt.binaries.pictures.comics,alt.binaries.pictures.comics.reposts
 }
 
in your .ngetrc.

Install

I have only tested DSSS on GNU/Linux. To install it, you need:

- Perl (e.g. 5.8.1 or later),
- par2cmdline (e.g. version 0.3),
- Perl's String-Similarity module (version 1), and
- nget.

You'll also need to set, at the beginning of dsss.pl, the relative path to the directory where you keep your images.

DSSS is distributed under the GNU General Public License: you can use, modify and redistribute it under the conditions stated in the license.

Download dsss.


* I can't pronounce it, but I'm French.

Contact: <romo661@free.fr>, my main page.
