#!/usr/bin/perl # $Id: fork-and-rename 526 2016-04-16 14:49:03Z whynot $ use strict; use warnings; package main; use version 0.77; our $VERSION = qv v0.1.2; =head1 NAME fork-and-rename - rename bunch of files and put your system on knees =head1 README That renames files found in directory, if applied rule matches. Target names are pseudo-randomized. And BTW attempts to DOS your system. =head1 USAGE fork-and-rename --destination=target/ --filter=. source0/ source1/ fork-and-rename --move --filter=sh --filter=txt place0/ place1/ =head1 DESCRIPTION Neither current nor previous mobile phone that I use is capable to name saved files any useful way. What's even worse either have no grasp of what overwrithing is. So I have a lots of files incredibly named that I have to maintain somehow. B (hereafter B) takes file, counts it B, looks its B and renames it this way (after renaming, mtime of source is applied on target): ppS-IY9QqxUM.jpg -> 911692DA-20080520-112926.jpg where: =over =item 911692DA is CRC-32, in hexadecimal, all caps =item 20080520 is date of mtime, in 4 decimals of year, then 2 decimals of month and then 2 decimals of day of month =item 112926 is time part of mtime, in hours, minutes, and seconds in 2 decimals each. =back That's the purpose part. That's not that interestening, isn't it? So what B does in its name? The complete processing of each file is done in separate process (one file -- one process). The main process finds suitable file, forks, collects all already finished zombies, and when there's none zombie left, goes for next file. So does it achieve its target of putting the system on knees? No and yes. See L and L. =cut use File::Find; use Digest::CRC; use Fcntl qw| :DEFAULT |; use POSIX qw| strftime :limits_h :sys_wait_h |; use Getopt::Long; =head1 DEPENDENCIES =over =item B Provides directory traversing facility. Subject to be distributed with Perl. =item B It's used to provide distribution among filenames (B doesn't randomize, remember). In use is 32bit variant. I think, 16bit variant would have chance for clashes (although, that's untested). While 32bit variant is short enough. The next step would be 128byte hash, but do you really want that long filenames? =item B Bed process (when in copy-to mode) copies files by itself block-by-block. So it uses B and B, and thus requires constants. Subject to be distributed with Perl. =item B The block size for the system for pipe reads (B doens't B, but I've found that constant useful). B is used, remember? And contstant for unhanging B. Subject to be distributed with Perl. =item B and B Cosmetic. Subject to be distributed with Perl. =back =cut #=head1 OSNAMES =head1 INCOMPATIBILITIES POSIX-clean slash (B) is used in constructing and parsing full pathnames. You know. =cut my $template = q|%08X-%04d%02d%02d-%02d%02d%02d|; my $mask = qr|^\w{8}-\d{8}-\d{6}|; my( $dst, @filter, $move, $resuffix, $lowmark, $highmark ); sub short_help { print < \&short_help, q|version!| => \&short_version, q|destination=s| => \$dst, q|filter=s@| => \@filter, q|move!| => \$move, q|suffix=s| => \$resuffix, q|lowmark=i| => \$lowmark, q|highmark=i| => \$highmark; =head1 ARGUMENTS =over =item I<--destination> CE> Sets B in copy-to mode and assigns the target directory. In that mode files are copied in the I<--destination> directory. The source directory tree isn't recreated. =item I<--move> Sets B in rename-on-place mode. In that mode files are renamed in a directory they were found. =item I<--filter> B has 2 modes =over =item multiple filters Each filter names one suffix (with neither leading nor inter dots). If file has simple suffix, and that suffix is equal (case-blindly) with one of I<--filter>s, then file is processed. (B means that anything on the left of rightmost dot isn't suffix. If there's no dot at all, then there's no suffix.) The file is ignored otherwise. =item one filter for all However, in case you want to process all the files specifying I<--filter>s for every suffix would be error-prone, ridiculous etc. And you can't specify I<--filter> for empty suffix anyway. You can set I<--filter> to dot --filter=. And then any file will match -- with any suffix or without suffix at all. That magic filter must be alone. =back Yeah, such a brain-dead construct. And one more note one filtering. If file looks like already renamed (8 hexadecimals, 8 decimals, and 6 decimals separated by hyphen (B<->)) then file is skipped unconditionally. If filename starts with a dot (B<.>) then the file is skipped too. =back =head1 OPTIONS =over =item I<--suffix> C Renamed files keep a suffix of source. This option is supposed to maintain that any-case zoo. This sets a suffix for a target file -- If a source file happens to have a suffix it will be replaced; In case there's no source's suffix, it will be added (think: L). =item I<--lowmark> C =item I<--highmark> C Must be used simultaneously. Set two marks for childs to follow. Read L for details how it works. Disabled by default. =back =cut $move && $dst and die qq|destination ($dst) is set simumltaneously with move\n|; $move || $dst or die qq|neither mode has been choosen\n|; if( $dst ) { -d $dst or die qq|destination ($dst) isn't a directory|; $dst =~ s{/+$}{} } -d $_ or die qq|($_) isn't a directory| foreach @ARGV; s{/+$}{} foreach @ARGV; @filter or die qq|missing filter|; @filter = () if 1 == @filter && '.' eq $filter[0]; $lowmark && !$highmark and die qq|{--lowmark} is set and {--highmark} isn't\n|; !$lowmark && $highmark and die qq|{--highmark} is set and {--lowmark} isn't\n|; $lowmark && $highmark && $lowmark > $highmark and die qq|{--lowmark} is ge than {--highmark}\n|; require Time::HiRes if $lowmark; my $digest = Digest::CRC->new( type => q|crc32| ); my( $pidgap, $gapshift ) = ( 1, 1 ); my( $pidmark, $pidflag ); sub process_this ( $ ) { my $file = shift @_; defined( my $pid = fork ) or die qq|[fork] ($file): $!|; if( $pid ) { printf qq|[%i]: came\n|, $pid; $pidmark++; my @gone; push @gone, $_ until 0 >= ($_ = waitpid -1, WNOHANG); my $fix = $pidmark - @gone; if( $highmark && $highmark < $fix ) { print qq|==== HIGHMARK HAS BEEN LEFT BEHIND ====\n|; $pidgap += $gapshift; while( $lowmark < $pidmark - @gone ) { Time::HiRes::nanosleep( $pidgap ); push @gone, $_ until 0 >= ($_ = waitpid -1, WNOHANG) } } elsif( $lowmark && $lowmark < $fix && !$pidflag ) { print qq|==== LOWMARK HAS BEEN LEFT BEHIND ====\n|; $pidflag++; Time::HiRes::nanosleep( $pidgap ) } elsif( $lowmark && $lowmark > $fix && $pidflag ) { print qq|==== IT IS SAFE LEVEL NOW ====\n|; undef $pidflag } printf qq|[%s]: gone\n|, join q|] [|, @gone if @gone; $pidmark -= @gone; return } my $suffix = $resuffix || ( split m{\.}, ( split m{/}, $file )[-1] )[-1] || ''; $suffix = '' if $suffix eq ( split m{/}, $file )[-1]; my $mtime = ( stat $file )[9]; open my $fh, q|<|, $file or die qq|[open] ($file): $!|; $digest->addfile( $fh ); do { $file =~ m{^(.+)/}; $dst = $1 } if $move; my $target = sprintf qq|%s/$template%s|, $dst, $digest->digest, split( m{ }, strftime q|%Y %m %d %H %M %S|, localtime $mtime ), $suffix ? qq|.$suffix| : ''; printf qq|[%i]: %s %s\n|, $$, ( split m{/}, $target )[-1], ( split m{/}, $file )[-1]; unless( $move ) { sysopen my $fhi, $file, O_RDONLY or die qq|[sysopen] ($file) for reading: $!|; sysopen my $fho, $target, O_WRONLY | O_EXCL | O_CREAT or die qq|[sysopen] ($target) for writing: $!|; my( $chunk, $buf ); # XXX:20090525211216:whynot: What if $chunk != syswrite()? defined syswrite $fho, $buf, $chunk or die qq|[syswrite] ($target): $!| while $chunk = sysread $fhi, $buf, PIPE_BUF; utime +( stat $fhi )[8,9], $fho or die qq|[utime] ($file -> $target): $!| } else { -f $target and die qq|target ($target) for ($file) exists|; rename $file, $target } exit } =head1 DIAGNOSTICS B reports its progress, and that's unavoidable. Bs, Bs, and source-target pairs are reported. Zombies ripped in main cycle after B are reported on one line. In final cleanup -- immediately after ripping. One more note on "forked" reports. No directories are reported; The filenames are dumped in misleading reverse order -- I believe that increases readability (target filename is almost constant lenght (subject to suffix variation), while source filename length can change a lot). Also, if enabled, The Marks At Work (see L for details) will spit some diagnostics on bypassing the lowmark and the highmark. These are higlhighted by four equals on both sides of a note. =cut =head1 NOTES =cut =head2 Notes: Forks! On a snapshot of my mobile's memory card B stabilizes on 17..20 processes first, then spikes to 22..25 processes. At that point audio starts to glitch. Most number of zombies reaped at once was 3, sometime 4. I fail to see any difference either between modes (see below) or filesystems (ext3 and ext2). I still have no resources to check bigger files (such as found in FusrZ<>EshareZ<>Edoc>). That seems that B (or whatever it's emulated by) is a way costly. Have you read those 2 paragraphes above? Looks bad, doesn't it? Forget it. All that was experienced when B was in use (I don't rant about B per se). Looking for timezone it Bs B. After finding that, I've proudly dropped B and rewritten those 2 lines with B (in mind and in use). And... It's hard to say how many processes run at once -- roughly 2..5, up to 9 zombies are collected at once, and (what I like most) PIDs of Bs are highly sequential. The copy-to mode somewhat differs -- processes don't come in batches (as they do for rename-in-place mode). However everything is a way fast. One interesting observation. Whatever wrapping is choosen (S> or S>), Bed process reports target-source pair before parent reports Bed PID. I think, that Bing B and Bing B are a way different things. =cut =head2 Notes: Marks! Context: one core, low on memory (both physical and virtual), loads (couple of hundreds of thousands) of small files (couple KB each). B, in I<--move> mode, stabilizes at ~30 childs. Then, by B something big with big children kicks in. Then B goes trashing memory (physical -- each child first consumes a file it's about to handle, thus child grows; a little, but there more and more children), then swap, what results in IO trashing. Huge neighbor doesn't give up and fights for memory and IO too. Eventually, The OOM-Killer kicks in, what doesn't help -- footprint of primary B is small (almost smaller then anything else), thus it's never considred to be a treat. Whatever resources are freed by The OOM-Killer are immediately consumed by new forks of primary B. Eventually you can't wake up monitor (because whatever it does wants to fork). And you can't ssh-in either (because forks, man). Then you discover that B is disabled for years now. Thanks a lot, Debian, that helps (at least B doesn't fork). Anyway, DOSing your system is secondary objective. First -- files must be renamed. And two options (I<--lowmark> and I<--highmark>) enable it. That's how it works: When number of children reaches I<--highmark> B will stall until number of children falls to I<--lowmark>. Each time it happens some aggressive adjustments are made to internal parameters that affect how the stalling happens. Because there's no any science behind this these parameters aren't present for manipulation by user. However, you still can edit B directly if you wish so. Doesn't look like you can break anything though. Enjoy your marks. =cut =head1 BUGS AND LIMITATIONS =over =item * I<(caveat)> Marks aren't enabled by default. =item * I<(caveat)>, may be I<(bug)> As already mentioned, copy-to mode doesn't recreate directory tree. =item * I<(caveat)> And then if two (or more) files are met (in possibly different directories) that have equal CRC-32 and mtime's, then the target filenames will be the same. So only first file will be copied. In two cases when I stepped in that -- offending files were plain same (icons distributed with some app). =item * I<(caveat)> The very same situation (however, that seems to be quite improbable) could happen in rename-in-place mode too. =item * (I?) The atime of source could be collected before file is opened for CRC-32 calculation. =back =cut find { wanted => sub { my $file = ( split m{/} )[-1]; !-f $_ || $file =~ m{^\.} || $file =~ m{$mask} || @filter && !grep $file =~ m{\.\Q$_\E$}i, @filter and return; process_this $_ }, no_chdir => 1 }, @ARGV; printf qq|[%i]: gone\n|, $_ until 0 >= ($_ = waitpid -1, 0); =head1 AUTHOR Eric Pozharski, Ewhynot@cpan.orgZ<>E =head1 COPYRIGHT & LICENSE Copyright 2009, 2013 by Eric Pozharski This utility is free in sense: AS-IS, NO-WARRANTY, HOPE-TO-BE-USEFUL. This utility is released under GNU GPLv3. =cut # vim: set filetype=perl