Patches for the Detachment Scripts

Jump to: - - 12K
index.cgi - index.cgi.patch - 1.9K
delete.cgi - delete.cgi.patch - 1K - - 780b
I am not going to explain how the detach script version 2.4 works. I am going to try to explain what I patched. The original code is good and you should look at it.

+### Edited by Jack Zielke <> and
+### Bobby Burden <>
-### $Id: detach,v 2.4 2007/07/13 21:39:17 ryan Exp ryan $
+### $Id$
Property changes on:
Added: svn:keywords
   + Id

Since there was already a csv Id in the code I went ahead and put a svn Id in its place. I also added two more authors and a page where you can get the latest version.

-    my %ARGS = map { $_ => 1 } qw(aggressive web_root dir_root hash shorten);
+    my %ARGS = map { $_ => 1 } qw(aggressive web_root dir_root hash shorten msize);

I added a new argument 'size'. The variable msize is used internally. Without this change the program will exit with 'ERROR: invalid option msize'

-        my $hash = md5_base64(join("",time,$$,$<,$>,@_));
-        $hash =~ s/[^a-zA-Z0-9]//;  # Really, I want base 62
-        $hash =~ s/[oO0iIlL2Zz]//;  # And, avoid pain with visual cut/paste, 
-                                    # now base 52
-        $stamp= substr($hash,0,2) . # 144555105949057024 combos (52^10);
-            "/" . substr($hash,2,8); 
+        my $hash = '';
+        do {
+            $hash .= md5_base64(join("",time,$$,$<,$>,$self));
+            $hash =~ s/[^a-zA-Z0-9]//g;    # Really, I want base 62
+            $hash =~ tr/vVO0Il12Z5S/vV/sd; # And, avoid pain with visual cut/paste,
+                                           # now base 53
+        } while (length($hash) < 10);      # Just in case
+        $stamp= substr($hash,0,2) .        # 168287943181908783 combos 53^10 - 2(53^9)
+            "/" . substr($hash,2,8);

Let's break this up into smaller pieces.


'my $args = shift' makes @_ = nothing. So in the original join @_ did not add anything to the end of the string. Any array would work here. It is the 'random' memory address that we want appended to the string for md5 to hash. I chose $self instead of $args in case you decide to hard code the arguments - $self would still be needed and available.

$hash =~ s/[^a-zA-Z0-9]//;
$hash =~ s/[^a-zA-Z0-9]//g;

This needed to be a global search and replace.

$hash =~ s/[oO0iIlL2Zz]//;
$hash =~ tr/vVO0Il12Z5S/vV/sd;

This also needed to be a global search and replace. I changed what should have been s/[oO0iIlL2Zz]//g to use tr// because tr is slightly faster here. I agreed with O0Il2Z. I do not see much of a problem with oiLz. I added 15S. I am also removing repeated v's.

Since we are stripping out /+= from base64 (most likely) and then another 9 chars there is a chance that we will not have 10 characters to work with. The do while length < 10 ensures that we end up with 10 random characters for the hash even if it takes more than 1 pass. While this is unlikely it is how random numbers work. I want this script to always work so the loop was added.

+    $self->{cids}     = {};

I need to track 1 more global variable. This is a hash of the Content-ID's.

-    $self->append_blurbs()  if $self->{aggressive};
+    $self->append_blurbs($entity)  if $self->{aggressive};

In case there is no body to attach the blurbs to I need access to the original email $entity so that I can add another attachment.

+    my($entity) = @_;

Same as above. Keeping access to the original $entity in case I need it.

+        if (!scalar keys %{$self->{firsts}}) {
+            DEBUG("There are detachments and nothing to append the blurb to.  Creating empty text/plain.");
+            my $part = build MIME::Entity (
+                Type     => 'text/plain; charset=us-ascii',
+                Data     => '',
+                Encoding => 'quoted-printable',
+            );
+            $entity->add_part($part, 0);
+            $self->{firsts}{'text/plain'} = $part;
+        }

The first text/plain, first text/html, first text/rich and text/enriched are kept track of. If the email does not have any of those and attachments were detached and agressive is turned on then we need to make a place to put the links in. If you do not do this the attachments get stripped and you do not have any record as to where they are. This code adds a blank 'text/plain' attachment to $entity which is the email with the attachments stripped.

-    my $header = "The following attachments have been detached and are available for viewing.";
-    my $footer = "Only click these links if you trust the sender, as well as this message.";
+    my $header  = "The following attachments have been detached and are available for viewing.";
+    my $footer  = "Only click these links if you trust the sender, as well as this message.";
+    my $footer2 = "Note: Attachments will be deleted after 30 days.";

The header and footer are default. I liked the footer from here so I added it. Since text, html and rtf all use different line endings I added the second line as a new variable and passed it down the line.

-        return $self->append_blurb_plain($header, $footer, $lines);
+        return $self->append_blurb_plain($header, $footer, $footer2, $lines);
-        return $self->append_blurb_html($header, $footer, $lines);
+        return $self->append_blurb_html($header, $footer, $footer2, $lines);
-        return $self->append_blurb_rtf($header, $footer, $lines);
+        return $self->append_blurb_rtf($header, $footer, $footer2, $lines);
-    my($header, $footer, $lines) = @_;
+    my($header, $footer, $footer2, $lines) = @_;
+                 " $footer2\n",
-    my($header, $footer, $lines) = @_;
+    my($header, $footer, $footer2, $lines) = @_;
-    my($header, $footer, $lines) = @_;
+    my($header, $footer, $footer2, $lines) = @_;
+                 "$footer2\par\n",

Passing the second footer to the append_blurb_* subs

-                 "<ul>",
-                 map("<li><a href=\"$_\">$_</a>\n", (@{$self->{urls}})),
-                 "</ul>$footer\n",
-                 "<p>",);
+                 "<ul>",);
+    foreach(@{$self->{urls}}) {
+        my $pretty_url = $_;
+        $pretty_url =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
+        push (@blurb, "<li><a href=\"$_\">$pretty_url</a>\n");
+        my $cid = $self->{cids}{$_};
+        if ($cid) {
+            $cid =~ /<(.+)>/;
+            $cid = $1;
+            my $url = $_;
+            DEBUG("Replacing cid:$cid with $url");
+            foreach (@$lines) {
+                $_ =~ s#cid:$cid#$url#g;
+            }
+        }
+    }
+    push(@blurb, ("</ul>$footer<br />\n",
+                 "$footer2\n",
+                 "<p>",));

The html blurb received some extra work beyond the second footer line. First I un-hex encode the url so that it looks nice for the link. Basically 2011%2D07%2D11%5F13%2D05%2D39%5F707.jpg becomes 2011-07-11_13-05-39_707.jpg. Next I swap Content-ID links for web links. If the user allows remote content to load then the images will still show up inline but they will load from the web server instead of from the email server.

-    $lines->[@$lines-1] =~ #}$##;  Remove trailing container bracker
+    $lines->[@$lines-1] =~ #}$##;  Remove trailing container bracket

Fixed a typo in the comment.

-        if ($part->head()->recommended_filename()) {
+        if ($part->head()->recommended_filename() || $part->head()->get('Content-ID',0)) {

In some email clients when you insert an image into the body it does not put the original filename into the attachment. This caused the detach script to not extract the attachment. For multipart/related mime messages the root item is not supposed to have a Content-ID while all of the related attachments must have a Content-ID. So basically if they have a Content-ID they are ripe for detaching.

+    if ($self->{msize}) {
+        my $filesize = -s $src;
+        DEBUG("File name: ".$src);
+        DEBUG("File size: ".$filesize);
+        if ($self->{msize} > $filesize) {
+            return;
+        }
+    }

If you specified a minimum attachment size before detachment occurs this is where that check happens.

-    system("/bin/cp",$src,$dest);
+    my $cid = $entity->head()->get('Content-ID',0);
+    if ($cid) {
+        $self->{cids}{$url} = $cid;
+    }
+    system("/bin/mv",$src,$dest);

I changed cp to mv for speed and I save the Content-ID -> web link mapping for use in the html blurb.

-my($opt_web, $opt_dir, $opt_verbose, $opt_help, $opt_aggressive, $opt_hash, $opt_shorten);
+my($opt_web, $opt_dir, $opt_verbose, $opt_help, $opt_aggressive, $opt_hash, $opt_shorten, $opt_size);
+                "size=s"       => \$opt_size,
+      --size        minimum file size to detach (in bytes)
+                           msize      => $opt_size,

Added the size argument.

-  -h, --hash        use hash instead of date in dir names
+      --hash        use hash instead of date in dir names

-h displays the help. It does not turn on the hash flag so I removed -h from the help so that it would not tell you that -h does something that it does not actually do in the code.


-@matches = glob ("$year/$month/*/*/index.html");
-@match_list = reverse @matches;
+@matches = glob ("../*/*/index.html");
+my @temparray = ();
+foreach $file (@matches) {
+       my $mtime = (stat($file))[9];
+       my ($day,$mon,$yr) = (localtime($mtime))[3..5];
+       $yr += 1900;
+       $mon++;
+       $name = "$yr/" . sprintf("%02d", $mon) . '/' . sprintf("%02d", $day) . '/' . substr($file,3);
+       push (@temparray, $name) if $name =~ m#^$year/$month#;
+@match_list = reverse sort @temparray;

The original script expected to get the date from the folder name. Since I am using the random hash function I needed to patch index.cgi to get the date from the files instead.

-       my ($yr,$mon,$day,$key,$file) = split('/',$detach);
+       my ($yr,$mon,$day,$d1,$d2,$f) = split('/',$detach);
+       my $file = "../$d1/$d2/$f";
+       my $key = "$d1$d2";

Replacing $key with directory1 and directory2 then building $file from them.

-       open(DETACH, "$detach");
+       open(DETACH, "$file");

Since I tossed the date into the $detach array it does not contain a working path to the file. $file does so I used that instead.

-       print "<TD class='wd'><A HREF=\"delete.cgi?key=$yr/$mon/$day/$key\">X</A></TD>\n";
+       print "<TD class='wd'><A HREF=\"delete.cgi?key=$key\">X</A></TD>\n";

Pass the correct information to the delete.cgi script.

-print "</TR></TABLE>\n";
+if (@match_list) {
+       print "</TR></TABLE>\n";
+} else {
+       print "<br />No files found<br />\n";
+       print "<a href=\"index.cgi\">View all files</a><br />\n" if $month ne "*";

If there were no files I want some output telling me so.


-if ($key =~ m#^200\d/\d\d/\d\d/\d+(:\d+:\d+)?$#) {
-        system("/bin/rm -r $key");
-        # print our a redirect back to the referrer url after deletng?
-        print $cgi->redirect($cgi->referer);
+if ($key =~ m#^[a-zA-Z0-9]{10}$#) {
+       my $dir = '../' . substr($key,0,2) . '/' . substr($key,2,8);
+       system("/bin/rm -r $dir");
+       # print our a redirect back to the referrer url after deletng?
+       print $cgi->redirect($cgi->referer);

Change the regex for the random hash key instead of the date. Split the key back into the directory1/directory2 structure.

+            # Update timestamp
+            touch(sourcefile)

Call the new touch function after a successful hardlink. This will set the timestamp on the file to 'now' to keep it from being deleted too soon. Since this script is run daily the timestamp will still be within a few hours of when the file was created. I do not mind a file lasting an extra day but I do mind a file being deleted prematurely.

+def touch(fname, times = None):
+    fhandle = file(fname, 'a')
+    try:
+        os.utime(fname, times)
+    finally:
+        fhandle.close()

Unix touch command replacement in python. Update last modified to 'now'.