September 15, 2003
Training Bogofilter with Python

This is my Python program for training bogofilter to distinguish between spam and non-spam. It relies on the mail being in Maildirs on the machine the program is run on, and it looks only at mail that has been read since the last time it was run. It assumes that this mail is in the correct folder, so it compares where bogofilter thinks the mail should be (based on the X-Bogosity header) with where it actually is. Mail that isn't where bogofilter thinks it should be is fed back through bogofilter in training mode so that bogofilter can improve. It works pretty well.
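For readers who just want the shape of the idea, here is a minimal sketch of the comparison step. It is not the program itself. The function names are hypothetical, and it assumes that the Maildir "S" flag marks read mail, that a file ctime newer than the last run approximates "read since the last run", and that the X-Bogosity header starts with Spam/Yes or Ham/No depending on the bogofilter version.

```python
import email
import subprocess
from pathlib import Path


def bogofilter_says_spam(raw_bytes):
    """Return True/False from the X-Bogosity header, or None if absent or unsure."""
    msg = email.message_from_bytes(raw_bytes)
    verdict = msg.get("X-Bogosity", "").split(",")[0].strip().lower()
    if verdict in ("spam", "yes"):
        return True
    if verdict in ("ham", "no"):
        return False
    return None


def misfiled(maildir, folder_is_spam, last_run):
    """Yield read messages in maildir/cur whose header disagrees with the folder."""
    for path in Path(maildir, "cur").iterdir():
        # Assumption: flags follow ":2," in the file name and "S" means seen.
        flags = path.name.rpartition(":2,")[2] if ":2," in path.name else ""
        if "S" not in flags or path.stat().st_ctime <= last_run:
            continue
        # Messages with no usable verdict (None) are skipped in this sketch.
        if bogofilter_says_spam(path.read_bytes()) not in (folder_is_spam, None):
            yield path


def retrain(path, folder_is_spam):
    """Feed one message to bogofilter, trusting the folder it was found in."""
    # Assumption: "bogofilter -s" registers stdin as spam and "-n" as non-spam.
    flag = "-s" if folder_is_spam else "-n"
    with open(path, "rb") as fh:
        subprocess.run(["bogofilter", flag], stdin=fh, check=True)
```

A run would loop over each folder, call misfiled() with the timestamp saved from the previous run, and pass every result to retrain().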

The program without HTML markup is here.

Posted by Alex. Permalink
October 29, 2002
I know this is getting silly

Another round of renaming scripts. This time I've tried to provide "functional" and "imperative" styles for Python and Perl. I personally think of this sort of problem as one of processing a list of names, so I prefer the functional approach. If the end result were to be a list of some sort then I think the functional approaches would win hands down. The presence of the rename "side effect" means that they're not pure. The Python imperative style is courtesy of Jarno Virtanen, with a downgrade by me so that it matches the functionality of the others.

As a Python script, functional style:

import glob
import os

# list() forces evaluation: map() is lazy in Python 3 (it was eager in Python 2)
list(map(lambda f: os.rename(f + '.txt', f + '.xml'),
         [os.path.splitext(f)[0] for f in glob.glob('*.txt')]))

As a Python script, imperative style:

import glob
import os

for filename in glob.glob('*.txt'):
    base = os.path.splitext(filename)[0]
    newname = '%s.%s' % (base, 'xml')
    os.rename(base + '.txt', newname)

As a Perl script, functional style:

# s///r (Perl 5.14+) returns the new name; plain s/// would return the substitution count
map(rename($_ . '.txt', $_ . '.xml'),
    map(s/\.txt$//r, glob('*.txt')));

As a Perl script, imperative style:

foreach $f (glob('*.txt')) {
    $f =~ s/\.txt$//;
    rename($f . '.txt', $f . '.xml');
}

Finally, the shortest solution, provided by John Masson, as a Bash shell script run from the command line:

for i in *.txt; do mv "$i" "${i%%.txt}.xml"; done
Posted by Alex. Permalink
Comments
Now all we need is a canonical Java version. ;-) Posted by: Jarno Virtanen on October 29, 2002 11:05 PM
I'll give it a go :) Posted by: Alex on October 30, 2002 06:47 AM
using windows shell: ren *.txt *.xml is even shorter and simpler, the Bash thing is obfuscated with % and ; and %% and $ thingies. Posted by: Will Stuyvesant on November 2, 2002 02:05 PM
Unix has the rename command as well. rename .txt .xml ./* Posted by: asdf on December 15, 2002 02:14 PM
October 28, 2002
More bulk file renames

Jarno Virtanen has a much nicer pythonization of the elisp bulk file rename script here. As he says, his looks much more Pythonic. Mine was a literal translation of a semi-functional elisp approach, using the same constructs.

I'll try rewriting the Perl in a more Perlish style later. That will probably involve using foreach $f (readdir(DIR)) as an outer loop rather than trying a functional approach. Of course, it's also possible to write the elisp as an imperative loop rather than in a semi-functional style.

It's interesting that once I had a solution it was easy to produce a literal translation, but it didn't occur to me to recast the problem using a different approach. If I had started with a Python script, would I have written the elisp using dolist or while?

Posted by Alex. Permalink
October 24, 2002
Playing with HTML Templating in Python

Simon Brunning pointed me to this resource for web programming in Python and recommended Quixote or PSP from the Webware project. Of course, in the typical response of programmers everywhere, I decided to hack something together myself instead.

In my defense I can only say that my requirements are very, very simple, and I wanted an opportunity to expand my Python "skills". The system I came up with is a basic version of Java Server Pages (very basic). A template is just an HTML page with <#= and #> around any Python expressions. The values of the expressions are substituted into the page at runtime. All variables used by the template are provided as members of a dictionary called v. For example:

<h1>Written by <#= v["author"] #></h1>

The actual template is created from the HTML page by converting the page into the source code for a Python function, compiling the function, and then executing it, passing in the dictionary that provides the variable values. The generated function is cached and reused if the source file hasn't changed since the function was generated. Templates are, of course, represented as objects, so creating and using one looks like:

t = Template("test.html")
t.execute({"author":"Alex", "date":"October 2002"})
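To make the page-to-function step concrete, here is a minimal sketch of how such a class could work. It is not the actual source: the regular expression, the generated function name render, and the choice to return the page as a string are all assumptions made for illustration.

```python
import os
import re

# Matches <#= expression #> and captures the expression text.
_EXPR = re.compile(r"<#=(.*?)#>", re.DOTALL)


class Template:
    """Hypothetical sketch: compiles an HTML page into a Python function."""

    def __init__(self, filename):
        self.filename = filename
        self._mtime = None   # mtime of the file when it was last compiled
        self._func = None    # cached generated function

    def _source(self, text):
        # re.split with one group: literal text at even indexes,
        # captured expressions at odd indexes.
        lines = ["def render(v):", "    out = []"]
        for i, part in enumerate(_EXPR.split(text)):
            if i % 2:
                lines.append("    out.append(str(%s))" % part.strip())
            elif part:
                lines.append("    out.append(%r)" % part)
        lines.append("    return ''.join(out)")
        return "\n".join(lines)

    def execute(self, v):
        # Regenerate the function only if the file changed since last time.
        mtime = os.path.getmtime(self.filename)
        if self._func is None or mtime != self._mtime:
            with open(self.filename) as f:
                namespace = {}
                exec(self._source(f.read()), namespace)
            self._func, self._mtime = namespace["render"], mtime
        return self._func(v)
```

Because the expressions are passed to exec, a template has to be trusted in the same way as any other Python source file.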

Extending this would probably involve adding a TemplateFactory that cached templates. Adding support for compound Python statements (while, for, etc.) is complicated by Python's use of indentation to mark scope. If I ever need this I'll probably have to go with some sort of END_BLOCK or DEDENT marker to end up with:

<ul>
<# for item in v["lines"]: #>
<li><#= item #></li>
<# END_BLOCK #>
</ul>
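One way the END_BLOCK idea could be handled is to track an indentation depth while generating the function source. The sketch below is only an illustration under that assumption: compile_template is a hypothetical helper, and it treats any <# ... #> statement ending in a colon as opening a block.

```python
import re

# Splits out <#= expr #> and <# statement #> tags, keeping them in the result.
_TAG = re.compile(r"(<#=?.*?#>)", re.DOTALL)


def compile_template(text):
    """Hypothetical helper: turn template text into a render(v) function."""
    lines = ["def render(v):", "    out = []"]
    depth = 1  # current indentation level inside render()
    # With one capturing group, tags land at odd indexes, literal text at even.
    for i, part in enumerate(_TAG.split(text)):
        pad = "    " * depth
        if i % 2 == 0:
            if part:
                lines.append("%sout.append(%r)" % (pad, part))
        elif part.startswith("<#="):
            lines.append("%sout.append(str(%s))" % (pad, part[3:-2].strip()))
        else:
            stmt = part[2:-2].strip()
            if stmt == "END_BLOCK":
                if depth == 1:
                    raise ValueError("END_BLOCK without an open block")
                depth -= 1  # the marker closes the innermost open block
            else:
                lines.append(pad + stmt)
                if stmt.endswith(":"):
                    depth += 1  # a compound statement opens a new block
    lines.append("    return ''.join(out)")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["render"]
```

Calling compile_template(page)({"lines": ["a", "b"]}) on the list example above would emit one <li> per item, with the literal text inside the block repeated on each pass through the loop.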

Anyway, here's the source code, and here it is syntax highlighted for viewing.

Posted by Alex. Permalink