Posted in June 2008

Performance of Python, PHP and Perl

I had a 7GB text file that I needed to run some parsing on (to prepare for a DB import).  Out of habit I pulled out Perl and whipped up a quick program to parse it and generate some loadable files.  While watching it run I got to thinking about … why … why Perl (yes, I know habits are hard to break).  So while it ran I rewrote the program in PHP and Python.

Performance Numbers (on 5 million lines worth of the file)

  $ time ./split.pl  p.test           # Perl 5.8.8

  real    0m38.577s
  user    0m33.554s
  sys     0m0.848s

  $ time ./split.py p.test            # Python 2.4.4
  real    0m44.895s
  user    0m42.975s
  sys     0m0.900s

  $ time php split.php p.test         # PHP 5.2.6RC4
  real    1m10.887s
  user    0m51.251s
  sys     0m18.677s
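
To compare the runs on a common footing, here is a minimal sketch that converts the `time` output above into seconds and computes each language's slowdown relative to Perl. The helper names (`to_seconds`, `slowdown`) are made up for illustration; only the timing strings come from the runs above.

```python
def to_seconds(stamp):
    """Convert a `time` value such as '1m10.887s' into seconds."""
    minutes, _, seconds = stamp.rstrip("s").partition("m")
    return int(minutes) * 60 + float(seconds)

# (real, user) timings copied from the runs above.
timings = {
    "perl": ("0m38.577s", "0m33.554s"),
    "python": ("0m44.895s", "0m42.975s"),
    "php": ("1m10.887s", "0m51.251s"),
}

def slowdown(lang, baseline="perl"):
    """Return (real, user) ratios of `lang` relative to the baseline."""
    real, user = (to_seconds(t) for t in timings[lang])
    base_real, base_user = (to_seconds(t) for t in timings[baseline])
    return real / base_real, user / base_user

for lang in ("python", "php"):
    real_ratio, user_ratio = slowdown(lang)
    print("%s: %.2fx real, %.2fx user" % (lang, real_ratio, user_ratio))
```

Looking at both ratios matters here because the PHP run spends a large share of its wall-clock time in `sys`, which the user-time ratio alone hides.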

So it appears that Perl is the right choice for this job.  Python is a good second choice (about 16% slower by wall clock), but PHP is roughly 50% slower in user time and over 80% slower by wall clock (most likely due to not having compiled regular expressions).   I might also note that I’m not fond of the nested if/else that Python needs for a chained expression match, where I want to “side effect” out the results of the match.  Is there better syntax?
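
One possible answer to the syntax question, offered as a sketch rather than the definitive idiom: keep the compiled patterns in a list next to the function that handles each one, and loop until something matches. That flattens the nested if/else without needing assignment inside a condition. The names `HANDLERS`, `handle_multi`, `handle_single` and `process` are hypothetical, and the lazy `(.*?)` is my adjustment so that a multi-digit count is captured whole.

```python
import re

# Patterns follow the ones in the scripts; (.*?) is made lazy here (an assumption).
MULTI = re.compile(r'^__MULTI_TOKEN__\s+(\S+)\s+(.*?)\t?\s*(\d+)\s*$')
SINGLE = re.compile(r'^__SINGLE_TOKEN__\s+(\S+)\s*\t?\s*(\d+)\s*$')

def handle_multi(m, first, full_lines):
    # Accumulate the count and remember the line destined for full.txt.
    first[m.group(1)] = first.get(m.group(1), 0) + int(m.group(3))
    full_lines.append("%s %s\t%s" % m.groups())

def handle_single(m, first, full_lines):
    first[m.group(1)] = first.get(m.group(1), 0) + int(m.group(2))

HANDLERS = [(MULTI, handle_multi), (SINGLE, handle_single)]

def process(line, first, full_lines):
    """Try each pattern in order; return False when nothing matched."""
    for pattern, handler in HANDLERS:
        m = pattern.match(line.strip())
        if m:
            handler(m, first, full_lines)
            return True
    return False

first, full_lines = {}, []
for sample in ("__SINGLE_TOKEN__ adrianenamorado                 1",
               "__MULTI_TOKEN__ a aaron yalow        1"):
    if not process(sample, first, full_lines):
        print("Unknown: " + sample)
```

On Python 3.8 and later (far newer than the 2.4.4 used for the timings), an assignment expression such as `if (m := MULTI.match(line)):` followed by `elif (m := SINGLE.match(line)):` gets even closer to the Perl if/elsif shape.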

Here’s the code for your viewing pleasure and possible commentary.

Perl

use strict;

my %first;

open(FULL, ">full.txt") or die "Cannot open full.txt: $!";

while (<>) {
# __SINGLE_TOKEN__ adrianenamorado                 1
# __MULTI_TOKEN__ a aaron yalow        1
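    chomp;   # chomp only strips a trailing newline; chop would eat a digit on an unterminated last line
    # (.*?) is lazy so the full trailing count is captured in $3
    if (/^__MULTI_TOKEN__\s+(\S+)\s+(.*?)\t?\s*(\d+)\s*$/) {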
        $first{$1} += $3;
        print FULL  $1," ", $2, "\t", $3, "\n";
    } elsif (/^__SINGLE_TOKEN__\s+(\S+)\s*\t?\s*(\d+)\s*$/) {
        $first{$1} += $2;
    } else {
        print "Unknown: ", $_, "\n";
    }
}

close(FULL);

open(FIRST, ">first.txt") or die "Cannot open first.txt: $!";
while (my($k, $c) = each %first) {
    print FIRST $k,"\t",$c,"\n";
}
close(FIRST);

Python

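import sys, re

first = dict()

ofd = open("full.txt", 'w')

# Raw strings keep the regex backslashes intact; (.*?) is lazy so the full
# trailing count is captured in group 3.
mre = re.compile(r'^__MULTI_TOKEN__\s+(\S+)\s+(.*?)\t?\s*(\d+)\s*$')
sre = re.compile(r'^__SINGLE_TOKEN__\s+(\S+)\s*\t?\s*(\d+)\s*$')

ifd = open(sys.argv[1], 'r')

for line in ifd :
    line = line.strip()
    m = mre.match(line)
    if m :
        # Accumulate the count (as the Perl and PHP versions do) instead of overwriting it.
        first[m.group(1)] = first.get(m.group(1), 0) + int(m.group(3))
        # write() avoids the extra spaces that print inserts between arguments.
        ofd.write("%s %s\t%s\n" % (m.group(1), m.group(2), m.group(3)))
    else :
        m = sre.match(line)
        if m :
            first[m.group(1)] = first.get(m.group(1), 0) + int(m.group(2))
        else :
            print "Unknown:", line

ifd.close()
ofd.close()

ofd = open("first.txt", 'w')
for (k, c) in first.iteritems() :
    ofd.write("%s\t%d\n" % (k, c))
ofd.close()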

PHP

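<?php
$first = array();

$fd = fopen("full.txt", 'w');
$in = fopen($argv[1], 'r');

// Compare against false so a final line of "0" without a newline does not end the loop early.
while (($line = fgets($in)) !== false) {
    $line = trim($line);
    // (.*?) is lazy so the full trailing count is captured in $m[3].
    if (preg_match('/^__MULTI_TOKEN__\s+(\S+)\s+(.*?)\t?\s*(\d+)\s*$/', $line, $m)) {
        // Initialize the counter first to avoid an undefined-index notice.
        if (!isset($first[$m[1]])) { $first[$m[1]] = 0; }
        $first[$m[1]] += $m[3];
        fprintf($fd, "%s %s\t%d\n", $m[1], $m[2], $m[3]);
    } else if (preg_match('/^__SINGLE_TOKEN__\s+(\S+)\s*\t?\s*(\d+)\s*$/', $line, $m)) {
        if (!isset($first[$m[1]])) { $first[$m[1]] = 0; }
        $first[$m[1]] += $m[2];
    } else {
        print "Unknown: {$line}\n";
    }
}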

fclose($fd);

$fd = fopen("first.txt", 'w');
foreach ($first as $k => $c) {
    fprintf($fd, "%s\t%d\n", $k, $c);
}
fclose($fd);

Showing the whale…

Saw this in a set of comments on hacker news…  totally rocking quote!

Showed the whale,
Jumped the shark,
Epic fail,
Nothing but carp.
Its all fish to me. :)

Now to get it all on a threadless shirt with some Web 2.0 graphics ;-)   Maybe the twitter logo to bring it all home.

Framinstein lives!

Not quite interactive, but half of what I want in a digital picture frame is now available…

eStarling Picture Frame

  • Feeds – flickr, etc.
  • Email
  • WiFi

Front-end vs. back-end developers (my take)

Over on DZone the article “Front-end vs. back-end developers” caught my attention.  Alas, I feel compelled to write a longer rant.  I’ll touch on Mads’s points, but let me point out a bias first.

Front end devs tend to be less “classically” trained than back end devs.  Based on the resume flow I see for front end vs. back end positions, front end developers come from non-CS backgrounds, while back end developers have CS degrees.

Front-end devs don’t unit test

While I can make an argument that the tools to test front end code vs. back end code are quite different, it’s a “cultural” thing more than an absolute.  There are challenges: to test front end code you need two computers with six or seven different browsers installed, because unless you’re using automated tools (which I’ve yet to find open sourced) there’s still the off chance that a trailing “,” in some JavaScript will blow up the world.

Ultimately if you assume a level of “experience” it’s up to the senior people (maybe those people with CS degrees) to help set some standards and find tools to use in a production situation.

Back-end devs are more low-level

I can only partly disagree, since fundamentally back-end can mean everything from MySQL table, index and query optimizations to multi-threaded queuing services.  However, this is a false assumption: yes, those skills are needed, but I’ve never met a true backend developer who really enjoyed dealing with the memory management issues in parsing XML from C++.   It’s just tedious…  much the same as matching layout in CSS between IE and FF.

Front-end devs make more mistakes

A joke, right…  What you really should consider is a difference in tools, which relates to point #1 (unit tests).  The biggest difference is that “classic” backend is C++/Java or another compiled language.  With a compiled language you have a greater chance that your code is meaningful (not error free), since a typo in a class name fails at compile time instead of blowing up later.

I had the following line in some PHP the other day:
     throw new Foo_Bar_Exception();
The problem was that the class was actually named Foo_Baz_Exception, and the typo was not caught until the line was executed.
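
The same late failure is easy to reproduce in other dynamic languages. Below is a minimal Python analogue (not the PHP code in question): the class and function names are hypothetical, and the misspelling is deliberate. The function is accepted when it is defined, and the mistake only shows up as a `NameError` once the bad branch actually runs.

```python
class FooBazError(Exception):
    """The exception class that actually exists (hypothetical name)."""

def check(value):
    if value < 0:
        # Deliberate typo: FooBarError is not defined anywhere. Python accepts
        # this function as written; the NameError appears only when this
        # branch is executed.
        raise FooBarError("negative value")
    return value

print(check(1))  # the happy path never touches the bad line

try:
    check(-1)
except NameError as exc:
    print("caught late: %s" % exc)
```

A unit test that exercises the error branch, or a static checker, would surface the typo before production does, which ties back to the unit testing point above.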

So while you might need to track down different classes of problems, you still have similar problems.   While there is some “endorphin” driving development, the biggest challenge a front-end developer has is knowing that once they’ve built out 90% of the functionality, somebody is going to walk in and have them change 50% of the work they just did…  Or potentially throw a re-design in the pipeline so that 90% of their work is out the window.  Back end folks just nod and smile and say “that’ll be 18 months of development”.

Back-end devs hate the client-side

Digging a hole for oneself…  I don’t think the problem is dissatisfaction; rather, I’ve frequently encountered developers who are fairly rigid in their thinking (primarily on the back end, sometimes on the front end).  The problem is that HTML/CSS browser compatibility is a shifting landscape, so what worked six months ago might be broken today.  The idea that code can break without being changed is a tough concept for most developers.  It’s a way of life for some and a battle to be fought for others.

My personal take

Again, it’s a matter of training.  Some broad-stroke generalizations:

Front end developers

  • Typically don’t have a CS degree, or have a CS degree from a 3rd tier school.
  • Work in languages that are similar to BASIC (see PHP is Basic)
  • Have a visual skill in converting photoshop documents to CSS/HTML/etc.
  • Have a high tolerance for iterative programming, due to type free languages

Back end developers

  • Have a CS degree or lots of experience
  • Tend to be more systematic in their problem solving approach
  • Don’t mind spending days finding the one object that is leaking
  • Try and build tools to solve problems

Fundamentally, it’s up to senior developers to lead their teams to better solutions.  Seniority comes from experience and education, but if you’re unwilling to learn new tools (NIH is evil) or help build tools to solve problems that are not necessarily your own, then you’re not senior.  You’re just another hack developer with a my-way-or-the-highway attitude.

Framinstein — Digital Fridge — Digital Postcards

Photo frame, or world interface.  Caught this article about a Digital picture frame doubles as a secondary display and got to thinking about an old project.

Idea:

What I want is a useful device on my fridge to replace the calendar on my wall (see just about every sci-fi movie made in the last five years).  But how?

How about a slave interface:

  • Digital picture frame at around 800×600
  • Shows photos when not “in use”
  • WiFi (ok, USB for now) connection to the home computer
  • Voice commands — think Star Trek — “Computer, show calendar”
  • That’s it, if it was really good you could say:  “Computer, show grocery list, add milk”
  • When not in use show the slide show — maybe “improved” with the digital postcards…
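
As a rough sketch of the command side only, and assuming the speech has already been turned into text by some recognizer (not shown), the “Computer, show grocery list, add milk” style could be split into verb/target pairs like this. The function name `parse_command` and the wake-word handling are my own illustration, not part of any existing frame software.

```python
def parse_command(utterance, wake_word="computer"):
    """Split 'Computer, show grocery list, add milk' into (verb, target) pairs.

    Returns None when the utterance does not start with the wake word, so the
    frame can keep running its slide show.
    """
    parts = [p.strip() for p in utterance.split(",") if p.strip()]
    if not parts or parts[0].lower() != wake_word:
        return None
    commands = []
    for part in parts[1:]:
        # The first word is the verb; everything after it is the target.
        verb, _, target = part.partition(" ")
        commands.append((verb.lower(), target.strip()))
    return commands

for heard in ("Computer, show calendar",
              "Computer, show grocery list, add milk",
              "what a nice photo"):
    print(parse_command(heard))
```

Returning None for anything without the wake word keeps the frame passive by default, which matches the “shows photos when not in use” behaviour in the list above.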

Something I continue to think about is how we interact with computers, à la:

  • Home Note — Worth reading the papers to see what really happens…
  • Digital Postcards: think about what it would mean if there was a twitter-ish service that combined a fixed display in your home (office, etc.) with the output of a cellphone camera from your friends.  Take it one step further: make those annoying Facebook messages “real” by broadcasting them to passive viewers.  [Ok, maybe you don't want to tell your parents you woke up with somebody...]

Gosh, the whole Microsoft Socio-Digital Systems group looks fun…. 

Delightful Matt

Over the years I’ve had the chance to work with my friend Matt twice — once at Excite@Home (via blue mountain arts) a second time at Yahoo on Mail.  He has to be one of the more upbeat people I’ve worked with through the years, one thing he’s said so many times that it’s almost comical (ok, maybe it’s just me snickering under my breath) is:

     “How can we delight our users”

or

      “It’s delightful”

or

      “It’s a delightful experience”

So now every time I hear “…delight…user…” in a sentence I think of Matt.  As I was riding in to work this morning I had the following series of thoughts:

  • Delighted users are happy users
  • Happy users are repeat users
  • Repeat users are good customers
  • Good customers bring their friends
  • Making customers happy and their friends happy…well, ‘nuf said

Thanks, Matt, for keeping things delightful.  I’ll still snicker under my breath.