Posted in June 2008

How to handle Settings for Accounts.

I’ve run into this problem a few times, but I’m putting out a call to see if anybody has some great insight into a good solution. To date, I’ve come up with three different approaches to handling user settings, but none of which I’m in love with. Other ideas, other comments, other ways?

Working assumptions

  • It’s handy to access specific elements outside of the database (e.g. who has XYZ preference set)
  • Databases appear to be better at storing small data than large (varchar < 255 vs. BLOB of 64k)
  • There is no right answer

Small records only

First encountered this at Excite with Excite Mail, since we were using OpenWave as the backend we could store 255 bytes in one of their database tables, and it became a preferences store. Though of course it was a bit more restrive than the below SQL, it got the job done. With a bit of class wrapping ontop, it really works.

Pro:
  • Only one record type
  • Con:
  • End up with a chained dkey (signature_0, signature_1, etc.)
  • CREATE TABLE IF NOT EXISTS setting (
        account_id      int unsigned NOT NULL,
    
        dkey            varchar(20) NOT NULL,
        data            tinyblob NOT NULL,
        created_on      TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    
        PRIMARY KEY     (account_id, dkey),
        FOREIGN KEY     (account_id) REFERENCES account(account_id) ON DELETE CASCADE
    ) ENGINE=InnoDB;
    

    Small and Big records

    Ok, so when you don’t want to deal with chaining of records you can create a BIG and SMALL store of data.

    Pro:

  • No chaining of records
  • Easily delete keys (just delete from both and ignore errors)
  • Con:

  • Lots of extra model level helpers to determine where data should go
  • CREATE TABLE IF NOT EXISTS user_data (
        user_id         varchar(16) NOT NULL,
    
        dkey            varchar(20) NOT NULL,
        dtype           int DEFAULT 0,
        data            tinyblob,
    
        created_on      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
        PRIMARY KEY     (user_id, dtype, dkey),
        INDEX           (user_id, dtype),
        FOREIGN KEY     (user_id) REFERENCES user(user_id) ON DELETE CASCADE
    ) ENGINE=InnoDB;
    
    CREATE TABLE IF NOT EXISTS user_data_big (
        user_id         varchar(16) NOT NULL,
    
        dkey            varchar(20) NOT NULL,
        dtype           int DEFAULT 0,
        data            blob,
    
        created_on      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
        PRIMARY KEY     (user_id, dtype, dkey),
        INDEX           (user_id, dtype),
        FOREIGN KEY     (user_id) REFERENCES user(user_id) ON DELETE CASCADE
    ) ENGINE=InnoDB;
    

    Blob of doom

    What about the idea that you don’t want your SQL store to have a clue, so save settings in one big blob (json, xml encoded, etc.)

    Pro:

  • All the parsing is at code level, record management is easy (1 per user)
  • Con:

  • it’s a blob, you can’t really access it easily outside of your program
  • CREATE TABLE IF NOT EXISTS user_blob (
        user_id         varchar(16) NOT NULL,
    
        data            blob,
    
        created_on      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
        INDEX           (user_id),
        FOREIGN KEY     (user_id) REFERENCES user(user_id) ON DELETE CASCADE
    ) ENGINE=InnoDB;
    

    Any other ideas? Cool suggestions, the magic solution I’ve forgotten?

    Bike shirt poetry

    I bike… I bike more than average…  While a Poseur Jersey would be fun, I was thinking on my ride to work today that a shirt/jersey that said (primarily on my ride to work):

         The more I bike
         The less gas I use
         The more there is
         The cheaper it is
         The happier you are

    Maybe then a few cars would be a bit more considerate.

    Tagged

    JavaScript (spidermonkey) version

    Way to carried away with this… here’s the SpiderMonkey version (as opposed to the rhino) one. Biggest complaint is the File object is a bit of a pain to use…

    fd = File('p.test');
    fd.open("read",'text');
    
    mre = /^__MULTI_TOKEN__\s+(\S+)\s+(.*)\t?\s*(\d+)\s*$/;
    sre = /^__SINGLE_TOKEN__\s+(\S+)\s*\t?\s*(\d+)\s*$/;
    
    var ofd = File('full.txt');
    ofd.open('write,create,replace','text');
    first = {}
    
    while (line = fd.readln()) {
        if (m = line.match(mre)) {
            first[m[1]] = (first[m[1]] ? first[m[1]] : 0) + parseInt(m[3]);
            ofd.writeln(m[1] + " " + m[2] + "\t" + m[3]);
        } else if (m = line.match(sre)) {
            first[m[1]] = (first[m[1]] ? first[m[1]] : 0) + parseInt(m[2]);
        } else {
            print("Unknown: " + line);
        }
    }
    
    ofd.close()
    
    ofd = File('first.txt');
    ofd.open('write,create,replace','text');
    for (key in first) {
        ofd.writeln(key + "\t" + first[key]);
    }
    ofd.close()
    

    This is all based on the PHP/Perl/Python performance code.

    JavaScript version of the Perl/PHP performance test

    Using the Rhino compiler… sorry no benchmark numbers, but just as a placeholder. To pick a nit, wanted to do first[m[1]] += parseInt(m[3]) but that yeilded a NaN since first[m[1]] was undefined… You would think that JavaScript would make undefined == 0.

    fd = java.io.BufferedReader(java.io.FileReader('p.test'))
    
    mre = /^__MULTI_TOKEN__\s+(\S+)\s+(.*)\t?\s*(\d+)\s*$/;
    sre = /^__SINGLE_TOKEN__\s+(\S+)\s*\t?\s*(\d+)\s*$/;
    
    ofd = java.io.PrintWriter(java.io.FileWriter('full.txt'));
    first = {}
    
    while (line = fd.readLine()) {
        if (m = line.match(mre)) {
            first[m[1]] = (first[m[1]] ? first[m[1]] : 0) + parseInt(m[3]);
            ofd.println(m[1] + " " + m[2] + "\t" + m[3]);
        } else if (m = line.match(sre)) {
            first[m[1]] = (first[m[1]] ? first[m[1]] : 0) + parseInt(m[2]);
        } else {
            print("Unknown: " + line);
        }
    }
    
    ofd.close()
    
    ofd = java.io.PrintWriter(java.io.FileWriter('first.txt'));
    //  Sigh... rhino needs an update...
    //   first.forEach(function(e, i, a) { ofd.println(i + "\t" + e); });
    for (key in first) {
        ofd.println(key + "\t" + first[key]);
    }
    ofd.close()
    This is all based on the PHP/Perl/Python performance code.
    

    When’s a string not a string

    When it’s intertwined — ok, bad joke.

    I’m currently “fighting” (playing) around with the python storm ORM system and had the following problem (not the real code, but demonstration).

    What I wanted to say is:

    find(..., employee_title == 'CEO')

    However this didn’t work, I had to say:

    find(..., employee_title == u'CEO')

    After a bit of discussion down a mailing list, I got the “classic” answer back.

    Maybe you think it’s annoying, but it’s correct, and doing it with automatic conversion is error-prone.

    While I can see the point, it’s one of my pet peeves with Python specifically… Which is that there are quite a few programmers who are overly type aware, but yet are working in a language that is “late typing”.   You would think that in a world where one of the contributions to design is duck typing, would be the idea that last mile type enforcement is a bad thing.

    What’s really funny is that when you put this in context of this post I read this morning about 100% code coverage the following can be observed.

       To simply state a goal of 100% code coverage for a test suite in the presence of late typing is a red herring. 

    def foo(flg) :
       if flg :
           x = u'TEST'
       else :
           x = 'TEST'
    
    def bar(i) :
       x = ""
       x += foo(False)
       return x
    
    def unitTest() :
      j = foo(True)
      for i in range(1,3) :
         k = bar(i)
    

    I’ve got 100% code coverage in (flawed) unit test, but the problem is that since the typing is late there is the potential for lots of latent bugs. Eiffel! Since python isn’t going to give into true typing, it might as well adopt PEP 316: Programming by Contract to enable a test suite to enforce pre/post conditions to insure a function can only return specific types.

    Parting thought after I did a bit of investigation: Python Contracts – plus it improves the documentation of functions (another personal pain point)

    The $100,000 Keying Error

    The IEEE summary of this error is worth a read, both from the cause and the outcome of the UI testing.

    Since I’m frequently dealing with interfaces that require typing of sets of digits or numbers.  I’ve been thinking about this for the last few days and have two thoughts on how to reduce errors like this.  Not of course that I have any capabilities to test either idea, but there’s always a chance that somebody has some additional insight.

    The core summary of the error is:

    Fossbakk’s daughter’s account number was 71581555022, but she inserted an extra 5 and keyed in 71581555 5022. The user interface accepted only 11 digits in this field (the standard length of a Norwegian account number), thus truncating the number to 71581555502. The last digit is a checksum based on a modulo-11 formula. This will detect all single keying errors and errors where two consecutive digits are interchanged. Inserting an extra 5 changed both the ninth and tenth digits.

    Ideas of how to reduce the error:

    Grouping of digits

    Telephone numbers most specifically, are traditionally written in different grouping combinations (nn nn nn nn OR nnn nnn nnnn).  While bank numbers are  usually just strings of digits.  Thus providing very few visual cues that there is an error, in a grouped system you would notice that the first digit of a group is incorrect.   Though I will say that windows product keys, which are grouped are a total pain, but they at least provide a quick way to see that any one group is ok.

    Mixed digits and alpha

    If a account number was a mixture of digits an alphabetic characters (avoiding those that might be misleading 0 vs. O), I would imagine that similar to the grouping strategy it would reduce the error rate.  Things like:   715 BUVX 5022, one could even extend this to be similar to the old US telephone system nomenclature “715 wabash 5022″ where technically only the “WAB” is the unique part, but by turning into a real word the error rate might be reduced.