samhuri.net


By Sami Samhuri

April 2007

Funny how code can be beautiful

While reading a Haskell tutorial I came across the following code for defining the Fibonacci numbers:

fib = 1 : 1 : [ a + b | (a, b) <- zip fib (tail fib) ]

After reading it a few times and understanding how it works I couldn’t help but think how beautiful it is. I don’t mean that it’s aesthetically pleasing to me; the beautiful part is the meaning and simplicity. Lazy evaluation is sweet.

Haskell is the most challenging real language I have tried to wrap my head around. I haven’t done much with any functional languages yet but they are truly fascinating. I’m beginning to understand monads[1] but I’m quite sure I don’t see the whole picture yet.

Erlang looks like it may be more suited to real world apps so I would like to learn that some time. The pragmatic guys have a book on Erlang in the works, and I love every book of theirs which I have read.

Going deeper down the functional rabbit-hole you’ll find things like this polyglot quine, which absolutely blows my mind. I used to be impressed by the JAPH sigs or some of the various obfuscated contest winners but that first one definitely cleans the rest up with a perfect 10 in geekiness.

[1] The following links have all been helpful while trying to wrap my head around monads.

* A Gentle Introduction to Haskell (link is directly to chapter 9)
* What the hell are Monads?
* Monads on WikiBooks
* Monads for the Working Haskell Programmer
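For comparison, here is a rough Ruby sketch of the lazy evaluation that makes the Haskell Fibonacci definition work. It is not the self-referential zip trick; it only shows an infinite sequence whose values are computed on demand. The variable names are my own.

```ruby
# An infinite Fibonacci sequence; nothing is computed until values are requested.
fib = Enumerator.new do |yielder|
  a, b = 1, 1
  loop do
    yielder << a        # hand out the current number, then pause
    a, b = b, a + b     # advance the pair only when more is asked for
  end
end

p fib.take(10)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Asking for ten values runs the loop ten times and no more, which is the same property that lets the Haskell list refer to itself.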

Quickly inserting millions of rows with MySQL/InnoDB

The absolute first thing you should do is check your MySQL configuration to make sure it’s sane for the system you’re using. I kept getting a ‘The table is too large’ error on my Gentoo box after inserting several million rows because the default config limits the InnoDB tablespace size to 128M. It was also tuned for a box with as little as 64M of RAM. That’s cool for a small VPS or your old Pentium in the corner collecting dust. For a modern server, workstation, or even notebook with gigs of RAM you’ll likely want to make some changes.

Tweaking my.cnf

Here are the relevant settings you can tweak in order to work with large datasets efficiently. These are set in your my.cnf file, whose location varies from system to system.

On Gentoo it resides at /etc/mysql/my.cnf.

When MySQL 5.x is installed via DarwinPorts on Mac OS X you need to copy one of the defaults from /opt/local/share/mysql5/mysql/ to /opt/local/etc/mysql5/my.cnf and then modify it accordingly.

If you use another system you’re on your own. If you can’t figure it out, please put down the text editor and leave the poor config file alone! Jokes aside this really is not difficult if you’re used to configuring *nix programs.

innodb_buffer_pool_size

This determines how much memory MySQL uses for table indexes and data. You can set it as low as 8-10M, or as high as 50-80% of your memory on a dedicated MySQL server. I have RAM to burn[1] in my workstation so I set this to 200M, 20% of my 1GB.

[1] I run Fluxbox on Gentoo and use 200-300M of my 1GB on average; with 200M for MySQL, 409M are in use at this moment. Gotta love those lightweight window managers!

innodb_additional_mem_pool_size

According to a post on a MySQL mailing list, modern OSs have fast enough mallocs and this variable has little effect on performance. I set mine to 16M before reading that post, so I’ll just leave it at that.

innodb_data_file_path

On Gentoo this one bit me right in the ass, and I mentioned it above. It specifies how large the files used to store your data can be, and how many of them there are. The default setting is almost sane: ibdata1:10M:autoextend:max:128M. Limiting the total size to 128M caused my test to fail after inserting several million rows.

Simply removing max:128M solves the problem. The resulting setting tells the InnoDB engine to use one file, named ibdata1, which is initially 10M in size and grows as required.

innodb_log_file_size

The default Gentoo config says they (whoever they are) keep this at 25% of innodb_buffer_pool_size so I did just that. 50M in my case.

innodb_log_buffer_size

Again I only went as far as the Gentoo config to learn about this setting. They had it at 8M and recommend increasing it if you have large transactions. I can’t think of any particularly large transactions I currently use but I doubled it to 16M anyway.
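To pull the settings above together, here is a small Ruby sketch that renders them as a my.cnf fragment. The helper name and its parameters are made up for illustration. The values are the ones from my workstation, with the log file size worked out from the 25% rule of thumb.

```ruby
# Render the InnoDB settings discussed above as a my.cnf fragment.
# Sizes are in megabytes. innodb_config is a hypothetical helper, not part of MySQL.
def innodb_config(buffer_pool_mb:, log_buffer_mb: 16, additional_mem_pool_mb: 16)
  # rule of thumb from the Gentoo config: log file = 25% of the buffer pool
  log_file_mb = buffer_pool_mb / 4
  <<~CNF
    [mysqld]
    innodb_buffer_pool_size = #{buffer_pool_mb}M
    innodb_additional_mem_pool_size = #{additional_mem_pool_mb}M
    innodb_data_file_path = ibdata1:10M:autoextend
    innodb_log_file_size = #{log_file_mb}M
    innodb_log_buffer_size = #{log_buffer_mb}M
  CNF
end

puts innodb_config(buffer_pool_mb: 200)
```

With a 200M buffer pool this prints a 50M log file size, matching what I ended up with. Treat the output as a starting point to paste into your own my.cnf, not as tuned values.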

Save my.cnf and restart mysqld

That’s it for the MySQL config. Restart mysqld however you do that on your platform. sudo /etc/init.d/mysql restart should look familiar to many *nix users.

Now you should be able to insert dozens and indeed hundreds of millions of rows into your InnoDB tables. Sadly, this alone brought little performance gain. With autocommit on (the default), MySQL wraps every single statement in its own implicit transaction, so each insert pays for a commit. Wrapping everything in one big transaction may work, but inevitably something will go wrong and you may want the ability to resume inserting the rows instead of starting all over.

The solution now is to execute SET AUTOCOMMIT=0 before inserting the data, and then issue a COMMIT when you’re done. With all that in place I’m inserting 14,000,000 rows into both MyISAM and InnoDB tables in 30 minutes. MyISAM is still ~ 2 min faster, but as I said earlier this is adequate for now. Prior to all this it took several hours to insert 14,000,000 rows so I am happy.
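Here is a sketch of the autocommit approach described above, written as a Ruby helper that generates the SQL script instead of talking to a database. The helper name, the table, and the columns are hypothetical, and the quoting is deliberately naive. It commits after every batch, which is one way to keep the ability to resume after a failure.

```ruby
# Emit a SQL script that loads rows with autocommit off, committing every
# batch_size rows so an interrupted load can resume from the last commit.
# bulk_insert_sql, the table and the columns are hypothetical names.
def bulk_insert_sql(table, columns, rows, batch_size: 10_000)
  lines = ['SET AUTOCOMMIT=0;']
  rows.each_slice(batch_size) do |batch|
    batch.each do |row|
      # naive quoting: numbers as-is, everything else single-quoted with '' escapes
      values = row.map { |v| v.is_a?(Numeric) ? v.to_s : "'#{v.to_s.gsub("'", "''")}'" }
      lines << "INSERT INTO #{table} (#{columns.join(', ')}) VALUES (#{values.join(', ')});"
    end
    lines << 'COMMIT;'  # one commit per batch instead of one per row
  end
  lines.join("\n")
end

puts bulk_insert_sql('events', %w[id name], [[1, 'a'], [2, "b's"], [3, 'c']], batch_size: 2)
```

The batch size is a trade-off: bigger batches mean fewer commits, smaller ones mean less work to redo if the load dies halfway. For real data, use your driver's own escaping instead of the toy quoting here.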

Now you can enjoy the speed MyISAM is known for with your InnoDB tables. Consider the data integrity a bonus! ;-)

Getting to know Vista

It looks pretty good!

After figuring out how to minimise the translucency of the window decorations I think Aero looks ok. Window titles, on both windows and the taskbar, can be difficult to read at a glance which is really stupid if you ask me. But it’s better than Luna! They really lay the effects on thick but overall I find it pretty pleasant and it runs well on my MacBook’s Intel 945 video chip.

Ah yes, the Sidebar is nowhere to be seen on my desktop. It’s a nice-looking waste of space.

But it’s not all useful

Sadly the new task switcher (Win-tab) is terrible. Before using it I wondered why they didn’t replace Alt-tab completely. Now I know and I am grateful to MS for not replacing it. Alt-tab easily wins, especially since it displays thumbnails of windows.

Three gripes with Win-tab fancy-shmanciness:

It’s stable (so far)

Besides the fact that it is aesthetically pleasing [subjective] it also has just worked for me so far. Nothing has crashed or broken, which is almost miraculous. Not that I had a terrible time with XP, but it was still frail old Windows at times. I’m equally pleased with Apple’s drivers for Windows, which probably adds to the experience. I’ve used XP machines with proper drivers and those without, and the difference is night & day. I’ve had uptimes in months on a stable XP notebook.

Never thought this day would come...

But I actually like the Start menu. Really, I do. You hit the Windows key, type a few letters and boom you launch your app or search your computer, or the web (Google in Firefox, my default search). It’s not QuickSilver or LaunchBar; it’s not supposed to be. For the average Joe this is cool, and for the average power user it’s very useful. For the casual Windows user it’s great. It even learns.

I don’t love it though. I knew before using it that the new method of navigating through the All Programs menu would be weird. It is, but I guess it may be better than the previous fly-out scheme (which I don’t care for either). I guess the All Programs menu is more or less legacy now though and I don’t see myself using it often.

I’m a command line junkie

They fixed at least one glaring bug. I used the cmd.exe shell a little bit even though I hate it. I was happy to find that Tab completion works for more than the current directory now. Before Vista it would complete the same entries from PWD no matter how deep you tried to drill down into the filesystem. Other than that it seems to be the same crummy shell. [edit—apparently this is fixed in XP as well, my mistake]

I installed the Windows PowerShell (PoSH) but haven’t really put in the effort to learn it yet. The syntax is unorthodox coming from *nix shells (zsh), but it’s sort of refreshing and it lives up to the Power part of its name. I really like the fact that collections of (say) files can be passed around and iterated over, filtered, etc. not as filenames but as real objects with corresponding methods and metadata. Built-in support for XML is pretty nifty too.

I’ve often longed for a shell which acted like a normal shell for the most part, but allowed irb-like interpretation of arbitrary Ruby code as well. The PowerShell seems like it could be something similar to what I’ve wanted. Too bad it’s proprietary and only runs on Windows. If I use Vista a lot this summer I could end up getting into it more though. It’s quite interesting.

Random

The good:

The bad:

My conclusion

Perhaps the scores of talented developers at Microsoft can save them despite their obvious shortcomings in management. .NET seems like a decent platform, but we’ll have to see how I like it once I actually use it. So far I don’t hate Vista and considering the previous versions of Windows that’s a pretty good review coming from me. I’m still recommending Macs to my family and friends, but who knows what the future holds. I don’t hate Vista and by the end of the summer I may even [gasp] like it, and/or .NET. I haven’t used an IDE since VB6 and MS has always had a decent IDE (albeit with a crummy text editor). I’m expecting to enjoy it. If there’s one thing MS knows it’s the value of good dev tools and developers.

ActiveRecord::Base.find_or_create and find_or_initialize

I've extended ActiveRecord with find_or_create(params) and find_or_initialize(params). Those are actually just wrappers around find_or_do(action, params) which does the heavy lifting.

They work exactly as you'd expect them to work with possibly one gotcha. If you pass in an id attribute then it will just find that record directly. If it fails it will try and find the record using the other params as it would have done normally.

Enough chat, here's the self-explanatory code:

# extend ActiveRecord::Base with find_or_create and find_or_initialize.
ActiveRecord::Base.class_eval do
  include ActiveRecordExtensions
end

module ActiveRecordExtensions
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def find_or_initialize(params)
      find_or_do('initialize', params)
    end

    def find_or_create(params)
      find_or_do('create', params)
    end

    private

    # Find a record that matches the attributes given in the +params+ hash, or do +action+
    # to retrieve a new object with the given parameters and return that.
    def find_or_do(action, params)
      # if an id is given just find the record directly; find raises
      # RecordNotFound when the id is missing or matches no record
      self.find(params[:id])

    rescue ActiveRecord::RecordNotFound
      attrs = {}     # hash of attributes passed in params

      # search for valid attributes in params
      self.column_names.map(&:to_sym).each do |attrib|
        # skip unknown columns, and the id field
        next if params[attrib].nil? || attrib == :id

        attrs[attrib] = params[attrib]
      end

      # no valid params given, return nil
      return nil if attrs.empty?

      # call the appropriate ActiveRecord finder method
      self.send("find_or_#{action}_by_#{attrs.keys.join('_and_')}", *attrs.values)
    end
  end
end
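The last line of find_or_do is the only clever bit, so here is a plain Ruby sketch of what it builds, with no Rails required. The helper name and the sample attributes are made up for illustration.

```ruby
# Build the dynamic finder name and argument list the same way find_or_do does.
# dynamic_finder_call is a hypothetical helper; it only constructs the call,
# it does not touch ActiveRecord.
def dynamic_finder_call(action, attrs)
  ["find_or_#{action}_by_#{attrs.keys.join('_and_')}", attrs.values]
end

name, args = dynamic_finder_call('create', { name: 'Sami', city: 'Victoria' })
puts name  # find_or_create_by_name_and_city
p args     # ["Sami", "Victoria"]
```

In the real code the name is passed to send along with the splatted values, so the example ends up as a call to ActiveRecord's dynamic finder find_or_create_by_name_and_city('Sami', 'Victoria').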

A triple-booting, schizophrenic MacBook

The steps are well documented so I won’t get into detail here but if you have a backup and can wipe your disk all you do is:

With MacPorts and Gentoo/MacOSX the Gentoo install is superfluous but I’ll spare 12G just to see Gentoo run on this fine machine. Setting up the hardware should be fun. Right now I’m compiling X (package 77 of 94), and the Core Duo is crunching code very nicely with 2G to work with, without any swap. I fully intend to put off creating a swap file unless I have to. Needless to say I’ll be running fluxbox or Xfce, none of that Gnome or KDE stuff. If I ever need a swap file I will eat my keyboard.

[edit: 25 minutes to compile X.org, not too shabby!]

My initial experience with Vista is quite good. Sadly the same old registry hack is required to swap Caps lock and Control but I was just glad it worked. I really like the new Start menu and the eye-candy is fairly pleasant for the most part. Until now I’d only used RC2 on a machine incapable of running Aero Glass and it looked terrible. I switched to Windows Classic just like I do with XP. Not so with Aero at its finest though. Without thinking about the price Vista is a nice upgrade to Windows. But because of the price and uncertainty of running Aero Glass I still hesitate to urge non-geeks to upgrade.

OS X is OS X. It’s my favourite desktop OS right now because of apps like LaunchBar/Quicksilver and TextMate, a generally excellent UI, good old *nix stability, and zsh out of the box! When I need WireShark or the GIMP, X11 is there waiting. Mac notebooks are great and tight integration with the hardware is a clear advantage for OS X.

Oh yeah, I also have a Parallels VM for Windows 3.11. It boots in about a second to the C:\> prompt and then takes another second for me to type win and for Windows to start. Without TCP/IP there’s not much to do though (I’m not going to write a driver for Parallels’ ethernet adapter).

Like I said the X.org boys are doing amazing work. Hopefully soon after the current eye-candy craze is over they’ll get to more important work that needs to be done.