samhuri.net: August 2016

Easy Optimization Wins

10th August, 2016 ∞

It's not hard to hide a whole lot of complexity behind a function call, so you have to be very aware of what the functions you are using actually do, and how long they take to do it.

Here's some example code illustrating a big performance problem I found in a codebase I've inherited. We have a dictionary keyed by a string representing a date, e.g. "2016-08-10", and where the values are arrays of videos for that given date. Due to some unimportant product details videos can actually appear in more than one of the array values. The goal is to get an array of all videos, sorted by date, and with no duplicates. So we need to discard duplicates when building the sorted array.

func allVideosSortedByDate(allVideos: [String:[Video]]) -> [Video] {
    var sortedVideos: [Video] = []
    // sort keys newest first
    var dateKeys = allVideos.allKeys.sort { $1 < $0 }
    for key in dateKeys {
        for video in allVideos[key] {
            if !sortedVideos.contains(video) {
                sortedVideos.append(video)
            }
        }
    }
    return sortedVideos
}

Can you spot the problem here? sortedVideos.contains(_:) is an O(n) algorithm which means that in the worst case it has to look at every single element of the collection to check if it contains the given item. It potentially does n operations every time you call it.

Because this is being called from within a loop that's already looping over all n items, that makes this an O(n²) algorithm, which is pretty terrible. If you ever write an n² algorithm you should find a better solution ASAP. There almost always is one! If we have a modest collection of 1,000 videos that means we have to do 1,000,000 operations to get a sorted array of them. That's really bad. One measly line of innocuous looking code completely blew the performance of this function.

In this particular case my first instinct is to reach for a set. We want a collection of all the videos and want to ensure that they're unique, and that's what sets are for. So what about sorting? Well we can build up the set of all videos, then sort that set, converting it to an array in the process. Sounds like a lot of work right? Is it really faster? Let's see what it looks like.

func allVideosSortedByDate(allVideos: [String:[Video]]) -> [Video] {
    var uniqueVideos: Set<Video> = []
    for key in allVideos.allKeys {
        for video in allVideos[key] {
            uniqueVideos.insert(video)
        }
    }
    // sort videos newest first
    let sortedVideos = uniqueVideos.sort { $1.creationDate < $0.creationDate }
    return sortedVideos
}

The for loops are still O(n) and now we've shifted some work outside of those loops. After the O(n) loops we have a sort method that is probably something like O(n * log n), which is fairly typical. So the total complexity of the function is O(n + n * log n). In order to simplify this we can inflate the first term a bit by increasing it to n * log n as well, making the whole thing O(2n * log n), and since we discard constants that's O(n * log n) overall.

Putting the constant back in let's see how many operations it now takes to get an array of all videos sorted by date. Again saying we have a modest collection of 1,000 videos, that'll be 2 * 1,000 * log 1000 → 2,000 * 3 → 6,000 operations to do the whole thing. A far cry from 1,000,000!

Getting practical, in this case running the original function against 4,990 videos takes 29.653 seconds on an iPhone 4S. Running the new function against the same set of videos takes 4.792 seconds. Still not great and there's room for improvement, but that was a big, easy win already.

Mind your algorithms folks, it makes a huge difference in the real world.

A Git Pre-commit Hook for iOS

4th August, 2016 ∞

Krzysztof Zabłocki wrote a nice article on using a git pre-commit hook to catch mistakes in iOS projects before you push those mistakes out to the whole team/world. It's a great idea! But the shell script has some problems, so let's fix those.

If you don't care what I did or why then you can just see the updated script.

Repeated code

The diff command is repeated. This is any easy win:

diff-index() {
    git diff-index -p -M --cached HEAD -- "$@"
}

if diff-index '*Tests.swift' | ...

You get the idea.

Portability

One problem is that the bootstrap script uses an absolute path when creating a symlink to the pre-commit script. That's no good because then your pre-commit hook breaks if you move your project somewhere else.

That's easily fixed by using a relative path to your pre-commit hook, like so:

ln -s ../../scripts/pre-commit.sh .git/hooks/pre-commit

Ah, this is more flexible! Of course if you ever move the script itself then it's on you to update the symlink and bootstrap.sh, but that was already the case anyway.

Show me the errors

Ok great so this script tells me there are errors. Well, script, what exactly are those errors?

Show me the money! –Cuba Gooding Jr. in Jerry Maguire

First ignore the fact I'm talking to a shell script. I don't get out much. Anyway... now we need to pull out the regular expressions and globs so we can reuse them to show what the actual errors are if we find any.

test_pattern='^\+\s*\b(fdescribe|fit|fcontext|xdescribe|xit|xcontext)\('
test_glob='*Tests.swift *Specs.swift'
if diff-index $test_glob | egrep "$test_pattern" >/dev/null 2>&1
...

Pro tip: I prefixed test_pattern with \b to only match word boundaries to reduce false positives.

And:

misplaced_pattern='misplaced="YES"'
misplaced_glob='*.xib *.storyboard'
if diff-index $misplaced_glob | grep '^+' | egrep "$misplaced_pattern" >/dev/null 2>&1
...

You may notice that I snuck in *Specs.swift as well. Let's not be choosy about file naming.

Then we need to show where the errors are by using diff-indef, with an || true at the end because the whole script fails if any single command fails, and git diff-index regularly exits with non-zero status (I didn't look into why that is).

echo "COMMIT REJECTED for fdescribe/fit/fcontext/xdescribe/xit/xcontext." >&2
echo "Remove focused and disabled tests before committing." >&2
diff-index $test_glob | egrep -2 "$test_pattern" || true >&2
echo '----' >&2

And for misplaced views:

echo "COMMIT REJECTED for misplaced views. Correct them before committing." >&2
git grep -E "$misplaced_pattern" $misplaced_glob || true >&2
echo '----' >&2

Fix all the things, at once

The third problem is that if there are any focused or disabled tests you won't be told about any misplaced views until you try to commit again. I want to see all the errors on my first attempt to commit, and then fix them all in one fell swoop.

The first step is to exit at the end using a code in a variable that is set to 1 when errors are found, so we always run through both branches even when the first has errors.

Up top:

failed=0

In the middle, where we detect errors:

failed=1

And at the bottom:

exit $failed

That's all there is to it. If we don't exit early then all the code runs.

General Unixy goodness

Error output should be directed to stderr, not stdout. I littered a bunch of >&2 around to rectify that situation.

Final countdown

Those were all the obvious improvements in my mind and now I'm using this modified version in my project. If you come up with any more nice additions or changes please share! Fork this gist.

Here's the whole thing put together:

#!/usr/bin/env bash
#
# Based on http://merowing.info/2016/08/setting-up-pre-commit-hook-for-ios/

set -eu

diff-index() {
  git diff-index -p -M --cached HEAD -- "$@"
}

failed=0

test_pattern='^\+\s*\b(fdescribe|fit|fcontext|xdescribe|xit|xcontext)\('
test_glob='*Tests.swift *Specs.swift'
if diff-index $test_glob | egrep "$test_pattern" >/dev/null 2>&1
then
  echo "COMMIT REJECTED for fdescribe/fit/fcontext/xdescribe/xit/xcontext." >&2
  echo "Remove focused and disabled tests before committing." >&2
  diff-index $test_glob | egrep -2 "$test_pattern" || true >&2
  echo '----' >&2
  failed=1
fi

misplaced_pattern='misplaced="YES"'
misplaced_glob='*.xib *.storyboard'
if diff-index $misplaced_glob | grep '^+' | egrep "$misplaced_pattern" >/dev/null 2>&1
then
  echo "COMMIT REJECTED for misplaced views. Correct them before committing." >&2
  git grep -E "$misplaced_pattern" $misplaced_glob || true >&2
  echo '----' >&2
  failed=1
fi

exit $failed