Don’t use python’s list comprehensions for set operations. Use, um, the set type and her functions.

I’m making, say, 60,000 function calls, each involving two python lists (no duplicates within these lists, so they’re theoretically more like sets), with each having some overlap with the other.  I’m making these 60,000 calls, say, 30,000 times.  The question: how do I find the symmetric difference of the two lists efficiently via my function call?

Say my lists are a and b.  I’m concerned with the most computationally intensive part of the function, the calculation of the symmetric difference.  Here’s one approach:

[elem for elem in a if elem not in b] + [elem for elem in b if elem not in a]

But this ends up taking anywhere from .8 to 1.4 seconds for the 60,000 function calls.  Too long!  I thought perhaps if I can collapse it all into one list comprehension, then I’d be fine … maybe the issue is the fixed cost of calling the function:

[elem for elem in set(a+b) if (elem in a and elem not in b) or (elem in b and elem not in a)]

But this is even sillier - not to mention it takes a ton of time.  Anywhere from 1 - 1.8 seconds for the calls.  So why not just use python’s set type?  I can then turn my code into this guy:

a, b = set(a), set(b)

sym_diff = a.symmetric_difference(b)

This takes about .2 - .4 seconds, which is a great improvement, especially considering the number of calculations I’m making.

Moral of the story: list comprehensions are pretty handy, but not particularly efficient.  I’ll report more after I jump into the C code that dictates these operations.  Likely I could cut down this run time even more!

Leave a Reply