Need an algorithm for collapsing netblock ranges into lists of superset ranges

algorithm

My math-fu is failing me! I need an efficient way of reducing network ranges to supersets, e.g. if I input list of IP ranges:

  • 1.1.1.1 to 2.2.2.5
  • 1.1.1.2 to 2.2.2.4
  • 10.5.5.5 to 155.5.5.5
  • 10.5.5.6 to 10.5.5.7

I want to return the following ranges:

  • 1.1.1.1 to 2.2.2.5
  • 10.5.5.5 to 155.5.5.5

Note: the input lists are not ordered (though they could be?). The naive way to do this is to check every range in the list to see if the input range x is a subset, and if so, NOT insert range x. However, whenever you insert a new range it might be a superset of existing ranges, so you have to check the existing ranges to see if they can be collapsed (e.g., removed from my list).

Best Solution

This is a union of segments computation. An optimal algorithm (in O(nlog(n))) consists in doing the following:

  1. sort all endpoints (starting and ending points) in a list L (each endpoint should know the segment it belongs to). If an endpoint is equal to a starting point, the starting point should be considered smaller than the enpoint.
  2. go through the sorted list L from left to right and maintain the number LE-RE, where LE is the number of left endpoints that you have already passed, and RE is the number of right endpoints that you have already passed.
  3. each time LE-RE reaches zero, you are at the end of a connected union of segments, and you know that the union of the segments you have seen before (since the previous return to zero) is one superset.
  4. if you also maintained the min and max, between each return to zero, you have the bounds of the superset.

At the end, you obtain a sorted list of disjoint supersets. Still, two supersets A and B can be adjacent (the endpoint of A is just before the starting point of B). If you want A and B to be merged, you can do this either by a simple postprocessing step, or by slightly modifying step 3: when LE-RE reaches zero, you would consider it the end of a superset only if the next element in L is not the direct successor of your current element.