Implementing Hoey Shamos Algorithm with C#

Implementing Hoey Shamos algorithm with C#

First, regarding the line intersection: you do not need the actual point of intersection, only to know if they intersect. See http://www.geeksforgeeks.org/check-if-two-given-line-segments-intersect/ for an algorithm that does just that.

About the List implementation:

In your implementation using Lists, you call indexOf on the sweepline to find nl. This searches the sweepline from start to end. See List(T).IndexOf. If you were to use the BinarySearch method, that ought to speed up the search considerably.

List's documentation has a paragraph called performance considerations. They urge you to use a value type that implements IEquatable<T> and IComparable<T>. So, your Line2D should probably be a struct and implement these interfaces.

If you follow that advice, retrieval of the endpoint from the sweepline should be O(log n), which is sufficient for your purpose, and memory should be used more efficiently.

Insertion and removal are O(n) for Lists, cause the underlying array needs to be moved around in memory. A SortedSet has faster insertion and removal, but I don't quite see how to find an item's neighbors in O(log n) there. Anyone? (See also Why SortedSet<T>.GetViewBetween isn't O(log N)?)

Anyways, the C5 TreeSet should solve this.

I looked up the performance of IndexOf and [i] in the user guide and they're both listed as O(log n). So that is not supposed to be the issue. It's still probably somewhat faster, but no more than a fixed factor, to call the specific methods for finding the neighbors on the sweepline, i.e. Successor and Predecessor, which are also O(log n).

[...]
try 
{
    Line2D above = sweepline.Successor(nl);
    if (above.intersectsLine(nl))
    {
        return false;
    }
}
catch (NoSuchItemException ignore) { }
[...]

I don't like that they do not have a method that doesn't throw the exception, since throwing exceptions is very expensive. Your sweep line will be pretty full generally so my best guess is that failure to find one will be rare and calling Successor is the most efficient way. Alternatively, you could keep calling IndexOf like you do now, but check if it equals Count minus one before retrieving [index + 1], and prevent the exception from being thrown at all:

[...]
int index = sweepline.IndexOf(nl);
if( index < sweepline.Count-1 )
{
    Line2D above = sweepline[index + 1];
    if (above.intersectsLine(nl))
    {
        return false;
    }
}
[...]

Chapter two of the manual describes equality and comparison for C5 collections. Here, too, at the very least you must implement IEquatable<T> and IComparable<T>!

One more thought: you report feeding the algorithm 700000 lines. Could you start with timing for example 1000, 2500, 5000, 10000 lines and seeing how the algorithm scales for cases where they do not intersect?

On how to compare the lines on the sweepline:

You need to find some sort of natural ordering for the Line2Ds on the Sweepline TreeSet, since the CompareTo method asks you to compare one Line2D to another. One of the Line2Ds already sits in the Sweepline TreeSet, the other has just been encountered and is being added.

Your sweepline runs from bottom to top, I think:

List<AlgEvent> SortedList = events.OrderBy(o => o.getY()).ToList();

Sweepline running through the line segments

So let's say segment S1 got added to the TreeSet at event 1, and we wish to compare it to S2, which is being added at event 2, right now.

The line segments could possibly intersect at some point, which would change the ordering, but the algorithm will check for this right after inserting them, in the above and below checks. Which would perhaps better be called the left and right checks, come to think of it.

Anyways.. so the easiest would be to compare the bottom endpoints of both line segments. To the left is smaller, to the right is bigger. However, we need to look at the ordering at the position of the sweepline and they may have changed positions since then, like in the picture.

So we need to compare the bottom endpoint of S2 to the red point on S1, and see if it lies to the left or to the right of that point. It lies to the left so S2 is considered smaller than S1.

Usually it's simpler than this: If all of S1 lies to the left of S2's bottom endpoint, S1 is smaller than S2. If all of S1 lies to the right of S2's bottom endpoint, S1 is larger than S2.

I think you're looking for the typesafer version of the interface:

public class Line2D : IComparable<Line2D>

Assuming two properties BottomY, the lowest of the two Y values, and BottomX, the X value of the lowest endpoint, a somewhat tested attempt:

int IComparable<Line2D>.CompareTo(Line2D other)
{
    if( BottomY < other.BottomY )
    {
        return -other.CompareTo(this);
    }

    // we're the segment being added to the sweepline
    if( BottomX >= other.X1 && BottomX >= other.X2 )
    {
        return 1;
    }
    if( BottomX <= other.X1 && BottomX <= other.X2 )
    {
        return -1;
    }

    if( other.Y2 == other.Y1 )
    {
        // Scary edge case: horizontal line that we intersect with. Return 0?
        return 0;
    }

    // calculate the X coordinate of the intersection of the other segment
    // with the sweepline
    // double redX = other.X1 + 
    //    (BottomY - other.Y1) * (other.X2 - other.X1) / (other.Y2 - other.Y1);
    //
    // return BottomX.CompareTo(redX);

    // But more efficient, and more along the lines of the orientation comparison:
    return Comparer<Double>.Default.Compare(
        (BottomX - other.X1) * (other.Y2 - other.Y1),
        (BottomY - other.Y1) * (other.X2 - other.X1) );

}

Efficiently finding the intersections of an array of lines

Please specify - do you mean infinite lines?

For line segments there is efficient Bentley-Ottmann algorithm.

For N infinite lines there are about N^2 intersections (if most of them are not parallel), so your approach is optimal in complexity sense (perhaps, micro-optimization is possible)

Edit: with clarified task description Bentley-Ottmann looks like overhead.

Find intersections of lines with clipping window
(for example, using Liang-Barsky algorithm)
Consider only lines that intersect window

Scan top window border from left corner to the right
Insert every line end into the binary search tree 
(and add link to this tree node from the paired end to the map) 

Scan right window border from top corner to the bottom right one
Check whether current line already has another end in the tree/map
If yes
    all lines with ends between positions of the first and 
    the second ends do intersect with it
    Calculate intersections and remove both line ends from tree and map
else
  Insert line end into the list

Continue for bottom and left edge.

Complexity O(N) for preliminary treatment and O(K + MlogM) for M lines intersecting window rectangle and K intersection (note that K might be about N^2)
Sample Image

Example: tree state for walking around the perimeter

 E               //and map F to E node
 EG              //and map H to G
 EGI     
 EGIK
 EGIK + H
      H has pair point G, so GH intersect IJ and KL
      remove G
 EIK
 EIK + F
      F has pair point E, so EH intersect IJ and KL
      remove E
 IK
 IK + J
      J has pair point I, so IJ intersect KL
      remove I
 K
 K+L  
    remove K
 end (5 intersections detected)

Implementing a brute force algorithm for detecting a self-intersecting polygon

Your current Brute-Force algorithm is O(n^2). For just your two 70,000 line polygons that's some factor of almost 10 Billion operations, to say nothing of the 700,000 other polygons. Obviously, no amount of mere code optimization is going to be enough, so you need some kind algorithmic optimization that can lower that O(n^2) without being unduly complicated.

The O(n^2) comes from the nested loops in the brute-force algorithm that are each bounded by n, making it O(n*n). The simplest way to improve this would be to find some way to reduce the inner loop so that it is not bound by or dependent on n. So what we need to find is some way to order or re-order the inner list of lines to check the outer line against so that only a part of the full list needs to be scanned.

The approach that I am taking takes advantage of the fact that if two line segments intersect, then the range of their X values must overlap each other. Mind you, this doesn't mean that they do intersect, but if their X ranges don't overlap, then they cannot be intersecting so theres no need to check them against each other. (this is true of the Y value ranges also, but you can only leverage one dimension at a time).

The advantage of this approach is that these X ranges can be used to order the endpoints of the lines which can in turn be used as the starting and stopping points for which lines to check against for intersection.

So specifically what we do is to define a custom class (endpointEntry) that represents the High or Low X values of the line's two points. These endpoints are all put into the same List structure and then sorted based on their X values.

Then we implement an outer loop where we scan the entire list just as in the brute-force algorithm. However our inner loop is considerably different. Instead of re-scanning the entire list for lines to check for intersection, we rather start scanning down the sorted endpoint list from the high X value endpoint of the outer loop's line and end it when we pass below that same line's low X value endpoint. In this way, we only check the lines whose range of X values overlap the outer loop's line.

OK, here's a c# class demonstrating my approach:

class CheckPolygon2
{
    // internal supporting classes
    class endpointEntry
    {
        public double XValue;
        public bool isHi;
        public Line2D line;
        public double hi;
        public double lo;
        public endpointEntry fLink;
        public endpointEntry bLink;
    }
    class endpointSorter : IComparer<endpointEntry>
    {
        public int Compare(endpointEntry c1, endpointEntry c2)
        {
            // sort values on XValue, descending
            if (c1.XValue > c2.XValue) { return -1; }
            else if (c1.XValue < c2.XValue) { return 1; }
            else // must be equal, make sure hi's sort before lo's
                if (c1.isHi && !c2.isHi) { return -1; }
                else if (!c1.isHi && c2.isHi) { return 1; }
                else { return 0; }
        }
    }

    public bool CheckForCrossing(List<Line2D> lines)
    {
        List<endpointEntry> pts = new List<endpointEntry>(2 * lines.Count);

        // Make endpoint objects from the lines so that we can sort all of the
        // lines endpoints.
        foreach (Line2D lin in lines)
        {
            // make the endpoint objects for this line
            endpointEntry hi, lo;
            if (lin.P1.X < lin.P2.X)
            {
                hi = new endpointEntry() { XValue = lin.P2.X, isHi = true, line = lin, hi = lin.P2.X, lo = lin.P1.X };
                lo = new endpointEntry() { XValue = lin.P1.X, isHi = false, line = lin, hi = lin.P1.X, lo = lin.P2.X };
            }
            else
            {
                hi = new endpointEntry() { XValue = lin.P1.X, isHi = true, line = lin, hi = lin.P1.X, lo = lin.P2.X };
                lo = new endpointEntry() { XValue = lin.P2.X, isHi = false, line = lin, hi = lin.P2.X, lo = lin.P1.X };
            }
            // add them to the sort-list
            pts.Add(hi);
            pts.Add(lo);
        }

        // sort the list
        pts.Sort(new endpointSorter());

        // sort the endpoint forward and backward links
        endpointEntry prev = null;
        foreach (endpointEntry pt in pts)
        {
            if (prev != null)
            {
                pt.bLink = prev;
                prev.fLink = pt;
            }
            prev = pt;
        }

        // NOW, we are ready to look for intersecting lines
        foreach (endpointEntry pt in pts)
        {
            // for every Hi endpoint ...
            if (pt.isHi)
            {
                // check every other line whose X-range is either wholly 
                // contained within our own, or that overlaps the high 
                // part of ours.  The other two cases of overlap (overlaps 
                // our low end, or wholly contains us) is covered by hi 
                // points above that scan down to check us.

                // scan down for each lo-endpoint below us checking each's 
                // line for intersection until we pass our own lo-X value
                for (endpointEntry pLo = pt.fLink; (pLo != null) && (pLo.XValue >= pt.lo); pLo = pLo.fLink)
                {
                    // is this a lo-endpoint?
                    if (!pLo.isHi)
                    {
                        // check its line for intersection
                        if (pt.line.intersectsLine(pLo.line))
                            return true;
                    }
                }
            }
        }

        return false;
    }
}

I am not certain what the true execution complexity of this algorithm is, but I suspect that for most non-pathological polygons it will be close to O(n*SQRT(n)) which should be fast enough.

Explanation of the Inner Loop logic:

The Inner Loop simply scans the endPoints list in the same sorted order as the outer loop. But it will start scanning from where the outer loop from where the outer loop currently is in the list (which is the hiX point of some line), and will only scan until the endPoint values drop below the corresponding loX value of that same line.

What's tricky here, is that this cannot be done with an Enumerator (the foreach(..in pts) of the outer loop) because there's no way to enumerate a sublist of a list, nor to start the enumeration based on another enumerations current position. So instead what I did was to use the Forward and Backward Links (fLink and bLink) properties to make a doubly-linked list structure that retains the sorted order of the list, but that I can incrementally scan without enumerating the list:

for (endpointEntry pLo = pt.fLink; (pLo != null) && (pLo.XValue >= pt.lo); pLo = pLo.fLink)

Breaking this down, the old-style for loop specifier has three parts: initialization, condition, and increment-decrement. The initialization expression, endpointEntry pLo = pt.fLink; initializes pLo with the forward Link of the current point in the list. That is, the next point in the list, in descending sorted order.

Then the body of the inner loop gets executed. Then the increment-decrement pLo = pLo.fLink gets applied, which simply sets the inner loop's current point (pLo) to the next lower point using it's forward-link (pLo.fLink), thus advancing the loop.

Finally, it loops after testing the loop condition (pLo != null) && (pLo.XValue >= pt.lo) which loops so long as the new point isn't null (which would mean that we were at the end of the list) and so long as the new point's XValue is still greater than or equal to the low X value of the outer loop's current point. This second condition insures that the inner loop only looks at lines that are overlapping the line of the outer loop's endpoint.

What is clearer to me now, is that I probably could have gotten around this whole fLink-bLink clumsiness by treating the endPoints List as an array instead:

Fill up the list with endPoints
Sort the List by XValue
Outer Loop through the list in descending order, using an index iterator (i), instead of a foreach enumerator
(A) Inner Loop through the list, using a second iterator (j), starting at i and ending when it passed below pt.Lo.

That I think would be much simpler. I can post a simplified version like that, if you want.

Is there an efficient algorithm for finding all intersections of a set of infinite lines?

In general case (not all lines are parallel) there are O(n^2) intersections, so simple loop with intersection calculation is the best approach

(there is no way to get n*(n-1)/2 points without calculations)

For the case of existence a lot of parallel lines make at first grouping lines by direction and check only intersections between lines in diffferent groups

Bentley-Ottmann Algorithm Implementation

Here is a Java implementation of the Bentley-Ottman algorithm

Trim lasso that contains closed loop

you can test for line intersections
this is O(n^2) with naive approach if you segmentate the lines into segments then the complexity became much better see
- Implementing Hoey Shamos algorithm with C#
for this and other approaches in better complexities then O(n^2)
you can exploit periodicity of loop
closed loop is similar to circle so the x axis goes through the same point twice and the y axis too. So remember starting x,y loop through all lines and when x or y gets near the starting point (just single axis not booth whichever first) then stop.
This is O(n) now just test for intersection around the line you stop and forward from start point up to some treshold distance. If no intersection found do this again with the other axis stop point. The test is still O(m^2) but this time m<<n
you can integrate the angle
compute center as average point O(n) now loop through lines from start and compute angle its covering in a lasso circle (from center) add it up to some variable and when reach 360 degrees stop. Do the same from end.
Now you have selected the inner part of the loop so test only lines from the stop points to start/stop of lasso

[edit1] had a bit time/mood today for this so here is the bullet 2 in more detail

compute bounding box of lasso
need min,max of x,y coordinates let call them x0<=x1,y0<=y1 this is done easy by single for loop in O(n)
allocate and compute line counters per coordinate
you need 2 arrays int cntx[x1-x0+1],cnty[y1-y0+1] if you have floating coordinates truncate to some grid/scale. Do not forget to clear the values to zero. The computation is easy for any rendered pixel(x,y) of the lasso.
Increment counter cntx[x-x0]++; cnty[y-y0]++; do not increment on endpoint (to avoid duplicate increments). This is also O(n)
find first line from start that has more than 2 lines per coordinate
let it be i0
find next line after it that has less or equal lines then 2 per coordinate
let it be i1
do the same from end and call the indexes j0,j1
find the intersection
now you need to check just intersection between any line from <i0,i1> against any line from <j0,j1> this is O(n^2)

Here an example:

example

hand drawed lasso in mine svg editor (sadly SO does not support svg images). On left side you see blue lines (lines count > 2) and on sides there is graphs for cntx,cnty contents. The right side is computed selection (i0,i1),(j0,j1) in red

Some source code in C++:

int i,i0,i1,j0,j1;
int a,x,y,x0,y0,x1,y1;
int *cntx,*cnty;
List<int> pnt;  // laso points (x,y,flag)

// bounding box O(n) and reset flags
x=pnt[0]; x0=x; x1=x;
y=pnt[1]; y0=y; y1=y;
for (i=0;i<pnt.num;)
    {
    x=pnt[i]; i++;
    y=pnt[i]; i++;
    pnt[i]=0; i++;
    if (x0>x) x0=x;
    if (x1<x) x1=x;
    if (y0>y) y0=y;
    if (y1<y) y1=y;
    }
// allocate and compute counter buffers (count how many line at x or y coordinate)
cntx=new int[x1-x0+1];
cnty=new int[y1-y0+1];
for (x=x0;x<=x1;x++) cntx[x-x0]=0;
for (y=y0;y<=y1;y++) cnty[y-y0]=0;
x=pnt[0];
y=pnt[1];
for (i=0;i<pnt.num;)
    {
    a=pnt[i]; i++;
    for (;x<a;x++) cntx[x-x0]++;
    for (;x>a;x--) cntx[x-x0]++; x=a;
    a=pnt[i]; i++; i++;
    for (;y<a;y++) cnty[y-y0]++;
    for (;y>a;y--) cnty[y-y0]++; y=a;
    }
// select sections with 3 lines (trimable)
for (i=0;i<pnt.num;)
    {
    x=pnt[i]; i++;
    y=pnt[i]; i++;
    if ((cntx[x-x0]>2)||(cnty[y-y0]>2)) pnt[i]=1; i++;
    }
// select trim areas from start (i0,i1) and from end (j0,j1)
for (i=0      ;i<pnt.num;) { i0=i; i++; i++; a=pnt[i]; i++; if ( a) break; }
for (         ;i<pnt.num;) { i1=i; i++; i++; a=pnt[i]; i++; if (!a) break; }
for (i=pnt.num;i>0      ;) { i--; a=pnt[i]; i--; i--; j1=i; if ( a) break; }
for (         ;i>0      ;) { i--; a=pnt[i]; i--; i--; j0=i; if (!a) break; }
delete[] cntx;
delete[] cnty;

pnt[pnt.num] is mine dynamic array holding the laso coordinates and flags for each vertex
pnt[0]=x(0),pnt[1]=y(0),pnt[2]=flag(0),pnt[3]=x(1),...
at the end the i0,i1,j0,j1 contains the trimable point/lines selection so it is part where at least 3 parallel lines exist and from that selected just first from start and end. Hope it is clear enough now.

Reduce time taken to find N line intersection

Here's an O(n log n)-time algorithm that uses a sweep line with a Fenwick tree.

Step 0: coordinate remapping

Sort the x-coordinates and replace each value by an integer in 0..n-1 so as to preserve order. Do the same for y-coordinates. The intersection properties are preserved while allowing the algorithms below an easier implementation.

Step 1: parallel line segments

I'll describe this step for horizontal segments. Repeat for the vertical segments.

Group the horizontal segments by y coordinate. Process one group at a time, creating events for the sweep line as follows. Each segment gets a start event at its lesser endpoint and a stop event at its greater. Sort starts before stops if you want closed line segments. Sweep the events in sorted order, tracking the number of line segments that currently intersect the sweep line and the number of start events processed. The number of parallel intersections for a segment is (number of segments intersected at start time + number of start events processed at stop time - number of start events processed at start time). (See also Given a set of intervals, find the interval which has the maximum number of intersections for a previous explanation of mine for this.)

Step 2: perpendicular line segments

I'll describe this step in terms of counting for each horizontal line segment how many vertical line segments it intersects.

We do another sweep line algorithm. The events are horizontal segment starts, vertical segments, and horizontal segment stops, sorted in that order assuming closed line segments. We use a Fenwick tree to track, for each y coordinate, how many vertical segments have covered that y coordinate so far. To process a horizontal start, subtract the tree value for its y coordinate from its intersection count. To process a horizontal stop, add the tree value for its y coordinate to its intersection count. This means that the count increases by the difference, which is the number of vertical segments that stabbed the horizontal one while it was active. To process a vertical segment, use the power of the Fenwick tree to quickly increment all of the values between its lesser y coordinate and its greater (inclusive assuming closed segments).

If you want, these sweep line algorithms can be combined. I kept them separate for expository reasons.

Implementing Hoey Shamos Algorithm with C#