<h1>Fat Albert (2016-10-09) — Nicholas Pilkington, nickp.svbtle.com</h1>
<p>I was watching the <a href="https://en.wikipedia.org/wiki/Fleet_Week">Fleet Week</a> display by the <a href="https://en.wikipedia.org/wiki/Blue_Angels">Blue Angels</a> yesterday, and we were discussing whether you could determine where an aircraft is purely from the engine sound you hear. </p>
<p><a href="https://svbtleusercontent.com/tpyiovgnkazyow.png"><img src="https://svbtleusercontent.com/tpyiovgnkazyow_small.png" alt="Screenshot 2016-10-09 15.58.30.png"></a></p>
<p>Say we have an aircraft at some unknown position flying with a constant linear velocity, and its engine emits sound at a constant frequency. As soon as we start hearing the engine we start recording. Given <strong>just</strong> that audio, let’s try to determine how far away the aircraft is and how fast it’s traveling. Here’s a generated <a href="https://www.dropbox.com/s/a0r76ldm0yrbskw/audio.wav?dl=0">sample recording</a> of a source starting <code class="prettyprint">315.914</code> meters away and traveling at <code class="prettyprint">214</code> meters per second in an unknown direction. </p>
<p>First let’s make a simplification: we can rotate our frame of reference so that the aircraft travels along the x-axis from some unknown starting point. Viewed from above, the situation looks like this. </p>
<p><a href="https://svbtleusercontent.com/3feyeptxkgjdva.png"><img src="https://svbtleusercontent.com/3feyeptxkgjdva_small.png" alt="Screenshot 2016-10-09 17.13.50.png"></a></p>
<p>When working with audio, the first thing to do would probably be to plot the spectrogram and see if we can glean anything from that. The spectrogram of a WAV file can be plotted with this code:</p>
<pre><code class="prettyprint">import pylab
import scipy.io.wavfile

Fs, audio = scipy.io.wavfile.read('audio.wav')
MAX_FREQUENCY = 2000
pylab.figure(facecolor='white')
pylab.specgram(audio, NFFT=1024, Fs=Fs, cmap=pylab.cm.gist_heat)
pylab.ylim((100, MAX_FREQUENCY))
pylab.xlim((0, 1.1))
pylab.xlabel('Time (s)')
pylab.ylabel('Frequency (Hz)')
pylab.show()
</code></pre>
<p>and the resulting spectrogram, which shows the power spectrum of the received signal as a function of time, looks like this.</p>
<p><a href="https://svbtleusercontent.com/b2vg7xfvhvkowq.png"><img src="https://svbtleusercontent.com/b2vg7xfvhvkowq_small.png" alt="Screenshot 2016-10-09 16.12.57.png"></a></p>
<p>This looks great. Most importantly, you can see the <a href="https://en.wikipedia.org/wiki/Doppler_effect">Doppler Effect</a> in action: the sound waves are compressed in the direction of the observer, which implies that the aircraft is moving towards us. Beyond that there isn’t much to be gained here. We can look at the inflection point of the spectrogram and infer that this is where the aircraft passes perpendicular to us, which corresponds to the actual frequency the engine is emitting, in this case about <code class="prettyprint">500</code> Hertz. However, we can’t assume that the aircraft will pass us, so we probably can’t even rely on that.</p>
<p><a href="https://svbtleusercontent.com/dx3uzapbh44j2w.png"><img src="https://svbtleusercontent.com/dx3uzapbh44j2w_small.png" alt="Screenshot 2016-10-09 16.48.39.png"></a></p>
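<p>For reference, the amount of compression follows the classical moving-source Doppler formula. Here is a minimal sketch (the 500 Hz source frequency and 214 m/s speed are taken from this example; the helper name is my own):</p>

```python
def observed_frequency(f_source, v_radial, c=340.29):
    """Doppler-shifted frequency heard from a source moving with radial
    speed v_radial towards (+) or away from (-) a stationary observer."""
    return f_source * c / (c - v_radial)

# a 500 Hz source closing at 214 m/s sounds far higher...
approaching = observed_frequency(500.0, 214.0)
# ...and far lower when receding at the same speed
receding = observed_frequency(500.0, -214.0)
```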
<p>Let’s try something different and analyze this in the time domain instead. When the aircraft emits sound at some real time <code class="prettyprint">t</code>, that sound takes a while to arrive at the observer; the delay depends on the distance to the observer and the speed of sound. When the first bit of audio arrives at the observer, which is <code class="prettyprint">t=0</code> in “receiver time”, the aircraft has already been flying for a while, so this first piece of audio corresponds to a previous location. We don’t know what this delay is because we don’t know how far away the plane was. Since the frequency of the sound we receive is changing because of the Doppler Effect, we can’t really rely on frequency analysis either. Let’s instead zoom in and look at the <a href="https://en.wikipedia.org/wiki/Zero_crossing">zero-crossings</a> of the signal. </p>
<p>The zero-crossings are the points in time at which the signal (regardless of frequency) crosses the x-axis. In “real time” they occur at regular intervals of <code class="prettyprint">1/(f*2.0)</code>, where <code class="prettyprint">f</code> is the frequency of the sound emitted by the engine. In the received signal, however, the crossings are squashed closer together while the aircraft travels towards us and stretched further apart as it flies away, so the signal gets concertinaed in a specific way. Here’s an exaggerated diagram of what is being emitted and what is being received when:</p>
<p><a href="https://svbtleusercontent.com/f9rr4mcb9h4sxw.png"><img src="https://svbtleusercontent.com/f9rr4mcb9h4sxw_small.png" alt="Screenshot 2016-10-09 17.14.35.png"></a></p>
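<p>As a sanity check of that <code class="prettyprint">1/(f*2.0)</code> spacing claim, here is a small sketch that generates a pure tone and measures the gaps between its zero-crossings (the frequency and sample rate are illustrative):</p>

```python
import numpy as np

f, Fs = 500.0, 44100  # tone frequency (Hz) and sample rate
t = np.arange(0, 0.1, 1.0 / Fs)
signal = np.sin(2 * np.pi * f * t)

# indices where the sign flips between consecutive samples
flips = np.where(np.diff(np.sign(signal)) != 0)[0]
crossing_times = flips / Fs
spacing = np.diff(crossing_times)
# the typical gap should be close to 1/(2*f) = 1 millisecond
```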
<p>Let’s say the plane is traveling with a speed of <code class="prettyprint">v</code> parallel to the x-axis. So its x-coordinate at time <code class="prettyprint">t</code> is <code class="prettyprint">x0 + v * t</code> (some unknown starting point) and its y-coordinate is <code class="prettyprint">R</code> (some unknown distance). Here <code class="prettyprint">t</code> is the real time when the signal is emitted. The time for this signal to reach us is:</p>
<pre><code class="prettyprint">import numpy as np

def reach_time(x0, v, t, R):
    c = 340.29  # speed of sound
    dt = np.sqrt((x0 + v*t)**2 + R**2) / c
    return dt
</code></pre>
<p>The timestamp in received time is just <code class="prettyprint">reach_time(x0, v, t, R) + t - t0</code>, where <code class="prettyprint">t0</code> is the initial, unknown delay for the first signal to reach us. From this we can get the timestamp of the nth zero-crossing, knowing that the source frequency is fixed.</p>
<pre><code class="prettyprint">import numpy as np

def nth_zero_crossing(n, x0, v, R, f, n0):
    c = 340.29  # speed of sound
    f2 = 2.0 * f  # zero-crossings per second
    return np.sqrt((x0 + v*n/f2)**2 + R**2) / c + (n - n0) / f2
</code></pre>
<p>So we’ve got a model that maps the time of a zero-crossing at the source to the time of a zero-crossing in our WAV file. These mappings are the orange lines in this image:</p>
<p><a href="https://svbtleusercontent.com/joau6wakv8a75q.png"><img src="https://svbtleusercontent.com/joau6wakv8a75q_small.png" alt="Screenshot 2016-10-09 17.19.30.png"></a></p>
<p>Now we need to extract the zero-crossings from the WAV file so we can compare. We could use some more advanced interpolation but since there are <code class="prettyprint">44100</code> samples per second in the audio file the impact on the resulting error term should be small. Here’s some code to extract the time of each zero-crossing in an audio file.</p>
<pre><code class="prettyprint">import scipy.io.wavfile
import numpy as np

Fs, audio = scipy.io.wavfile.read('audio.wav')
audio = np.array(audio, dtype='float64')
# normalize
audio = (audio - audio.mean()) / audio.std()
prev = audio[0]
ztimes = [ 0 ]
for j in range(1, audio.shape[0]):
    if audio[j] * prev <= 0 and prev != 0:
        cross = float(j) / Fs
        ztimes.append(cross)
    prev = audio[j]
</code></pre>
<p>This gives us a generative model: we can select some parameters and, using <code class="prettyprint">nth_zero_crossing</code>, compute what the received signal would look like. That puts us in a good position to create an error function between the actual (empirical) data in the audio file and the generated data based on our parameters. Then we can try to find the parameters that minimize this error. Here is some code that computes the residual of our generated signal:</p>
<pre><code class="prettyprint">import numpy as np

c = 340.29  # speed of sound

def gen_received_signal(args):
    f2, v, x0, R, n0 = args
    n = np.arange(len(ztimes))
    y = np.sqrt((x0 + v*n/f2)**2 + R**2) / c + (n - n0) / f2
    error = np.array(ztimes) - y
    return error
</code></pre>
<p><a href="https://svbtleusercontent.com/bz5ptw6tpcj77a.png"><img src="https://svbtleusercontent.com/bz5ptw6tpcj77a_small.png" alt="Screenshot 2016-10-09 17.04.27.png"></a></p>
<p>Using a non-linear least squares solver like <a href="https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm">Levenberg Marquardt</a> we can search for the parameters that best explain our data. </p>
<pre><code class="prettyprint">import numpy as np
from scipy.optimize import least_squares

# rough initial guesses
f2 = 1600
v = 100
x0 = -100
R = 10
n0 = 100
args = [f2, v, x0, R, n0]
res = least_squares(gen_received_signal, args)
f2, v, x0, R, n0 = res.x
# compute the initial distance
D = np.sqrt(x0**2 + R**2)
print('Solution distance=', D, 'x0=', x0, 'v=', v, 'f=', f2/2.0)
</code></pre>
<p>Out of this pops the solution and more: it has also accurately computed the source frequency given some bad initial guesses. Since we aren’t assuming anything about the change in frequency, this approach also works when the aircraft does not pass us and is only recorded on approach or while flying away. In reality the sound would attenuate with the square of the distance, but that should not affect this solution because we don’t use amplitudes.</p>
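<p>One way to convince yourself the model is sound is to synthesize zero-crossing times from known parameters and hand them to the same solver. This sketch (noise-free, with made-up “true” parameters) only checks internal consistency, not the real recording:</p>

```python
import numpy as np
from scipy.optimize import least_squares

c = 340.29  # speed of sound

def model(args, n):
    # received time of the nth zero-crossing
    f2, v, x0, R, n0 = args
    return np.sqrt((x0 + v * n / f2) ** 2 + R ** 2) / c + (n - n0) / f2

# synthesize a clean "recording" from known parameters
true_args = [1000.0, 214.0, -300.0, 100.0, 0.0]  # f2, v, x0, R, n0
n = np.arange(2000)
ztimes = model(true_args, n)

def residual(args):
    return ztimes - model(args, n)

guess = [1600.0, 100.0, -100.0, 10.0, 100.0]
res = least_squares(residual, guess)
# res.x should explain the synthetic data far better than the guess does
```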
<h1>Minkowski Asteroids (2016-10-01)</h1>
<p>We have two convex polygons <code class="prettyprint">P</code> and <code class="prettyprint">Q</code>, each moving at a constant velocity, <code class="prettyprint">Vp</code> and <code class="prettyprint">Vq</code> respectively. At some point in time they <strong>may</strong> pass through one another. We would like to find the point in time at which the area of their intersection is at a maximum. Here is a simple visualization, where the yellow area represents the intersection and the arrowheads represent the velocities of the polygons.</p>
<p><a href="https://svbtleusercontent.com/ksl17eoepmjkww.gif"><img src="https://svbtleusercontent.com/ksl17eoepmjkww_small.gif" alt="output_z4aALm.gif"></a></p>
<p>Let’s first look at the problem of testing whether two polygons intersect. The simplest way to do it is to check if any edge of one polygon intersects any edge of the other. For this we need a line segment intersection algorithm. Two line segments <code class="prettyprint">A - B</code> and <code class="prettyprint">C - D</code> intersect if the signed area of the triangle <code class="prettyprint">A, B, C</code> has a different sign from the signed area of the triangle <code class="prettyprint">A, B, D</code>, and similarly for <code class="prettyprint">C, D, A</code> and <code class="prettyprint">C, D, B</code>. It’s simple, and checking all edge pairs runs in <code class="prettyprint">O(N^2)</code>. Here’s the code:</p>
<pre><code class="prettyprint">import numpy as np

def signed_area(A, B, C):
    # positive if A, B, C wind counter-clockwise, negative if clockwise
    return np.cross(B - A, C - A) * 0.5

def intersect(A, B, C, D):
    return (np.sign(signed_area(A, B, C)) != np.sign(signed_area(A, B, D)) and
            np.sign(signed_area(C, D, A)) != np.sign(signed_area(C, D, B)))

def polygons_intersect(P, Q):
    n = len(P)
    m = len(Q)
    for i in range(n):
        for j in range(m):
            if intersect(P[i], P[(i+1)%n], Q[j], Q[(j+1)%m]):
                return True
    return False
</code></pre>
<h1 id="aside_1">Aside: <a class="head_anchor" href="#aside_1">#</a>
</h1>
<p>There is another way to do this using something called the <a href="https://en.wikipedia.org/wiki/Hyperplane_separation_theorem">hyperplane separation theorem</a>. Rather than explaining it, I’ll plot an example of how it works in two dimensions, which I think is more helpful. Take each edge of the polygons in question and extend it outwards. In the plots below the dotted lines represent normals, and the solid lines are the extensions of the edges of one of the polygons. Let’s call the extensions of the edges “barriers”. Now consider projecting both shapes onto any of the barriers; this turns them into line segments on the barriers. In the case of intersection these segments overlap on all the barriers. Look at this plot and confirm that projecting the shapes in the center onto any barrier would yield a single solid line segment, not two.</p>
<p><a href="https://svbtleusercontent.com/diozord5fvahpa.png"><img src="https://svbtleusercontent.com/diozord5fvahpa_small.png" alt="Screenshot 2016-10-01 11.50.13.png"></a></p>
<p>This is different when there is no intersection: in that case there is at least one barrier where the projections of the shapes do not form a single solid line segment. In this example it’s the purple barrier, and the normal to this barrier actually shows the separation of the shapes (purple dotted line). How do we check whether the projections of the shapes onto the barrier intersect? We can project those segments again down onto something simple like a horizontal line and see if their endpoints overlap.</p>
<p><a href="https://svbtleusercontent.com/pl9jjtwrtkb6lq.png"><img src="https://svbtleusercontent.com/pl9jjtwrtkb6lq_small.png" alt="Screenshot 2016-10-01 12.03.15.png"></a></p>
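<p>A minimal separating-axis test along those lines might look like this (the function names and test shapes are my own; touching shapes count as intersecting here):</p>

```python
import numpy as np

def project(poly, axis):
    # interval of the polygon's shadow on the axis
    dots = poly @ axis
    return dots.min(), dots.max()

def sat_intersect(P, Q):
    """Separating-axis test for convex polygons given as Nx2 arrays of
    vertices in order. If the shadows overlap on every edge normal the
    polygons intersect; a single gap is enough to separate them."""
    for poly in (P, Q):
        for i in range(len(poly)):
            edge = poly[(i + 1) % len(poly)] - poly[i]
            normal = np.array([-edge[1], edge[0]])  # perpendicular to edge
            lo1, hi1 = project(P, normal)
            lo2, hi2 = project(Q, normal)
            if hi1 < lo2 or hi2 < lo1:
                return False  # found a separating axis
    return True

square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=float)
shifted = square + np.array([1.0, 1.0])  # overlaps the square
far = square + np.array([5.0, 0.0])      # clearly separated
```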
<p>Armed with some fast polygon intersection algorithms we can go back to our original problem, try various points in time, and check whether the polygons intersect. This is still not great because there is no guarantee that they <strong>will</strong> intersect, and even then, what we are actually looking for is the range of times during which the shapes intersect, so we can compute the maximum area of overlap. </p>
<p>Let’s try another approach. First let’s make a simplification and assume that one of the shapes is stationary and the other is moving relative to it. Let’s project each point on our moving polygon out in the direction of its velocity. </p>
<p><a href="https://svbtleusercontent.com/iuy1yyercc7z1g.png"><img src="https://svbtleusercontent.com/iuy1yyercc7z1g_small.png" alt="Screenshot 2016-10-01 12.37.29.png"></a></p>
<p>These rays may or may not intersect the other polygon. In this case we have four intersections. From each intersection we can compute a time based on the velocity of the polygon. We can sort these times and return the minimum and maximum as the time range of overlap.</p>
<pre><code class="prettyprint">import math

# P and Q are polygon objects (e.g. shapely polygons); V is the velocity
# of Q relative to P; intersections(P, line) is a helper returning the
# points where the segment crosses the boundary of P
def overlap_range(P, Q, V):
    intersection_times = []
    for x, y in zip(*Q.exterior.xy):
        # simulate a ray as a long line segment
        SCALE = 1e10
        segment_x = [ x, x + SCALE*V[0] ]
        segment_y = [ y, y + SCALE*V[1] ]
        line = [segment_x, segment_y]
        points = intersections(P, line)
        for point in points:
            ix, iy = point.x, point.y
            # time of intersection
            when = math.hypot(x - ix, y - iy) / math.hypot(V[0], V[1])
            intersection_times.append(when)
    intersection_times.sort()
    if len(intersection_times) == 0:
        print("polygons never overlap")
        return None
    if len(intersection_times) == 1:
        print("polygons touch but don't overlap")
        return None
    return intersection_times[0], intersection_times[-1]
</code></pre>
<p>Here each point on the polygon creates a line segment that is tested against each edge of the other polygon. So we’ve handled the case where the polygons never intersect, but we can do even better with <a href="https://en.wikipedia.org/wiki/Minkowski_space">Minkowski geometry</a>. We’ll use something called the Minkowski difference between two sets of points; in image processing it’s related to <a href="https://en.wikipedia.org/wiki/Erosion_(morphology)">erosion</a>. It takes one shape, mirrors it about the origin, and then computes the Minkowski sum of the mirrored shape and the other one. The Minkowski sum is related to <a href="https://en.wikipedia.org/wiki/Dilation_(morphology)">dilation</a> and, for two sets of points <code class="prettyprint">P</code> and <code class="prettyprint">Q</code>, is defined as all points <code class="prettyprint">p+q</code> where <code class="prettyprint">p</code> is in <code class="prettyprint">P</code> and <code class="prettyprint">q</code> is in <code class="prettyprint">Q</code>. </p>
<p>Don’t worry too much about the definitions. Here is the <strong>key point</strong> to understand. We want to know whether the two polygons intersect. If they intersect, there is a point that is inside both of them, and in that case mirroring one polygon and computing the Minkowski sum creates a polygon that contains the origin.</p>
<p>Here’s some code to compute the Minkowski difference between two polygons. Since both sets are convex, we take the convex hull of the resulting point set to create a new convex polygon.</p>
<pre><code class="prettyprint"># P and Q are lists of vertices as numpy arrays
def minkowski_difference(P, Q):
    R = []
    for i in range(len(P)):
        for j in range(len(Q)):
            R.append(P[i] - Q[j])
    return convex_hull(R)
</code></pre>
<p>The <a href="https://en.wikipedia.org/wiki/Convex_hull">convex hull</a> is just the smallest convex polygon that encloses all the points. If you hammer a bunch of nails into a board and stretch an elastic band around them, the nails that touch the elastic band are the convex hull. It can be computed in <code class="prettyprint">O(N*log(N))</code> with a <a href="https://en.wikipedia.org/wiki/Graham_scan">Graham Scan</a>. Here’s an image showing two sets of points (red and blue) and their corresponding convex hulls. It also shows the intersection in yellow, which is the convex hull of the points contained in both polygons together with the intersection points. Seeing this, go back and confirm the <strong>key point</strong> above: if the polygons intersect, their Minkowski difference contains the origin.</p>
<p><a href="https://svbtleusercontent.com/swbpyosqm1fgg.png"><img src="https://svbtleusercontent.com/swbpyosqm1fgg_small.png" alt="Screenshot 2016-10-02 14.06.38.png"></a></p>
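<p>The <code class="prettyprint">convex_hull</code> helper used above isn’t shown in the post. Here is one common <code class="prettyprint">O(N*log(N))</code> variant, Andrew’s monotone chain, a close relative of the Graham Scan:</p>

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); positive for a left turn
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def convex_hull(points):
    """Andrew's monotone chain: sort the points, then build the lower
    and upper hulls, popping vertices that do not make a left turn."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # hull in counter-clockwise order

# a square with one interior point: the interior point is dropped
hull = convex_hull([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)])
```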
<p>Now instead of having two polygons we have one green polygon that is the Minkowski difference of the other two. In addition, from the definition of the Minkowski difference, we know that if the origin is inside this polygon the two comprising polygons intersect one another. This is a really important fact which lets us compute collisions really fast and, more importantly, when the collision will happen. We can also compute the first and last points of intersection of these polygons using a single ray from the origin in the direction of the relative velocities of the polygons.</p>
<p><a href="https://svbtleusercontent.com/ofqg5xpm85tfwq.png"><img src="https://svbtleusercontent.com/ofqg5xpm85tfwq_small.png" alt="Screenshot 2016-10-01 13.18.39.png"></a></p>
<p>Here’s an illustration of what is happening in both normal and Minkowski space. You can see the blue and red polygons passing through one another while the green polygon, representing their Minkowski difference, moves through the origin at the same time.</p>
<p><a href="https://svbtleusercontent.com/ynfpafrpvx0dpw.gif"><img src="https://svbtleusercontent.com/ynfpafrpvx0dpw_small.gif" alt="output_XH5Mwd.gif"></a></p>
<p>Once we’ve retrieved this range (the first and last intersection times) we can sample some points in that range and compute the overlap. The intersection of two convex polygons is another convex polygon: the convex hull of the intersection points and the points that lie inside both polygons. Using our convex hull function and our intersection function we can compute this polygon and then use the <a href="https://en.wikipedia.org/wiki/Shoelace_formula">Surveyor’s Algorithm</a> to compute the area.</p>
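<p>The Surveyor’s (shoelace) formula itself is only a few lines. A sketch, taking the polygon as an ordered list of vertices:</p>

```python
def shoelace_area(poly):
    """Shoelace formula for a simple polygon given as a list of
    (x, y) vertices in order (either winding direction)."""
    n = len(poly)
    area2 = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1  # cross product of consecutive vertices
    return abs(area2) / 2.0

# unit square -> 1.0; legs-3-and-4 right triangle -> 6.0
unit_square = shoelace_area([(0, 0), (1, 0), (1, 1), (0, 1)])
triangle = shoelace_area([(0, 0), (3, 0), (0, 4)])
```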
<p>Finally putting all the pieces together we have an algorithm that takes the Minkowski difference of two polygons then computes the (generally) two points of intersection of the ray from the origin to the Minkowski difference polygon. Using the times of the two intersections we can compute the overlap of the two polygons as a function of time. Plotting the result we get this.</p>
<p><a href="https://svbtleusercontent.com/gloajfbi5lwdna.png"><img src="https://svbtleusercontent.com/gloajfbi5lwdna_small.png" alt="Screenshot 2016-10-01 12.55.45.png"></a></p>
<p>The final task remains: compute the maximum of this function. It turns out that the overlap area is unimodal in time; there is a proof in this <a href="https://hal.archives-ouvertes.fr/inria-00073859/document">paper</a>. Since the function is unimodal we can use a <a href="https://en.wikipedia.org/wiki/Ternary_search">ternary search</a> to home in on the maximum to any fixed precision in a logarithmic number of steps.</p>
<pre><code class="prettyprint">def findMax(objectiveFunc, lower, upper):
    if abs(upper - lower) < 1e-6:
        return (lower + upper) / 2
    lowerThird = (2*lower + upper) / 3
    upperThird = (lower + 2*upper) / 3
    if objectiveFunc(lowerThird) < objectiveFunc(upperThird):
        return findMax(objectiveFunc, lowerThird, upper)
    else:
        return findMax(objectiveFunc, lower, upperThird)
</code></pre>
<p>Minkowski geometry extends to <code class="prettyprint">N</code> dimensions and the principles stay the same, which can make it easier to do things like collision detection and response in 3 dimensions, where the more simplistic methods don’t generalize well. This question was posed at the ACM ICPC World Finals. </p>
<h1>Telephone Wiretapping (2016-09-18)</h1>
<p>Imagine you are looking to intercept a communication that can happen between two people over a telephone network. Let’s say that the two people in question are part of a larger group who all communicate with each other (sometimes via other people in the network if they don’t have someone’s phone number). We can represent this as a graph where the vertices are people and edges connect two people if they have each other in their phone books. </p>
<p>Here’s a network with <code class="prettyprint">6</code> people, some of whom don’t directly communicate with each other but can do so through others. Each person can reach all the others though, so the graph is connected. </p>
<p><a href="https://svbtleusercontent.com/mu1rhu10ijcurg.jpg"><img src="https://svbtleusercontent.com/mu1rhu10ijcurg_small.jpg" alt="graph.jpg"></a></p>
<p>Let’s also assume that this group of people communicates efficiently: they use the smallest number of calls possible and always distribute information to every person. If an unknown member of the network wants to communicate some nefarious plans to all the other members, they call some people, who in turn spread the message through the network by making more calls while adhering to the rules above. If we can tap a single link between two people, what is the probability of intercepting one of these calls? Let’s work through an example of a network of 4 people, each of whom can communicate with two others. The graph looks like this:</p>
<p><a href="https://svbtleusercontent.com/gvrltcrazv8ukq.png"><img src="https://svbtleusercontent.com/gvrltcrazv8ukq_small.png" alt="Screenshot 2016-09-18 00.50.46.png"></a></p>
<p>If we tap the link connecting <code class="prettyprint">0</code> and <code class="prettyprint">1</code> there is only one way to communicate to all members without using this tapped link, whereas there are <code class="prettyprint">3</code> that do use it. That means the probability of intercepting the information is <code class="prettyprint">0.75</code>. The small images represent the ways in which the communication can happen, and those that use the tapped link (and will be intercepted) are highlighted.</p>
<p>There are a couple of important things to note at this point. Firstly, the links chosen to communicate over form a <a href="https://en.wikipedia.org/wiki/Spanning_tree">spanning tree</a> of the graph. This is an important property: a spanning tree has one less edge than the number of nodes and doesn’t contain any cycles. A cycle would mean the communication was not efficient, because we could remove an edge on the cycle and still have the information reach all the people. </p>
<p>Let’s work through another example and compute the probability of intercepting the communication if we tap a specific link. Here is another graph. It represents <code class="prettyprint">4</code> people but this time there are <code class="prettyprint">6</code> links. Everyone can communicate with everyone else. Let’s tap the top link - highlighted in yellow.</p>
<p><a href="https://svbtleusercontent.com/atkdvbglghrxua.png"><img src="https://svbtleusercontent.com/atkdvbglghrxua_small.png" alt="Screenshot 2016-09-17 20.37.48.png"></a></p>
<p>Now let’s enumerate all the spanning trees of this graph manually. Notice that each spanning tree connects all the vertices of the original graph using fewer edges: in particular <code class="prettyprint">3</code> edges, which is one less than the number of vertices. Adding another edge would create a cycle. There are <code class="prettyprint">16</code> different spanning trees and <code class="prettyprint">8</code> of them (highlighted in yellow) use the link we have tapped, so the probability of intercepting the transmission is <code class="prettyprint">8.0 / 16.0 = 0.5</code>.</p>
<p><a href="https://svbtleusercontent.com/x0lbh6pnfuqhg.png"><img src="https://svbtleusercontent.com/x0lbh6pnfuqhg_small.png" alt="Screenshot 2016-09-17 20.37.52.png"></a></p>
<p>Cool! So to solve this problem we need to count the number of spanning trees of the graph that use a specified edge; call that value <code class="prettyprint">A</code>. Then compute the total number of spanning trees the graph has; call that value <code class="prettyprint">B</code>. The probability of intercepting the communication on the tapped link is <code class="prettyprint">A/B</code>. </p>
<p>The number of spanning trees that use a specific edge can be computed by collapsing the vertices at each end of that edge into one vertex and computing the number of spanning trees of the resulting multigraph. For example, for the cross-box graph above, if we want the number of spanning trees that use the top edge, we collapse it and generate the graph on the right, which indeed has <code class="prettyprint">8</code> spanning trees. Remember this can create multiple edges between vertices.</p>
<p><a href="https://svbtleusercontent.com/bklqjh9ylqwxra.png"><img src="https://svbtleusercontent.com/bklqjh9ylqwxra_small.png" alt="Screenshot 2016-09-18 00.51.45.png"></a></p>
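<p>Collapsing an edge can be sketched directly on the adjacency matrix, provided entries count parallel edges (the function name and representation are my own):</p>

```python
import numpy as np

def contract_edge(A, u, v):
    """Collapse vertices u and v of a multigraph adjacency matrix
    (A[i][j] = number of edges between i and j) into a single vertex."""
    A = np.array(A)           # work on a copy
    A[u, :] += A[v, :]        # merge v's edges into u's row...
    A[:, u] += A[:, v]        # ...and column
    A[u, u] = 0               # the contracted edge(s) become self-loops; drop them
    keep = [i for i in range(A.shape[0]) if i != v]
    return A[np.ix_(keep, keep)]

# 4-cycle 0-1-2-3; contracting edge (0, 1) leaves a triangle
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]])
merged = contract_edge(C4, 0, 1)
```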
<p>Enumerating all the spanning trees is not a feasible option, as this number grows really quickly. In fact <a href="https://en.wikipedia.org/wiki/Cayley%27s_formula">Cayley’s formula</a> gives the number of spanning trees of a complete graph on <code class="prettyprint">K</code> vertices as <code class="prettyprint">K ** (K-2)</code>. </p>
<p>Instead we can use Kirchhoff’s matrix tree theorem, which tells us that if we have a graph represented by an adjacency matrix <code class="prettyprint">G</code> we can count the number of spanning trees as:</p>
<p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/4bd618b8b7f0e5506ca460b410160f107bc2436f" alt="graph.jpg"></p>
<p>where the lambdas are the non-zero eigenvalues of the Laplacian matrix of <code class="prettyprint">G</code>. It’s actually easier and more numerically stable to compute the determinant of a cofactor of the Laplacian, which gives the same result. The Laplacian matrix is used to compute lots of useful properties of graphs. It is equal to the degree matrix minus the adjacency matrix:</p>
<p><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/712994d22cc3a9e0bd6148764a17c1628f843062" alt="graph.pj"></p>
<p>Computing the Laplacian from an adjacency matrix can be done with this code:</p>
<pre><code class="prettyprint lang-python">import numpy as np

# compute the Laplacian of the adjacency matrix
def laplacian(A):
    L = -A
    for a in range(L.shape[0]):
        for b in range(L.shape[1]):
            if A[a][b]:
                L[a][a] += A[a][b]  # increase degree
    return L
</code></pre>
<p>Using this we can compute the cofactor. </p>
<pre><code class="prettyprint lang-python">def cofactor(L, factor=1.0):
    Q = L[1:, 1:]  # bottom right minor
    return np.linalg.det(Q / factor)
</code></pre>
<p>I also added a scaling parameter to the cofactor computation. The determinants can get really big when the network has thousands of vertices, in which case computing the numerator and denominator of the probability separately can overflow. If we take some factor <code class="prettyprint">factor</code> out of the Laplacian matrix before computing the determinant, we reduce the value by <code class="prettyprint">factor ** N</code>, where <code class="prettyprint">N</code> is the size of the matrix. Using this we can compute the probability for large graphs, because the factors almost totally cancel out: the two matrices’ dimensions differ by only 1.</p>
<pre><code class="prettyprint lang-python"># G1 is the collapsed graph (trees using the tapped edge),
# G2 is the original graph (all spanning trees)
def probability(G1, G2):
    factor = 24.0
    # det(A) = f**n * det(A/f)
    L1 = laplacian(G1)
    L2 = laplacian(G2)
    Q1 = cofactor(L1, factor=factor)
    Q2 = cofactor(L2, factor=factor)
    # f**(n-2) * det(L1/f)
    # --------------------
    # f**(n-1) * det(L2/f)
    return Q1 / Q2 / factor
</code></pre>
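<p>We can check all of this against the worked cross-box example above, where the complete graph on 4 vertices has 16 spanning trees, 8 of which use the tapped edge. A self-contained sketch applying Kirchhoff’s theorem directly (without the overflow-guarding factor, which a toy graph doesn’t need):</p>

```python
import numpy as np

def spanning_tree_count(A):
    # Kirchhoff: any cofactor of the Laplacian counts the spanning trees
    A = np.array(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return round(np.linalg.det(L[1:, 1:]))

# complete graph on 4 vertices (Cayley: 4 ** 2 = 16 trees)
K4 = np.ones((4, 4)) - np.eye(4)

# contract the tapped edge: the merged vertex gains two parallel edges
# to each of the remaining vertices, which still share one edge
collapsed = np.array([[0, 2, 2],
                      [2, 0, 1],
                      [2, 1, 0]])

total = spanning_tree_count(K4)          # 16
tapped = spanning_tree_count(collapsed)  # 8
p_intercept = tapped / total             # 0.5
```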
<p>Using this we can go through each edge in the graph and compute the probability of interception if we tap that edge. This value changes depending on the graph, and if the graph has a <a href="https://en.wikipedia.org/wiki/Biconnected_component">bridge</a> (an edge whose removal disconnects the graph), that edge will have probability <code class="prettyprint">1.0</code>.</p>
<h1>Geometric Cliques (2016-09-17)</h1>
<p>If you have <code class="prettyprint">N</code> points in the plane, what is the largest subset of those points such that each point is within a distance <code class="prettyprint">D</code> of all the others? It seems pretty innocuous right? Turns out it’s a great big beautiful disaster.</p>
<p>Here’s an example of <code class="prettyprint">N = 10</code> points where the maximum subset is the set of points connected with dotted lines, which are each within a distance <code class="prettyprint">D</code> of each other. There is no bigger set. </p>
<p><a href="https://svbtleusercontent.com/vugyu6dq2zjtfa.png"><img src="https://svbtleusercontent.com/vugyu6dq2zjtfa_small.png" alt="Screenshot 2016-09-17 19.35.21.png"></a></p>
<p>Trying every subset of points and checking whether all members are within <code class="prettyprint">D</code> of each other takes exponential time, <code class="prettyprint">O(2^N)</code>, so we need a better approach. Let’s try picking a point which we will assume to be part of the clique. Then all the candidate points that are not within <code class="prettyprint">D</code> of that point can’t be in the clique. For example, if we have three collinear points spaced by <code class="prettyprint">D</code> and select the middle one to build our clique, then either the left point or the right one can be in the clique, but not both. This does give us a heuristic: take each point, filter out the points that aren’t within <code class="prettyprint">D</code>, and then run a brute-force search to find the maximal clique. </p>
<p>Let’s try and do better, as this could still be exponential depending on how the points are clustered. Let’s start with a bunch of points:</p>
<p><a href="https://svbtleusercontent.com/p9f5oddzlaidma.png"><img src="https://svbtleusercontent.com/p9f5oddzlaidma_small.png" alt="Screenshot 2016-09-19 20.06.26.png"></a> </p>
<p>Pick any two points and assume that they are going to be the furthest points apart in our clique. Let this distance be <code class="prettyprint">F</code>, so <code class="prettyprint">F <= D</code>.</p>
<p><a href="https://svbtleusercontent.com/hmcqnuymt1x5w.png"><img src="https://svbtleusercontent.com/hmcqnuymt1x5w_small.png" alt="Screenshot 2016-09-19 20.07.49.png"></a></p>
<p>If we filter out all points more than <code class="prettyprint">F</code> from these two points we get this situation:</p>
<pre><code class="prettyprint lang-python"># try all candidate pairs for the furthest points
for i in xrange(N):
    for j in xrange(i + 1, N):
        xi, yi = points[i]
        xj, yj = points[j]
        if distance(xi, yi, xj, yj) > D: continue
        # Assume this is the furthest pair in our clique
        F = distance(xi, yi, xj, yj)
        lens = []
        for k in xrange(N):
            if k == i: continue
            if k == j: continue
            xk, yk = points[k]
            if distance(xi, yi, xk, yk) > F: continue
            if distance(xj, yj, xk, yk) > F: continue
            lens.append(points[k])
</code></pre>
<p><a href="https://svbtleusercontent.com/1rn8hmvwkuorq.png"><img src="https://svbtleusercontent.com/1rn8hmvwkuorq_small.png" alt="Screenshot 2016-09-19 20.08.13.png"></a></p>
<p>Points inside the intersection of these two circles - the lens shape - are within <code class="prettyprint">F</code> of both points at the end of the line, and <code class="prettyprint">F <= D</code>. I thought this was the end of the story. But we can’t simply select all of these points as a clique, because they may not be within <code class="prettyprint">D</code> of each other. For example the top and bottom points in the lens shape might be further than <code class="prettyprint">D</code> apart, so we need to do some more work.</p>
<p>First note that all the points above the dotted line are within <code class="prettyprint">F</code>, and therefore <code class="prettyprint">D</code>, of each other, so they are a potential clique, as are the points below the dotted line. But there may be a bigger clique incorporating points from both sides of the line. If we pick a certain point below the line for our clique we are forbidden from picking any points more than <code class="prettyprint">D</code> away from that point. Incidentally these <u>forbidden</u> points will all lie on the other side of the line. Take a moment, look at the picture above and make sure you are happy with that.</p>
<p>Now let’s separate the points inside the lens shape into two sets: those above the dotted line and those below. This can be done by taking the signed area of the triangle formed by the two points at the end of the line and the point in question. If the area is positive the point is above the line, and if it’s negative it’s below the line.</p>
<pre><code class="prettyprint lang-python">top, bot = [], []
M = len(lens)
for k in xrange(M):
    xk, yk = lens[k]
    if area((xi, yi), (xj, yj), (xk, yk)) >= 0:
        top.append(lens[k])
    else:
        bot.append(lens[k])
</code></pre>
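<p>The snippet relies on an <code class="prettyprint">area</code> helper. A minimal sketch using the cross product could look like this (the factor of 1/2 is dropped since only the sign matters):</p>

```python
def area(p1, p2, p3):
    # Twice the signed area of triangle p1-p2-p3: positive when the points
    # wind counter-clockwise, negative when clockwise, zero when collinear
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
```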
<p>The situation is now like this:</p>
<p><a href="https://svbtleusercontent.com/yzkqab8nih7suw.png"><img src="https://svbtleusercontent.com/yzkqab8nih7suw_small.png" alt="Screenshot 2016-09-19 20.10.00.png"></a></p>
<p>Let’s now treat each point as a vertex and connect vertices in one set to vertices in the other if they are further than <code class="prettyprint">F</code> apart (recall <code class="prettyprint">F</code> is the assumed maximum separation within our clique, and <code class="prettyprint">F <= D</code>). Using fewer points so it’s not as cluttered, the situation could look like this. Red lines denote pairs that are too far apart, and some points don’t have any edges connected to them. This is fine; it just means all other points are close enough to them.</p>
<p><a href="https://svbtleusercontent.com/1tbc28fm0xulgg.png"><img src="https://svbtleusercontent.com/1tbc28fm0xulgg_small.png" alt="Screenshot 2016-09-17 21.36.39.png"></a></p>
<p>We’ve now constructed a bipartite graph representing some of the geometric constraints we are interested in. So our task is to select the maximum number of points from this set such that no two selected points are connected by a red line, because that would mean they are too far apart. Vertices with no edges connected to them are freebies: they are close enough to all of the points, so there’s no reason not to pick them. </p>
<p>Our problem of selecting the maximum number of vertices in a graph such that no two of them share an edge is the problem of computing the <a href="https://en.wikipedia.org/wiki/Independent_set_(graph_theory)">maximum independent set</a> of a graph. This is unfortunately NP-Complete in a general graph, but in a bipartite graph <a href="https://en.wikipedia.org/wiki/K%C5%91nig%27s_theorem_(graph_theory)">König’s theorem</a> tells us that the size of a maximum matching equals the size of a minimum <a href="https://en.wikipedia.org/wiki/Vertex_cover">vertex cover</a>, and the maximum independent set is exactly the complement of a minimum vertex cover. A maximum matching can in turn be computed as a <a href="https://en.wikipedia.org/wiki/Maximum_flow_problem">maximum-flow</a> problem. </p>
<p>Without the bipartite structure computing the maximum independent set is NP-Complete, whereas maximum-flow can be computed in a general graph in polynomial time, so we’re in much better shape. We compute the maximum flow <code class="prettyprint">f</code> of this graph, which equals the size of a maximum matching and hence of a minimum vertex cover. The answer for this pair of points is then the number of conflicting vertices minus <code class="prettyprint">f</code>, plus the nodes without edges (<code class="prettyprint">top_set</code> and <code class="prettyprint">bot_set</code>), plus <code class="prettyprint">2</code> for the end points of the line.</p>
<pre><code class="prettyprint lang-python">def max_bipartite_independent(top, bot):
    graph = collections.defaultdict(dict)
    # vertices that turn out to have no conflict edges
    top_set = set(xrange(len(top)))
    bot_set = set(xrange(len(bot)))
    src = 'SOURCE-NODE'
    snk = 'SINK-NODE'
    for i in xrange(len(top)):
        for j in xrange(len(bot)):
            xi, yi = top[i]
            xj, yj = bot[j]
            if distance(xi, yi, xj, yj) > F:
                node_i = 'TOP' + str(i)
                node_j = 'BOT' + str(j)
                # unit capacities at the source and sink make the
                # maximum flow equal to a maximum matching
                graph[src][node_i] = 1
                graph[node_i][node_j] = MAX_INT
                graph[node_j][snk] = 1
                top_set.discard(i)
                bot_set.discard(j)
    conflicted = (len(top) - len(top_set)) + (len(bot) - len(bot_set))
    f = flow.max_flow(graph, src, snk)
    # independent set = conflicted vertices - minimum vertex cover (= f),
    # plus the isolated vertices and the two end points of the line
    solution = (conflicted - f) + len(top_set) + len(bot_set) + 2
    return solution
</code></pre>
<p>There are a few different algorithms to compute maximum flow. The following is a simple implementation of the Ford-Fulkerson algorithm, using Dijkstra’s algorithm to find shortest augmenting paths. </p>
<pre><code class="prettyprint lang-python">import Queue

def dijkstra(graph, source, sink):
    q = Queue.PriorityQueue()
    q.put((0, source, []))
    visited = set([source])
    while not q.empty():
        length, node, path = q.get()
        # Found a path, return its bottleneck capacity
        if node == sink:
            cap = None
            for a, b in path:
                if cap is None or graph[a][b] < cap:
                    cap = graph[a][b]
            return cap, path
        # Visit the next node
        for child in graph[node].keys():
            if child not in visited and graph[node][child] > 0:
                next_state = (length + 1, child, path + [(node, child)])
                visited.add(child)
                q.put(next_state)
    # No paths remaining
    return None, None
</code></pre>
<p>And the remaining code to compute the maximum-flow of the graph:</p>
<pre><code class="prettyprint lang-python">def max_flow(graph, source, sink):
    flow = 0
    while True:
        capacity, path = dijkstra(graph, source, sink)
        if not capacity:
            return flow
        # Push flow along the augmenting path and update the residuals
        for a, b in path:
            graph[a][b] = graph[a].get(b, 0) - capacity
            graph[b][a] = graph[b].get(a, 0) + capacity
        flow += capacity
</code></pre>
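<p>As a sanity check, the same augmenting-path idea can be exercised on a tiny hand-built graph. This is a self-contained sketch with made-up capacities, using a plain BFS instead of a priority queue (effectively the Edmonds-Karp variant):</p>

```python
from collections import deque, defaultdict

def max_flow_bfs(graph, source, sink):
    # Repeatedly find a shortest augmenting path with BFS, then push
    # the bottleneck capacity along it and update residual capacities
    flow = 0
    while True:
        parent = {source: None}
        q = deque([source])
        while q and sink not in parent:
            u = q.popleft()
            for v, cap in graph[u].items():
                if v not in parent and cap > 0:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            return flow
        # Reconstruct the path and find its bottleneck
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        capacity = min(graph[a][b] for a, b in path)
        for a, b in path:
            graph[a][b] -= capacity
            graph[b][a] = graph[b].get(a, 0) + capacity
        flow += capacity

graph = defaultdict(dict)
graph['S']['A'] = 3; graph['S']['B'] = 2
graph['A']['T'] = 2; graph['A']['B'] = 1
graph['B']['T'] = 3
```

<p>On this graph the two direct paths carry 2 units each and the path through the middle edge carries one more, for a total flow of 5.</p>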
<p>Cool, so we’ve finally got all the pieces needed to solve this problem. We try every pair of points (<code class="prettyprint">O(N^2)</code> pairs) as candidates for the two furthest points in our clique, then from the points that fall inside the lens we build a bipartite graph and compute the maximum independent set, which corresponds to the maximum clique size. </p>
<p>All in, the algorithm tries <code class="prettyprint">O(V^2)</code> candidate pairs of points, and with the maximum-flow inner loop it’s <code class="prettyprint">O(V^5)</code> overall. This appears to be optimal without using a faster maximum matcher. Full source code is <a href="https://gist.github.com/nickponline/abce6170b96043a4c372feef590388d3">here</a>, along with the <a href="https://gist.github.com/nickponline/8a25d6a393de50580dfe440693c6abc5">maximum flow code</a> and a little <a href="https://gist.github.com/nickponline/dcdd02adbb7b7bcd595e53349135cfd8">visualizer</a>. </p>
tag:nickp.svbtle.com,2014:Post/minimum-image-cover2016-07-25T12:24:23-07:002016-07-25T12:24:23-07:00Minimum Image Cover<p>Some applications of photogrammetry require us to collect a number of overlapping aerial images covering an area. The more overlap the better, as more overlap between pairs of images generally gives higher quality results. However in other applications we are actually looking for <u>as few images</u> as possible from the set that still cover the area of interest without any gaps.</p>
<p><a href="https://svbtleusercontent.com/gab8uaudo7b2q.jpg"><img src="https://svbtleusercontent.com/gab8uaudo7b2q_small.jpg" alt="dji-inspire-1-drone-bh1.jpg"></a></p>
<p>Framed another way - given a collection of sets, what is the minimum number of those sets that need to be selected such that their union is the union of all of the sets. The sets in our case are defined by taking each location on the ground as a <code class="prettyprint">longitude,latitude</code> pair and then finding the set of images that can see that location. We’ll talk about how to enumerate these locations on the ground later. This is called the set cover problem.</p>
<p><a href="https://svbtleusercontent.com/vqdikmneu7pka.png"><img src="https://svbtleusercontent.com/vqdikmneu7pka_small.png" alt="Screenshot 2016-07-25 11.53.04.png"></a></p>
<h2 id="camera-geometry_2">Camera Geometry <a class="head_anchor" href="#camera-geometry_2">#</a>
</h2>
<p>Before we start solving the problem let’s generate a dataset of aerial imagery. In order to do that we need to represent an aerial camera at some point in space and the direction in which it is pointing. The location and orientation of the camera together are called the pose and are represented by six values - longitude, latitude, altitude, yaw, pitch and roll. The first three represent position and the last three represent orientation. In addition to this, cameras have a number of intrinsic parameters. For the purposes of our data set we are just going to consider focal length (<code class="prettyprint">F</code>) and sensor size. Focal length is the distance over which the camera lens focuses light onto the sensor. The shorter the focal length the wider the field of view. The longer the focal length the smaller the field of view but the finer the ground sampling distance (which is measured in centimeters per pixel). Sensor size is the size of the CCD in the camera; larger sensors (at the same resolution) can represent a larger scene. There are a few different conventions for sensor size, and as a result focal length is sometimes given as an <u>equivalent</u> focal length as if the sensor were a certain size. In this case we’ll assume that our camera has a 35mm-equivalent focal length of 20mm, which oddly (by convention) means that the sensor size is 24x36 mm. From these parameters - pose, focal length and sensor size - we can draw this diagram that describes the area on the ground (the footprint) that the camera can see, which is related to the field of view by the altitude.</p>
<p><a href="https://svbtleusercontent.com/wneu64xxyl2jza.jpg"><img src="https://svbtleusercontent.com/wneu64xxyl2jza_small.jpg" alt="focal-length-fov-sensor-size.jpg"></a></p>
<p>This image assumes that the camera is pointing straight down, which would generate a clean rectangular footprint (rectangular because the sensor is not square). In reality this is not the case: the aircraft is bumping around and the camera is moving slightly, which generates a quadrilateral footprint and also causes some perspective distortion. In order to compute the footprint accurately we can represent the camera with 3 vectors - <code class="prettyprint">position</code>, <code class="prettyprint">lookat</code> and <code class="prettyprint">up</code>. The <code class="prettyprint">position</code> vector is the location of the camera in space. The <code class="prettyprint">lookat</code> vector is a unit vector in the direction the camera is pointing and the <code class="prettyprint">up</code> vector is a unit vector out of the top of the camera to disambiguate upside-down images. We can apply a rotation matrix computed from <code class="prettyprint">roll</code>, <code class="prettyprint">pitch</code> and <code class="prettyprint">yaw</code> to the <code class="prettyprint">up</code> and <code class="prettyprint">lookat</code> vectors about the camera <code class="prettyprint">position</code> to orient the camera, like this.</p>
<pre><code class="prettyprint lang-python">import numpy as np

def rotate(vector, yaw, pitch, roll):
    # Angles are in radians
    Rotz = np.array([
        [np.cos(yaw), np.sin(yaw), 0],
        [-np.sin(yaw), np.cos(yaw), 0],
        [0, 0, 1]
    ])
    Rotx = np.array([
        [1, 0, 0],
        [0, np.cos(pitch), np.sin(pitch)],
        [0, -np.sin(pitch), np.cos(pitch)]
    ])
    Roty = np.array([
        [np.cos(roll), 0, -np.sin(roll)],
        [0, 1, 0],
        [np.sin(roll), 0, np.cos(roll)]
    ])
    rotation_matrix = np.dot(Rotz, np.dot(Roty, Rotx))
    return np.dot(rotation_matrix, vector)

up = np.array([0, 1, 0])
position = np.array([0, 0, 80000])  # 80 meters up, in millimeters
lookat = np.array([0, 0, -1])       # pointing down
yaw, pitch, roll = np.radians([45, 5, 5])
up = rotate(up, yaw, pitch, roll)
lookat = rotate(lookat, yaw, pitch, roll)
</code></pre>
<p>Now that we have a camera pointing in the correct direction we need to compute the four corners of the footprint on the ground. We can do this geometrically by placing the camera sensor plane <code class="prettyprint">F</code> mm away from the camera position in the direction of the <code class="prettyprint">lookat</code> vector and then projecting rays from the camera position through the four corners of the sensor into the ground. The points at which these rays intersect the ground - assuming the ground is flat (the ground is not flat) - are the four corners of the footprint.</p>
<pre><code class="prettyprint lang-python">def ground_projection(position, corner):
    # scale the ray through the sensor corner until it reaches z = 0
    k = -position[2] / (corner[2] - position[2])
    return position + (corner - position) * k
</code></pre>
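<p>To make the projection concrete, here is a self-contained check with hypothetical numbers, assuming a flat ground plane at <code class="prettyprint">z = 0</code>:</p>

```python
import numpy as np

def ground_projection(position, corner):
    # scale the ray from the camera through the sensor corner until z = 0
    k = -position[2] / (corner[2] - position[2])
    return position + (corner - position) * k

position = np.array([0.0, 0.0, 80.0])     # camera 80 m above the ground
corner = np.array([0.012, 0.018, 79.98])  # a sensor corner 20 mm below it
ground = ground_projection(position, corner)
```

<p>Here the ray drops 0.02 m while moving (0.012, 0.018) laterally, so it reaches the ground 4000 steps out, at (48, 72, 0).</p>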
<p>Using this code we can generate a bunch of overlapping camera positions covering an area. Here’s a dataset of 100 images generated as if a DJI Phantom 4 (<code class="prettyprint">F</code>=3.6) was flying over a 250 square meter area at 400m. Now we can return to our set cover problem of finding the minimum number of images that cover the whole area. The whole area in this case is the union of the footprints of the individual images (right). Let’s lay a lattice over the ground to discretize it (left); this is a good approximation of the ground plane and makes the problem easier to solve. We’re now looking for the minimum number of images that cover all the lattice points.</p>
<p><a href="https://svbtleusercontent.com/imtejvsz7xlu1q.png"><img src="https://svbtleusercontent.com/imtejvsz7xlu1q_small.png" alt="Screenshot 2016-07-25 11.32.19.png"></a></p>
<h2 id="greedy_2">Greedy <a class="head_anchor" href="#greedy_2">#</a>
</h2>
<p>First let’s try a greedy approach: select the image that covers the most uncovered lattice points, add it to the solution set, and continue until we have covered all the lattice points. This is easy to implement and runs fast; however the result isn’t optimal and we keep more images than necessary.</p>
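<p>The greedy heuristic can be sketched on a toy instance (hypothetical sets; note that greedy uses three sets where two suffice, illustrating the suboptimality):</p>

```python
def greedy_cover(universe, sets):
    # Repeatedly pick the set covering the most uncovered points
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            break  # remaining points can't be covered
        chosen.append(best)
        uncovered -= best
    return chosen

# Greedy grabs the big set first, then needs two more;
# the optimum is just the two three-element sets
universe = {1, 2, 3, 4, 5, 6}
sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}]
chosen = greedy_cover(universe, sets)
```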
<h2 id="integer-linear-programming_2">Integer Linear Programming <a class="head_anchor" href="#integer-linear-programming_2">#</a>
</h2>
<p>Let’s think about the problem another way. Let’s take an example with 6 positions on the ground and 5 cameras. Let the 6 positions be variables <code class="prettyprint">x1, x2, x3, x4, x5, x6</code> and the 5 cameras be the sets <code class="prettyprint">s1 = {x1, x2}, s2 = {x3, x4}, s3 = {x5, x6}, s4 = {x1, x2, x3}, s5 = {x4, x5, x6}</code>.</p>
<pre><code class="prettyprint">[x1 x2 x3 x4 x5 x6]
[ s1 ][ s2 ][ s3 ]
[ s4 ][ s5 ]
</code></pre>
<p>Let’s assign an <u>inclusion</u> variable <code class="prettyprint">i1 .. i5</code> to each set. We would like to minimize the sum of <code class="prettyprint">i1 ... i5</code> such that for each position on the ground we have included at least one of the cameras covering it. There is one constraint for each element, with the possibility of duplicate constraints (which can be ignored) when two elements are covered by exactly the same sets. In this case both <code class="prettyprint">x1</code> and <code class="prettyprint">x2</code> are covered by <code class="prettyprint">s1</code> and <code class="prettyprint">s4</code>, hence the first two constraints are the same; similarly for the last two.</p>
<pre><code class="prettyprint">s1 + s4 >= 1
s1 + s4 >= 1
s2 + s4 >= 1
s2 + s5 >= 1
s3 + s5 >= 1
s3 + s5 >= 1
</code></pre>
<p>If in addition we constrain the variables <code class="prettyprint">i1 ... i5</code> to be <code class="prettyprint">{0, 1}</code> we get an optimal solution to the problem where <code class="prettyprint">s1, s2, s3 = 0</code> and <code class="prettyprint">s4, s5 = 1</code>, which represents the minimum set cover. Unfortunately 0-1 integer programming is one of <a href="https://en.wikipedia.org/wiki/Karp%27s_21_NP-complete_problems">Karp’s 21 NP-complete problems</a>, meaning we can’t easily find this solution in general. But we can use a trick that gets us close.</p>
<h2 id="relaxed-linear-programming_2">Relaxed Linear Programming <a class="head_anchor" href="#relaxed-linear-programming_2">#</a>
</h2>
<p>Let’s relax the condition that <code class="prettyprint">i1 ... i5</code> need to be <code class="prettyprint">{0, 1}</code> and let them be floating point numbers between <code class="prettyprint">0 ... 1</code>. Don’t worry too much about the meaning of fractionally including a set - an example will clear that up. Consider three positions and three pairwise-overlapping cameras <code class="prettyprint">s1 = {x1, x2}, s2 = {x2, x3}, s3 = {x1, x3}</code>, which give the constraints:</p>
<pre><code class="prettyprint">s1 + s3 >= 1
s1 + s2 >= 1
s2 + s3 >= 1
</code></pre>
<p>If we relax the integral constraint the optimal solution is <code class="prettyprint">1.5</code>, attained at <code class="prettyprint">s1, s2, s3 = 0.5</code> - every constraint sums to exactly <code class="prettyprint">1</code>. More importantly, if we round each of these fractional values up to <code class="prettyprint">1</code> we still get a valid cover, but we may lose optimality: the rounded solution uses <code class="prettyprint">3</code> sets while the optimal integral solution only needs <code class="prettyprint">2</code>, since any two of the sets already cover all three positions. So rounding up always yields a feasible cover, just not necessarily a minimal one. Our task is now to find a good heuristic way to round up the fractional values. One algorithm to do this is called <a href="https://en.wikipedia.org/wiki/Randomized_rounding#Randomized-rounding_algorithm_for_Set_Cover">randomized rounding</a>.</p>
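<p>A simplified sketch of the rounding step, assuming the fractional LP values are already in hand (the full algorithm amplifies the inclusion probabilities by a logarithmic factor to guarantee coverage with high probability):</p>

```python
import random

def randomized_round(fractions, sets, universe, seed=0):
    # Include each set with probability equal to its LP value and
    # retry until the sampled selection is a valid cover
    rng = random.Random(seed)
    while True:
        chosen = [i for i, f in enumerate(fractions) if rng.random() < f]
        covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
        if covered >= set(universe):
            return chosen

# The 5-camera example from above: suppose the LP drives s4 and s5 to 1
sets = [{1, 2}, {3, 4}, {5, 6}, {1, 2, 3}, {4, 5, 6}]
fractions = [0.0, 0.0, 0.0, 1.0, 1.0]
chosen = randomized_round(fractions, sets, range(1, 7))
```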
<p>Now we can apply this to our dataset. For each lattice point we compute all the cameras that can see that location; each camera becomes a variable and each lattice point becomes a constraint over the cameras that cover it. We then set up a linear programming solver to give us a (possibly fractional) solution. The code looks like this:</p>
<pre><code class="prettyprint lang-python"># LP-solver
from pulp import LpProblem, LpMinimize, LpVariable, GLPK

# set up solver
problem = LpProblem("SetCover", LpMinimize)
variables = [LpVariable("x" + str(var), 0, 1) for var in xrange(100)]
problem += sum(variables)

# set inclusion constraints
problem += variables[0] >= 1
problem += variables[1] + variables[2] + variables[5] >= 1
...
problem += variables[3] + variables[5] + variables[5] >= 1
problem += variables[1] >= 1
GLPK().solve(problem)

# solution
for v in problem.variables():
    print 'Camera:', v.name, "=", 1 if v.varValue > 0 else 0
</code></pre>
<p>Rendering this we see that out of the 100 images we need to retain only 42 to cover the whole area. </p>
<p><a href="https://svbtleusercontent.com/j43mmlnpztlhla.png"><img src="https://svbtleusercontent.com/j43mmlnpztlhla_small.png" alt="Screenshot 2016-07-25 11.52.44.png"></a></p>
tag:nickp.svbtle.com,2014:Post/counting-money2016-05-15T17:41:36-07:002016-05-15T17:41:36-07:00Counting Money<p>This post is based on a question from the Challenge 24 competition in 2016 where we were given a photograph of each of the denominations of Hungarian currency, called <a href="https://en.wikipedia.org/wiki/Hungarian_forint">Forints</a>. In addition we were given a number of photos of piles of coins (some of them counterfeit) and had to compute the total value of the money automatically. </p>
<p><a href="https://svbtleusercontent.com/nxryj57tp3nprg.jpg"><img src="https://svbtleusercontent.com/nxryj57tp3nprg_small.jpg" alt="coins.jpg"></a></p>
<p>First let’s look at the template and see how we can easily extract the locations of the clean coins. A flood fill can compute the <a href="https://en.wikipedia.org/wiki/Connected_component_(graph_theory)#Algorithms">connected components</a> quickly enough, and since the image is quite clean we can just iterate through each unvisited non-white pixel, start a new component there, and flood out to all connected pixels that aren’t white. Here’s the code:</p>
<pre><code class="prettyprint lang-python">import cv2
from itertools import product

# inside(r, c, img, seen) should check that (r, c) is in bounds,
# not white, and not yet visited
def flood_fill(img):
    rows, cols = img.shape
    components = {}
    component_id = -1
    seen = set()
    for (r, c) in product(range(rows), range(cols)):
        if inside(r, c, img, seen):
            component_id += 1
            components[component_id] = []
            q = [(r, c)]
            seen.add((r, c))
            while len(q) != 0:
                cr, cc = q.pop()
                components[component_id].append((cr, cc))
                for (dr, dc) in product([-1, 0, 1], repeat=2):
                    nr, nc = cr + dr, cc + dc
                    if inside(nr, nc, img, seen):
                        seen.add((nr, nc))
                        q.append((nr, nc))
    return components
</code></pre>
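<p>The same flood fill idea can be tested on a toy binary grid (a self-contained sketch with a hypothetical <code class="prettyprint">connected_components</code> helper standing in for the image version above):</p>

```python
from itertools import product

def connected_components(grid):
    # Group 8-connected non-zero cells with a stack-based flood fill
    rows, cols = len(grid), len(grid[0])
    seen, components = set(), []
    for (r, c) in product(range(rows), range(cols)):
        if grid[r][c] and (r, c) not in seen:
            comp, stack = [], [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                comp.append((cr, cc))
                for dr, dc in product([-1, 0, 1], repeat=2):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            components.append(comp)
    return components

grid = [[1, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]
```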
<p>The results look good and it cleanly picks out the backs and fronts of each coin in the template:</p>
<p><a href="https://svbtleusercontent.com/2k5kb2hrtnzycg.jpg"><img src="https://svbtleusercontent.com/2k5kb2hrtnzycg_small.jpg" alt="a.jpg"></a></p>
<p>Now that we have all the templates we need to find matches of them in images like this, which we will call background images:</p>
<p><a href="https://svbtleusercontent.com/fzsliralkqymyg.jpg"><img src="https://svbtleusercontent.com/fzsliralkqymyg_small.jpg" alt="coins.jpg"></a></p>
<p>Matching coins in images is often presented as an example of image processing algorithms, in particular <a href="http://scikit-image.org/docs/dev/user_guide/tutorial_segmentation.html">segmentation</a> and the <a href="http://docs.opencv.org/master/d3/db4/tutorial_py_watershed.html#gsc.tab=0">watershed</a> algorithm. There are a couple of things that make it deceptively difficult in this case though. The first is that we aren’t just detecting or counting the coins; we actually need to know which denomination each one is so we can total the amount. The second is that the coins can occlude one another, so you may only see part of a coin. Finally the coins can each be arbitrarily rotated, and fake coins can appear that are larger or smaller than the originals - these need to be discounted.</p>
<p>There are a few ways of matching templates in an image. One way is to look at the cross-correlation of the two images at different points. You can think of this as sliding the template image row by row, column by column over the background image and at each location measuring how much the two images correlate. So we are looking for the offset that corresponds to the maximum correlation. The problem with this method is that it is super slow, especially if the images are large or (like in this case) there are multiple templates to match and we are looking for the best match. We can solve this much faster in the frequency-domain using something called <a href="https://en.wikipedia.org/wiki/Phase_correlation">phase correlation</a>. This is the same cross-correlation technique, but it isolates the phase information. If <code class="prettyprint">Ga</code> and <code class="prettyprint">Gb</code> are the Fourier transforms of the template and background images respectively and <code class="prettyprint">Ga*</code> and <code class="prettyprint">Gb*</code> are their complex conjugates we can compute this as:</p>
<p><a href="https://svbtleusercontent.com/hki799qx9hvwma.png"><img src="https://svbtleusercontent.com/hki799qx9hvwma_small.png" alt="88357aa1d55f79979d1f88b5c6a2678f.png"></a></p>
<p>and retrieve a normalized (important because there are multiple templates to match) cross-correlation by taking the real component of the inverse Fourier transform of this. Here’s some code that computes the location of the peak of the phase correlation, which corresponds to the translation by which the template is offset from the background image. This process is called <a href="https://en.wikipedia.org/wiki/Image_registration">image registration</a>.</p>
<pre><code class="prettyprint lang-python">def find_translation(background, template):
    from numpy.fft import fft2, ifft2
    from numpy import conj, real, abs, where
    br, bc = background.shape
    Ga = fft2(background)
    # zero-pad the template up to the background's size
    Gb = fft2(template, (br, bc))
    # normalized cross-power spectrum
    R = Ga * conj(Gb)
    pc = real(ifft2(R / abs(R)))
    peak = pc.max()
    translation = where(pc == peak)
    return translation
</code></pre>
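<p>We can verify the idea end to end by recovering a known circular shift between two synthetic images (a self-contained sketch; the small epsilon guards against division by zero):</p>

```python
import numpy as np

def find_shift(background, template):
    # Peak of the normalized cross-power spectrum gives the cyclic shift
    Ga = np.fft.fft2(background)
    Gb = np.fft.fft2(template)
    R = Ga * np.conj(Gb)
    pc = np.real(np.fft.ifft2(R / (np.abs(R) + 1e-12)))
    return np.unravel_index(np.argmax(pc), pc.shape)

rng = np.random.RandomState(0)
template = rng.rand(32, 32)
# shift the image down 5 rows and right 9 columns (circularly)
background = np.roll(np.roll(template, 5, axis=0), 9, axis=1)
shift = find_shift(background, template)
```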
<p>Running phase correlation for each template iteratively and selecting the highest peak seems to perform well and we are able to correctly register all the coins both back and front in addition to the occluded coins:</p>
<p><a href="https://svbtleusercontent.com/sexg2j8s0v4nig.png"><img src="https://svbtleusercontent.com/sexg2j8s0v4nig_small.png" alt="3.png.progress17.png"></a></p>
<p>This seems to work pretty well and picks out the coins; however it can’t handle coins that are scaled or rotated. Rotation is actually quite a complex operation if you look at the movement of pixels. Let’s try rearranging the pixels in the image in such a way that they change more predictably when the image is scaled and rotated. We can use some kind of conformal mapping: these are functions that preserve angles in the Euclidean plane, and one of the most common is the log-polar transform. Here’s a basic implementation of the log-polar transform:</p>
<pre><code class="prettyprint lang-python">import cv2
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar(image):
    r, c = image.shape
    # rows index log-radius, columns index angle in degrees
    coords = np.mgrid[0:max(image.shape) * 2, 0:360]
    log_r = 10 ** (coords[0, :] / (r * 2.) * np.log10(c))
    angle = 2. * np.pi * (coords[1, :] / 360.)
    # map each (log_r, angle) cell back to a cartesian pixel
    center = (log_r * np.cos(angle) + r / 2., log_r * np.sin(angle) + c / 2.)
    ret = map_coordinates(image, center, order=3, mode='constant')
    return cv2.resize(ret, image.shape)
</code></pre>
<p>Here are the resulting log-polar transforms of a few images as they rotate. What’s useful to note here is what happens to horizontal, vertical and radial lines. </p>
<p><a href="https://svbtleusercontent.com/hcmss8jqvda.gif"><img src="https://svbtleusercontent.com/hcmss8jqvda_small.gif" alt="output_OHcilz.gif"></a></p>
<p>So rotation in the log-polar space manifests as a cyclic shift of the columns of pixels. This makes sense because the units of the <code class="prettyprint">X</code> axis in these images are no longer pixels but the angle of rotation. So pixels at the same angle in the original image (from some center) map to a vertical line in log-polar space, and pixels at the same radius map to a horizontal line. Another interesting point about this transform is the use of the logarithm. This effectively compresses the outer parts of the image into the bottom rows of the transform. Look at the number of pixels dedicated to the text “Forint” compared to the number of pixels dedicated to the center of the images. This mimics the function of the fovea in the human eye and dedicates more resolution to the area in focus, which is useful in a number of image tracking applications. </p>
<p>So once we have located the template in the background image we can use the Fourier shift theorem, which tells us that a cyclic shift in the spatial domain manifests as a linear phase shift in the Fourier domain. Now we have a way to register two images and compute the translation, rotation and scale factor between them. Using this information we can detect and count all the coins in the image (including the rotated ones) and discount coins that aren’t an authentic size. There are limitations to this method though: because of the symmetry of the Fourier transform we can only detect a limited range and resolution of scale and rotation. There are more complicated methods that extend it, but thankfully they weren’t needed in this case. </p>
<p>Image registration using spectral methods is really fast and commonly used to align images where there is known to be an affine transform between them. More complex methods are needed where there is a perspective transform between the two images, which will be the topic of an upcoming blog post. </p>
<p><a href="https://svbtleusercontent.com/jxowtkoxrbbmg.png"><img src="https://svbtleusercontent.com/jxowtkoxrbbmg_small.png" alt="6.png.progress13.png"></a></p>
tag:nickp.svbtle.com,2014:Post/chain-reaction2016-05-08T20:31:48-07:002016-05-08T20:31:48-07:00Mines Chain Reaction<p>This post is based on a question asked during the <a href="http://ch24.org">Challenge 24</a> programming competition. Given the locations of a number of land mines as <code class="prettyprint">X</code> and <code class="prettyprint">Y</code> coordinates and their blast radius <code class="prettyprint">R</code>, what is the minimum number of mines that need to be detonated such that all mines are detonated? When a mine is detonated it detonates all mines within its blast radius and the process repeats.</p>
<p><a href="https://svbtleusercontent.com/vjovi2lxa0wryg.jpg"><img src="https://svbtleusercontent.com/vjovi2lxa0wryg_small.jpg" alt="787881-landmine-1415477841-246-640x480.jpg"></a></p>
<p>Here’s a simple example with <code class="prettyprint">13</code> mines. In this case the optimal solution is to detonate mines <code class="prettyprint">0, 3</code> and <code class="prettyprint">8</code> which will detonate all others. It’s not the only solution.</p>
<p><a href="https://svbtleusercontent.com/epthtxszeqfnkg.png"><img src="https://svbtleusercontent.com/epthtxszeqfnkg_small.png" alt="Screenshot 2016-05-08 21.01.38.png"></a></p>
<p>The relationship between mines is not symmetric: just because mine <code class="prettyprint">A</code> can reach mine <code class="prettyprint">B</code> doesn’t mean that mine <code class="prettyprint">B</code> can reach mine <code class="prettyprint">A</code>. Therefore we can represent the mines as a directed graph where vertices are mines and there is an (unweighted) edge from mine <code class="prettyprint">A</code> to mine <code class="prettyprint">B</code> if mine <code class="prettyprint">A</code> can directly detonate mine <code class="prettyprint">B</code>. </p>
<p><a href="https://svbtleusercontent.com/bhqbdjirberg.jpg"><img src="https://svbtleusercontent.com/bhqbdjirberg_small.jpg" alt="graph.jpg"></a><br>
In order to solve this problem we first need to compute the <a href="https://en.wikipedia.org/wiki/Strongly_connected_component">strongly connected components</a> (SCCs) in this graph. These are the subsets of mines such that if any one is detonated then all mines in the subset will be detonated. In the example image above mines <code class="prettyprint">5, 6</code>, and <code class="prettyprint">7</code> comprise an SCC, as do mines <code class="prettyprint">0, 2, 9, 10, 11</code> and <code class="prettyprint">12</code>. For simplicity we’ll say that mines on their own are also in SCCs of size <code class="prettyprint">1</code>. In order to compute the SCCs we can use <a href="https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm">Tarjan’s algorithm</a>, which can be implemented recursively or with a stack.</p>
<pre><code class="prettyprint lang-python">def tarjan(graph):
    index_counter = [0]
    stack = []
    lowlinks = {}
    index = {}
    result = []

    def strongconnect(node):
        # depth index for this node
        index[node] = index_counter[0]
        lowlinks[node] = index_counter[0]
        index_counter[0] += 1
        stack.append(node)
        # Consider successors of `node`
        try:
            successors = graph[node]
        except KeyError:
            successors = []
        for successor in successors:
            if successor not in lowlinks:
                # Successor has not yet been visited
                strongconnect(successor)
                lowlinks[node] = min(lowlinks[node], lowlinks[successor])
            elif successor in stack:
                # the successor is on the stack, so it's in the current SCC
                lowlinks[node] = min(lowlinks[node], index[successor])
        # pop the stack and generate an SCC
        if lowlinks[node] == index[node]:
            connected_component = []
            while True:
                successor = stack.pop()
                connected_component.append(successor)
                if successor == node: break
            # storing the result
            result.append(tuple(connected_component))

    for node in graph:
        if node not in lowlinks:
            strongconnect(node)
    return result
</code></pre>
<p>This computes the SCCs for the initial graph. Now we can collapse all vertices in an SCC into a super vertex. Remember, detonating <u>any</u> mine in the super vertex will detonate all the others in that super vertex. Then we can create another graph of the super vertices and connect super vertices with a directed edge if any mine in one super vertex can detonate any mine in another super vertex. We now get another directed graph, although this one won’t have cycles. Remember, if we detonate a mine in a connected component it will detonate all mines in that component and all mines in all components reachable from that super vertex. Here’s an illustration of the process so far:</p>
<p><a href="https://svbtleusercontent.com/5qamy4rdva2xsq.jpg"><img src="https://svbtleusercontent.com/5qamy4rdva2xsq_small.jpg" alt="graph.jpg"></a></p>
<p>We can now work out which mines need to be detonated. In order to do this we can look for all super vertices in this graph that have zero in-degree. This means that they aren’t reachable by any sequence of mine detonations and thus need to be detonated themselves. One solution is to detonate mines: <code class="prettyprint">0, 3</code> and <code class="prettyprint">8</code>. There are actually multiple solutions. We can see this, for example, by considering the case where all the mines are within blast radius of all the others and thus form one strongly connected component. In this case we could choose any mine to start the chain reaction.</p>
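<p>The collapse-and-count step can be sketched like this, assuming <code class="prettyprint">graph</code> is an adjacency dict of mines and <code class="prettyprint">sccs</code> is the list of components returned by <code class="prettyprint">tarjan</code>:</p>
<pre><code class="prettyprint lang-python">def find_detonation_points(graph, sccs):
    # Label every mine with the id of its strongly connected component
    scc_of = {}
    for i, component in enumerate(sccs):
        for node in component:
            scc_of[node] = i
    # A super vertex has nonzero in-degree if any edge arrives from outside it
    has_incoming = set()
    for u in graph:
        for v in graph[u]:
            if scc_of[u] != scc_of[v]:
                has_incoming.add(scc_of[v])
    # Detonate one (arbitrary) mine from every zero in-degree super vertex
    return [component[0] for i, component in enumerate(sccs)
            if i not in has_incoming]
</code></pre>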
<p>In the competition the test cases got really large. The smallest had <code class="prettyprint">500</code> vertices and the largest had <code class="prettyprint">800,000</code> vertices. Tarjan’s algorithm is really fast, running in time linear in the size of the graph, and the degree counting is also linear. The slowest part is actually creating the initial graph, which when done naively takes <code class="prettyprint">O(N^2)</code>. In order to process the larger test cases we need to use a range query structure like a <a href="https://en.wikipedia.org/wiki/K-d_tree">KD-Tree</a> to query all mines within <code class="prettyprint">R</code> of a specific mine in logarithmic time, reducing the processing to <code class="prettyprint">O(N log N)</code>. A simpler approach than implementing a KD-Tree is to sort all the mines by their <code class="prettyprint">X</code> coordinate and only consider partner mines whose separation along <code class="prettyprint">X</code> alone already satisfies <code class="prettyprint">X*X</code> < <code class="prettyprint">R*R</code>. With randomly spaced data this gets you close to <code class="prettyprint">O(N log N)</code> without too much more coding. The problem set is available <a href="https://www.dropbox.com/s/5u4k1ckiwe1h9hm/B.zip?dl=0">here</a>.</p>
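<p>The sort-by-<code class="prettyprint">X</code> idea might look like the following sketch, which returns all pairs of mines within <code class="prettyprint">R</code> of each other:</p>
<pre><code class="prettyprint lang-python">def neighbours_within(mines, R):
    # Sort mine indices by x coordinate; for each mine scan forward only
    # while the x-gap alone is still within R, then do the exact check.
    order = sorted(range(len(mines)), key=lambda i: mines[i][0])
    pairs = []
    for a in range(len(order)):
        i = order[a]
        for b in range(a + 1, len(order)):
            j = order[b]
            dx = mines[j][0] - mines[i][0]
            if dx * dx > R * R:
                break  # every later mine is even further away in x
            dy = mines[j][1] - mines[i][1]
            if dx * dx + dy * dy <= R * R:
                pairs.append((i, j))
    return pairs
</code></pre>
<p>With heavily clustered data this degrades back towards <code class="prettyprint">O(N^2)</code>, which is why a KD-Tree is the safer choice.</p>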
<p>This type of analysis is useful in considering the distribution of information through a network. If the initial graph represented people, and edges represented the people with whom they shared information, then the source nodes are the minimum set of people that need to be given information such that it is transferred through the whole network. </p>
tag:nickp.svbtle.com,2014:Post/the-mighty-kalman-filter2016-01-30T14:06:12-08:002016-01-30T14:06:12-08:00The Mighty Kalman Filter<p>A Kalman Filter is an algorithm for estimating the state of a system from a series of noisy measurements. That’s what I was told, and it confused me. It also doesn’t really do the Kalman filter justice. If we have a bunch of noisy measurements of something why don’t we just take the mean and be done with it? Well the impressive part is estimating the state of a <strong>system</strong>. This is something that is changing. Let’s say we have a state. This is usually a collection of variables that completely describe where something is and what it’s doing. For example if we are considering a cricket ball hit by a batsman and flying through the air towards the boundary, it has a state described by its position, which could be three numbers (<code class="prettyprint">x</code>, <code class="prettyprint">y</code>, <code class="prettyprint">z</code>), and its velocity, again three numbers (<code class="prettyprint">vx</code>, <code class="prettyprint">vy</code>, <code class="prettyprint">vz</code>). In addition the cricket ball is under the effect of gravity, so there are forces influencing some of the state. We also assume that the current state can be derived from the previous state. This means that if we know the position and velocity of the ball we can approximately derive the next position of the ball a very small amount of time in the future by integrating forward a little bit. Given these things the Kalman filter can compute useful things like the location and velocity of the ball and how sure we are about the estimates.</p>
<p>This sounds a lot more complicated than using a moving average, but the Kalman filter cannot be out-performed in certain circumstances. It’s optimal in cases where the model perfectly matches the real system, the noise is white, and the covariances of the noise are exactly known. These things are seldom true, but it’s impressive how well the Kalman filter does under weaker assumptions.</p>
<p>I’ll try a more hand-wavy explanation. The Kalman filter works by alternating between predicting and correcting. It starts with a vague understanding of the state, which we can represent by a mean vector (<code class="prettyprint">Xk</code>) and covariance matrix (<code class="prettyprint">Pk</code>) with the same number of dimensions as there are state variables. It also has an understanding of the dynamics of the system, which are usually state transition and control matrices (<code class="prettyprint">A</code> and <code class="prettyprint">B</code>). So first it predicts, that is to say it uses its best guess of what the state is and what it understands the dynamics of the system to be to generate the next state, using a control vector (<code class="prettyprint">Bk</code>) to apply the forces.</p>
<pre><code class="prettyprint lang-python">Xk = A * Xk + B * Bk
</code></pre>
<p>In addition the uncertainty about this new state increases, because we aren’t modelling anything perfectly. If we model this error - called the process noise - by another covariance matrix (<code class="prettyprint">Q</code>) we can update our uncertainty like this:</p>
<pre><code class="prettyprint">Pk = A * Pk * A.T + Q
</code></pre>
<p>So we’ve guessed where we should be; now we correct that guess. We do this by measuring “how far away” our observation (<code class="prettyprint">Zk</code>) is from where we think we are (writing <code class="prettyprint">Xp</code> and <code class="prettyprint">Pp</code> for the predicted state and covariance) and how much we should trust this observation. </p>
<pre><code class="prettyprint">Ir = Zk - H*Xp
Ic = H * Pp * H.T + R
</code></pre>
<p>So we are working out a term that is used to mix our prediction and the correction. This is called the Kalman gain <code class="prettyprint">G</code>, which we then use to refine our estimate.</p>
<pre><code class="prettyprint">G = Pp * H.T * np.linalg.inv(Ic)
Xk = Xp + G * Ir
</code></pre>
<p>And we keep doing that over and over. There is a great illustration of the derivation <a href="http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/">in pictures</a>. The algebra isn’t too bad, but there are a lot of variables. If we code up these equations we get this little guy:</p>
<pre><code class="prettyprint lang-python">class KF:
    def __init__(self, A, B, H, Xk, P, Q, R):
        self.A = A   # State transition
        self.B = B   # Control
        self.H = H   # Observation
        self.Xk = Xk # Initial state
        self.Pk = P  # Initial covariance
        self.Q = Q   # Estimated process error
        self.R = R   # Estimated observation error

    def get_state(self):
        return np.array([self.Xk.flat[i] for i in xrange(4)])

    def update(self, Bk, zk):
        # Predict
        Xp = self.A * self.Xk + self.B * Bk
        Pp = self.A * self.Pk * self.A.T + self.Q
        # Correct
        Ir = zk - self.H * Xp
        Ic = self.H * Pp * self.H.T + self.R
        G = Pp * self.H.T * np.linalg.inv(Ic)
        self.Xk = Xp + G * Ir
        size = self.Pk.shape[0]
        self.Pk = (np.eye(size) - G * self.H) * Pp
</code></pre>
<p>So how do we use this? We need to model the linear dynamical system too. Let’s go back to the cricket ball flying through the air. We need a way to represent its state, iterate the system, and make both noisy and noise-free observations. Here’s a quick and dirty implementation:</p>
<pre><code class="prettyprint lang-python">class LDS(object):
    def __init__(self):
        angle = 70
        velocity = 100
        noise = 30
        g = -9.81
        self.noise = noise
        self.gravity = np.array([0.0, g])
        vx = velocity * np.cos(np.radians(angle))
        vy = velocity * np.sin(np.radians(angle))
        self.velocity = np.array([vx, vy])
        self.location = np.array([0.0, 0.0])
        self.acceleration = np.array([0.0, 0.0])

    def get_location(self):
        return self.location

    def observe_location(self):
        return np.random.normal(self.location, self.noise)

    def observe_velocity(self):
        return np.random.normal(self.velocity, self.noise)

    def integrate(self, dt):
        self.velocity = self.velocity + self.gravity * dt
        self.location = self.location + self.velocity * dt
</code></pre>
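<p>Putting the two together: here is a standalone sketch of the run loop (using plain NumPy arrays and <code class="prettyprint">@</code> rather than the overloaded <code class="prettyprint">*</code> above; the matrices, noise levels, and launch state are my own assumptions):</p>
<pre><code class="prettyprint lang-python">import numpy as np

np.random.seed(0)
dt, g = 0.1, -9.81

# State is [x, y, vx, vy]; we only observe the noisy position.
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
B = np.array([0.0, 0.0, 0.0, dt])   # gravity enters through vy
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
Q = np.eye(4) * 0.1                 # assumed process noise
R = np.eye(2) * 30.0 ** 2           # observation noise (30 m std)

Xk = np.array([0.0, 0.0, 34.2, 94.0])  # rough launch state
Pk = np.eye(4) * 1000.0
truth = Xk.copy()

for step in range(100):
    # Advance the true ball and take a noisy position measurement
    truth = A @ truth + B * g
    zk = np.random.normal(truth[:2], 30.0)
    # Predict
    Xp = A @ Xk + B * g
    Pp = A @ Pk @ A.T + Q
    # Correct
    G = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + R)
    Xk = Xp + G @ (zk - H @ Xp)
    Pk = (np.eye(4) - G @ H) @ Pp
</code></pre>
<p>After a hundred steps the filtered position should sit much closer to the truth than any single noisy measurement does.</p>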
<p>Now we can run our <code class="prettyprint">LDS</code> and pump the noisy observations into the <code class="prettyprint">KF</code> and plot where we estimate the cricket ball to be. This is what is happening when sports use <a href="https://en.wikipedia.org/wiki/Hawk-Eye">Hawk-Eye</a> technology. Cameras and RADAR make measurements of where the object (tennis or cricket ball) is and a Kalman filter is used to get more accurate positions and velocities of the object. Before we actually run it, let’s revisit why we don’t just use the average of the noisy estimates, or some kind of moving average. In certain cases they are the same thing, but when the underlying signal is changing, a moving average assumes that the more recent measurements are more important, so we lose information. This can cause havoc: if we try to track the location of the cricket ball using a moving average we get the following:</p>
<p><a href="https://svbtleusercontent.com/qdoslxjoudqqoa.png"><img src="https://svbtleusercontent.com/qdoslxjoudqqoa_small.png" alt="Screenshot 2016-01-30 12.28.34.png"></a></p>
<p>The black dotted line represents the actual trajectory of the ball, the red crosses are the noisy observations and the green is our estimate of the trajectory. Obviously the ball doesn’t do this, but the moving average thinks it does. If we drop in the Kalman filter instead:</p>
<p><a href="https://svbtleusercontent.com/aqoyigwibpil1a.png"><img src="https://svbtleusercontent.com/aqoyigwibpil1a_small.png" alt="Screenshot 2016-01-30 12.29.07.png"></a></p>
<p>Real nice. We can see that the Kalman filter nicely tracks the position of the cricket ball where we would otherwise have just had the red crosses. Now let’s really put it to the test. In air traffic control you have a RADAR dish measuring where things are approximately. These usually appear as blips on a screen which you see in every action movie ever. You know the one.</p>
<p><a href="https://svbtleusercontent.com/fmjbpknai5ogga.jpg"><img src="https://svbtleusercontent.com/fmjbpknai5ogga_small.jpg" alt="Radar-screen-and-world-ma-008.jpg"></a></p>
<p>So let’s try and implement a full air traffic control system that is presented with very noisy locations of an unknown number of objects and attempts to concurrently track them and understand when they are in danger of colliding. Firstly here is what things would look like without the Kalman filter. Just a bunch of dots wildly appearing:</p>
<p><a href="https://svbtleusercontent.com/rogpybmq39budq.gif"><img src="https://svbtleusercontent.com/rogpybmq39budq_small.gif" alt="animation.gif"></a></p>
<p>Now let’s say the first time we see a dot appear we assign a Kalman Filter to track it, and subsequent dots are treated as further observations of the same object. When a new dot appears how do we know whether it’s another observation of an existing object or a new one? The Kalman Filter doesn’t just give us its best estimate of the state, it also gives us a measure of its certainty in the form of a covariance matrix (<code class="prettyprint">Pk</code>). So when a new dot appears we can assess how likely that observation is to have come from any of the existing trackers. If it’s outside three standard deviations we assume that it’s a new sequence and assign it its own Kalman filter. If we run this code on the RADAR observations a lot of structure emerges. </p>
<p><a href="https://svbtleusercontent.com/xbbahaomwfl97q.gif"><img src="https://svbtleusercontent.com/xbbahaomwfl97q_small.gif" alt="animation.gif"></a></p>
<p>Again the red crosses represent the same observations, but now we can see the estimated locations of the aircraft, how many there are, and which direction they are traveling. We can use this information for deconfliction by looking for aircraft within a certain radius of each other (nearby) and with a negative relative velocity (heading towards each other) and ring them in orange.</p>
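<p>The “outside three standard deviations” test can be made concrete with a Mahalanobis distance, which scales the innovation by the tracker’s innovation covariance (a sketch; the helper name is my own):</p>
<pre><code class="prettyprint lang-python">import numpy as np

def mahalanobis(z, x_pred, H, P, R):
    # Distance of observation z from the predicted measurement H * x_pred,
    # measured in standard deviations of the innovation covariance S.
    innovation = z - H.dot(x_pred)
    S = H.dot(P).dot(H.T) + R
    return np.sqrt(innovation.dot(np.linalg.inv(S)).dot(innovation))
</code></pre>
<p>A new dot is assigned to the tracker with the smallest distance, unless every distance exceeds <code class="prettyprint">3</code>, in which case it seeds a new Kalman filter.</p>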
<p>So there we have it. Kalman Filters are fast and powerful state estimators that handle noise particularly well. But what if the underlying system is not linear? That’s where the Extended Kalman Filter (EKF) comes in. This is commonly used to fuse IMU and GPS data for more accurate position estimates. The EKF works in a similar way to the Kalman filter while linearizing around the current estimate. This means that we use a first-order approximation of the dynamics, which equates to using the Jacobian of the state transition function in place of the state transition matrix <code class="prettyprint">A</code> (and the Jacobian of the observation function in place of <code class="prettyprint">H</code>). If the dynamics are linear then the Jacobians are constant and we get back the normal Kalman Filter. I’m not too familiar with the details of the EKF beyond this so maybe this is a topic to re-visit in the future.</p>
tag:nickp.svbtle.com,2014:Post/star-cameras2016-01-21T19:18:48-08:002016-01-21T19:18:48-08:00Implementing a Star Camera<p>The orbit of a satellite can be computed from its location and velocity. What is more challenging is accurately determining its orientation. This is really important because satellites normally need to orient themselves to maximize the sunlight on their solar arrays and also point cameras and other sensors in specific directions. There isn’t a lot to navigate with in space except the billions and billions of stars, which is what a <a href="https://en.wikipedia.org/wiki/Star_tracker">star camera</a> uses.</p>
<p>I thought it would be fun to try and implement a star camera. Let’s say we are given a picture of the night sky and need to determine which direction the camera is pointing in. To do this we need to compute a <code class="prettyprint">direction</code> vector and also an <code class="prettyprint">up</code> vector orthogonal to it so we know what the orientation of the camera is. For example, here is an image taken of Sirius, the brightest star in the sky. The un-cropped image is <code class="prettyprint">1000</code> pixels square and the field of view is <code class="prettyprint">20</code> degrees.</p>
<p><a href="https://svbtleusercontent.com/ytchjme4yfbw7g.png"><img src="https://svbtleusercontent.com/ytchjme4yfbw7g_small.png" alt="strip.png"></a></p>
<p>To work out which direction our camera is pointing we also need some kind of map of the stars. One such map was generated from the <a href="https://en.wikipedia.org/wiki/Hipparcos">Hipparcos satellite</a> which was launched in 1989. It was the first space experiment devoted to precision astrometry. Hipparcos spent the next 4 years creeping around the solar system looking at things, and the resulting Hipparcos Catalogue - a high-precision catalogue of more than 118,200 stars - was published in 1997. It never reached the New York Times Best Seller list but it’s still a great read. If we look at the Hipparcos Catalogue we see that each star has four numbers associated with it. Here’s an extract:</p>
<pre><code class="prettyprint">108644 5.7622335593 -0.0478353703 9.5057
108647 5.7623641166 -0.2399106245 7.6062
108649 5.7624094330 -0.2472136702 7.8371
108651 5.7624495466 -0.8732797806 9.4559
108656 5.7626566242 -0.4066098406 7.9346
108658 5.7627242328 -0.5848303489 8.8601
</code></pre>
<p>The first number is an <code class="prettyprint">id</code> of the star. The second and third numbers are the <code class="prettyprint">right ascension</code> and <code class="prettyprint">declination</code> of the star. These can be thought of as a projection of the Earth-bound longitude and latitude system onto the sky, respectively. The fourth number is the <a href="https://en.wikipedia.org/wiki/Apparent_magnitude">apparent magnitude</a> of the star, which is how shiny it is. It’s a reverse logarithmic measure of brightness - the smaller the number, the brighter the star. The first thing we need to do is pick out the brightest stars in the image. We can use a blob detector for this:</p>
<pre><code class="prettyprint lang-python">import cv2
def process_image(filename):
    sky = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
    # threshold the image to black and white
    ret, sky = cv2.threshold(sky, 127, 255, cv2.THRESH_BINARY)
    # invert so the detector sees the stars as dark blobs
    sky = 255 - sky
    params = cv2.SimpleBlobDetector_Params()
    # detection thresholds
    params.filterByArea = True
    params.minArea = 10
    # Set up the detector with parameters.
    detector = cv2.SimpleBlobDetector(params)
    # Detect blobs.
    keypoints = detector.detect(sky)
    return keypoints
</code></pre>
<p>This works well and picks out the brightest stars in the image. </p>
<p><a href="https://svbtleusercontent.com/vulpjobhxyxvia.png"><img src="https://svbtleusercontent.com/vulpjobhxyxvia_small.png" alt="starsdots.png"></a></p>
<p>Next we need to convert the locations of the stars in the image to image coordinates. Since we know the size of the image and the field of view we can compute the focal length and convert the pixel coordinates to direction vectors:</p>
<pre><code class="prettyprint lang-python">import numpy as np
FOV = 20
SIZE = 1000
def get_dir(x, y):
    focal = 0.5 * SIZE / np.tan(0.5 * FOV * np.pi / 180.0)
    v = np.array([x - 499.5, 499.5 - y, -focal])
    v = v / np.linalg.norm(v)
    return v
</code></pre>
<p>From this we can compute the angle between any two stars in the image with: </p>
<pre><code class="prettyprint lang-python">def angle_between(v1, v2):
    v1 = v1 / np.linalg.norm(v1)
    v2 = v2 / np.linalg.norm(v2)
    dot = np.dot(v1, v2)
    dot = min(max(dot, -1.0), 1.0)
    return np.arccos(dot)
</code></pre>
<p>Now let’s turn to the catalogue of stars. We need to match these blobs to stars in the catalogue. We can convert right ascension and declination to a unit vector by converting from spherical to cartesian coordinates:</p>
<pre><code class="prettyprint lang-python">from numpy import cos, sin
def xyz(ra, dec):
    x = cos(ra) * cos(dec)
    y = sin(ra) * cos(dec)
    z = sin(dec)
    v = np.array([x, y, z])
    v = v / np.linalg.norm(v)
    return v
</code></pre>
<p>Now we need to find a set of stars in the image and a corresponding set of stars in the catalogue where the angles between any pair of stars are the same. This will tell us which stars we are looking at, and we can use their coordinates to reconstruct the <code class="prettyprint">direction</code> and <code class="prettyprint">up</code> vectors of where we are looking. We can do this by assigning a random star from the catalogue to the first blob. Then we keep assigning stars to blobs, but each time we ensure that the angle between any two stars in the set is almost equal to the angle between the corresponding blobs. We recursively assign stars, and as we add more the set becomes more constrained. I used <code class="prettyprint">6</code> stars in the matching and for the image at the top we match a number of stars near a region of sky called the <a href="https://en.wikipedia.org/wiki/Winter_Hexagon">Winter Hexagon</a>.</p>
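<p>A hedged sketch of that recursive assignment (the tolerance and function names are my own, and a real implementation would index the catalogue by angle rather than trying every star):</p>
<pre><code class="prettyprint lang-python">import numpy as np

TOL = 1e-3  # assumed angular tolerance in radians

def angle(v1, v2):
    d = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(min(max(d, -1.0), 1.0))

def match(blob_dirs, star_dirs, assignment=None):
    # Recursively assign a catalogue star to each blob so that every
    # pairwise angle agrees with the corresponding blob angle.
    if assignment is None:
        assignment = []
    i = len(assignment)
    if i == len(blob_dirs):
        return assignment
    for s in star_dirs:
        if any(np.allclose(s, t) for t in assignment):
            continue  # star already used
        consistent = all(
            abs(angle(blob_dirs[i], blob_dirs[j]) - angle(s, assignment[j])) < TOL
            for j in range(i))
        if consistent:
            result = match(blob_dirs, star_dirs, assignment + [s])
            if result is not None:
                return result
    return None
</code></pre>
<p>As more stars are assigned the pairwise constraints multiply, so wrong branches die off quickly.</p>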
<p><a href="https://svbtleusercontent.com/86pxg7svy8nz6a.png"><img src="https://svbtleusercontent.com/86pxg7svy8nz6a_small.png" alt="x.png"></a></p>
<p>Using this we can compute the transform from image to star coordinates:</p>
<pre><code class="prettyprint lang-python">def find_transform(stars, blobs):
    v1, v2, v3 = stars[0], stars[1], stars[2]
    k1, k2, k3 = blobs[0], blobs[1], blobs[2]
    star_directions = np.vstack([v1, v2, v3]).T
    image_directions = np.vstack([k1, k2, k3]).T
    inv_star_directions = np.linalg.inv(star_directions)
    transform = image_directions.dot(inv_star_directions)
    direction = -transform[2, :]
    up = transform[1, :]
    return transform, direction, up
</code></pre>
<p>Which gives us the correct <code class="prettyprint">direction</code> and <code class="prettyprint">up</code> vector pointing towards Sirius.</p>
<pre><code class="prettyprint">-0.18748276 0.93921052 -0.28763485
-0.05714072 0.28195542 0.95772443
</code></pre>
<p>Let’s try another image. This one is of the North sky.</p>
<p><a href="https://svbtleusercontent.com/c3khxzmkbfq8uw.png"><img src="https://svbtleusercontent.com/c3khxzmkbfq8uw_small.png" alt="xx.png"></a></p>
<p>Running the same code, out pops the correct vectors pointing North.</p>
<pre><code class="prettyprint">0 0 1
1 0 0
</code></pre>
<p>I’m sure the implementations used in satellite and ship navigation systems are more involved, but this was fun to implement. Here is a link to the <a href="https://gist.github.com/nickponline/88fbae05c8c943c3ad09">complete code</a>. Interestingly, Hipparcos’s successor, Gaia, was launched on 19 December 2013 with the intention of constructing a 3D space catalogue of approximately 1 billion astronomical objects. The first catalogue of stars will be released later this year. </p>
<p><a href="https://svbtleusercontent.com/mihbvvfpqfbata.png"><img src="https://svbtleusercontent.com/mihbvvfpqfbata_small.png" alt="nova.png"></a></p>
tag:nickp.svbtle.com,2014:Post/determining-location-using-sound-part-22016-01-16T12:49:52-08:002016-01-16T12:49:52-08:00Location Estimation - Part 2<p>In the <a href="http://nickp.svbtle.com/determining-location-using-sound">previous article</a> we attempted to solve this <a href="http://ch24.org/static/archive/2012/2012_ec.pdf">location problem</a>. We used the power of each of a number of signals to roughly estimate our location. We managed to narrow it down to a smallish convex region in the plane, shown in green below, which is about <code class="prettyprint">2500</code> square meters near <code class="prettyprint">(-181, -75)</code>. Now we want to go even further and see how accurately we can determine the exact position. </p>
<p><a href="https://svbtleusercontent.com/aj7atr1zs1yqvw.png"><img src="https://svbtleusercontent.com/aj7atr1zs1yqvw_small.png" alt="Screenshot 2016-01-16 13.09.30.png"></a></p>
<p>If we take the centroid of this green region as a guess at where we are, we then know approximately how far away from each tower we are. We can use this distance along with the speed of sound to work out how long each individual frequency in our received signal is delayed. For example if we are <code class="prettyprint">340.29</code> meters away from a tower emitting a <code class="prettyprint">3500Hz</code> signal, then each sample we received was actually emitted about <code class="prettyprint">1</code> second earlier, so we should phase shift the entire signal <code class="prettyprint">1</code> second back in time to be in the same time frame as the source.</p>
<pre><code class="prettyprint lang-python">import math
speed_of_sound = 340.29
centroid_x = -181.666
centroid_y = -75.373
for tower_x, tower_y in towers:
    dx = centroid_x - tower_x
    dy = centroid_y - tower_y
    distance = math.hypot(dx, dy)
    phase_shift = distance / speed_of_sound
    peaks = find_peaks(times - phase_shift, powers)
</code></pre>
<p>Here is the phase shift of the <code class="prettyprint">350.0Hz</code> tower, which is approximately <code class="prettyprint">196.682</code> meters from our location. This means that the source signal is phase shifted <code class="prettyprint">-0.578</code> seconds, and moves the time index negative because the tower was emitting sound before we actually heard anything. The blue peaks show the <code class="prettyprint">350Hz</code> signal we receive, whereas the black plot shows the <code class="prettyprint">350Hz</code> signal that was emitted from the source on the same time axis. </p>
<p><a href="https://svbtleusercontent.com/ct9mklzgvcz6ug.png"><img src="https://svbtleusercontent.com/ct9mklzgvcz6ug_small.png" alt="Screenshot 2016-01-16 13.19.59.png"></a></p>
<p>Here are a few more towers; the further away the tower, the more we have to shift the signal:</p>
<p><a href="https://svbtleusercontent.com/w2jayrly6w6qia.png"><img src="https://svbtleusercontent.com/w2jayrly6w6qia_small.png" alt="Screenshot 2016-01-16 13.33.01.png"></a></p>
<p>All the towers are now on the same time axis, but we still don’t know the initial rotation of the towers, nor our exact position. Let’s pick a random initial rotation of the towers and plot what the scene would look like if it were true: what direction would all the towers be pointing in, given that we know the phase shift? </p>
<p><a href="https://svbtleusercontent.com/y611i0n9purmga.png"><img src="https://svbtleusercontent.com/y611i0n9purmga_small.png" alt="Screenshot 2016-01-16 13.45.45.png"></a></p>
<p>This is a disaster. The green dots are the towers and from each tower a number of rays are emitted, one ray for each peak of the received signal for that frequency. Unfortunately the rays all go off in different directions, which means that the initial guess at the rotation of the towers was wrong. Let’s try another random rotation:</p>
<p><a href="https://svbtleusercontent.com/0kcre1dubhyuwg.png"><img src="https://svbtleusercontent.com/0kcre1dubhyuwg_small.png" alt="Screenshot 2016-01-16 13.46.13.png"></a></p>
<p>Looks a bit better but still not obvious. Ideally we want all the rays to intersect at one location; this would be the maximum likelihood estimate of the location given the received data. Unfortunately this is not going to happen, because we are dealing with real world data and there are various sources of noise in this system. For example the rays from a tower don’t go in exactly the same direction because of the imprecision of measuring the times. We have also approximated our initial location, which means the phase shift is slightly off. We need a way to quantify <em>how closely the rays all point to the same location</em>. One way to do this is to assume that all the lines <em>should</em> intersect at a specific point and measure the norm of the residuals of the least-squares solution of this linear system. Here’s an example with three lines:</p>
<pre><code class="prettyprint lang-python">import numpy as np
points = [
    (-1, -1, 1, 1),
    (1, -1, 0, 1),
    (1, 0, -1, 1)
]
A = []
b = []
for x1, y1, x2, y2 in points:
    m = float(y2 - y1) / (x2 - x1)  # check for vertical lines (div by zero)
    c = y1 - (m * x1)
    A.append([-m, 1])
    b.append(c)
A = np.array(A)
b = np.array(b)
x_sol = np.linalg.lstsq(A, b)[0]
residuals = b - A.dot(x_sol)
R = np.linalg.norm(residuals)
print 'Error=', R
</code></pre>
<p>For example if we have three lines that do intersect, the norm of the residuals will be zero, whereas if they don’t the error will increase:</p>
<p><a href="https://svbtleusercontent.com/z3m2z3dilh7urg.png"><img src="https://svbtleusercontent.com/z3m2z3dilh7urg_small.png" alt="Screenshot 2016-01-16 14.03.27.png"></a> </p>
<p>We can use this error value as an objective function and plug our initial guess at the position as well as the range of possible tower orientations <code class="prettyprint">-np.pi</code> to <code class="prettyprint">np.pi</code> into a <a href="https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm">BFGS non-linear solver</a> and we should see our unknown position refine to something that agrees with all the towers.</p>
<pre><code class="prettyprint lang-python">from scipy.optimize import minimize
initial_estimate = [0.0, -181, -75]
result = minimize(objective_func, initial_estimate, method='BFGS')
</code></pre>
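<p>The <code class="prettyprint">objective_func</code> itself isn’t shown above; here is a simplified sketch of the idea. The variable names and the tower angular speed <code class="prettyprint">omega</code> are my own assumptions, and in practice the tower data would be closed over or passed via <code class="prettyprint">args=</code>. It measures how far a candidate position lies off the beam rays implied by a candidate rotation:</p>
<pre><code class="prettyprint lang-python">import numpy as np

omega = 2 * np.pi / 4.0  # assumed: one tower revolution every 4 seconds

def objective_func(params, towers, peak_times):
    theta0, x, y = params
    total = 0.0
    for (tx, ty), t in zip(towers, peak_times):
        beam = theta0 + omega * t                 # beam angle at this peak
        d = np.array([np.cos(beam), np.sin(beam)])
        r = np.array([x - tx, y - ty])            # tower -> candidate position
        perp = r - r.dot(d) * d                   # residual off the beam ray
        total += perp.dot(perp)
    return total
</code></pre>
<p>At the true rotation and position every ray passes through the point and the objective is zero; noise makes it merely small, which is what BFGS minimizes.</p>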
<p>And away we go:</p>
<p><a href="https://svbtleusercontent.com/rl9gtzamgaukxw.gif"><img src="https://svbtleusercontent.com/rl9gtzamgaukxw_small.gif" alt="animation.gif"></a></p>
<p>Awesome. Out pops our most likely position of <code class="prettyprint">-179.2, -95.6</code>, which turns out to be within about <code class="prettyprint">30</code> centimeters of the actual location at <code class="prettyprint">-179.4, -95.3</code>. Not too bad. <a href="https://www.dropbox.com/sh/dritd95i4s1s1fc/AABqSVEVvgTs7N36B_xBGnUea?dl=0">Here is the data</a> if anyone wants to play around with it. It’s interesting to consider the implications of being able to work backwards from a known position to determine the location and power of a missing tower. It’s pretty much the same problem. This is interesting if you’re thinking about RADAR - for example the missing Malaysia Airlines flight MH370 that was never found.</p>
<p><a href="https://svbtleusercontent.com/b9up3ogmzhnmia.gif"><img src="https://svbtleusercontent.com/b9up3ogmzhnmia_small.gif" alt="mal.gif"></a></p>
<p>Let’s say there was someone who could have located the aircraft, or at least given an estimate of where it was. By doing so they would also possibly have revealed the capabilities and/or locations of their own RADAR systems, which might not have been something they wanted. Hopefully this wasn’t the case but it’s something to think about.</p>