Using A Heap Sort To Find The Closest Point

There’s a common CS question that can crop up in coding interviews from time to time; famously used by Amazon, the ‘Find the closest set of points to K points’. This problem is an interesting one and involves several separations of the problem to understand how it works.

If you’re like me, algorithms and data science may not be your primary passions, I know it’s one of my weaker areas (I’m a full stack web dev) currently which is why I am trying to investigate and learn more whilst documenting my experiences along the way. I have analysed this problem and broke it down into it’s constituent parts to hopefully show a logical approach to solving this problem.

Disclaimer: This tutorial will use JavaScript to demonstrate the code, however this can be easily transferred into your language of choice.

Identifying The Approach

Graph showing [0,0] and [7, 6]

In order to calculate a value to represented the distance between these two points, we can use pythagoras theorem which follows the format:

Pythagoras Equation For Distance Vector

In the context of this problem we are going to call point [0,0] point A and point [6,7] point B. The equation would follow the fotmat:

In-case this doesn’t make sense, here’s a quick explanation. We are looking for the different between the X coordinate of point A and the X coordinate of point B therefore we subtract the x-coordinate of point A from the X coordinate of point B. What a mouth full!

In JS, this can be represented with the following function which we will define at the top of our JS file. In the following function, set refers to the target set, in this case [6,7] and k refers to the reference point, [0,0]:

Now all that is left to calculate our distance based values is to define some sample data and then iterate over the data, assigning each tuple of points a distance value. I have setup some sample data that you can use in your application, for the sake of simplicity, place this at the very top of your JS file:

let data = [[-2,4], [0,-2], [-1,0], [3,-5], [-2,-3], [3,8]];

We’re going to define a function called kPoints which is where our main application will run from

In this function, we create a new array called disArr to hold the new data we are generating. This array will have the new data sets pushed onto it. With each iteration through the test data previously defined, we push a new JSON object onto the array with the attributes points and distance. If you’re using Python or similar languages, this is perfectly okay to implement with a datatype of your choice, a dictionary or hash map potentially being the best path to take.

Executing the following function will return the following result;

[ { points: [ -2, 4 ], distance: 4.47213595499958 },{ points: [ 0, -2 ], distance: 2 },{ points: [ -1, 0 ], distance: 1 },{ points: [ 3, -5 ], distance: 5.830951894845301 },{ points: [ -2, -3 ], distance: 3.605551275463989 },{ points: [ 3, 8 ], distance: 8.54400374531753 } ]

As you can see, each point has been given a distance which represents how far it is away from our target point of [0,0]. You can play around by changing k the target point to whatever you want and you will see the values change accordingly.

Sorting The Data

To sort the data, there are several approaches that could be taken and various algorithms that could be applied. I am going to use the Heap Sort approach because I personally feel that it is the best for the job (purely personal preference and opinion). This article won’t delve too deep into what a heap sort is and does, as there’s already plenty of resources out there; I watched this video to get a better understanding. We are going to use binary tree for this problem as shown in the video.

We are going to define 4 new functions for our heap sort algorithm (in no particular order):

  • heapSort — main call to our heap sort algorithm
  • heapify — Iterate half way through the dataset backwards (only targetting parent nodes)
  • minHeapify — Evaluation of each node and it’s children to determine whether the node is in order
  • swap — Performs the swap on the nodes when called

swap()

minHeapify()

heapSort()

O(n log n)

The heapSort() function will be implemented like so:

heapify()

This will then call minHeapify(), but why? Look at the loop in the code above, the counter i is initialized half way through the array (index 2 because arrays start at 0). This takes some understanding of tree logic and how they work, but essentially we can deduce that in a binary tree, if we do:

(data.length / 2)

it will return the last parent node in the tree, if we then subsequently step back through all indexes towards the start of the array, these items will also be parents. In order to access a parents associated childrem, the following formula can be followed:

first child: index * 2 + 1
second child: index * 2 + 2
parent node: index

Double the current index, and add 1 for the left and 2 for the right. With a basic understanding of tree logic this should now make a little more sense. When we call heapify it is going to step back through all of the parent nodes ensuring that all are in order. Once this is done, we need to run through the tree again to make sure the parents are still correct as we may of switched some nodes around, this is why we loop through all nodes and call minHeapify again in the heapSort algorithm:

Finishing Our Program

The result code should now look like so:

Hope you enjoyed this tutorial and if you have any questions, be sure to leave them on this article and I can get back to you.

Co-Founder of Bel 💪🏼 | Product Manager 📄 | Developer 💻 | Product Design 🎨