# Using A Heap Sort To Find The Closest Point

There’s a common CS question that can crop up in coding interviews from time to time; famously used by Amazon, the *‘Find the closest set of points to K points’*. This problem is an interesting one and involves several separations of the problem to understand how it works.

If you’re like me, algorithms and data science may not be your primary passions, I know it’s one of my weaker areas (I’m a full stack web dev) currently which is why I am trying to investigate and learn more whilst documenting my experiences along the way. I have analysed this problem and broke it down into it’s constituent parts to hopefully show a logical approach to solving this problem.

**Disclaimer: **This tutorial will use JavaScript to demonstrate the code, however this can be easily transferred into your language of choice.

# Identifying The Approach

There are several ways that a problem such as this could be solved, I have opted to go with a distance based vector approach. Essentially, let’s say we have our target point at [0,0] and we have another point [7, 6]. It can be represented on a graph like so:

In order to calculate a value to represented the **distance** between these two points, we can use pythagoras theorem which follows the format:

In the context of this problem we are going to call point **[0,0]** point **A** and point **[6,7] **point **B**. The equation would follow the fotmat:

In-case this doesn’t make sense, here’s a quick explanation. We are looking for the different between the X coordinate of point **A** and the X coordinate of point **B** therefore we subtract the x-coordinate of point **A** from the X coordinate of point **B. **What a mouth full!

In JS, this can be represented with the following function which we will define at the top of our JS file. In the following function, ** set** refers to the target set, in this case

**[6,7]**and k refers to the reference point,

**[0,0]**:

Now all that is left to calculate our distance based values is to define some sample data and then iterate over the data, assigning each tuple of points a distance value. I have setup some sample data that you can use in your application, for the sake of simplicity, place this at the very top of your JS file:

`let data = [[-2,4], [0,-2], [-1,0], [3,-5], [-2,-3], [3,8]];`

We’re going to define a function called kPoints which is where our main application will run from

In this function, we create a new array called *disArr** *to hold the new data we are generating. This array will have the new data sets pushed onto it. With each iteration through the test data previously defined, we push a new JSON object onto the array with the attributes *points** and *** distance**. If you’re using Python or similar languages, this is perfectly okay to implement with a datatype of your choice, a dictionary or hash map potentially being the best path to take.

Executing the following function will return the following result;

[ { points: [ -2, 4 ], distance: 4.47213595499958 },{ points: [ 0, -2 ], distance: 2 },{ points: [ -1, 0 ], distance: 1 },{ points: [ 3, -5 ], distance: 5.830951894845301 },{ points: [ -2, -3 ], distance: 3.605551275463989 },{ points: [ 3, 8 ], distance: 8.54400374531753 } ]

As you can see, each point has been given a distance which represents how far it is away from our target point of **[0,0]**. You can play around by changing ** k** the target point to whatever you want and you will see the values change accordingly.

## Sorting The Data

It’s evident from our returned data that although we have got the distances for all points, we can see which one is the closest point (**[-1,0]), **however** **it’s not actually any use to the computer. Let’s assume we have 1,000 points to analyse, would you be able to look at a long screen of data and pick out the closest? I would assume not, unless you’re a wizard in which case congratulations, you probably don’t need to read on.

To sort the data, there are several approaches that could be taken and various algorithms that could be applied. I am going to use the **Heap Sort**** **approach** **because I personally feel that it is the best for the job (purely personal preference and opinion). This article won’t delve too deep into what a heap sort is and does, as there’s already plenty of resources out there; I watched this video to get a better understanding. We are going to use binary tree for this problem as shown in the video.

We are going to define 4 new functions for our heap sort algorithm (in no particular order):

- heapSort — main call to our heap sort algorithm
- heapify — Iterate half way through the dataset backwards (only targetting parent nodes)
- minHeapify — Evaluation of each node and it’s children to determine whether the node is in order
- swap — Performs the swap on the nodes when called

## swap()

Swap is the fundamental operation of our algorithm and follows a standard swap format:

## minHeapify()

Here we do our analysis on the points distance vector unit, it’s where we analyse each node of the binary tree to see which value is smallest. A while loop is used inside this function as we need to perform at least 2 checks on each node to evaluate whether all three nodes have been compared against each other. When the largest node of the three is in the parent position, the loop will *break*:

## heapSort()

The heapsort algorithm will return a promise, the reason for this is because JavaScript is asynchronous and it cannot be guaranteed how long the sorting operation will take, however it has a **Big O notation** of:

`O(`*n* log *n*)

The** heapSort()** function will be implemented like so:

## heapify()

You may of noticed in the *heapSort()** *function that we first call heapify and then subsequently call ** minHeapify()** again, why is this? Well, the call to

**calls the following function:**

*heapify*This will then call ** minHeapify()**, but why? Look at the loop in the code above, the counter

**is initialized half way through the array (index 2 because arrays start at 0). This takes some understanding of tree logic and how they work, but essentially we can deduce that in a binary tree, if we do:**

*i*`(data.length / 2)`

it will return the last parent node in the tree, if we then subsequently step back through all indexes towards the start of the array, these items will also be parents. In order to access a parents associated childrem, the following formula can be followed:

`first child: index * 2 + 1`

second child: index * 2 + 2

parent node: index

Double the current index, and add 1 for the left and 2 for the right. With a basic understanding of tree logic this should now make a little more sense. When we call ** heapify** it is going to step back through all of the parent nodes ensuring that all are in order. Once this is done, we need to run through the tree again to make sure the parents are still correct as we may of switched some nodes around, this is why we loop through all nodes and call

**again in the**

*minHeapify***algorithm:**

*heapSort*## Finishing Our Program

Now that we have our heap sort algorithm in place and ready to use, the last thing is to finalize the ** kPoints **function like so:

The result code should now look like so:

Hope you enjoyed this tutorial and if you have any questions, be sure to leave them on this article and I can get back to you.