
Proposal: SimplePriorityQueue overloads that expose priority when trivial, for performance and thread safety #54

@JohannesMP

Description


Overview

When working with priority queues it's often useful to be able to access the priority of the head node as you dequeue it. This is especially common in many pathfinding and graph traversal algorithms.
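
As a motivating sketch (the toy graph, distances and vertex ids below are hypothetical and only exist to show the pattern; SimplePriorityQueue is the only type taken from this library), a Dijkstra-style loop wants the dequeued item's priority back, because that priority is the vertex's settled distance:

using System;
using System.Collections.Generic;
using Priority_Queue;

// Hypothetical toy graph: adjacency list of (neighbor, edge cost) pairs.
var graph = new Dictionary<int, List<(int, float)>>
{
    [0] = new List<(int, float)> { (1, 1f), (2, 4f) },
    [1] = new List<(int, float)> { (2, 2f) },
    [2] = new List<(int, float)>(),
};

var dist = new Dictionary<int, float> { [0] = 0f };
var frontier = new SimplePriorityQueue<int, float>();
frontier.Enqueue(0, 0f);

while (frontier.Count > 0)
{
    // Today this takes two extra calls (First + GetPriority) just to learn the
    // priority we are about to dequeue; the overloads proposed below would
    // hand 'distance' back directly from Dequeue.
    float distance = frontier.GetPriority(frontier.First);
    int current = frontier.Dequeue();

    if (distance > dist[current]) { continue; } // stale entry, already settled via a shorter path

    foreach (var (to, cost) in graph[current])
    {
        float candidate = distance + cost;
        if (!dist.TryGetValue(to, out float best) || candidate < best)
        {
            dist[to] = candidate;
            frontier.Enqueue(to, candidate);
        }
    }
}

Console.WriteLine(string.Join(", ", dist));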

When using GenericPriorityQueue or FastPriorityQueue you manipulate the nodes directly and so have access to their respective Priority properties.

When using SimplePriorityQueue the nodes are not publicly exposed and so it is currently necessary to first get the priority of the head element, and only then dequeue it.

A naïve user implementation might look like this:

public static class SimplePriorityQueueExtensions
{
    public static TItem Dequeue<TItem, TPriority>(
        this SimplePriorityQueue<TItem, TPriority> queue,
        out TPriority priority)
    {
        priority = queue.GetPriority(queue.First);
        return queue.Dequeue();
    }
    
    public static bool TryDequeue<TItem, TPriority>(
        this SimplePriorityQueue<TItem, TPriority> queue,
        out TItem first, 
        out TPriority priority)
    {
        // Can't use the existing TryDequeue here: once it returns, the head node (and its priority) is already gone :(
        if (!queue.TryFirst(out first))
        {
            priority = default;
            return false;
        }
        
        priority = queue.GetPriority(first);
        queue.Dequeue(); // Already have 'first' assigned, so no need to store it again.
        return true;
    }
}

This achieves the desired result using SimplePriorityQueue.GetPriority (added as a result of related issue #27), but the implementation is suboptimal (even if, for a moment, we ignore the fact that it is not thread-safe).

The Problem

In its current implementation SimplePriorityQueue.Dequeue() retrieves an internal SimpleNode, which contains both the Data and the Priority properties we need, but only returns the data to the caller:

public TItem Dequeue()
{
    lock(_queue)
    {
        if(_queue.Count <= 0)
        {
            throw new InvalidOperationException("Cannot call Dequeue() on an empty queue");
        }

        SimpleNode node = _queue.Dequeue();
        RemoveFromNodeCache(node);
        return node.Data;
    }
}

So in the process of performing Dequeue/TryDequeue we already have access to the priority value of the head node that we want! It's just not exposed to the caller.

The caller instead has to make an additional GetPriority call, which in the general case means an _itemToNodesCache lookup (GetPriority is meant to work for any enqueued item), a lookup that could be avoided altogether.
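
For context, GetPriority is roughly of this shape (a paraphrase for illustration, not a quote of the actual source; the helper name is only a stand-in for whatever the implementation uses):

// Rough paraphrase of SimplePriorityQueue.GetPriority (illustrative only):
public TPriority GetPriority(TItem item)
{
    lock (_queue)
    {
        // Looking the node up via _itemToNodesCache costs a hash of 'item',
        // even though Dequeue() is about to pop that very node off the heap.
        SimpleNode node = GetExistingNode(item);   // helper name is hypothetical
        if (node == null)
        {
            throw new InvalidOperationException("Cannot call GetPriority() on an item that is not enqueued");
        }
        return node.Priority;
    }
}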

The Proposal

It would be trivial to add Dequeue and TryDequeue overloads to SimplePriorityQueue that each take an out TPriority parameter, since we get that value for free in both of the existing implementations.

For example, the overload for Dequeue could be:

/// <summary>
/// Removes the head of the queue (node with minimum priority; ties are broken by order of insertion), 
/// and returns it along with its priority.
/// If queue is empty, throws an exception
/// O(log n)
/// </summary>
public TItem Dequeue(out TPriority priority)
{
    lock(_queue)
    {
        if(_queue.Count <= 0)
        {
            throw new InvalidOperationException("Cannot call Dequeue() on an empty queue");
        }

        SimpleNode node = _queue.Dequeue();
        priority = node.Priority; // <---- added
        RemoveFromNodeCache(node);
        return node.Data;
    }
}

And the overload for TryDequeue could be:

/// <summary>
/// Removes the head of the queue (node with minimum priority; ties are broken by order of insertion), 
/// and sets it to first along with its priority.
/// Useful for multi-threading, where the queue may become empty between calls to Contains() and Dequeue()
/// or between GetPriority() and TryDequeue()
/// Returns true if successful; false if queue was empty
/// O(log n)
/// </summary>
public bool TryDequeue(out TItem first, out TPriority priority)
{
    if (_queue.Count > 0)
    {
        lock (_queue)
        {
            if (_queue.Count > 0)
            {
                SimpleNode node = _queue.Dequeue();
                first = node.Data;
                priority = node.Priority; // <---- added
                RemoveFromNodeCache(node);
                return true;
            }
        }
    }
    
    first = default(TItem);
    priority = default(TPriority); // <---- added
    return false;
}

They are identical to the existing implementations, except for assigning the priority out parameter. There should be no performance overhead other than one additional assignment, which is trivial (and would still happen with (Try)GetPriority anyway).

This would even have the additional benefit of being thread-safe, unlike the naïve user implementation provided as an example at the start.
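
Usage would then look like this (a sketch, assuming the overloads are added with the signatures above):

using System;
using Priority_Queue;

var queue = new SimplePriorityQueue<string, int>();
queue.Enqueue("walk dog", 2);
queue.Enqueue("file taxes", 1);

// Proposed overload: item and priority come back together, under a single
// acquisition of the queue's internal lock.
string head = queue.Dequeue(out int headPriority);
Console.WriteLine($"{head} (priority {headPriority})");

// Proposed TryDequeue overload: same guarantee, without throwing on empty.
while (queue.TryDequeue(out string item, out int priority))
{
    Console.WriteLine($"{item} (priority {priority})");
}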

If it were deemed necessary to keep the method names unique rather than overload them, they could be named DequeueWithPriority and TryDequeueWithPriority respectively, or something comparable. Naming things: second-hardest problem in computer science, and all that...

Keeping things 'Simple'

The argument could be made that if you care enough about optimizing Dequeue with priority you should just use GenericPriorityQueue or FastPriorityQueue, and that the public signature of SimplePriorityQueue should be kept, well, 'simple'.

That is a valid position, but given how trivial this would be to add, it seems like a missed opportunity, especially since it saves an extra hash lookup on an operation that, as I understand it, is very common in some of the main applications of priority queues.

More importantly, SimplePriorityQueue promises to be thread-safe, and requiring the user to make three calls means the operation as a whole can no longer be guaranteed to be thread-safe: it is now up to the user to correctly lock access to the container for the duration of their First (or TryFirst), GetPriority and Dequeue calls. Yuck.
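
To illustrate, here is roughly what the user is forced to write today to make the three-call sequence atomic (a sketch; the gate object, class and method names are hypothetical, and every other thread that touches the queue must honour the same gate or the guarantee evaporates):

using Priority_Queue;

public static class LockedQueueAccess
{
    // Hypothetical external gate; all producers and consumers of 'queue'
    // must lock on this same object for this to actually be safe.
    private static readonly object _gate = new object();

    public static bool TryDequeueWithPriorityLocked(
        SimplePriorityQueue<string, int> queue, out string item, out int priority)
    {
        lock (_gate)
        {
            if (!queue.TryFirst(out item))
            {
                priority = default;
                return false;
            }
            priority = queue.GetPriority(item);
            queue.Dequeue();
            return true;
        }
    }
}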

It just seems like the equivalent of a SimpleDictionary implementation that promises to be "thread-safe", but only provides Contains and GetValue and no TryGetValue:

// Needs to hash key twice
// Cannot guarantee thread-safe value access
if(SimpleDictionary.Contains(key))
{
    Value value = SimpleDictionary.GetValue(key);
    // Use value here ...
}


// Needs to hash key once
// Can guarantee thread-safe value access
if(SimpleDictionary.TryGetValue(key, out Value value))
{
    // Use value here ...
}

The primary purpose of this library is performance (it's in the name after all), so I'd like to make the argument that even its 'simple' implementation would benefit from this addition, while (especially if we use unique names) remaining backwards compatible.

The proposed addition would make a common access pattern safer (one less place the user has to worry about thread safety) and faster, at no extra cost.

Relevant issues

  • Retrieve priority in Peek/Dequeue #27: The first discussion of this issue, which resulted in the addition of SimplePriorityQueue.GetPriority. As outlined above, that is sub-optimal in this specific case, where we already have the node and so don't need a hash lookup to retrieve it. It also requires the caller to manage their own thread safety, which the proposed change throws in for free.
