Commit 9608ddb

Overhaul API (#5)

* The public constructor for the `KWayMerger` is now the new `kway_merge` function.
  `KWayMerger` is public, but unexported.
* Instead of the `F` parameter (and argument to its constructor), `kway_merge`
  uses the same ordering API as Base's sorting functions.
* `KWayMerger{T}` now iterates `@NamedTuple{from_iter::Int, value::T}`, to reduce
  the risk of users conflating the two elements of the tuple.
1 parent 0631c44 commit 9608ddb

7 files changed: +157 -117 lines changed

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -7,4 +7,15 @@ This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html
 ## UNRELEASED
 * Add content here that has been merged, but has not made it to a release yet.
 
+## [0.2.0]
+### Breaking changes
+* The public constructor for the `KWayMerger` is now the new `kway_merge` function.
+  `KWayMerger` is public, but unexported.
+* Instead of the `F` parameter (and argument to its constructor), `kway_merge`
+  uses the same ordering API as Base's sorting functions.
+* `KWayMerger{T}` now iterates `@NamedTuple{from_iter::Int, value::T}`, to reduce
+  the risk of users conflating the two elements of the tuple.
+
+
 ## [0.1.0]
+* Initial release
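
The changelog entry above says that `kway_merge` uses the same ordering API as Base's sorting functions. As a rough illustration of what that API does, the sketch below uses only `Base.Order` (nothing from KWayMerges itself) to show how the `lt`, `by`, `rev` and `order` keywords are folded into a single `Ordering` by `ord`, which is the call this commit uses in `src/KWayMerges.jl`. The variable names are made up for the example.

```julia
using Base.Order: Forward, ord, lt

# `ord` folds the four sorting keywords (lt, by, rev, order) into one `Ordering`.
default_order  = ord(isless, identity, false, Forward)
reversed_order = ord(isless, identity, true, Forward)
by_abs_order   = ord(isless, abs, false, Forward)

# `lt(ordering, a, b)` asks whether `a` sorts strictly before `b`.
println(lt(default_order, 1, 2))   # true
println(lt(reversed_order, 1, 2))  # false
println(lt(by_abs_order, -1, 2))   # true, because abs(-1) < abs(2)

# The same keywords are accepted by Base's sorting functions.
println(sort([3, -1, 2]; by = abs, rev = true))  # [3, 2, -1]
```

Code that previously passed a comparison function as the `F` argument would presumably pass it through the `lt` keyword instead, since that is the keyword `ord` receives in the new method.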

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 name = "KWayMerges"
 uuid = "f29e91c7-719d-4dbc-8870-0ce36bf055b7"
-version = "0.1.0"
+version = "0.2.0"
 authors = ["Jakob Nybo Nissen <[email protected]>"]
 
 [compat]

README.md

Lines changed: 23 additions & 17 deletions
@@ -6,56 +6,62 @@
 
 Implementation of k-way merge.
 
-This package implements the `KWayMerger` type.
-It is a stateful, lazy iterator of the elements in an iterator of iterators.
-The elements of the inner iterators will be yielded in an order given by a predicate optionally passed to `KWayMerger` (default: `isless`).
-Therefore, if the inner iterators are sorted by the predicate, the output of the `KWayMerger` is also guaranteed to be sorted.
+This package exports the function `kway_merge`.
+It constructs a `KWayMerger`, a stateful, lazy iterator of the elements in an iterator of iterators.
+The elements of the inner iterators will be yielded in order, as specified by the optional ordering (default: `Forward`).
+Therefore, if the inner iterators are sorted by the order, the yielded elements of the `KWayMerger` are also guaranteed to be sorted.
 
-The primary purpose of `KWayMerger` is to efficiently merge N sorted iterables into one sorted stream.
+The primary purpose of `kway_merge` is to efficiently merge N sorted iterables into one sorted stream.
 
-The iterator yields `(i::Int, x)` tuples, where `x` is the next element of one of the iterators, and `i` is the 1-based index of the iterator that yielded `x`:
+The iterator yields `@NamedTuple{from_iter::Int, value::T}`, where the `value` field holds the next element of one of the iterators, and the `from_iter` field contains the 1-based index of the iterator that yielded the value:
 
 ```julia
-julia> it = KWayMerger([[2, 3], [1, 4]]);
+julia> it = kway_merge([[2, 3], [1, 4]]);
 
 julia> first(it)
-(2, 1)
+(from_iter = 2, value = 1)
 
-julia> println(collect(it))
+julia> println(map(Tuple, it))
 [(1, 2), (1, 3), (2, 4)]
 ```
 
 The function `peek` can be used to check the next element without advancing the iterator:
 
 ```julia
-julia> it = KWayMerger([1]);
+julia> it = kway_merge([1]);
 
 julia> peek(it)
-(1, 1)
+(from_iter = 1, value = 1)
 
 julia> first(it)
-(1, 1)
+(from_iter = 1, value = 1)
 
 julia> peek(it) === nothing
 true
 ```
 
 ## Documentation
-This package's public functionality are the `KWayMerger` type, and its `Base.peek` method.
+This package's public functionality consists of the `kway_merge` function, the (unexported) `KWayMerger` type, and its `Base.peek` method.
 See their docstrings for more details.
 
 ## Performance
-When merging I iterables with a total length of N:
+When merging I iterables:
 * A `KWayMerger` allocates O(I) space upon construction
 * Producing each element takes O(log(I)) time
 
-Therefore, merging I sorted iterables with N total elements using a KWayMerger therefore takes O(N * log(I)) time.
-It is generally faster than flattening the iterators and sorting, when I << N.
+Therefore, merging I sorted iterables with N total elements using `kway_merge` takes O(N * log(I)) time.
+This is similar to the O(N * log(N)) time taken for comparison-based sorts.
+That's no coincidence: one can take a list with N elements, separate it into N 1-element lists, then merge them with a k-way merge. That is a variant of merge sort.
+
+However, compared to a comparison-based sort like quicksort, using a k-way merge has the following differences:
+* Usually, we have I << N, and therefore, k-way merge is usually faster.
+* For large I, quicksort is faster in practice because its overhead per element is smaller.
+
 Note that Julia uses radix sort for integers, which sorts in O(N), and therefore usually beats a k-way merge.
 
 ## Contributing
 We appreciate contributions from users including reporting bugs, fixing
 issues, improving performance and adding new features.
 
 Take a look at the [contributing files](https://github.com/BioJulia/Contributing)
 for detailed contributor and maintainer guidelines, and code of conduct.
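
To make the README's description of the yielded elements concrete, here is a minimal sketch of the merge semantics using only Base. It is not the package's implementation: `naive_kway_merge` is a hypothetical name, it is eager rather than lazy, and it scans every input head per element (O(I) work) instead of keeping a heap (O(log(I)) work). It only assumes the keyword names and the `@NamedTuple{from_iter::Int, value::T}` element type described in the diff above.

```julia
using Base.Order: Ordering, Forward, ord

# Hypothetical helper, NOT part of KWayMerges: eagerly merges sorted vectors
# by scanning the head of every input for each output element.
function naive_kway_merge(
        vectors::AbstractVector{<:AbstractVector{T}};
        lt = isless, by = identity, rev::Bool = false, order::Ordering = Forward,
    ) where {T}
    ordering = ord(lt, by, rev, order)
    positions = map(firstindex, vectors)  # next unread index of each input
    result = @NamedTuple{from_iter::Int, value::T}[]
    while true
        best = 0  # input whose head comes first; 0 means none found yet
        for i in eachindex(vectors)
            positions[i] > lastindex(vectors[i]) && continue  # input exhausted
            head = vectors[i][positions[i]]
            if best == 0 || Base.Order.lt(ordering, head, vectors[best][positions[best]])
                best = i
            end
        end
        best == 0 && break  # all inputs exhausted
        push!(result, (; from_iter = best, value = vectors[best][positions[best]]))
        positions[best] += 1
    end
    return result
end

merged = naive_kway_merge([[2, 3], [1, 4]])
println(map(Tuple, merged))  # [(2, 1), (1, 2), (1, 3), (2, 4)]

# Inputs sorted in descending order are merged with `rev = true`.
descending = naive_kway_merge([[3, 2], [4, 1]]; rev = true)
println([x.value for x in descending])  # [4, 3, 2, 1]
```

Accessing the fields by name (`x.from_iter`, `x.value`) is what the named tuple is meant to encourage, since a plain `(Int, Int)` tuple makes it easy to swap the index and the value.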

checklist.md

Lines changed: 0 additions & 16 deletions
This file was deleted.

src/KWayMerges.jl

Lines changed: 90 additions & 55 deletions
@@ -1,135 +1,170 @@
 module KWayMerges
 
-export KWayMerger
+using Base.Order: Ordering, Forward, ord, lt
+
+export kway_merge
+
+@static if VERSION >= v"1.11.0"
+    eval(Meta.parse("public KWayMerger"))
+end
 
 include("heap.jl")
 
 """
-    KWayMerger{T, I, F}(f::F, iterators)
-    KWayMerger{T, I}(iterators)
-    KWayMerger(f, iterators)
-    KWayMerger(iterators)
+    KWayMerger{T, I, O, S}
+
+Stateful iterator of a k-way merge of multiple iterators of the same type.
+Constructed using [`kway_merge`](@ref).
+
+The type parameters are:
+* `T`: Element type of iterators
+* `I`: Iterator type
+* `O`: Ordering, subtype of `Base.Ordering`
+* `S`: Type of state of iterators
+"""
+struct KWayMerger{T, I, O <: Base.Ordering, S}
+    ordering::O
+    iterators::Vector{I}
+    states::Vector{S}
+    heap::Vector{@NamedTuple{from_iter::Int, value::T}}
+end
+
+"""
+    kway_merge(
+        iterators;
+        lt=isless,
+        by=identity,
+        rev::Bool=false,
+        order::Base.Order.Ordering=Base.Order.Forward
+    )
+    kway_merge(::Type{T}, ::Type{I}, iterators; kwargs...)
+    kway_merge(::Type{T}, ::Type{I}, ordering::Ordering, iterators)
 
 Create a stateful iterator which does a k-way merge between multiple
 iterators of the same type.
 
-This iterator yields `(index::Int, x::T)` elements, where `x` is the next element from
-one of the iterators, and `index` is the 1-based index of the iterator that yielded `x`.
-The elements `x` are chosen from among the iterators such that, among all elements which
-are the next element of the iterators, the element is chosen which is the smallest
-according to the predicate `f::F`, which defaults to `isless`.
-
-This implies that if all iterators are sorted by `f`, the yielded will be in sorted
-order.
+This iterator yields `@NamedTuple{from_iter::Int, value::T}` elements, where `value` is the
+next element from one of the iterators, and `from_iter` is the 1-based index of the iterator
+that yielded `value`.
+The element `value` is chosen among the iterators such that, among all elements which
+are the next element of the iterators, the element is chosen which is the first
+according to the ordering.
+This implies that if all iterators are sorted by the ordering, the yielded elements will be in sorted order.
 Hence, a `KWayMerger` is typically used to combine multiple sorted arrays
 into one sorted array.
 
+The ordering is given by the keywords `by`, `lt`, `rev` and `order` - these are the
+same as for `Base.sort!`.
+
+
 # Examples
 ```jldoctest
 julia> arrs = [[1,6], [2], [5,7], [3,4,8]];
 
-julia> it = KWayMerger(arrs);
+julia> it = kway_merge(arrs);
+
+julia> first(it, 2)
+2-element Vector{@NamedTuple{from_iter::Int64, value::Int64}}:
+ (from_iter = 1, value = 1)
+ (from_iter = 2, value = 2)
 
-julia> print(collect(it))
-[(1, 1), (2, 2), (4, 3), (4, 4), (3, 5), (1, 6), (3, 7), (4, 8)]
+julia> print(map(Tuple, it))
+[(4, 3), (4, 4), (3, 5), (1, 6), (3, 7), (4, 8)]
 ```
 
 # Extended help
-The type parameters are:
-* `F`: Type of function used to compare the elements. It defaults
-to `typeof(Base.isless)`
-* `T`: Element type of iterators
-* `I`: Iterator type
-* `S`: Type of state of iterators
-
 All iterators must be of the same type. For the constructors which don't pass
 in `T` and `I` explicitly, `Base.eltype` is used
 to determine them; since its default implementation
 returns `Any`, explicitly passing them may be needed for good performance for some
 iterators.
 
 `S` is derived automatically, but this must be a fixed type;
-iterators that use states of multiple different types may
-not be supported by `KWayMerger`.
+iterators that use states of multiple different types during iteration may
+not be supported.
 """
-struct KWayMerger{T, I, F, S}
-    f::F
-    iterators::Vector{I}
-    states::Vector{S}
-    heap::Vector{Tuple{Int, T}}
-end
-
-function KWayMerger{T, I, F}(f::F, iterators) where {T, I, F}
+function kway_merge(::Type{T}, ::Type{I}, ordering::O, iterators) where {T, I, O}
     iters = vec(collect(iterators))
     states = nothing
-    things = Tuple{Int, T}[]
+    things = @NamedTuple{from_iter::Int, value::T}[]
     for i in eachindex(iters)
         it = iterate(iters[i])
         isnothing(it) && continue
-        (thing::T, state) = it
+        (value::T, state) = it
         if isnothing(states)
             states = Vector{typeof(state)}(undef, length(iters))
         end
-        push!(things, (i, thing))
+        push!(things, (; from_iter = i, value))
         states[i] = state
     end
-    heapify!(f, things)
+    heapify!(ordering, things)
     states = if isnothing(states)
         Vector{Union{}}(undef, length(iters))
     else
         states
     end
-    return KWayMerger{T, I, F, eltype(states)}(f, iters, states, things)
+    return KWayMerger{T, I, O, eltype(states)}(ordering, iters, states, things)
 end
 
-function KWayMerger{T, I}(iterators) where {T, I}
-    return KWayMerger{T, I, typeof(isless)}(isless, iterators)
+function kway_merge(
+        ::Type{T},
+        ::Type{I},
+        iterators;
+        lt = isless,
+        by = identity,
+        rev::Bool = false,
+        order::Base.Ordering = Forward,
+    ) where {T, I}
+    ordering = ord(lt, by, rev, order)
+    return kway_merge(T, I, ordering, iterators)
 end
 
-KWayMerger(iterators) = KWayMerger(isless, iterators)
-
-function KWayMerger(f::F, iterators) where {F}
+function kway_merge(iterators; kwargs...)
     I = eltype(typeof(iterators))
     T = eltype(I)
-    return KWayMerger{T, I, F}(f, iterators)
+    return kway_merge(T, I, iterators; kwargs...)
 end
 
 # We could technically know this, but KWayMerger is stateful, and
 # Julia's iterator length works badly with stateful iterators.
 Base.IteratorSize(::Type{<:KWayMerger}) = Base.SizeUnknown()
-Base.eltype(::Type{<:KWayMerger{T}}) where {T} = Tuple{Int, T}
+Base.eltype(::Type{<:KWayMerger{T}}) where {T} = @NamedTuple{from_iter::Int, value::T}
 
 function Base.iterate(x::KWayMerger, ::Nothing = nothing)
     isempty(x.heap) && return nothing
-    (i, item) = @inbounds x.heap[1]
-    iterator = @inbounds x.iterators[i]
-    state = @inbounds x.states[i]
+    top = @inbounds x.heap[1]
+    iterator = @inbounds x.iterators[top.from_iter]
+    state = @inbounds x.states[top.from_iter]
     it = iterate(iterator, state)
     if it === nothing
-        @inbounds heappop!(x.f, x.heap)
+        @inbounds heappop!(x.ordering, x.heap)
     else
         (new_item, new_state) = it
-        @inbounds x.states[i] = new_state
-        @inbounds heapreplace!(x.f, x.heap, (i, new_item))
+        @inbounds x.states[top.from_iter] = new_state
+        @inbounds heapreplace!(
+            x.ordering,
+            x.heap,
+            (; from_iter = top.from_iter, value = new_item)
+        )
     end
-    return ((i, item), nothing)
+    return (top, nothing)
 end
 
 Base.isempty(x::KWayMerger) = isempty(x.heap)
 Base.isdone(x::KWayMerger) = isempty(x.heap)
 
 """
-    peek(x::KWayMerger{T})::Union{Tuple{Int, T}, Nothing}
+    peek(x::KWayMerger{T})::Union{@NamedTuple{from_iter::Int, value::T}, Nothing}
 
 Get the first element of `x` without advancing the iterator, or `nothing` if the
 iterator is empty.
 
 # Examples
 ```jldoctest
-julia> it = KWayMerger([[3, 4], [2, 7]]);
+julia> it = kway_merge([[3, 4], [2, 7]]);
 
 julia> peek(it)
-(2, 2)
+(from_iter = 2, value = 2)
 
 julia> collect(it); # exhaust stateful iterator
 