Use LoopVectorization.@turbo in dynamic expression evaluation scheme #9

Conversation
Hm, it's getting stack overflows again on this test, even with the safe mode. Maybe the safe operators break some of the assumptions?

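For context, a minimal sketch of what a "safe" operator wrapper typically looks like (illustrative only; the actual definitions live in the package). The guarding branch is one plausible way such wrappers could break `@turbo`'s assumptions:

```julia
# Illustrative "safe" operator: a domain guard returns NaN instead of
# throwing, at the cost of a branch in the scalar kernel.
safe_log(x::T) where {T<:AbstractFloat} = x > 0 ? log(x) : T(NaN)
```
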
Feel free to either define …

Thanks! I can certainly also make it so that only certain operators use it. How hard do you think it would be for me to build a macro in this package that does a test vectorization on the given type and operator and, if it fails for whatever reason, falls back to `@inbounds @simd`? Maybe a simpler but less general solution would be to have a pre-defined set of operators that are known to work, and only `@turbo` those.

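As a hedged sketch of that second idea, a whitelist could be as simple as the following (names hypothetical, not the package's API):

```julia
# Hypothetical whitelist of unary operators known to work with @turbo:
const TURBO_SAFE_UNAOPS = (cos, sin, exp, abs)

# Only whitelisted operators would get the @turbo kernel;
# everything else would take the @inbounds @simd path.
is_turbo_safe(op::Function) = any(op === safe_op for safe_op in TURBO_SAFE_UNAOPS)

is_turbo_safe(cos)   # true
is_turbo_safe(acosh) # false
```
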
Here's an example. Could something like this be done in a macro instead of a generated function? Maybe even as a new …

```julia
using DynamicExpressions
using LoopVectorization

@generated function unary_operator_kernel!(
    x::AbstractArray{T}, y::AbstractArray{T}, ::Val{op_idx}, operators::OperatorEnum
) where {T,op_idx}
    # First, we try to @turbo an example array. Note that inside a @generated
    # function, `x` and `operators` are bound to *types*, not values, so we
    # allocate small test arrays of eltype T directly:
    num_samples = 32
    _x = rand(T, num_samples)
    _y = similar(_x)
    # Get the operator instance from the OperatorEnum type parameters:
    unaops = operators.parameters[2]
    op = unaops.parameters[op_idx].instance
    can_turbo = try
        @turbo for i in indices(_x)
            _y[i] = op(_x[i])
        end
        true
    catch # Catch ALL errors.
        false
    end
    if can_turbo
        quote
            @turbo for i in indices(x)
                y[i] = $op(x[i])
            end
        end
    else
        quote
            @inbounds @simd for i in indices(x)
                y[i] = $op(x[i])
            end
        end
    end
end

x = randn(Float16, 100);
y = similar(x);
operators = OperatorEnum(; unary_operators=[abs])
unary_operator_kernel!(x, y, Val(1), operators)
```

Even though `Float16` isn't handled by `@turbo`, this correctly falls back to the plain `@simd` loop rather than erroring.

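One point worth noting about this design (my reading, not stated above): the try/catch probe runs when Julia generates the method body, so the `@turbo`-vs-`@simd` choice is made once per `(T, op_idx)` specialization and costs nothing per call afterwards. A quick check, reusing the definitions above:

```julia
# A Float32 call compiles its own specialization, which (presumably)
# passes the probe and takes the @turbo branch.
x32 = randn(Float32, 100);
y32 = similar(x32);
unary_operator_kernel!(x32, y32, Val(1), operators)
y32 ≈ abs.(x32)  # true either way; only the loop strategy differs
```
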
My suggestion was to PR LoopVectorization to disable it for `Float16`. Another option is to actually forward and promote type information to the …

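For reference, `@turbo` already gates its fast path with `LoopVectorization.check_args`, falling back to a plain loop when that returns `false`; one reading of the suggestion is to make that check reject `Float16`. A sketch of the idea, not the actual PR:

```julia
using LoopVectorization

# check_args gates @turbo's fast path: when it returns false,
# @turbo compiles its plain-loop fallback instead.
LoopVectorization.check_args(rand(Float32, 8))  # true: supported element type
LoopVectorization.check_args(rand(Float16, 8))  # false would force the safe fallback
```
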
Thanks, will consider! Would that be the one remaining issue, or do you foresee others? I am wondering whether I should simply add an opt-in “use_turbo” argument to the evaluation function, so that users are not surprised by errors coming out of evaluation.

Weird, I'm also seeing this for the derivative ops with Float32 variables. Just to be robust, maybe it's best I leave it optional for now with a `turbo` flag.

Okay, with turbo set to optional, I can get the tests to pass. I'm seeing a really nice speed boost!!

```julia
using DynamicExpressions
using BenchmarkTools

X = randn(Float32, 3, 5_000);

# Define the operator set (assumed setup; constructing the OperatorEnum
# also defines the Node overloads used below):
operators = OperatorEnum(; binary_operators=[+, -, *, /], unary_operators=[cos, sin])

# Feature nodes:
x1, x2 = Node(; feature=1), Node(; feature=2)

# Dynamically construct the expression:
tree = cos(x1 - 3.2) * 1.5 - 2.0 / (sin(x2) * sin(x2) + 0.01)

# Evaluate:
@btime tree(X)
# 128 us
@btime tree(X; turbo=true)
# 57 us
```

Wow, that is fast!

Still not as nice as a handwritten kernel, though (the more complex the expression, the faster a single fused kernel will be compared to the recursive evaluation):

```julia
using LoopVectorization

function f(X)
    y = Array{Float32}(undef, size(X, 2))
    @turbo for i in indices(y)
        x1 = X[1, i]
        x2 = X[2, i]
        y[i] = cos(x1 - 3.2) * 1.5 - 2.0 / (sin(x2) * sin(x2) + 0.01)
    end
    y
end

@btime f(X)
# 5.2 us
```

With JuliaSIMD/LoopVectorization.jl#431, this should potentially work for arbitrary user-defined operators, since it will fall back to `@inbounds @simd` when an operator cannot be SIMD-ified. It makes evaluation even faster for me: now about 30% faster than a simple handwritten function, and even a little faster than a manually-written SIMD loop (which does not use `@turbo`). Let's see if this works.

cc @chriselrod
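
If that lands, usage with a user-defined operator might look like the following (hypothetical example; it assumes the opt-in `turbo` keyword from above and that `OperatorEnum` construction defines the `Node` overloads as in the earlier benchmark):

```julia
using DynamicExpressions

# A user-defined operator with a branch; @turbo may or may not be able to
# vectorize it, and with the fallback in place that should not matter:
relu_sqrt(x) = x > 0 ? sqrt(x) : zero(x)

operators = OperatorEnum(; binary_operators=[+, *], unary_operators=[relu_sqrt])
x1 = Node(; feature=1)
tree = relu_sqrt(x1) + 1.5
X = rand(Float32, 1, 1_000)

tree(X; turbo=true)  # expected to use @turbo if possible, else @inbounds @simd
```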