Discussion:
[julia-users] efficient use of shared arrays and @parallel for
thr
2015-07-31 00:51:24 UTC
Hi all,

I'm implementing a basic explicit advection algorithm of the form:

for t = 1:T-1
    for j = 3:n-2
        for i = 3:m-2
            q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
        end
    end
end


where q is a quantity and u a velocity field.
I'd like to parallelize this using shared arrays and @parallel for, so I
tried the following:

const n = 500
const m = 500
const T = 500

@everywhere function timestep(x,y)
    #return x+y
    return x+y + x+y + x+y + x+y + x+y + x+y + x+y
end

function advection_ser(q, u)
    println("==============serial=================$n x $m x $T")
    for t = 1:T-1
        for j = 3:n-2
            for i = 3:m-2
                q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
            end
        end
    end
    return q
end

function advection_par(q, u)
    println("==============parallel=================$n x $m x $T")
    for t = 1:T-1
        @sync @parallel for j = 3:n-2
            for i = 3:m-2
                q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
            end
        end
    end
    return q
end

q = SharedArray(Float64, (m,n,T), init=false)
u = SharedArray(Float64, (m,n,T), init=false)

@time qs = advection_ser(q,u)
@time qp = advection_par(q,u)




But this yields only a very moderate speed gain: the parallel version is
about 1/3 faster than the serial version for m,n,T=500,500,500 and -p 4.
Is there a way I can improve on this?
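
One idea I've been toying with, but haven't benchmarked yet, is to hand each
worker one contiguous block of columns and let it run the plain nested loops
over that block, instead of letting @parallel distribute individual j values.
A minimal sketch of what I mean (the helpers mycolumns, advection_chunk! and
advection_chunked! are just names I made up, and I'm assuming the Julia
0.4-style remotecall_wait(pid, f, args...) calling convention):

# Made-up helper: split the interior columns 3:n-2 into one contiguous
# chunk per process mapping the SharedArray; returns this process's chunk.
@everywhere function mycolumns(q::SharedArray)
    idx = indexpids(q)          # this process's position in procs(q); 0 if not mapped
    idx == 0 && return 1:0      # no chunk for unmapped processes
    nchunks = length(procs(q))
    splits = [round(Int, s) for s in linspace(0, size(q,2)-4, nchunks+1)]
    return (3 + splits[idx]):(2 + splits[idx+1])
end

# Made-up kernel: advance one time level over this process's columns only.
@everywhere function advection_chunk!(q, u, t)
    for j in mycolumns(q), i in 3:size(q,1)-2
        q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
    end
    return nothing
end

function advection_chunked!(q, u)
    for t = 1:T-1
        @sync for p in procs(q)
            # one remote call per worker per time step,
            # instead of one @parallel task per column
            @async remotecall_wait(p, advection_chunk!, q, u, t)
        end
    end
    return q
end

Each chunk writes only its own columns at level t+1 and reads only level t, so
there should be no races. Would this kind of per-worker chunking be expected to
scale better, or is the per-column scheduling not actually the bottleneck here?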

I have also seen some weird behaviour regarding shared arrays and I'd like
to verify that I'm not just doing it wrong before opening issues:

1. When I construct q inside the advection function, @code_warntype tells me
it is inferred as Any and the code is much slower, even though typeof(q)
reports SharedArray{Float64,3} as it should. (A possible workaround is
sketched after these questions.)

2. I'm pretty sure there's a memory leak associated with SharedArrays: when I
run the above program over and over, I eventually get a bus error and Julia
crashes. Do I have to somehow release the shared memory from the workers?
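
For question 1, the workaround I've been trying (purely a sketch, and it
assumes the instability comes from the SharedArray constructor not being
inferable inside the function) is to construct the arrays behind a small
helper with a type assertion and pass them into the hot loop as arguments,
reusing the consts and advection_par from above:

# Sketch of a workaround for question 1: build the SharedArray behind a
# function barrier / type assertion so the loops only ever see q and u as
# concretely typed arguments.
function make_field(m, n, T)
    return SharedArray(Float64, (m, n, T), init=false)::SharedArray{Float64,3}
end

function run_advection_par()
    q = make_field(m, n, T)      # m, n, T are the consts defined above
    u = make_field(m, n, T)
    return advection_par(q, u)   # advection_par specializes on the concrete types
end

I'm not sure whether that is the right fix or just papering over the problem,
though.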

Thanks in advance, Johannes
thr
2015-07-31 01:57:28 UTC
I also noticed a lot more Any-type warnings in the parallel version when
checking it with @code_warntype. I tried annotating the types almost
everywhere, but it didn't help.
Tim Holy
2015-07-31 12:02:20 UTC
I have a demo to answer your question, but I'd like to simply add it to the
documentation on SharedArrays. May I use some of your code in writing up the
demo? (MIT license, see
https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md#improving-documentation)

--Tim
thr
2015-07-31 12:47:52 UTC
Yes, sure.
Eduardo Lenz
2015-07-31 14:22:23 UTC
Hi.

Regarding your questions: I am seeing the same problems, with no luck getting
rid of the Any types on SharedArrays. I am also hitting the same crashes, so I
have to restart Julia after finishing any parallel code.

I am looking forward to seeing Tim's comments about these type instabilities.

Thanks.