thr
2015-07-31 00:51:24 UTC
Hi all,
I'm implementing a basic explicit advection algorithm of the form:
for t = 1:T-1
    for j = 3:n-2
        for i = 3:m-2
            q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
        end
    end
end
where q is a quantity and u a velocity field.
I'd like to parallelize this using shared arrays and @parallel for. I tried
the following:
const n = 500
const m = 500
const T = 500
@everywhere function timestep(x,y)
    # return x+y
    return x+y +x+y +x+y +x+y +x+y +x+y +x+y  # repeated terms just to add work per call
end
function advection_ser(q, u)
    println("==============serial=================$n x $m x $T")
    for t = 1:T-1
        for j = 3:n-2
            for i = 3:m-2
                q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
            end
        end
    end
    return q
end
function advection_par(q, u)
    println("==============parallel=================$n x $m x $T")
    for t = 1:T-1
        @sync @parallel for j = 3:n-2
            for i = 3:m-2
                q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
            end
        end
    end
    return q
end
q = SharedArray(Float64, (m,n,T), init=false)
u = SharedArray(Float64, (m,n,T), init=false)
@time qs = advection_ser(q,u)
@time qp = advection_par(q,u)
But this yields only a very modest speedup: the parallel version is about 1/3
faster than the serial one for m, n, T = 500, 500, 500 and -p 4. Is there a
way I can improve on this?
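For reference, I run this with 4 workers like so (assuming the script is saved
as advection.jl; the filename is just for illustration):

julia -p 4 advection.jl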
I have also seen some weird behaviour regarding shared arrays and I'd like
to verify that I'm not just doing it wrong before opening issues:
1. When I construct q inside the advection function, @code_warntype reports it
as Any and the code is much slower. However, typeof(q) says it is of type
SharedArray{Float64,3}, as it should be (see the first sketch after this list).
2. I'm pretty sure there's a memory leak associated with SharedArrays: when I
run the above program over and over, I eventually get a bus error and Julia
crashes (see the second sketch below). Do I have to somehow release the shared
memory from the workers?
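To make point 1 concrete, here is roughly what I mean (simplified;
advection_inner is just an illustrative name):

function advection_inner()
    # constructed inside the function instead of being passed in
    q = SharedArray(Float64, (m, n, T), init=false)
    u = SharedArray(Float64, (m, n, T), init=false)
    # @code_warntype on this function flags q and u as Any,
    # although typeof(q) at runtime is SharedArray{Float64,3}
    for t = 1:T-1
        for j = 3:n-2
            for i = 3:m-2
                q[i,j,t+1] = timestep(q[i,j,t], u[i,j,t])
            end
        end
    end
    return q
end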
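And for point 2, a guess at a minimal reproduction (in reality I restart the
whole script repeatedly rather than looping, and the iteration count here is
arbitrary):

for k = 1:100
    # allocate a SharedArray and let it go out of scope; the backing
    # shared-memory segment does not seem to be reclaimed, and after
    # enough repetitions I get a bus error
    tmp = SharedArray(Float64, (m, n, T), init=false)
    tmp[1, 1, 1] = 1.0
end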
Thanks in advance, Johannes