1

It's my second day learning and experimenting with Julia. Although I read the documentation on metaprogramming carefully (but maybe not carefully enough), along with several similar threads, I still can't figure out how to use it inside a function. I tried to make the following data-simulation function more flexible:

using Distributions
function gendata(N,NLATENT,NITEMS)
  latent = repeat(rand(Normal(6,2),N,NLATENT), inner=(1,NITEMS))
  errors = rand(Normal(0,1),N,NLATENT*NITEMS)
  x = latent+errors
end

By doing this:

using Distributions
function gendata(N,NLATENT,NITEMS,LATENT_DIST="Normal(0,1)",ERRORS_DIST="Normal(0,1)")
  to_eval_latent = parse("latent = repeat(rand($LATENT_DIST,N,NLATENT), inner=(1,NITEMS))")
  eval(to_eval_latent)
  to_eval_errors = parse("errors = rand($ERRORS_DIST,N,NLATENT*NITEMS)")
  eval(to_eval_errors)
  x = latent+errors
end

But since eval doesn't work in the local scope, this doesn't work. What can I do to work around this?

Also, the original function doesn't seem to be that fast. Did I make any major mistakes concerning performance?

I'd really appreciate any recommendations. Thanks in advance.

  • What is wrong with just passing the distributions as arguments? Using eval seems overly complicated. Commented Mar 9, 2017 at 10:40
  • @phg It doesn't just seem overly complicated, it is. The code originally came from an R script, where this is more or less necessary. Do you have any idea why repeat is so slow? It's a BLAS routine, so it should be lightning fast, yet it is the slowest part of the function. Commented Mar 9, 2017 at 13:10

2 Answers

5

There is no need to use eval there. You can retain the same flexibility by passing the distribution objects as keyword arguments (or as positional arguments with default values). Parsing and eval'ing "stringly-typed" arguments will often defeat compiler optimizations and should be avoided.

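using Distributions

function gendata(N, NLATENT, NITEMS; LATENT_DIST=Normal(0,1), ERRORS_DIST=Normal(0,1))
  # Each latent column is repeated NITEMS times, then item-level noise is added
  latent = repeat(rand(LATENT_DIST, N, NLATENT), inner=(1, NITEMS))
  errors = rand(ERRORS_DIST, N, NLATENT*NITEMS)
  x = latent + errors
end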


julia> gendata(10,2,3, LATENT_DIST=Pareto(.3))
...


julia> gendata(10,2,3, ERRORS_DIST=Gamma(.6))
...

etc.
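The same idea works with anything callable, not only Distributions objects. The following is a minimal sketch under that assumption: `gendata_fn`, `latent_sampler`, and `error_sampler` are hypothetical names introduced for illustration, and plain functions from Base stand in for the distribution objects so the example runs without any package.

```julia
# Sketch only: samplers are passed as functions called as sampler(rows, cols).
# `gendata_fn`, `latent_sampler` and `error_sampler` are hypothetical names.
function gendata_fn(N, NLATENT, NITEMS;
                    latent_sampler = randn,
                    error_sampler  = randn)
    # Repeat every latent column NITEMS times, then add item-level noise
    latent = repeat(latent_sampler(N, NLATENT), inner = (1, NITEMS))
    errors = error_sampler(N, NLATENT * NITEMS)
    return latent + errors
end

x = gendata_fn(10, 2, 3)                        # standard normal for both parts
u = gendata_fn(10, 2, 3, error_sampler = rand)  # uniform errors on [0, 1)
# An anonymous function gives a shifted and scaled normal, here mean 6 and sd 2
z = gendata_fn(10, 2, 3, latent_sampler = (n, m) -> 6 .+ 2 .* randn(n, m))
```

Because the samplers are ordinary values, no string ever has to be parsed, and the compiler can specialize the function on the types of the arguments it actually receives.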

0

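You're not really supposed to use eval here (it is slower, the compiler gets no type information from it, it interferes with compilation, etc.). The underlying problem is that eval always evaluates in the module's global scope, so it can neither see nor create a function's local variables. In case you're trying to understand what went wrong, here's how you would do it: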

Either separate it from the rest of the code:

function gendata(N,NLATENT,NITEMS,LDIST_EX="Normal(0,1)",EDIST_EX="Normal(0,1)")

  # Eval your expressions separately

  LATENT_DIST = eval(parse(LDIST_EX))
  ERRORS_DIST = eval(parse(EDIST_EX))

  # Do your thing

  latent = repeat(rand(LATENT_DIST,N,NLATENT), inner=(1,NITEMS))
  errors = rand(ERRORS_DIST,N,NLATENT*NITEMS)
  x = latent+errors
end

Or use interpolation with quoted expressions:

function gendata(N,NLATENT,NITEMS,LDIST_EX="Normal(0,1)",EDIST_EX="Normal(0,1)")

  # Obtain expression objects

  LATENT_DIST = parse(LDIST_EX)
  ERRORS_DIST = parse(EDIST_EX)

  # Eval but interpolate in everything that's local to the function
  # And you can't introduce local variables with eval so keep them
  # out of it.

  latent = eval( :(repeat(rand($LATENT_DIST,$N,$NLATENT), inner=(1,$NITEMS))) )
  errors = eval( :(rand($ERRORS_DIST, $N, $NLATENT*$NITEMS)) )
  x = latent+errors
end
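To make the interpolation step more concrete, here is a small sketch of what the parsed string and the interpolated expression look like before anything is evaluated. It assumes Julia 0.7 or later, where the string parser is called `Meta.parse` (the code in this answer uses the older `parse`); the variable names `ex` and `call` are made up for illustration, and nothing is evaluated, so Distributions does not need to be loaded.

```julia
# Parsing a string yields an expression object, not a distribution
ex = Meta.parse("Normal(0,1)")   # on Julia 0.6 and earlier: parse("Normal(0,1)")
ex.head                          # :call
ex.args                          # Any[:Normal, 0, 1]

# `$` splices the expression and the value of the local N into a new expression
N = 5
call = :(rand($ex, $N))          # same as :(rand(Normal(0, 1), 5))
```

Only the finished expression is handed to eval, which is why every local value has to be spliced in with `$` first.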

You can also use a single eval with a let block to introduce a self-contained scope:

function gendata(N,NLATENT,NITEMS,LDIST_EX="Normal(0,1)",EDIST_EX="Normal(0,1)")

  LATENT_DIST = parse(LDIST_EX)
  ERRORS_DIST = parse(EDIST_EX)
  x = @eval let
    latent = repeat(rand($LATENT_DIST,$N,$NLATENT), inner=(1,$NITEMS))
    errors = rand($ERRORS_DIST, $N, $NLATENT*$NITEMS)
    latent+errors
  end
end

(Note that @eval x is equivalent to eval(:(x)).)
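To see why the attempt in the question fails, here is a minimal sketch of the scoping behaviour. The function names `demo_local` and `demo_interp` are hypothetical, and the example assumes that no global variable named `y` exists in the session.

```julia
# Hypothetical helper names, for illustration only.
function demo_local()
    y = 41
    try
        # eval runs in the module's global scope, so the local `y` is invisible
        eval(:(y + 1))
    catch err
        err   # hand back the exception instead of rethrowing it
    end
end

function demo_interp()
    y = 41
    # `$y` splices the value 41 into the expression before it is evaluated
    eval(:($y + 1))
end

demo_local()    # returns an UndefVarError (assuming no global `y`)
demo_interp()   # returns 42
```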

Well, I hope you understand eval a little better now. It's only day two, after all, so you should be experimenting ;)

5 Comments

Thank you very much. Very helpful for understanding "eval" better. But I'll go with the solution from the previous answer. Here is my solution with a bit of speed optimization:
using Distributions
function gendata(N,NLATENT,NITEMS;LATENT_DIST=Normal(0,1),ERRORS_DIST=Normal(0,1))
  latent = repmat(rand(LATENT_DIST,N,NLATENT), 1,NITEMS)
  for i in eachindex(latent)
    latent[i] = latent[i] + rand(ERRORS_DIST)
  end
  latent
end
But there is something I really don't get: while trying to optimize performance, it turned out that the function is ten times faster with repmat instead of repeat. How come? Both are BLAS and come from the Linear Algebra section of the docs.
Thank you for accepting, but if the other solution is better you should accept that instead, this was more meant to be complementary. (while I am the one addressing the question directly, I think presenting an alternative is better in this case)
I think there was talk a while ago about repeat being slow; there is an issue and a recent PR addressing it (22 days ago). Try again on the latest master and see if it's any different; otherwise you should probably ask another question about it.
See issue #20495 and the links referenced in it.
