I have a vector with > 30000 words. I want to create a subset of this vector which contains only those words whose length is greater than 5. What is the best way to achieve this?

Basically, df contains multiple sentences.

So,

wordlist = df2;
wordlist = [strip(w) for w in wordlist];

Now, I need to subset wordlist so that it contains only those words whose length is greater than 5.

  • Please edit your question and add some example code and what you've tried so far. Commented Sep 29, 2015 at 8:53

2 Answers

 sub(A,find(x->length(x)>5,A)) # => creates a view (most efficient way to make a subset)

EDIT: getindex() returns a copy of desired elements

getindex(A,find(x->length(x)>5,A)) # => makes a copy 
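Note for readers on current Julia: as far as I know, `sub` and `find` were renamed in later releases (to `view` and `findall`), so the two lines above only run on old versions such as 0.4. Below is a minimal sketch of the same view-versus-copy distinction on Julia 1.x. The sample vector `A` is made up for illustration.

```julia
# Hypothetical sample data; `A` stands in for the word vector from the question.
A = ["apple", "banana", "kiwi", "strawberry", "cherry"]

# Indices of the words longer than 5 characters.
idx = findall(x -> length(x) > 5, A)

# A view shares memory with A, so no elements are copied.
long_view = view(A, idx)

# Plain indexing (which calls getindex) allocates a new, independent vector.
long_copy = A[idx]

# Writing through the view changes the parent array; the copy is unaffected.
long_view[1] = "BANANA"
```

After the last line, `A[2]` has changed while `long_copy[1]` still holds the original word, which is the practical difference between the two approaches.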

3 Comments

It returns an error: sub has no method matching sub(::Array{Any,1},::Array{Int64,1}). I used the following syntax: wordlist2 = sub(wordlist,find(x -> length(x)>4,wordlist));
Got it! Used getindex instead of sub.
getindex() gives you a copy, but sub() creates a view; both have the same syntax. The above code works for me (VERSION # => v"0.4.0-rc2")

You can use filter

wordlist = filter(x->islenatleast(x,6),wordlist)

and combine it with a fast condition such as islenatleast defined as:

function islenatleast(s,l)
    if sizeof(s)<l return false end
    # assumes each char takes at least a byte
    l==0 && return true
    p=1
    i=0
    while i<l
        if p>sizeof(s) return false end
        p = nextind(s,p)
        i += 1
    end
    return true
end
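The early exit on `sizeof(s)` and the use of `nextind` both rely on the difference between bytes and characters in a Julia string. Here is a small illustration with a made-up string containing one multi-byte character:

```julia
# Hypothetical example string; 'é' occupies two bytes in UTF-8.
s = "héllo"

nbytes = sizeof(s)       # number of bytes in the string
nchars = length(s)       # number of characters (walks the whole string)
after_e = nextind(s, 2)  # byte index of the character following 'é'
```

Since every character takes at least one byte, `sizeof(s) < l` already rules out `length(s) >= l` without scanning, and `nextind` lets the loop stop as soon as `l` characters have been seen.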

According to my timings, islenatleast is faster than calculating the whole length (in some conditions), presumably because it can stop as soon as enough characters have been counted. Additionally, this shows a strength of Julia: a user-defined primitive can be competitive with the core function length.

But doing:

wordlist = filter(x->length(x)>5,wordlist)

will also do.
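For completeness, a minimal end-to-end sketch of the strip-then-filter approach. The input vector `raw` is a hypothetical stand-in for `df2` from the question, and `longwords` is just an illustrative name.

```julia
# Hypothetical input standing in for `df2` from the question.
raw = ["  julia ", "subset", " vector  ", "words", "filtering"]

# strip returns SubString values; String(...) converts them back to plain strings.
wordlist = [String(strip(w)) for w in raw]

# Keep only the words longer than 5 characters.
longwords = filter(w -> length(w) > 5, wordlist)
```

If the original vector is no longer needed, `filter!` with the same predicate should do the same thing in place and avoid allocating a second vector.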
