3

I have a string input that represents a formula like:

BMI = ( Weight / ( Height  * Height ) ) * 703

I want to be able to extract all legal variables into a String[]

Legal variable names follow almost the same rules as Java identifiers, except that only alphanumeric characters are allowed:

  • Any single letter, upper or lower case, optionally followed by a digit
  • Any word/text
  • Any word/text followed by a digit

Therefore I expect the output to look like this:

BMI
Weight
Height

This is my current attempt:

/* Helper method: find all variables in an expression.
 * Variables are defined as alphabetical characters a to z, or any word; variables cannot have numbers at the beginning.
 * Uses the regex pattern "[A-Za-z0-9\\s]".
 */
public static List<String> variablesArray (String expression)
{
    List<String> varList = null; 
    StringBuilder sb = null; 
    if (expression!=null)
    {
        sb = new StringBuilder(); 

        //list that will contain encountered words,numbers, and white space
        varList = new ArrayList<String>();

        Pattern p = Pattern.compile("[A-Za-z0-9\\s]");
        Matcher m = p.matcher(expression);

        //while matches are found 
        while (m.find())
        {
            //add words/variables found in the expression 
            sb.append(m.group());
        }//end while 

        //split the expression based on white space 
        String [] splitExpression = sb.toString().split("\\s");
        for (int i=0; i<splitExpression.length; i++)
        {
            varList.add(splitExpression[i]);
        }
    }
    return varList; 
}

The result is not as I expected. I got extra empty lines, got "Height" twice, and shouldn't have gotten a number:

BMI


Weight


Height


Height



703

4 Comments

  • Ok, and what is your question? Commented Jun 26, 2012 at 0:12
  • I want a regex that, given a String representing a mathematical formula, extracts all variables once. Commented Jun 26, 2012 at 0:28
  • Why a regex? You're using the wrong tools for the job. For a mathematical expression you should be looking at a scanner/parser combination. Commented Jun 26, 2012 at 1:19
  • I'm not using the wrong tools; you are not aware of the full scope of my project, and the question is very clear if you read the bolded criteria above! Commented Jun 26, 2012 at 16:11

3 Answers

4

I'm not sure why you would build a string and then split it to convert it to an array. Besides being inefficient, that approach only works if every identifier is followed by whitespace.
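To make that failure concrete, here is a minimal sketch that reproduces the filter-then-split logic from the question. The class name SplitApproachDemo and the method name filterThenSplit are made up for illustration; the pattern and the split call are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex and the split call come from the question.
class SplitApproachDemo {

    // Keep letters, digits and whitespace, then split the concatenated
    // result on single whitespace characters.
    static List<String> filterThenSplit(String expression) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile("[A-Za-z0-9\\s]").matcher(expression);
        while (m.find()) {
            sb.append(m.group());
        }
        return new ArrayList<>(Arrays.asList(sb.toString().split("\\s")));
    }

    public static void main(String[] args) {
        // No spaces: the operators are dropped and the names run together.
        System.out.println(filterThenSplit("BMI=Weight/Height")); // [BMIWeightHeight]
        // With spaces: each dropped operator leaves an empty string behind.
        System.out.println(filterThenSplit("a + b"));             // [a, , b]
    }
}
```

Because digits are part of the character class, a literal such as 703 also survives the filter, which is why it shows up in the question's output.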

Here's more straightforward code that allows repeats in the output. To get rid of repeats, replace List and ArrayList with Set and HashSet (or LinkedHashSet, if the order of first appearance matters):

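import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {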

    public static List<String> variablesArray(String expression) {
        if (expression != null) {
            ArrayList<String> vars = new ArrayList<String>();
            Pattern p = Pattern.compile("[a-z][a-z0-9]*", Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(expression);
            while (m.find()) {
                vars.add(m.group());
            }
            return vars;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> vars = variablesArray("BMI=(Weight/(Height*Height)) * 70");
        for (String var : vars) {
            System.out.println(var);
        }
    }
}

If you actually want a String[] as the return value rather than the ArrayList<String>, do the conversion as you return:

return vars.toArray(new String [vars.size()]);

Finally, I wonder what you are trying to accomplish. Having a list of identifiers in an expression doesn't seem very useful. If, for example, you are trying to evaluate the expression, this list of ids is not going to be what you need.

1 Comment

Thank you, the regex worked. Thanks for taking the time and actually reading my question, and not voting it down like user EJB.
0

Using:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main
{
  public static void main (String[] args) throws java.lang.Exception
  {
     String formula = "BMI = ( Weight / ( Height * Height ) ) * 703";
     String pattern = "(?:^|(?<=[=+\\-*/()]))\\s*([a-z]+)\\s*(?:$|(?=[=+\\-*/()]))";
     Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
     Matcher m = p.matcher(formula);
     while(m.find()) {
       System.out.println(m.group(1));
     }
  }
}

you will get:

BMI
Weight
Height
Height

So all you need to do after that is just remove duplicates, which is a simple task.
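As one possible way to do that step, the sketch below drops repeats from an already collected list while keeping the order of first appearance. The class name DedupDemo and the method name withoutDuplicates are illustrative assumptions, and it relies on the stream API available from Java 8 onward.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the answer's original code.
class DedupDemo {

    // Returns the names in first-seen order with repeats dropped.
    static List<String> withoutDuplicates(List<String> names) {
        return names.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList("BMI", "Weight", "Height", "Height");
        System.out.println(withoutDuplicates(found)); // [BMI, Weight, Height]
    }
}
```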

8 Comments

When I put your regex in Pattern p = Pattern.compile("(?:^|(?<=[=+\-*\/()]))\s*([a-z]+)\s*(?:$|(?=[=+\-*\/()]))"); I get the compile-time error "Invalid escape sequence (valid ones are ....")
Look at it here at Stack Overflow. Thanks.
@AryanNaim - Answer is updated with Java code
You don't need all those backslashes. Inside a character class most metacharacters lose their special meaning; the main exceptions are a leading ^, the hyphen (when it is not first or last), the closing ] and the backslash itself.
@Bohemian - Thanks for the lesson. I have updated the code, but as you can see, one of them (\\-) still has to be escaped, which makes sense :)
0

This simple regular expression should match all the variables for you:

"[A-Za-z_][A-Za-z0-9_]*"

I took the liberty of including _ in the name, but you can remove it if you don't want it:

"[A-Za-z][A-Za-z0-9]*"

A regex on its own cannot report each variable only once, but you can insert the matches into a Set to remove the duplicate entries.
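A minimal sketch of that idea, using the second pattern above, might look like the following. The class name UniqueVariables and the method name extract are assumptions made for this example. It uses a LinkedHashSet rather than a plain HashSet so that names come back in the order they first appear.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern suggested in this answer.
class UniqueVariables {

    // A letter followed by any number of letters or digits.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    // Returns each identifier once, in order of first appearance.
    // A null expression yields an empty array.
    static String[] extract(String expression) {
        Set<String> names = new LinkedHashSet<>();
        if (expression != null) {
            Matcher m = IDENTIFIER.matcher(expression);
            while (m.find()) {
                names.add(m.group()); // the Set silently ignores repeats
            }
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String formula = "BMI = ( Weight / ( Height * Height ) ) * 703";
        System.out.println(String.join(", ", extract(formula))); // BMI, Weight, Height
    }
}
```

One caveat with this sketch: the pattern is not anchored to token boundaries, so an input such as 2x would still yield x.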

4 Comments

Thank you, the regex worked. Thanks for taking the time and actually reading my question, and not voting it down like user EJB.
@AryanNaim: I suggest that you don't bash other users in the comments. Please edit it out.
Just for the sake of learning, I'll point out these expressions will not match variable names that are a single letter.
@Gene: Why did I forget that case? Thanks for the comment.
