0

I am making a bot that should keep track of group expenses. I need to be able to tell the bot the amount that was paid and the participants (i.e. the list of people that the user typing the input paid for). Each participant must be a sequence of exactly two letters taken from the uppercase Latin alphabet (no digits or other symbols are allowed).

EXAMPLE: Say that I go out for lunch with my friends FH, GT, YU, WQ and CS. In order to tell the bot about our lunch together I will type the total amount that was paid, followed by '|', followed by the relevant people who took part in the event other than me (so FH, GT, YU, WQ, CS). If I want (this is not required though) I can also put a space after the list of names and write the name of the event: if present, the name of the event must always be enclosed in double quotes (").

For example, this is a valid input:

 65|FH,GT,YU,WQ,CS "lunch out"

So the format is: number, |, names (separated by commas), space, name of the event (the last two are optional).

The number must always be positive (for obvious reasons) and it can either be an integer (e.g. 65) or a decimal (e.g. 65.7, 65.32 etc). If the number is a decimal number, it may have at most 2 digits after the decimal point.

All of these are also valid inputs:

65|FH,GT,YU,WQ,CS 
34.56|FH,GT "club night"
120.7|FH,GT,KM,LW,AS,XZ,PO "cinema tickets"

The same participant can't be mentioned more than once, so the following input is not valid.

65|FH,GT,YU,WQ,CS,GT

In short: the command should start with an amount followed by the separator |, followed by the list of people the user paid for. It's optional to insert a message that describes the expense.

There are infinitely many valid inputs. They are all different, but they all follow the above rules (no participant is mentioned twice, participants are separated by commas, the amount is either an integer or a decimal with at most 2 digits after the decimal point, etc.).

However, I cannot seem to "capture" what they all have in common (the "format" that follows the rules I stated) so that the bot can distinguish valid inputs from invalid ones. I was thinking of using a regular expression. I am not very familiar with regular expressions, but it seems to me that a regex could not capture all the possible forms the input can take (for example, the number of names, the number of decimal digits in the amount, the optional name for the event and so on).

How should I proceed?

3
  • This is the closest I could get: ^[\d.]+\|([A-Z]{2},)*([A-Z]{2})(?: "[\s\w]+")?$. This won't handle duplicate character groups, but the RegEx gurus will probably know a way to handle that too. Here's a demo of the above query. Commented Jun 3, 2019 at 13:58
  • Thank you so much @SaschaM78! This is great. I have noticed that it doesn't rule out numbers that have more than 2 digits after the decimal point (e.g. 54.678) and that it doesn't rule out things like "50." or "32." (that is, numbers with a decimal point but no decimal digits). Do you have any idea how your regex could be edited to rule out those cases? Commented Jun 3, 2019 at 14:16
  • Yes, you are right, I missed that part. The updated Regex would be: ^\d+(?:\.\d{1,2}){0,1}\|([A-Z]{2},)*([A-Z]{2})(?: "[\s\w]+")?$ (also updated in the demo). Commented Jun 3, 2019 at 14:22

2 Answers

1

It may be possible to handle the duplicates with a regex, but to keep things simple I will use split and a loop instead:

var txt = `65|FH,GT,YU,WQ,CS
34.56|FH,GT "club night"
65|FH,GT,YU,WQ,CS,GT "this is not valid"
65|AH,GT,YU,AH
120.7|FH,GT,KM,LW,AS,XZ,PO "cinema tickets"`

var lines = {
  valid: [],
  notValid: []
};

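txt.split("\n").forEach(line => {
  var isValid = true,
    persons = [],
    // Keep only the participant list: drop the amount and the optional quoted event name
    l = line.trim().replace(/.*\|([\w,]+)(\s".*)?/, "$1");

  // A name that is already in `persons` is a duplicate
  l.split(/[,\s]/).forEach(p => {
    if (persons.includes(p))
      isValid = false;
    persons.push(p);
  });

  // Only duplicates are checked here, not the amount or the overall format
  if (isValid)
    lines.valid.push(line);
  else
    lines.notValid.push(line);
});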

console.log(lines)
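
The loop above only detects repeated participants; it does not check the amount or the overall shape of the line. A minimal sketch that adds those checks, using a format regex adapted from the one posted in the comments on the question, might look like this. The name `parseExpense` and the shape of the returned object are hypothetical, and rejecting a zero amount is an assumption about what "positive" means in the question.

```javascript
// Format check adapted from the regex in the question's comments:
// amount with at most 2 decimals, '|', comma-separated two-letter names,
// then an optional quoted event name.
const FORMAT = /^(\d+(?:\.\d{1,2})?)\|([A-Z]{2}(?:,[A-Z]{2})*)(?: "([\s\w]+)")?$/;

// Hypothetical helper: returns the parsed parts, or null if the line is invalid.
function parseExpense(line) {
  const m = FORMAT.exec(line.trim());
  if (m === null) return null;

  const amount = Number(m[1]);
  if (!(amount > 0)) return null; // assumption: "positive" excludes 0 and 0.00

  const participants = m[2].split(",");
  // A Set drops repeats, so a size mismatch means a name appeared twice.
  if (new Set(participants).size !== participants.length) return null;

  return { amount, participants, event: m[3] === undefined ? null : m[3] };
}

console.log(parseExpense('34.56|FH,GT "club night"'));
console.log(parseExpense("65|FH,GT,YU,WQ,CS,GT")); // null: GT is repeated
console.log(parseExpense("54.678|FH,GT"));         // null: three decimals
```

Doing the duplicate check in plain code rather than in the regex keeps the pattern short and makes it easy to report which rule was broken.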

0

This is an interesting problem. We can approach it by writing a pattern for each part of the input, starting with the number followed by a pipe:

(\d+(\.\d+)?)\|

a two-letter name that is repeated later on (the case we want to reject):

(([A-Z]{2}),?).*?(\1)

a two-letter name on its own (the normal case):

(([A-Z]{2}),?)

and the optional words in double quotes:

\s+"[\w\s]+"

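We can then combine them with alternation. Note that in the combined pattern the two-letter name of the second alternative is capture group 4, so the backreference is \4:

(\d+(\.\d+)?)\||(([A-Z]{2}),?).*?(\4)|(([A-Z]{2}),?)|\s+"[\w\s]+"

Whenever the group captured by the backreference in the second alternative (group 5) is not undefined, the string is invalid; otherwise it is valid, and the rest of the problem can be handled in script.

Test

const regex = /(\d+(\.\d+)?)\||(([A-Z]{2}),?).*?(\4)|(([A-Z]{2}),?)|\s+"[\w\s]+"/gm;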
const str = `65|FH,GT,YU,WQ,CS 
34.56|FH,GT "club night"
120.7|FH,GT,KM,LW,AS,XZ,PO "cinema tickets"

65|FH,GT,YU,WQ,CS,GT`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
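
If a single pass/fail answer is enough, the backreference idea can also be folded into one anchored pattern by placing it inside a negative lookahead. The following is a sketch under assumptions rather than a definitive pattern: the names `VALID` and `isValid` are hypothetical, the amount part borrows the at-most-two-decimals pattern from the comments on the question, and a zero amount is not rejected.

```javascript
// Anchored pattern. The negative lookahead right after '|' rejects the line
// when some two-letter name occurs twice before the first space, i.e. inside
// the participant list (the quoted event name is not inspected).
const VALID = /^\d+(?:\.\d{1,2})?\|(?![^ ]*\b([A-Z]{2})\b[^ ]*\b\1\b)[A-Z]{2}(?:,[A-Z]{2})*(?: "[\s\w]+")?$/;

// Hypothetical helper wrapping the pattern.
const isValid = (line) => VALID.test(line.trim());

console.log(isValid('120.7|FH,GT,KM,LW,AS,XZ,PO "cinema tickets"')); // true
console.log(isValid("65|FH,GT,YU,WQ,CS,GT"));                        // false
```

Because the pattern is anchored with ^ and $, one call to `test` decides the whole line, so there is no need to loop over matches and inspect individual groups.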
