14

I need to detect strings with the form @base64 (e.g. @VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==) in my application.
The @ has to be at the beginning and the charset for base64 encoded strings is a-z, A-Z, 0-9, +, / and =.

What would the appropriate regular expression to detect them be?

3 Answers

16

Something like this should do (does not check for proper length!):

^@[a-zA-Z0-9+/]+={0,2}$

The length of any base64 encoded string must be a multiple of 4, hence the additional = padding characters at the end.

For a solution that also checks for proper length, see: RegEx to parse or validate Base64 data

A quick explanation of the regex from the linked answer:

^@ #match "@" at beginning of string
(?:[A-Za-z0-9+/]{4})* #match any number of 4-letter blocks of the base64 char set
(?:
    [A-Za-z0-9+/]{2}== #match 2-letter block of the base64 char set followed by "==", together forming a 4-letter block
| # or
    [A-Za-z0-9+/]{3}= #match 3-letter block of the base64 char set followed by "=", together forming a 4-letter block
)?
$ #match end of string
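
As a rough illustration, the annotated pattern above can be compiled in Python with `re.VERBOSE`, which allows the same kind of inline comments. The helper name `is_tagged_base64` and the short test strings are made up for this sketch; the long sample is the example from the question. Note that Python's `$` also matches just before a trailing newline.

```python
import re

# Length-checking pattern from the explanation above, compiled with
# re.VERBOSE so whitespace and "#" comments inside the pattern are ignored.
STRICT = re.compile(r"""
    ^@                          # "@" at the beginning of the string
    (?:[A-Za-z0-9+/]{4})*       # any number of full 4-character blocks
    (?:
        [A-Za-z0-9+/]{2}==      # 2 characters plus "==" padding
      |                         # or
        [A-Za-z0-9+/]{3}=       # 3 characters plus "=" padding
    )?
    $                           # end of the string
""", re.VERBOSE)

# Shorter pattern: checks the character set and padding, not the length.
LOOSE = re.compile(r"^@[a-zA-Z0-9+/]+={0,2}$")


def is_tagged_base64(text):
    """Hypothetical helper: True if text is "@" plus well-formed base64."""
    return STRICT.match(text) is not None


print(is_tagged_base64("@VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw=="))  # True
print(is_tagged_base64("@abc"))                                   # False
print(LOOSE.match("@abc") is not None)                            # True
```

The last two lines show the difference between the two patterns: `@abc` has the right character set but a length that is not a multiple of 4, so only the shorter pattern accepts it.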

9 Comments

Something I forgot to mention is that base64 encoded strings have "=" characters only at the end, and at most two of them. Is it possible to check for this?
^@(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$ would be correct then?
Yes and no. If you trust the source of anything that starts with the @ symbol, then yes, that should be good enough. But I'm assuming you are trying to detect it because the source might not be valid, in which case even something like @HeyThisIsMyTweeterHandle might be detected as base64. Those are things you should consider. If you have control of both ends of the communication, I would restructure it a bit. It might also help to simply check whether the first char is @ and, if so, whether base64_decode($str, true) !== false before decoding. No regex required.
Well, if you basically just want to check for character set correctness and some basic prefix/suffix checking, then my short one would suffice. The longer one however also checks against proper length.
That would be a nice solution; the problem is that I'm trying to extract the base64 from a context (in the middle of a text the user submits, for example). And yes, @HeyThisIsMyTweeterHandle would validate as well, but that's not a problem for me, as long as it is valid base64 (with proper length as well).
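
For the extraction case described in the comments above, one possible Python sketch drops the `^` and `$` anchors, scans the text for candidates, and passes each one to a decoder as a final check (roughly the counterpart of the `base64_decode($str, true)` suggestion). The function name `extract_base64_tokens`, the lookbehind/lookahead boundaries, and the sample text are assumptions made for illustration, not part of the answers.

```python
import base64
import binascii
import re

# Unanchored variant of the length-checking pattern. The lookbehind and
# lookahead are assumptions: they stop a match from starting in the middle
# of a word or ending in the middle of a longer run of base64 characters.
CANDIDATE = re.compile(
    r"(?<![A-Za-z0-9+/=@])@"
    r"((?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?)"
    r"(?![A-Za-z0-9+/=])"
)


def extract_base64_tokens(text):
    """Hypothetical helper: decode every @base64 token found in text."""
    decoded = []
    for match in CANDIDATE.finditer(text):
        payload = match.group(1)
        if not payload:
            continue  # a lone "@" carries no data
        try:
            # validate=True rejects characters outside the base64 alphabet.
            decoded.append(base64.b64decode(payload, validate=True))
        except binascii.Error:
            continue
    return decoded


sample = "Hello @VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw== and mail me at a@b.c"
print(extract_base64_tokens(sample))  # [b'This is an encoded string']
```

The `@` in `a@b.c` is skipped because it directly follows a letter. As noted in the comments, an ordinary word such as @HeyThisIsMyTweeterHandle would still be extracted, since its length happens to be a multiple of 4.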
4

try with:

^@(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

=> RegEx to parse or validate Base64 data

5 Comments

@PierrOz probably extracted from stackoverflow.com/questions/475074/…, but still I'm having a hard time to see what's going on there
@Federico-Quagliotto how about linking to Gumbo's answer instead of blatantly stealing it without giving credit where credit is due?
No stealing; I simply checked my archive of useful regexes. I use base64 for many things, that's all. I can see that the regex is pretty much the same; sorry for not having checked Stack Overflow first.
@PierrOz: see my answer for an explanation of the regex.
@FedericoQuagliotto: Sorry about the accusation then. Was the first result to show up and looked like a blatant steal.
1

Here's an alternative regular expression:

^@(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$

It satisfies the following conditions:

  • The string length after the @ sign must be a multiple of four - (?=(.{4})*$)
  • The content must be alphanumeric characters or + or / - [A-Za-z0-9+/]*
  • It can have up to two padding (=) characters on the end - ={0,2}
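
A small Python check of this alternative, assuming the lookahead is written without an inner `^`, exactly as in the full pattern. The name `ALT` and the sample strings are chosen here for illustration.

```python
import re

# The lookahead asserts that the text after "@" has a length that is a
# multiple of four; the rest checks the character set and the padding.
ALT = re.compile(r"^@(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$")

samples = ["@QUJD", "@QUI=", "@QQ==", "@QUJ", "@QU=J", "QUJD"]
results = {s: ALT.match(s) is not None for s in samples}

for s, ok in results.items():
    print(s, ok)
```

The first three samples are accepted. `@QUJ` fails the length lookahead, `@QU=J` has padding in the middle, and `QUJD` lacks the leading @.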
