I'm using the window.atob('string') function to decode a string from base64 to a string. Now I wonder, is there any way to check that 'string' is actually valid base64? I would like to be notified if the string is not base64 so I can perform a different action.
- Is your question how to determine whether a string is valid base64, or whether you're looking at a string of base64 that has information encoded in it? That is a subtle difference: for the former there are a few excellent answers below; for the latter there is no deterministic answer (it's like asking if a sound is music or language). I'd therefore suggest replacing "in" with "valid" in your question title. – Philzen, Jul 9, 2022 at 15:18
14 Answers
Building on @anders-marzi-tornblad's answer, using the regex to make a simple true/false test for base64 validity is as easy as follows:
var base64regex = /^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/;
base64regex.test("SomeStringObviouslyNotBase64Encoded..."); // FALSE
base64regex.test("U29tZVN0cmluZ09idmlvdXNseU5vdEJhc2U2NEVuY29kZWQ="); // TRUE
Update 2021
- Following the comments below, it transpires that this regex-based solution provides a more accurate check than simply trying atob, because the latter doesn't check for = padding. According to RFC 4648, = padding may only be omitted for base16 encoding or if the data length is known implicitly.
- The regex-based solution also seems to be the fastest, as hinted by kai. As jsperf seems flaky at the moment, I made a new test on jsbench which confirms this.
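To make the padding point concrete, here is a small sketch that runs both checks on the same data with and without padding. The helper name decodesWithAtob is made up for this illustration, and a global atob is assumed (browsers, or Node.js 16 and later).

```javascript
// Strict check: full 4-character groups, padding required on the last group.
const base64regex = /^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/;

// Hypothetical helper: reports whether atob accepts the input at all.
function decodesWithAtob(str) {
  try {
    atob(str);
    return true;
  } catch (e) {
    return false;
  }
}

const padded = "U29tZQ==";  // canonical encoding of "Some"
const unpadded = "U29tZQ";  // same data with the padding stripped

console.log(base64regex.test(padded), decodesWithAtob(padded));     // true true
console.log(base64regex.test(unpadded), decodesWithAtob(unpadded)); // false true
```

The second line is the case the update is about: atob decodes the unpadded string without complaint, while the regex rejects it.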
17 Comments
- window.btoa("\u009a)Ý\u0099ªl")
- atob is not a good option for testing whether a string is base64-encoded, as it's too lax. It allows base64-encoded strings without the required = or == padding. Base64-encoded strings are supposed to have lengths that are multiples of 4.
- base64regex.test('fuel') === true, as fuel is a perfectly valid base64 string that decodes to ~ç¥. I suspect you're looking for a function to tell whether you're looking at an encoded or a vanilla payload, but such a (non-AI-based and fully deterministic) function does not exist, though I'd be delighted to be proven wrong. Still, that would be outside the scope of this thread, as the OP clearly asked for a method to tell whether a string contains valid base64, which fuel is according to RFC 4648.
If you want to check whether it can be decoded or not, you can simply try decoding it and see whether it failed:
try {
window.atob(str);
} catch(e) {
// something failed
// if you want to be specific and only catch the error which means
// the base 64 was invalid, then check for 'e.code === 5'.
// (because 'DOMException.INVALID_CHARACTER_ERR === 5')
}
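As a rough sketch of the same idea packaged as a function: the name tryDecodeBase64 is invented here, and instead of the numeric code it checks the exception name, which for this error is "InvalidCharacterError". A global atob is assumed (browsers, or Node.js 16 and later).

```javascript
// Returns the decoded string, or null when atob rejects the input.
function tryDecodeBase64(str) {
  try {
    return atob(str);
  } catch (e) {
    // atob signals bad input with a DOMException named "InvalidCharacterError"
    if (e.name === "InvalidCharacterError") {
      return null;
    }
    throw e; // anything else is unexpected, so let it propagate
  }
}

console.log(tryDecodeBase64("SGVsbG8="));    // Hello
console.log(tryDecodeBase64("not base64!")); // null
```

Returning null rather than false keeps the decoded value available to the caller, so the string does not have to be decoded twice.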
7 Comments
This should do the trick.
function isBase64(str) {
if (str ==='' || str.trim() ===''){ return false; }
try {
return btoa(atob(str)) == str;
} catch (err) {
return false;
}
}
7 Comments
- test isn't valid base64?
If "valid" means "only has base64 chars in it" then check against /^[A-Za-z0-9+/=]+$/.
If "valid" means a "legal" base64-encoded string then you should check for the = at the end.
If "valid" means it's something reasonable after decoding then it requires domain knowledge.
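To illustrate the third case, here is a sketch under one assumed piece of domain knowledge: that a genuine payload always decodes to JSON. That rule, and the name isBase64Json, are assumptions made up for this example; substitute whatever your own data is supposed to look like.

```javascript
// Hypothetical domain rule: a payload only counts as valid if it decodes to JSON.
// Assumes global atob/btoa (browsers, or Node.js 16+).
function isBase64Json(str) {
  try {
    JSON.parse(atob(str));
    return true;
  } catch (e) {
    return false; // either not decodable, or the decoded text is not JSON
  }
}

console.log(isBase64Json(btoa('{"a":1}'))); // true
console.log(isBase64Json("demo"));          // false: decodes fine, but not to JSON
```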
5 Comments
- + and / and possibly = at the end.
- = padding is not always there.
I would use a regular expression for that. Try this one:
/^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/
Explanation:
^ # Start of input
([0-9a-zA-Z+/]{4})* # Groups of 4 valid characters decode
# to 24 bits of data for each group
( # Either ending with:
([0-9a-zA-Z+/]{2}==) # two valid characters followed by ==
| # , or
([0-9a-zA-Z+/]{3}=) # three valid characters followed by =
)? # , or nothing
$ # End of input
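Each branch of the explanation can be exercised with a short string. This is only an illustrative sketch; the sample inputs are the usual encodings of "Man", "Ma" and "M".

```javascript
const base64regex = /^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/;

console.log(base64regex.test("TWFu")); // true: one full group of 4, no padding needed
console.log(base64regex.test("TWE=")); // true: three characters followed by =
console.log(base64regex.test("TQ==")); // true: two characters followed by ==
console.log(base64regex.test("TWE"));  // false: the padding is missing
console.log(base64regex.test("TQ=a")); // false: = may only appear at the end
console.log(base64regex.test(""));     // true: the empty string also matches
```

Note the last line: because every part of the pattern is optional, an empty string passes, so reject it separately if that matters to you.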
3 Comments
- SomeStringObviouslyNotBase64Encoded tests FALSE, although it's valid base64: atob("SomeStringObviouslyNotBase64Encoded") returns JJÚâ¾*.²\¢ÐZ±î¸w(uç. Is it possible to improve this regex so that it is 100% accurate?
- window.atob that accepts strings that are not completely correct. Your example has exactly 35 characters, and should be padded with exactly one =. Quote from Wikipedia: "...when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four."
This method attempts to decode, then re-encode, and compare the result to the original. It could also be combined with the other answers for environments that throw on parsing errors. It's also possible to have a string that looks like valid base64 from a regex point of view but is not actual base64.
if(btoa(atob(str))==str){
//...
}
2 Comments
- If str is not valid base64, atob(str) will throw an uncaught error. A try..catch solution would seem better.
This is how it's done in one of my favorite validation libs:
const notBase64 = /[^A-Z0-9+\/=]/i;
export default function isBase64(str) {
assertString(str); // remove this line and make sure you pass in a string
const len = str.length;
if (!len || len % 4 !== 0 || notBase64.test(str)) {
return false;
}
const firstPaddingChar = str.indexOf('=');
return firstPaddingChar === -1 ||
firstPaddingChar === len - 1 ||
(firstPaddingChar === len - 2 && str[len - 1] === '=');
}
https://github.com/chriso/validator.js/blob/master/src/lib/isBase64.js
4 Comments
- =.
For me, a string is likely base64-encoded if:
- its length is divisible by 4
- it uses only A-Za-z0-9+/=
- it only uses = at the end (0-2 chars)
so the code would be
function isBase64(str)
{
return str.length % 4 == 0 && /^[A-Za-z0-9+/]+[=]{0,2}$/.test(str);
}
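The check above insists on padding. If your input may have had its padding stripped, one possible approach is to restore it before validating. This is a sketch only, and isBase64AllowingMissingPadding is a made-up name.

```javascript
// Same check as above: length divisible by 4, valid alphabet, up to two trailing "=".
function isBase64(str) {
  return str.length % 4 === 0 && /^[A-Za-z0-9+/]+[=]{0,2}$/.test(str);
}

// Hypothetical helper: restore stripped padding, then run the strict check.
function isBase64AllowingMissingPadding(str) {
  if (str.includes("=")) {
    return isBase64(str); // padding is present, so it has to be complete
  }
  const remainder = str.length % 4;
  if (remainder === 1) {
    return false; // no byte sequence encodes to this length
  }
  const padded = remainder === 0 ? str : str + "=".repeat(4 - remainder);
  return isBase64(padded);
}

console.log(isBase64("U29tZQ"));                        // false
console.log(isBase64AllowingMissingPadding("U29tZQ"));  // true
console.log(isBase64AllowingMissingPadding("U29tZQ=")); // false: partial padding
```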
3 Comments
- isBase64("SomeStringObviouslyNotBase64Encoded") returns FALSE although it's valid base64
- atob and btoa, as well as the recommended Buffer.from("...", "base64"), do not require padding with = as far as I know. I have seen many projects in which the padding = chars are removed for various reasons that are beyond me, and such strings can nevertheless be base64-decoded in JS without throwing an error. Your answer is to the point of the question; I'm just leaving this here for any user who wants to check whether a string can be decoded rather than whether it matches the actual RFC definition.
Implementation in Node.js (validates not just the allowed characters but the base64 string as a whole):
const validateBase64 = function(encoded1) {
var decoded1 = Buffer.from(encoded1, 'base64').toString('utf8');
var encoded2 = Buffer.from(decoded1, 'binary').toString('base64');
return encoded1 == encoded2;
}
1 Comment
I have tried the below answers but there are some issues.
var base64regex = /^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/;
base64regex.test(value)
When using this, it returns true for strings made up only of capital letters, such as "BBBB", and it also returns true for "4444".
I added some code to work correctly for me.
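// decodeIfBase64 is a placeholder name; the original snippet left the function anonymous
function decodeIfBase64(value) {
var base64regex = /^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/;
// Reject purely numeric and purely alphabetic strings even though they match the regex
if (base64regex.test(value) && isNaN(value) && !/^[a-zA-Z]+$/.test(value)) {
return decodeURIComponent(escape(window.atob(value)));
}
}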
I know it's late, but I tried to make it simple here:
function isBase64(encodedString) {
var regexBase64 = /^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/;
return regexBase64.test(encodedString); // returns true if it's a base64 string
}
7 Comments
- isBase64("SomeStringObviouslyNotBase64Encoded") returns FALSE although it's valid base64
- atob specifically. Buffer in Node.js also behaves similarly.
- atob("SomeStringObviouslyNotBase64Encoded") in the browser console. The result is JJÚâ¾*.²\¢ÐZ±î¸w(uç (SO trims the spaces)
- The specified value is a valid Base64 string. To decode it, use the Base64 decoder.
Throwing my results into the fray here. In my case, there was a string that was not meant as base64 but was valid base64, so it was getting decoded into gibberish (e.g. yyyyyyyy is valid base64 according to the usual regex).
My testing resulted in first checking whether the string was a valid base64 string using the regex others shared here, then decoding it and testing whether the result was a valid ASCII string, since (in my case) I should only get ASCII characters back. (This can probably be extended to include other characters that may not fall into the ASCII range.)
This is a bit of a mix of multiple answers.
let base64regex = /^([0-9a-zA-Z+/]{4})*(([0-9a-zA-Z+/]{2}==)|([0-9a-zA-Z+/]{3}=))?$/;
function isBase64(str) {
if (str ==='' || str.trim() ===''){ return false; }
try {
if (base64regex.test(str)) {
return /^[\x00-\x7F]*$/.test(atob(str));
} else {
return false;
}
} catch (err) {
// atob threw, so the input is not decodable base64
return false;
}
}
As always with my JavaScript answers, I have no idea what I am doing, so there might be a better way to write this. But it works for my needs and covers the case where you have a string that isn't supposed to be base64 but is valid and still decodes as base64.
Try the code below, where str is the string you want to check.
Buffer.from(str, 'base64').toString('base64') === str
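For reference, here is that one-liner wrapped in a function with a few sample inputs. This is a Node.js-only sketch, and the name isCanonicalBase64 is invented for the example. Buffer.from does not throw on malformed input, so the comparison with the re-encoded result is what does the actual checking.

```javascript
// Node.js only: decode leniently, re-encode, and require an exact match.
function isCanonicalBase64(str) {
  return Buffer.from(str, "base64").toString("base64") === str;
}

console.log(isCanonicalBase64("SGVsbG8="));    // true
console.log(isCanonicalBase64("SGVsbG8"));     // false: padding was stripped
console.log(isCanonicalBase64("not base64!")); // false
console.log(isCanonicalBase64(""));            // true: empty input round-trips
```

As with the regex, the empty string passes, so reject it up front if you need to.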
1 Comment
All the answers are wrong when you test them with a word like "demo", which they report as base64:
function isBase64(str) {
try {
return btoa(atob(str)) === str;
} catch (e) {
return false;
}
}
console.log(isBase64("demo")) // true
So I asked Copilot, and this is the answer:
function mightBeBase64(str) {
// Base64 strings are usually a multiple of 4 in length
if (str.length % 4 !== 0) {
return false;
}
// Check for base64 character set
if (!/^[A-Za-z0-9+/]+={0,2}$/.test(str)) {
return false;
}
// Attempt to decode and check if the result is a valid string
try {
const decoded = atob(str);
// Check if the decoded string contains only printable characters
if (/^[\x20-\x7E]*$/.test(decoded)) {
return true;
}
} catch (e) {
return false;
}
return false;
}
// Example usage:
console.log(mightBeBase64("demo")); // Should return false
console.log(mightBeBase64("SGVsbG8sIENvcGlsb3Qh")); // Should return true