
I have a payload string which I want to convert into a character array, removing any non-ASCII characters from it. Here is my code:

#include <algorithm>
#include <string>
using namespace std;

bool invalidChar(char c)
{
    return !(c >= 0 && c < 256);
}

void stripUnicode(string &str)
{
    str.erase(remove_if(str.begin(), str.end(), invalidChar), str.end());
}

Payload_input is a string consisting of ASCII and non-ASCII characters:

stripUnicode(Payload_input);

char input[Payload_input.length()];
strcpy(input, Payload_input.c_str());

char chunk1[Payload_input.length()];
int counter1 = 0;

for (counter1 = 0; counter1 < size; counter1++)
{
    chunk1[counter1] = input[counter1];
}

Now, here is the payload string I want to convert into a char array:

--90B452BFFF3F395ABDC878D8BEDBD152
Content-Disposition: form-data; name="uploaddir"

language/2BB5B9330E/C/
--90B452BFFF3F395ABDC878D8BEDBD152
Content-Disposition: form-data; name="filename"; filename="lottery[1]20110727082525.jpg"
Content-Type: text/plain
Content-Transfer-Encoding: binary

JFIFddDucky<http://ns.adobe.com/xap/1.0/<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>

In the string above, a few characters after Content-Transfer-Encoding: binary are displayed as boxes on the Linux terminal (inside each box is a code such as 0001).

When I print the characters (cout << chunk1[counter1]) after stripping the non-ASCII characters from the string, even some ASCII characters after the line Content-Transfer-Encoding: binary are missing.

Can anyone point out what is wrong with my code?

  • Just a small thought: ASCII uses 7 bits. Maybe your char should be in the range 0 <= c < 128. Commented Mar 3, 2014 at 9:21
  • I want to cater for all the extended ASCII characters. Commented Mar 3, 2014 at 9:23

1 Answer


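The problem is that on Linux (at least on x86 and x86-64) char is signed, with a range of -128 to 127, so your invalidChar function returns true for every char that is not strictly ASCII: bytes 128-255 are stored as negative values and fail the c >= 0 test, while c < 256 is always true. If you want to check for extended ASCII (0-255), the function is pointless, since every char value already belongs to the extended ASCII set; just keep in mind that, because char is signed, the upper half of that set shows up as negative values.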


Comments

char is not always signed. It is platform dependent.
Right, but he asked about Linux. I'll edit my answer.
Then do I need to change the range check to -127 to 0 and 0 to 127?
A string is an array of chars. A char can have a value (on Linux) from -128 to 127; it is essentially a byte, if you don't care about the sign. The extended ASCII set is standard ASCII (values 0 to 127) plus other characters in the range 128-255, which, taken as signed char, are -128 to -1. So your Payload_input already contains only chars from the extended ASCII set, and there is nothing to filter: by definition, char (on most platforms) covers the whole extended ASCII set, nothing more and nothing less.
No: standard ASCII is 0-127, extended is 0-255. char (on most platforms) is signed, covering the range -128 to +127. If you convert a signed char to an unsigned char, you get the range 0-255, the same as extended ASCII. So non-negative char values are in the standard ASCII range, and negative and non-negative values together cover the whole extended ASCII set.
