C++ : String to Character Array conversion (non-Ascii characters removed)

Question

I have a payload string that I want to convert into a character array, after removing any non-ASCII characters from it. Here is my code:

bool invalidChar (char c) 
{  
    return !(c>=0 && c <256);   
} 
void stripUnicode(string &str) 
{ 
    str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());  
}

Payload_input is a string consisting of ASCII and non-ASCII characters:

stripUnicode(Payload_input);

char input[Payload_input.length()];
strcpy(input, Payload_input.c_str());

char chunk1[Payload_input.length()];
int counter1 = 0;

for (counter1 = 0; counter1 < size; counter1++)
{
    chunk1[counter1] = input[counter1];
}

Now, here is my string payload which I want to convert into char array:

--90B452BFFF3F395ABDC878D8BEDBD152
Content-Disposition: form-data; name="uploaddir"

language/2BB5B9330E/C/
--90B452BFFF3F395ABDC878D8BEDBD152
Content-Disposition: form-data; name="filename"; filename="lottery[1]20110727082525.jpg"
Content-Type: text/plain
Content-Transfer-Encoding: binary

JFIFddDucky<http://ns.adobe.com/xap/1.0/<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>

In the above string, the few characters after Content-Transfer-Encoding: binary appear as blocks on the Linux terminal (inside the blocks is written 0001 etc.).

When I print the characters (cout << chunk1[counter1]) after stripping the non-ASCII characters from the string, even some ASCII characters are omitted after the line Content-Transfer-Encoding: binary.

Please point out whether there is something wrong with my code.

Just a small thought: ASCII uses 7 bits. Maybe your char should satisfy 0 <= c < 128.
– tgmath, Commented Mar 3, 2014 at 9:21

Loghorn · Accepted Answer · 2014-03-03 09:26:46Z


The problem is that on Linux char is usually signed (on x86, for example), with the range -128 to 127. The c < 256 test is therefore always true, and your invalidChar function returns true only for negative values, that is, for every char that is not strictly ASCII. If you want to check for extended ASCII (0-255), your function is useless: every char value is in the extended ASCII set. However, since char is signed, you need to check for negative values.
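As a minimal sketch of a check that does not depend on whether char is signed, the byte can be converted to unsigned char before it is compared against the 7-bit limit mentioned in tgmath's comment. The names isNonAscii and stripNonAscii are illustrative and do not come from the original code.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true for any byte outside 7-bit ASCII (0..127).
// Converting to unsigned char first gives the same result whether plain
// char is signed or unsigned on the target platform.
bool isNonAscii(char c)
{
    return static_cast<unsigned char>(c) > 127;
}

// Erase-remove idiom, as in the question, but with the portable predicate.
void stripNonAscii(std::string &str)
{
    str.erase(std::remove_if(str.begin(), str.end(), isNonAscii), str.end());
}
```

Control bytes such as 0x00 or 0x01 are still ASCII, so this filter keeps them. If the payload contains zero bytes, functions such as strcpy stop copying at the first one.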


11 Comments

Baruch Over a year ago

char is not always signed. It is platform dependent.

Loghorn Over a year ago

right, but he talked about Linux. I'll edit my answer

Xara Over a year ago

Then do I need to change the ASCII range check to cover -127 to 0 as well as 0 to 127?

Loghorn Over a year ago

A string is an array of chars. A char can have a value (on Linux) from -128 to 127. It is essentially a byte, if you don't care about the sign. The extended ASCII set is the standard ASCII set (characters with values from 0 to 127) plus other characters in the range 128-255 (which, read as signed char, is -128 to -1). So your Payload_input already contains only chars in the extended ASCII set. There is no way to filter it, because by definition char (on most platforms) covers the whole extended ASCII set, nothing more and nothing less.

Loghorn Over a year ago

No: standard ASCII is 0-127, extended is 0-255. char (on most platforms) is a signed type covering the range -128 to +127. If you convert a signed char to an unsigned char, you get the range 0-255, which is the same as extended ASCII. So the non-negative values of char are in the standard ASCII range, and the negative and non-negative values together cover the whole extended ASCII range.
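The mapping described in these comments can be checked directly. The sketch below is only an illustration, and the helper names plainCharIsSigned and byteValue are made up for it. It reports whether plain char is signed on the current platform and returns the 0-255 value of a byte either way.

```cpp
#include <limits>

// Hypothetical helper: reports whether plain char is signed on this platform.
// The C++ standard leaves this implementation-defined.
bool plainCharIsSigned()
{
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: the 0..255 value of a byte, regardless of whether
// plain char is signed.
int byteValue(char c)
{
    return static_cast<unsigned char>(c);
}
```

For example, the byte 0xE9 has a byteValue of 233; read as a signed 8-bit char, the same byte is -23.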
