Unicode characters returning from R.NET

Question

I'm returning a character vector from an R function to C# using R.NET. The only problem is that Unicode characters, such as Greek letters, are being lost. The following line is an example of the code I'm using:

CharacterVector cvAll = results[5].AsList().AsCharacter();

Here, results is a list of results returned by the R function. R also writes the characters to a text file, and they display fine in Notepad and other editors. Can I get R.NET to return the characters correctly?

Yass T · Accepted Answer · 2018-12-24 16:18:11Z

It looks like you ran into an open issue with RDotNet: https://github.com/jmp75/rdotnet/issues/25

Unicode characters don't seem to be supported yet. I ran into the same issue when calling the engine.CreateDataFrame() method. It returned a DataFrame in which all my accented strings were wrong.

There seems to be a workaround, though: when calling RDotNet functions, if I pass strings whose UTF-8 bytes have been re-decoded with my computer's default encoding (Windows ANSI), R takes them and gives correctly interpreted accented strings back to C#. The conversion from UTF-8 is the important part. I don't know exactly why this works. It might have something to do with .NET strings being UTF-16 internally (cf. http://csharpindepth.com/Articles/General/Strings.aspx), hence the conversion from UTF-8 to the default ANSI encoding that seems to work.

Here is an ugly example: when I build an RDotNet DataFrame, I convert every string in a CharacterVector to an ANSI-encoded one (from UTF-8):

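try
{
    // uneColonne holds one column's values as object[]; re-encode each string before handing it to R.NET.
    string[] colAsStrings = Array.ConvertAll<object, string>(uneColonne, s => StringEncodingHelper.EncodeToDefaultFromUTF8((string)s));
    correctedDataArray[i] = colAsStrings;
    columnConverted = true;
}
// (the matching catch block is not shown in this excerpt)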

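Here is the static method used for the conversion:

// Requires: using System.Text;
// Reinterprets the UTF-8 bytes of the input as text in the system default (ANSI) code page.
public static string EncodeToDefaultFromUTF8(string stringToEncode)
{
    byte[] utf8EncodedBytes = Encoding.UTF8.GetBytes(stringToEncode);

    return Encoding.Default.GetString(utf8EncodedBytes);
}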
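
Since the question is about strings coming back from R, the opposite direction may be the relevant one. The following is only a sketch, under the unconfirmed assumption that the returned strings are UTF-8 bytes that were decoded with an ANSI code page somewhere along the way. EncodingRepairSketch and DecodeUtf8ReadAs are hypothetical names, not part of R.NET, and the demo uses Latin-1 only to simulate the mix-up.

```csharp
using System;
using System.Text;

public static class EncodingRepairSketch
{
    // Hypothetical helper (not part of R.NET): undoes a "UTF-8 read as ANSI" mix-up.
    // wrongEncoding is the code page the UTF-8 bytes were mistakenly decoded with.
    public static string DecodeUtf8ReadAs(string garbled, Encoding wrongEncoding)
    {
        // Recover the original bytes by re-encoding with the wrong code page...
        byte[] rawBytes = wrongEncoding.GetBytes(garbled);
        // ...then decode those bytes as the UTF-8 they really were.
        return Encoding.UTF8.GetString(rawBytes);
    }

    public static void Main()
    {
        // Simulate the suspected corruption with Latin-1, chosen only because it
        // maps every byte to a character; the real code page is machine-specific.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string garbled = latin1.GetString(Encoding.UTF8.GetBytes("αβγ"));

        string repaired = DecodeUtf8ReadAs(garbled, latin1);
        Console.WriteLine(repaired == "αβγ"); // true when the round trip restores the Greek letters
    }
}
```

If the assumption holds, the helper could be applied to each string read from cvAll, passing Encoding.Default as wrongEncoding on .NET Framework. On .NET Core and .NET 5+, Encoding.Default is UTF-8, so an explicit code page would have to be passed instead. The repair is only lossless when the code page maps every byte back to itself, which is not guaranteed for every ANSI code page.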

2 Comments

Yass T Over a year ago

It is worth noting that I encountered the issue with RDotNet objects, functions and string manipulation, not with R itself. I noticed the bad strings when calling the RDotNet CreateDataFrame() function, which in my code came before any interaction with R. So, relating this to your problem, maybe you are losing the Greek characters when transmitting data to an R function, and then getting results back from R with strings that are already bad. That was the case for me, anyway. I hope this is clear; sorry for my bad English.

Jennifer B Over a year ago

I'm actually reading text files containing the characters within R and then passing the results back to C#. The Greek letters are fine within R, but are somehow "corrupted" when the results are returned from RDotNet. I think it must just be the open issue that you report above. I could read the files in C#, but there are some functions in R that I'm using, e.g. to convert a dtm from tidy to non-tidy format. If I just read the tidy dtm in C#, I won't be able to do this so easily, but maybe I could read it in C# and convert as you suggest (EncodeToDefaultFromUTF8) before passing it to R?
