I'm returning a character vector from an R function to C# using R.NET. The only problem is that Unicode characters, such as Greek letters, are being lost. The following line shows the code I'm using:

CharacterVector cvAll = results[5].AsList().AsCharacter();

Here, results is the list returned by the R function. R also writes the characters to a text file, where they display correctly in Notepad and other editors. Can I get R.NET to return the characters correctly?

1 Answer

It looks like you ran into an open issue in RDotNet: https://github.com/jmp75/rdotnet/issues/25

Unicode characters don't seem to be supported yet. I hit the same problem when calling engine.CreateDataFrame(): it returned a DataFrame in which all my accented strings were wrong.

There seems to be a workaround, though. If I take the UTF-8 bytes of each string and reinterpret them in my computer's default encoding (Windows ANSI) before passing the string to RDotNet (starting from UTF-8 is important), R accepts them and hands correctly interpreted accented strings back to C#. I don't know exactly why this works. It may be related to .NET strings being UTF-16 internally (see http://csharpindepth.com/Articles/General/Strings.aspx), which would explain why the conversion from UTF-8 to the default ANSI encoding helps.

Here is an ugly example: when building an RDotNet DataFrame, I convert every string in a CharacterVector from UTF-8 to the default ANSI encoding:

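try
{
    // Re-encode every cell of the column (uneColonne) before handing it to RDotNet.
    string[] colAsStrings = Array.ConvertAll<object, string>(
        uneColonne, s => StringEncodingHelper.EncodeToDefaultFromUTF8((string)s));
    correctedDataArray[i] = colAsStrings;
    columnConverted = true;
}
catch (InvalidCastException)
{
    // Assumption: the original snippet omitted its catch clause; a non-string cell lands here.
    columnConverted = false;
}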

Here is the static method used for conversion :

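public static string EncodeToDefaultFromUTF8(string stringToEncode)
{
    // Get the UTF-8 bytes of the .NET (UTF-16) string...
    byte[] utf8EncodedBytes = Encoding.UTF8.GetBytes(stringToEncode);

    // ...and reinterpret them in the system default code page (needs using System.Text).
    // Note: on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so this is effectively a no-op there.
    return Encoding.Default.GetString(utf8EncodedBytes);
}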
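
To make it easier to see what this conversion actually does to a string, here is a minimal, self-contained sketch. It is an illustration under assumptions, not part of RDotNet: the class name EncodingRoundTrip and both method names are hypothetical, and ISO-8859-1 stands in for Encoding.Default so that the result does not depend on the machine's code page. The second method is the inverse operation, which might be worth trying on strings that come back garbled from R (untested against RDotNet itself).

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Assumption: ISO-8859-1 is used as a stand-in for Encoding.Default.
    // It maps every byte to exactly one character, so the round trip is lossless.
    private static readonly Encoding SingleByte = Encoding.GetEncoding("iso-8859-1");

    // Same idea as EncodeToDefaultFromUTF8: take the UTF-8 bytes of the text
    // and reinterpret each byte as one single-byte character.
    public static string ToSingleByteFromUtf8(string text)
    {
        return SingleByte.GetString(Encoding.UTF8.GetBytes(text));
    }

    // Hypothetical inverse: turn the single-byte characters back into bytes
    // and decode those bytes as UTF-8 to recover the original text.
    public static string ToUtf8FromSingleByte(string garbled)
    {
        return Encoding.UTF8.GetString(SingleByte.GetBytes(garbled));
    }
}
```

With this sketch, a string of three Greek letters becomes six single-byte characters after ToSingleByteFromUtf8, and ToUtf8FromSingleByte restores the original three. With a real Windows ANSI code page the round trip may not be lossless for every byte value, so treat the inverse as something to verify on your own data.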

2 Comments

It is worth noting that I encountered the issue with RDotNet objects, functions and string handling, not with R itself. I noticed the bad strings when calling the RDotNet CreateDataFrame() function, which in my code came before any interaction with R. So, in your case, maybe you are losing the Greek characters when passing data to an R function, and R then returns results whose strings were already bad. That was the case for me, anyway. I hope this is clear; sorry for my English.
I'm actually reading text files containing the characters within R and then passing the results back to C#. The Greek letters are fine within R, but are somehow "corrupted" when the results are returned through RDotNet. I think it must just be the open issue you mention above. I could read the files in C#, but I rely on some R functions, e.g. to convert a dtm from tidy to non-tidy format. If I just read the tidy dtm in C#, I won't be able to do this so easily, but maybe I could read it in C# and convert it as you suggest (EncodeToDefaultFromUTF8) before passing it to R?
