3

I have an in-memory byte[] and need to locate the offset where 13 and 10 are. I will then use the following to extract that line:

 String oneLine = Encoding.ASCII.GetString(bytes, 0, max);

What is the fastest way to search for the two bytes on a x64 bit computer? ..and convert it to a string?

Is there anything I can do other than iterating through each byte, scanning for 13 and then scan for 10?

// Disclaimer: 
// This is just for my curiosity.  Perhaps I'll gain a better understanding of
// how .NET interfaces with RAM, the CPU instructions related to comparisons, etc.
//  
// I don't suspect a performance problem, but I do suspect a lack of understanding
// (on my part) on how C# does low-level operations.
9
  • 8
    Maybe there is - but have you measured and found that the simple approach isn't fast enough? Commented Feb 19, 2013 at 21:03
  • 1
    would it be faster to conver to a string first and do an yourString.IndexOf(Environment.NewLine)? Genuinely curious... Commented Feb 19, 2013 at 21:07
  • 1
    @PaulSasik Well, often a constant factor speed-up can be gained by doing the same thing in a more cache- and CPU-friendly manner. Sometimes that constant factor is significant. However, OP and all other people asking for the fastest way better be aware that this is often a waste of time and the wrong question to ask to begin with. So, OP: Why do you even care? Curiosity (nothing wrong with that)? A gut feeling that the obvious way is too slow? Well-founded (y'know, with profiling and benchmarking) knowledge that it's slower than it should be? Only the middle one is a bad reason. Commented Feb 19, 2013 at 21:09
  • Fastest way ever - create a unmanaged dll and use SCASW - tic01.tic.ec-lyon.fr/~muller/trotek/cours/8086/SCASW.html.en Commented Feb 19, 2013 at 21:10
  • Of course there is something else: fix the array, cast the pointer to ulong*, and for each 8-byte block use SWAR to test whether it contains a 13. If it does, scan from the beginning of that block for 13,10, and well you can make the rest up. Whether that's faster, well, who knows. But it's something you could try, if necessary. Commented Feb 19, 2013 at 21:11

2 Answers 2

1

Not sure if it will be 'the fastest way' but you can look at Boyer-Moore algorithm to find the indexes of the required values.

Have a look at this SO thread Search longest pattern in byte array in C#

Boyer-Moore would be better than a linear array traversal because it can skip elements depending on the length of you 'needle' and it gets better as the 'haystack' gets bigger. HTH.

Sign up to request clarification or add additional context in comments.

1 Comment

The needle just 2 bytes though, so there's almost no skip
1

Since you are looking for a two byte sequence, you don't have to scan every byte, just every other one. If the target index contains a 13, then look at the next byte for a 10. If the target index contatins a 10, look at the previous byte for a 13. That should cut your scan time approximatly in half over a linear search.

1 Comment

Yes this seems to be a variation of the Boyer-Moore pattern (if I understand it correctly)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.