Sorting Algorithms for strings of equal length C++

Question

I need to sort around 100000 strings ASCIIbetically and by length, I sort by lengths by putting it into a 2D vector at the length of the string and then sort each array using quicksort (for ASCIIbetically). But is there a faster sort for strings of equal length? I've heard radix is great but I find it difficult to understand. What would be the best way to sort equal length strings without using the sort() function? If you need the code I can post it up.

I don't get what you want to achieve. Maybe a short example would help. — Henry
– Henry, Commented Sep 9, 2014 at 5:22
Okay so we are given about 100000 strings and we have to sort these strings n times in the fastest way possible. So say, 'the piece of puzzle Was higher' will end up being 'of Was the piece higher puzzle'. I'm just looking for a faster way to sort strings in this manner. — Jackson Collins
– Jackson Collins, Commented Sep 9, 2014 at 6:37
Have you read this article: Engineering Radix Sort for Strings ? It makes a lot of effort to optimize the radix sorting for strings. — vgru
– vgru, Commented Sep 9, 2014 at 9:01

WeaselFox · Accepted Answer · 2014-09-09 09:28:27Z

2

I think building a trie and then retrieving the keys in the trie by means of pre-order traversal is about as efficient as it gets for string sorting, and is actually a form of radix sort. Here is a detailed academic paper discussing this method. In 2006 at least this was the fastest string sorting method out there.

edited Sep 9, 2014 at 9:28

answered Sep 9, 2014 at 7:40

WeaselFox

7,3989 gold badges53 silver badges77 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Jackson Collins Over a year ago

At the moment i'm getting around 0.38 seconds for sorting around 70000 strings using vectors and quicksort. Would a trie beat that? (then again it depends on the computer)

Jackson Collins Over a year ago

Sorry, extra digit 0.38 seconds

Jackson Collins Over a year ago

Does anyone have some simple example code for trie that I can look at, even pseudocode?

user3793679 · Accepted Answer · 2014-09-09 11:43:16Z

For strings of between 8 and 15 characters, your comparison function for quick sort could do the first 8 characters in a single 64-bit chunk. And so on for 16 to 31, etc. So, you end up with as many comparison functions as you feel makes a difference. Unless you have a very large number of strings with long common sub strings, just using what you know about the string lengths may do the trick, straightforwardly.

For completeness, you need to worry about alignment and byte order. So, fetching 8 bytes at a time into a uint64_t:

  uint64_t u ;

  memcpy(&u, pv, 8) ;
  ...convert to big-endian if required...

will do the trick. I can tell you that with gcc and -O2 on an x86_64 the memcpy() compiles to a single instruction, as if it was u = *(uint64_t*)pv :-) For processors with alignment issues, I would hope that the compiler does something suitable.

Sadly, memcmp(foo, bar, 8) does not get the same treatment (at least on gcc 4.8, not even with -O3) :-(

Collectives™ on Stack Overflow

Sorting Algorithms for strings of equal length C++

2 Answers 2

3 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related