3

I have been working on this program where I have to input a string and then display the character distribution in that string.

For example:
if the input is “minecode” the output should be

C – 1
O – 1
D – 1
E – 2
I – 1
M – 1
N – 1

Here is what I tried to do, but I really don't know how to traverse the loop and check for similar character and then increment the count. The assembler is MASM 615 running on a 32 bit machine.

.686
.MODEL flat, stdcall
.STACK
INCLUDE Irvine32.inc
.DATA
  msg0 BYTE "Enter a string of characters: ",0
  msg1 BYTE "Character Distribution: ",0
  MainArray    dword 10000 dup (?)
  UniqueChar   dword 10000 dup (?)
  CharCount    dword 10000 dup (?)
.CODE
MAIN PROC
    lea edx, msg0
    call WriteString
    call dumpregs        ; 1
    call ReadString
    mov MainArray, eax
    call dumpregs        ; 2
    mov MainArray, ebx
    call dumpregs        ; 3
    call CRLF
    lea edx, msg1
    call WriteString
    call CharDist        ; Calls the procedure
    call dumpregs        ; 5
    exit
MAIN ENDP
CharDist PROC
    mov ecx, lengthof MainArray
    mov esi, OFFSET MainArray
    L1:
; what to do here?? 
Loop L1:
CharDist ENDP
END MAIN
1

2 Answers 2

5

One possible approach: make an array of 256 counters, store its base address in ebx, and for each byte in the string, increment the counter at that offset from ebx. Afterward, loop over your array of counters and print the nonzero counts.

You never say whether this is a string terminated by a 0 byte (C-style), one preceded by its length (Pascal-style), or one whose length is passed in as a second parameter, but that will determine when you terminate the loop. If you’re looking for a terminating zero, test the byte you just read, and if you’re counting a specific number of bytes, keep the count of bytes remaining in ecx and test that. (There are special instructions to branch conditionally if ecx is not zero, if you feel like using them.)

If you keep your pointer to the string in esi, you can load the next byte into al with the lodsb instruction. Alternatively, you can mov from [esi] and then inc esi. If you zero out eax before storing each byte in al, this will give you an index in eax that you can use with an array of counters.

Sign up to request clarification or add additional context in comments.

10 Comments

Yup, only 256 bins is small enough to make a standard histogram approach work well with a plain array, not a hash table or other map data structure.
However, your suggestion to use the LOOP instruction to implement the loop is not good. On most CPUs, it's so slow you might as well forget it even exists, unless optimizing for code-size. The other thing I'd suggest is using movzx eax, byte [esi] / inc [table + eax] inside the loop. That's maybe easier to thing about than zeroing eax outside the loop and then just modifying the low byte inside the loop (e.g. with LODSB). Both ways work, though.
I would definitely advise the OP to get the simple approach working before trying to implement a hash table in assembly language.
My point was that a simple array is the right solution for this specific problem size (which is why a beginner was asigned it for homework, I assume :), but might not be for related problems with sparse sets. Using a hash table for this would be slower. If you were going to do anything, you could assume printable ASCII and do inc [table + eax - 0x20] or something, but probably best not to. Allocating memory that's never touched is not really a problem as far as performance.
Who's upvoting both our comments? Is someone lurking here hanging on our every word?
|
3

Other possible approach:

Should be easier to understand and actually it is very "human" straightforward (like how would you do it on paper with pencil), so I wonder how you didn't come up with this one ... you should not only try to understand how this works, but also try to figure out why you didn't see it, what is confusing/blocking you.

for all letters of alphabet [A-Z] as "letter" {
    counter = 0;
    for all characters in input string as "char" {
        if (letter ignore_case_equals char) ++counter;
    }
    if (0 < counter) {
        display "letter - counter" and new line.
    }
}

This may be actually faster for English alphabet and short string, like 3-5 letters (containing letters only); than the suggested count sort, as alphabet is 26 chars and count table is 256 bytes, so 256/26 = ~9. But count table will reveal count for any character, including non-alphabet ones and also it will stall less on branching.


Also notice, how you did start with emitting code for prompts/input/etc... things you already could do in Assembly, avoiding the unknown.

I did start with almost-English description of algorithm. Because I don't care, what I know to write in Assembly (I already know, that pretty much anything :) ), first I want to be sure I know what I want the code do for me and what kind of data I want to process. Then I will command the CPU to do it, by finalizing the data structures plan and splitting those English notes into simpler and simpler steps, until they start to resemble instructions, then I fill up the instructions between comments.

Don't skip the native language reasoning phase, it may save you lot of work on the code, if you figure out some elegant way to organize data or cut down algorithm steps by reusing certain part of it. Not every problem has short elegant solution, but many have.

2 Comments

Depending on how you implement the count-matches part, that part can be branchless (CMP / sete al / add ecx, eax). The 0 < count branch for conditional printing is also present in the histogram version. And yes, for short strings, especially with the character limited to just alphabetic characters, this could be faster.
If you could guarantee that there would only be letters, you could optimize the array of buckets to be faster, too! But this also works, and it’s good to try more than one approach.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.