I have to write a program that takes a file of DNA sequences and a DNA subsequence as command-line arguments, finds each occurrence of the subsequence, and counts how many times it occurs. I'm having trouble with strcmp on lines 36 and 42. Using GDB, I figured out that, the way I currently have it, I am comparing the addresses of the strings and not the actual strings. But if I remove the & I get an error. I'm not sure what the correct way to go about this is. TIA

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    // place subsequence in string
    char *subsequence = argv[2];
    // get length of subsequence
    int seqLength = strlen(subsequence);
    // define file type and open for reading
    FILE *inputFile = fopen(argv[1], "r");
    // get each line using while loop
    char inputLine[200]; // string variable to store each line
    int i, j, lineLength, counter = 0, flag = -1;

    while (fgets(inputLine, 200, inputFile) != NULL) { // loop through each line
        lineLength = strlen(inputLine);

        for (i = 0; i < lineLength; i++) { // loop through each char in the line
            if (strcmp(&inputLine[i], &subsequence[0]) == 0) { 
            // if current char matches beginning of sequence loop through
            // each of the remaining chars and check them against 
            // corresponding chars in the sequence

                flag = 0;

                for (j = i + 1; j - i < seqLength; j++) {
                    if (strcmp(&inputLine[j], &subsequence[j - i]) != 0) {
                        flag = 1;
                        break;
                    }
                }

                if (flag == 0) {
                    counter++;
                }
            }
        }
    }

    fclose(inputFile);
    printf("%s appears %d time(s)\n", subsequence, counter);
    return 0;
}

dna.txt:

GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC

input:

./dnaSearch dna.txt GTA

expected output:

GTA appears 2 times
  • Are you trying to compare strings or characters? Looks like characters to me. Commented Nov 17, 2017 at 19:41
  • Then you don't need strcmp(). strcmp() compares zero-terminated strings (sequences of characters); characters just compare like the numbers they are. Commented Nov 17, 2017 at 19:44
  • BTW, if you are just comparing chars, you can use == per character. Commented Nov 17, 2017 at 19:44
  • >>I am comparing the address of the strings and not the actual strings No, you are comparing the strings. &string[index] gives you the substring from index to the end of the string. Commented Nov 17, 2017 at 19:45
  • You should use strstr() to locate the substring, and observe a significant gain in speed for your app. Commented Nov 17, 2017 at 20:14

3 Answers

Just do it like this:

if (inputLine[i] == subsequence[0]) {
    if (inputLine[j] != subsequence[j - i]) {

You do not need library functions to compare single characters.


As others have mentioned, you don't need to call strcmp the first time since you're only checking a single character. You can just compare them directly:

if (inputLine[i] == subsequence[0]) {

However, there's a much simpler way of doing what you want. Since you're looking for a substring inside another string, you can use the strstr function to do that:

while (fgets(inputLine, 200, inputFile) != NULL) { // loop through each line
    char *sub = inputLine;
    while ((sub = strstr(sub, subsequence)) != NULL) {
        counter++;
        sub++;
    }
}

The strstr function returns a pointer to the first occurrence of the substring within the string being searched, or NULL if it is not found. In the above code, each time the substring is found the counter is incremented, and the pointer is then advanced by one character so the next search starts just past the beginning of the previous match.
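
The loop above can be wrapped into a small function to make it easier to check on its own. This is only a sketch: count_occurrences is a hypothetical helper name, not something from the question, and the empty-subsequence guard is an added assumption, since strstr reports a match for an empty needle at every position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the question): counts how many times
   needle occurs in haystack, including overlapping occurrences. */
int count_occurrences(const char *haystack, const char *needle)
{
    int count = 0;
    const char *p = haystack;

    /* Assumption: an empty needle counts as zero matches. Without this
       guard, strstr would report a match at every position. */
    if (needle[0] == '\0') {
        return 0;
    }

    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p++; /* advance one char past the match start so overlaps are found */
    }
    return count;
}
```

Inside the fgets loop this would be used as counter += count_occurrences(inputLine, subsequence). Because the pointer only advances by one character, overlapping matches such as "AA" in "AAAA" are each counted.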


Your string inputLine is an array of characters terminated by the '\0' character.

strcmp expects a '\0' terminated string.

Passing &inputLine[i] passes the address of the character at position i, and strcmp reads the string from there until it reaches the '\0' character, so it compares the entire rest of the line rather than a single character.
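
To make that concrete, here is a minimal sketch of what the original test really checks. tail_equals is a hypothetical name used only for illustration; it wraps the same strcmp call as the question's code.

```c
#include <string.h>

/* Hypothetical helper (illustration only): returns 1 when the text that
   starts at line[i] and runs to the terminating '\0' is exactly seq.
   This is what strcmp(&line[i], seq) == 0 tests. The caller must make
   sure i is not past the end of line. */
int tail_equals(const char *line, size_t i, const char *seq)
{
    return strcmp(&line[i], seq) == 0;
}
```

With this, tail_equals("AAGTA", 2, "GTA") is 1 because the rest of the line is exactly "GTA", but tail_equals("AAGTAG", 2, "GTA") is 0 because the rest of the line is "GTAG". A line read by fgets usually still ends in '\n', so even a match at the very end of a line fails this test.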

As suggested in the comments, you can either use the ordinary comparison operators on the individual characters:

if (inputLine[i] == subsequence[0]) {
    flag = 0;
    for (j = i + 1; j - i < seqLength; j++) { // check the remaining characters
        if (inputLine[j] != subsequence[j - i]) {
            flag = 1;
            break;
        }
    }
    if (flag == 0) { // every character matched
        counter++;
    }
}

Or use strncmp, which compares at most the first n characters of two strings, so it can replace the whole inner loop:

if (strncmp(&inputLine[i], subsequence, seqLength) == 0) {
    counter++;
}
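
Put together as a function, the strncmp approach might look like the sketch below. count_with_strncmp is a hypothetical name, and the early return for an empty or too-long subsequence is an added assumption rather than part of the original code.

```c
#include <string.h>

/* Hypothetical helper (not from the question): counts occurrences of seq
   in line by comparing seqLength characters at each starting position. */
int count_with_strncmp(const char *line, const char *seq)
{
    size_t lineLength = strlen(line);
    size_t seqLength = strlen(seq);
    size_t i;
    int counter = 0;

    /* Assumption: an empty subsequence, or one longer than the line,
       counts as zero matches. */
    if (seqLength == 0 || seqLength > lineLength) {
        return 0;
    }

    /* Stop once fewer than seqLength characters remain in the line. */
    for (i = 0; i + seqLength <= lineLength; i++) {
        if (strncmp(&line[i], seq, seqLength) == 0) {
            counter++;
        }
    }
    return counter;
}
```

Unlike strcmp, this ignores whatever follows the first seqLength characters, so the trailing '\n' that fgets leaves in the line does not prevent a match at the end of the line.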
