
Alright guys, I'm having some trouble with using my file pointers to traverse through a file via looping. I will have a list of strings in my text file, one per line, and I am testing similarities between them. So my method of going about it is having two file pointers to traverse and compare.

Example: FILE* fp1 will be set on the first line to begin. FILE* fp2 will be set on the second line to begin.

I wish to traverse this way:

Line 1 <-> Line 2
Line 1 <-> Line 3
Line 1 <-> Line 4
Line 1 <-> Line 5

(Here I read the next line via fp1 to get to Line 2, I also attempt to set fp2 to the next line read after fp1)

Line 2 <-> Line 3
Line 2 <-> Line 4
Line 2 <-> Line 5

Etc...

And here is the code... The FILE* fp was passed to the function as (FILE* fp)

FILE* nextfp;
for(i = 1; i <= numStr; i++){
    fscanf(fp, "%s", str1);
    nextfp = fp;
    double str1len = (double)(strlen(str1));
    for(j = i + 1; j <= numStr; j++){
        fscanf(nextfp, "%s", str2);
        double str2len = (double)(strlen(str2));

        if((str1len >= str2len) && ((str2len / str1len) >= 0.90000) && (lcsLen(str1, str2) / (double)str2len >= 0.80000))
            sim[i][j] = 'H';
        else if ((str2len >= str1len) && ((str1len / str2len) >= 0.90000) && (lcsLen(str2, str1) / (double)str1len >= 0.80000))
            sim[i][j] = 'H';
    }
}

int numStr is the total number of lines with strings
lcsLen(char*, char*) returns length of longest common subsequence

The sim[][] array is where I am labeling my level of similarity. As of right now I only have it programmed to label strings of high similarity.

My results are incomplete, and I think there are two causes. First, fp is not advancing to the next line and just stays on the same string. Second, because of my nextfp = fp line, the inner loop leaves nextfp pointing at the last string instead of where it should be.

Any help's appreciated! Thank you all so much!

  • Depending on the size of the file, I would recommend you read into an array in memory instead of fiddling with file-reading. Then it's a simple nested loop over array indexes. Commented Apr 20, 2016 at 8:02
  • It's going to be of arbitrary sizes. I'm approaching this way so I don't have to worry about memory. Commented Apr 20, 2016 at 8:04
  • And why are you using a floating-point type for the string length? When will the length of a string ever be a non-integer? You can cast to floating point when doing the division. Commented Apr 20, 2016 at 8:04
  • I was using it as a method to go around the integer arithmetic (even though I could have very well cast at the calculation instead) Commented Apr 20, 2016 at 8:07
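The suggestion in the comments above (keep lengths as integers, cast only at the division) could look roughly like the sketch below. The function name `is_high_similarity` is hypothetical, and the `lcs` parameter is assumed to be the value the question's `lcsLen` would return for the two strings. The thresholds are the 0.90 and 0.80 from the question.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when two strings count as "high similarity"
   under the question's rule, 0 otherwise.
   len1, len2: string lengths as integers (e.g. from strlen).
   lcs: longest-common-subsequence length (assumed to come from lcsLen). */
int is_high_similarity(size_t len1, size_t len2, size_t lcs)
{
    size_t shorter = len1 < len2 ? len1 : len2;
    size_t longer  = len1 < len2 ? len2 : len1;

    if (longer == 0)
        return 0; /* both strings empty: nothing to compare */

    /* Cast at the point of division so the arithmetic is floating point. */
    return (double)shorter / (double)longer >= 0.90
        && (double)lcs / (double)shorter >= 0.80;
}
```

Because the shorter and longer lengths are picked up front, the two mirrored branches in the question's if/else collapse into a single test.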

3 Answers

1

You can't treat a FILE * like a pointer into the file's contents. It is a pointer to an object of type FILE, which in turn holds the state associated with the file I/O.

Copying a FILE * makes little sense, and certainly doesn't create a copy of the state in question.

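Part of that state is the current position in the file, and that doesn't change just because you copy the pointer. Both pointers refer to the same FILE object, so a read through either one advances the position seen by the other.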
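A small experiment makes the shared position visible. This is only a sketch: the function name `copy_shares_position` is made up for illustration, and it uses `tmpfile()` so that it needs no input file.

```c
#include <stdio.h>
#include <string.h>

/* Writes three words to a temporary file, then reads once through fp and
   once through a copy of fp.
   Returns 1 if the copy read the SECOND word (position is shared),
   0 if it did not, and -1 if the temporary file could not be used. */
int copy_shares_position(void)
{
    char a[16], b[16];
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    fputs("one\ntwo\nthree\n", fp);
    rewind(fp); /* go back to the start before switching from writing to reading */

    FILE *copy = fp; /* copies the pointer only, not the stream state */

    if (fscanf(fp, "%15s", a) != 1 || fscanf(copy, "%15s", b) != 1) {
        fclose(fp);
        return -1;
    }

    fclose(fp); /* closes the one underlying stream; copy is now invalid too */
    return strcmp(a, "one") == 0 && strcmp(b, "two") == 0;
}
```

The read through `copy` continues where the read through `fp` stopped, which is the behavior the question runs into with `nextfp = fp`.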

You should either investigate memory-mapping the file, which would give you the type of access you seem to expect, or just read in the entire file once to an array of strings, which you can then iterate over in any way you like.
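As a rough sketch of the read-everything-first option: the code below loads all strings into a growable array and then visits each pair exactly once. The names `read_words`, `count_pairs` and `same_length` are hypothetical, the 255-character limit is an assumption (the question does not state a maximum), and `same_length` is only a placeholder for the question's lcsLen-based test.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 255 /* assumed maximum string length */

/* Reads whitespace-separated strings from fp into a malloc'd array.
   Stores the number of strings read in *count. Stops early if an
   allocation fails; returns NULL if nothing was read. */
char **read_words(FILE *fp, size_t *count)
{
    char buf[MAX_WORD + 1];
    char **words = NULL;
    size_t n = 0, cap = 0;

    while (fscanf(fp, "%255s", buf) == 1) {
        if (n == cap) { /* grow the pointer array geometrically */
            size_t newcap = cap ? cap * 2 : 16;
            char **tmp = realloc(words, newcap * sizeof *tmp);
            if (tmp == NULL)
                break;
            words = tmp;
            cap = newcap;
        }
        words[n] = malloc(strlen(buf) + 1);
        if (words[n] == NULL)
            break;
        strcpy(words[n], buf);
        n++;
    }
    *count = n;
    return words;
}

/* Placeholder predicate standing in for the question's similarity test. */
int same_length(const char *a, const char *b)
{
    return strlen(a) == strlen(b);
}

/* Calls match on every pair (i, j) with i < j and counts the hits. */
size_t count_pairs(char **words, size_t n,
                   int (*match)(const char *, const char *))
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (match(words[i], words[j]))
                hits++;
    return hits;
}
```

The nested loop has the same Line 1 <-> Line 2, Line 1 <-> Line 3, ... shape as in the question, but it indexes memory instead of moving a file position. The caller is responsible for freeing each string and then the array.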


0

After the first pass of the inner loop, the file stream is already at end-of-file, so you can't read anything more from fp. Remember that you are reading from a stream, and a stream does not go back on its own. See man 3 fseek: you can manually set the file offset to some position, but that doesn't address your problem. You should read all the lines into an array; this is easier and faster.


0

As the other answers state, you should consider just reading the whole file into an array. If your file is larger than several hundred MB, however, your approach might be the right choice.

Use ftell to save the current offset after reading the outer line, and use fseek to set the stream's file position back to that offset after you have looped through the rest of the lines. Only one FILE* is needed; both loops read through fp.

long offset; // ftell returns long
for(i = 1; i <= numStr; i++){
    fscanf(fp, "%s", str1);
    offset = ftell(fp); // save the position just after line i
    double str1len = (double)(strlen(str1));
    for(j = i + 1; j <= numStr; j++){
        fscanf(fp, "%s", str2); // read lines i+1..numStr through the same stream
        double str2len = (double)(strlen(str2));

        if((str1len >= str2len) && ((str2len / str1len) >= 0.90000) && (lcsLen(str1, str2) / (double)str2len >= 0.80000))
            sim[i][j] = 'H';
        else if ((str2len >= str1len) && ((str1len / str2len) >= 0.90000) && (lcsLen(str2, str1) / (double)str1len >= 0.80000))
            sim[i][j] = 'H';
    }
    fseek(fp, offset, SEEK_SET); // go back to just after line i, so the next outer read gets line i+1
}
