
I am trying to read a dictionary file in which each line contains a word ID, a word and a frequency, separated by whitespace. The problem is that every entry in the map I use to store the words ends up with the same word. I would really appreciate any help.

typedef struct{
    int id;
    int count;
    char* word;
} WORD;

//read file
std::map<int, WORD*> readWordMap(char* file_name)
{
    std::ifstream infile(file_name, std::ifstream::in);
    std::cout<<"word map read file:"<<file_name<<std::endl;
    if (! infile) {
        std::cerr<<"oops! unable to open file "<<file_name<<std::endl;
        exit(-1);
    }
    std::map<int, WORD*> map;
    std::vector<std::string> tokens;
    std::string line;
    char word[100];
    int size;
    while (std::getline(infile, line)) {
        size = (int)split(line, tokens, ' ');
        WORD* entry = (WORD*) malloc(sizeof(WORD*));
        entry->id = atoi(tokens[0].c_str());
        entry->count = atoi(tokens[2].c_str());
        strcpy(word, tokens[1].c_str());
        entry->word = word;

        map[entry->id] = entry;
        std::cout<< entry->id<<" "<<entry->word<<" "<<entry->count<<std::endl;

    }
    infile.close();
    std::cout<<map.size()<<std::endl;
    std::map<int, WORD*>::const_iterator it;
    for (it = map.begin(); it != map.end(); it++) {
        std::cout<<(it->first)<<" "<<(it->second->word)<<std::endl;

    }

    return map;
}

//split string by a delimiter
size_t split(const std::string &txt, std::vector<std::string> &strs, char ch)
{
    size_t pos = txt.find( ch );
    size_t initialPos = 0;
    strs.clear();

    while( pos != std::string::npos ) {
        strs.push_back( txt.substr( initialPos, pos - initialPos + 1 ) );
        initialPos = pos + 1;

        pos = txt.find( ch, initialPos );
    } 

    strs.push_back( txt.substr( initialPos, std::min( pos, txt.size() ) - initialPos + 1 ) );

    return strs.size();
}

Data file:

2 I  1
3 gave  1
4 him  1
5 the  3
6 book  3
7 .  3
8 He  2
9 read  1
10 loved  1

result:

2 I  1
3 gave  1
4 him  1
5 the  3
6 book  3
7 .  3
8 He  2
9 read  1
10 loved  1
map size:9
2 loved 
3 loved 
4 loved 
5 loved 
6 loved 
7 loved 
8 loved 
9 loved 
10 loved 
  • I highly suggest you kill the pointers and just use std::string in place of char * and std::map<int, WORD> in place of std::map<int, WORD *>. Commented May 16, 2013 at 20:46
  • Also, WORD* entry = (WORD*) malloc(sizeof(WORD*)); is wrong. It should be WORD* entry = (WORD*) malloc(sizeof(WORD)); instead. Commented May 16, 2013 at 20:48
  • @chris Thank you a lot, I have replaced WORD* and char* with WORD and std::string respectively. Commented May 16, 2013 at 21:03
  • @Xaqq Thanks, I have corrected it, but the result is still the same. @M M. Thanks, I will try it. Commented May 16, 2013 at 21:06
  • Yeah, I didn't expect it to fix your problem, but having memory errors can't help :D. You're welcome :) Commented May 16, 2013 at 21:08

2 Answers

WORD* entry = (WORD*) malloc(sizeof(WORD*));

allocates only enough memory for a WORD pointer, not for a whole WORD struct.

The compiler keeps allocating entry, but it is not initialized to anything (the entries may all point to some random address that possibly doesn't even belong to your program), and you add that pointer to the map repeatedly. So all the entries of your map end up pointing to the same location (coincidentally). It should be

WORD* entry = new WORD;

This is a cleaner way of doing it:

struct WORD{
    int id;
    int count;
    std::string word;
};

while (std::getline(infile, line)) {
    WORD* entry = new WORD;
    std::istringstream iss(line);

    iss >> entry->id >> entry->word >> entry->count;
    map[entry->id] = entry;
    std::cout<< entry->id<<" "<<entry->word<<" "<<entry->count<<std::endl;
}

4 Comments

Thank you very much, it works. But when I print the addresses of the allocated WORD* entries, they point to different locations: 0x100100ae0 0x100100ab0 0x100100c20 0x100100c90 0x100100d00 0x100100d70 0x100100de0 0x100100e50 0x100100ec0
@user2293003 You mean you tried to print them in your original code? Or in my version?
@user2293003 Don't use malloc in C++. Use new. Use stringstreams when you can. And avoid raw pointers when you can.
Well, it might be that the compiler allocates them in different places, but they all point to the same address. Their values are undefined since they are not pointing to anything valid. This is called undefined behavior, which means there is really no guarantee of what happens; things can go any way. Anyway, I will update the answer accordingly.

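You forgot to allocate memory for WORD::word before the strcpy. Instead, you assign the address of the local array char word[100] to every entry of the map, and that address is the same for all of them, so every entry shows whatever was copied into the array last (here, "loved"). Once readWordMap returns, that array no longer exists, so the stored pointers also dangle.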


It is also better to use std::string instead of C-style strings, and you can use std::stoi (C++11) to convert strings to integers. Try this:

struct WORD{
    int id;
    int count;
    std::string word;
};

std::map<int, WORD> readWordMap(const std::string &file_name)
{
    ...
    std::map<int, WORD> map;
    ...

    while (std::getline(infile, line)) {
        ...

        WORD entry;
        entry.id = std::stoi(tokens[0]);
        entry.count = std::stoi(tokens[2]);
        entry.word = tokens[1];

        map[entry.id] = entry;

        ...
    }
    infile.close();
    ...
}
