Why is the algorithm set_intersection only returning matching strings from two text files when they are listed in the same exact order?

Question

I am a C++ novice, so bear with me if my formatting isn't perfect. I am writing a program called Palindrome's Cousin that finds all the words in the dictionary that spell another word backwards (e.g. dog and god). It reads two separate text files: one is the dictionary forwards, and the other is the same dictionary with every word spelled backwards. When a word in the backwards dictionary also appears in the forwards dictionary, that word spells a valid word in both directions.

Take the word dog, for example. The entry dog in the backwards dictionary was originally god, but because it now reads dog, it matches the word dog in the forwards dictionary, which tells us that dog spells a word both forwards and backwards. The words are separated by commas for tokenizing, and I am using the Boost tokenizer.

The program works, but unfortunately it only returns a word match if the words in each text file are in the same exact order. If they are in any other order, there is no result (even if there is a match). For example, it only works if both text files contain "dog, god" in the same order rather than one containing, "dog, god" and the other containing "god, dog."

And it's not just a sorting problem. It hasn't returned any matches even if all the words are in the same exact order (alphabetically) with the whole dictionary (as a text file) and not just a two word dictionary.

Does anyone know how I can fix this? For sample text file contents, see the end of this post. Thanks.

Edit: Thank you everyone for your help! I was able to get the program running. See below for results!!

Code:

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <cstdlib>
#include <boost/tokenizer.hpp>
typedef boost::char_separator<char> separator_type;

using namespace std;
using namespace boost;

int main()
{
    //fstream variable files
    fstream file1;
    fstream file2;
    string dictionary1;
    string dictionary2;
    string words1; 
    string words2;

    //filename of the files
    dictionary1 = "Twoworddictionary.txt"; 
    //this dictionary contains only two words separated by a comma as a test
    dictionary2 = "Backwardstwoworddictionary.txt"; 
    //this dictionary contains only two words separated by a comma as a test

    //opening files
    file1.open(dictionary1.c_str()); 
    //opening Twoworddictionary.txt
    file2.open(dictionary2.c_str()); 
    //opening Backwardstwoworddictionary.txt

    if (!file1)
    {
        cout << "Unable to open file1"; 
        //terminate with error
        exit(1);
    }

    if (!file2)
    {
        cout << "Unable to open file2"; 
        //terminate with error
        exit(1);
    }

    while (getline(file1, words1))
    {
        while (getline(file2, words2))
        {
            boost::tokenizer<separator_type> tokenizer1(words1, separator_type(",")); 
            //separates the string from Twoworddictionary.txt into individual words (comma as delimiter)

            auto it = tokenizer1.begin();
            while (it != tokenizer1.end())
            {
                std::cout << "token: " << *it << std::endl; 
                //test to see if tokenizer works before program continues

                vector<string> words1Vec; 
                // vector to store Twoworddictionary.txt strings in
                words1Vec.push_back(*it++); 
                // adds elements dynamically onto the end of the vector 

                boost::tokenizer<separator_type> tokenizer2(words2, separator_type(",")); 
                //separates the string from Backwardstwoworddictionary.txt into individual words (comma as delimiter)

                auto it2 = tokenizer2.begin();
                while (it2 != tokenizer2.end())
                {
                    std::cout << "token: " << *it2 << std::endl; 
                    //test to see if tokenizer works before program continues

                    vector<string> words2Vec; 
                    // vector to store Backwardstwoworddictionary.txt strings in
                    words2Vec.push_back(*it2++); 
                    // adds elements dynamically onto the end of the vector 

                    vector<string> matchingwords(words1Vec.size() + words2Vec.size()); 
                    //vector to store elements from both dictionary text files (and ultimately to store the intersection of both, i.e. the matching words)

                    sort(words1Vec.begin(), words1Vec.end()); 
                    //set intersection requires its inputs to be sorted
                    sort(words2Vec.begin(), words2Vec.end()); 
                    //set intersection requires its inputs to be sorted

                    vector<string>::iterator it3 = set_intersection(words1Vec.begin(), words1Vec.end(), words2Vec.begin(), words2Vec.end(), matchingwords.begin()); 
                    // finds the matching words from both dictionaries

                    matchingwords.erase(it3, matchingwords.end());  

                    for (vector<string>::iterator it4 = matchingwords.begin(); it4 < matchingwords.end(); ++it4) cout << *it4 << endl; 
                    // returns matching words

                    //Uh-oh. It only matches when they're in the same order (i.e. dog, fig in both text files)
                }
            }
        }
    }

    file1.close();
    file2.close();

    return 0;
}

This is the output that I get when the words in the text files are in the same order:

token: dog
token: dog
dog
token:  god
token:  god
token: dog
token:  god
 god

This is the output that I get when the words in each text file are not in the same order:

token: fig
token: dog
token:  fig
token:  dog
token: dog
token:  fig

Twoworddictionary.txt contains:

dog, god

Backwardstwoworddictionary.txt contains:

god, dog

Dictionary.txt contains (10 words). This is the one that returns no result, even though the words are sorted:

gnome,
gnu,
go,
goal,
goals,
goat,
god,
gods,
goes,
going,

Palindrome’s Cousin Results – 10,000 Words

a, aa, aaa, ab, ac, ad, ada, ae, af, ag, ages, ah, ai, aim, aj, ak, aka, al, ala, am, an, ana, and, anna, ap, ar, are, as, at, ata, av, ave, avon, aw, az, b, ba, bat, bb, bc, bd, bg, bk, bl, bm, bo, bob, boob, bp, br, bs, bt, bus, but, bw, c, ca, cam, cap, cb, cc, cd, ce, cf, cfr, cg, ch, ci, civic, cj, cl, cm, cn, co, cod, col, corp, cos, cp, cpu, cr, craps, cs, ct, cu, cv, cw, d, da, dad, dam, das, db, dc, dd, de, deer, def, del, dem, der, devil, df, dg, dh, di, dial, did, dim, dir, div, dj, dl, dm, dna, doc, dod, dog, dom, doom, dp, dr, draw, ds, dt, dts, dvd, e, ea, ec, ed, edit, ee, ef, eg, eh, el, em, en, ep, er, era, erp, es, et, ev, eva, eve, evil, eye, f, fa, fc, fd, fe, fed, ff, fi, fig, fl, flow, fm, fo, fp, fr, fs, ft, g, ga, gb, gc, gd, ge, gel, gg, gif, gig, gis, gl, gm, go, god, gp, gr, gs, gsm, h, ha, hc, hd, he, hh, ho, hp, hr, hs, ht, hu, i, ia, ic, id, if, ii, iii, il, im, in, ip, ir, irs, is, isp, it, iv, ix, j, ja, jc, jd, jj, jm, jp, jr, k, ka, kb, ko, ks, l, la, laid, lap, law, lb, lc, ld, le, led, leg, let, level, lf, lg, li, lil, lit, live, lived, ll, lm, ln, lo, loc, lol, loop, los, lp, ls, lu, m, ma, mac, mad, man, map, maps, mar, mas, mb, mc, md, me, med, mem, mf, mg, mi, mia, mid, mit, mj, ml, mm, mn, mo, mod, mom, mood, mp, mr, ms, msg, mt, mu, mw, n, na, nam, nat, nav, nc, ne, net, ni, nl, nm, nn, no, non, noon, nor, not, nov, nova, now, np, nr, ns, nt, nu, nw, ny, o, ob, oc, of, og, oh, ok, ol, om, on, oo, ooo, op, or, os, ot, p, pa, pac, pal, pam, par, part, parts, pas, pat, pb, pc, pct, pd, pe, per, pets, pf, pg, pgp, ph, php, pi, pit, pj, pl, pm, pn, po, pool, pop, pot, pp, pr, pre, proc, ps, psi, psp, pt, q, r, ra, radar, ram, rap, rat, rats, raw, rb, rc, rd, re, red, reed, refer, rep, res, rev, rf, rfc, rg, rh, ri, rid, rj, rm, rn, ro, ron, rp, rr, rs, rt, ru, rw, s, sa, sad, sam, sap, sas, saw, sb, sc, sd, se, sees, sega, ser, sf, sg, sh, si, sig, sk, sl, sm, sms, sn, so, soc, sol, sp, spam, sparc, spot, spots, sr, sri, ss, st, 
star, stats, std, step, stops, strap, su, sub, sv, sw, sys, t, ta, tab, tan, tap, tar, tb, tc, tcp, td, te, tel, ten, tf, tft, th, ti, tide, til, tim, tip, tit, tm, tn, to, ton, top, tops, tp, tr, trap, ts, tt, tu, tub, tv, u, uc, uh, ul, um, un, upc, ur, us, ut, uw, v, va, van, vc, ve, ver, vi, vid, von, vs, vt, w, wa, wal, war, ward, was, wb, wc, wm, wn, wolf, won, wow, wr, ws, wu, ww, www, x, xanax, xi, xx, xxx, y, yn, z, za

If you look at the description of std::set_intersection you'll see that it requires sorted inputs. It doesn't work properly otherwise.
– Mark Ransom, Commented Apr 21, 2021 at 2:06
But I sort them in my code. Does that not work the way I think it does? It actually shouldn't be a problem as the dictionary is alphabetical. I was just confused as to why it wouldn't sort the strings the way I thought it would.
– maria9876, Commented Apr 21, 2021 at 2:07
So you do. I must admit that I didn't read through that huge wall of code before making my comment. You really need to learn to simplify your problems before posting them.
– Mark Ransom, Commented Apr 21, 2021 at 2:10
You have a space after a comma in the file, and actually make this space part of the word that follows the comma. You read the text "dog" (a three-character string) from one file, and the text " dog" (a four-character string, the first character being a space) from the other. These strings are different, not equal to each other.
– Igor Tandetnik, Commented Apr 21, 2021 at 2:35
@maria9876 "Does anyone know how I can fix this?" Trim your strings of leading and trailing whitespace before putting them in the containers. Since you are using Boost, there are relevant string functions that do this job.
– PaulMcKenzie, Commented Apr 21, 2021 at 5:59
