I have two big two-dimensional arrays (pulled from some XML data). One (list A) has ~1,000 items with 5 fields each; the other (list B) varies between 10,000 and 12,000 items, also with 5 fields each.

My idea was to compare EACH id key of list A against EACH id key of list B and, on a match, compose a new array of combined fields, or just the fields from array A if there is no match.

I used nested foreach loops and ended up with millions of iterations (about 1,000 × 12,000) that take a long time to process. Needless to say, that is not a solution.

The form of these two structures and the result I need immediately reminded me of a SQL join.

The questions are:

1.) Should I try SQL, or are nested foreach loops simply not the best PHP way?
2.) Will a relational query be much faster than the iterations?

EDIT:

I pull data only periodically from an XML file (in a separate process) which contains 10+ fields for each node. Then I store the 5 fields I need in a CSV file, to compare later with table A, which I pull from a MySQL database. It is basically a catalog update of attributes from a fresh feed. I'm afraid the original idea of storing into CSV was a mistake and I should just save the feed updates into a database too.

EDIT 2

List B looks like this:

Array
(
    [0] => Array
        (
            [code] => HTS541010A9E680
            [name] => HDD Mobile HGST Travelstar 5K100 (2.5", 1TB, 8MB, SATA III-600)
            [price] => 385.21
            [avail] => 0
            [retail] => asbis
        )
...
...

List A is similar in everything but the 'code' field, which is the only one useful for comparison:

Array
    (
        [0] => Array
            (
                [code] => ASD-HTS541010A
                [name] => HDD Mobile HGST Travelstar 5K100 (2.5", 1TB, 8MB, SATA III-600)
                [price] => 385.21
                [avail] => 0
                [retail] => asbis
            )

As you can see, each feed has the universal code BUT with some different random data as a prefix or suffix, so in each loop I have to do a couple of string operations before I can stripos or compare it against the feed's id for a match or a close match.

Pseudo code:

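$mylist  = loadfromDB();
$whslist = loadfromCSV();

foreach ($mylist as $myl) {
    foreach ($whslist as $whl) {
        $code_a = $myl['code'];
        $code_b = $whl['code'];

        // Either code contains the other.
        if (stripos($code_a, $code_b) !== false || stripos($code_b, $code_a) !== false) {
            ...
        }
        // The part of code_a after the first '-' contains code_b.
        elseif (stripos(substr(strstr($code_a, '-'), 1), $code_b) !== false) {
            ...
        }
        // code_a without its last 5 characters contains code_b
        // (the needle was omitted in the original; $code_b is assumed).
        elseif (stripos(substr($code_a, 0, -5), $code_b) !== false) {
            ...
        }
    }
}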
  • If you don't get the data from SQL, I don't think you work it out that way. What about trying regexp on the XML plain text instead of node parsing? Also, isn't there a search function to look up a given ID directly in a given XML document, instead of looping? Commented Nov 12, 2013 at 1:30
  • I'm sorry I wasn't specific enough. I get it initially from an XML file but then store it in a CSV file. I'll edit the post. Commented Nov 12, 2013 at 1:34
  • You need to read about hash tables (in PHP the usual array is a hash table). Commented Nov 12, 2013 at 1:36
  • I would agree with @Sebas's comments about not loading into SQL if the data does not reside there to begin with. This is really a data structure problem, one that becomes much easier since you know that list A is authoritative. You can simply include all items from A and look up values from list B against the id (key) derived from list A. Without looking at your data, it would be hard to determine what the optimal solution might be, but putting data from list B into a key-value store and searching that store for id values from list A would be pretty simple. Commented Nov 12, 2013 at 1:39
  • If the data goes from XML to CSV, where does SQL come in? What exactly is the question: are you just looking to improve the performance when merging these arrays? Commented Nov 12, 2013 at 1:40
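
The hash-table idea from the comments above could be sketched roughly as follows. This is only a minimal sketch: `normalizeCode()` and `joinByCode()` are hypothetical helper names, and the normalization rule (uppercase, drop non-alphanumerics) is an assumption that would not by itself resolve the prefix/suffix mismatch shown in EDIT 2.

```php
<?php
// Hypothetical helper: reduce a code to a canonical form.
// Assumption: uppercasing and dropping non-alphanumerics is enough.
function normalizeCode(string $code): string
{
    return strtoupper(preg_replace('/[^A-Za-z0-9]/', '', $code));
}

// Left join: every row of $listA is kept; the matching row of $listB
// is attached under the 'feed' key (null when there is no match).
function joinByCode(array $listA, array $listB): array
{
    // Build the lookup table once, in a single pass over B.
    $index = [];
    foreach ($listB as $row) {
        $index[normalizeCode($row['code'])] = $row;
    }

    // One hash lookup per row of A instead of an inner loop over B.
    $result = [];
    foreach ($listA as $row) {
        $key = normalizeCode($row['code']);
        $result[] = $row + ['feed' => $index[$key] ?? null];
    }
    return $result;
}
```

With ~1,000 rows in A and ~12,000 in B this does about 13,000 steps in total rather than millions, provided the codes really can be brought to one format.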

1 Answer

Using SQL will be faster because most SQL engines are optimized for joins, while your method is brute force. However, inserting all that data into MySQL tables is quite a heavy task, so it's still not the best solution.

I suggest you do the join in PHP, but with a smarter algorithm. Start by sorting the two arrays by the field you want to match. Then iterate over both sorted arrays together using two iterators (or pointers, or indices, or whatever): let's say a iterates over A and b over B. On each iteration of the loop, compare the comparison fields of the elements pointed to by a and b. If a's is smaller, advance a. If b's is smaller, advance b. If a's is equal to b's, you have a match, which you should store in a new list, and then advance both a and b (assuming the relation is one-to-one; if it's one-to-many you only advance the "many" iterator, and if it's many-to-many you need a slightly more complex solution).
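
The procedure described above might look something like this in PHP. It is a sketch only, assuming the codes are already in a directly comparable format and the relation is one-to-one; `mergeJoin()` is a hypothetical name, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Sort-merge join on the 'code' field.
// Assumptions: codes are directly comparable, relation is one-to-one.
function mergeJoin(array $a, array $b): array
{
    $byCode = fn(array $x, array $y): int => strcmp($x['code'], $y['code']);
    usort($a, $byCode);
    usort($b, $byCode);

    $matches = [];
    $i = 0; // index into sorted $a
    $j = 0; // index into sorted $b
    while ($i < count($a) && $j < count($b)) {
        $cmp = strcmp($a[$i]['code'], $b[$j]['code']);
        if ($cmp < 0) {
            $i++;               // a's code is smaller: advance a
        } elseif ($cmp > 0) {
            $j++;               // b's code is smaller: advance b
        } else {
            // Match: store the pair, then advance both.
            $matches[] = ['a' => $a[$i], 'b' => $b[$j]];
            $i++;
            $j++;
        }
    }
    return $matches;
}
```

As written this returns only the matched pairs. To also keep rows of A that have no match, as the question asks, the row could be appended to the result in the branch that advances a.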


5 Comments

Is this still true when the ID fields are strings? For example, in the code: ListA['id'] = 'CVS/543gHS-34'; while in the feed it will be ListA['id'] = '543gHS-34'; or ListA['id'] = '543gHS34'; so in each loop I need a couple of elseifs with some string operations to be able to at least stripos, if not compare.
You will have to bring them all to the same format before the sort, but you can do that in a single pass.
Can you please check EDIT 2?
It is one-to-many, as a result can contain matches and pseudo-matches (the similar ones).
MMMmmm... that's really messed up. I guess it can be done with suffix trees, but that's too much work. MySQL might be able to do it for you automatically, or it might not. Anyway, if you can't transform all the keys to a uniform format, can you at least transform them to a uniform group format? One that is always the same for keys that should match, but might also be the same for some keys that shouldn't match (like a hash function, but without the scattering). If you can do that, you can at least divide the data into smaller groups, where matching everything to everything won't be that bad.
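
The "uniform group format" idea in the last comment could be sketched like this. Everything here is an assumption for illustration: `groupKey()` and `joinByGroup()` are hypothetical names, and the grouping rule (drop a prefix ending in '-', keep the first 6 characters) is tuned only to the single sample pair in EDIT 2, so it would need adjusting for real feeds.

```php
<?php
// Hypothetical group key, tuned only to the sample in EDIT 2:
// drop a prefix ending in '-', then keep the first 6 characters.
function groupKey(string $code): string
{
    $pos  = strpos($code, '-');
    $core = $pos === false ? $code : substr($code, $pos + 1);
    return strtoupper(substr($core, 0, 6));
}

// Bucket list B by group key, then run the expensive fuzzy
// comparison ($isMatch) only against rows in the same bucket.
function joinByGroup(array $listA, array $listB, callable $isMatch): array
{
    $buckets = [];
    foreach ($listB as $row) {
        $buckets[groupKey($row['code'])][] = $row;
    }

    $result = [];
    foreach ($listA as $rowA) {
        $found = [];
        foreach ($buckets[groupKey($rowA['code'])] ?? [] as $rowB) {
            if ($isMatch($rowA['code'], $rowB['code'])) {
                $found[] = $rowB;   // one-to-many: keep every match
            }
        }
        // Rows of A are always kept, with zero or more matches.
        $result[] = ['a' => $rowA, 'matches' => $found];
    }
    return $result;
}
```

The `$isMatch` callable is where the stripos/substr checks from the question would go; they then run only within a bucket instead of across all of list B.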
