I have a long array (1x75000!) of string data that contains repeated strings. I want to find the array indices and the number of occurrences of each repeated string. E.g.:

A=['abc' 'efg' 'hij' 'abc' 'hij' 'efg' 'klm'];

The answer should be:

2 times 'abc' at array indices 1, 4
2 times 'efg' at array indices 2, 6
2 times 'hij' at array indices 3, 5
1 time 'klm' at array index 7

Note the large size of the array (1x75000).

1 Answer

This code should work:

<?php

$array = array('abc','wrerwe','wrewer','abc');
$out = array();

// Collect the count and the positions of every distinct string
foreach ($array as $key => $value) {
    if (!isset($out[$value]))  {
        // First time this string is seen
        $out[$value]['nr'] = 0;
        $out[$value]['index'] = array();
    }
    ++$out[$value]['nr'];
    $out[$value]['index'][] = $key; // keys are 0-based
}

// Print one line per distinct string
foreach ($out as $k => $v) {
    echo "item ".$k." repeats ".$v['nr'].' times at positions: ';
    echo implode(', ', $v['index']);
    echo "<br />";
}

So far I haven't tested it on an array that big, though. In fact, I don't think you should operate on arrays that large; you should rather split the data into smaller arrays.
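For reference, the same loop can be wrapped in a function that returns the result instead of echoing it. This is only a sketch: the name `groupPositions` is made up for illustration, it assumes PHP 7 or later, and it reports 1-based positions to match the example in the question (PHP array keys start at 0).

```php
<?php

/**
 * Group the positions of each distinct string.
 * groupPositions is a hypothetical helper name, not part of PHP.
 */
function groupPositions(array $items): array
{
    $out = [];
    // array_values() renumbers the keys 0, 1, 2, ... in case the input has gaps
    foreach (array_values($items) as $key => $value) {
        // Appending with [] creates the inner array on first use
        $out[$value][] = $key + 1; // +1 gives 1-based positions as in the question
    }
    return $out;
}

$a = ['abc', 'efg', 'hij', 'abc', 'hij', 'efg', 'klm'];
foreach (groupPositions($a) as $text => $positions) {
    // The number of occurrences is simply the number of stored positions
    echo count($positions) . " time(s) '" . $text . "' at positions "
        . implode(', ', $positions) . PHP_EOL;
}
```

For the array from the question this maps 'abc' to positions 1 and 4, 'efg' to 2 and 6, 'hij' to 3 and 5, and 'klm' to 7. The count does not need its own field because it equals `count($positions)`.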

I've now tested it on an array of 75000 strings using the code below (the random string generator comes from the question "How to create a random string using PHP?"):

<?php

$array = randomTexts(75000);
$out =  array();

foreach ($array as $key => $value) {
    if (!isset($out[$value]))  {
        $out[$value]['nr'] = 0;
        $out[$value]['index'] = array();        
    }
    ++$out[$value]['nr'] ;
    $out[$value]['index'][] = $key;
}


foreach ($out as $k => $v) {
    echo "item ".$k." repeats ".$v['nr'].' times at positions: ';
    echo implode(', ', $v['index']);
    echo "<br />";
}


function randomTexts($nr) {
    $out = array();
    $validString = 'abcdefghijklmnopqrstuvwxyz';
    for ($i = 0; $i < $nr; ++$i) {
        $len = mt_rand(5, 10);
        $out[] = get_random_string($validString, $len);
    }
    return $out;
}


function get_random_string($valid_chars, $length)
{
    // start with an empty random string
    $random_string = "";

    // count the number of chars in the valid chars string so we know how many choices we have
    $num_valid_chars = strlen($valid_chars);

    // repeat the steps until we've created a string of the right length
    for ($i = 0; $i < $length; $i++)
    {
        // pick a random number from 1 up to the number of valid chars
        $random_pick = mt_rand(1, $num_valid_chars);

        // take the random character out of the string of valid chars
        // subtract 1 from $random_pick because strings are indexed starting at 0, and we started picking at 1
        $random_char = $valid_chars[$random_pick-1];

        // add the randomly-chosen char onto the end of our string so far
        $random_string .= $random_char;
    }

    // return our finished random string
    return $random_string;
}

It also seems to work, but it takes a few seconds.
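To see where the time goes, the grouping step can be timed separately from the output. A rough sketch, assuming PHP 7 or later: `timeGrouping` is a hypothetical helper, and the sample data is built from a small fixed alphabet rather than with `randomTexts`, so that repeats actually occur. I have not measured anything with it, so no numbers are claimed here.

```php
<?php

/**
 * Time only the grouping loop, without echoing anything.
 * timeGrouping is a hypothetical helper name used for illustration.
 * Returns [elapsed seconds, grouped result].
 */
function timeGrouping(array $array): array
{
    $start = microtime(true);

    $out = [];
    foreach ($array as $key => $value) {
        if (!isset($out[$value])) {
            $out[$value] = ['nr' => 0, 'index' => []];
        }
        ++$out[$value]['nr'];
        $out[$value]['index'][] = $key;
    }

    return [microtime(true) - $start, $out];
}

// Sample data: 75000 three-letter strings from 'abc', so repeats are very likely
$letters = 'abc';
$data = [];
for ($i = 0; $i < 75000; ++$i) {
    $data[] = $letters[mt_rand(0, 2)] . $letters[mt_rand(0, 2)] . $letters[mt_rand(0, 2)];
}

list($seconds, $groups) = timeGrouping($data);
printf("grouped %d strings into %d groups in %.4f s\n", count($data), count($groups), $seconds);
```

Comparing this figure with the run time of the full script would show how much of the delay comes from the grouping loop and how much from generating and printing the data.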
