
I am new to the whole PHP/MySQL thing. I have a week's worth of server logs (about 300,000 entries) and I need to do some analysis. I am planning on reading them all into a MySQL database and then analysing them with PHP.

The thing I am not sure about is how to iterate through them. Reading a file in Java, I would do something like this:

Scanner s = new Scanner(myfile);
while (s.hasNextLine()) {
    String line = s.nextLine();
    // Do something with this record
}

How do I iterate through all records in a MySQL database using PHP? I think that something like this will take a stupid amount of memory:

    $query = "SELECT * FROM mytable";
    $result = mysql_query($query);
    $rows = mysql_num_rows($result);
    for($j = 0; $j < $rows; ++$j){
            $curIndex   = mysql_result($result,$j,"index");
            $curURL     = mysql_result($result,$j,"something");
            ~~ Do something with this record
    }

So I have added a LIMIT to the SELECT statement and I repeat until all records have been cycled through. Is there a more standard way to do this? Is there a built-in that will do it for me?

while ($startIndex < $numberOfRows) {

    // Note: LIMIT takes an offset and a row count, not an end index
    $query = "SELECT * FROM mytable ORDER BY mytable.index LIMIT $startIndex, 10";
    $result = mysql_query($query);
    $rows = mysql_num_rows($result);
    for ($j = 0; $j < $rows; ++$j) {
        $curIndex = mysql_result($result, $j, "index");
        $curURL   = mysql_result($result, $j, "something");
        // Do something with this record
    }
    $startIndex += 10;
}

5 Answers


You don't want to do a SELECT * FROM MYTABLE if your table is large; you'll end up with the whole thing in memory. A trade-off between memory overhead and database calls is to batch the requests. You can get the minimum and maximum IDs of the rows in your table:

SELECT MIN(ID) FROM MYTABLE;
SELECT MAX(ID) FROM MYTABLE;

Now loop from minId to maxId, incrementing by, say, 10,000 each time. In pseudo-code:

for (int i = minId; i <= maxId; i = i + 10000) {
   int x = i;
   int y = i + 10000;
   SELECT * FROM MYTABLE WHERE ID >= x AND ID < y;
}
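
A minimal PHP sketch of this batching approach, assuming an integer primary-key column named id and the same legacy mysql_* API the question uses (the table and column names are placeholders):

$minId = (int) mysql_result(mysql_query("SELECT MIN(id) FROM mytable"), 0);
$maxId = (int) mysql_result(mysql_query("SELECT MAX(id) FROM mytable"), 0);
$batchSize = 10000;

for ($i = $minId; $i <= $maxId; $i += $batchSize) {
    $y = $i + $batchSize;
    // Only rows with id in [$i, $y) are held in memory for this batch
    $result = mysql_query("SELECT * FROM mytable WHERE id >= $i AND id < $y");
    while ($row = mysql_fetch_assoc($result)) {
        // Do something with this record
    }
    mysql_free_result($result); // release this batch before fetching the next
}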

3 Comments

This is what I am doing in the third example using LIMIT, except my solution allows the results to be sorted by something other than ID.
My version is more efficient because you only pull out the rows between x and y. Using LIMIT with an offset, the server still generates and skips every earlier row before returning your batch (the offset in $startIndex is NOT the primary ID of your table, but the row number within the results produced by the preceding query).
OK, I buy that. This is just a MySQL optimization.

See here:

http://www.tizag.com/mysqlTutorial/

http://www.tizag.com/mysqlTutorial/mysqlfetcharray.php

<?php
// Make a MySQL Connection
$query = "SELECT * FROM example"; 

$result = mysql_query($query) or die(mysql_error());


while($row = mysql_fetch_array($result)){
    echo $row['name']. " - ". $row['age'];
    echo "<br />";
}
?>

Depending on what you need to do with the resulting rows, you can use a different loop style, whether it's 'while', 'foreach' or 'for x to x'. Most of the time, a simple 'while' iteration will be great, and it is efficient.

5 Comments

won't this use a crazy amount of memory? Does this have some underlying method to get things as they are needed?
Any filter should be applied in the underlying SQL: the statement should produce only the required records, which PHP will then iterate through for your purposes. If you have a large dataset, think about using separate 'pages'.
Re: memory, given that you need to use all the records returned (if you don't, then tweak your SQL), these are built-in PHP functions, so this is likely the best approach.
OK, so doing what I did in my third example is not necessary and is already done under the hood?
Cool. Looks like mysql_query returns a resource. "A resource is a special variable, holding a reference to an external resource. Resources are created and used by special functions. See the appendix for a listing of all these functions and the corresponding resource types." For more info see php.net/manual/en/language.types.resource.php Could you append this to your answer please?

Use mysql_fetch_*

$result = mysql_query(...);
while ($row = mysql_fetch_assoc($result)) {
    $curIndex = $row['index'];
}

Note that mysql_query() buffers the whole result set on the client, and mysql_fetch_assoc() then hands you one row at a time from that buffer; for genuine streaming of a large result set there is mysql_unbuffered_query(). I'm not sure what exactly mysql_result does.

Side note: since you're still new, I'd advise getting into good habits right away and skipping the mysql_ functions entirely: go for PDO, or at least mysqli.
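
For instance, a minimal PDO sketch of the same loop (the DSN, credentials, and table name are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // throw on SQL errors

$stmt = $pdo->prepare('SELECT * FROM mytable');
$stmt->execute();

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $curIndex = $row['index'];
    // Do something with this record
}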

3 Comments

Why is it better to use PDO or mysqli? Is this a standard or a holy war?
The general term to look up would be Object-Relational Mapping (ORM). There are various more-or-less standard options, but I'd not consider it a holy war. It's pretty natural that, with OO on one side and a relational formalism on the other, you need some mapping, and not doing it by hand is always a good thing.
@Frank: Neither PDO nor mysqli has anything to do with ORM, I'm afraid; they only provide an OO interface to the database connection. The data is still relational, as always.

In an ideal world, PHP would generate aggregate queries, send them to MySQL, and only get a small number of rows in return. For instance, if you're counting the number of log items of each severity between two dates:

SELECT COUNT(*), severity 
FROM logs
WHERE date < ? AND date > ?
GROUP BY severity

Doing this work on the PHP side is quite unusual. If you find that your needs are too complex for SQL queries to handle (which, given that you have control over your database structure, leaves you with a lot of freedom), a better option would be to move to a map-reduce database engine like CouchDB.
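
As an illustration, a short sketch of running that aggregate from PHP with PDO and bound parameters ($pdo and the logs schema are assumed from context; the dates are made-up examples):

$stmt = $pdo->prepare(
    'SELECT COUNT(*) AS n, severity
     FROM logs
     WHERE date < ? AND date > ?
     GROUP BY severity'
);
$stmt->execute(['2010-01-08', '2010-01-01']); // upper bound first, to match the placeholders

// One row per severity comes back, instead of 300,000 log lines
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo $row['severity'] . ': ' . $row['n'] . "\n";
}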



I strongly believe that batch processing with Doctrine, or any kind of iteration with MySQL (PDO or mysqli), is just an illusion.

@dimitri-k provided a nice explanation, especially about the unit of work. The problem is the misleading $query->iterate(), which doesn't really iterate over the data source: it's just a \Traversable wrapper around an already fully fetched data source.

Here is an example demonstrating that, even with the Doctrine abstraction layer removed from the picture completely, we still run into memory issues:

echo 'Starting with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";

$pdo  = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();

while ($rawCampaign = $stmt->fetch()) {
    // echo $rawCampaign['id'] . "\n";
}

echo 'Ending with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";

Output:

Starting with memory usage: 6 MB 
Ending with memory usage: 109.46875 MB
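
For what it's worth, the blow-up above comes from PDO's default buffered mode, which copies the entire result set to the client before the first fetch(). A sketch of the unbuffered alternative (same hypothetical my_big_table), which streams rows one at a time but ties up the connection until the result is fully read:

$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
// Disable client-side buffering so rows are pulled from the server on demand
$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();

while ($row = $stmt->fetch()) {
    // Memory usage now stays roughly flat, one row at a time
    // Caveat: no other query can run on this connection until all rows are read
}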

Here is the disappointing getIterator() method:

namespace Doctrine\DBAL\Driver\Mysqli\MysqliStatement

/**
 * {@inheritdoc}
 */
public function getIterator()
{
    $data = $this->fetchAll();

    return new \ArrayIterator($data);
}

You can use my little library to actually stream heavy tables using PHP, with Doctrine, DQL, or just pure SQL, however you find appropriate: https://github.com/EnchanterIO/remote-collection-stream

