I am working on a PHP script to pull quest data from Wowhead: specifically, what starts and ends each quest, whether that is an item or an NPC, and its ID or name, respectively. Below is the relevant portion of the script; the rest handles database insertion. This is the completed snippet I came up with, in case anyone is interested. Also, since this will run about 15,000 times, is this the best way to obtain and store the data?

<?php

$quests = array();
//$questlimit = 14987;
$questlimit = 5;
$currentquest = 1;
$questsprocessed = 0;
while($questsprocessed != $questlimit)
{
echo "<br>";
echo "  Start of iteration: ".$questsprocessed."  ";
echo "<br>";
echo "  Attempting to process quest: ".$currentquest."  ";
echo "<br>";

$quests[$currentquest] = array();
$baseurl = 'http://wowhead.com/quest=';
$fullurl = $baseurl.$currentquest;

$data = drupal_http_request($fullurl);

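$queststartloc1 = strpos($data->data, 'quest_start');
$queststartloc2 = strpos($data->data, 'quest_end');

// strpos() returns false when the marker is missing; compare strictly so a match at offset 0 is not treated as "not found".
if($queststartloc1 === false)
{$currentquest++; echo "No data for this quest"; echo "<br>"; continue;}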


$questendloc1 = strpos($data->data, 'quest_end');
$questendloc2 = strpos($data->data, 'x5DDifficulty');

$startcaptureLength = $queststartloc2 - $queststartloc1;
$endcaptureLength = $questendloc2 - $questendloc1;


$quest_start_raw = substr($data->data,$queststartloc1, $startcaptureLength);
$quest_end_raw = substr($data->data, $questendloc1, $endcaptureLength);

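// The /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so decode the \xHH sequences with a callback instead.
$decodeHex = function ($m) { return chr(hexdec($m[1])); };
$startDecoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', $decodeHex, $quest_start_raw);
$endDecoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', $decodeHex, $quest_end_raw);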
$quests[$currentquest]['Start'] = array();
$quests[$currentquest]['End'] = array();

if(strstr($startDecoded, 'npc'))
{
  $quests[$currentquest]['Start']['Type'] = "npc";
  preg_match('~npc=(\d+)~', $startDecoded, $startmatch);
}
else
{
  $quests[$currentquest]['Start']['Type'] = "item";
  preg_match('~item=(\d+)~', $startDecoded, $startmatch);
}

// If neither pattern matched, store null rather than reading an undefined offset.
$quests[$currentquest]['Start']['ID'] = isset($startmatch[1]) ? $startmatch[1] : null;


if(strstr($endDecoded, 'npc'))
{
  $quests[$currentquest]['End']['Type'] = "npc";
  preg_match('~npc=(\d+)~', $endDecoded, $endmatch);
}
else
{
  $quests[$currentquest]['End']['Type'] = "item";
  preg_match('~item=(\d+)~', $endDecoded, $endmatch);
}

// If neither pattern matched, store null rather than reading an undefined offset.
$quests[$currentquest]['End']['ID'] = isset($endmatch[1]) ? $endmatch[1] : null;

//var_dump($quests[$currentquest]);

echo "  End of iteration: ".$questsprocessed."  ";
echo "<br>";
echo "  Processed quest: ".$currentquest."  ";
echo "<br>";
$currentquest++;
$questsprocessed++;

}
?>
  • I added a couple of checks for null data (the quest does not exist) and database insertions based on the array's type and ID. The only problem I have with this script is that it reports "Connection timed out after 30 seconds" about once an hour. How would I go about handling this error and simply restarting the loop? Commented Apr 17, 2014 at 8:15
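
One possible way to handle the intermittent timeout described in the comment above is to retry the failed request a few times before giving up on that quest. The sketch below is only an illustration: `fetch_with_retry` and the stand-in `$flaky` fetcher are hypothetical names, not part of the original script. It assumes the failure surfaces as a failed response rather than a fatal error, so that a wrapper around `drupal_http_request` could return `false` when the request did not succeed.

```php
<?php
// Hypothetical helper: call $fetch($url) up to $maxAttempts times.
// $fetch must return the response body as a string, or false on failure.
function fetch_with_retry(callable $fetch, $url, $maxAttempts = 3, $delaySeconds = 0)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch($url);
        if ($body !== false) {
            return $body;
        }
        // Optionally pause before the next attempt.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return false; // every attempt failed
}

// Stand-in fetcher that fails twice and then succeeds, to exercise the retry path.
$calls = 0;
$flaky = function ($url) use (&$calls) {
    $calls++;
    return $calls < 3 ? false : "body of $url";
};

$result = fetch_with_retry($flaky, 'http://wowhead.com/quest=1');
```

Inside the loop, a `false` result could be treated like the "No data for this quest" case: log it, move on to the next quest ID, and revisit the failed IDs later.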

1 Answer


These are called "escape sequences". They are normally used to represent characters that cannot be printed otherwise, but they can encode any character. In PHP, you can decode them like this:

$text = '
quest_start\\x5DStart\\x3A\\x20\\x5Bitem\\x3D16305\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5D\\x5Bicon\\x20name\\x3Dquest_end\\x5DEnd\\x3A\\x20\\x5Burl\\x3D\\x2Fnpc\\x3D12696\\x5DSenani\\x20Thunderheart\\x5B\\x2Furl\\x5D\\x5B\\x2Ficon\\x5D\\x5B\\x2Fli\\x5D\\x5Bli\\x5DNot\\x20sharable\\x5B\\x2Fli\\x5D\\x5Bli
';

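// The /e modifier was removed in PHP 7, so a callback does the hex-to-character conversion.
$decoded = preg_replace_callback('~\\\\x([A-Fa-f0-9]{2})~', function ($m) { return chr(hexdec($m[1])); }, $text);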

Which gives you a string similar to this:

 quest_start]Start: [item=16305][/icon][/li][li][icon name=quest_end]End: [url=/npc=12696]Senani Thunderheart[/url][/icon][/li][li]Not sharable[/li][li

(evidently some kind of BBCode). To remove all the BBCode tags, one more replacement is needed:

$clean = preg_replace('~(\[.+?\])+~', ' ', $decoded);
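
The two steps can be tried together on a shortened sample of the escaped markup. This is a minimal sketch, not a definitive implementation: the sample string is cut down from the one above, and it uses `preg_replace_callback` because the `/e` modifier is not available in PHP 7 and later.

```php
<?php
// Shortened sample of the escaped markup; in single quotes, \\ is one backslash.
$text = 'quest_start\\x5DStart\\x3A\\x20\\x5Bitem\\x3D16305\\x5D\\x5B\\x2Ficon\\x5D';

// Turn each \xHH sequence into the character with that hex code.
$decoded = preg_replace_callback(
    '~\\\\x([A-Fa-f0-9]{2})~',
    function ($m) {
        return chr(hexdec($m[1]));
    },
    $text
);

// Replace each run of adjacent [tags] with a single space.
$clean = preg_replace('~(\[.+?\])+~', ' ', $decoded);

echo $decoded, "\n"; // quest_start]Start: [item=16305][/icon]
echo $clean, "\n";
```

Note that the cleanup also removes `[item=16305]`, because the ID sits inside a tag. Any IDs that are needed have to be captured from `$decoded` before this step.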

2 Comments

Thank you very much for this; it will get me well on my way to finishing this script. The only problem is that the cleanup step seems to strip out my item number. From the decoded message, I need Start: item=16305 End: npc=12696. I mentioned the NPC name in my post because I didn't see that the NPC ID was in there as well. The IDs are much more useful to me than the name at the moment. I saw a post in a Google search where they turned a string into a table, splitting at something like name:john domain:example.com id:123, but I can't seem to find it anymore.
@user28187: you can extract the number before the cleanup with preg_match('~npc=(\d+)~', $decoded, $match)
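
Building on that suggestion, a single `preg_match_all` call can pull both the Start and End entries into a small lookup table. This is only a sketch based on the one decoded sample shown in the answer: `extract_quest_links` is a hypothetical helper, and the pattern assumes the type is either `npc` or `item` and may be preceded by `url=/`. Other quest pages could use types or layouts that this pattern does not cover.

```php
<?php
// Decoded sample string, shortened from the answer above.
$decoded = 'quest_start]Start: [item=16305][/icon][/li][li][icon name=quest_end]End: [url=/npc=12696]Senani Thunderheart[/url][/icon][/li]';

// Hypothetical helper: map "Start"/"End" to the type and numeric ID that follow.
function extract_quest_links($decoded)
{
    $links = array();
    // Captures: 1 = Start|End, 2 = npc|item, 3 = the numeric ID.
    preg_match_all(
        '~(Start|End): \[(?:url=/)?(npc|item)=(\d+)~',
        $decoded,
        $matches,
        PREG_SET_ORDER
    );
    foreach ($matches as $m) {
        $links[$m[1]] = array('Type' => $m[2], 'ID' => (int) $m[3]);
    }
    return $links;
}

$links = extract_quest_links($decoded);
print_r($links);
```

For the sample above, `$links['Start']` holds type `item` with ID 16305 and `$links['End']` holds type `npc` with ID 12696, which is the shape the question's `$quests` array already uses.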
