Help with SimpleXML/PHP Web Scraping



MattTrout
27 Oct 2009, 10:22 AM
Hey everyone, hopefully someone can help me out. I am building an iPhone app and doing some web scraping to get an array of data. I am then using SimpleXML to turn that array into an XML file, which will be imported into the iPhone app for use in a table view. It's nothing more than a team schedule.

My array is large (about 120 nodes, each with subnodes). Basically, I want to structure the XML like this:


<Schedule>
  <Month name="October 2009">
    <Game>
      <Date>
      <HomeTeam>
      <AwayTeam>
      <Time>
      <Result>
    </Game>
  </Month>
</Schedule>

The way the array is laid out, you almost have to know in advance which index holds the November, December, etc. rows. For example:


$arr = Array
(
'0' => Array( '0' => 'October 2009', ),
'1' => Array( '0' => 'Date',
'1' => 'Visitor',
'2' => 'Home',
'3' => 'Time (ET)',
'4' => 'TV Network/Results', ),
'2' => Array( '0' => 'Fri Oct 2, 2009',
'1' => 'Away Team',
'2' => 'Home Team',
'3' => '7:30 PM',
'4' => 'Away Team (2) Home Team (4)',
'5' => '', ),
'3' => Array( '0' => 'Sat Oct 3, 2009',
'1' => 'Away Team-3',
'2' => 'Home Team-3',
'3' => '11:30 AM',
'4' => 'Away Team (3) Home Team (0)',
'5' => '', ),
'4' => Array( '0' => 'Sun Oct 4, 2009',
'1' => 'Away Team-4',
'2' => 'Home Team-4',
'3' => '1:30 PM',
'4' => 'Away Team (1) Home Team (16)',
'5' => '', ),
'5' => Array( '0' => 'November 2009', ),
'6' => Array( '0' => 'Fri Nov 2, 2009',
'1' => 'Away Team',
'2' => 'Home Team',
'3' => '7:30 PM',
'4' => 'Away Team (2) Home Team (4)',
'5' => '', ),
'7' => Array( '0' => 'Sat Nov 3, 2009',
'1' => 'Away Team-3',
'2' => 'Home Team-3',
'3' => '11:30 AM',
'4' => 'Away Team (3) Home Team (0)',
'5' => '', ),
'8' => Array( '0' => 'Sun Nov 4, 2009',
'1' => 'Away Team-4',
'2' => 'Home Team-4',
'3' => '1:30 PM',
'4' => 'Away Team (1) Home Team (16)',
'5' => '', ),

);

I can almost get it to work. I just can't figure out how to use switch or if statements to say, for example: if the index is 5, make a new <Month> tag and set its name attribute to the value of $arr[5][0]. The code below just adds all the games under the October <Month> tag, and the row where a new month should start ($arr[5], for instance) just disappears. Then, after all the <Game> tags, the code just adds closing </Month> tags. Really irritating, haha.
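For what it's worth, in the sample array above the month rows are the only rows with a single element, so a check on the row itself might stand in for a hard-coded index. This is only a sketch under that assumption (plus the assumption that the column-header row always starts with 'Date'); `isMonthRow` and `isHeaderRow` are made-up helper names, not anything from SimpleXML:

```php
<?php
// Hypothetical helpers (not part of the original code). They assume that a
// month heading is the only kind of row with exactly one cell, and that the
// column-header row always starts with the literal text 'Date'.
function isMonthRow(array $row): bool
{
    return count($row) === 1;
}

function isHeaderRow(array $row): bool
{
    return isset($row[0]) && $row[0] === 'Date';
}

// Trimmed-down sample in the same shape as $arr.
$sample = array(
    array('October 2009'),
    array('Date', 'Visitor', 'Home', 'Time (ET)', 'TV Network/Results'),
    array('Fri Oct 2, 2009', 'Away Team', 'Home Team', '7:30 PM', 'Away Team (2) Home Team (4)', ''),
    array('November 2009'),
    array('Fri Nov 2, 2009', 'Away Team', 'Home Team', '7:30 PM', 'Away Team (2) Home Team (4)', ''),
);

// Classify every row by its content instead of by its position.
$kinds = array();
foreach ($sample as $ptr => $row) {
    if (isMonthRow($row)) {
        $kinds[$ptr] = 'month';
    } elseif (isHeaderRow($row)) {
        $kinds[$ptr] = 'header';
    } else {
        $kinds[$ptr] = 'game';
    }
}

print_r($kinds);
```

If the scraped page ever produces other one-cell rows, the month check would need something stricter, such as matching a month name and year.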

Sorry for the long post, but I need some help and would really appreciate it! Thanks!


// print_r($arr);

$xml = '<?xml version="1.0" encoding="UTF-8"?>';
$xml .= "
<Schedule>

</Schedule>";

$obj = new SimpleXMLElement($xml);

//$obj->Month["name"] = $arr[0][0];

foreach ($arr as $ptr => $nothing)
{
    if ($ptr == 0)
    {
        $Month = $obj->addChild('Month');
        $Month->addAttribute('Name', $arr[$ptr][0]);
    }

    if ($ptr == 1)
    {
        continue;
    }
    else if ($ptr == 16)
    {
        $Month = $obj->parent->addChild('Month');
        $Month->addAttribute('Name', $arr[$ptr][0]);
    }
    else
    {
        $game = $obj->Month->addChild('Game');
        $game->addChild('Date', $arr[$ptr][0]);
        $game->addChild('Visitor', $arr[$ptr][1]);
        $game->addChild('Home', $arr[$ptr][2]);
        $game->addChild('Time', $arr[$ptr][3]);
        $game->addChild('Results', $arr[$ptr][4]);
    }
}

$new = $obj->asXML();

$new = str_replace('><', ">\n<", $new);
$new = str_replace('&nbsp', "", $new);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = true;
$dom->formatOutput = true;
$dom->loadXML($new);
echo $dom->saveXML();
$dom->save("Schedule.xml");
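One possible way to restructure the loop, offered as a rough sketch rather than a definitive fix. Two things in the code above look like the likely culprits: `$obj->Month->addChild('Game')` always resolves to the first <Month> element, and `$obj->parent` is read by SimpleXML as a child element named "parent", not as the parent node. The sketch keeps a reference to the current month and adds games to that instead. `buildSchedule` is a hypothetical function name, the element and attribute names are taken from the desired structure at the top of the post, and the single-cell test for month rows is an assumption about the scraped data:

```php
<?php
// Sketch only: buildSchedule() is a hypothetical helper, and the element
// names follow the desired <Schedule> structure from the top of the post.
function buildSchedule(array $arr): string
{
    $schedule = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><Schedule/>');
    $month = null; // the <Month> element that games are currently added to

    foreach ($arr as $row) {
        // Assumption: a row with a single cell is a month heading.
        if (count($row) === 1) {
            $month = $schedule->addChild('Month');
            $month->addAttribute('name', $row[0]);
            continue;
        }

        // Assumption: the column-header row starts with 'Date'. Rows that
        // show up before any month heading are skipped as well.
        if ($month === null || $row[0] === 'Date') {
            continue;
        }

        // Add the game to the current month, not to $schedule->Month.
        $game = $month->addChild('Game');
        $fields = array('Date' => 0, 'HomeTeam' => 2, 'AwayTeam' => 1, 'Time' => 3, 'Result' => 4);
        foreach ($fields as $tag => $index) {
            $value = isset($row[$index]) ? $row[$index] : '';
            // addChild() does not escape '&', so escape the text first.
            $game->addChild($tag, htmlspecialchars($value, ENT_XML1 | ENT_NOQUOTES));
        }
    }

    // Reload into DOMDocument for indentation. preserveWhiteSpace is set to
    // false before loadXML() so that formatOutput can re-indent the tree.
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    $dom->loadXML($schedule->asXML());

    return $dom->saveXML();
}

// Small sample in the same shape as $arr.
$sample = array(
    array('October 2009'),
    array('Date', 'Visitor', 'Home', 'Time (ET)', 'TV Network/Results'),
    array('Fri Oct 2, 2009', 'Away Team', 'Home Team', '7:30 PM', 'Away Team (2) Home Team (4)', ''),
    array('Sat Oct 3, 2009', 'Away Team-3', 'Home Team-3', '11:30 AM', 'Away Team (3) Home Team (0)', ''),
    array('November 2009'),
    array('Fri Nov 2, 2009', 'Away Team', 'Home Team', '7:30 PM', 'Away Team (2) Home Team (4)', ''),
);

$xml = buildSchedule($sample);
echo $xml;
```

To write the file as in the original code, the returned string could be passed to `file_put_contents('Schedule.xml', $xml)`. The `str_replace('><', ...)` step is left out here because `formatOutput` already takes care of line breaks and indentation.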