Sort Script - Extract links, and sort (regex)



matthewchan
16 Feb 2006, 04:33 PM
I have a script that reads a web file, extracts the hyperlinks, and sorts them in alphabetical order.

It runs without errors, but the output is not what I want.

I want to change the script so that it also extracts the link text.

//sortlinks.php



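<?php
// Read the page and collect every single-quoted href value.
$matches = array();
$page = file_get_contents("reviews.htm");
$pattern = "/(?<=href=').*?(?=')/i";
preg_match_all($pattern, $page, $matches);

// $matches[0] holds the full matches, i.e. the URLs.
$links = $matches[0];
sort($links);

// Print each URL as a hyperlink, using the URL itself as the link text.
foreach ($links as $l) {
    echo '<a href="' . $l . '">' . $l . "</a><br />\n";
}
?>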



//reviews.htm (the comments show the URL each link points to)
- alink // directory/alink.htm
- clink // directory/clink.htm
- blink // directory/blink.htm

Right now, the output I'm getting is:

directory/alink.htm
directory/blink.htm
directory/clink.htm

This is what I want (as hyperlinks pointing to their URLs):
alink
blink
clink

I suspect we have to change something in $pattern.
I've tried a lot of things, but I keep getting errors.

I just can't come up with the logic to solve this.

Any help would be appreciated.

Thanks
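
For reference, one possible direction, as a rough sketch only: capture both the href value and the text between the tags in a single pattern, then sort on the text instead of the URL. The function name extract_sorted_links, the sample markup in $page, and the array keys 'url' and 'text' are made up for illustration; the real reviews.htm may be marked up differently. The sketch assumes single-quoted href attributes (as the original $pattern does) and plain-text link labels with no nested tags.

```php
<?php
// Hypothetical helper: returns an array of ['url' => ..., 'text' => ...],
// sorted by link text. Assumes single-quoted href attributes.
function extract_sorted_links($html)
{
    // Group 1 = URL, group 2 = visible link text.
    $pattern = "~<a\s[^>]*href='([^']*)'[^>]*>(.*?)</a>~is";
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = array();
    foreach ($matches as $m) {
        $links[] = array('url' => $m[1], 'text' => trim($m[2]));
    }

    // Sort by link text, ignoring case.
    usort($links, function ($a, $b) {
        return strcasecmp($a['text'], $b['text']);
    });

    return $links;
}

// Sample markup standing in for reviews.htm (an assumption, not the real file).
// In the actual script this would be: $page = file_get_contents("reviews.htm");
$page = "<ul>
<li><a href='directory/alink.htm'>alink</a></li>
<li><a href='directory/clink.htm'>clink</a></li>
<li><a href='directory/blink.htm'>blink</a></li>
</ul>";

foreach (extract_sorted_links($page) as $l) {
    echo '<a href="' . htmlspecialchars($l['url']) . '">'
        . htmlspecialchars($l['text']) . "</a><br />\n";
}
```

With the sample markup this should list alink, blink, clink in that order, each pointing to its own URL. A regex like this is fragile against arbitrary HTML; for messier pages, PHP's DOMDocument would probably be a sturdier route.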