
PHP ripping/spidering/filtering results from another external 'HTML' page?


This is what I have so far: http://www.photodan.com.au/mayhem.php


It just includes every query URL for every model listed on MM in Australia, and I search for the town names I want. (Scrolling through hundreds and hundreds of profiles, 20 per page, only to find most of them are in Sydney is a PITA.)



I want to filter the HTML results and select each table row (or whatever they're using) based on keywords. I can figure this part out myself.


How can I read an external 'HTML' page (it's HTML as far as my PHP script sees it) and get that data into a string, so I can sort, process and display it how I want server side instead of just including and displaying their HTML output?


Unless MM, whatever that is, offers a nice XML/SOAP based method for fetching data and doing searches, then you're going to be stuck with fetching the HTML and then parsing and extracting the data.


Having had a quick look at the URL you provided, if I were to do this, I'd probably go through the file one line at a time, setting placeholder variables to keep track of where I am (if necessary).


For example, there's the opening <table class="result"> tag for each result. You can use this as an indicator that you're about to get information about a new user.


Keep going line by line. The line with class="resultname" in it has both the username and the user's id number, which you can extract.


Then keep going until you get to the class="resulthead" lines. There you can either set a variable that tells your program to extract data on its next pass through the loop or, alternatively, have it read the next line right away inside the loop and extract the data immediately.
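The line-by-line approach described above could be sketched roughly as follows. Only the class names (result, resultname, resulthead) come from this thread; the sample markup, the profile.php link format and the parseResults() helper are made up for illustration, so the real page will almost certainly need different handling.

```php
<?php
// Hypothetical markup: only the class names come from the thread;
// the tag layout around them is assumed for illustration.
$html = <<<HTML
<table class="result">
<tr><td class="resultname"><a href="/profile.php?id=12345">Jane Doe</a></td></tr>
<tr><td class="resulthead">Location</td></tr>
<tr><td>Melbourne, Victoria</td></tr>
</table>
<table class="result">
<tr><td class="resultname"><a href="/profile.php?id=67890">Sam Smith</a></td></tr>
<tr><td class="resulthead">Location</td></tr>
<tr><td>Sydney, New South Wales</td></tr>
</table>
HTML;

function parseResults(array $lines): array
{
    $results = [];
    $current = null;      // record being built, or null before the first result
    $pendingField = null; // set when the previous line was a resulthead line

    foreach ($lines as $line) {
        if (strpos($line, '<table class="result">') !== false) {
            // A new result starts: store the previous one, if any.
            if ($current !== null) {
                $results[] = $current;
            }
            $current = [];
            $pendingField = null;
        } elseif ($current === null) {
            continue; // still in the page header, nothing to collect yet
        } elseif ($pendingField !== null) {
            // The line right after a resulthead line holds the value.
            $current[$pendingField] = trim(strip_tags($line));
            $pendingField = null;
        } elseif (strpos($line, 'class="resultname"') !== false) {
            $current['name'] = trim(strip_tags($line));
        } elseif (strpos($line, 'class="resulthead"') !== false) {
            $pendingField = trim(strip_tags($line));
        }
    }
    if ($current !== null) {
        $results[] = $current; // do not lose the last result
    }
    return $results;
}

$results = parseResults(explode("\n", $html));
```

Here $pendingField plays the role of the placeholder variable: it remembers that the next line should be read as data rather than inspected for markers.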


As for the "Add to friends" type links, you can easily recreate these yourself once you have the user's id number.


You'll be interested in functions like fopen() to open the URL as an input stream, and then use fgets() to get the lines.


Make use of the preg_* and HTML stripping functions to extract the data you actually want.
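As a rough illustration of combining preg_match() with strip_tags(), here is a sketch that pulls an id and a username out of one line. The sample line, its href format and the extractUser() helper are assumptions, not the site's actual markup, so the pattern would need adjusting against the real page.

```php
<?php
// Hypothetical line: the thread only says the "resultname" line carries
// the username and id; this href format is an assumption.
$line = '<td class="resultname"><a href="/profile.php?id=12345">Jane Doe</a></td>';

function extractUser(string $line): ?array
{
    // Capture the digits after "id=" and the link text up to </a>.
    if (!preg_match('/id=(\d+)[^>]*>(.*?)<\/a>/i', $line, $m)) {
        return null; // line does not look like a profile link
    }
    return [
        'id'   => (int) $m[1],
        // strip_tags() removes any nested markup inside the link text.
        'name' => trim(html_entity_decode(strip_tags($m[2]))),
    ];
}

$user = extractUser($line);
```

Returning null for lines that don't match lets the calling loop skip them without extra checks.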


You'll probably want something like this (consider this pseudo code)

$myfile = fopen("http://..../", "r"); // fopen() needs a mode; "r" = read only
while (($line = fgets($myfile)) !== false) {
	// handle $line
}
fclose($myfile);

Alternatively, if you already have the file in an array, you could use a foreach loop, assuming each array element represents a single line, and then you don't need to worry about fopen() and fgets().
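One way to get the page into an array is file(), which returns one element per line. The sketch below filters those lines by town keyword; the temporary file stands in for a saved copy of the results page, and the linesMatching() helper and sample rows are invented for illustration.

```php
<?php
// Stand-in for a saved results page; a real run would point file() at the
// saved HTML (or at the URL, where URL wrappers are enabled).
$path = tempnam(sys_get_temp_dir(), 'mm');
file_put_contents(
    $path,
    "<td>Melbourne, Victoria</td>\n<td>Sydney, New South Wales</td>\n<td>Geelong, Victoria</td>\n"
);

function linesMatching(array $lines, array $towns): array
{
    $hits = [];
    foreach ($lines as $line) {
        foreach ($towns as $town) {
            // stripos() is case-insensitive, so 'melbourne' matches 'Melbourne'.
            if (stripos($line, $town) !== false) {
                $hits[] = trim(strip_tags($line));
                break; // one match per line is enough
            }
        }
    }
    return $hits;
}

// One array element per line, without trailing newlines or empty lines.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$hits = linesMatching($lines, ['melbourne', 'geelong']);
unlink($path);
```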


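Also, fopen'ing a URL only works when allow_url_fopen is enabled. That option is set in php.ini (or the server configuration); it can't be changed from a .htaccess file, so on shared hosting you may need to ask your host.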
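Before relying on fopen() with a URL, it may be worth checking the setting from within the script. A minimal sketch, where urlFopenEnabled() is just an illustrative helper name:

```php
<?php
// Reports whether PHP is currently allowed to open URLs with fopen(),
// file() and similar functions.
function urlFopenEnabled(): bool
{
    // ini_get() returns the setting as a string; FILTER_VALIDATE_BOOLEAN
    // treats "1", "on", "yes" and "true" as true and anything else as false.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (!urlFopenEnabled()) {
    echo "allow_url_fopen is off, so fopen() on a URL will fail.\n";
}
```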


Good luck.


Ah okay thanks, I've already extracted all the useful info now anyway :)

Using what method? Let us know how you went about it; it may teach people things and help others in your position in the future.


Well, I just let the web page load, saved it as a complete web page, then opened it and examined the entire thing, using the search function to find those nearby.


And for makeup artists, wardrobe people, hair stylists etc., there weren't really that many in Australia to go through anyway :)

