I am trying to grab some content from a website.
The content is in HTML, so my questions are:
1. Can we convert an HTML page to an XML page?
2. If not XML, then to an RSS feed?
If neither can be achieved, is there any method that lets me extract content based on the class or id of div tags?
You can use PHP. Something like the code below should work.
You just need to change the <div> </div> tags to whatever tags surround the content you are trying to scrape. Also, I am not 100% sure about the regular expression I used.
PHP Code:
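$contents = file_get_contents("http://www.theurl.com");
// The "s" modifier lets "." match newlines, so a div spanning several lines is still captured in $match[1].
preg_match("/<div>(.+?)<\/div>/s", $contents, $match);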
I created something like this to scrape the posts from NetBuilders the other day for a small project I'm developing.
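Since the question asks about picking out div tags by class, here is a rough sketch of another way to do it with PHP's built-in DOMDocument and DOMXPath instead of a regular expression. The function name extractDivsByClass and the sample markup are made up for illustration, and it assumes the class name contains no quote characters. Treat it as a starting point rather than a finished scraper.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <div> whose
// class attribute contains $class as a whole word.
function extractDivsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid markup, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Pad the class list with spaces so "post" does not match "postscript".
    // Assumption: $class contains no quote characters.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $results = [];
    foreach ($xpath->query($query) as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

// Inline sample markup, invented for this example.
$html = '<html><body>'
      . '<div class="post first">Hello</div>'
      . '<div class="sidebar">Ads</div>'
      . '<div class="post">World</div>'
      . '</body></html>';

$posts = extractDivsByClass($html, 'post');
print_r($posts);
```

For a live page you would pass the string returned by file_get_contents() as $html. To match on id instead of class, the XPath query could be changed to something like //div[@id='some-id'], where some-id is a placeholder.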