HTML to XML or RSS
I am trying to grab some content from a website.
The content is in HTML.
So my questions are:
1. Can we convert an HTML page to an XML page?
2. If not XML, then to an RSS feed?
If neither can be achieved, is there any method that lets me extract content based on the class or id of div tags?
Originally Posted by JohnKSmith
I am not looking for manual conversion.
I am looking for automatic conversion based on certain conditions.
HTML Tidy, as far as I have used it, is only a beautifier.
You can use PHP. Something like:
$contents = file_get_contents("http://www.theurl.com");
// The "s" modifier lets "." match newlines, so a <div> spanning several lines is still captured.
preg_match("/<div>(.+?)<\/div>/s", $contents, $match);
// $match[1] now holds the text between the first <div> and the next </div>.
You just need to change the <div> </div> tags to whatever tags the content you are trying to scrape sits within. Also, I am not 100% sure about the regular expression I used.
I created something like this to scrape the posts from NetBuilders the other day for a small project I'm developing :)
Thanks for the code.
Originally Posted by stickycarrots
I am also not that good at regexp, so let me check this code.
A simple test showed that this will not work for me:
it does not work for nested div tags.
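For nested div tags, a DOM parser tends to hold up better than a regular expression, because it matches opening and closing tags properly. Here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The sample markup and the class name "result" are made up for illustration, so swap in whatever the real page uses.

```php
<?php
// Sample markup with nested <div> tags; the class names are hypothetical.
$html = '<div class="result"><div class="title">Book A</div><div class="price">100</div></div>'
      . '<div class="other">ignore me</div>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match every div whose class attribute contains the token "result".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " result ")]';

$results = [];
foreach ($xpath->query($query) as $node) {
    // saveXML() on a node serializes it, nested children included, as XML.
    $results[] = $doc->saveXML($node);
}

print_r($results);
```

Since saveXML() serializes each matched node as XML, this also covers the "HTML to XML" part of the question for the pieces you actually care about.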
What is the website and what is the content you want to grab?
Originally Posted by anantshri
The website is flipkart.com.
The content is the search page.
I am getting down to it with the scraping features of
PHP Simple HTML DOM Parser.
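Once the content has been extracted, the RSS half of the original question can be handled with PHP's built-in DOMDocument. This is only a rough sketch: the $items array, the channel title and the example.com links are placeholders standing in for whatever the scraper actually returns.

```php
<?php
// Hypothetical items, standing in for the scraped search results.
$items = [
    ['title' => 'Book A', 'link' => 'http://www.example.com/a'],
    ['title' => 'Book B & more', 'link' => 'http://www.example.com/b'],
];

$rss = new DOMDocument('1.0', 'UTF-8');
$rss->formatOutput = true;

$root = $rss->createElement('rss');
$root->setAttribute('version', '2.0');
$rss->appendChild($root);

$channel = $rss->createElement('channel');
$root->appendChild($channel);

// RSS 2.0 expects a title, link and description on the channel (placeholder values here).
$channelInfo = [
    'title'       => 'Search results',
    'link'        => 'http://www.example.com/',
    'description' => 'Scraped search results',
];
foreach ($channelInfo as $name => $value) {
    $el = $rss->createElement($name);
    $el->appendChild($rss->createTextNode($value));
    $channel->appendChild($el);
}

foreach ($items as $item) {
    $node = $rss->createElement('item');
    foreach ($item as $name => $value) {
        $el = $rss->createElement($name);
        // Text nodes are escaped on output, so characters like & and < are safe.
        $el->appendChild($rss->createTextNode($value));
        $node->appendChild($el);
    }
    $channel->appendChild($node);
}

$feed = $rss->saveXML();
echo $feed;
```

Building the feed through the DOM rather than by string concatenation keeps the output well-formed even when titles contain special characters.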