
Thread: html to xml or rss

  1. #1
    Join Date
    Apr 2010
    Location
    india
    Posts
    338

    html to xml or rss

    Hi All,

    I am trying to grab some content from a website.

    The content is in HTML.

    So my questions are:

    1. Can we convert an HTML page to an XML page?
    2. If not XML, then to an RSS feed?

    If neither can be achieved, then I need any method that lets me extract content based on the class or id of div tags.
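
    For the first question, one possible approach is PHP's built-in DOM extension: parse the HTML, then serialize the repaired tree as XML. This is only a minimal sketch. The sample markup in $html is made up for illustration, and a real page would be fetched first.

    ```php
    <?php
    // Hypothetical sample markup standing in for a downloaded page.
    $html = '<html><body><div class="item"><p>First<br>Second</p></div></body></html>';

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // saveXML() serializes the parsed tree as XML, so void tags such as <br> come out self-closed.
    $xml = $doc->saveXML();
    echo $xml;
    ```

    The result is well-formed XML that still uses HTML element names. Turning it into an RSS feed would need an extra mapping step from those elements to feed items.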

  2. #2
    Quote Originally Posted by JohnKSmith View Post
    Hi Dude

    It's a simple open source tool, HTML Tidy. This conversion is useful for webmasters who are migrating to XML. It can also help XML converts who have to interface with legacy HTML tools.
    I have a question for you:
    How can we translate XML documents to X12?

    I am not looking for manual conversion.

    I am looking for automatic conversion based on certain conditions.

    Whereas HTML Tidy, as far as I have used it, is a beautifier.

  3. #3
    You can use PHP. Something like:
    PHP Code:
    $contents = file_get_contents("http://www.theurl.com");
    // The s modifier lets "." match newlines, so a div spanning several lines still matches.
    if ($contents !== false && preg_match("/<div>(.+?)<\/div>/s", $contents, $match)) {
        echo $match[1];
    }
    You just need to change the <div> </div> tags to whatever tags the content you are trying to scrape is within. Also, I am not 100% sure about the regular expression I used.

    I created something like this to scrape the posts from NetBuilders the other day for a small project I'm developing.

  4. #4
    Thanks for the code.

    I am also not that good at regular expressions, so let me check this code.

    EDIT:

    A simple test showed that this will not work for me.

    It does not work for nested div tags.
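
    The nested-div problem can be reproduced with a small test like the one below. It is only a sketch, and the markup in $html is a made-up example. It compares the regular expression with PHP's built-in DOM parser, which keeps track of nesting.

    ```php
    <?php
    // Hypothetical markup with one div nested inside another.
    $html = '<div>outer <div>inner</div> tail</div>';

    // The lazy pattern stops at the first closing tag it meets,
    // which belongs to the inner div, so the outer content is cut short.
    preg_match('/<div>(.+?)<\/div>/s', $html, $match);
    echo $match[1], "\n"; // prints: outer <div>inner

    // A DOM parser tracks nesting, so the outer div keeps all of its content.
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $outer = $doc->getElementsByTagName('div')->item(0);
    echo $outer->textContent, "\n"; // prints: outer inner tail
    ```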

  5. #5
    What is the website and what is the content you want to grab?

  6. #6
    The website is flipkart.com.

    The content is the search page.

    I am now working on it using the scraping features of PHP Simple HTML DOM Parser.
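
    As a rough illustration of extracting divs by class, here is a sketch that uses PHP's built-in DOMDocument and DOMXPath rather than PHP Simple HTML DOM Parser, whose calls are not shown in this thread. The class names "product" and "title" and the sample markup are assumptions for the example, not the real structure of any site.

    ```php
    <?php
    // Hypothetical markup; in practice this would come from file_get_contents().
    $html = '<div class="product featured"><div class="title">Book A</div></div>'
          . '<div class="ad">Skip me</div>'
          . '<div class="product"><div class="title">Book B</div></div>';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match divs whose space-separated class list contains the word "product".
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " product ")]';

    $titles = [];
    foreach ($xpath->query($query) as $div) {
        // Trim because textContent includes any surrounding whitespace.
        $titles[] = trim($div->textContent);
    }
    print_r($titles);
    ```

    Because the query works on the parsed tree, nested divs inside each match are handled without any special casing. Selecting by id instead would use a query such as //div[@id="some-id"], where "some-id" is again a placeholder.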
