Thread: HTML to XML or RSS

  1. #1
    anantshri (on leave from Net Builders: will post rarely)
    Join Date: Apr 2010
    Location: India
    Posts: 338
    Thanks: 80
    Thanked 47 Times in 40 Posts

    HTML to XML or RSS

    Hi All,

    I am trying to grab some content from a website.

    The content is in HTML.

    So my questions are:

    1. Can we convert an HTML page to an XML page?
    2. If not XML, can we convert it to an RSS feed?

    If neither can be achieved, is there any method that would let me extract content based on the class or id of div tags?
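As a rough sketch of the last option (extracting content by the class of div tags), PHP's built-in DOM extension can parse the page and select divs with an XPath query. This is not taken from the thread: the function name `extractByClass` and the sample markup are hypothetical, and the sketch assumes the class name is a plain word with no quote characters in it.

```php
<?php
// Hypothetical helper: return the text of every <div> whose class list
// contains $className as a whole word. Because it walks the parsed DOM tree
// instead of matching tags with a regex, nested divs are handled correctly.
// Assumes $className contains no quote characters (it is pasted into XPath).
function extractByClass(string $html, string $className): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy real-world markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the class attribute with spaces so "product" matches "product big"
    // but not "products".
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $className
    );

    $results = [];
    foreach ($xpath->query($query) as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

// Made-up sample markup: prints the texts "Phone" and "Tablet".
$html = '<div class="product big"><div class="name">Phone</div></div><div class="product">Tablet</div>';
print_r(extractByClass($html, 'product'));
```

Selecting by id instead would just swap the predicate, e.g. `//div[@id='...']`.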

  2. #2
    anantshri
    Quote Originally Posted by JohnKSmith
    Hi Dude

    It's simple: use the open-source tool HTML Tidy. This conversion is useful for webmasters who are migrating to XML. It can also help XML converts who have to interface with legacy HTML tools.
    I have a question for you:
    How can we translate XML documents to X12?

    I am not looking for manual conversion.

    I am looking for automatic conversion based on certain conditions.

    Whereas HTML Tidy, as far as I have used it, is a beautifier.
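For an automatic HTML-to-XML step, a minimal sketch, assuming PHP's bundled DOM extension is enabled: libxml's HTML parser repairs unclosed tags and similar sloppiness, and `saveXML()` writes the resulting tree back out as well-formed XML. The function name `htmlToXml` is hypothetical. Note the output is XML-serialized HTML, not RSS; getting to RSS still means deciding which elements are the feed items.

```php
<?php
// Hypothetical helper: parse possibly messy HTML and re-serialise it as
// well-formed XML using only PHP's built-in DOM extension.
function htmlToXml(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings (unknown tags, unclosed elements) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // saveXML() emits an XML declaration and self-closes empty elements such as <br/>.
    return $doc->saveXML();
}

echo htmlToXml('<p>Unclosed paragraph<br>with a <b>bold</b> word');
```

One caveat: `loadHTML()` assumes ISO-8859-1 unless the page declares its charset, so UTF-8 pages without a meta charset tag may need extra handling.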

  3. #3
    stickycarrots (Experienced Net Builder)
    Join Date: Dec 2008
    Location: QuickInbox.com
    Posts: 753
    Blog Entries: 6
    Thanks: 18
    Thanked 86 Times in 59 Posts
    You can use PHP. Something like:
    PHP Code:
    $contents = file_get_contents("http://www.theurl.com");
    // Capture the text between the first attribute-less <div> and the next </div>;
    // the "s" modifier lets "." match newlines too.
    if (preg_match("/<div>(.+?)<\/div>/s", $contents, $match)) {
        echo $match[1];
    }
    You just need to change the <div> and </div> tags to whatever tags wrap the content you are trying to scrape. Also, I am not 100% sure about the regular expression I used.

    I created something like this to scrape the posts from NetBuilders the other day for a small project I'm developing.

  4. #4
    anantshri
    Quote Originally Posted by stickycarrots (see post #3)
    Thanks for the code.

    I am also not that good at regexps; let me check this code.

    EDIT:

    A simple test showed that this will not work for me: it does not handle nested div tags.
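The nested-div problem is easy to reproduce. A lazy `(.+?)` stops at the first `</div>` it meets, which belongs to the inner div, while a DOM parser pairs each opening tag with its own closing tag. The markup and class names here are made up for the demonstration.

```php
<?php
// Made-up markup: an outer div that contains a nested inner div.
$html = '<div class="outer">A<div class="inner">B</div>C</div>';

// Lazy regex: the capture ends at the inner div's </div>, so "C" is lost
// and the inner opening tag leaks into the result.
preg_match('/<div[^>]*>(.+?)<\/div>/s', $html, $match);
echo $match[1], PHP_EOL; // A<div class="inner">B

// DOM parser: tags are paired structurally, so the outer div is complete.
libxml_use_internal_errors(true); // silence warnings on real-world markup
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$outer = $xpath->query("//div[@class='outer']")->item(0);
echo $outer->textContent, PHP_EOL; // ABC
```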

  5. #5
    stickycarrots
    Quote Originally Posted by anantshri (see post #4)
    What is the website and what is the content you want to grab?

  6. #6
    anantshri
    The website is flipkart.com, and the content is the search page.

    I am getting down to it with the scraping features of PHP Simple HTML DOM Parser.
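Whichever parser does the extraction, the RSS part of the original question can then be handled by wrapping the extracted items in an RSS 2.0 document. A sketch using PHP's built-in DOM extension; `buildRss`, `addTextElement`, the example.com URLs and the `['title' => ..., 'link' => ...]` item shape are assumptions for illustration, not part of PHP Simple HTML DOM Parser.

```php
<?php
// Hypothetical helper: append <name>text</name> to $parent.
// createTextNode() takes care of escaping &, < and >.
function addTextElement(DOMDocument $doc, DOMNode $parent, string $name, string $text): void
{
    $element = $doc->createElement($name);
    $element->appendChild($doc->createTextNode($text));
    $parent->appendChild($element);
}

// Hypothetical builder for a minimal RSS 2.0 feed. Each item is assumed to be
// an array with 'title' and 'link' keys (e.g. scraped search results).
function buildRss(string $title, string $link, string $description, array $items): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);

    $channel = $doc->createElement('channel');
    $rss->appendChild($channel);

    // title, link and description are the required channel elements in RSS 2.0.
    addTextElement($doc, $channel, 'title', $title);
    addTextElement($doc, $channel, 'link', $link);
    addTextElement($doc, $channel, 'description', $description);

    foreach ($items as $item) {
        $entry = $doc->createElement('item');
        addTextElement($doc, $entry, 'title', $item['title']);
        addTextElement($doc, $entry, 'link', $item['link']);
        $channel->appendChild($entry);
    }

    return $doc->saveXML();
}

echo buildRss(
    'Search results',
    'http://example.com/search',
    'Items scraped from a search page',
    [['title' => 'Phone & case', 'link' => 'http://example.com/item/1']]
);
```

Building the feed through DOM calls rather than string concatenation keeps it well-formed even when scraped titles contain characters like `&`.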
