How to get and save WordPress content (HTML, CSS, images, videos) from a Java program?

I apologize if this is not the right place to ask this question. If it isn't, please point me to where I should ask it.

So here is my challenge. I need to get and save WordPress content (HTML, CSS, images and videos) from a Java program.

          HTML, images, CSS
WordPress -----------------> File system

Then I'd like to parse this content to integrate the pages into my existing Spring web application. For example, it means that


will need to change into


to work in my Spring application.

I've considered several possibilities so far. I would like your feedback, and perhaps suggestions for other solutions that I haven't thought of.

  1. Use this little Java wget program to get all the content from the WordPress site and then save it. Pros: It is sure to work, since creating site mirrors is what wget is for. Cons: Links won't work in my Spring application, and I will need to parse the HTML and CSS code anyway.

  2. Use jsoup. Pros: As it's a parser, I can directly change the URLs to integrate the pages into my Spring application. Cons: I'm not sure it can save the content to the hard drive.

  3. Use a WordPress plugin to export pages. Pros: Only pages related to the current change are re-saved to the hard drive (it's possible to specify a folder). Cons: It's not in Java (I can't maintain the plugin). I would also need to watch the destination folder and, each time a file changes, reparse it to change the links so that it works in my Spring application.
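
To make the folder-watching step in option 3 concrete, here is a minimal sketch using only the JDK (Java 11+). It watches an export folder with `java.nio.file.WatchService` and, each time an HTML or CSS file is created or modified, writes a copy with rewritten links into a separate output folder. The class name `ExportWatcher`, the base URL `http://wordpress.example.com/`, and the prefix `/myapp/wp/` are all hypothetical placeholders. The rewrite is a plain string substitution, which is an assumption made for brevity; it only catches absolute URLs and is cruder than a real HTML parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class ExportWatcher {

    // Hypothetical values: the WordPress base URL and the path prefix served by the Spring application.
    static final String WP_BASE = "http://wordpress.example.com/";
    static final String SPRING_PREFIX = "/myapp/wp/";

    /** Replaces every occurrence of the WordPress base URL with the Spring prefix. */
    static String rewriteLinks(String content, String wpBase, String springPrefix) {
        return content.replace(wpBase, springPrefix);
    }

    /** Rewrites the links of one exported file and writes the result into outDir. */
    static Path processFile(Path source, Path outDir) throws IOException {
        String content = Files.readString(source, StandardCharsets.UTF_8);
        Files.createDirectories(outDir);
        Path target = outDir.resolve(source.getFileName());
        Files.writeString(target, rewriteLinks(content, WP_BASE, SPRING_PREFIX), StandardCharsets.UTF_8);
        return target;
    }

    /** Reprocesses each HTML or CSS file created or modified in exportDir until the key becomes invalid. */
    static void watch(Path exportDir, Path outDir) throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            exportDir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
            while (true) {
                WatchKey key = watcher.take(); // blocks until something changes
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a full rescan would be needed here
                    }
                    Path changed = exportDir.resolve((Path) event.context());
                    String name = changed.getFileName().toString();
                    if (name.endsWith(".html") || name.endsWith(".css")) {
                        processFile(changed, outDir);
                    }
                }
                if (!key.reset()) {
                    break; // the watched folder is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: folder the plugin exports to, args[1]: folder read by the Spring application
        watch(Path.of(args[0]), Path.of(args[1]));
    }
}
```

The output folder should be different from the watched folder, otherwise each rewritten file would trigger a new event. `WatchService` only watches the registered folder itself, so subfolders would each need their own registration.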

I haven't found other solutions, and all of these are pretty bad. Can you think of other ways to do this?

Thank you for your help.

Answers 1

  • I can answer my own question. Jsoup can do what I need:

    1. Extract the content (tested)
    2. Change the links for Spring (tested)
    3. Save HTML (tested), images (tested), videos (not yet tested)
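
The three steps above could look roughly like the sketch below. It is not the code that was actually tested, only an illustration that assumes jsoup is on the classpath. The class name `WordPressPageSaver`, the base URL `http://wordpress.example.com/`, the prefix `/myapp/wp/`, and the `export` folder are hypothetical. It covers HTML and images only; videos are left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class WordPressPageSaver {

    /** Rewrites href/src values that point into wpBase so that they start with springPrefix instead. */
    static void rewriteLinks(Document doc, String wpBase, String springPrefix) {
        for (String attr : new String[] {"href", "src"}) {
            for (Element el : doc.select("[" + attr + "]")) {
                String abs = el.absUrl(attr); // resolved against the document's base URI
                if (abs.startsWith(wpBase)) {
                    el.attr(attr, springPrefix + abs.substring(wpBase.length()));
                }
            }
        }
    }

    /** Downloads every image referenced by the document into dir. */
    static void saveImages(Document doc, Path dir) throws IOException {
        Files.createDirectories(dir);
        for (Element img : doc.select("img[src]")) {
            String url = img.absUrl("src");
            // Simplification: use the last path segment as the file name (query strings are not handled).
            String name = url.substring(url.lastIndexOf('/') + 1);
            if (url.isEmpty() || name.isEmpty()) {
                continue;
            }
            byte[] bytes = Jsoup.connect(url)
                    .ignoreContentType(true) // images are not HTML
                    .maxBodySize(0)          // 0 removes the default size limit
                    .execute()
                    .bodyAsBytes();
            Files.write(dir.resolve(name), bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        String wpBase = "http://wordpress.example.com/"; // hypothetical WordPress site
        Path out = Path.of("export");                    // hypothetical output folder

        Document doc = Jsoup.connect(wpBase).get();      // 1. extract the content
        saveImages(doc, out.resolve("images"));          // 3. save images (before the URLs change)
        rewriteLinks(doc, wpBase, "/myapp/wp/");         // 2. change the links for Spring
        Files.createDirectories(out);
        Files.writeString(out.resolve("index.html"), doc.outerHtml(), StandardCharsets.UTF_8); // 3. save HTML
    }
}
```

Images are downloaded before the links are rewritten because `absUrl` needs the original WordPress URLs to resolve correctly.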
