How to get and save WordPress content (HTML, CSS, images, videos) from a Java program?

I apologize if this is not the right place to ask this question. If it isn't, please help me find where I should ask it.

So here is my challenge. I need to get and save WordPress content (HTML, CSS, images and videos) from a Java program.

          HTML, images, CSS
WordPress -----------------> File system

Then I'd like to parse this content to integrate the pages into my existing Spring web application. For example, this means that

http://localhost/wp-content/image1.png

will need to change into

http://localhost/spring/image1.png

to work in my Spring application.
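The URL change described above can be sketched with plain string handling. This is only a minimal illustration: the class name `UrlPrefixRewriter` and the method `rewrite` are hypothetical, and the two prefixes are simply the ones from the example URLs in this question.

```java
public class UrlPrefixRewriter {

    /**
     * Replaces oldPrefix with newPrefix when the URL starts with oldPrefix.
     * URLs that do not start with oldPrefix are returned unchanged.
     */
    public static String rewrite(String url, String oldPrefix, String newPrefix) {
        if (url.startsWith(oldPrefix)) {
            return newPrefix + url.substring(oldPrefix.length());
        }
        return url;
    }

    public static void main(String[] args) {
        // Prefixes assumed from the example above.
        String rewritten = rewrite(
                "http://localhost/wp-content/image1.png",
                "http://localhost/wp-content/",
                "http://localhost/spring/");
        System.out.println(rewritten); // prints http://localhost/spring/image1.png
    }
}
```

A prefix check like this leaves every other link (external sites, other paths) untouched, which is probably what is wanted when only the WordPress assets move.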

I've thought about several possibilities so far. I would like your feedback on them, and perhaps suggestions for other solutions that I haven't thought of.

  1. Use this little Java wget program to get all the content from the WordPress site and then save it. Pros: It is sure to work, since wget's purpose is to create site mirrors. Cons: Links won't work in my Spring application, so I will need to parse the HTML and CSS code anyway.

  2. Use jsoup. Pros: Since it's a parser, I can directly change the URLs to integrate the pages into my Spring application. Cons: I'm not sure it can save the content to the hard drive.

  3. Use a WordPress plugin to export pages. Pros: Only the pages affected by the current change are re-saved to the hard drive (it's possible to specify a folder). Cons: It's not in Java (I can't maintain the plugin). I would also need to watch the destination folder and, each time a file changes, reparse it to change the links so they work in my Spring application.

I didn't find other solutions, and all of these solutions are pretty bad. Can you think of other ways to do this?

Thank you for your help.

Answers 1

  • I can answer my own question. Jsoup can do what I need:

    1. Extract the content (tested)
    2. Change the links for Spring (tested)
    3. Save HTML (tested), images (tested), videos (not yet tested)
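The three steps above might look roughly like the following with jsoup. This is a sketch under assumptions, not the code actually used for the answer: the class name `WordPressPageMirror`, its method names, the prefixes, and the output path are all hypothetical, and the jsoup library is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WordPressPageMirror {

    // Assumed prefixes, taken from the example URLs in the question.
    static final String OLD_PREFIX = "http://localhost/wp-content/";
    static final String NEW_PREFIX = "http://localhost/spring/";

    /** Steps 1 and 2: parse the HTML and rewrite src/href values under OLD_PREFIX. */
    public static Document rewriteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        for (String attr : new String[] {"src", "href"}) {
            for (Element el : doc.select("[" + attr + "]")) {
                // absUrl resolves relative links against baseUri,
                // so rewritten links end up absolute.
                String abs = el.absUrl(attr);
                if (abs.startsWith(OLD_PREFIX)) {
                    el.attr(attr, NEW_PREFIX + abs.substring(OLD_PREFIX.length()));
                }
            }
        }
        return doc;
    }

    /** Step 3 (HTML): write the modified document to disk as UTF-8. */
    public static void saveHtml(Document doc, Path target) throws IOException {
        Files.createDirectories(target.toAbsolutePath().getParent());
        Files.write(target, doc.outerHtml().getBytes(StandardCharsets.UTF_8));
    }

    /** Step 3 (images and other binaries): download the raw bytes and write them to disk. */
    public static void saveBinary(String url, Path target) throws IOException {
        byte[] bytes = Jsoup.connect(url)
                .ignoreContentType(true) // allow non-HTML responses such as images
                .maxBodySize(0)          // 0 lifts jsoup's body size cap
                .execute()
                .bodyAsBytes();
        Files.createDirectories(target.toAbsolutePath().getParent());
        Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Inline sample page; a live page could be fetched with Jsoup.connect(url).get().
        String html = "<html><body><img src=\"/wp-content/image1.png\"></body></html>";
        Document doc = rewriteLinks(html, "http://localhost/");
        saveHtml(doc, Paths.get("output", "index.html")); // hypothetical output location
    }
}
```

`saveBinary` loads the whole response into memory, which is fine for images but may not suit large videos; that case is untested here, as in the answer.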
