How to get and save WordPress content (HTML, CSS, images, videos) from a Java program?
I apologize if this is not the right place to ask this question. If it isn't, please help me find where I should ask it.
So here is my challenge. I need to get and save WordPress content (HTML, CSS, images and videos) from a Java program.
WordPress ---- HTML, images, CSS ----> File system
Then I'd like to parse this content and integrate the pages into my existing Spring web application. For example, a URL such as
http://localhost/wp-content/image1.png
will need to become
http://localhost/spring/image1.png
so that it works inside my Spring application.
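The rewrite described above amounts to swapping one path prefix for another. A minimal sketch, assuming every WordPress asset lives under /wp-content/ and should be served from /spring/ (both prefixes are taken from the example URLs; the class name WpUrlRewriter is hypothetical):

```java
import java.net.URI;

/** Hypothetical helper: maps WordPress asset URLs onto the Spring application's path. */
final class WpUrlRewriter {
    // Assumed prefixes, taken from the example URLs; adjust to the real deployment.
    static final String WP_PREFIX = "/wp-content/";
    static final String SPRING_PREFIX = "/spring/";

    private WpUrlRewriter() {}

    /**
     * Replaces a leading "/wp-content/" in the URL's path with "/spring/".
     * Works for absolute and root-relative URLs; anything else is returned unchanged.
     */
    static String rewrite(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return url; // malformed URL: leave it alone rather than fail the whole page
        }
        String path = uri.getRawPath();
        if (path == null || !path.startsWith(WP_PREFIX)) {
            return url;
        }
        // Rebuild from the raw components so existing percent-encoding is preserved.
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) sb.append(uri.getScheme()).append(':');
        if (uri.getRawAuthority() != null) sb.append("//").append(uri.getRawAuthority());
        sb.append(SPRING_PREFIX).append(path.substring(WP_PREFIX.length()));
        if (uri.getRawQuery() != null) sb.append('?').append(uri.getRawQuery());
        if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
        return sb.toString();
    }
}
```

Because it keeps the query string and fragment and returns every URL outside /wp-content/ untouched, it can be applied to every link found in a page without first filtering them.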
I've considered several possibilities so far. I'd like your feedback on them, and perhaps suggestions for other solutions I haven't thought of.
Use this small Java wget program to download all the content from the WordPress site and save it. Pros: it is almost certain to work, since creating site mirrors is exactly what wget is for. Cons: the links won't work in my Spring application, so I will still need to parse the HTML and CSS code.
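Whichever tool does the mirroring, the CSS files still need their url(...) references rewritten, because background images and fonts are referenced there rather than in the HTML. A minimal regex-based sketch (the class name CssUrlRewriter is hypothetical; a regex covers ordinary url(...) values but is not a full CSS parser, so unusual syntax such as escaped characters inside url() would be missed):

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites every url(...) reference in a mirrored CSS file. */
final class CssUrlRewriter {
    // Matches url(x), url('x') and url("x"); group 1 is the quote (possibly empty), group 2 the URL.
    private static final Pattern CSS_URL =
            Pattern.compile("url\\(\\s*(['\"]?)([^'\")]+)\\1\\s*\\)");

    private CssUrlRewriter() {}

    /** Applies mapper to each URL found in css and keeps the original quoting style. */
    static String rewrite(String css, UnaryOperator<String> mapper) {
        Matcher m = CSS_URL.matcher(css);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String quote = m.group(1);
            String newUrl = mapper.apply(m.group(2).trim());
            // quoteReplacement stops '$' or '\' in URLs from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement("url(" + quote + newUrl + quote + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The mapping is passed in as a function, so the same prefix rule used for the HTML links can be plugged in, and URLs it does not recognise (for example data: URIs) can simply be returned unchanged by the mapper.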
Use jsoup. Pros: since it's an HTML parser, I can rewrite the URLs directly to integrate the pages into my Spring application. Cons: I'm not sure it can save the content to the hard drive.
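On the concern about saving to disk: jsoup parses and fetches, and the rewritten document is just a string, so the standard java.nio.file API can write it out. A minimal sketch, assuming jsoup (org.jsoup:jsoup) is on the classpath and reusing the prefixes from the example URLs; JsoupPageSaver and its methods are hypothetical names:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Hypothetical helper: rewrites WordPress links with jsoup and saves the results to disk. */
final class JsoupPageSaver {
    // Assumed prefixes, taken from the example URLs in the question.
    static final String WP_PREFIX = "http://localhost/wp-content/";
    static final String SPRING_PREFIX = "http://localhost/spring/";

    private JsoupPageSaver() {}

    /** Parses the HTML, rewrites matching src/href attributes, and returns the new markup. */
    static String rewrite(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        for (Element el : doc.select("[src], [href]")) {
            for (String attr : new String[] {"src", "href"}) {
                if (!el.hasAttr(attr)) continue;
                // absUrl resolves relative links against baseUri ("" if it cannot).
                String abs = el.absUrl(attr);
                if (abs.startsWith(WP_PREFIX)) {
                    el.attr(attr, SPRING_PREFIX + abs.substring(WP_PREFIX.length()));
                }
            }
        }
        return doc.outerHtml();
    }

    /** Writes the rewritten page to target, creating parent folders as needed. */
    static Path save(String html, String baseUri, Path target) throws IOException {
        createParent(target);
        return Files.writeString(target, rewrite(html, baseUri), StandardCharsets.UTF_8);
    }

    /** Downloads a binary asset (image, video) and stores the bytes unchanged. */
    static Path downloadAsset(String url, Path target) throws IOException {
        byte[] body = Jsoup.connect(url)
                .ignoreContentType(true) // accept non-HTML responses such as image/png
                .maxBodySize(0)          // 0 = no limit; the default cap would truncate large files
                .execute()
                .bodyAsBytes();
        createParent(target);
        return Files.write(target, body);
    }

    private static void createParent(Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) Files.createDirectories(parent);
    }
}
```

Passing the page's own URL as baseUri lets absUrl turn root-relative links such as /wp-content/image1.png into full URLs before the prefix check, so both link styles are rewritten the same way.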
Use a WordPress plugin to export pages. Pros: only the pages affected by a change are re-saved to the hard drive (and the destination folder can be specified). Cons: it's not written in Java, so I can't maintain the plugin. In addition, I would still need to watch the destination folder and, each time a file changes, reparse it to rewrite the links so they work in my Spring application.
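For the plugin option, the "watch the folder and reparse on change" part can be done in plain Java with java.nio.file.WatchService. A minimal sketch (ExportFolderWatcher is a hypothetical name; it watches a single folder, not its subfolders, and the onChange callback is where the link rewriting would go):

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: reprocesses files that an export plugin writes into a folder. */
final class ExportFolderWatcher {
    private ExportFolderWatcher() {}

    /**
     * Watches dir (non-recursively) and calls onChange for each created or modified entry.
     * Returns once timeoutMs passes with no new event, reporting how many events were handled.
     */
    static int watch(Path dir, long timeoutMs, Consumer<Path> onChange)
            throws IOException, InterruptedException {
        int handled = 0;
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_MODIFY);
            WatchKey key;
            while ((key = ws.poll(timeoutMs, TimeUnit.MILLISECONDS)) != null) {
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were dropped; a full rescan of dir would be needed
                    }
                    // For ENTRY_* events the context is the file name relative to dir.
                    Path changed = dir.resolve((Path) event.context());
                    onChange.accept(changed);
                    handled++;
                }
                if (!key.reset()) {
                    break; // the directory is no longer accessible
                }
            }
        }
        return handled;
    }
}
```

A single save by the plugin can produce more than one ENTRY_MODIFY event, so the callback should be safe to run repeatedly on the same file. Each subfolder would need its own registration if the export writes into a tree.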
I haven't found any other solutions, and none of these seems very good. Can you think of another way to do this?
Thank you for your help.
Answer
I can answer my own question. Jsoup can do what I need: