$ tar -tzf ~/uploaded-images.tgz wp-content/uploads/ wp-content/uploads/2013/ wp-content/uploads/2013/01/ wp-content/uploads/2013/01/masthead_web.jpg wp-content/uploads/2013/01/masthead_web-150x150.jpg wp-content/uploads/2013/01/masthead_web-300x112.jpg wp-content/uploads/2013/01/masthead_web-1000x288.jpg wp-content/uploads/2013/01/masthead_web-500x187.jpg …But what about the text? I asked the lovely Carol, who said there was no need to capture the comments and a bunch of other stuff. She did, however, want her subscribers.
Following the method shown in that other posting, I went to the database and said
mysql> show tables; +--------------------------+ | Tables_in_REDACTED | +--------------------------+ | wp_commentmeta | | wp_comments | | wp_links | | wp_options | | wp_postmeta | | wp_posts | | wp_subscribe2 | | wp_term_relationships | | wp_term_taxonomy | | wp_termmeta | | wp_terms | | wp_usermeta | | wp_users | +--------------------------+I saved the subscribers off by typing mysql> select * from wp_subscribe2; which will do, because we don't have bazillions of them; a little scripting will get all the email addresses set up as subscribers for the new site.
She doesn't care about saving the old options, or comments, so all I need to do is save the posts and pages. We'll ask the designer who did the original design to please re-do it, once I can give her access.
So, what about saving the posts? It turns out that wp_posts has both the posts and something wordpress calls "pages." My first thought was to save everything into a nice file, like this:
SELECT * INTO OUTFILE '/$HOME/posts-outfile' CHARACTER SET utf8 FROM wp_posts;which didn't work, because I don't have FILE privilege. Dang.
This worked fine though:
echo -e "use REDACTED\nSELECT post_type, post_date, \ post_modified, post_title, post_content \ FROM wp_posts where post_status = 'publish'; " \ | mysql -h REDACTED -u REDACTED -p | tee $HOME/tempfileSo now I had her posts and "pages"
I then moved everything in the "root" directory of her domain into a subdirectory where nobody will look. By "nobody will look" I mean "nobody has permission." And by "nobody has permission" I mean 0700.
Now getting the data out of the aforementioned temp file was harder than I thought. My first naive thought was to do something like
sed -n 5p TEMPFILEand just snarf'n'barf the rather long line I got. Bad idea. It almost worked, except for a few things:
- a bunch of newline characters, which were represented as "\n"; of course, HTML doesn't care about "\n" and just renders it like that....
- Some unwelcome <CR> (aka '\015' or '^M' or '\r') characters, which do a "carriage-return," putting the cursor back at the left margin but without scrolling the window. These caused a jumble of nonsense, but it was over toward the right, where you wouldn't see it without reading the whole thing through
- characters with the high-order bit set. These are things in the range 0x91—0x97, which are translated in this chart
grep '^page' published-posts-pages.out |sed -n 1p | tr -d '^M' | \ sed 's/\\n\\n/<p>/g'which was mostly okay. the bold blue ^M was entered by using <ctrl-V><ctrl-M> I think.
For the "page"s I think I just did the funky characters by hand, addressing (mostly) #1 and #2 in the one-liner. To take care of #3 (the "posts" had a lot more of these than the "pages") I wrote a little Python.
#!/usr/bin/python2.7 -utt # vim:et:sw=4 import sys the_line = sys.stdin.read() sub_table = [ [ 0x91, '‘' ], [ 0x92, '’' ], [ 0x93, '“' ], [ 0x94, '”' ], [ 0x96, '–' ], [ 0x97, '—' ], ] for old_thing, new_thing in sub_table: the_line = the_line.replace(chr(old_thing), new_thing) sys.stdout.write(the_line) sys.exit(0)I stuck this into $HOME/bin/chars.py and then, to handle a typical post, said:
$ LANG=C; grep '^post ' published-posts-pages.out | tr -d '^M' \ | sed 's/\\n\\n/<p>/g' | sed 's/\\n/<br>/g' | sed -n 12p \ | ~/bin/chars.py | pbcopyA few things about this:
- LANG=C was because I had been trying some other character sets. They didn't work.
- after translating all the "\n\n"s into "<p>", there were some single "\n"s left, which I translated into "<br>"
- chars.py of course was to do the right thing with those odd characters
- I used the Mac OS utility "pbcopy" because the terminal emulator renders unknown characters as '?'. Of course, now that I have chars.py in the pipeline, this was probably superfluous. Well, I could just CMD-v in the wordpress window to add a new post.
No comments:
Post a Comment