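After several domain changes, plus the move away from GAE-WordPress (WordPress on Google App Engine), a lot of legacy images ended up scattered across different domains or under GAE directories.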
They are mainly located under:
- http://oldres.fengsage.com
- http://www.fengsage.com
- http://fensageblog.appspot.com
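Finding the affected images means scanning each post body for image URLs on these hosts. A minimal sketch, assuming the images appear as `<img src="...">` tags. The pattern `IMG_SRC` and the function `legacy_image_urls` are illustrative names, not the script's own `SIMPLE_IMG_URL`:

```python
import re

# Assumed markup: the src attribute of an <img> tag, single- or double-quoted.
IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)

# The legacy hosts listed above.
OLD_HOSTS = (
    'http://oldres.fengsage.com',
    'http://www.fengsage.com',
    'http://fensageblog.appspot.com',
)


def legacy_image_urls(html):
    """Return image URLs in html that start with one of OLD_HOSTS, in order, without duplicates."""
    found = []
    for url in IMG_SRC.findall(html):
        if url.startswith(OLD_HOSTS) and url not in found:
            found.append(url)
    return found


sample = ('<p><img src="http://oldres.fengsage.com/2010/a.png" alt="a">'
          "<img class='x' src='http://example.com/b.jpg'></p>")
print(legacy_image_urls(sample))
```

Images on other hosts (here `example.com`) are left alone, so only the legacy URLs get downloaded and rewritten.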
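Previously I served these images through a proxy implemented in code, but the more I looked at it, the less I liked it. So I simply wrote a script that downloads all the old images into the local WordPress installation in one pass and updates the database to match. Mostly I was just bored over Christmas.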
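The database update boils down to a mapping from each old image URL to its downloaded local path, substituted into the post body. A minimal sketch of that substitution step, assuming the mapping has already been built during the download. The function name `rewrite_content` and the `base_url` parameter are hypothetical:

```python
def rewrite_content(content, mapping, base_url=''):
    """Replace every old image URL in content with base_url + '/' + its local path.

    mapping: {old_url: local_path}, e.g. collected while downloading the images.
    """
    # Replace longer URLs first, so a URL that is a prefix of another one
    # (a.png vs a.png.bak) cannot corrupt the longer URL.
    for old_url in sorted(mapping, key=len, reverse=True):
        content = content.replace(old_url, '%s/%s' % (base_url, mapping[old_url]))
    return content


post = ('<img src="http://oldres.fengsage.com/2010/a.png"> '
        '<img src="http://www.fengsage.com/b.gif">')
mapping = {
    'http://oldres.fengsage.com/2010/a.png': 'wp-content/uploads/12/25/a.png',
    'http://www.fengsage.com/b.gif': 'wp-content/uploads/12/25/b.gif',
}
print(rewrite_content(post, mapping))
```

With the default empty `base_url` the result is a site-root-relative URL such as `/wp-content/uploads/12/25/a.png`, which keeps working after another domain change.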
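The script has a number of shortcomings:
- It is not written in PHP, because I couldn't be bothered to learn how WordPress plugins are built.
- Since it is a Python script, you need permission to run Python on the server.
- It does not create WordPress media records, so the downloaded images cannot be managed from the admin Media Library.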
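There is also a WordPress plugin for this:
http://wordpress.org/extend/plugins/velvet-blues-update-urls/
By the time I found it, the script was already written. Sad~~
The script is fairly simple, so here it is: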
```python
#!/usr/bin/env python
# --*-- encoding:utf-8 --*--
'''
Created on 2011-12-25

@author: fred
'''
import MySQLdb
import re
import urllib
import time
import os

WP_POST_CONTENT = u"wp_posts"
# Regex capturing image URLs in post_content (empty here; nothing is found until it is filled in)
SIMPLE_IMG_URL = r''
# Prefix for rewritten URLs; '' yields root-relative URLs like /wp-content/uploads/...
NEW_IMG_URL = u''
# Relative to the working directory, so run the script from the WordPress root.
# Uses month/day subfolders (WordPress itself files uploads by year/month).
DOWNLOAD_PATH = u'wp-content/uploads/%s' % time.strftime('%m/%d', time.localtime(time.time()))
# Only images whose URL contains one of these strings are downloaded
REPLACE_LIST = ['http://oldres.fengsage.com', '../media', 'http://www.fengsage.com']

conn = MySQLdb.connect(host="localhost", user="root", passwd="zhufeng",
                       db="wordpress", charset='utf8')


def get_post_list():
    cursor = conn.cursor()
    cursor.execute("SET NAMES utf8")
    cursor.execute("select * from %s" % WP_POST_CONTENT)
    return cursor.fetchall()


def find_img_urls(text):
    match = re.findall(SIMPLE_IMG_URL, text)
    return match or None


def process_imgs(img_urls, replace):
    result = {}
    for img in img_urls:
        for rp in replace:
            if img.find(rp) >= 0:
                print "\tdownload picture"
                new_img_path = download_img(img)
                if new_img_path:  # skip failed downloads so the old URL is kept
                    result[img] = new_img_path
                break  # download each image at most once
    return result


def download_img(img_url):
    # File name and extension from the last "<name>.<ext>" in the URL
    IMG_NAME_REG = r'([^./]*)\.(png|jpg|gif|jpeg|bmp)'
    m = re.findall(IMG_NAME_REG, img_url)
    if not m:
        return None
    name, suf = m[-1]
    if not os.path.isdir(DOWNLOAD_PATH):
        os.makedirs(DOWNLOAD_PATH)
    file_path = u'%s/%s.%s' % (DOWNLOAD_PATH, name, suf)
    try:
        image_on_web = urllib.urlopen(img_url.encode('utf-8'))
        try:
            # urllib.urlopen does not raise on HTTP errors, so check the status
            code = image_on_web.getcode()
            if code is not None and code != 200:
                raise IOError('HTTP %s' % code)
            with open(file_path.encode('utf-8'), 'wb') as downloaded_image:
                while True:
                    buf = image_on_web.read(65536)  # copy in 64 KiB chunks
                    if not buf:
                        break
                    downloaded_image.write(buf)
        finally:
            image_on_web.close()
        return file_path
    except Exception:
        print '\tdownload img failure: %s' % img_url
        return None


def refresh_content(post_content, result):
    for k in result:
        post_content = post_content.replace(k, '%s/%s' % (NEW_IMG_URL, result[k]))
    return post_content


def update_content(post_id, post_content):
    sql = "update " + WP_POST_CONTENT + " set post_content=%s where ID=%s"
    cursor = conn.cursor()
    cursor.execute("SET NAMES utf8")
    cursor.execute(sql, (post_content, post_id))
    # MySQLdb does not autocommit by default; InnoDB tables need an explicit commit
    conn.commit()


def go():
    for post in get_post_list():
        # wp_posts column order: ID, post_author, post_date, post_date_gmt, post_content, post_title, ...
        post_id = post[0]
        post_content = post[4]
        post_title = post[5]
        if post_id and post_content and post_title:
            print "process article ID:%s" % post_id
            img_urls = find_img_urls(post_content)
            if img_urls:
                result = process_imgs(img_urls, REPLACE_LIST)
                if result:
                    post_content = refresh_content(post_content, result)
                    try:
                        update_content(post_id, post_content)
                    except Exception as e:
                        print 'update database failure: %s' % e


if __name__ == '__main__':
    go()
```
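`download_img` derives the local file name by running a regex over the whole URL. That regex is case-sensitive, so it misses extensions such as `.JPG`, and it drops everything before the last dot-free segment of a dotted name (`photo.v2.jpg` becomes `v2.jpg`). One alternative, which is not what the script does, is to parse the URL and keep the last path segment as-is. A minimal sketch, assuming the images should keep their original file names. `image_file_name` and `IMAGE_EXTS` are hypothetical names:

```python
import posixpath

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

IMAGE_EXTS = ('.png', '.jpg', '.jpeg', '.gif', '.bmp')


def image_file_name(url):
    """Return the last path segment of url if it looks like an image, else None."""
    # urlsplit separates the path from any query string or fragment
    name = posixpath.basename(urlsplit(url).path)
    if posixpath.splitext(name)[1].lower() in IMAGE_EXTS:
        return name
    return None


print(image_file_name('http://oldres.fengsage.com/2010/09/photo.v2.JPG?w=300'))
print(image_file_name('../media/a.png'))
print(image_file_name('http://www.fengsage.com/index.php'))
```

Either way, two different URLs with the same file name still land on the same local path, so a real migration may want to keep the old directory structure or add a suffix on collisions.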