Category archive: boto

Using S3 for file downloads and P2P torrents

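Sometimes we want to share a file with someone, but hosting it on the Nginx servers of a production service is not appropriate. In that case we can upload it to S3, set the object's ACL to public-read, and assemble the download URL so that anyone can fetch it. The bucket can also carry a lifecycle rule that sets how many days the objects under a given directory (key prefix) live; once that time is up they are deleted automatically.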

I won't cover the upload itself; see this page (Python version):
http://boto.readthedocs.org/en/latest/s3_tut.html#storing-large-data

Next we set the ACL on the object and build the download URL and the torrent URL:

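import boto

bucket_name = 'your-bucket'        # placeholder: the bucket name
object_key = 'path/to/your-file'   # placeholder: key of the uploaded object

c = boto.connect_s3()
b = c.get_bucket(bucket_name)
# make the object publicly readable
b.set_acl('public-read', object_key)
object_download_url = "http://" + bucket_name + ".s3.amazonaws.com/" + object_key
object_torrent_url = object_download_url + "?torrent"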

Appending ?torrent to the S3 download URL makes the object available over the BitTorrent protocol; opening that URL directly downloads a .torrent file.
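The URL assembly can be wrapped in a small function. This is only a sketch: the function name s3_urls and the bucket and key in the usage line are made up for illustration, and the URL-quoting of the key is an addition that the original concatenation does not do (it matters only for keys containing spaces or other special characters).

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2


def s3_urls(bucket_name, object_key):
    """Return (download_url, torrent_url) for a public-read S3 object.

    The key is percent-encoded; "/" is left as-is so that the
    directory-like structure of the key stays readable.
    """
    download_url = "http://%s.s3.amazonaws.com/%s" % (bucket_name, quote(object_key))
    return download_url, download_url + "?torrent"
```

For example, s3_urls("my-bucket", "share/big file.iso") returns the plain download URL and the same URL with ?torrent appended.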

Some third-party tools for invalidating the CloudFront cache

Besides the invalidation page in the web console, we can hand some third-party invalidation tools to people who are not AWS administrators:
• CloudBuddy Personal – http://m1.mycloudbuddy.com/index.html
• CloudBerry Explorer – http://cloudberrylab.com
• Ylastic – http://ylastic.com
• Cyberduck – http://cyberduck.ch
• Bucket Explorer – http://www.bucketexplorer.com
• CloudFront Invalidator – http://www.swook.net/p/cloudfront-invalidator.html
• CDN Planet CloudFront Purge Tool – http://www.cdnplanet.com/tools/cloudfront-purge-tool/
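CloudBerry: I have used it. It is very convenient for managing, uploading, and downloading S3 resources, and it can also manage CloudFront resources, but the free version invalidates only one object at a time and cannot target a directory.
Ylastic is a paid website they built themselves for managing AWS resources; I haven't tried it.
Cyberduck is a desktop client similar to CloudBerry but not as pleasant to use; it also invalidates one object at a time and cannot invalidate a directory.
CloudFront Invalidator is a third-party web-based invalidation tool used online; like the web console, it takes one object per line and accepts multiple lines.
CDN Planet CloudFront Purge Tool is packaged as a Chrome extension and can purge several CDNs; it also takes one object per line.
Bucket Explorer is a paid desktop client, also one object per line; entering a directory does not raise an error, but I don't know whether it has any effect.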

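Alternatively, write your own script to invalidate in batches. Each request can carry at most 1000 objects; anything beyond that has to be split across several requests. A Python example: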

import boto

distribution_id = 'xxxxxxxxxxxx'
paths = ['/path/7eleven.png', '/path/alipay.png', '/path/Braintree.png']

c = boto.connect_cloudfront()
# submit one invalidation request covering all paths
inval_req = c.create_invalidation_request(distribution_id, paths)
print(inval_req.paths)
# list the distribution's invalidation requests with their status
invals = c.get_invalidation_requests(distribution_id)
for inval in invals:
    print('Object: %s, ID: %s, Status: %s' % (inval, inval.id, inval.status))

Or put the objects in a file, one per line, and load them into the paths list with Python:

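import os
import sys

# purge-object.txt sits next to this script, one object path per line
input_file = os.path.join(sys.path[0], 'purge-object.txt')
paths = []
with open(input_file, 'r') as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines
            paths.append(line)
print(paths)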
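When the file holds more than 1000 paths, the list has to be cut into batches before it is submitted. A minimal sketch of that splitting follows; chunk_paths is a hypothetical helper name, and the 1000 default simply mirrors the per-request limit quoted above.

```python
def chunk_paths(paths, batch_size=1000):
    """Yield successive lists holding at most batch_size paths each."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Intended use with the connection from the earlier example (not run here):
# for batch in chunk_paths(paths):
#     c.create_invalidation_request(distribution_id, batch)
```

Each batch then goes out as its own invalidation request.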

Uploading files to S3 from the browser with HTTP POST

We can put a pre-authorized HTTP POST form in a web page so that the browser uploads a file to S3 directly via POST. The form looks like this:

<html> 
  <head>
    <title>S3 POST Form</title> 
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>

  <body> 
    <form action="https://cypay-test.s3.amazonaws.com/" method="post" enctype="multipart/form-data">
      <input type="hidden" name="key" value="upload/${filename}">
      <input type="hidden" name="AWSAccessKeyId" value="xxxxxxxx"> 
      <input type="hidden" name="acl" value="private"> 
      <input type="hidden" name="success_action_redirect" value="http://your-success-page">
      <input type="hidden" name="policy" value="xxxx">
      <input type="hidden" name="signature" value="xxxx">
      <input type="hidden" name="Content-Type" value="image/jpeg">
      <!-- Include any additional input fields here -->

      File to upload to S3: 
      <input name="file" type="file"> 
      <br> 
      <input type="submit" value="Upload File to S3"> 
    </form> 
  </body>
</html>

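In this form, value="upload/${filename}" is the destination key; the directory (key prefix) is created automatically.
name="success_action_redirect" value="http://your-success-page" is the page to redirect to after a successful upload.
name="policy" value="xxxx" is the policy string, Base64-encoded (an encoding, not encryption).
name="signature" value="xxxx" is the signature: the HMAC-SHA1 of the encoded policy, keyed with the SECRET_KEY and then Base64-encoded.
The signing algorithm in Python: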

import base64
import hmac, hashlib

AWS_SECRET_ACCESS_KEY = r"xxxxxx"
policy_document ='''
{
  "expiration": "2020-01-01T00:00:00Z",
  "conditions": [ 
    {"bucket": "your-bucket"}, 
    ["starts-with", "$key", ""],
    {"acl": "private"},
    {"success_action_redirect": "http://your-success-page"},
    ["starts-with", "$Content-Type", ""],
    ["content-length-range", 0, 1048576000]
  ]
}
'''
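# Base64-encode the policy, then sign the encoded policy with HMAC-SHA1
policy = base64.b64encode(policy_document.encode('utf-8'))
signature = base64.b64encode(
    hmac.new(AWS_SECRET_ACCESS_KEY.encode('utf-8'), policy, hashlib.sha1).digest())
print(policy.decode('ascii'))
print(signature.decode('ascii'))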
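The fixed expiration date in the policy above is easy to forget about. The sketch below builds the same policy with a computed expiration and signs it the same way. The function names build_policy and sign_policy, their parameters, and the one-hour lifetime in the usage part are assumptions for illustration; the conditions are the ones from the policy above.

```python
import base64
import datetime
import hashlib
import hmac
import json


def build_policy(bucket, key_prefix, redirect_url, expires_at, max_bytes=1048576000):
    """Return the POST policy as a JSON string; expires_at is a UTC datetime."""
    return json.dumps({
        "expiration": expires_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            {"acl": "private"},
            {"success_action_redirect": redirect_url},
            ["starts-with", "$Content-Type", ""],
            ["content-length-range", 0, max_bytes],
        ],
    })


def sign_policy(secret_key, policy_json):
    """Return (policy, signature) as Base64 text for the form's hidden fields."""
    policy_b64 = base64.b64encode(policy_json.encode("utf-8"))
    digest = hmac.new(secret_key.encode("utf-8"), policy_b64, hashlib.sha1).digest()
    return policy_b64.decode("ascii"), base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Placeholder values; the policy expires one hour from now (UTC).
    expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1)
    policy_json = build_policy("your-bucket", "upload/", "http://your-success-page", expires)
    policy, signature = sign_policy("xxxxxx", policy_json)
    print(policy)
    print(signature)
```

The two returned strings go into the policy and signature hidden fields of the form, and the bucket named in the policy has to be the one the form posts to.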

Block Device Mapping

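Here is an example of how to use boto to attach various disks, and set their options, when launching an instance.
The usual recommendation is to use letter a for the root device, b through e for instance store (InstanceStore, ephemeral) volumes, and f onward for EBS volumes. This is only a convention; nothing forces you to follow it.
The basic steps: create a mapping object with BlockDeviceMapping(), create each disk object with BlockDeviceType(), passing size, volume type, IOPS and so on as arguments, and finally assign every disk object to its device name in the mapping.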

import boto.ec2
from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType

block_device_map = BlockDeviceMapping()
xvda = BlockDeviceType(delete_on_termination=True, size=12)
xvdb = BlockDeviceType(ephemeral_name='ephemeral0')
xvdf = BlockDeviceType(delete_on_termination=False, size=100, volume_type='gp2')
xvdg = BlockDeviceType(delete_on_termination=False, 
                       size=100, volume_type='io1', iops=1000)
block_device_map['/dev/xvda'] = xvda
block_device_map['/dev/sdb'] = xvdb
block_device_map['/dev/sdf'] = xvdf
block_device_map['/dev/sdg'] = xvdg

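conn = boto.ec2.connect_to_region('ap-southeast-1')  # example region
conn.run_instances(
    'ami-xxxxxxxx',  # image_id placeholder
    # other arguments
    block_device_map=block_device_map,
    # other arguments
    )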
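The letter convention described above (a for root, b to e for instance store, f onward for EBS) can be written down as a small planning helper. This is only an illustrative sketch: plan_device_names is a hypothetical name, it does not call boto, and it uses /dev/xvda for the root device and /dev/sd* for the others only because the example above does.

```python
import string


def plan_device_names(n_ephemeral, n_ebs):
    """Return [(device_name, kind), ...] following the a / b-e / f+ convention."""
    ephemeral_letters = "bcde"
    ebs_letters = string.ascii_lowercase[5:]  # 'f' through 'z'
    if n_ephemeral > len(ephemeral_letters):
        raise ValueError("at most %d ephemeral devices" % len(ephemeral_letters))
    if n_ebs > len(ebs_letters):
        raise ValueError("at most %d EBS devices" % len(ebs_letters))
    plan = [("/dev/xvda", "root")]
    # instance store volumes are named ephemeral0, ephemeral1, ...
    plan += [("/dev/sd" + ephemeral_letters[i], "ephemeral%d" % i)
             for i in range(n_ephemeral)]
    plan += [("/dev/sd" + ebs_letters[i], "ebs") for i in range(n_ebs)]
    return plan
```

Calling plan_device_names(1, 2) yields the same four device names used in the mapping above.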

A few Elastic IP helpers built on boto

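boto's raw Elastic IP API is not very friendly to use. For example, associating an EIP with an instance in a VPC requires the EIP's allocation_id, which has to be dug out of the address's attributes, and disassociating an EIP requires the association_id, which is even harder to find. The natural expectation is that both operations need nothing more than the instance ID and the EIP, so I wrapped a few functions to make EIPs a little friendlier to work with.
(For the difference between Elastic IP and Public IP, see my article http://imbusy.me/elastic_ip-and-public_ip.html)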

1. Wrap the EIP allocation call so that it returns a dict with the IP and the allocation_id, for the association function to use later.

import boto.ec2
region = 'ap-southeast-1'

def eip_allocation(domain='vpc'):
    conn = boto.ec2.connect_to_region(region)
    allocation = conn.allocate_address(domain=domain, dry_run=False)
    return {'public_ip':allocation.public_ip,
            'allocation_id':allocation.allocation_id}

Example of the returned dict:
{'public_ip':'54.169.xx.xx', 'allocation_id':'eipalloc-b2e3xxxx'}

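2. The raw associate function needs the instance ID and the EIP's allocation_id. If the IP is not used right after allocation, and an existing EIP is being associated with an instance instead, the allocation_id has to be resolved from the EIP. This wrapper therefore needs only the instance ID and the EIP; since eip_allocation returns both the IP and the allocation_id, either can be used.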

def eip_association(instance_id, public_ip):
    '''Notice: need InstanceID and PublicIP to make association'''
    conn = boto.ec2.connect_to_region(region)
    ## make sure input ip is valid EIP
    try:
        address = conn.get_all_addresses(addresses=[public_ip])
    except boto.exception.EC2ResponseError:
        print("Error: IP not found or not EIP")
        return  # stop here: there is no allocation_id to use
    allocation_id = address[0].allocation_id
    ## to call boto associate_address func
    conn.associate_address(
        instance_id=instance_id,
        public_ip=None,
        allocation_id=allocation_id,
        network_interface_id=None,
        private_ip_address=None,
        allow_reassociation=False,
        dry_run=False)

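3. The raw disassociate function also needs an association_id, whereas the natural expectation is that the EIP alone is enough to detach it from an instance. This wrapper needs only the EIP.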

def eip_disassociation(public_ip):
    '''Notice: use public_ip to look up the association_id'''
    conn = boto.ec2.connect_to_region(region)
    ## make sure input ip is valid EIP
    try:
        address = conn.get_all_addresses(addresses=[public_ip])
    except boto.exception.EC2ResponseError:
        print("Error: IP not found or not EIP")
        return  # stop here: there is no association_id to use
    association_id = address[0].association_id
    #network_interface_id = address[0].network_interface_id
    conn.disassociate_address(
        public_ip=None,
        association_id=association_id,
        dry_run=False)

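4. Releasing an EIP, rewrapped so that it needs only the EIP rather than the allocation_id. It also checks whether the EIP is in use: only an EIP that is not associated with anything gets released. And, as in every function here, the input is first checked to be a real EIP.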

def eip_release(public_ip):
    conn = boto.ec2.connect_to_region(region)
    ## make sure input ip is valid EIP
    try:
        address = conn.get_all_addresses(addresses=[public_ip])
    except boto.exception.EC2ResponseError:
        print("Error: IP not found or not EIP")
        return  # stop here: nothing to release
    association_id = address[0].association_id
    allocation_id = address[0].allocation_id
    ## only release EIP that not associated to any instance
    if association_id is None:
        conn.release_address(
            public_ip=None,
            allocation_id=allocation_id)
    else:
        print("IP %s is in use, cannot be released" % public_ip)

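5. A new function for quickly swapping the EIP of an instance, useful for crawler servers because their IPs get blocked so often. A side note: if the instance was auto-assigned a Public IP at launch, then once an EIP is associated the Public IP shows the same value as the Elastic IP (it is overridden). A Public IP reappears only after the EIP is disassociated, but it is newly assigned at that point, so the Public IP changes.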

def eip_change(instance_id):
    conn = boto.ec2.connect_to_region(region)
    reservations = conn.get_all_reservations([instance_id])
    instances = reservations[0].instances
    inst = instances[0]
    old_public_ip = inst.ip_address
    ##make sure the public ip is valid EIP
    try:
        address = conn.get_all_addresses(addresses=[old_public_ip])
    except boto.exception.EC2ResponseError:
        print("Error: Old public IP not found or not EIP")
    else:
        eip_disassociation(old_public_ip)
        eip_release(old_public_ip)
    ## add new IP 
    allocation = eip_allocation()
    public_ip = allocation['public_ip']
    eip_association(instance_id,public_ip)

6. Example calls from the main function:

def main():
    # usage demos: uncomment the call you need
    #eip_disassociation('54.xx.xx.xx')
    #eip_association('i-xxxxxxxx','54.xx.xx.xx')
    #eip_release('54.xx.xx.xx')
    #eip_change('i-xxxxxxxx')
    pass  # a body made only of comments would be a syntax error
     
if __name__ == '__main__':
    main()
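The wrappers above all repeat the same "is this really an EIP" lookup. One possible way to factor it out is sketched below. The names find_eip and EipNotFound are hypothetical and not part of boto. The sketch catches any exception from get_all_addresses so that it stays runnable without boto installed; in real use the except clause would name boto.exception.EC2ResponseError, as the functions above do.

```python
class EipNotFound(Exception):
    """Raised when the given IP is not an Elastic IP of this account/region."""


def find_eip(conn, public_ip):
    """Return the address object for public_ip, or raise EipNotFound.

    conn is expected to be a boto EC2 connection, i.e. any object that
    offers get_all_addresses(addresses=[...]).
    """
    try:
        addresses = conn.get_all_addresses(addresses=[public_ip])
    except Exception as err:  # assumption: EC2ResponseError in practice
        raise EipNotFound("%s: %s" % (public_ip, err))
    if not addresses:
        raise EipNotFound(public_ip)
    return addresses[0]
```

A caller can then read allocation_id or association_id from the returned object, and raising instead of printing keeps it from carrying on with an undefined ID.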