GAE NDB Quota Optimization

last modified : 2016-11-29 | published: 2015-05-27 | category:

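There are three key points about GAE datastore quota:

  1. Small Ops are currently free, so keys_only=True can be used freely.
  2. get() and get_multi() lookups are cached in memcache automatically.
  3. Indexed properties multiply Write Ops.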
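To make point 3 concrete, here is a rough sketch of how indexing inflates write cost. It assumes the legacy per-operation billing formula as recalled from the old billing documentation (a new entity put costs 2 writes, plus 2 writes per indexed property value, plus 1 write per composite index value). That formula is not stated in this post, so treat the numbers as illustrative. The function name is hypothetical.

```python
def new_entity_write_ops(indexed_values, composite_index_values=0):
    """Estimate Write Ops for putting one new entity.

    Assumed legacy formula (not taken from this post):
    2 writes + 2 per indexed property value + 1 per composite index value.
    A repeated property contributes one value per list item.
    """
    return 2 + 2 * indexed_values + composite_index_values


# An entity with one indexed date and an unindexed 10-item list.
lean = new_entity_write_ops(indexed_values=1)
# The same entity if the 10-item list were indexed as well.
heavy = new_entity_write_ops(indexed_values=1 + 10)
print(lean, heavy)
```

Under that assumption, indexing a repeated property is the expensive case, because every list item counts as its own indexed value.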

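NOTE1: To fetch a single entity, use get_by_id() rather than a query with fetch(1) / get()

user = User.query(User.username == "tom").get()

Replace it with

user = User.get_by_id("tom")

This requires that the entity was stored with the username as its key name (id="tom"); in the old db API the same lookup is get_by_key_name(). The original approach costs 1 Fetch Op + 1 Query Op = 2 Read Ops. After the change it costs 1 Small Op + 1 Read Op, and that Read Op is cached automatically.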

NOTE2: To fetch multiple entities, use keys_only + get_multi()

For example, to pull N entities out of a kind in one go, the usual ORM-style code is:

feeds = Feed.query().fetch(N)

Every such query costs 1+N Read Ops. To save quota, it can be rewritten as:

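# keys-only query: billed as Small Ops
q = Feed.query()
# get_multi() by key goes through NDB's automatic memcache layer
feeds = ndb.get_multi(q.fetch(N, keys_only=True))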

The first query costs 1 Small Op + N Read Ops, but a repeated query costs only 1 Small Op + m*N Read Ops, where m is the memcache miss rate (ideally 0).
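The arithmetic above can be sketched as a small cost model. This is only an illustration of the post's own formulas; the function names are hypothetical, and Small Ops are left out because the post treats them as free.

```python
def read_ops_plain_query(n):
    """Plain entity query: 1 Read Op for the query plus 1 per entity."""
    return 1 + n


def read_ops_keys_then_get_multi(n, miss_rate):
    """keys_only query + get_multi(): only memcache misses cost Read Ops.

    miss_rate is the fraction of entities not found in memcache (0.0 to 1.0).
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return miss_rate * n


n = 20
print(read_ops_plain_query(n))                # every call pays for all entities
print(read_ops_keys_then_get_multi(n, 1.0))   # cold cache, first query
print(read_ops_keys_then_get_multi(n, 0.25))  # warm cache, 25% misses
```

With a cold cache the two approaches cost almost the same; the saving only shows up on repeated queries.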

As for performance, the benchmark figures below put the break-even point at a cache hit ratio of roughly 75%.

Memcache hit ratio: 100% (everything was in cache)

  Query for entities:              3755 ms
  Query/memcache/ndb:              3239 ms
    Keys-only query:       834 ms
    Memcache.get_multi:   2387 ms
    ndb.get_multi:           0 ms

Memcache hit ratio: 75%

  Query for entities:              3847 ms
  Query/memcache/ndb:              3928 ms
    Keys-only query:       859 ms
    Memcache.get_multi:   1564 ms
    ndb.get_multi:        1491 ms

Memcache hit ratio: 50%

  Query for entities:              3507 ms
  Query/memcache/ndb:              5170 ms
    Keys-only query:       825 ms
    Memcache.get_multi:   1061 ms
    ndb.get_multi:        3168 ms

Memcache hit ratio: 25%

  Query for entities:              3799 ms
  Query/memcache/ndb:              6335 ms
    Keys-only query:       835 ms
    Memcache.get_multi:    486 ms
    ndb.get_multi:        4875 ms

Memcache hit ratio: 0% (no memcache hits)

  Query for entities:              3828 ms
  Query/memcache/ndb:              8866 ms
    Keys-only query:       836 ms
    Memcache.get_multi:     13 ms
    ndb.get_multi:        8012 ms
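To read the break-even point off these figures, the sketch below divides the combined "Query/memcache/ndb" time by the plain "Query for entities" time for each hit ratio. The numbers are copied from the benchmark above; the variable and function names are made up for illustration.

```python
# (memcache hit ratio %, plain query ms, keys-only + memcache + ndb ms)
BENCHMARK = [
    (100, 3755, 3239),
    (75, 3847, 3928),
    (50, 3507, 5170),
    (25, 3799, 6335),
    (0, 3828, 8866),
]


def slowdown(plain_ms, cached_ms):
    """Ratio above 1.0 means the keys-only + get_multi path is slower."""
    return cached_ms / float(plain_ms)


ratios = {hit: round(slowdown(plain, cached), 2)
          for hit, plain, cached in BENCHMARK}
faster_at = [hit for hit, plain, cached in BENCHMARK if cached < plain]

for hit in sorted(ratios, reverse=True):
    print("%3d%% hit ratio: %.2fx" % (hit, ratios[hit]))
```

In this data the cached path only wins outright at a 100% hit ratio and is close to parity at 75%, which matches the break-even estimate. Below that it trades latency for quota.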

NOTE3: Disable indexes wherever possible.

Take the EntryCollect model below:

class EntryCollect(ndb.Model):
    published = ndb.DateTimeProperty()
    need_collect_word = ndb.BooleanProperty(default=True, indexed=False)
    key_word = ndb.StringProperty(repeated=True, indexed=False)

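Unindexed properties cannot be used in query filters, so the filtering moves into Python after get_multi(). A membership test (value in list):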

keys = EntryCollect.query().order(-EntryCollect.published)
entrys = ndb.get_multi(keys.fetch(PER_PAGE*2, keys_only=True))
new_entry = []
for entry in entrys:
    if keyword.decode('utf-8') in entry.key_word:
        new_entry.append(entry)

The equivalent of a list.IN(other_list) query:

keys = EntryCollect.query().order(-EntryCollect.published)
entrys = ndb.get_multi(keys.fetch(PER_PAGE*2, keys_only=True))
top_entry = []
for entry in entrys:
    if set(other_list).intersection(set(entry.key_word)):
        top_entry.append(entry)
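The two in-memory filters above can be factored into small helpers. The sketch below uses a namedtuple as a stand-in for EntryCollect entities so that it runs without the App Engine SDK; the helper names and sample data are hypothetical.

```python
from collections import namedtuple

# Stand-in for an EntryCollect entity; real ones would come from ndb.get_multi().
Entry = namedtuple("Entry", ["published", "key_word"])


def filter_by_keyword(entries, keyword):
    """Keep entries whose key_word list contains keyword."""
    return [e for e in entries if keyword in e.key_word]


def filter_by_any_keyword(entries, keywords):
    """Keep entries whose key_word list shares at least one item with keywords."""
    wanted = set(keywords)  # built once, not once per entry
    return [e for e in entries if wanted.intersection(e.key_word)]


entries = [
    Entry(published=3, key_word=["gae", "ndb"]),
    Entry(published=2, key_word=["python"]),
    Entry(published=1, key_word=["ndb", "memcache"]),
]
ndb_hits = filter_by_keyword(entries, "ndb")
any_hits = filter_by_any_keyword(entries, ["python", "memcache"])
print([e.published for e in ndb_hits])
print([e.published for e in any_hits])
```

Because the filter runs after the fetch, a page may come back with fewer matches than requested, which is presumably why the snippets above fetch PER_PAGE*2 keys.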

A Boolean field:

keys = EntryCollect.query().order(-EntryCollect.published)
entrys = ndb.get_multi(keys.fetch(CONT*2, keys_only=True))
for entry in entrys:
    # need_collect_word is unindexed, so test it in Python
    if entry.need_collect_word:
        pass  # do something

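NOTE4: Trade-offs of projection queries

There is a trade-off here. Projection queries (the projection= option) only work on indexed properties, and indexing increases Write Ops. So if Read Ops are tight and Write Ops are plentiful, projection is worth using.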

NOTE5: Use Memcache

The difference between TextProperty and StringProperty