Model Evolution and Migrations

One of the most irritating parts of maintaining an application over time is the need to migrate data from one version of the schema to another. While Ming can’t completely remove the pain of migrations, it does seek to make them as simple as possible.

Performing Migrations

First of all, let’s populate our database with some data that will need to be migrated:

>>> import random
>>> TAGS = ['foo', 'bar', 'snafu', 'mongodb']
>>> 
>>> # Insert the documents through PyMongo so that Ming is not involved
>>> session.db.wiki_page.insert([
...     dict(title='Page %s' % idx, text='Text of Page %s' % idx, tags=random.sample(TAGS, 2)) for idx in range(10)
... ])
[ObjectId('64135757bd1c2ce4e0454dfe'), ObjectId('64135757bd1c2ce4e0454dff'), ObjectId('64135757bd1c2ce4e0454e00'), ObjectId('64135757bd1c2ce4e0454e01'), ObjectId('64135757bd1c2ce4e0454e02'), ObjectId('64135757bd1c2ce4e0454e03'), ObjectId('64135757bd1c2ce4e0454e04'), ObjectId('64135757bd1c2ce4e0454e05'), ObjectId('64135757bd1c2ce4e0454e06'), ObjectId('64135757bd1c2ce4e0454e07')]
>>> 
>>> session.db.wiki_page.find_one()
{'_id': ObjectId('64135757bd1c2ce4e0454dfe'), 'title': 'Page 0', 'text': 'Text of Page 0', 'tags': ['bar', 'mongodb']}

Suppose we decide that we want to gather the metadata of our pages into a metadata property, which will contain each page’s categories and tags. We might write our new schema as follows:

class WikiPage(MappedClass):
    class __mongometa__:
        session = session
        name = 'wiki_page'

    _id = FieldProperty(schema.ObjectId)
    title = FieldProperty(schema.String(required=True))
    text = FieldProperty(schema.String(if_missing=''))

    metadata = FieldProperty(schema.Object({
        'tags': schema.Array(schema.String),
        'categories': schema.Array(schema.String)
    }))

But now if we try to .find() things in our database, our tags have gone missing; the new metadata property comes back empty because the old documents still store their tags at the top level:

>>> WikiPage.query.find().first()
<WikiPage _id=ObjectId('64135757bd1c2ce4e0454dfe')
  title='Page 0' text='Text of Page 0' metadata=I{'tags':
  [], 'categories': []}>

What we need now is a migration. Luckily, Ming makes migrations manageable.

First of all, we need to declare the previous schema so that Ming knows how to validate the old values (previous versions of the schema are declared using the Ming foundation layer, as they are not tracked by the UnitOfWork or IdentityMap):

from ming import collection, Field

OldWikiPageCollection = collection('wiki_page', session,
    Field('_id', schema.ObjectId),
    Field('title', schema.String),
    Field('text', schema.String),
    Field('tags', schema.Array(schema.String))
)
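
Just as a sanity check, the old documents can be fetched through the foundation layer directly; they still validate against this previous schema (a small sketch using the collection’s .m manager):

# A sketch: fetch one raw page through the foundation layer.
old_page = OldWikiPageCollection.m.find().first()
print(old_page.tags)   # the old layout keeps its tags at the top level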

Whenever Ming fetches a document from the database, it validates the document against our model schema.

If validation fails, Ming checks the document against the previous version of the schema (provided as __mongometa__.version_of); if that validation passes, the __mongometa__.migrate function is called to upgrade the data.

So, to be able to upgrade our data, all we need to do is include the previous schema and a migration function in our __mongometa__:

class WikiPage(MappedClass):
    class __mongometa__:
        session = session
        name = 'wiki_page'
        version_of = OldWikiPageCollection

        @staticmethod
        def migrate(data):
            result = dict(data, metadata={'tags': data['tags']}, _version=1)
            del result['tags']
            return result

    _id = FieldProperty(schema.ObjectId)
    title = FieldProperty(schema.String(required=True))
    text = FieldProperty(schema.String(if_missing=''))

    _version = FieldProperty(1, required=True)

    metadata = FieldProperty(schema.Object({
        'tags': schema.Array(schema.String),
        'categories': schema.Array(schema.String)
    }))

To force the migration, we also added a _version property, backed by schema.Value, which passes validation only when its value is 1. As the old documents do not provide a _version field, they fail validation and so trigger the migration process:

>>> WikiPage.query.find().limit(3).all()
[<WikiPage _id=ObjectId('64135757bd1c2ce4e0454dfe')
  title='Page 0' text='Text of Page 0' _version=1
  metadata=I{'categories': [], 'tags': ['bar', 'mongodb']}>, <WikiPage _id=ObjectId('64135757bd1c2ce4e0454dff')
  title='Page 1' text='Text of Page 1' _version=1
  metadata=I{'categories': [], 'tags': ['bar', 'mongodb']}>, <WikiPage _id=ObjectId('64135757bd1c2ce4e0454e00')
  title='Page 2' text='Text of Page 2' _version=1
  metadata=I{'categories': [], 'tags': ['foo', 'mongodb']}>]

And that’s it.
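
Incidentally, the behavior of schema.Value that drives the _version trick can be exercised on its own (a small sketch):

from ming import schema

# schema.Value only accepts the exact value it was declared with,
# which is what turns _version into a migration trigger.
version_field = schema.Value(1, required=True)
version_field.validate(1)     # passes, returning 1
# version_field.validate(2)   # would raise schema.Invalid
# a document missing _version fails too, because required=True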

Lazy Migrations

Migrations are performed lazily as objects are loaded from the database, so you only pay the cost of migrating the data you actually access. Moreover, the migrated data is not saved back to the database unless the object is modified. This is easy to see by querying the documents directly through PyMongo, since in MongoDB they still have their tags outside of metadata:

>>> next(session.db.wiki_page.find())
{'_id': ObjectId('64135757bd1c2ce4e0454dfe'), 'title': 'Page 0', 'text': 'Text of Page 0', 'tags': ['bar', 'mongodb']}
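
The flip side is that modifying a lazily migrated object and flushing the session persists the new layout for that document only (a sketch, assuming the ODM session configured above):

page = WikiPage.query.find().first()       # migrated in memory on load
page.metadata['categories'] = ['wiki']     # any change marks the object dirty
session.flush()                            # the flush writes back the migrated layout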

Eager Migrations

If, instead, you wish to migrate all the objects in a collection and save them back, you can use the migrate function available on the foundation-layer manager:

>>> next(session.db.wiki_page.find()).get('tags')
['bar', 'mongodb']
>>> 
>>> from ming.odm import mapper
>>> mapper(WikiPage).collection.m.migrate()
>>> 
>>> next(session.db.wiki_page.find()).get('metadata')
{'categories': [], 'tags': ['bar', 'mongodb']}

That will automatically migrate all the documents in the collection one by one.
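
A quick check through PyMongo (a sketch) confirms that every stored document now carries the new layout:

for doc in session.db.wiki_page.find():
    assert doc['_version'] == 1       # every document was upgraded...
    assert 'tags' not in doc          # ...and the old top-level field is gone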

Chained Migrations

If you have evolved your schema multiple times, you can chain migrations by declaring a version_of (and its own migrate) on each previous version of the schema:

class MyModel(MappedClass):
    class __mongometa__:
        session = session
        name = 'mymodel'
        version_of = collection('mymodel', session,
            Field('_id', schema.ObjectId),
            Field('name', schema.String),
            Field('_version', schema.Value(1, required=True)),
            version_of=collection('mymodel', session,
                Field('_id', schema.ObjectId),
                Field('name', schema.String),
            ),
            migrate=lambda data: dict(_id=data['_id'], name=data['name'].upper(), _version=1)
        )

        @staticmethod
        def migrate(data):
            return dict(_id=data['_id'], name=data['name'][::-1], _version=2)

    _id = FieldProperty(schema.ObjectId)
    name = FieldProperty(schema.String(required=True))

    _version = FieldProperty(2, required=True)

Then just apply all the migrations as you normally would:

>>> session.db.mymodel.insert(dict(name='desrever'))
[ObjectId('64135757bd1c2ce4e0454e08')]
>>> session.db.mymodel.find_one()
{'_id': ObjectId('64135757bd1c2ce4e0454e08'), 'name': 'desrever'}
>>> 
>>> # Apply migration to version 1 and then to version 2
>>> mapper(MyModel).collection.m.migrate()
>>> 
>>> session.db.mymodel.find_one()
{'_id': ObjectId('64135757bd1c2ce4e0454e08'), '_version': 2, 'name': 'REVERSED'}

The resulting document’s name changed from "desrever" to "REVERSED": the migration to _version=1 uppercased the name, and the migration to _version=2 then reversed it.
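
Tracing the chain by hand makes the ordering clear (a sketch; _id omitted for brevity, though a real migrate must preserve it, as noted below):

doc = {'name': 'desrever'}                                 # stored document, version 0
doc = dict(doc, name=doc['name'].upper(), _version=1)      # first migrate(): 'DESREVER'
doc = dict(doc, name=doc['name'][::-1], _version=2)        # second migrate(): 'REVERSED'
assert doc['name'] == 'REVERSED'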

Note

When migrating, make sure you always bring forward the _id value from the old data, or you will end up with duplicate documents at each migration step, as a new id would be generated for each newly migrated document.
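
A defensive pattern (just a sketch) is to copy the whole incoming document and update it, so that _id, and any field the migration does not touch, is carried forward implicitly:

def migrate(data):
    result = dict(data)                      # copies _id and every untouched field
    result['metadata'] = {'tags': result.pop('tags', [])}
    result['_version'] = 1
    return result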