Python – SQLAlchethe and Multiple Databases

pythonsqlalchemy

I have an assortment of similar (but not identical) databases, and would like to use SQLAlchemy as a way to "standardize" access. The databases can differ very slightly, such as having a unique prefix on the column names, or they can differ more dramatically and be missing columns (or for old databases, missing entire tables).

What I'm looking for help on isn't so much an SQLAlchemy problem as it is a Python/Organizational one. How can I have multiple databases setup that can be reused in projects easily?

I've read about SQLAlchemy sessions, but can't see a way of using those without instantiating each of them every project.

My question is this: How can I go about making a module/package that will contain many database model setups to be used in SQLAlchemy that can be easily imported/used in another python project?

I'm not so worried about dealing with the missing columns/tables as of yet. I can tackle that problem later, but it is something that needs to be kept in mind as I can't use the exact same model for each database.

Any resources, pointers, or reading material on this topic would be truly appreciated. Thanks in advance, and I'm sorry if this has been answered elsewhere, searches didn't show anything relating to this.

EDIT: I've left the original intact, and am adding more content based on Paul's advice.

RE: SA ORM – Yes, I plan on using the SQLAlchemy ORM. For what are likely obvious reasons, I can't provide real databases. However, assume these three fictitious databases, aptly named DB1, DB2, and DB3 (we'll assume one table in each, with only a few columns, real world would have significantly more of both).

Each database has a user table with a few columns in each. The following is some SQL notation for the tables/columns:

DB1.user  --> DB1.user.id,      DB1.user.username,  DB1.user.email
DB2.user  --> DB2.user.id,      DB2.user.user_name, DB2.user.email
DB3._user --> DB3._user.userid, DB3._user.username, DB3.user.email_address

At present, I am trying to separate out these databases to 'modular', and be able to just add additional databases as I go.

I've consider a couple different file organization aspects (assume __init__.py exists where needed, but omitted for brevity's sake), including:

Databases         |    Databases            |    Databases
    DB1.py        |        DB1              |        DB1
    DB2.py        |            models.py    |            models
    DB3.py        |        DB2              |                user.py
                  |            models.py    |                anothertable.py
                  |        DB2              |        ...
                  |            models.py    |        DB3
                  |                         |            models
                  |                         |                user.py
                  |                         |                anothertable.py

I'd love to be able to access these with the SA ORM, and do so with as little importing/declarations as possible when it comes time to use these databases in a python file. Needing to do something similar to:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from Database import DB1, ..., DB[N]
db1_engine = create_engine('connect_string/db1')
db1_session = sessionmaker(bind=db1_engine)()
...
db3_engine = create_engine('connect_string/db3')
db3_session = sessionmaker(bind=db3_engine)()

would be incredibly cumbersome as I'll be dealing with far more than just three databases. I'd much rather have that already taken care of for me (in the __init__.py file, maybe?)

Being able to access and use it similar to:

import Databases

Databases.DB1.session.query('blahblah')

would be infinitely better.

EDIT2: I also know how to get around the variants in the naming conventions of the databases/columns when setting up my models. That isn't an issue, but I did mention it so that it was known I can't just use one model set for multiple databases.

I hope by expanding this I didn't muddy the waters or make this too confusing. Thanks for taking the time to read it!

EDIT3: I've managed to spend a little more time on this. I've set up the project in the following way:

Databases
    __init__.py
    databases.py
    settings.py
    DB1
        __init__.py
        models.py
    ...
    DB3
        __init__.py
        models.py

At present, I have a tupple of databases that are 'installed' in the settings.py file. Each entry is would like like INSTALLED_DATABASES = ('DB1', ..., 'DB3'). As I complete more models, and they get added to the list of tupples. This allows me to add or remove content as I go.

I have the engine and sessios setup inside the models.py file, and have the init.py file for each database setup to from models import *.

In the databases.py file I have the following

class Databases(object):
    def __init__(self):
        for database in INSTALLED_DATABASES:
            setattr(self, database, __import__(database))

I can now use these via:

from databases import Databases

db = Databases()

for qr in db.DB1.query(db.DB1.User):
    print qr.userid, qr.username

SQLAlchemy is allowing me to manually specify column names when defining the models, which is a huge bonus to the standardization I wanted.

I've got a lot of work ahead of me still. I would like to create objects that force model validation (ie, is a field present? does a non-present field have a default value? etc.) and better incorporate how this works with my IDE (it kind of doesn't at present). But I'm off to the right track. I figured I'd update this for anyone who might by chance be wondering how to do the same thing I was.

Sorry this has become so long!

Cheers!

Best Answer

As per the requests to my initial question, I've taken my third edit and made it my answer. As I'm not sure of the proper protocols, I've left the third edit in place above. If you've already read EDIT3, then you've read what I have as an answer.

I've managed to spend a little more time on this. I've set up the project in the following way:

Databases
    __init__.py
    databases.py
    settings.py
    DB1
        __init__.py
        models.py
    ...
    DB3
        __init__.py
        models.py

At present, I have a tuple of databases that are 'installed' in the settings.py file. Each DB entry is specified in INSTALLED_DATABASES = ('DB1', ..., 'DB3'). As I complete more models, they get added to the list of tuples. This allows me to add or remove content as I go.

I have the engine and sessions setup inside the models.py file, and have the __init.py__ file for each database set to from models import *.

In the databases.py file I have the following

class Databases(object):
    def __init__(self):
        for database in INSTALLED_DATABASES:
            setattr(self, database, __import__(database))

I can now use these via:

from databases import Databases

db = Databases()

for qr in db.DB1.query(db.DB1.User):
    print qr.userid, qr.username

SQLAlchemy is allowing me to manually specify column names when defining the models, which is a huge bonus to the standardization I wanted.

I've got a lot of work ahead of me still. I would like to create objects that force model validation (ie, is a field present? does a non-present field have a default value? etc.) and better incorporate how this works with my IDE (it kind of doesn't at present). But I'm off to the right track. I figured I'd update this for anyone who might by chance be wondering how to do the same thing I was.

Sorry this has become so long!

Cheers!