[repost from Stephen Forte's Blog]
See also:
In
Part I we looked at the advantages of building a data warehouse
independent of cubes/a BI system and in Part II we looked at how to
architect a data warehouse’s table schema. Today we are going to look at
where to put your data warehouse tables.
Let’s look at the location of your data warehouse. Usually as your system matures, it follows this pattern:
- Segmenting your data warehouse tables into their own isolated schema inside of the OLTP database
- Moving the data warehouse tables to their own physical database
- Moving the data warehouse database to its own hardware
When
you bring a new system online, or start a new BI effort, to keep things
simple you can put your data warehouse tables inside of your OLTP
database, just segregated from the other tables. You can do this a
variety of ways, most easily is using a database schema (ie dbo), I
usually use dwh as the schema. This way it is easy for your application
to access these tables as well as fill them and keep them in sync. The
advantage of this is that your data warehouse and OLTP system is
self-contained and it is easy to keep the systems in sync.
As
your data warehouse grows, you may want to isolate your data warehouse
further and move it to its own database. This will add a small amount of
complexity to the load and synchronization, however, moving the data
warehouse tables to their own table brings some benefits that make the
move worth it. The benefits include implementing a separate security
scheme. This is also very helpful if your OLTP database scheme locks
down all of the tables and will not allow SELECT access and you don’t
want to create new users and roles just for the data warehouse. In
addition, you can implement a separate backup and maintenance plan, not
having your date warehouse tables, which tend to be larger, slow down
your OLTP backup (and potential restore!). If you only load data at
night, you can even make the data warehouse database read only. Lastly,
while minor, you will have less table clutter, making it easier to work
with.
Once your system grows even further, you can isolate the
data warehouse onto its own hardware. The benefits of this are huge, you
can have less I/O contention on the database server with the OLTP
system. Depending on your network topology, you can reduce network
traffic. You can also load up on more RAM and CPUs. In addition you can
consider different RAID array techniques for the OLTP and data warehouse
servers (OLTP would be better with RAID 5, data warehouse RAID 1.)
Once
you move your data warehouse to its own database or its own database
server, you can also start to replicate the data warehouse. For example,
let’s say that you have an OLTP that works worldwide but you have
management in offices in different parts of the world. You can reduce
network traffic by having all reporting (and what else do managers do??)
run on a local network against a local data warehouse. This only works
if you don’t have to update the date warehouse more than a few times a
day.
Where you put your data warehouse is important, I suggest that you start small and work your way up as the needs dictate.