10 October 2011

Change a dimension to a fact

Life don't always follow the "best practicing" methodologies of Life. "Star Schema" is the best and most preferred way to design the data warehouse; unfortunately, it's not always possible. In the industry I'm at (financial planning) you would expect the measures to be some financial product valuable (i.e. money); so the major feed is actually clients and their financial products. The portfolio is the "fact" table (easy to decide by the size of it) while the clients and their attributes are the "dimension" table. Sounds OK, until they asked for the number of clients. Easy enough, you'd say, every client must have a representation in the "portfolio" fact table, so all we have to do is to use "distinct count" on the "client id" in the fact table, right? Wrong. Some clients have financial products, but not in the feed of the portfolio but in some other file which hasn't been created yet. Again, before you scream "bad design!" may I add that the original intention was to create a "source of truth" which lays in an outsourced system, and only later to create the date warehouse itself?
So here I am, trying to create count of clients per month per adviser, or in DW/BI terms: trying to use a dimension table of a slowly moving dimension (that has the "date" only as time attributes) as a fact table (which would have a link to the date as a dimension). Or, as my Business Analyst put it: Before, the "portfolio" was in the center of what we want. Now, the center of attention is the client.
The solution?

Using indexed view from dimClient, created a view called "factClient". The view takes only the necessary fields from the dimClient table (no "date inserted" and no "valid till" - which is defaulted to 31 of December, 9999) and joins them to the dimDate table.
The view is created like this:






CREATE VIEW [dbo].[factClient]
SELECT d.[id] AS dimDateId
      ,1 AS ClientGroupCount

      -- Other dimension fields and other necessary fields
  FROM [dbo].[dimDate] d INNER JOIN
    [dbo].[dimClient] C ON c.DateInserted <= d.DateFormat AND c.ValidTill > d.DateFormat
  WHERE d.HasPast <> 0

As you can see I deleted (for the sake of this view) the columns that are particular to my database, but what I've kept are the followings:
First, I've added a field in the Date dimension table that relates weather this data has past or not. This fields get updated after the ETL process completes. The reason for that is that I didn't want to inner join with the (large) fact Portfolio table, and the other is that any other way of querying that client would result with an error of "Cannot create index on view 'dbo.factClient'. The function 'getdate' yields nondeterministic results. Use a deterministic system function, or modify the user-defined function to return deterministic results."
The second thing that happens is that it gives you the user id, and a dimension, like this:
Member Id   dimDateID      dimAdviser Id    Count
1                  1 Dec 2010     1                            1
1                  2 Dec 2010     2                            1
Which gives me the ability to query by adviser on a different date dimension. Giving every line the value of "1" enables me to "sum" without using a "distinct count" (or any count, for that matter) as distinct count requires a unique measure group.
Please note that I'm using SQL Server 2008 R2 which enables me to use the "date" data type instead of "datetime" datatype.
Please note that the "dateFormat" field in the "dimDate" table is in order to give the "date" version of the date id: so when the key is 20101130 (data type: int) the "date format" is '20101130' (data type: date).
(More about "Indexed Views" in here)

Now there's a new fact table the following stages should be made in the Analysis Services project:
Add the new fact table to the DSV (data source view) file
Create the necessary relationships between the fact table and the dimensions surrounding it
Create a new measure group in the cube. Make sure that the measures are defined correctly (the [Member id] field  is an INT, but I don't want it to be summed!)
Hand it over to your Business Analyst and hope that s/he approves...

No comments:

Post a Comment