|
Figure 1 - the data source view. |
|
A Many to Many relationship example in SSAS: an implementation with a difference.
It all started with a pretty innocent request:
in my work we have advisers and they serve clients.
Sometimes clients change across advisers, and we would like to keep track of that.
So, How do you do that? Or in technical terms:
How do you measure changes in measure across Slowly Changing Dimension?
(You can use Many To Many relationship for Distinct Count as well: more about it in
here).
In my case there are actually two types of references to "clients". The first is the reference to "dimClient" which is the original Slowly Changing Dimension (SCD) and the second is the factless fact table: or, to be exact, a view that picks up the relevant clients from dimClient table and cross join it with the date table, and this way I have the changes per client per month.
As the clients sits in a factless fact table, I've googled "factless fact table", and before long (actually, on link # 7), I reached this invaluable article:
The many to many revolution. This article was written by Marco Russo & Alberto Ferrari and it is brilliant. I'm not even trying to closely imitate that - I'm just giving a short example of how it worked in my cube, that's all.
So
the problem:
There's the SCD (Slowly changing dimension) called dimClient. For all-sort-of-reasons (you can read about them in
here if you're really interested) I've made a "factless" fact table out of it. Now the end user (or the business analyst, whatever you'd name him) likes to know what happened to different clients across the time. The first question referred to Advisers: clients move across advisers, how do you keep track of that?
The SQL part of the solution
After I've read the article closely, printed the relevant part (in this case: pages 63-69), and read it thoroughly several times, I created 3 new views:
The original view, factClient, became the measure group Adviser A.
A new view, called v_bridge_Adviser, with the following structure:
SELECT
distinct
[dimDateId]
,[dimAdviserId]
,ClientGroup
FROM [dbo].[factClient]
This view gives me another link between the ClientGroup (the natural key for every client), the Adviser id and the date id. This view became the measure group Adviser B.
Next two views are actually identical:
CREATE VIEW [dbo].[vBridge_ClientAdviserA]
AS
SELECT DISTINCT
[ClientGroup]
,[dimAdviserId]
FROM factClient
and
CREATE VIEW [dbo].[vBridge_ClientAdviserB]
AS
SELECT DISTINCT
[ClientGroup]
,[dimAdviserId]
FROM factClient
Brilliant!
The Data Modelling part of the solution
Now all I had to do is:
add the tables to the dsv
make sure that they are linked correctly
create the new measure group ("Adviser A", "Adviser B", "bridge Client Adviser A" and "bridge Client Adviser B")
process the data
find an incompatibility in the data source
sort it out (details in the next post)
And this is how the code looks like in the "measure group".
|
Figure 2 - the cube and the measures usage |
|
Please note:
Measure group: "Adviser A" is simply fact client
Measure group "Adviser B" is v_Bridge_Adviser
Measure group "Client Adviser A" is v_Bridge_Adviser A, and;
Measure group "Client Adviser B" is v_Bridge_Adviser B
The most important tip is:
Make sure that the dimensions and the measure groups are all linked correctly.
If you don't you'll get "weird" numbers that you'll have no way of linking to each other, and that's a very frustrating experience.
As the entire process is being done with dsv's and cubes, you need to be able to read the visual graphs
better than reading code.
Another tip: (sound obvious, but still) : First link all the "regular" measure groups, only after that the "many to many" ones.
the Dimensions:
dimDate 1, dim date are the "dear old" dim dates dimensions. I had to re insert dim date again to the cube; no need to re define the dimension itself (hierarchy and all of that staff).
dim Adviser 1 and dim Adviser are the Advisers dimension referring to dim Adviser table from the dsv (see figure 1)
vBridgeClient 1 is vBridgeClient view (don't know why I had two in the dimensions, some things I found too difficult to deal with) gives you a list of the distinct clientGroup members (the natural key - there's only one natural key for clients in my DW). Getting the distinct number enables me not only to get the distinct count, but also of getting the info of who exactly had moved where; very important data, as it is
SCD type 2).
And after all is set & done, this is how I've written a very complicated MDX to use it (crap, the user can do it by drag & drop himself!)
SELECT
[Adviser A].[Id].&[20] ON 0,
NON EMPTY [Adviser B].[Id].allmembers ON 1
FROM [TestMtoM]
WHERE ([Date A].[Id].[Aug 2011],
[Date B].[Year-Month].[Month Year].&[Oct 2011])
Then I was asked to do the same, but not with the "Adviser" dimension but with the Category dimension. That's a different dimension, but when it came to implementation, I realized something very important:
I don't need the "bridge Category A" and "Bridge Category B" measure groups!
They could be useful in Marco & Alberto's example; but not in mine. They refer to checking changes in a dimension linked to a fact table - I'm dealing with changes in a factless fact table. In my case, where there are way fewer data every month, and the view to measure link the advisers, clients & dates isn't large; the total amount of lines include only less than half a million rows - that's not a lot! (that's for a year and a half worth of data, when we don't need to retain history for more than 3 years).
Therefore, I deleted the "bridge Client Adviser A" and "Bridge Client Adviser B" measure groups, and this is how the end result looks like:
As you can clearly see, while we still need the many to many relationship, we don't need it in as many measure groups as we needed it as the complicated manner which had been used before.
Yes, it works brilliantly, and yes, the data is accurate; it does give you the end results as it should!
One last thing: Yes, it is worth your while to set up the measure groups correctly. Of course you can write those complicated SQL queries, but why should you bother? This way your Business Analyst can create the reports him/herself to the best interest of the business without me worrying about the details of the display.