13 December 2014

You don't really need a website

I was given a most horrible data entry assignment. The task was basically to look up company names in Google. I could have used my time more efficiently with the Sensis API and the like, but as that isn't always possible, I decided to look up the companies one by one.
And this is what I learned:
  1. All of these companies are unique, special and different. They all offer exceptional value and excellent service. They are all highly professional, experienced and have excellent customer service. 
  2. Other companies just have contact details on their website. Could you please make sure it looks good? Maybe update the look once every ten years or so? 
  3. Small business owners don't always have a website. I know it sounds crazy, but you don't really need one if you don't have the time to manage it, or if your name isn't too generic. Just make sure that you can be found in the Yellow Pages and the professional listings. For example, lawyers' listings or accountants' listings, if you're a lawyer or an accountant. Artists and those who work with art can keep a Facebook page; it's also a good way to make sure that those who look for you can find you.
  4. If your name is generic (or if you share it with someone else who is more famous than you), then make sure that your LinkedIn title includes your location as well as your profession. This way you're easily found without the hassle of maintaining a website. 
  5. If you do have a website, make sure that it is Mobile friendly.
Even if you prefer to say that you're not interested in being famous or increasing your workload, be aware that an existing client might want to look you up (say they've lost their mobile or something like that). So having a clear listing is always important.
So you don't really need a website, just make sure that you can be easily found online.

27 November 2014

Remind me never to use Access again...

I came back from maternity leave full of enthusiasm. Of course, it lasted only a few weeks, and then I was asked to help with a "Training Record System"; as an in-house developer, when they asked for something to record the training of the employees in the company, I said yes. It's called a "can do" attitude, and people really appreciate it. And because I felt a little bit rusty, I decided to use Access as the front end and SQL Server as the back end. I can do that with my eyes closed, right?
And then I was quickly reminded of the joys of working with Access.
A basic form with two subforms, and a combo box on top which is linked to them. The first subform contains the details of the course, the second subform contains the details of the participants. So when I first added details to the course subform (the top one), on a new record, I would get the very descriptive

You can't assign a value to this object

with all sorts of irrelevant reasons listed.
And then I pressed "ok" and continued to edit the report.

It helped slightly to add the following code to the subform:



And that resolved it temporarily.

So I managed to enter all the course details without getting any annoying error messages, and then, when I moved from the "course" details to the "participants" details, I got the following error:

ODBC -- call failed
[Microsoft][ODBC SQL Server Driver]Optional feature not implemented (#0)

So I looked this problem up and found myself at #2 in Google! Well, that only shows how rusty I am! This time I only needed to refresh the underlying linked table to sort out the problem (I knew it wasn't the ODBC driver, as it worked yesterday...).



One more thing (basic but good to know):
If you get the "The expression you entered has a function name that Microsoft Office Access can't find" error, sometimes it is simply because a Macro must refer to a Public Function (not a Sub!) in a module, and the function's name should be different from the module name. And of course, make sure you've spelled the name correctly. A Function that could have been written as a Sub can just be wrapped like this:

Public Function Foo() As Integer
    'Do something, then return a value so the Macro has something to refer to
    Foo = 1
End Function


I also downloaded this:
http://www.microsoft.com/en-us/download/confirmation.aspx?id=6627
in order to set up the menu item. Oh, the good old MS Access 2003 days! You can find a code example of how to add a side menu to a report here. As for manipulating the Ribbon for better functionality: sorry, it requires messing with the Registry, which is an absolute no-no in my view. I really don't think it is even slightly necessary.



Another Example:

Working on an older database, running trusted code, and receiving:
Error: Record is deleted. (3167)
Apparently I just needed to do a Compact and Repair.

Another Example:

I placed a copy of an Access MDB file on the network so everyone could share the joy.
So I run it, and it doesn't let you run the file from the network; you need to add it to the Trust Center.
So I go File -> Options -> Trust Center -> Trust Center Settings -> Trusted Locations
And I add a new location.

It tells me that I'm not allowed ("The remote or network path you have entered is not allowed by your current security settings.")
And I need to tick "Allow trusted locations on my network (not recommended)".

So:
Running Access from the network is not recommended.
Which means that Access isn't a corporate product! Ask MS if you don't believe me!

The issue is this, but really, really, please:
How come when I'm working with a trusted product, like SQL Server or C#, I rarely Google my problems and hardly ever get "weird" error messages, while ten minutes into using MS Access I'm Googling like crazy?


25 July 2014

How to: process SSAS database via C# code

There are several ways to process a database, but if you'd like to use C# code, then this is the way:

using Microsoft.AnalysisServices;


static void Main()
{
    try
    {
        Server olapServer = new Server();

        // Connect to the SSAS instance
        olapServer.Connect("Provider=MSOLAP;Data Source=ServerName;");

        // Find the database by name (put your SSAS database name here)
        Database db = olapServer.Databases.FindByName("");

        if (db != null)
        {
            // Fully process the database
            db.Process(ProcessType.ProcessFull);
        }

        olapServer.Disconnect();
    }
    catch (Exception eex)
    {
        System.Diagnostics.Trace.TraceInformation(eex.Message);
    }
}


You can change the type of processing by passing a different ProcessType enumeration value to the Process method (for example ProcessFull or ProcessDefault).

Good luck!

04 July 2014

The difference between A/B testing and the Facebook experiment

I assume you've heard about the Facebook experiment in which they changed the news feed for random people in order to affect their mood. Many people are outraged. Some say: "Well, what are you so surprised about? It's simple A/B testing, and people in marketing and advertising have been doing this for years."
And I say: "yes, it's a simple A/B testing. but what did they try to achieve?"
A/B testing is posting the same message in different wordings, or changing the font/layout, in order to...

  • produce more clicks
  • engage your audience
  • create more sales 
And these are fine with me, because I assume I know what the interest is. Yes, the car company would like me to purchase their car (or recommend it to my friends). Yes, the newspaper wants me to read their news. Yes, the Twitter marketer would like me to follow their tweets. Yes, the people at Twitter would love me to revive my dormant account. No surprises here. 
But what is Facebook's agenda?
Are they trying to make me more engaged? Are they trying to sell me something? Would they like me to start a revolution? No, they're trying to see how the feed affects my mood, in order to make sure that I won't shy away from FB if everything is sad around me, or that I won't be depressed if everyone around me is going on holiday, which is a common theme in my feed around Christmas time.
Which is obviously rubbish: say it's December 8th, 1980, and John Lennon has just been shot; I assume that if I turned to FB I wouldn't find many smiling faces. Would that make me shy away from my social network?
And if (as it happened) I happen to have two new nieces born a month apart, it's not that bad to look at new photos of them in my feed, is it?
So they are a bunch of manipulative jerks. What a surprise. My ex is a network researcher at FB. I'm happy that, as far as I found out, he wasn't involved in this experiment, but I wouldn't be too surprised. Yes, I still have an account on FB.

And now to the classic question:
How do we protect ourselves?

If it's an advertisement we know what the hidden agenda is. If we read the newspaper we know what they want. And if we read a "news" article which is simply hidden marketing, we have learnt to protect ourselves from that as well. My solution? Shy away from FB. Make sure to keep your social network social, not only on the network. 


12 June 2014

My first cursor - faster than SET based operation

Of course, you should NEVER write cursors. Especially not for SQL Server. Or so I was told.

The rationale behind the code


I believe people hate them so much because you usually find them in legacy code, where a proper SET-based operation could have been used instead, or at least could be used nowadays. Nobody likes legacy code; there's nothing too delightful about jumping into code that you don't know and didn't write. Another reason, of course, is that for the usual T-SQL code you write, some smart people wrote an optimizer. If you run a cursor the optimizer cannot kick in, so it's likely that the code will run slower.

But sometimes you do need to write a cursor. The main use is when you need to go row by row. Then why not use a client-side program, like C# or VB? Well, maybe because the developer feels more competent using SQL than C#. That's been my excuse for writing the code below.
I had to go row by row for a one-off. My SQL skills exceed my C# skills, and I needed to run this code once and that's all. So I chose to write a cursor.

What does it do?

There had been some duplicated rows of data in the "attachment" table. The data is a file, saved in the "content" column as varbinary(max), and I just wanted to make sure that the duplicated file name didn't stem from different files, but rather represented the same file saved multiple times. I made sure that all attachments were saved to a new table called "multiple_attachment", where "min_attachid" is the reference (there's obviously another field called "file name" which I didn't bother with this time).
Since it's hard to trace errors in a cursor, I've added the "INSERT" statement.
Using a SET-based operation would not be quicker (tested!) due to the INNER JOIN of the table on itself, on top of the expensive operation of comparing varbinary(max).
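For comparison, a SET-based equivalent would look roughly like this (just a sketch, using the same tables and the same trace_errors logging as the cursor below):

INSERT INTO dbo.trace_errors (attachid)
SELECT m.attachid
FROM dbo.multiple_attachment m
INNER JOIN dbo.attachment a
    ON a.attachid = m.min_attachid
WHERE m.attachid <> m.min_attachid
  AND m.content <> a.content   -- comparing varbinary(max) is the expensive part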

The code


DECLARE first_cursor CURSOR
FOR
SELECT content, attachid, min_attachid
FROM dbo.multiple_attachment
WHERE attachid <> min_attachid

DECLARE @content VARBINARY(MAX)
DECLARE @attachid BIGINT
DECLARE @min_attachid BIGINT

OPEN first_cursor

FETCH NEXT FROM first_cursor
INTO @content, @attachid, @min_attachid

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Log the attachment if its content differs from the reference attachment's content
    IF @content <> (SELECT content
                    FROM dbo.attachment
                    WHERE attachid = @min_attachid)
        INSERT INTO [dbo].[trace_errors] ([attachid])
        VALUES (@attachid)

    FETCH NEXT FROM first_cursor
    INTO @content, @attachid, @min_attachid
END

CLOSE first_cursor
DEALLOCATE first_cursor


A problem:

The code above got stuck with the error "An error occurred while executing batch. Error message is: Error creating window handle.", because there were over 2000 rows in the table and there is a limit to how many "Display Result" window panes you can create; for every row it produced a new message! The solution was to disable the results pane in the query options (Query -> Query Options -> Results, then choose "Discard results after execution" in SQL Server 2012). The code also ran much quicker, and I would recommend setting this option each time you need to write a cursor.

22 May 2014

Investigating an undocumented database - part 2

I've written in the past about how to investigate an undocumented database, and here are a few more insights to add to my previous post.
First of all, if you need to list all the tables with data you can use the following script:


SELECT
    t.NAME AS TableName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
    sys.tables t
INNER JOIN    
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
    sys.allocation_units a ON p.partition_id = a.container_id
WHERE
    t.NAME NOT LIKE 'dt%'
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255
AND p.rows > 0 --As we want only tables with data
GROUP BY
    t.Name, p.Rows
ORDER BY
    t.Name;

Warning: if a table is very large and partitioned, it can appear more than once in the list (the script groups by the per-partition row count), so you'd see several rows with the same table name and schema... that's confusing, but it can happen.
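If that bothers you, a simpler variant that rolls the partitions up into a single row per table (a sketch; it drops the space columns) is:

SELECT
    t.name AS TableName,
    SUM(p.rows) AS RowCounts
FROM sys.tables t
INNER JOIN sys.partitions p
    ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1)   -- count only the heap or clustered index, so rows aren't doubled
    AND t.is_ms_shipped = 0
GROUP BY t.name
HAVING SUM(p.rows) > 0       -- as we want only tables with data
ORDER BY t.name;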

When you look around the different tables, you'll sometimes find that they are not all perfectly normalized! It's very frustrating, and in that case have a look at the triggers:

To List all triggers:

SELECT
     sysobjects.name AS trigger_name
    ,USER_NAME(sysobjects.uid) AS trigger_owner
    ,s.name AS table_schema
    ,OBJECT_NAME(parent_obj) AS table_name
    ,OBJECTPROPERTY( id, 'ExecIsUpdateTrigger') AS isupdate
    ,OBJECTPROPERTY( id, 'ExecIsDeleteTrigger') AS isdelete
    ,OBJECTPROPERTY( id, 'ExecIsInsertTrigger') AS isinsert
    ,OBJECTPROPERTY( id, 'ExecIsAfterTrigger') AS isafter
    ,OBJECTPROPERTY( id, 'ExecIsInsteadOfTrigger') AS isinsteadof
    ,OBJECTPROPERTY(id, 'ExecIsTriggerDisabled') AS [disabled]
FROM sysobjects

INNER JOIN sysusers
    ON sysobjects.uid = sysusers.uid

INNER JOIN sys.tables t
    ON sysobjects.parent_obj = t.object_id

INNER JOIN sys.schemas s
    ON t.schema_id = s.schema_id

WHERE sysobjects.type = 'TR'


Sometimes you need a printout of an "on delete" trigger to find all the tables that are related to a certain table, since foreign keys aren't always defined.
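To print out the body of a trigger once you've found it in the list above, something like this does the job (the trigger name here is just a made-up example):

-- 'trg_SomeTable_Delete' is a placeholder: use a name from the trigger list above
SELECT OBJECT_DEFINITION(OBJECT_ID('trg_SomeTable_Delete')) AS trigger_definition;

-- or the old-fashioned way:
EXEC sp_helptext 'trg_SomeTable_Delete';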

Ah, and sometimes there's a certain not-so-common data type that you need to track down. I'm not talking about INT here; maybe it's XML or Money, or in the following case, VARBINARY:


SELECT IC.TABLE_NAME, IC.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS IC
WHERE IC.DATA_TYPE = 'VARBINARY'


Ah, and BTW: SQL Server 2012 is FUN to work with, you're welcome to try it out!

15 January 2014

Distinct Count using Many to Many relationship - SSAS

I've blogged here about implementing a many to many relationship. That was in order to view changes in a dimension across time (like Clients changing Advisers from one month to another). This time I want to blog about implementing the same technique in order to view the distinct count of Clients, as obviously it is a very important measure for an advisory company.
The problem with Distinct Count in SSAS 2008 R2 is that it is heavy to use. Of course you can use some methods to improve the performance, but nevertheless it is a heavy task. In my not-too-big data warehouse, eliminating the measure altogether gave me a minute of benefit in the processing, which is a lot. So instead of going Measures -> New Measure... and choosing Distinct Count for the appropriate column (in my case the natural key of the Client, as Clients are a Slowly Changing Dimension), I implemented the distinct count with a many to many relationship, as described below.
I recommend reading the wonderful document The Many-to-Many Revolution. This article was written by Marco Russo & Alberto Ferrari, and I base this blog entry on their work, and obviously on what works for me. You can read an example in this blog entry, but I think he ignores the Slowly Changing nature of the Slowly Changing Dimension.
So this is how it's done properly:

Distinct Count on SCD type 2 using Many to Many relationship

The SQL part

First, you need to create a view on the SQL Server side. This view bridges the Client and the Fact table.
Fact Table: FactCommission
Dim Table: DimClient
I don't want the (surrogate) key which links FactCommission to DimClient: I want the natural key!
Therefore I created a view as follows:

SELECT DISTINCT [Client Natural Key]
FROM dbo.factCommission

Since I can't create an Index on a view which uses DISTINCT, I had to change the view as follows:

CREATE VIEW [dbo].[v_bridge_fCommissionClient]
WITH SCHEMABINDING
AS
SELECT [Client Natural Key],
       COUNT_BIG(*) AS totalNumber
FROM dbo.factCommission
GROUP BY [Client Natural Key]


I also created a Unique Clustered Index on the [Client Natural Key] field. I am not certain that this part is necessary when you use M2M, on the other hand I don't think it hurts...
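For reference, the index statement would look something like this (the index name is just my naming, pick your own):

CREATE UNIQUE CLUSTERED INDEX IX_v_bridge_fCommissionClient
ON [dbo].[v_bridge_fCommissionClient] ([Client Natural Key]);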

The Data Modelling Part

After you add the v_bridge_fCommissionClient view to the data source view, add it to the cube structure: first as a dimension, and then create a measure group out of it (treating it as a fact table).
The dimension obviously has only one attribute: [Client Natural Key]. When you add it as a dimension it is important to make sure that the link to the Fact Commission is based on the [Client Natural Key].
And now, in order to make this work across the board, you must link *all* of the relevant dimensions from Fact Commission to the new dimension using the Many to Many relationship. Many would advise you to link just the Time dimension via the fact table using Many to Many: true, but not enough. All the dimensions that need to be queried are queried via the Fact Commission measure group. Therefore, you need to link them as well using a Many to Many relationship (see below). 


This way, when you browse the data, you get the correct numbers for all dimensions.

Conclusion

I'm happy to say that:
  1. It does give you the correct data across the board just like the good old distinct count 
  2. It works like a charm, and there seems to be no overhead for the SSAS engine from the extra data. I believe this new measure even helps the existing Distinct Count measure (but maybe it's just a belief).
  3. Once you get the hang of it, it is actually even simpler than using Distinct Count...
  4. Marco & Alberto claims that it is easier on the Caching. I'll have to see about that.