• Narrow screen resolution
  • Wide screen resolution
  • Fluid screen
  • style1 color
  • style2 color
  • style3 color

SAS Portlets, Widgets, Themes and Tutorials for sale

Blogging about all things SAS

 

Archive for September, 2008

Beer and Nappies - Data Mining Urban Legend

Sunday, September 21st, 2008

Interesting post over at blogs.sas.com  pointing to an article by Bob Garfield that dispels as urban legend the Data Mining anecdote that putting beer and nappies together increases beer sales due to the fact that Dad’s pop in to buy nappies on the way home from work and pick up some beer as well.

Must admit I have used that line a few times in the past, but it does highlight the idea that Data Mining can identify differences or similarities that would not normally be pciked up via simple data exploraton techniques

A while ago I was luck enough to be involved in a Data Mining exercise for a large supermarket chain, and a few things came out of the analysis that we probably wouldn’t have picked up manually:

  • One of the stores had a high proportion of shopping carts with meat in them, the other was primarily vegetarian
  • Banana’s were the major linkage product for all carts in all stores
  • There was a certain customer segment that regularly brought cat food, chocolate and red wine
    (I still think naming market segments is a fun task!)
  • House brands vs non house brands were dependent on a certain percentage price difference
  • People would regularly buy house brands and non house brands in the same cart

So would changing the layout of the store based on some of these preferences increase sales?

Good question…

All  I know is that this insight took a few days of effort, it probably would have taken months to do it manually (and that is assuming we would find these results).

Thats the power of Data Mining!

Dealing with Oracle Dates in SAS

Friday, September 5th, 2008

Working with a customer where we have decided to utilise the current Oracle Data Warehouse as much as possible as the source of data for reporting, given the significant investment in the data warehouse over the last few years.

Rather than extract the Oracle data into SAS we were testing using SAS Information Maps and Web Report Studio over the top of the Oracle Star Schema’s.  I have to say I was impressed with the performance of the Web Report Studio reports given the number of layerswe were traversing, i.e

Web Report Studio > Information Map > SAS Workspace Server > Proc Sql >SAS Access to Oracle > Oracle (and back again).

Anyway one issue we found was that we couldn’t apply SAS Date Formats against the Oracle date fields, if we did it would just show ******* instead of a date.

We could use a datepart function to substring the date field and that worked ok, but that could only be applied in the Information Map, not against the Metadata definition which means other tools such as SAS OLAP Cubes and Enterprise Guide etc wouldn’t pick it up.

We tried  changing the Metadata definition for datefields from DATETIME20. to say DATE9. , but then our friends ****** would appear.  Even worse if we reapplied the DATETIME20. format to the metadata definition it set all the dates to 1960.  We had to delete the metadata definition and reimport it to get the correct dates again.

The solution was in the Usage Note 6413: SASDATEFMT option can be defined globally in the config file or as an environment variable with SAS/ACCESS Interface to Oracle” here:

http://support.sas.com/kb/6/413.html

Although it talks about SAS 8 we applied the following statement to the sasv9.cfg file:

-set SASDATEFMT date

Rebooted the workspace server, deleted and re-imported the Oracle Table Metadata and wallah the Oracle date fields were know registered as Date. and aplpying date formats in the Metadata definition, Information Maps or Enterprise Guide worked the way they should!

As an aside in working through the options we worked out that thwre is a magic number that if you take it away from the numeric Oracle data value you get the numeric SAS date value, whihc makes sense I suppose given they both juts start counting days to manage dates, they just start counting from different points.