Togaware DATA MINING
Desktop Survival Guide
by Graham Williams
Google

ODBC Option

Rattle supports obtaining a dataset from any database accessible through ODBC (Open Database Connectivity).

[width=]rattle-odbc

The key to using ODBC is to know (or to set up) the data source name (the DSN). Within Rattle we specify a known DSN by typing the name into the text entry. Once that is done, we press the Enter key and Rattle will attempt to connect. This may require a username and password to be supplied. For a Teradata Warehouse connection you will be presented with a dialog box like that in the figure below.

[width=0.3]rattle-odbc-teradata

For a Netezza ODBC connection we will get a window like the one in the following (on the left):

[width=0.5]rattle-odbc-netezza

If the connection is successful we will find a list of available tables in the Table combobox.

We can choose a Table, and also include a limit on the number of rows that we wish to load into Rattle. This allows us to get a smaller sample of the data for testing purposes before loading up a large dataset. If the Row Limit is set to 0 then all of the rows from the table are retrieved. Unfortunately there is now SQL standard for limiting the number of rows returned from a query. For the Teradata and Netezza warehouses the SQL keyword is LIMIT and this is what is used by Rattle.

The Believe Num Rows option is an oddity required for some ODBC drivers and appears to be associated with the pre-fetch behaviour of these drivers. The default is to activate the check button (i.e., Believe Num Rows is True). However, if you find that you are not retrieveing all rows from the source table, the the ODBC driver may be using a pre-fetch mechanism that does not ``correctly'' report the number of rows (it is probably only reporting the number of rows limited to the size of the pre-fetch). In these cases deactivate the Believe Num Rows check button. See Section [*] for more details. Another solution is to either disable the pre-fetch option of the driver, or to increase its count. For example, in connecting through the Netezza ODBC driver the configuration window is available, where you can change the default Prefetch Count value.

[width=0.3]rattle-odbc-netezza-config

Copyright © 2004-2006 Graham.Williams@togaware.com
Support further development through the purchase of the PDF version of the book.
Brought to you by Togaware.