26 January 2012

SQL Loader Utility handling Multi Byte chars



Situation
We had erroneous data after loading with SQL Loader.
Settings
Database created with
CHARACTER SET "UTF8"
NATIONAL CHARACTER SET "WE8ISO8859P1"
NLS_LANGUAGE = AMERICAN
NLS_TERRITORY = AMERICA
Example
Create a test table with
CONNECT scott/tiger;
DROP TABLE LOADER_TEST;
CREATE TABLE LOADER_TEST (
   USR_ID          NUMBER NOT NULL ,
   USR_NAME        VARCHAR2(50) ,
   USR_LNK_NAME    VARCHAR2(50) ,
   USR_LNK_ORDER   NUMBER
);

·         Case 1
Client (Win 2000)
NLS_LANG=AMERICAN_AMERICA.UTF8
Controlfile (ctrl.txt)
LOAD DATA
CHARACTERSET UTF8
INFILE *
REPLACE INTO TABLE LOADER_TEST
FIELDS TERMINATED BY ';'
TRAILING NULLCOLS (
   USR_ID         INTEGER EXTERNAL,
   USR_NAME       CHAR(50),
   USR_LNK_NAME   CHAR(50),
   USR_LNK_ORDER  INTEGER EXTERNAL
)
BEGINDATA
1;Santé bien;http://www.vaud-sante.ch;3
2;Santé;http://www.vaud-sante.ch;4
3;Alle à gessa;http://www.gessa.com/now;2
4;Alle à;http://www.gessa.com/now;1
5;Gägs;http://www.gaegs.ch;5
6;Gägs ä;http://www.gaegs.ch;6
Invoke sqlldr
sqlldr userid=scott/tiger control=ctrl.txt
Result
column usr_id format 99 heading "Id"
column USR_NAME format a32 heading "Name"
column USR_LNK_NAME format a26 heading "Link"
column USR_LNK_ORDER format 99 heading "Ord"
select * from loader_test;
Id Name                             Link                      Ord
-- -------------------------------- ------------------------- ---
 1 Santé bien                       http://www.vaud-sante.ch    3
 2 Santé;http://www.vaud-sante.ch/   4
 3 Alle à gessa                     http://www.gessa.com/now    2
 4 Alle à;http://www.gessa.com/now  1
 5 Gägs                             http://www.gaegs.ch         5
 6 Gägs ä;http://www.gaegs.ch/        6
For the records 2,4,6 where a special character is prior to the delimiter, the SQL Loader does not see the delimiter.

·         Case 2
Change the control file (ctrl.txt) to
CHARACTERSET WE8ISO8859P1
Reload the data and select the LOADER_TEST table again
Id Name                             Link                      Ord
-- -------------------------------- ------------------------- ---
 1 Santé bien                      http://www.vaud-sante.ch    3
 2 Santé                            http://www.vaud-sante.ch    4
 3 Alle à gessa                    http://www.gessa.com/now    2
 4 Alle à                          http://www.gessa.com/now    1
 5 Gägs                             http://www.gaegs.ch         5
 6 Gägs ä                         http://www.gaegs.ch         6
Now, SQL Loader recognises all delimiters, but data seems to be corrupted.

·         Case 3
Change the client environment to
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
Reselect data - now the result is now correct !
 Id Name                           Link                       Ord
--- ------------------------------ -------------------------- ---
  1 Santé bien                     http://www.vaud-sante.ch     3
  2 Santé                          http://www.vaud-sante.ch     4
  3 Alle à gessa                   http://www.gessa.com/now     2
  4 Alle à                         http://www.gessa.com/now     1
  5 Gägs                           http://www.gaegs.ch          5
  6 Gägs ä                         http://www.gaegs.ch          6
Conclusion
Running client tools (e.g. SQL Loader, SQL Plus) against a database with UTF8 character set, you must assign WE8ISO8859P1 to NLS_LANG for the client tools due to prober data representation. That means a correct characterset conversion will take place within SQL Net and Net8 respectively.

No comments: