Relational Database Technologies: 2013

Saturday, August 24, 2013

How to change all objects ownership in a particular schema in PostgreSQL ?

Few suggesion's here (Thanks), inspired me to compose a bash script for changing all object's (TABLES / SEQUENCES / VIEWS / FUNCTIONS / AGGREGATES / TYPES) ownership in a particular schema in one go. No special code included in a script, I basically picked the technique suggested and simplified the implementation method via script. Actually, REASSIGN OWNED BY command does most of the work smoothly, however, it changes database-wide objects ownership regardless of any schema. Two eventualities, where you may not use REASSIGN OWNED BY:

1. If the user by mistake creates all his objects with super-user(postgres), and later intend to change to other user, then REASSIGN OWNED BY will not work and it merely error out as:

postgres=# reassign owned by postgres to user1;
ERROR:  cannot reassign ownership of objects owned by role postgres because they are required by the database system

2. If user wish to change just only one schema objects ownership.

Either cases of changing objects, from "postgres" user to other user or just changing only one schema objects, we need to loop through each object by collecting object details from pg_catalog's & information_schema and calling ALTER TABLE / FUNCTION / AGGREGATE / TYPE etc.

I liked the technique of tweaking pg_dump output using OS commands(sed/egrep), because it known that by nature the pg_dump writes ALTER .. OWNER TO of every object (TABLES / SEQUENCES / VIEWS / FUNCTIONS / AGGREGATES / TYPES) in its output. Grep'ing those statements from pg_dump stdout by replacing new USER/SCHEMANAME with sed and then passing back those statements to psql client will fix the things even if the object owned by Postgres user. I used same approach in script and allowed user to pass NEW USER NAME and SCHEMA NAME, so to replace it in ALTER...OWNER TO.. statement.

Script usage and output:

sh change_owner.sh  -n new_rolename -S schema_name

-bash-4.1$ sh change_owner.sh -n user1 -S public

 Summary:
        Tables/Sequences/Views : 16
        Functions              : 43
        Aggregates             : 1
        Type                   : 2

You can download the script from here, and there's also README to help you on the usage.

--Raghav

Wednesday, August 7, 2013

New Oracle-Compatibility features in PostgresPlus Advanced Server 9.3Beta

In short about EnterpriseDB product PostgresPlus Advanced Server, its built upon PostgreSQL community version with Oracle-Compatibility features and its ability to run with existing Oracle applications without any re-coding(typically required in migration) and allow enterprises to enjoy the benefit of the Open source database with PostgresPlus Advanced Server.

As PostgresPlus Advanced Server 9.3 Beta built upon community PostgreSQL 9.3 Beta feature, thence varied Oracle-Compatiblity features introduced in BETA unleash, like Oracle-style syntaxes, packages, SQL function etc. Below are few of the examples executed in PPAS 9.3 BETA:-

Materialized Views Syntax:

CREATE MATERIALIZED VIEW name [build_clause][create_mv_refresh] AS subquery
Where build_clause is:
BUILD {IMMEDIATE | DEFERRED}
Where create_mv_refresh is:
REFRESH [COMPLETE] [ON DEMAND]

Build deferred - In this option, data not populated in MV at the creation time, instead its populate later by using REFRESH MATERIALIZED VIEW.
Build Immediate - In this option, data populated in MV at the creation time from tables (Default)

edb=# create materialized view mymview1 
build immediate 
as select ename,sum(sal) from emp group by ename; 
SELECT 14 
edb=# create materialized view mymview2 
build deferred 
as select ename,sum(sal) from emp group by ename; 
SELECT 0

As you'll be able to figure it out from above output that "BUILD IMMEDIATE" populates all rows at the create time, whereas "BUILD DEFERRED" simply created a dummy object that ought to be later populated using REFRESH MATERIALIZED VIEW.

edb=# refresh materialized view mymview2 ;
REFRESH MATERIALIZED VIEW
edb=# select count(*) from mymview2;
 count
-------
    14
(1 row)

REFRESH [COMPLETE] [ON DEMAND] modes are also supported at creation time of Materialized View.

More Oracle SQL functions support:
REGEXP_SUBSTR()

edb=# SELECT REGEXP_SUBSTR('EDB documentation AT http://www.enterprisedb.com/','http://([[:alnum:]]+){3,4}/?') as RESULT FROM dual;
   result
------------
 http://www
(1 row)

REGEXP_COUNT()

edb=# select regexp_count('I am in PG-Mailing list','i',1,'i');
 regexp_count
--------------
            5
(1 row)

REGEXP_INSTR()

edb=# SELECT REGEXP_INSTR('PostgreSQL','P') "position" FROM dual;
 position
----------
        1
(1 row)

RAWTOHEX() / HEXTORAW ()

edb=# CREATE TABLE test (raw_col RAW(10));
CREATE TABLE
edb=# INSERT INTO test VALUES (HEXTORAW('7D'));
INSERT 0 1
edb=# select * from test ;
 raw_col
---------
 \x7d
(1 row)

edb=# select rawtohex(raw_col) from test ;
 rawtohex
----------
 7d
(1 row)

More Oracle Packages support:

DBMS_SCHEDULER
DBMS_RANDOM
DBMS_CRYPTO
DBMS_MVIEW
DBMS_LOCK.sleep
UTL_HTTP
UTL_ENCODE
UTL_URL

For more details you can refer release notes:
http://get.enterprisedb.com/releasenotes/PgPlus_AS_93Beta1_Release_Notes_20130802.pdf

--Raghav

Monday, July 29, 2013

PostgreSQL Compressed Archive Logs in Windows

Many have blogged and offered glaringly on how to compress archive logs in PostgreSQL, I desire to share one of a short version from my end which's on Windows.

If archive storage is concern, then you can choose compressed archive logging feature in PostgreSQL.

"archive_command(string)" in $PGDATA/postgresql.conf, is like a shell command to execute what's passed in string section to copy the completed source file (WAL file segment in $PGDATA/pg_xlog) to destination(ARCHIVE LOCATION). "string" can be anything like shell script(batch in Windows) itself, OS compression utilites, and a special tool pg_compresslog. In Windows, cmd.exe will execute the command passed in archive_command "string".

Since we are applying on Windows platform, pre-requesites are:

Archive directory should have full postgres user access. ("C:\Program Files\PostgreSQL\9.2\archives" in my case)
Window version GZIP utility. Though there are numerious good windows variant compression utilities, I choosed gzip because its supported both on Linux & Windows.
Gzip.exe should have access to Postgres User and also in PATH. ("C:\Program Files\GnuWin32\bin" in my case).

Assuming all pre-requisites are in place and next step should be editing the $PGDATA/postgresql.conf file and changing the archiving related parameters and restart the cluster:

wal_level=archive
archive_mode=on
archive_command = '"C:\\Program Files\\GnuWin32\\bin\\gzip.exe -1 " < "%p" > "C:\\Program Files\\PostgreSQL\\9.2\\archives\\%f.gz"'

c:\Program Files\PostgreSQL\9.2\bin>pg_ctl.exe -D ..\data start      (You can also start from services.msc)

As per PG documentation, changes has been made and restarted the cluster, anticipating from hereon my archives will be compressed one. Lets look at the logs:

2013-07-26 16:07:22 IST LOG:  archive command failed with exit code 1
2013-07-26 16:07:22 IST DETAIL:  The failed archive command was: """C:\Program Files\GnuWin32\bin\gzip.exe" -1 < "pg_xlog\000000010000000000000002" > "C:\Program Files\PostgreSQL\9.2\archives\000000010000000000000002.gz"
'""C:\Program' is not recognized as an internal or external command,
operable program or batch file

Hmmm Ok, archiver process has failed with an interesting error '""C:\Program' is not recognized as an internal or external command ", which signify cmd.exe utility utilized to parse the string has failed to execute the string we passed. First, lets know how its parsing the string from command line and then fix the archive_command(string).

c:\Program Files\PostgreSQL\9.2\bin>cmd.exe /C "C:/Program Files/PostgreSQL/9.2/bin/psql.exe --version"
'C:/Program' is not recognized as an internal or external command,
operable program or batch file.

"/C" mean, Carry out the command specified by the string and then terminate.

Check the above simple command passed in string to find the "psql version" and the error it has thrown which's similar to the one I have in logs. Lets check what "cmd.exe /?" say about parsing the string.

If /C or /K is specified, then the remainder of the command line after
the switch is processed as a command line, where the following logic is
used to process quote (") characters:

    1.  If all of the following conditions are met, then quote characters
        on the command line are preserved:

        - no /S switch
        - exactly two quote characters
        - no special characters between the two quote characters,
          where special is one of: &<>()@^|
        - there are one or more whitespace characters between the
          two quote characters
        - the string between the two quote characters is the name
          of an executable file.

    2.  Otherwise, old behavior is to see if the first character is
        a quote character and if so, strip the leading character and
        remove the last quote character on the command line, preserving
        any text after the last quote character.

As per cmd.exe help, its clear that we should not have special character (which I have in gzip.exe > , < , -1) between the two quoted characters, so to fix, you should close the entire string again in another set of double quotes. Lets retake our previous command with extra double quoted.


c:\Program Files\PostgreSQL\9.2\bin>cmd.exe /C ""C:/Program Files/PostgreSQL/9.2/bin/psql.exe" --version"

psql (PostgreSQL) 9.2.2

Fine, its fixed now. Let me put all in points where you can bypass errors while setting archive_command:

archive_command(string) must be in single quotes
Each part of the command should be double quoted like command,source path, and destination path.
Command options SHOULD NOT be double quoted
Maintain one extra double quote on entire string within start/end of single quote.

Now, let me correct my archive_command....

archive_command = '""C:\\Program Files\\GnuWin32\\bin\\gzip.exe" -1 < "%p" > "C:\\Program Files\\PostgreSQL\\9.2\\archives\\%f.gz""'

Though RELOAD will effect the changes, but I recommend to take RESTART. Now you should have all .gz archive files:

c:\Program Files\PostgreSQL\9.2\archives>dir
 Volume in drive C has no label.
 Volume Serial Number is 4ABE-037A

 Directory of c:\Program Files\PostgreSQL\9.2\archives

07/27/2013  09:05 PM         3,613,153 00000001000000000000005A.gz
07/27/2013  09:05 PM         3,611,096 00000001000000000000005B.gz
07/27/2013  09:05 PM         3,612,856 00000001000000000000005C.gz
              89 File(s)    367,755,965 bytes
               2 Dir(s)  45,557,706,752 bytes free

Nice....
Similarly, you can furthermore use the GZIP.EXE -d for decompressing in "restore_command" while doing PITR on compressed archives.

restore_command='""C:\\Program Files\\GnuWin32\\bin\\gzip.exe" -d < "C:\\Program Files\\PostgreSQL\\9.2\\archives\\%f.gz" > "%p""'

Comments & Corrections are welcomed.

--Raghav

Monday, July 22, 2013

ERROR: could not access file "$libdir/plpython2" - ERROR: could not access file "$libdir/plpython3"

Above error described on PG mailing as it failes to CREATE LANGUAGE plpython2u/plpython3u on PG9.3Beta.

Error:
postgres=# create language plpython3u;
ERROR: could not access file "$libdir/plpython3": No such file or directory

postgres=# create language plpython2u;
ERROR: could not access file "$libdir/plpython2": No such file or directory

Before doing some study on above errors I read below PG documentation link on how PostgreSQL permits to create langage plpython and how they should be configured.

http://www.postgresql.org/docs/9.3/static/plpython-python23.html

Its clear from above link you need to compile the binary two time if you need both plpython2u & plpython3u. AFAIK, ActivePython 2.7.x for plpython2u and 3.2.x for plpython3u can be configured on PG 9.2.x without any difficulty, but I not ever gave a try on PG 9.3Beta2. So, considered to give a try and analyze the error about why and how it could be repaired, I first, begun source installation by setting base path with ActivePython 3.2. (ActivePython download link http://www.activestate.com/activepython/downloads)

[root@localhost postgresql-9.3beta1]# export PATH=/opt/ActivePython-3.2/bin:$PATH
[root@localhost postgresql-9.3beta1]# ./configure --prefix=/usr/local/pg93b --with-python

Compilation has failed and showed error in Config.log file as:

make[3]: Entering directory `/usr/local/src/postgresql-9.3beta1/src/pl/plpython'
*** Cannot build PL/Python because libpython is not a shared library.
*** You might have to rebuild your Python installation. Refer to
make[3]: Leaving directory `/usr/local/src/postgresql-9.3beta1/src/pl/plpython'

In PG documentation we have clear directions on the above error and why it happen, plpython will be a shared library on most of the platforms, but on some platforms we need to specifically force the compiler as python from shared library. For that either you can proceed to /src/pl/python/Makefile for changes or specifically state "shared_libpython=yes" while compiling.

Source compilation examines for two files when you use --with-python that are "python" & "python-config". Although I have ActivePython-3.2 in my base path, still compiler fails to find them "python" & "python-conifg"

[root@localhost postgresql-9.3beta1]# which python
/usr/local/bin/python
[root@localhost postgresql-9.3beta1]# which python-config
/usr/local/bin/python-config

Its because, in ActivePython 3.2 file names will be, "python" as "python3" and "python-config" as "python3-config" therefore compiler pointed to old one instead to new. At this point, Asif Naeem (Thank you for your insight) from our core Dev. Team notfied me to mock existing ActivePython-3.2 files as python & python-config. Its nearly like a hack from him, so I duplicated those files as:

cd /opt/ActivePython-3.2/bin
cp python3-config python-config 
cp python3 python

Ok, now I can see that in my base path.

[root@localhost postgresql-9.3beta1]# export PATH=/opt/ActivePython-3.2/bin:$PATH
[root@localhost postgresql-9.3beta1]# which python 
/opt/ActivePython-3.2/bin/python
[root@localhost postgresql-9.3beta1]# which python-config
/opt/ActivePython-3.2/bin/python-config

I recompiled PG source using --with-python after the alterations and also forcing the compiler to choose SHARED_LIBPYTHON using "shared_libpython".

./configure --prefix=/usr/local/pg93b3 --with-python
make shared_libpython=yes
make shared_libpython=yes install

These steps will effectively compile PG9.3Beta with ActivePython-3.2 libraries. Now lets create language plpython3u:

-bash-4.1$ psql -p 4444
psql (9.3beta1)
Type "help" for help.

postgres=# create language plpython3u;
The connection to the server was lost. Attempting reset: Failed.
!>
!>

Oops, this is strange, why it has crashed now.. In this kind of situation $PGDATA/pg_log are your friends to insight about the issue. Here's the database server log information about crash:

2013-07-13 22:08:37 IST-31208-postgres-postgres :LOG: statement: create language plpython3u;
Could not find platform independent libraries 
Could not find platform dependent libraries 
Consider setting $PYTHONHOME to [:]
Fatal Python error: Py_Initialize: Unable to get the locale encoding

Ok. I missed to set python paths, this is very important when you are working with python or perl languages on your database.
I have set PYTHONHOME, PYTHONPATH & LD_LIBRARY_PATH before starting the cluster.

export PYTHONHOME=/opt/ActivePython-3.2/
export PYTHONPATH=/opt/ActivePython-3.2/bin:$PATH
export LD_LIBRARY_PATH=/opt/ActivePython-3.2/lib:$LD_LIBRARY_PATH

/usr/local/pg93b3/bin/pg_ctl -D /usr/local/pg93b3/data/ start

-bash-4.1$ psql -p 4444
psql (9.3beta1)

Type "help" for help.

postgres=# create language plpython3u;
CREATE LANGUAGE

Nice...It has created plpython3u with ActivePython-3.2.

If you want plpython2u also on the same installation. Don't tweak as we did for ActivePython-3.2, just have a copy of ActivePython-2.7 and set it in the base path and recompile the source.

export PATH=/opt/ActivePython-2.7/bin:$PATH
./configure --prefix=/usr/local/pg93b2 --with-python
make shared_libpython=yes
make shared_libpython=yes install


export PYTHONHOME=/opt/ActivePython-2.7/
export PYTHONPATH=/opt/ActivePython-2.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/pg93b2/lib
export LD_LIBRARY_PATH=/opt/ActivePython-2.7/lib:$LD_LIBRARY_PATH

-bash-4.1$ ./psql -p 4444
psql (9.3beta2)
Type "help" for help.

postgres=#
postgres=# create language plpython2u;
CREATE LANGUAGE

Comments & Corrections are most welcomed.

--Raghav

Tuesday, July 9, 2013

Improvements in Slony-I 2.2.0beta

Thanks to Slony-I team for releasing Slony-I 2.2.0Beta with many enhancements on key areas. Few are mentioned here:

Event capturing protocol changed for performance (sl_log_1 / sl_log_2):

Earlier release, any DML(INSERT/UPDATE/DELETE) event data must store as SQL statement in two tables(sl_log_1/sl_log_2). Now, the protocol has completely modified to COPY format, as a result of this there'll be sensible performance improvement like lower processing overhead, lower memory consumption, less query processing work on subscriber database in slon process. Check below, on a simple INSERT statement how the event has been captured in previous and latest version.

Previous version 2.1.2:

==> slony1_log_2_00000000000000000005.sql <==

-- start of Slony-I data
------------------------------------------------------------------
insert into "public"."stest" ("id") values ('103');

Latest version 2.2.0Beta:

==> slony1_log_2_00000000000000000006.sql <==

------------------------------------------------------------------
COPY "_rep220"."sl_log_archive" ( log_origin, log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdnc
ols,log_cmdargs) FROM STDIN;
1       565688  1       2       public  stest   I       0       {id,1000}
\.

DDL handling: In old version, DDL's are treated as a unique slony event and stored in sl_event table with flag "DDL_SCRIPT" and there clearly no indication whether applied or not. Now, unique slony events are shifted from "sl_event" to "sl_log_script" and a flag to share with if it has successfully applied or not. Sl_log_script.log_cmdtype column indicates, "S"(upper) if EXECUTE SCRIPT issues and registered as event to utilize and "s" (lower) indicates that script execution completed. Check below, previous and latest version:

Previous version 2.1.2:

postgres=# select ev_origin,ev_type,ev_data1,ev_data2 from _newbuild.sl_event where ev_type ilike 'ddl_script';
 ev_origin |  ev_type   | ev_data1 |                ev_data2
-----------+------------+----------+-----------------------------------------
         1 | DDL_SCRIPT | 1        | begin;                                 +
           |            |          | alter table stest add column name text;+
           |            |          | end;
           
Latest version 2.2.0Beta:
           
postgres=# select * from _rep220.sl_log_script ;
 log_origin | log_txid | log_actionseq | log_cmdtype |                 log_cmdargs
------------+----------+---------------+-------------+---------------------------------------------
          1 |   681589 |             1 | S           | {"alter table stest add column name text;"}
          1 |   681589 |             2 | S           | {"                                         +
            |          |               |             | "}
          1 |   681589 |             3 | s           | {}

New 3 catalogs added to Slony-I schema:

sl_apply_sync : This table gives clear picture on how many events like DML's, DDL's applied so far including the event applying frequence.

postgres=# select * from _rep220.sl_apply_stats ;
-[ RECORD 1 ]--------+---------------------------------
as_origin            | 1
as_num_insert        | 21
as_num_update        | 0
as_num_delete        | 0
as_num_truncate      | 0
as_num_script        | 1
as_num_total         | 21
as_duration          | 00:00:11.84
as_apply_first       | 2013-06-16 22:43:18.866365+05:30
as_apply_last        | 2013-06-17 03:18:13.324941+05:30
as_cache_prepare     | 2
as_cache_hit         | 19
as_cache_evict       | 0
as_cache_prepare_max | 1

sl_log_script : Its another catalog to catch an unique event like DDL's issued with EXECUTE SCRIPT. Earlier release, these events were captured in sl_log_1/sl_log_2 without any status of execution of those events. This new catalog can facilitate user to grasp concerning the DDL event details.

sl_failover_targets: The view sl_failover_targets displays the valid failover targets for every origin node.

postgres=# select * from _rep220.sl_failover_targets ;
 set_id | set_origin | backup_id
--------+------------+-----------
      1 |          1 |         2
(1 row)

Many more changes in new release, you can refer to the release notes:

http://git.postgresql.org/gitweb/?p=slony1-engine.git;a=blob_plain;f=RELEASE;h=be03be66d8f39ee148b50dadf71ddbe20a1e5ad8;hb=e3e785c93d14b009abc8424fa7e53e8b75c0f098

Slony-I 2.2.0 beta 4 can be downloaded from:
http://www.slony.info/downloads/2.2/source/slony1-2.2.0.b4.tar.bz2
http://www.slony.info/downloads/2.2/source/slony1-2.2.0.b4-docs.tar.bz2

--Raghav

Friday, May 31, 2013

GET DIAGNOSTICS with COPY statement in Pl/pgsql function

Now GET DIAGNOSTIC will return the number of rows processed by COPY statement in a Pl/Pgsql function.
COPY statement in Pl/Pgsql Function:

CREATE OR REPLACE FUNCTION public.copy_data(fname text) RETURNS integer
AS 
$$
declare
    copy_qry text;
    cnt integer;
Begin
copy_qry := 'copy t from'||quote_literal(fname)||' with CSV HEADER;';
Execute copy_qry;
GET DIAGNOSTICS cnt = ROW_COUNT;
return cnt;
end;
$$ Language plpgsql;

Previous release:

-bash-4.1$ psql
psql.bin (9.2.3)
Type "help" for help.

postgres=# select copy_data('/usr/local/pg93beta/t_load.csv');
 copy_data
-----------
         0
(1 row)

In PostgreSQL 9.3

-bash-4.1$ ./psql -p 5555
psql (9.3beta1)
Type "help" for help.

postgres=# select copy_data('/usr/local/pg93beta/t_load.csv');
 copy_data
-----------
         3
(1 row)
or
-bash-4.1$ ./psql -p 5555 -c "select copy_data('/usr/local/pg93beta/t_load.csv');"
 copy_data
-----------
         3
(1 row)

Thanks to author, it looks simple but very effective when working with loading the data using scripts and want to know the number of rows processed by the COPY statement.

--Raghav

Thursday, May 16, 2013

Disk page checksums to detect filesystem failures in PostgreSQL 9.3Beta 1

New feature introduced in PostgreSQL 9.3Beta 1 i.e. "Disk page checksums". Thanks to author Ants Aasama and Simon Riggs, Jeff Davis,Greg Smith.

In earlier releases, if there's any data corruption block on disk it was silently ignored until any pointer arrives on it or some wrong results shown by the queries. Now, data corruption detected beforehand by triggering WARNING message instead of silently using or waiting for hit on the corrupted block.

Disk page checksums feature implementation is unique, its not plug-able like EXTENSIONs its selectable feature. That's if you need your database should be under monitory umbrella of data corruption then it should be enabled at the time of cluster initialization not on existing or running cluster. Below's the example how it works.

Initialize the cluster with checksums:

initdb [OPTION]... [DATADIR]
   ........
   -k, --data-checksums      use data page checksums
   
initdb -D data_directory -k

Now, any data corruption found will be notified as below:

postgres=# select * from corruption_test;
WARNING:  page verification failed, calculated checksum 63023 but expected 48009
ERROR:  invalid page in block 0 of relation base/12896/16409

In earlier version,just an error message.

postgres=# select * from corruption_test where id=1;
ERROR:  invalid page header in block 0 of relation base/12870/18192

That's cool right....

So, how do you know whether disk page checksums enabled on the cluster or not ?
As of now, there's no pg_catalog to store such information or any files created in the $PGDATA directory, only pg_control file will hold that information. Using pg_controldata utility you can know about it.

$ export PGDATA=/usr/local/pg93beta/data
$ pg_controldata 
....
....
....
Data page checksum version:           1

Some points on Disk page checksums:
1. Temp tables are excluded from checksums checks.
2. There's performance overhead if checksums enabled as per the PG documentation.
3. Once enabled checksums on a cluster cannot be rolled back.

Thanks
Raghav

Tuesday, May 14, 2013

What if, import file (txt/csv) having "BOM-ed UTF-8" encoding?

So what is "UTF-8 BOM" mean ? its byte order mark for UTF-8, some bytes (0xEF,0xBB,0xBF) are added at the start of the file to indicate that the file having unicode characters in it. BOM Characters "ï»¿9".

As per Unicode documentation, the presence of BOM in file are useless, because it causes problems with non-BOM-aware software's to identify or parse the leading characters having at the start. Same has been quoted at the bottom of the Wikipedia page:

Related errors in PostgreSQL:
ERROR: invalid input syntax for integer: "ï»¿9" (in psql-client)
SQL state: 22P02 (in PgAdmin-III)

Test case & fix on Windows:
Sample file "state_data.txt" created in NOTEPAD with unicode characters in it:

9,Karnataka,कर्नाटक
10,Kerala,केरळा

Table to import data:

create table states(state_code int, state_name char(30), state_in_hindi text);

Error:

postgres=# copy test from 'c:/Pgfile/state_data.txt' with delimiter ',' CSV;
ERROR:  invalid input syntax for integer: "ï»¿9"
CONTEXT:  COPY test, line 1, column state_code: "ï»¿9"

To fix, I have used a tool "bomremover.exe" to remove leading characters from a file as its on windows, if its on linux, then there are many tips & tricks available on net to wipe BOM from a utf-8 format file.

Tool Download link and usage:
http://www.mannaz.at/codebase/utf-byte-order-mark-bom-remover/

Eg:-
C:\Pgfile>bomremover.exe . *
Added '.\state_data.txt' to processing list.
Press enter to process all files in the list. (1 files in total)

Processing file '.\state_data.txt'...
Finished. Press Enter to Exit

After running bomremover.exe on file, re-run COPY command which will succeed to import data.

 state_code | state_name | State_name_in_hindi
------------+------------+---------------------
          9 | Karnataka  | αñòαñ░αÑ<8d>αñ¿αñ╛αñƒαñò
         10 | Kerala     | αñòαÑçαñ░αñ│αñ╛
(2 rows)

Some of the editors, avoids default saving text with UTF8-BOM:
- Windows - Notepad++ (In Notepade default BOM enabled)
- Linux - VI
- Mac - TextEdit

--Raghav

Saturday, February 16, 2013

Configuring .pgpass (Password File) on Windows/Linux/Solaris

PostgreSQL users access to the database goes through libpq library. It also has a feature of allowing automation for users without prompting password by keeping their details in .pgpass file. Lets see...

Points to be considered when setting .pgpass file.
1. It should be created in postgres user home directory.
2. It should disallow any access-level to world or group.
3. Location can be controlled with PGPASSFILE environment variable.

Syntax to follow in .pgpass file:
hostname:port:database:username:password
Eg:
localhost:5432:demo:dbuser:demo@123

On Windows Platform:
1. Open the command prompt as Postgres user. So, any files/directories created will be owned by "postgres" user, no need to give explicit permissions.

runas /user:postgres "cmd"
Note: It prompts for OS postgres user password. It can also be set 
      from command prompt opened as administrator. Right-click on 
      "Command Prompt" option and choose "Runas Administrator".
       Command to set pasword: "net user user_name new_password".

2. Switch to user home directory and create "postgresql" directory and "pgpass.conf" file as below

cd %appdata%
mkdir postgresql
cd postgresql
notepad pgpass.conf  //You can also check the file ownership from command line with "calcs" command.
Eg:
C:\Users\postgres\AppData\Roaming\postgresql>cacls pgpass.conf
C:\Users\postgres\AppData\Roaming\postgresql\pgpass.conf NT AUTHORITY\SYSTEM:(ID)F
                                                         BUILTIN\Administrators:(ID)F
                                                         WIN-G3BNU2NQAI1\postgres:(ID)F

Now connection will be via pgpass.conf file.

On Solaris/Linux platform:
Procedure is same on both of the platforms.

su - postgres     //this will land in the home directory set for postgres user
vi .pgpass.conf    //enter all users entries 
chmod 0600 .pgpass.conf    // change the ownership to 0600 to avoid error like 
-bash-3.2$ psql
WARNING: password file "/opt/PostgreSQL/9.2//.pgpass" has group or world access; permissions should be u=rw (0600) or less

Note: If the environment variable PGPASSWORD is set, then the ~/.pgpass file is not read.

Regards
Raghav