Monday, August 4, 2025

PostgreSQL DBA Interview Questions and Answers

 

PostgreSQL DBA interview questions with detailed answers, covering basics, administration, backup and recovery, performance tuning, replication, security, and troubleshooting.


🔹 Basic PostgreSQL Concepts

1. What is PostgreSQL? How is it different from MySQL or Oracle?

Answer:
PostgreSQL is an advanced open-source object-relational database system with strong support for SQL standards, JSON, full ACID compliance, and extensibility (e.g., custom data types, functions).
Differences:

  • vs MySQL: PostgreSQL is more standards-compliant and better at handling complex queries, indexing, and concurrency.

  • vs Oracle: Oracle is proprietary and offers advanced features, but PostgreSQL provides similar capabilities in open-source form.


2. How do you check the PostgreSQL version?

Answer:

bash

psql --version

Or inside psql:

sql

SELECT version();

3. What are schemas in PostgreSQL?

Answer:
Schemas are namespaces within a database. They allow you to organize tables, views, functions, etc., and prevent name conflicts.
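
For example (schema and table names here are hypothetical):

sql

CREATE SCHEMA sales;
CREATE TABLE sales.orders (id serial PRIMARY KEY, amount numeric);

-- The same table name can live in another schema without conflict:
CREATE SCHEMA archive;
CREATE TABLE archive.orders (id serial PRIMARY KEY, amount numeric);

-- Control which schemas unqualified names resolve against:
SET search_path TO sales, public;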


4. What is the role of pg_hba.conf and postgresql.conf?

Answer:

  • pg_hba.conf: Defines client authentication rules.

  • postgresql.conf: Controls database server configuration (memory, logging, replication, etc.).
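
For example, a typical pg_hba.conf rule (database, user, and network here are hypothetical):

conf

# TYPE  DATABASE  USER    ADDRESS       METHOD
host    mydb      myuser  10.0.0.0/24   scram-sha-256

After editing either file, apply changes with SELECT pg_reload_conf(); (some postgresql.conf settings, such as shared_buffers, require a full restart).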


5. What data types are supported in PostgreSQL?

Answer:
Standard types: INTEGER, TEXT, BOOLEAN, DATE, NUMERIC
Advanced: JSON/JSONB, ARRAY, HSTORE, UUID, INET, TSVECTOR


6. What is a sequence in PostgreSQL?

Answer:
A sequence is a special kind of object used to generate unique numeric identifiers, often used for auto-incrementing primary keys.
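
A minimal sketch of explicit and implicit sequence use:

sql

CREATE SEQUENCE order_id_seq;
SELECT nextval('order_id_seq');  -- 1, then 2, and so on

-- SERIAL creates and attaches a sequence behind the scenes:
CREATE TABLE orders (id serial PRIMARY KEY, note text);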


🔹 SQL & Administration

7. How do you create and manage users and roles in PostgreSQL?

Answer:

sql

CREATE USER myuser WITH PASSWORD 'pass';
GRANT CONNECT ON DATABASE mydb TO myuser;
ALTER USER myuser WITH SUPERUSER; -- if needed

8. How do you check current connections?

Answer:

sql

SELECT * FROM pg_stat_activity;

9. How do you terminate an active session?

Answer:

sql

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb'
  AND pid <> pg_backend_pid(); -- don't terminate your own session

10. How do you monitor query performance?

Answer:

  • Enable pg_stat_statements extension.

  • Use:

sql

SELECT * FROM pg_stat_statements
ORDER BY total_exec_time DESC -- column is named total_time before PostgreSQL 13
LIMIT 10;

11. How do you list all tables in PostgreSQL?

Answer:

  • Inside psql: \dt

  • Or SQL:

sql

SELECT table_name FROM information_schema.tables WHERE table_schema='public';

12. Difference between a table and a materialized view?

Answer:

  • Table: Stores actual data and is updated in real time.

  • Materialized View: Stores the result of a query physically and needs to be refreshed manually.
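
A minimal sketch (the orders table is hypothetical):

sql

CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date, sum(amount) AS total
FROM orders
GROUP BY order_date;

-- The stored result goes stale until refreshed:
REFRESH MATERIALIZED VIEW daily_sales;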


🔹 Backup and Recovery

13. What are the different backup types in PostgreSQL?

Answer:

  • Logical: pg_dump, pg_dumpall

  • Physical: pg_basebackup

  • PITR: Continuous WAL archiving


14. How to perform full logical backup using pg_dump?

Answer:

bash

pg_dump -U postgres -F c -b -v -f mydb.backup mydb

15. What is pg_basebackup?

Answer:
It’s a tool that takes a physical backup of the entire data directory, commonly used to seed streaming replication standbys.


16. How to restore a .sql or .dump file?

Answer:

bash

psql -U postgres -d mydb -f mydb.sql        # for .sql files
pg_restore -U postgres -d mydb mydb.backup  # for .dump files

17. What is PITR and how is it done?

Answer:
Point-In-Time Recovery lets you restore a DB to a specific timestamp using a base backup + WAL files.
Steps:

  • Take base backup.

  • Archive WAL logs.

  • Restore backup.

  • Set recovery_target_time (in recovery.conf before PostgreSQL 12; in postgresql.conf plus an empty recovery.signal file from 12 onward), as sketched below.
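
A sketch of the recovery settings (the archive path and timestamp are placeholders):

conf

restore_command = 'cp /archive/%f %p'
recovery_target_time = '2025-08-04 12:00:00'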


🔹 Performance Tuning

18. How to identify slow-running queries?

Answer:
Enable logging:

conf

log_min_duration_statement = 1000 # Logs queries longer than 1 second

Or use pg_stat_statements.


19. What is EXPLAIN and EXPLAIN ANALYZE?

Answer:

  • EXPLAIN: Shows the query plan.

  • EXPLAIN ANALYZE: Executes the query and shows actual time taken at each step.
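
For example (the orders table is hypothetical):

sql

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

Note that EXPLAIN ANALYZE actually executes the statement, so wrap data-modifying queries in a transaction you roll back.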


20. Key performance parameters:

Answer:

  • work_mem: Memory per query operation (e.g., sort)

  • shared_buffers: Memory for caching

  • effective_cache_size: Estimation of available OS cache

  • maintenance_work_mem: For vacuum and create index operations
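
Illustrative postgresql.conf values for a server with 16 GB RAM (rule-of-thumb starting points, not a recipe):

conf

shared_buffers = 4GB           # often ~25% of RAM
effective_cache_size = 12GB    # a planner hint, not an allocation
work_mem = 64MB                # per sort/hash operation, per query node
maintenance_work_mem = 512MB   # VACUUM, CREATE INDEX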


21. What is the difference between VACUUM, VACUUM FULL, and ANALYZE?

Answer:

  • VACUUM: Cleans up dead tuples.

  • VACUUM FULL: Rewrites and compacts table (locks table).

  • ANALYZE: Updates planner stats.
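
For example (the table name is hypothetical):

sql

VACUUM (VERBOSE, ANALYZE) mytable;  -- clean up dead tuples and refresh planner stats in one pass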


22. What causes table bloat and how to fix it?

Answer:
Bloat is caused by dead tuples left behind by UPDATE/DELETE.
Fix: Run VACUUM, VACUUM FULL, or pg_repack.
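
To spot bloat candidates, pg_stat_user_tables tracks dead-tuple counts:

sql

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;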


🔹 Replication & High Availability

23. What is streaming replication and how to set it up?

Answer:
The primary continuously streams WAL changes to the standby.
Setup involves:

  • Enable WAL on primary (wal_level = replica)

  • Use pg_basebackup to clone

  • Configure primary_conninfo on standby
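
A sketch of the standby's connection setting (host and credentials are placeholders); on PostgreSQL 12+ it goes in postgresql.conf alongside an empty standby.signal file, on older versions in recovery.conf:

conf

primary_conninfo = 'host=primary.example.com port=5432 user=replicator password=secret'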


24. Difference between synchronous and asynchronous replication?

Answer:

  • Synchronous: Primary waits for standby to confirm.

  • Asynchronous: Primary does not wait; better performance, less safe.
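
Synchronous mode is enabled on the primary by naming the standbys that must confirm (the standby name is a placeholder):

conf

synchronous_standby_names = 'standby1'
synchronous_commit = on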


25. What is a replication slot?

Answer:
A mechanism to ensure the primary retains WAL files until a subscriber or replica has received them.
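
Creating and inspecting a physical slot (the slot name is arbitrary):

sql

SELECT pg_create_physical_replication_slot('standby1_slot');
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;

An abandoned slot keeps WAL from being recycled, so drop unused slots with pg_drop_replication_slot() before the disk fills.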


26. What happens when the primary goes down?

Answer:
Writes stop until a standby takes over (conversely, under synchronous replication a lost standby can block commits on the primary). If configured properly, the standby can be promoted to primary using:

bash

pg_ctl promote -D /path/to/data   # or, from SQL on PostgreSQL 12+: SELECT pg_promote();

27. How to monitor replication lag?

Answer:

sql

-- run this on the standby
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
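
On the primary, pg_stat_replication reports per-standby lag directly (the *_lag columns exist from PostgreSQL 10 onward):

sql

SELECT client_addr, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;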

🔹 Security

28. How is authentication handled in PostgreSQL?

Answer:
Defined in pg_hba.conf using methods like scram-sha-256 (the default since PostgreSQL 14), md5, trust, peer, or cert.


29. How to restrict user access to certain tables?

Answer:

sql

REVOKE ALL ON table_name FROM public;
GRANT SELECT ON table_name TO myuser;

30. What is row-level security and how is it implemented?

Answer:
Allows filtering rows per user/condition:

sql

ALTER TABLE tablename ENABLE ROW LEVEL SECURITY;
CREATE POLICY policy_name ON tablename
  USING (user_id = current_user);

🔹 Troubleshooting & Real-World Scenarios

31. PostgreSQL is running but not accepting connections — what do you check?

Answer:

  • pg_hba.conf rules

  • Port in postgresql.conf

  • listen_addresses

  • OS firewall or SELinux

  • Max connections


32. What if autovacuum is not working?

Answer:
Check:

  • If autovacuum is enabled in postgresql.conf

  • Table thresholds exceeded

  • Long-running transactions blocking cleanup
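
Quick checks for the points above:

sql

SHOW autovacuum;  -- should be 'on'

SELECT relname, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST;

-- Old transactions that may be holding back cleanup:
SELECT pid, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;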


33. How do you find long-running queries?

Answer:

sql

SELECT pid, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active';

34. User reports missing table – what do you do?

Answer:

  • Check schema and ownership.

  • Check logs for DROP statements.

  • Restore from backup if dropped.


35. How to find blocking sessions and resolve deadlocks?

Answer:

sql

-- pg_blocking_pids() (PostgreSQL 9.6+) returns the PIDs blocking a given session
SELECT blocked.pid   AS blocked_pid,
       blocking.pid  AS blocking_pid,
       blocked.query AS blocked_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));

To terminate:

sql

SELECT pg_terminate_backend(12345); -- replace 12345 with the blocking session's PID

Tuesday, November 27, 2018

How to install Hadoop Hive on CentOS7




What is Hive

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy.
Hive was initially developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by many companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
  • A relational database
  • A design for OnLine Transaction Processing (OLTP)
  • A language for real-time queries and row-level updates
Features of Hive
  • It stores schema in a database and processed data in HDFS.
  • It is designed for OLAP.
  • It provides an SQL-like language for querying called HiveQL or HQL.
  • It is familiar, fast, scalable, and extensible.
The following component diagram depicts the architecture of Hive:
[Figure: Hive Architecture]

Requirements: Install JAVA and Hadoop

Apache Hive 2.1 requires Java 7 or a later version. Hadoop must also be installed before installing Apache Hive. In the example below, I'm using Java 8:
$ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)



Download Hive package

After configuring Hadoop successfully on your Linux system, let's start the Hive setup. First, download the Hive binary package and extract the archive using the following commands:
$ cd /opt/hadoop
$ wget http://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
$ tar xzf apache-hive-2.1.0-bin.tar.gz
$ mv apache-hive-2.1.0-bin hive
$ chown hadoop -R hive

Setup Environment Variables

Besides the Hadoop environment variables, there are two more you need to add to ~/.bashrc:
# su - hadoop
export HIVE_HOME=/opt/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH

Preparation before starting Hive

Create /tmp and /user/hive/warehouse in HDFS and make them group-writable (chmod g+w):

$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse

Hive metastore schema initialization

On Hive 2.1.0, the metastore schema must be initialized before Hive starts. In the example below, I used the Derby database type:

$ schematool -dbType derby -initSchemaTo 2.0.0 -verbose

which: no hbase in (/opt/hadoop/hive/bin:/usr/lib64/qt-3.3/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/local/bin:/usr/local/bin:/opt/hadoop/hadoop-2.7.3/sbin:/opt/hadoop/hadoop-2.7.3/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/opt/hadoop/hadoop-2.7.3/sbin:/opt/hadoop/hadoop-2.7.3/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:     jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :     org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:     APP
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.derby.sql
Connecting to jdbc:derby:;databaseName=metastore_db;create=true
Connected to: Apache Derby (version 10.10.2.0 - (1582446))
Driver: Apache Derby Embedded JDBC Driver (version 10.10.2.0 - (1582446))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:derby:> !autocommit on
Autocommit status: true
0: jdbc:derby:> CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii'
No rows affected (0.374 seconds)


Note:
You may have noticed that I initialized the schema to version 2.0.0 instead of 2.1.0. Why?
This is a long-standing Hive problem: a new release quite often lacks schema-initialization support for its own latest version. To avoid it, just initialize to the previous release's schema version.
Here is the example error output when using version 2.1.0:

$ schematool -dbType derby -initSchemaTo 2.1.0 -verbose

which: no hbase in (/opt/hadoop/hive/bin:/usr/lib64/qt-3.3/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/local/bin:/usr/local/bin:/opt/hadoop/hadoop-2.7.3/sbin:/opt/hadoop/hadoop-2.7.3/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/opt/hadoop/hadoop-2.7.3/sbin:/opt/hadoop/hadoop-2.7.3/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:     jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :     org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:     APP
Starting metastore schema initialization to 2.1.0
org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified for initialization: 2.1.0
org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified for initialization: 2.1.0
    at org.apache.hadoop.hive.metastore.MetaStoreSchemaInfo.generateInitFileName(MetaStoreSchemaInfo.java:128)
    at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:282)
    at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:508)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
*** schemaTool failed ***

Tips:
In older Hive versions you didn't have to initialize the metastore schema, so if you accidentally ran hive before schema initialization, you may see an error like the one below:
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
Here is the fix for it:
rm -f $HIVE_HOME/scripts/metastore/upgrade/derby/hive-schema-2.1.0.derby.sql
then re-initialize the Hive metastore schema.

Start Hive

Let's start Hive using the following command.
$ hive
which: no hbase in (/opt/hadoop/hive/bin:/usr/lib64/qt-3.3/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/local/bin:/usr/local/bin:/opt/hadoop/hadoop-2.7.3/sbin:/opt/hadoop/hadoop-2.7.3/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/opt/hadoop/hadoop-2.7.3/sbin:/opt/hadoop/hadoop-2.7.3/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hadoop/hive/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>

Create a test table

At this stage you have successfully installed Hive. Let's create a sample table using the following command.
hive>  CREATE TABLE test (id int, name string, age int);
OK
Time taken: 1.223 seconds
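
(Optional) Verify the column definitions; DESCRIBE should list the three columns from the CREATE TABLE statement:
hive> DESCRIBE test;
OK
id                      int
name                    string
age                     int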
Show the created tables with the command below.
hive> SHOW TABLES;
OK
test
Time taken: 0.431 seconds, Fetched: 1 row(s)
Drop the table using the command below.
hive> DROP TABLE test;
OK
Time taken: 1.393 seconds
Now you have Hadoop Hive installed and configured on your Hadoop cluster!

http://fibrevillage.com/storage/649-how-to-install-hadoop-hive-on-rhel7-centos7


Wednesday, May 2, 2018

High Level Installation steps for OBIEE 12c

High Level Installation steps for OBIEE


Steps

0 Install Oracle database -12.2.1.0.0
1 Install the JDK -1.8.0_65
2 Fusion Middleware Infrastructure -12.2.1.0.0
3 Install OBIEE-12.2.1.0.0
4 Create the OBIEE(RCU)
5 Configuration of OBIEE -12.2.1.0.0



High Level Installation Steps for OBIA 11g

High Level Installation steps for  OBIA

Step OBIA-Version -11.1.1.10.2

Steps 
1 Install the JDK-1.7.0_80
2 Weblogic -10.3.6
3 Install OBIEE -11.1.1.9
4 Create the OBIEE(RCU)
5 Configuration of OBIEE -11.1.1.9
6 Install ODI -11.1.1.9
7 Create the OBIA(RCU)
8 Install OBIA software-11.1.1.10.2
9 Updating FSM, ATGLite
10 Configuration of OBIA 11g


Start and Stop OBIEE Services in OBIEE 12c


Start and Stop OBIEE Services in OBIEE 12c

Start and Stop OBIEE Server

[oracle@ServerName]$ cd /d01/Middleware/Oracle_home/user_projects/domains/bi/bitools/bin/
[oracle@ServerName bin]$ ./status.sh

  • AdminServer (BI admin server)
  • bi_server1 (BI managed server)
  • obips1 (managed BI presentation server)
  • obijh1 (managed BI Java host server)
  • obiccs1 (managed BI clustering server)
  • obisch1 (managed BI scheduler server)
  • obis1 (managed BI NQ server)

[oracle@ServerName bin]$ ./start.sh
[oracle@ServerName bin]$ ./stop.sh

Stop a particular server
[oracle@ServerName bin]$ ./stop.sh -i obips1
[oracle@ServerName bin]$ ./stop.sh -i obijh1
[oracle@ServerName bin]$ ./stop.sh -i obiccs1
[oracle@ServerName bin]$ ./stop.sh -i obis1
[oracle@ServerName bin]$ ./stop.sh -i obisch1

Start a particular server
[oracle@ServerName bin]$ ./start.sh -i obips1
[oracle@ServerName bin]$ ./start.sh -i obijh1
[oracle@ServerName bin]$ ./start.sh -i obiccs1
[oracle@ServerName bin]$ ./start.sh -i obis1
[oracle@ServerName bin]$ ./start.sh -i obisch1
