During one of my projects, I had to install and maintain Sun Grid Engine (SGE). The big issue was that ever since Oracle took over Sun and released Oracle Grid Engine, the posts on Sun's web site that helped with installation and troubleshooting have been removed. During installation I ran into many issues, and I sometimes spent days Googling for a single simple setting.
I wish to give credit to everyone who has blogged about SGE. This post collects the search results that helped me install SGE successfully.
I am posting the installation and configuration of SGE, and how to place jobs from Python via DRMAA using MySQL as the backend DB.
---------------------------------
setting up the hosts file
---------------------------------
open a terminal and type:
sudo gedit /etc/hosts
add:
192.168.xxx.xxx <your system name>
192.168.xxx.xxx <host system name> # slave system IP address and name
127.0.0.1 localhost
save and close
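Most of my early SGE errors came from host names that did not resolve, so it can help to check the hosts file before going further. This is only a rough sketch in Python 3 (the post itself used Python 2.7); the function names `parse_hosts` and `check_hosts` are made up for this example and are not part of SGE.

```python
def parse_hosts(text):
    """Return {hostname: ip} from /etc/hosts-style text, ignoring comments."""
    mapping = {}
    for line in text.splitlines():
        # Drop anything after '#' and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first address seen for a name.
            mapping.setdefault(name, ip)
    return mapping


def check_hosts(expected_names, hosts_text):
    """Return the expected host names that have no entry in the hosts text."""
    known = parse_hosts(hosts_text)
    return [name for name in expected_names if name not in known]
```

To use it, read /etc/hosts into a string and call check_hosts(["nami-ubuntu", "slave-ubuntu"], text); an empty list means every name has an entry.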
-------------------------------
setting up the sources file
-------------------------------
open /etc/apt/sources.list
add:
deb http://archive.canonical.com/ubuntu maverick partner
deb-src http://archive.canonical.com/ubuntu maverick partner
uncomment
deb http://archive.canonical.com/ubuntu natty partner
save and close
----------------------------------------------------
installing the prerequisites and grid engine
----------------------------------------------------
in the cmd prompt:
sudo apt-get install default-jre
sudo apt-get install python-drmaa
sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec
when the installer asks for the SGE master, enter nami-ubuntu #or the name of the qmaster's system
for a client system:
sudo apt-get install gridengine-client gridengine-exec
----------------------------------------------------
setting up the environment variables
----------------------------------------------------
sudo gedit /etc/profile /etc/bash.bashrc
add the following lines to both /etc/profile and /etc/bash.bashrc:
export SGE_ROOT=/var/lib/gridengine #this is the path on our machines
export SGE_CELL=default
#for the below entry add the qmaster's username@systemname
export SGE_O_HOST=namitha@nami-ubuntu
export DRMAA_LIBRARY_PATH=/usr/lib/pymodules/python2.7/libdrmaa.so.1.0
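A missing or misspelled variable here only shows up later as a confusing DRMAA error, so a quick check saves time. The snippet below is a small Python 3 sketch; the helper names are hypothetical, and I assume SGE_ROOT, SGE_CELL and DRMAA_LIBRARY_PATH are the variables that matter, since those are the ones set above.

```python
import os

# Variables exported in /etc/profile in this post.
REQUIRED = ("SGE_ROOT", "SGE_CELL", "DRMAA_LIBRARY_PATH")


def missing_sge_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


def drmaa_library_exists(environ=None):
    """Return True if DRMAA_LIBRARY_PATH points at an existing file."""
    env = os.environ if environ is None else environ
    path = env.get("DRMAA_LIBRARY_PATH", "")
    return bool(path) and os.path.isfile(path)
```

Run missing_sge_vars() in a fresh login shell after sourcing /etc/profile; it should return an empty list.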
---------------------------------
clearing GE error state if any
---------------------------------
in the cmd prompt, type qmon. If qmon fails to start with an error about missing fonts like '-adobe-helvetica-…', then:
sudo apt-get install xfs xfstt
sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi
----------------------------------
installing GE at the slave system
----------------------------------
ssh -x assetmgr@192.168.150.198 #enter the password when prompted
open /etc/hosts
add:
192.168.xxx.xxx <your system name>
192.168.xxx.xxx <GE Master system name>
127.0.0.1 localhost
save and close
open /etc/profile and /etc/bash.bashrc and add the following lines
(the SGE_O_HOST environment variable must be added on the master and on all slave hosts)
export SGE_ROOT=/var/lib/gridengine #this is the path on our machines
export SGE_CELL=default
export SGE_SLAVE_MOUNTED=/mnt/slave-path/
export SGE_O_HOST=namitha@nami-ubuntu
export DRMAA_LIBRARY_PATH=/usr/lib/pymodules/python2.7/libdrmaa.so.1.0
source the script: source /etc/profile
copy sge_global_config_files.py to /usr/lib/python2.7/ and make the necessary changes
to leave the ssh session type: exit
----------------------------------------------------
setting up Qmon settings
----------------------------------------------------
in the command prompt type:
#add the user as a manager
sudo qconf -am <user>
qconf -sm #lists all the managers for SGE
sudo qconf -am <user name> #use the user of the system where qmon is running
IMPORTANT: sudo cp -r /usr/share/doc/gridengine-common/examples/mpi /var/lib/gridengine/mpi
type qmon
NOTE: for all the integer fields, click to select the correct number. Typing the number directly works in some fields, but others reset the value to 0 when you click OK.
In qmon, go to "Host Configuration" and configure the hosts:
"Host Configuration" -> "Administration Host" -> add the master node and other administrative nodes
"Host Configuration" -> "Submit Host" -> add the master node and other submit nodes
"Host Configuration" -> "Execution Host" -> add the slave nodes
In the execution host, add the user group 'arusers'
-> click "Done" to finish
"User Configuration" -> "Userset" -> highlight userset "arusers" and click "Modify" -> enter the user names in the "User/Group" field (add the master's and the slaves' user names)
-> click "Done" to finish
"Queue Control" -> queue name: main.q, New Host/Hostgroup: master and slave names, user access: arusers, General Configuration: slots = 10
Parallel Environment: click add PE -> name: assetPE, slots: 99, user lists: arusers,
start_proc_args = /var/lib/gridengine/mpi/startmpi.sh $pe_hostfile
stop_proc_args = /var/lib/gridengine/mpi/stopmpi.sh
allocation_rule = $fill_up
tick the "Control slaves" checkbox so that this option is enabled
-------------------------------------------------
checking if gridengine is installed properly
-------------------------------------------------
now we are ready to assign a job to the qmaster, but first check whether the SGE hosts are running properly. In the command prompt type:
qhost #it should list the system info from all nodes
qconf -sel #it should list the hostnames of nodes
qconf -sql #it should list the queues
ps aux | grep sge_qmaster | grep -v grep #check master daemon
ps aux | grep sge_execd | grep -v grep #check execute daemon
#If the sge_qmaster or sge_execd daemon is not running, try starting it as a service
#mpiuser@ub1:~$ sudo service gridengine-master start
#mpiuser@ub1:~$ sudo service gridengine-exec start
#Reboot the node(s) if sge_qmaster or sge_execd fails to start
check whether any error messages are logged on the system:
see /var/spool/gridengine/execd/<sysname>/messages for the execd
see /var/spool/gridengine/qmaster/<sysname>/messages for the qmaster
if there are any error messages, resolve them before proceeding
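Reading the messages files by eye gets tedious, so here is a rough Python 3 sketch that picks out the error lines. It rests on an assumption: that each line of a messages file has five fields separated by '|' (timestamp, thread, host, level, text) and that the levels E and C mark errors and critical errors. Check a few lines of your own files first. The function name `error_lines` is made up for this example.

```python
def error_lines(text, levels=("E", "C")):
    """Return (timestamp, host, message) for lines whose level is in `levels`.

    Assumes the layout: timestamp|thread|host|level|message
    """
    hits = []
    for line in text.splitlines():
        # Split into at most 5 fields so '|' inside the message is preserved.
        fields = line.split("|", 4)
        if len(fields) == 5 and fields[3].strip() in levels:
            hits.append((fields[0].strip(), fields[2].strip(), fields[4].strip()))
    return hits
```

Read the execd or qmaster messages file into a string and pass it in; lines that do not match the assumed layout are simply skipped.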
-------------------------------------------------------
creation of test script to test the GE settings
-------------------------------------------------------
create a new file named script1.sh
#!/bin/bash
### Request bash as the shell for the job
#$ -S /bin/bash
### Name the job:
#$ -N test
echo "Running environment:"
env
echo "============================="
###end of script
in the command prompt: qsub /<path>/script1.sh
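The same submission can be driven from Python by calling qsub as a subprocess. This is a minimal Python 3.7+ sketch, not the way my project did it. It assumes qsub prints a line starting with "Your job <number>" on success, which is what I saw on my install; `parse_job_id` and `submit` are hypothetical helper names.

```python
import re
import subprocess

# Assumed success message, e.g.: Your job 42 ("test") has been submitted
JOB_ID_RE = re.compile(r"Your job (\d+)")


def parse_job_id(qsub_output):
    """Return the job number from qsub's output, or None if it is not found."""
    match = JOB_ID_RE.search(qsub_output)
    return int(match.group(1)) if match else None


def submit(script_path):
    """Run qsub on the script and return the job number (needs SGE on this host)."""
    result = subprocess.run(
        ["qsub", script_path],
        check=True,           # raise CalledProcessError if qsub fails
        capture_output=True,
        text=True,
    )
    return parse_job_id(result.stdout)
```

On a submit host, submit("/<path>/script1.sh") should return the job number, which you can then look up with qstat.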
------------------------------
installing of mysql
------------------------------
sudo apt-get install openjdk-7-jdk openjdk-7-jre icedtea-7-plugin
sudo apt-get install mysql-server
add a login user and group for mysqld; in the terminal type:
sudo groupadd mysql
sudo useradd -g mysql mysql
to open the mysql client as root: mysql -u root -p
create a user account for SGE at the mysql prompt:
CREATE USER 'sge_user'@'localhost' IDENTIFIED BY 'password';
if you have left the client, log in again with mysql -u root -p, then:
GRANT ALL PRIVILEGES ON *.* TO 'sge_user'@'localhost' WITH GRANT OPTION;
---------------------------------
for DRMAA support in SGE:
---------------------------------
sudo apt-get install libdrmaa1.0
set the environment variable:
open /etc/profile and /etc/bash.bashrc and add the following:
export DRMAA_LIBRARY_PATH=/usr/lib/pymodules/python2.7/libdrmaa.so.1.0
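With libdrmaa and python-drmaa in place, a job can be placed from Python roughly as sketched below. Treat it as an illustration under assumptions rather than the code from my project: it is written for Python 3, it uses the queue main.q and the parallel environment assetPE configured in qmon earlier, and `native_spec` and `run_job` are hypothetical helper names. The drmaa module is imported inside the function so that the file still loads on a machine without libdrmaa.

```python
def native_spec(queue=None, pe=None, slots=None):
    """Build the qsub-style option string passed through DRMAA."""
    parts = []
    if queue:
        parts += ["-q", queue]
    if pe and slots:
        parts += ["-pe", pe, str(slots)]
    return " ".join(parts)


def run_job(script, job_name="test", queue="main.q"):
    """Submit `script`, wait for it, and return (job_id, exit_status)."""
    import drmaa  # needs DRMAA_LIBRARY_PATH to point at libdrmaa

    session = drmaa.Session()
    session.initialize()
    try:
        template = session.createJobTemplate()
        template.remoteCommand = script
        template.jobName = job_name
        template.nativeSpecification = native_spec(queue=queue)
        job_id = session.runJob(template)
        # Block until the job finishes.
        info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        session.deleteJobTemplate(template)
        return job_id, info.exitStatus
    finally:
        session.exit()
```

The script must be executable and reachable at the same path on the execution host, which is one reason for the NFS share later in this post.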
----------------------------
for installing apache:
----------------------------
sudo apt-get install apache2
sudo apt-get install libapache2-mod-wsgi
-------------------------------
for python-mysql usage:
-------------------------------
sudo apt-get install python-mysqldb
sudo /etc/init.d/apache2 restart
if you get an error like "Could not reliably determine the server's fully qualified domain name":
sudo vim /etc/apache2/httpd.conf
insert the following line in httpd.conf: ServerName localhost
then restart Apache: sudo /etc/init.d/apache2 restart
-------------------------------------
for making an executable of a python script
-------------------------------------
install pyinstaller-1.5.1
python ~/Documents/pyinstaller-1.5.1/Configure.py
python ~/Documents/pyinstaller-1.5.1/Makespec.py --onefile <filename>.py
sudo python Build.py hello10sec/<filename>.spec
python ~/Documents/pyinstaller-1.5.1/Build.py /home/namitha/scripts/<filename>.spec
---------------------------------------------------------------------------
for doing scp without a password, generate a key with ssh-keygen on the respective system
---------------------------------------------------------------------------
-----------do this on all the slave hosts---------------
suppose slave-ubuntu is a slave and nami-ubuntu is the SGE master, then:
on slave-ubuntu: ssh-keygen
press enter without typing a passphrase
make sure the file id_rsa has mode 600: chmod 600 /home/namitha/.ssh/id_rsa
on slave-ubuntu: ssh-copy-id -i /home/slave-ubuntu/.ssh/id_rsa.pub namitha@nami-ubuntu
-----------------------
for the web service
------------------------
in the terminal: sudo apt-get install python-webob
----------------------------------
for hosting the web service
-----------------------------------
vim /etc/apache2/sites-available/default
add this line at the top: WSGIPythonPath /home/namitha/Documents/GridEngine/
at the end add:
Alias /gridengineservice/ "/home/namitha/Documents/GridEngine/"
<Directory "/home/namitha/Documents/GridEngine/">
WSGIApplicationGroup %{GLOBAL}
Options Indexes FollowSymLinks MultiViews ExecCGI
AddHandler cgi-script .cgi
AddHandler wsgi-script .wsgi
AllowOverride None
Order allow,deny
allow from all
</Directory>
to see errors from command.wsgi: vim /var/log/apache2/error.log
restart apache2: sudo /etc/init.d/apache2 restart
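For reference, a .wsgi file served from that directory only needs to define a callable named application, which is what mod_wsgi looks for. The sketch below is a bare-bones stand-in written for Python 3 and is not my actual command.wsgi; the response text is invented, and it uses plain WSGI rather than webob so that it has no dependencies.

```python
def application(environ, start_response):
    """Minimal WSGI callable: reply with the requested path as plain text."""
    path = environ.get("PATH_INFO", "")
    body = ("gridengine service reached at %s\n" % path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings.
    return [body]
```

If a file like this returns a 500 error, the traceback ends up in /var/log/apache2/error.log.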
---------------------------
for mounting and unmounting
---------------------------
sudo apt-get install nfs-kernel-server
sudo apt-get install nfs-common
reboot the system
sudo /etc/init.d/nfs-kernel-server restart
sudo umount /mnt/Path/
sudo mount 192.168.xxx.xxx:/home/namitha/share/ /mnt/Path/
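Jobs fail in odd ways when the share is not actually mounted on a slave, so a check before submitting is worth having. This is a small Python 3 sketch that reads text in the /proc/mounts layout (device, mount point, filesystem type, options); the function names are hypothetical and I assume the share shows up with type nfs or nfs4.

```python
def nfs_mounts(mounts_text):
    """Return {mount_point: remote_source} for NFS entries in /proc/mounts text."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            found[fields[1]] = fields[0]
    return found


def is_share_mounted(mount_point, mounts_text):
    """Return True if mount_point (trailing slash ignored) is an NFS mount."""
    return mount_point.rstrip("/") in nfs_mounts(mounts_text)
```

On a slave, read /proc/mounts into a string and call is_share_mounted("/mnt/Path/", text).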