Alan De Smet Computer Sciences Department University of Wisconsin-Madison [email protected] ...

187
Alan De Smet Computer Sciences Department University of Wisconsin-Madison [email protected] http://www.cs.wisc.edu/condor Condor Administration

Transcript of Alan De Smet Computer Sciences Department University of Wisconsin-Madison [email protected] ...

Page 1: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

Alan De SmetComputer Sciences DepartmentUniversity of Wisconsin-Madison

[email protected]://www.cs.wisc.edu/condor

Condor Administration

Page 2: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

2www.cs.wisc.edu/condor

Outline

› Condor Daemons Job Startup

› Configuration Files

› Policy Expressions Startd (Machine) Negotiator

› Priorities

› Security

› Administration

› Installation “Full

Installation”

› Other Sources

Page 3: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

3www.cs.wisc.edu/condor

Condor Daemons

Page 4: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

4www.cs.wisc.edu/condor

Condor Daemons

›condor_master - controls everything else

›condor_startd - executing jobs condor_starter - helper for starting

jobs

›condor_schedd - submitting jobs condor_shadow - submit-side helper

Page 5: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

5www.cs.wisc.edu/condor

Condor Daemons

›condor_collector - Collects system information; only on Central Manager

›condor_negotiator - Assigns jobs to machines; only on Central Manager

› You only have to run the daemons for the services you want to provide

Page 6: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

6www.cs.wisc.edu/condor

condor_master

› Starts up all other Condor daemons

› If a daemon exits unexpectedly, restarts deamon and emails administrator

› If a daemon binary is updated (timestamp changed), restarts the daemon

Page 7: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

7www.cs.wisc.edu/condor

condor_master

› Provides access to many remote administration commands: condor_reconfig, condor_restart, condor_off, condor_on, etc.

› Default server for many other commands: condor_config_val, etc.

Page 8: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

8www.cs.wisc.edu/condor

condor_master

› Periodically runs condor_preen to clean up any files Condor might have left on the machine Backup behavior, the rest of the

daemons clean up after themselves, as well

Page 9: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

9www.cs.wisc.edu/condor

condor_startd

› Represents a machine to the Condor pool

› Should be run on any machine you want to run jobs

› Enforces the wishes of the machine owner (the owner’s “policy”)

Page 10: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

10www.cs.wisc.edu/condor

condor_startd

› Starts, stops, suspends jobs

› Spawns the appropriate condor_starter, depending on the type of job

› Provides other administrative commands (for example, condor_vacate)

Page 11: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

11www.cs.wisc.edu/condor

condor_starter

› Spawned by the condor_startd to handle all the details of starting and managing the job Transfer job’s binary to execute

machine Send back exit status Etc.

Page 12: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

12www.cs.wisc.edu/condor

condor_starter

› On multi-processor machines, you get one condor_starter per CPU Actually one per running job Can configure to run more (or less)

jobs than CPUs

› For PVM jobs, the starter also spawns a PVM daemon (condor_pvmd)

Page 13: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

13www.cs.wisc.edu/condor

condor_schedd

› Represents jobs to the Condor pool

› Maintains persistent queue of jobs Queue is not strictly FIFO (priority

based) Each machine running condor_schedd maintains its own queue

Page 14: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

14www.cs.wisc.edu/condor

condor_schedd

› Responsible for contacting available machines and spawning waiting jobs When told to by condor_negotiator

› Should be run on any machine you want to submit jobs from

› Services most user commands: condor_submit, condor_rm, condor_q

Page 15: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

15www.cs.wisc.edu/condor

condor_shadow

› Represents job on the submit machine

› Services requests from standard universe jobs for remote system calls including all file I/O

› Makes decisions on behalf of the job for example: where to store the checkpoint

file

Page 16: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

16www.cs.wisc.edu/condor

condor_shadow Impact

› One condor_shadow running on submit machine for each actively running Condor job

› Minimal load on submit machine Usually blocked waiting for requests

from the job or doing I/O Relatively small memory footprint

Page 17: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

17www.cs.wisc.edu/condor

Limiting condor_shadow

› Still, you can limit the impact of the shadows on a given submit machine: They can be started by Condor with a

“nice-level” that you configure (SHADOW_RENICE_INCREMENT)

Can limit total number of shadows running on a machine (MAX_JOBS_RUNNING)

Page 18: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

18www.cs.wisc.edu/condor

condor_collector

› Collects information from all other Condor daemons in the pool

› Each daemon sends a periodic update called a ClassAd to the collector

› Services queries for information: Queries from other Condor daemons Queries from users (condor_status)

Page 19: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

19www.cs.wisc.edu/condor

condor_negotiator

› Performs matchmaking in Condor Pulls list of available machines and

job queues from condor_collector Matches jobs with available machines Both the job and the machine must

satisfy each other’s requirements (2-way matching)

› Handles user priorities

Page 20: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

20www.cs.wisc.edu/condor

Typical Condor PoolCentral Manager

master

collector

negotiator

schedd

startd

= ClassAd Communication Pathway

= Process Spawned

Submit-Only

master

schedd

Execute-Only

master

startd

Regular Node

schedd

startd

master

Regular Node

schedd

startd

master

Execute-Only

master

startd

Page 21: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

21www.cs.wisc.edu/condor

Execute MachineSubmit Machine

Job Startup

Submit

Schedd

Starter Job

Shadow CondorSyscall Lib

Startd

Central Manager

CollectorNegotiator

Page 22: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

22www.cs.wisc.edu/condor

Configuration Files

Page 23: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

23www.cs.wisc.edu/condor

Configuration Files

› Multiple files concatenated Definitions in later files overwrite

previous definitions

› Order of files: Global config file Local config files, shared config files Global and Local Root config file

Page 24: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

24www.cs.wisc.edu/condor

Global Config File

› Found either in file pointed to with the CONDOR_CONFIG environment variable, /etc/condor/condor_config, or ~condor/condor_config

› Most settings can be in this file

› Only works as a global file if it is on a shared file system

Page 25: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

25www.cs.wisc.edu/condor

Other Shared Files

› LOCAL_CONFIG_FILE macro Comma separated, processed in order

› You can configure a number of other shared config files: Organize common settings (for

example, all policy expressions) platform-specific config files

Page 26: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

26www.cs.wisc.edu/condor

Local Config File

› LOCAL_CONFIG_FILE macro (again) Usually uses $(HOSTNAME)

› Machine-specific settings local policy settings for a given owner different daemons to run (for

example, on the Central Manager!)

Page 27: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

27www.cs.wisc.edu/condor

Local Config File

› Can be on local disk of each machine

/var/adm/condor/condor_config.local

› Can be in a shared directory/shared/condor/condor_config.$(HOSTNAME)

/shared/condor/hosts/$(HOSTNAME)/ condor_config.local

Page 28: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

28www.cs.wisc.edu/condor

Root Config File (optional)

› Always processed last

› Allows root to specify settings which cannot be changed by other users For example, the path to Condor

daemons

› Useful if daemons are started as root but someone else has write access to config files

Page 29: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

29www.cs.wisc.edu/condor

Root Config File (optional)

› /etc/condor/condor_config.root or ~condor/condor_config.root

› Then loads any files specified in ROOT_CONFIG_FILE_LOCAL

Page 30: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

30www.cs.wisc.edu/condor

Configuration File Syntax

› # at start of line is a comment not allowed in names, confuses Condor.

› \ at the end of line is a line-continuation Both lines are treated as one big entry Works in comments!

› Names are case insensitive Values are case sensitive

Page 31: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

31www.cs.wisc.edu/condor

Configuration File Macros

› Macros have the form: Attribute_Name = value

› You reference other macros with: A = $(B)

› Can create additional macros for organizational purposes

Page 32: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

32www.cs.wisc.edu/condor

Configuration File Macros

› Can append to macros:A=abc

A=($A),def

› Don’t let macros recursively define each other!A=$(B)

B=($A)

Page 33: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

33www.cs.wisc.edu/condor

Configuration File Macros

› Later macros in a file overwrite earlier ones B will evaluate to 2:A=1

B=$(A)

A=2

Page 34: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

34www.cs.wisc.edu/condor

ClassAds

› Set of key-value pairs

› Can be matched against each other Requirements and Rank

› This is old ClassAds New, more expressive ClassAds exist

• Not yet used in Condor

Page 35: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

35www.cs.wisc.edu/condor

ClassAd Expressions

› Some configuration file macros specify expressions for the Machine’s ClassAd Notably START, RANK, SUSPEND,

CONTINUE, PREEMPT, KILL

› Can contain a mixture of macros and ClassAd references

› Notable: UNDEFINED, ERROR

Page 36: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

36www.cs.wisc.edu/condor

ClassAd Expressions

› +, -, *, /, <, <=,>, >=, ==, !=, &&, and || all work as expected

› TRUE==1 and FALSE==0 (guaranteed)

Page 37: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

37www.cs.wisc.edu/condor

Macros and Expressions Gotcha

› These are simple replacement macros

› Put parentheses around expressionsTEN=5+5

HUNDRED=$(TEN)*$(TEN)• HUNDRED becomes 5+5*5+5 or 35!

TEN=(5+5)

HUNDRED=($(TEN)*$(TEN))• ((5+5)*(5+5)) = 100

Page 38: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

38www.cs.wisc.edu/condor

ClassAd Expressions: UNDEFINED and ERROR

› Special values

› Passed through most operators Anything == UNDEFINED is

UNDEFINED

› && and || eliminate if possible. UNDEFINED && FALSE is FALSE UNDEFINED && TRUE is UNDEFINED

Page 39: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

39www.cs.wisc.edu/condor

ClassAd Expressions: =?= and =!=

=?= and =!= are similar to == and != =?= tests if operands have the same

type and the same value. •10 == UNDEFINED -> UNDEFINED•UNDEFINED == UNDEFINED -> UNDEFINED •10 =?= UNDEFINED -> FALSE•UNDEFINED =?= UNDEFINED -> TRUE

=!= inverts =?=

Page 40: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

40www.cs.wisc.edu/condor

ClassAd Expressions

› Further information: Section 4.1, “Condor's ClassAd Mechanism,” in the Condor Manual.

Page 41: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

41www.cs.wisc.edu/condor

Policy Expressions

Page 42: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

42www.cs.wisc.edu/condor

Policy Expressions

› Allow machine owners to specify job priorities, restrict access, and implement local policies

Page 43: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

43www.cs.wisc.edu/condor

Policy Expressions

› Specified in condor_config

› Policy evaluates both a machine ClassAd and a job ClassAd together Policy can reference items in either

ClassAd (See manual for list)

› Can reference condor_config macros: $(MACRONAME)

Page 44: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

44www.cs.wisc.edu/condor

Machine (Startd) Policy Expression Summary

› START – When is this machine willing to start a job Typically used to restrict access when

the machine is being used directly

› RANK - Job preferences

Page 45: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

45www.cs.wisc.edu/condor

Machine (Startd) Policy Expression Summary

› SUSPEND - When to suspend a job

› CONTINUE - When to continue a suspended job

› PREEMPT – When to nicely stop running a job

› KILL - When to immediately kill a preempting job

Page 46: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

46www.cs.wisc.edu/condor

START

› START is the primary policy

› When FALSE the machine enters the Owner state and will not run jobs

› Acts as the Requirements expression for the machine, the job must satisfy START Can reference job ClassAd values

including Owner and ImageSize

Page 47: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

47www.cs.wisc.edu/condor

RANK

› Indicates which jobs a machine prefers Jobs can also specify a rank

› Floating point number Larger numbers are higher ranked Typically evaluate attributes in the Job

ClassAd Typically use + instead of &&

Page 48: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

48www.cs.wisc.edu/condor

RANK

› Often used to give priority to owner of a particular group of machines

› Claimed machines still advertise looking for higher ranked job to preemp thet current job

Page 49: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

49www.cs.wisc.edu/condor

SUSPEND and CONTINUE

› When SUSPEND becomes true, the job is suspended

› When CONTINUE becomes true a suspended job is released

Page 50: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

50www.cs.wisc.edu/condor

PREEMPT and KILL

› When PREEMPT becomes true, the job will be politely shut down Vanilla universejobs get SIGTERM Standard universe jobs checkpoint

› When KILL becomes true, the job is SIGKILL Checkpointing is aborted if started

Page 51: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

51www.cs.wisc.edu/condor

WANT_SUSPEND and WANT_VACATE

› Typically leave both to TRUE

› WANT_SUSPEND - If false, skip SUSPEND test, jump to PREEMPT

› WANT_VACATE If true, gives job time to vacate cleanly

(until KILL becomes true) If false, job is immediately killed (KILL

is ignored)

Page 52: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

52www.cs.wisc.edu/condor

True

True

Road Map of the Policy

Expressions

START

WANT SUSPEND

SUSPEND

VacatingVacating

PREEMPT

KILL

True

True

True

True

False

WANT VACATE

KillingKilling

False

Expression

Activity

Page 53: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

53www.cs.wisc.edu/condor

Minimal Settings

› Always runs jobsSTART = True

RANK =

SUSPEND = False

CONTINUE = True

PREEMPT = False

KILL = False

Page 54: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

54www.cs.wisc.edu/condor

(Boss Fat Cat)› I am adding

nodes to the Cluster… but the Chemistry Department has priority on these nodes

Policy Configuration

Page 55: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

55www.cs.wisc.edu/condor

New Settings for the Chemistry nodes

› Prefer Chemistry jobsSTART = True

RANK = Department == "Chemistry"

SUSPEND = False

CONTINUE = True

PREEMPT = False

KILL = False

Page 56: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

56www.cs.wisc.edu/condor

Submit file with Custom Attribute

› Prefix an entry with “+” to add to job ClassAdExecutable = charm-run

Universe = standard

+Department = Chemistry

queue

Page 57: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

57www.cs.wisc.edu/condor

What if “Department” not specified?

START = True

RANK = Department =!= UNDEFINED && Department == "Chemistry"

SUSPEND = False

CONTINUE = True

PREEMPT = False

KILL = False

Page 58: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

58www.cs.wisc.edu/condor

More Complex RANK

› Give the machine’s owners (adesmet and livny) highest priority, followed by the Chemistry department, followed by the Physics department, followed by everyone else.

Page 59: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

59www.cs.wisc.edu/condor

More Complex RANK

IsOwner = (Owner == "adesmet“ || Owner == "livny")

IsChem =(Department =!= UNDEFINED && Department == "Chemistry")

IsPhys =(Department =!= UNDEFINED && Department == "Physics")

RANK = $(IsOwner)*20 + $(IsChem)*10 + $(IsPhys)

Page 60: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

60www.cs.wisc.edu/condor

› Cluster is okay, but... Condor can only use the desktops when they would otherwise be idle

Policy Configuration(Boss Fat Cat)

Page 61: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

61www.cs.wisc.edu/condor

Defining Idle

› One possible definition: No keyboard or mouse activity for 5

minutes Load average below 0.3

Page 62: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

62www.cs.wisc.edu/condor

Desktops should

› START jobs when the machine becomes idle

› SUSPEND jobs as soon as activity is detected

› PREEMPT jobs if the activity continues for 5 minutes or more

› KILL jobs if they take more than 5 minutes to preempt

Page 63: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

63www.cs.wisc.edu/condor

Macros in the Config FileNonCondorLoadAvg = (LoadAvg - CondorLoadAvg)

HighLoad = 0.5BgndLoad = 0.3CPU_Busy = ($(NonCondorLoadAvg) >= $(HighLoad))CPU_Idle = ($(NonCondorLoadAvg) <= $(BgndLoad))KeyboardBusy = (KeyboardIdle < 10)MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))ActivityTimer = \ (CurrentTime - EnteredCurrentActivity)

Page 64: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

64www.cs.wisc.edu/condor

Desktop Machine Policy

START = $(CPU_Idle) && KeyboardIdle > 300

SUSPEND = $(MachineBusy)

CONTINUE = $(CPU_Idle) && KeyboardIdle > 120

PREEMPT = (Activity == "Suspended") && \

$(ActivityTimer) > 300

KILL = $(ActivityTimer) > 300

Page 65: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

65www.cs.wisc.edu/condor

Real World Policies

› University of Wisconsin at Madison Computer Science department’s policies condor_config.policy See handout

Page 66: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

66www.cs.wisc.edu/condor

Useful Macros: Universe

STANDARD = 1

VANILLA = 5

IsVanilla = (TARGET.JobUniverse == $(VANILLA)

IsStandard = (TARGET.JobUniverse == $(STANDARD)

Page 67: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

67www.cs.wisc.edu/condor

Useful Macros: Timers

StateTimer = (CurrentTime – EnteredCurrentState)

ActivityTimer = (CurrentTime – EnteredCurrentActivity)

LastCkpt = (CurrentTime – LastPeriodicCheckpoint)

Page 68: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

68www.cs.wisc.edu/condor

Useful Macros: Limits

BackgroundLoad = 0.3

HighLoad = 0.7

StartIdleTime = 15*$(MINUTE)

MaxSuspendTime = 10*$(MINUTE)

Page 69: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

69www.cs.wisc.edu/condor

Useful Macros: Concepts

NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)

KeyboardBusy = (KeyboardIdle < $(MINUTE))

CPU_Idle = ($(NonCondorLoadAvg) <= $(BackgroundLoad))

SmallJob = (TARGET.ImageSize < (15 * 1024))

Page 70: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

70www.cs.wisc.edu/condor

Useful Macros: Concepts

MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))

Maintenance = (ClockMin > 255 && ClockMin < 315 && $(ConsoleBusy) == False)

•Maintenance is when nightly scripts run on CS machines raising the load

Page 71: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

71www.cs.wisc.edu/condor

WANT_SUSPEND and WANT_VACATE

WANT_SUSPEND = ( $(SmallJob) || $(KeyboardNotBusy) || $(Maintenance) || $(IsPVM) || $(IsVanilla) )

WANT_VACATE = $(ActivationTimer) > 10 * $(MINUTE) || $(IsPVM) || $(IsVanilla)

Page 72: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

72www.cs.wisc.edu/condor

STARTCS_START = \

( ($(CPU_Idle) ||

(State!="Unclaimed" && State!="Owner")) \

&& (KeyboardIdle > $(StartIdleTime)) \

&& (TARGET.ImageSize <= ((Memory - 15)*1024)) \

&& ( (MemoryRequirements < (Memory - 15)) \

|| (MemoryRequirements =?= UNDEFINED \

&& (RemoteUserCpu > 0.0 || Memory > 127)) ) )

Page 73: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

73www.cs.wisc.edu/condor

SUSPEND

CS_SUSPEND =

( ( (CpuBusyTime > 2 * $(MINUTE))

&& $(ActivationTimer) > 90 )

|| $(KeyboardBusy) )

› CpuBusyTime – Seconds since CPUBusy became TRUE (Condor provides)

Page 74: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

74www.cs.wisc.edu/condor

CONTINUE

CS_CONTINUE =

( ($(CPU_Idle) &&

($(ActivityTimer) > 10))

&& (KeyboardIdle >

$(ContinueIdleTime)) )

Page 75: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

75www.cs.wisc.edu/condor

PREEMPT

CS_PREEMPT =

( ( ($(ActivityTimer) >

$(MaxSuspendTime))

&& (Activity == "Suspended"))

|| (SUSPEND &&

(WANT_SUSPEND == False)) )

Page 76: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

76www.cs.wisc.edu/condor

KILL

CS_KILL =

($(ActivityTimer) >

$(MaxVacateTime))

Page 77: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

77www.cs.wisc.edu/condor

Policy Review

› Users submitting jobs can specify Requirements and Rank expressions

› Administrators can specify Startd policy expressions individually for each machine

› Custom attributes easily added

› You can enforce almost any policy!

Page 78: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

78www.cs.wisc.edu/condor

Further Machine Policy Information

› For further information, see section 3.6 “Startd Policy Configuration” in the Condor manual

› condor-users mailing listhttp://www.cs.wisc.edu/condor/mail-

lists/

[email protected]

Page 79: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

79www.cs.wisc.edu/condor

Negotiator Policy Expressions

›PREEMPTION_REQUIREMENTS and PREEMPTION_RANK

› Evaluated when condor_negotiator considers replacing a lower priority job with a higher priority job

› Completely unrelated to the PREEMPT expression

Page 80: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

80www.cs.wisc.edu/condor

PREEMPTION_REQUIREMENTS

› If false will not preempt machine Typically used to avoid pool thrashing

PREEMPTION_REQUIREMENTS = \

$(StateTimer) > (1 * $(HOUR)) \

&& RemoteUserPrio > SubmittorPrio * 1.2

Only replace jobs running for at least one hour and 20% lower priority

Page 81: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

81www.cs.wisc.edu/condor

PREEMPTION_RANK

› Picks which already claimed machine to reclaim

PREEMPTION_RANK = \

(RemoteUserPrio * 1000000)\

- ImageSize Strongly prefers preempting jobs with

a large (bad) priority and a small image size

Page 82: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

82www.cs.wisc.edu/condor

Custom Machine Attributes

› Can add attributes to a machine’s ClassAd, typically done in the local config fileINSTRUCTIONAL=TRUE

NETWORK_SPEED=100

STARTD_EXPRS=INSTRUCTIONAL, NETWORK_SPEED

Page 83: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

83www.cs.wisc.edu/condor

Custom Machine Attributes

› Jobs can now specify Rank and Requirements using new attributes:Requirements = (INSTRUCTIONAL=?=UNDEFINED || INSTRUCTIONAL==FALSE)

Rank = NETWORK_SPEED =!= UNDEFINED && NETWORK_SPEED

Page 84: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

84www.cs.wisc.edu/condor

Machine

States

PREEMPTING

CLAIMED

UNCLAIMED

OWNER

MATCHED

begin

Page 85: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

85www.cs.wisc.edu/condor

Machine

Activities

PREEMPTING

CLAIMED

UNCLAIMED

OWNER

MATCHED

Benchmarking

begin

Idle

Suspended

Busy

Idle

Idle

Idle

Killing

Vacating

Page 86: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

86www.cs.wisc.edu/condor

Machine

Activities

PREEMPTING

CLAIMED

UNCLAIMED

OWNER

MATCHED

Benchmarking

begin

Idle

Suspended

Busy

Idle

Killing

Vacating

Idle

Idle

See the manual for the gory details

(Section 3.6: Configuring the Startd Policy)

Page 87: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

87www.cs.wisc.edu/condor

Priorities

Page 88: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

88www.cs.wisc.edu/condor

Job Priority

› Set with condor_prio

› Range from -20 to 20

› Only impacts order between jobs for a single user

Page 89: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

89www.cs.wisc.edu/condor

User Priority

› Determines allocation of machines to waiting users

› View with condor_userprio

› Inversely related to machines allocated A user with priority of 10 will be able to

claim twice as many machines as a user with priority 20

Page 90: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

90www.cs.wisc.edu/condor

User Priority

› Effective User Priority is determined by multiplying two factors Real Priority Priority Factor

Page 91: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

91www.cs.wisc.edu/condor

Real Priority

› Based on actual usage

› Defaults to 0.5

› Approaches actual number of machines used over time Configuration setting PRIORITY_HALFLIFE

Page 92: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

92www.cs.wisc.edu/condor

Priority Factor

› Assigned by administrator Set with condor_userprio

› Defaults to 1 (DEFAULT_PRIO_FACTOR)

› Nice users default to 1,000,000 (NICE_USER_PRIO_FACTOR) Used for true bottom feeding jobs Add “nice_user=true” to your

submit file

Page 93: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

93www.cs.wisc.edu/condor

Security

Page 94: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

94www.cs.wisc.edu/condor

Host/IP Address Security

› The basic security model in Condor Stronger security available

(Encrypted communications, cryptographic authentication)

› Can configure each machine in your pool to allow or deny certain actions from different groups of machines

Page 95: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

95www.cs.wisc.edu/condor

Security Levels› READ access - querying

information condor_status, condor_q, etc

› WRITE access - updating information Does not include READ access! condor_submit, adding nodes to a

pool, etc

Page 96: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

96www.cs.wisc.edu/condor

Security Levels

› ADMINISTRATOR access condor_on, condor_off, condor_reconfig, condor_ restart, etc.

› OWNER access Things a machine owner can do

(notably condor_vacate)

Page 97: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

97www.cs.wisc.edu/condor

Setting Up Security

› List what hosts are allowed or denied to perform each action If you list allowed hosts, everything

else is denied If you list denied hosts, everything

else is allowed If you list both, only allow hosts that

are listed in “allow” but not in “deny”

Page 98: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

98www.cs.wisc.edu/condor

Specifying Hosts

› There are many possibilities for specifying which hosts are allowed or denied: Host names, domain names IP addresses, subnets

Page 99: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

99www.cs.wisc.edu/condor

Wildcards

› ‘*’ can be used anywhere (once) in a host name for example, “infn-corsi*.corsi.infn.it”

› ‘*’ can be used at the end of any IP address for example “128.105.101.*” or

“128.105.*”

Page 100: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

100www.cs.wisc.edu/condor

Setting up Host/IP Address Security

› Can define values that effect all daemons: HOSTALLOW_WRITE, HOSTDENY_READ, HOSTALLOW_ADMINISTRATOR, etc.

› Can define daemon-specific settings: HOSTALLOW_READ_SCHEDD, HOSTDENY_WRITE_COLLECTOR, etc.

Page 101: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

101www.cs.wisc.edu/condor

Example Security Settings

HOSTALLOW_WRITE = *.infn.it

HOSTALLOW_ADMINISTRATOR = infn-corsi1*, \

$(CONDOR_HOST), axpb07.bo.infn.it, \

$(FULL_HOSTNAME)

HOSTDENY_ADMINISTRATOR = infn-corsi15

HOSTDENY_READ = *.gov, *.mil

HOSTDENY_ADMINISTRATOR_NEGOTIATOR = *

Page 102: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

102www.cs.wisc.edu/condor

Default Security Settings

HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)

HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)

HOSTALLOW_READ = *

HOSTALLOW_WRITE = *

› Make write restrictiveHOSTALLOW_WRITE=*.site.uk

Page 103: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

103www.cs.wisc.edu/condor

Advanced Security Features

›AUTHENTICATION – Who is allowed›ENCRYPTION - Private

communications, requires AUTHENTICATION.

›INTEGRITY - Checksums›NEGOTIATION - Required for all

others

Page 104: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

104www.cs.wisc.edu/condor

Security Features

› Features individually set as REQUIRED, PREFERRED, OPTIONAL, or NEVER

› Can set default and for each level (READ, WRITE, etc)

› All default to OPTIONAL

› Leave NEGOTIATOR at OPTIONAL

Page 105: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

105www.cs.wisc.edu/condor

Authentication Complexity

› Authentication comes at a price: complexity

› Authentication between machines requires an authentication system

› Condor supports several existing authentication systems We don’t want to create yet another

one

Page 106: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

106www.cs.wisc.edu/condor

AUTHENTICATION_METHODS

› Authentication requires one or more methods: FS FS_REMOTE GSI Kerberos NTSSPI CLAIMTOBE

Page 107: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

107www.cs.wisc.edu/condor

FS and FS_REMOTEFilesystem Tests

› FS checks that the user can create a file owned by the user. Only works on local machine Assumes the filesystem is trustworthy

› FS_REMOTE works remotely Allows test file to be on NFS, AFS, or

other shared file system

Page 108: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

108www.cs.wisc.edu/condor

GSI Globus Security

Infrastructure› Daemons and users have X.509

certs

› All Condor daemons in pool can share one certificate

› Map file maps from X.509 distinguished names to identities.

Page 109: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

109www.cs.wisc.edu/condor

Kerberos and NTSSPI

› Kerberos Complex to set up If you are already using, easy to add

to Condor

› NTSSPI – Windows NT Only works on Windows

Page 110: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

110www.cs.wisc.edu/condor

CLAIMTOBE

› Trust any claims about user identity If used, encryption’s secret password

passed in clear! Use with care

Page 111: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

111www.cs.wisc.edu/condor

Additional Security Levels

›CONFIG Dynamically change config settings

›IMMEDIATE_FAMILY Daemon to daemon communications

›NEGOTIATOR condor_negotiator to other

daemons

Page 112: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

112www.cs.wisc.edu/condor

ALLOW and DENY

› When authentication is enabled you can filter based on user identifier

› Use ALLOW and DENY instead of HOSTALLOW and HOSTDENY

› Can specify hostnames and IPs as before

Page 113: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

113www.cs.wisc.edu/condor

Specifying User Identities

[email protected]/hostname

› Can use * wildcard

› Hostname can be hostname or IP address with optional netmask

Page 114: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

114www.cs.wisc.edu/condor

Example Filters

› Allow anyone from wisc.edu:ALLOW_READ=*@wisc.edu/*.wisc.edu

› Allow any authorized local user:ALLOW_READ=*/*.wisc.edu

› Allow specific user/machineALLOW_NEGOTIATOR= [email protected]/condor.wisc.edu

Page 115: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

115www.cs.wisc.edu/condor

Example Advanced Security Configuration

› Enable authentication, encryption, and integrity

› Use GSI authentication for between machine connections

› Use GSI or FS authentication on a single machine

Page 116: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

116www.cs.wisc.edu/condor

Example Advanced Security Configuration

# Turn on all security:

SEC_DEFAULT_AUTHENTICATION=REQUIRED

SEC_DEFAULT_ENCRYPTION=REQUIRED

SEC_DEFAULT_INTEGRITY=REQUIRED

Page 117: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

117www.cs.wisc.edu/condor

Example Advanced Security Configuration

# Require authentication

SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI

Page 118: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

118www.cs.wisc.edu/condor

Example Advanced Security Configuration

ALLOW_READ = *

ALLOW_WRITE = *@wisc.edu/*.wisc.edu

DENY_WRITE = [email protected]/*

ALLOW_ADMINISTRATOR = [email protected]/*wisc.edu, *@wisc.edu/$(CONDOR_HOST)

Page 119: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

119www.cs.wisc.edu/condor

Example Advanced Security Configuration

ALLOW_CONFIG = $(ALLOW_ADMINISTRATOR)

ALLOW_IMMEDIATE_FAMILY = [email protected]/*wisc.edu

Page 120: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

120www.cs.wisc.edu/condor

Example Advanced Security Configuration

ALLOW_OWNER = $(ALLOW_ADMINISTRATOR), $(FULL_HOSTNAME)

ALLOW_NEGOTIATOR = [email protected]/ $(CONDOR_HOST)

Page 121: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

121www.cs.wisc.edu/condor

Users without Certs

› Using FS authentication users can submit jobs and check the local queue

› condor_status won’t work for normal users without an X.509 Cert Requires READ access to condor_collector

› Can let anyone read any daemon!

Page 122: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

122www.cs.wisc.edu/condor

Allow Any User Read Access

# Using dreaded CLAIMTOBE

SEC_READ_AUTHENTIATION_METHODS = FS, GSI, CLAIMTOBE

Page 123: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

123www.cs.wisc.edu/condor

Advanced Security Features› Some AUTHENTICATION_METHODS

support strong encryption

› For further details Condor Manual [email protected]

Page 124: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

124www.cs.wisc.edu/condor

Administration

Page 125: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

125www.cs.wisc.edu/condor

condor_config_val

› Find current configuration values% condor_config_val MASTER_LOG

/var/condor/logs/MasterLog

Page 126: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

126www.cs.wisc.edu/condor

condor_config_val -v

› Can identify source% condor_config_val –v CONDOR_HOST

CONDOR_HOST: condor.cs.wisc.edu

Defined in ‘/etc/condor_config.hosts’, line 6

Page 127: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

127www.cs.wisc.edu/condor

condor_fetchlog

› Retrieve logs remotelycondor_fetchlog beak.cs.wisc.edu Master

Page 128: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

128www.cs.wisc.edu/condor

Querying daemons condor_status

› Queries the collector for information about daemons in your pool

› Defaults to finding condor_startds›condor_status –schedd

summarizes all job queues›condor_status –master returns

list of all condor_masters

Page 129: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

129www.cs.wisc.edu/condor

condor_status

› -long displays the full ClassAd

› Specifiy a machine name to limit results to a single host

condor_q –l node4.cs.wisc.edu

Page 130: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

130www.cs.wisc.edu/condor

condor_status -constraint

› Only return ClassAds that match an expression you specify

› Show me idle machines with 1GB or more memory condor_status -constraint 'Memory >= 1024 && Activity == "Idle"‘

Page 131: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

131www.cs.wisc.edu/condor

condor_status -format

› Controls format of output

› Useful for writing scripts

› Uses C printf style formats One field per argument

Page 132: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

132www.cs.wisc.edu/condor

condor_status -format

› Census of systems in your pool:% condor_status -format '%s ' Arch -format '%s\n' OpSys | sort | uniq –c

797 INTEL LINUX 118 INTEL WINNT50 108 SUN4u SOLARIS28 6 SUN4x SOLARIS28

Page 133: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

133www.cs.wisc.edu/condor

Examinging Queues condor_q

› View the job queue

› The “-long” option is useful to see the entire ClassAd for a given job

› supports –constraint and -format

› Can view job queues on remote machines with the “-name” option

Page 134: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

134www.cs.wisc.edu/condor

condor_q -format

› Census of jobs per user% condor_q -format '%8s ' Owner -format '%s\n' Cmd | sort | uniq –c64 adesmet /scratch/submit/a.out 2 adesmet /home/bin/run_events 4 smith /nfs/sim1/em2d3d 4 smith /nfs/sim2/em2d3d

Page 135: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

135www.cs.wisc.edu/condor

condor_q -analyze

› condor_q will try to figure out why the job isn’t running

› Good at determining that no machine matches the job Requirements expressions

Page 136: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

136www.cs.wisc.edu/condor

condor_q -analyze

› Typical results:471216.000: Run analysis summary. Of 820 machines,

458 are rejected by your job's requirements

25 reject your job because of their own requirements

0 match, but are serving users with a better priority in the pool

4 match, but prefer another specific job despite its worse user-priority

6 match, but will not currently preempt their existing job

327 are available to run your job

Page 137: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

137www.cs.wisc.edu/condor

condor_analyze

› Available in Condor 6.5 and beyond

› Breaks down the job’s requirements and suggests modifications

Page 138: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

138www.cs.wisc.edu/condor

condor_analyze› (Heavily truncated output)The Requirements expression for your job is:

( ( target.Arch == "SUN4u" ) && ( target.OpSys == "WINNT50" ) && [snip]

Condition Machines Suggestion

1 (target.Disk > 100000000) 0 MODIFY TO 14223201

2 (target.Memory > 10000) 0 MODIFY TO 2047

3 (target.Arch == "SUN4u") 106

4 (target.OpSys == "WINNT50") 110 MOD TO "SOLARIS28"

Conflicts: conditions: 3, 4

Page 139: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

139www.cs.wisc.edu/condor

Condor’s Log Files

› Condor maintains one log file per daemon

Page 140: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

140www.cs.wisc.edu/condor

Condor’s Log Files

› Can increase verbosity of logs on a per daemon basis SHADOW_DEBUG, SHADOW_SCHEDD,

and others Space separated list

Page 141: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

141www.cs.wisc.edu/condor

Useful Debug Levels

›D_FULLDEBUG dramatically increases information logged

›D_COMMAND adds information about about commands receivedSHADOW_DEBUG = \

D_FULLDEBUG D_COMMAND

Page 142: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

142www.cs.wisc.edu/condor

Condor’s Log Files

› Log files are automatically rolled over when a size limit is reached Defaults to 64000 bytes, you will

probably want to increase. Rolls over quickly with D_FULLDEBUG MAX_*_LOG, one setting per daemon

•MAX_SHADOW_LOG, MAX_SCHEDD_LOG, and others

Page 143: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

143www.cs.wisc.edu/condor

Condor’s Log Files

› Many log files entries primarily useful to Condor developers Especially if D_FULLDEBUG is on Minor errors are often logged but

corrected

Page 144: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

144www.cs.wisc.edu/condor

Debugging Jobs:condor_q

› Examine the job with condor_q especially -long and –analyze Compare with condor_status –long

Page 145: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

145www.cs.wisc.edu/condor

Debugging Jobs:User Log

› Examine the job’s user log Quickly find with:

condor_q -format '%s\n' UserLog 17.0 Users should always have a user log (set

with “log” in the submit file)

› Contains the life history of the job

› If a problem occurred, user log often contains details

Page 146: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

146www.cs.wisc.edu/condor

Debugging Jobs:ShadowLog

› Examine ShadowLog on the submit machine Note any machines the job tried to

execute on There is often an “ERROR” entry that

can give a good indication of what failed

Page 147: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

147www.cs.wisc.edu/condor

Debugging Jobs:Matching Problems

› No ShadowLog entries? Possible problem matching the job. Examine ScheddLog on the submit

machine Examine NegotiatorLog on the

central manager

Page 148: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

148www.cs.wisc.edu/condor

Debugging Jobs:Local Problems

›ShadowLog entries suggest an error but aren’t specific? Examine StartLog and StarterLog

on the execute machine

Page 149: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

149www.cs.wisc.edu/condor

Debugging Jobs:Reading Log Files

› Condor logs will note the job ID each entry is for Useful if multiple jobs are being

processed simultaneously grepping for the job ID will make it

easy to find relavent entries

Page 150: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

150www.cs.wisc.edu/condor

Debugging Jobs: What Next?

› If necessary add “D_FULLDEBUG D_COMMAND” to DEBUG_DAEMONNAME setting for additional log information

› Increase MAX_DAEMONNAME_LOG if logs are rolling over too quickly

› If all else fails, email us [email protected]

Page 151: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

151www.cs.wisc.edu/condor

Installation

Page 152: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

152www.cs.wisc.edu/condor

Considerations for Installing a Condor Pool› What machine should be your central

manager?› Does your pool have a shared file system?› Where to install Condor binaries and

configuration files?› Where should you put each machine’s

local directories?› Start the daemons as root or as some

other user?

Page 153: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

153www.cs.wisc.edu/condor

What machine should be your central

manager?› The central manager is very important for the proper functioning of your pool

› If the central manager crashes, jobs that are currently matched will continue to run, but new jobs will not be matched

Page 154: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

154www.cs.wisc.edu/condor

Central Manager

› Want assurances of high uptime or prompt reboots

› A good network connection helps

Page 155: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

155www.cs.wisc.edu/condor

Does your pool have a shared file system?

› It is easier to run vanilla universe jobs if so, but one is not required

› Shared location for configuration files can ease administration of a pool

› AFS can work, but Condor does not yet manage AFS tokens

Page 156: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

156www.cs.wisc.edu/condor

Where to install binaries and

configuration files?› Shared location for configuration files can ease administration of a pool

› Binaries on a shared file system makes upgrading easier, but can be less stable if there are network problems

›condor_master on the local disk is a good compromise

Page 157: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

157www.cs.wisc.edu/condor

Where should you put each machine’s local

directories?› You need a fair amount of disk space in the spool directory for each condor_schedd (holds job queue and binaries for each job submitted)

› The execute directory is used by the condor_starter to hold the binary for any Condor job running on a machine

Page 158: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

158www.cs.wisc.edu/condor

Where should you put each machine’s local

directories?› The log directory is used by all daemons More space means more saved info

Page 159: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

159www.cs.wisc.edu/condor

Hostnames

› Any two machines that will be communicating must know each others names

Page 160: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

160www.cs.wisc.edu/condor

Start the daemons as root or some other

user?› If possible, we recommend starting the daemons as root More secure Less confusion for users Condor will try to run as the user

“condor” whenever possible

Page 161: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

161www.cs.wisc.edu/condor

Running Daemons asNon-Root

› Condor will still work, users just have to take some extra steps to submit jobs

› Can have “personal Condor” installed - only you can submit jobs

Page 162: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

162www.cs.wisc.edu/condor

Basic Installation Procedure

› 1. Decide what version and parts of Condor to install and download them

› 2. Install the “release directory” - all the Condor binaries and libraries

› 3. Setup the Central Manager › 4. (optional) Setup Condor on any other

machines you wish to add to the pool› 5. Spawn the Condor daemons

Page 163: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

163www.cs.wisc.edu/condor

Condor Version Series

› We distribute two versions of Condor Stable Series Development Series

Page 164: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

164www.cs.wisc.edu/condor

Stable Series

› Heavily tested

› Recommended for general use

› 2nd number of version string is even (6.4.7)

Page 165: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

165www.cs.wisc.edu/condor

Development Series

› Latest features, not necessarily well-tested

› Not recommended unless you’re willing to work with beta code or need new features

› 2nd number of version string is odd (6.5.1)

Page 166: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

166www.cs.wisc.edu/condor

Condor Versions

› What am I running?

› All daemons advertise a CondorVersion attribute in the ClassAd they publish

› You can also view the version string by running ident on any Condor binary

Page 167: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

167www.cs.wisc.edu/condor

Condor Versions

› All parts of Condor on a single machine should run the same version!

› Machines in a pool can usually run different versions and communicate with each other

› Documentation will specify when a version is incompatible with older versions

Page 168: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

168www.cs.wisc.edu/condor

Downloading Condor

› Go to http://www.cs.wisc.edu/condor/

› Fill out the form and download the different pieces you need Normally, you want the full stable release

› There are also “contrib” modules for non-standard parts of Condor For example, the View Server

Page 169: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

169www.cs.wisc.edu/condor

Downloading Condor

› Distributed as compressed “tar” files

› Once you download, unpack them

Page 170: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

170www.cs.wisc.edu/condor

Install the Release Directory

› In the directory where you unpacked the tar file, you’ll find a release.tar file with all the binaries and libraries

› Use condor_install or condor_configure

›condor_install will install this as the release directory for you

Page 171: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

171www.cs.wisc.edu/condor

condor_install

› Our old installation script

› Interactive

› Overly complex

Page 172: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

172www.cs.wisc.edu/condor

condor_configure

› New script

› Handles installation and reconfiguration

condor_configure --install --install-dir=/nfs/opt/condor --local-dir=/var/condor --owner=condor

Page 173: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

173www.cs.wisc.edu/condor

Install the Release Directory

› In a pool with a shared release directory, you should run condor_install somewhere with write access to the shared directory

› You need a separate release directory for each platform!

Page 174: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

174www.cs.wisc.edu/condor

Setup the Central Manager

› Central manager needs specific configuration to start the condor_collector and condor_negotiator condor_configure --type=manager

Page 175: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

175www.cs.wisc.edu/condor

Setup Additional Machines

› If you have a shared file system, just run condor_init on any other machine you wish to add to your pool

› Without a shared file system, you must run condor_install on each host

Page 176: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

176www.cs.wisc.edu/condor

Spawn the Condor daemons

› Run condor_master to start Condor Remember to start as root if desired

› Start Condor on the central manager first

› Add Condor to your boot scripts? We provide a “SysV-style” init script (<release>/etc/examples/condor.boot)

Page 177: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

177www.cs.wisc.edu/condor

Shared Release Directory

› Simplifies administration

Page 178: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

178www.cs.wisc.edu/condor

Shared Release Directory

› Unifies configuration files, simplifying changes Same shared global config file for all

machines All local config files visible in one

place• Can symlink local files for multiple

machines to a single file

Page 179: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

179www.cs.wisc.edu/condor

Shared Release Directory

› Keep all of your binaries in one place Prevents having different versions

accidentally left on different machines Easier to upgrade

Page 180: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

180www.cs.wisc.edu/condor

Condor-G Special Notes

› Condor-G should work out of the box

› Globus can push several limits, consider increasing: /proc/sys/fs/file-max /proc/sys/net/ipv4/ip_local_port_range Per process file descriptor limits

http://www.cs.wisc.edu/condor/condorg/linux_scalability.html

Page 181: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

181www.cs.wisc.edu/condor

“Full Installation” of condor_compile

› condor_compile re-links user jobs with Condor libraries to create “standard” jobs.

› By default, only works with certain commands (gcc, g++, g77, cc, CC, f77, f90, ld)

› With a “full-installation”, works with any command (notably, make)

Page 182: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

182www.cs.wisc.edu/condor

“Full Installation” of condor_compile

› Move real ld binary, the linker, to ld.real Location of ld varies between systems,

typically /bin/ld

› Install Condor’s ld script in its place› Transparently passes to ld.real by

default; during condor_compile hooks in Condor libraries.

Page 183: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

183www.cs.wisc.edu/condor

Other Installation Options

› VDT – Virtual Data Toolkit PacMan installer Includes other Grid software http://www.lsc-group.phys.uwm.edu/

vdt/

› RPM

Page 184: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

184www.cs.wisc.edu/condor

Other Sources

› Condor Manual

› Condor Web Site

› condor-users mailing listhttp://www.cs.wisc.edu/condor/mail-

lists/

[email protected]

Page 185: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

185www.cs.wisc.edu/condor

Publications “Condor - A Distributed Job Scheduler,”

Beowulf Cluster Computing with Linux, MIT Press, 2002

“Condor and the Grid,” Grid Computing: Making the Global Infrastructure a Reality, John Wiley & Sons, 2003

These chapters and other publications available online at our web site

Page 186: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

186www.cs.wisc.edu/condor

Thank you!

http://www.cs.wisc.edu/[email protected]

Page 187: Alan De Smet Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu  Condor Administration.

187www.cs.wisc.edu/condor

Changes

› Changes since this talk was originally given: References to D_SECONDS debug

level removed, it’s automatic in Condor 6.4 and later.