Computing Performance Tests
This document summarizes performance tests done on various software and
hardware components used in the
KP3 computing environment.
- NFS performance on Linux (update Feb. 99) - February 19th, 1999
- adsmcli Retrieve Performance - June 10th, 1998
- C++ Compiler Performance on AIX and Linux - May 22nd, 1998
- NFS performance on AIX and Linux - April 8th, 1998
- adsmcli Archive Performance - August 19th, 1996
- DLT2000 SMFS Positioning Performance - September 17th, 1995
- TMSCP and AXP602 throughput - September 8th, 1995
- Using BACKUP with DLT2000 - May 7th, 1995
- PAW on various AXP's - May 6th, 1995
- Disk I/O performance on AXP - April 29th, 1995
- COPY/EXTENSION - April 15th, 1995
- VMS Virtual I/O Cache - September 10th, 1994
- CERN HTTPD 3.0pre6vms3 - August 15th, 1994
- DEC FORTRAN on AXP - July 15th, 1994
- DLT2000 I/O Performance - July 14th, 1994 and May 9th, 1995
- The FDDI MVS Connection: FTP throughput - July 9th, 1994
NFS performance on Linux (update Feb. 99) - February 19th, 1999
Test Environment
Results
Client  Operation  Block    r/wsize=8k        r/wsize=default
                   size     Time    kb/sec    Time    kb/sec
---------------------------------------------------------------
linux1  write /s    1k       8.15s   1256      7.93s   1291
lxi003  write /s    1k       8.04s   1273      7.33s   1396
linux1  write /s    2k       4.97s   2060
linux1  write /s    4k       3.59s   2852
linux1  write /s    8k       2.48s   4129
linux1  write /s   16k       2.50s   4096
linux1  write /s   32k       2.48s   4129
linux1  write /s   64k       2.48s   4129      7.58s   1350
lxi003  write /s   64k       2.56s   4000      6.87s   1490
linux1  read /s     1k       1.68s   6095      6.83s   1499
lxi003  read /s     1k       1.86s   5505      6.56s   1560
linux1  read /s     8k       1.65s   6206
linux1  read /s    16k       1.57s   6522
linux1  read /s    32k       1.63s   6282
linux1  read /s    64k       8.53s   1200      6.74s   1519
lxi003  read /s     8k       1.57s   6522
lxi003  read /s    16k       1.61s   6360
lxi003  read /s    32k       2.30s   4452
lxi003  read /s    64k      10.57s    968      6.70s   1528
Conclusions
- Specifying rsize=8192,wsize=8192 is of paramount importance
  for getting good performance (see the example mount command after this
  list). Mounting with default parameters results in a maximal transfer
  rate of about 1.5 Mb/sec, regardless of blocking. With an 8 kb request
  size one can achieve a 4 Mb/sec write and over 6 Mb/sec read rate.
- The block size used by the application has an effect on the I/O rate.
  For writes, any block size of 8 kb and above gives about 4 Mb/sec, while
  smaller block sizes slow things down. For reads, there is no observable
  penalty for small block sizes, thanks to the read-ahead performed by the
  nfsiod's. However, there is a huge performance drop for a block
  size of 64 kb.
- The already good Linux-Linux performance seen in the tests
  done in April 1998 on older (and slower) hardware is now
  significantly improved.
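For reference, an NFS mount with the recommended request sizes looks like
this (server, export, and mount point are examples taken from the April
1998 test below; adapt as needed):
      # example mount with 8 kb NFS request sizes
      mount -t nfs -o rsize=8192,wsize=8192 linux0:/home1 /u/fopi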
adsmcli Retrieve Performance - June 10th, 1998
Test Environment
- Done June 10th-11th, 1998
- The ADSM dataset /indra/egsi/egsi_4011.dat (16k records, 128450560 bytes)
was accessed and thus staged to disk prior to the following tests.
Command: time adsmcli retrieve egsi_4011.dat indra egsi
- The dataset was retrieved twice to the local /tmp disk.
- The node axp607 has an FDDI attachment; linux1 has 100 Mbit Ethernet.
Results
Node    Data rates (kb/sec)                     CPU time
        10-jun       11-jun       13-jun        (sec)
----------------------------------------------------------
axp607  1629,1817    4325,4046    4645,4645     4.7- 5.2
linux1  1587,1629    4046,5453    5701,4480     6.2-10.3
clri6f  2133,1945    6472,6419    6453,6443     8.9-19.4
sp2a05  2268,2456    3064,3111    4415,6934     6.9- 7.6
sp2b05  1161,1057    11929,5530   5217,4570     3.7- 4.1
C++ Compiler Performance on AIX and Linux - May 22nd, 1998
Test Environment
- Done May 29th, 1998
- Compiled and linked two applications written in C++ and C:
- ISM: a simple stand-alone application:
4 C++ and 6 C compilation units, total of 2700 lines.
- ibr: a ROOT application
1 C++ compilation unit, 2600 lines.
- Compilers and options
Linux: g++ and gcc
AIX: xlC and xlc
- The systems had in all cases plenty of idle time. The compilations were
  done twice and the time of the second run was taken (to avoid biases due
  to different cache histories).
Results
Code  Node    Command     Compile options  Real time  User+sys time
--------------------------------------------------------------------
ISM   linux1  make        -O                  5.3s        4.7s
      linux1  make -j 2   -O                  3.3s        5.0s
      clri6g  make        -O                 21.1s       17.5s
ibr   linux1  make        -O                 12.4s       10.6s
      linux1  make                           11.2s        8.8s
      clri6g  make        -O               2m10.3s     1m52.2s
      clri6g  make                           21.2s       14.9s
Conclusions
The xlC compiler under AIX is substantially slower than gcc under Linux
if optimization (-O) is enabled. For test builds it is therefore highly
advisable to compile without optimization under AIX. Under Linux the
difference is much smaller.
In the weeks before May 22nd, xlC under AIX had a problem with the license
manager configuration, leading to real times of 4-5 minutes for the two
test compilations. This has been resolved.
NFS performance on AIX and Linux - April 8th, 1998
Test Environment
- Done April 8th, 1998
- Wrote 1024 kbyte of data to a disk with the dd command, using
  block sizes of 1, 8, and 64 kbytes:
dd if=/dev/zero of=dst-path bs=1k count=1024
dd if=/dev/zero of=dst-path bs=8k count=128
dd if=/dev/zero of=dst-path bs=64k count=16
- Tested the following combinations of source and destination:
Client Destination Server File System
------------------------------------------------------------------
linux1 /u/fopi (Linux) linux0:/home1
linux1 /d/fopi/data21 (AIX) clri6j:/nfs/clri6k/d_fopi/data21
linux1 ~/AIX (AIX) filesv2:/userfs/userc06
clri6f /d/fopi/data21 (AIX) clri6j:/nfs/clri6k/d_fopi/data21
clri6f /u/fopi (AIX) filesv2:/userfs/userc06
Results
Client Destination Server Block Time kb/sec
Size
--------------------------------------------------------
linux1 /u/fopi Linux 1k 1.01s 1013
8k 0.33s 3103
64k 0.34s 3011
linux1 /d/fopi/data21 AIX 1k 49.83s 21
8k 6.70s 152
64k 6.71s 152
linux1 ~/AIX AIX 1k 16.17s 63
8k 2.38s 430
64k 2.37s 432
clri6f /d/fopi/data21 AIX 1k 8.55s 119
8k 1.40s 731
64k 1.46s 701
clri6f /u/fopi AIX 1k 3.28s 312
8k 0.58s 1765
64k 0.67s 1528
For comparison: the data rate to a local hard disk is 6.1 Mb/sec, to a
floppy 28 kb/sec.
See also: NFS performance on Linux (update Feb. 99).
adsmcli Archive Performance - August 19th, 1996
Test Environment
- Done August 19th, 1996
- Many small files with a typical size of 4 Mbyte were archived with adsmcli
and written directly to TAPE.
- In detail:
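  A minimal sketch of such an archive loop (file names, filespace, and
  directory arguments are hypothetical; the adsmcli syntax is inferred from
  the retrieve test above):
      # archive each run file in turn; names are made up for illustration
      for f in run_*.lmd; do
          time adsmcli archive $f indra egsi
      done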
Results
The job executed in 8010 sec, transferring 11.89 Gbyte. The average transfer
rate was 1.48 Mbyte/sec. The average CPU load was about 25%.
DLT2000 SMFS Positioning Performance - September 17th, 1995
Test Environment
- Done September 17th, 1995
- The 91 files on a DLT2000 tape, holding a total of 10 Gbyte of data, were
  opened through SMFS in random order to read the first record. The time to
  access the first record was measured.
- In detail:
- Used tape B04011 containing the files s117_1932.lmd
to s117_2022.lmd.
- The tape had been imported into a SMFS file system, all file accesses
were done through SMFS.
  - A random permutation of the file name sequence was determined; the
    scrambled list thus contained each file once, but in random order.
  - A DUMP/REC=COUNT=1 command was used to access the first
    record of each file (see the example after this list).
- Note: This test determines positioning times only. All files
were on the same volume, thus mount and load times did not play any
role.
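For a single file the access then looks as follows (the file specification
under SMFS is a placeholder here; the file name is one of the range above):
      $ DUMP /REC=COUNT=1  SMFS_DISK:[B04011]S117_1960.LMD   ! placeholder path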
Results
The distribution of access times in seconds for the random access of 91 files
is shown in the histogram below. The average access time was 35.9 seconds.
0 - 9 4 ****
10 - 19 18 ******************
20 - 29 19 *******************
30 - 39 17 ***************** Average 35.9 sec
40 - 49 10 **********
50 - 59 10 **********
60 - 69 9 *********
70 - 79 2 **
80 - 89 2 **
Conclusions
- SMFS can access files on a DLT2000 volume with an average positioning time
  of about 36 seconds.
- This agrees well with the DLT2000 specification of 45 sec average and under
90 sec worst case positioning time for the LOCATE (2Bh) SCSI
command.
- In contrast, a set mag /skip=files=31 to skip to the 10th file
  took almost 10 minutes! The `file mark cache' feature of the DLT2000
  drives seems not to help in this case.
TMSCP and AXP602 throughput - September 8th, 1995
Test Environment
- Done September 8th, 1995
- Test: Processing 4 DLT tapes with 10 Gbyte each in parallel over TMSCP
- Setup:
Results
The four jobs showed the following execution times:
Tape Data from CPU Time Execution Time Throughput
------ -------------- ----------- -------------- -----------
B04021 AXP612$MKD0: 00:43:53.43 02:25:39.86 1140 kb/sec
B04022 AXP612$MKD100: 00:48:09.34 02:30:32.51 1100 kb/sec
B04023 AXP601$MKA500: 00:47:41.63 03:04:10.67 905 kb/sec
B04024 AXP603$MKB500: 00:43:01.15 02:36:24.57 1065 kb/sec
Conclusions
Using BACKUP with DLT2000 - May 7th, 1995
Test Environment
- Done between May 7th and 10th, 1995
- Test: Image backup of a 7Gbyte disk volume set (al_pool0:)
to a DLT2000.
- Backup qualifiers: /image /media=compact/list=... (see the example
  command after this list)
- Disks: two DEC DSP5350S (6976375 blocks each) bound to
a volume set. Attached to two different SCSI ports (DKA200:
and DKB200:).
- Tape: DLT2000 attached to a third SCSI port (mkd0:)
- Node: axp612, a DEC 3000 Model 400.
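Test 1 below then corresponds to a command of roughly this form (save-set
and listing file names are made up for illustration):
      $ BACKUP /IMAGE /MEDIA=COMPACT /LIST=POOL0.LIS  AL_POOL0:  MKD0:POOL0.BCK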
Results
Test  Qualifier   /Block  Files  Alloc   Used   DIO's    CPU time  Time      Rate
                                 Mbyte   Mbyte            sec                Mbyte/s
-------------------------------------------------------------------------------------
1.    -           27648   13612   6643    6514    544646  1792      01:03:34  1.71
2.    /ver        27648   13611   6640    6514   1088509  3282      02:08:45  0.84
3.    /ver/nocrc  27648   13619   6844    6707   1119842  1130      02:06:37  0.88
4.    /nocrc      49152   13630   7144    6773    334776   662      01:03:43  1.77
Conclusions
- Backup at full DLT2000 speed with default CRC and redundancy generation
takes about 50% of the CPU power of a 4/166 class AXP system.
- Using the /nocrc qualifier reduces the CPU usage by a factor of 2.8,
  from 0.275 sec/Mbyte to 0.098 sec/Mbyte.
- The verification pass (/veri) almost exactly halves the throughput.
PAW on various AXP's - May 6th, 1995
Test Environment
- Done on May 6th, 1995
- Test: Ran paw_s117_x11 with soa for the first 5000 events
  of file s117_2267 (from al_temp0, so the file was accessed
  remotely in all cases).
Results
Node    Model              Clock  Sint  Sfp   Time      Performance
--------------------------------------------------------------------
axp634  ASta 400 4/233     233    113   166   00:44.87  1.64
axp628  DEC 3000 600       175     88   165   00:52.87  1.39
axp602  AServ 2100 4/200   200     97   160   00:53.75  1.37
axp633  ASta 200 4/166     166     83   125   01:01.07  1.20
axp603  DEC 2100 A500MP                       01:02.01  1.19
axp613  DEC 3000 400       166                01:13.81  1.00  <-- norm.
axp620  DEC 3000 300                          01:24.40  0.87
axp626  DEC 3000 300X              64   102   01:21.74  0.90
axp625  DEC 3000 300LX     125     49    77   01:42.20  0.72
Conclusions
- The relative performance of the AlphaStation 400 model is better than
  one would expect from clock frequency ratios.
Disk I/O performance on AXP - April 29th, 1995
Test Environment
- Done on April 29th, 1995
- Processors:
- AXP601 (AXP-2100-4/200)
- AXP612 (AXP-3000-M400)
- AXP627 (AXP-3000-M300LX)
- VSBZ (VAX 4000-60)
- VSBL (VAX 3100-M76)
- VSAO (VAX 3100)
- Test: Allocated a 16 Mbyte file and measured the time to write and read
  1024 records of 16 kbyte with Fortran unformatted I/O (program
  test_diskspeed).
- The set rms setting was /block=96/buff=3, thus bypassing
  the virtual I/O cache (the corresponding DCL command is shown after
  this list).
- Various disk, controller and client-server combinations were tested.
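In DCL the RMS setting above corresponds to:
      $ SET RMS_DEFAULT /BLOCK_COUNT=96 /BUFFER_COUNT=3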
Results
Test Client Server Net Disk Type Size Control Write Read
Gb Mb/s Mb/s
1. AXP601 local - $1$DIA130 RF72 2.0 DSSI 1.27 1.45
2. AXP601 local - $1$DIA440 RF72 4.0 DSSI 1.31 1.42
3. AXP601 AXP612 FDDI $12$DKC0 Sea ST15150N 4.0 SCSI-int 2.35 2.90
4. AXP601 AXP612 FDDI $12$DKA200 DEC DSP5350S 3.5 SCSI-ext 1.53 2.84
5. AXP627 AXP601 Ethe $1$DIA130 RF72 2.0 DSSI 0.83 0.68
6. AXP627 AXP612 Ethe $12$DKC0 Sea ST15150N 4.0 SCSI-int 0.94 0.82
7. AXP612 AXP601 FDDI $1$DIA130 RF72 2.0 DSSI 1.20 1.26
8. AXP612 local - $12$DKC300 RZ26 1.0 SCSI-int 1.46 2.24
9. AXP612 local - $12$DKC0 Sea ST15150N 4.0 SCSI-int 3.01 3.62
10. AXP612 local - $12$DKA200 DEC DSP5350S 3.5 SCSI-ext 1.86 3.23
11. AXP612 local - $12$DKD200 DEC DSP5200S 2.0 SCSI-int 1.15 1.89
12. VSBZ local - $9$DKA400 ?generic? 2.0 SCSI-ext 1.92 1.81
13. VSBZ local - $9$DKA0 RZ28 2.0 SCSI-int 2.52 2.02
14. VSBZ local - $9$DKA100 RZ25 0.4 SCSI-int 1.23 1.87
15. VSBL local - $55$DKB0 ?generic? 1.0 SCSI-ext 0.47 0.43
16. VSAO local - $25$DKA100 ?generic? 0.6 SCSI-ext 0.50 0.44
Conclusions
- The DSSI controller is a serious bottleneck. The remote access from
  AXP601 to a `normal' SCSI disk on AXP612 over FDDI is
  about twice as fast as the local access to a disk on a DSSI-to-SCSI
  controller!
COPY/EXTENSION - April 15th, 1995
Test Environment
- Done on April 15th, 1995
- Processor: AXP612, an AXP-3000-M400
- Test Job: Copied a raw data file (S117_0131.LMD) with 234400
blocks from a DLT2000 drive to a local Seagate ST15150N disk
(al_data0).
- With the default set rms settings (block = 32, tape buffer = 6 and
  disk buffer = 3) a copy was done with and without /extension,
  as sketched after this list.
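The faster variant then corresponds to a command of the form (tape device
and destination directory are placeholders):
      $ COPY /EXTENSION=8192  MKD0:S117_0131.LMD  AL_DATA0:[DST_DIR]S117_0131.LMD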
Results
Test       DIO's  BIO's  FCP rate  CPU time  Time     Data rate
------------------------------------------------------------------
no /ext    12854   1844   13.0      12.70    2:30.10   781 kb/sec
/ext=8192  11136     41    0.33      6.62    1:33.26  1256 kb/sec
Conclusions
- Using /ext=8192 is highly recommendable; it
  - improves the transfer rate by 60%
  - reduces the CPU load by 50%
- The DLT2000 streams with /ext=8192, whereas it has to
  stop and reposition frequently otherwise.
- The default extent quantity on al_data0 seems to be 128 blocks.
VMS Virtual I/O Cache - September 10th, 1994
Test Environment
- Done on September 10th, 1994
- Processor: AXP612, an AXP-3000-M400
- Test Job: $ gltest comp ref_alib:am_pck*.for /nodeb
Compile 18 files, total of 1821 lines with 169 includes.
- Varied the RMS block count (set rms/blo) and the PFCDEFAULT
  sysgen parameter (example commands after this list)
- The cache performance was measured with show mem/cache/full,
determining the differential Read IO count, Read Hit rate and
bypass rate.
- The FORTRAN compiler dec_fortran was reinstalled after
  PFCDEFAULT was changed.
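The two parameters were changed with commands of the following form (a
sketch; the SYSGEN change requires suitable privileges and affects the
running system):
      $ SET RMS_DEFAULT /BLOCK_COUNT=32      ! RMS block count
      $ RUN SYS$SYSTEM:SYSGEN
      SYSGEN> USE ACTIVE
      SYSGEN> SET PFCDEFAULT 32              ! page fault cluster default
      SYSGEN> WRITE ACTIVE
      SYSGEN> EXIT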
Results
Test rms/blo PFC Read IO Hit Bypass Elapsed time
1 64 64 2726 78% 11.7% 48.19 sec
2 32 64 2632 89% 9.8% 43.25 sec
3 32 32 2550 91% 8.8% 45.90 sec
Conclusions
- Setting the rms block_count to 32, below the cache cutoff of 35 blocks,
improves performance somewhat.
- Setting PFCDEFAULT to 32 doesn't improve performance. Page fault
I/O's are probably always bypassing the virtual I/O cache.
- Bottom line: The cache doesn't help much, because image activation is
  what usually slows interactive sessions down!
CERN HTTPD 3.0pre6vms3 - August 15th, 1994
Test Environment
- Done on August 15th, 1994
- Server on VSBZ, a VAX 4000 Model 60 under VMS V5.5
- Mosaic client on VSBZ or AXP612, an AXP-3000-M400
  under VMS 6.1
- The display was in all cases on an X terminal XWTAU, a Tek-XP338.
- The test was reloading /www/img/gif_summary.html, a document with 40
  inlined GIF images, altogether about 33 kbytes.
- Tested server configurations (the DNSLookup directive is shown after
  this list):
- old CERN Server, version 2.14
- new CERN Server, version 3.0pre6vms3 with DNSLookup On
- new CERN Server, version 3.0pre6vms3 with DNSLookup Off
- The message line in Mosaic was occluded in some tests, visible in others.
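Reverse name lookups are controlled by a single directive in the CERN
server configuration file:
      DNSLookup Off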
Results
Test Server DNSLookup Message VSBZ AXP612
1 2.14 -- occluded 89 154
2 3.0pre6vms3 On occluded 46 122
3 3.0pre6vms3 Off occluded 44 23
4 3.0pre6vms3 Off visible 67 68
Conclusions
- The 3.0 server with DNSLookup On is faster than the 2.14 server,
  which reverse-translated IP addresses by default.
- Switching the reverse IP address translation off with DNSLookup Off
  gives a substantial performance improvement if client and server execute
  on different machines.
- The frequent update of the Mosaic status display slows things down too.
Note: It turned out on August 18th that the UCX setup was wrong. The primary
name server never responded, causing a 1-second delay for the fail-over to
the secondary name server. This explains much of the DNS bottleneck...
DEC FORTRAN on AXP - July 15th, 1994
Test Environment
- Done on July 15, 1994
- AXP612, a DEC 3000 Model 400
- DEC Fortran V6.2-508
- Compiled the ALADIN MUSIC software (ref_alib:am*.for) with various
compiler options and compared the execution time for the analysis of the
first 500 events of S114 run 2631.
- In tests 1-3 all 153 source files were compiled separately, in tests 4-7
  just one concatenated source was compiled (see the example command below).
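Test 3, for instance, corresponds to compile commands of the form (a sketch):
      $ FORTRAN /OPTIMIZE=(LEVEL=5) REF_ALIB:AM*.FOR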
Results
Test Time (sec) Fortran qualifiers Comments
1 29.24 - none - old objects from openVMS 1.5
2 28.94 - none - re-compiled with Fortran V6.2-508
3 28.54 /opt=(lev=5)
4 28.20 - none - 1 source file
5 29.50 /opt=(unro=8) 1 source file
6 30.70 /opt=(unro=1) 1 source file
7 101.14 /noopt 1 source file
Conclusions
- The default is fine. /OPT=LEVEL=5 doesn't increase performance.
- Messing with UNROLL doesn't pay off.
DLT2000 I/O Performance - July 14th, 1994 and May 9th, 1995
Test Environment
- Done on July 14, 1994 and 9th May, 1995
- AXP612, a DEC 3000 Model 400
- Tape connected to MKB500: (SCSI-int, 7/94 test) or MKD500:
  (SCSI-ext, 5/95 test).
- Wrote a single S114 or S117 RAW data file repeatedly until the cartridge
  was full.
S114: run 2880 266832 blocks, 136.6 Mbyte (130.3 Mb)
S117: run 2267 234400 blocks, 120.0 Mbyte (114.4 Mb)
Results
Test Data Compress #files DataVolume TimeNeeded DataRate
1. S114 no 73 9.973 Gbyte 2.34 hr 1.18 Mbyte/sec
2. S114 yes 89 12.160 Gbyte 2.36 hr 1.43 Mbyte/sec
3. S117 no 84 10.081 Gbyte 2.34 hr 1.19 Mbyte/sec
4. S117 yes 90 10.801 Gbyte 2.33 hr 1.28 Mbyte/sec
5. S117 yes 89 10.681 Gbyte 2.33 hr 1.27 Mbyte/sec
Conclusions
- The rated capacity of 10Gbyte is exactly reached in uncompressed
mode.
- The improvement reached with compression is 21% for S114 and only 5.9%
for S117 raw data, thus not very important.
- Note: The positioning (file search) is quite slow. It took 62 minutes
  to skip to the logical end of volume on a volume with 7.9 Gbyte of data
  on it! The Sequential Media File System SMFS will improve this (see the
  SMFS Positioning test).
The FDDI MVS Connection: FTP throughput - July 9th, 1994
Test Environment
- Done on July 9, 1994
- AXP612, a DEC 3000 Model 400; FDDI connected directly to GigaSwitch
- DEC TCP/IP Services for OpenVMS AXP Version 3.1
- FTPSERVC IBM MVS V2R2.1
- FTP client on AXP612, connection to MVS16
- The IP routes on MVS16 and AXP612 were set to use
the CTC connection between MVS16 and rzri6f
- Transferred a 19644416-byte dataset (LMDV.S114NMU.RAW4900) in
  binary mode to a local disk on AXP612
- Varied the following parameters (example commands after this list):
- Socket window size (logical name UCX$FTP_WNDSIZ)
- File extend size (with set rms/ext=nn)
- RMS buffering (with set rms /blo=nn /buf=nn)
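Test 4 below, for instance, corresponds to settings of the form:
      $ DEFINE UCX$FTP_WNDSIZ 32768
      $ SET RMS_DEFAULT /BLOCK_COUNT=112 /BUFFER_COUNT=4 /EXTEND_QUANTITY=2048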
Results
Test /blo /buf /ext WNDSIZ time rate Comment
(sec) (kb/sec)
--------------------------------------------------------------
1 64 3 0 9216 50.83 377 default settings...
2 112 4 2048 16384 30.32 632
3 112 4 2048 24576 23.25 825
4 112 4 2048 32768 20.87 919
5 112 4 0 32768 29.21 656
6 16 3 2048 32768 20.25 947
7 8 3 2048 32768 20.84 920
8 64 3 2048 24576 23.66 810 No other traffic
9 64 3 2048 24576 44.34 432 also transfer to VSBZ
Conclusions
- One achieves a transfer rate of 900-950 kb/sec with an extend
  size of 2048 blocks and a window size of 32 kbytes (Test 4).
- The transfer rate improves with increasing window size (Tests 2-4).
  32k, however, is the maximum accepted by the MVS server; larger values
  result in the default 9k window size!
- Using a large file extend size improves the transfer rate (Tests 4,5).
- The RMS block size setting has no effect on FTP (Tests 4,6,7).
- A parallel FTP transfer to node VSBZ, also routed through the
  CTC connection and rzri6f but using Ethernet on the last
  hop, reduces the rate by almost a factor of 2 (Tests 8,9)!
Under optimized conditions one gets 900-950 kb/sec transfer rate. The
bottleneck seems to be the CTC connection between mvs16 and
rzri6f.
Last updated: February 19th, 1999
Walter F.J. Müller