Using Virtual Functions on the XMT
Greg Mackey

1. Introduction

This experiment measures the cost of using virtual functions on the XMT. Earlier attempts to use virtual functions at Sandia several years ago resulted in poor performance. The experiment uses a simple example to see whether virtual functions are a viable option on the XMT. The code for this experiment is virtual_test.cpp. The canal output from compiling this code on an XMT 1 with software release 1.5 is virtual_test.canal.

2. The Classes

  • Polygon – children add new interface, but don’t modify parent’s
    • Square
    • Triangle
  • VirtualPolygon – children modify parent’s virtual function
    • VirtualSquare
    • VirtualTriangle

The functions area() and no_inline_area() are defined in the child classes while height_times_width() and set_dimensions() are defined in the parent classes.
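The two class families described above can be sketched as follows. This is a minimal illustration and not the contents of virtual_test.cpp: the integer members, the triangle formula, and the pure virtual area() in VirtualPolygon are assumptions, chosen so that 1 x 2 shapes give an area of 2 per Square and 1 per Triangle, which is consistent with the sums printed in the results. no_inline_area() is left out because the source does not say how inlining was suppressed.

```cpp
#include <cassert>

// Non-virtual family: children add area(); the parent's interface is unchanged.
class Polygon {
public:
  Polygon() : height(0), width(0) {}
  void set_dimensions(int h, int w) { height = h; width = w; }
  int height_times_width() const { return height * width; }
protected:
  int height;  // assumed member names and types
  int width;
};

class Square : public Polygon {
public:
  int area() const { return height * width; }
};

class Triangle : public Polygon {
public:
  int area() const { return height * width / 2; }
};

// Virtual family: children override the parent's virtual area().
class VirtualPolygon {
public:
  VirtualPolygon() : height(0), width(0) {}
  virtual ~VirtualPolygon() {}
  void set_dimensions(int h, int w) { height = h; width = w; }
  int height_times_width() const { return height * width; }
  virtual int area() const = 0;  // assumed to be pure virtual
protected:
  int height;
  int width;
};

class VirtualSquare : public VirtualPolygon {
public:
  virtual int area() const { return height * width; }
};

class VirtualTriangle : public VirtualPolygon {
public:
  virtual int area() const { return height * width / 2; }
};
```

With this layout, a Polygon pointer cannot call area() without a cast to the child type, while a VirtualPolygon pointer can call it directly.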

3. Test Sections

  • Square Array
    • Operations are performed on an array of Square.
    • This is the baseline.
  • Square Pointer Array
    • Operations are performed on an array of pointers to Square.
    • This shows the cost of an array of pointers to objects as compared to an array of objects as in the Square Array section.
  • Polygon Array Normal Inheritance
    • Operations are performed on an array of pointers to Polygon.
    • This shows the cost of an array of pointers to parent objects as compared to an array of pointers to child objects as in the Square Pointer Array section.
  • Polygon Array Virtual Inheritance
    • Operations are performed on an array of pointers to VirtualPolygon.
    • This shows the cost of an array of pointers to parent objects with virtual functions as compared to an array of pointers to parent objects that don’t have virtual functions as in the Polygon Array Normal Inheritance section.

The array size is the same for all sections. In the Polygon sections, the array is initialized with alternating Square and Triangle objects. Also in these sections, Square / Triangle Init and Square / Triangle area() operate on only the Squares or only the Triangles, so each operates on half the elements in the array. Additionally, Square / Triangle area() in the Normal Inheritance section uses static_cast to convert each pointer to the child type before calling area().
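The Normal Inheritance setup described above might look like the sketch below. It is illustrative only: the helper names make_alternating, sum_square_areas, sum_triangle_areas, and destroy are hypothetical, as is the choice of Squares at even indices and 1 x 2 dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Polygon {
  int height, width;
  Polygon() : height(0), width(0) {}
  void set_dimensions(int h, int w) { height = h; width = w; }
};
struct Square : Polygon {
  int area() const { return height * width; }
};
struct Triangle : Polygon {
  int area() const { return height * width / 2; }
};

// Fill the array with alternating children (assumed: Squares at even indices).
std::vector<Polygon*> make_alternating(std::size_t n) {
  std::vector<Polygon*> polys(n);
  for (std::size_t i = 0; i < n; ++i) {
    if (i % 2 == 0) polys[i] = new Square; else polys[i] = new Triangle;
    polys[i]->set_dimensions(1, 2);
  }
  return polys;
}

// Visit only the Squares: half the array, cast to the child type before area().
long sum_square_areas(const std::vector<Polygon*>& polys) {
  long sum = 0;
  for (std::size_t i = 0; i < polys.size(); i += 2)
    sum += static_cast<const Square*>(polys[i])->area();
  return sum;
}

// Visit only the Triangles: the other half of the array.
long sum_triangle_areas(const std::vector<Polygon*>& polys) {
  long sum = 0;
  for (std::size_t i = 1; i < polys.size(); i += 2)
    sum += static_cast<const Triangle*>(polys[i])->area();
  return sum;
}

// Polygon has no virtual destructor here, so delete through the child type.
void destroy(std::vector<Polygon*>& polys) {
  for (std::size_t i = 0; i < polys.size(); ++i) {
    if (i % 2 == 0) delete static_cast<Square*>(polys[i]);
    else delete static_cast<Triangle*>(polys[i]);
  }
  polys.clear();
}
```

The static_cast is safe here only because the index parity tells us the real type of each element. The non-virtual classes give no way to check that at run time.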

4. Test Results and Discussion

The tests use a test size of 1000000. Note that I give results for only a single run of the tests. Results vary slightly from run to run, but every run supports the same broad conclusions about the costs of the various features.

4.1 Compiled Using gcc 4.2 with -O2 on a Mac Pro

The following results show a baseline for using various features with a good modern compiler on a serial system. Comparing the Square Pointer Array results to the Square Array results, we see that pointer indirection increases initialization time by about 4x and destruction time by about 30x in the run shown below. Pointer indirection adds around 30% to the cost of calling functions on the classes.

Comparing the Polygon Array Normal Inheritance results to the Square Pointer Array results, we see that calling parent functions on an array of child objects stored as parent object pointers adds no cost to calling the functions. Calling child functions when statically casting the pointers to child types also adds no cost to calling the functions.

Comparing the Polygon Array Virtual Inheritance results to the Polygon Array Normal Inheritance results, we see that initialization and destruction times are somewhat higher, which is probably due to having to deal with the virtual table. Calling any parent or child function on an array of child objects stored as parent object pointers costs 50% to 100% more. Statically casting the parent pointers to child pointers and calling the child’s implementation of the virtual function doesn’t reduce the cost of calling the function. These results for virtual functions are probably due to how the compiler handles objects with virtual functions.
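One plausible reason the static cast does not help is that, in C++, a call made through a pointer to the child type is still a virtual call unless the compiler can prove the dynamic type. The sketch below makes that visible. DoubledSquare and both helper functions are hypothetical names introduced only for this illustration, and the fixed return values stand in for real area computations.

```cpp
#include <cassert>

struct VirtualPolygon {
  virtual ~VirtualPolygon() {}
  virtual int area() const = 0;
};
struct VirtualSquare : VirtualPolygon {
  virtual int area() const { return 2; }
};
// Hypothetical further-derived class, used only to expose the dispatch.
struct DoubledSquare : VirtualSquare {
  virtual int area() const { return 4; }
};

// Cast to the child type, then call area(): still dispatched virtually.
int area_via_cast(const VirtualPolygon* p) {
  return static_cast<const VirtualSquare*>(p)->area();
}

// A qualified call names one implementation and bypasses virtual dispatch.
int area_qualified(const VirtualPolygon* p) {
  return static_cast<const VirtualSquare*>(p)->VirtualSquare::area();
}
```

For a DoubledSquare object, area_via_cast() returns 4 (the most derived override) while area_qualified() returns 2. Whether a qualified call would actually be cheaper on either platform was not measured in this experiment.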

Square Array:
                Square Init:  0.013782
           set_dimensions():  0.005853
       height_times_width():  0.005032    2000000
                     area():  0.005337    2000000
              Square Delete:  0.003479

Square Pointer Array:
                Square Init:  0.057156
           set_dimensions():  0.007643
       height_times_width():  0.006819    2000000
                     area():  0.006699    2000000
              Square Delete:  0.104700

Polygon Array Normal Inheritance:
                Square Init:  0.027229
              Triangle Init:  0.029506
           set_dimensions():  0.007750
       height_times_width():  0.006867    2000000
              Square area():  0.004432    1000000
            Triangle area():  0.004482    500000
                Poly Delete:  0.114717

Polygon Array Virtual Inheritance:
                Square Init:  0.034639
              Triangle Init:  0.038253
           set_dimensions():  0.013812
       height_times_width():  0.011250    2000000
                     area():  0.011079    1500000
              Square area():  0.006966    1000000
            Triangle area():  0.006734    500000
                Poly Delete:  0.121331



4.2 Compiled on an XMT 1 using 1.5 Software

The following results show the costs of using various features with the XMT 1.5 compiler on an XMT 1. Comparing the Square Pointer Array results to the Square Array results, we see that pointer indirection increases initialization time by about 27x on 1 processor (about 22x on 10 processors), and destruction time goes from almost nothing to more than half the cost of initialization. These increases are mostly because allocating and deallocating memory on the XMT is not cheap. Pointer indirection adds around 50% to the cost of calling functions on the classes.

Comparing the Polygon Array Normal Inheritance results to the Square Pointer Array results, we see that calling parent functions on an array of child objects stored as parent object pointers adds no cost to calling the functions. Calling child functions when statically casting the pointers to child types also adds no cost to calling the functions. Comparing the inlined to the non-inlined versions of Square / Triangle area(), we see that in this case not inlining the function roughly doubles its cost. For the non-inlined version, the user must also use the assert parallel pragma. The compiler won’t parallelize the loop otherwise because it can’t tell whether the called function has side effects.
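The pragma placement can be sketched as below. This is an assumption-laden illustration, not code from virtual_test.cpp: the free function no_inline_area, the helper compute_areas, and the guard on the __MTA__ macro (assumed to be defined by the XMT compiler) are all introduced here so that other compilers skip the pragma. Defining the function after the loop only stands in for whatever mechanism the original code used to prevent inlining.

```cpp
#include <cassert>
#include <cstddef>

// Declared here, defined after the loop, as a stand-in for a function the
// compiler does not inline at the call site.
int no_inline_area(int height, int width);

// Each iteration writes only out[i], so the iterations are independent.
void compute_areas(const int* heights, const int* widths, int* out,
                   std::size_t n) {
  assert(heights != 0 && widths != 0 && out != 0);
#ifdef __MTA__
#pragma mta assert parallel
#endif
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = no_inline_area(heights[i], widths[i]);
  }
}

int no_inline_area(int height, int width) { return height * width; }
```

The pragma is a promise from the programmer that the iterations can safely run concurrently, so it should only be added when the called function really has no conflicting side effects.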

Comparing the Polygon Array Virtual Inheritance results to the Polygon Array Normal Inheritance results, we see that the initialization time is a little higher, which is probably due to having to deal with the virtual table. Non-virtual functions have the same cost whether or not the object contains a virtual function. For comparison, the non-virtual cost of calling area() on the whole array in the Polygon Array Normal Inheritance section is the sum of the Square area() and Triangle area() times, since each covers half the array. The virtual function area() costs about 5x as much as its non-virtual counterpart. The user must use the assert parallel pragma to get the compiler to parallelize the loop containing the virtual function. Statically casting the parent pointers to child pointers and calling the child’s implementation of the virtual function doesn’t reduce the cost of calling the function, and the user must still use the assert parallel pragma to get the compiler to parallelize the loop.
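The 5x figure follows from comparing the virtual area() time with the sum of the two non-virtual half-array times. The helper below is a hypothetical name that simply spells out this arithmetic.

```cpp
#include <cassert>

// Ratio of the virtual area() time over the whole array to the combined
// non-virtual Square area() and Triangle area() times (half the array each).
double virtual_cost_ratio(double virtual_area_time,
                          double square_area_time,
                          double triangle_area_time) {
  const double non_virtual_time = square_area_time + triangle_area_time;
  assert(non_virtual_time > 0.0);
  return virtual_area_time / non_virtual_time;
}
```

With the 1 processor times reported below, virtual_cost_ratio(0.155269, 0.015205, 0.015213) is roughly 5.1. With the 10 processor times, virtual_cost_ratio(0.016558, 0.001982, 0.002015) is roughly 4.1.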

Comparing the 1 processor run to the 10 processor run, we see that all of the loops scale well.
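Scaling can be quantified as speedup, the 1 processor time divided by the 10 processor time, and as parallel efficiency, the speedup divided by the processor count. The two helpers below are hypothetical names for that arithmetic.

```cpp
#include <cassert>

// Speedup of a p-processor run relative to the 1-processor run.
double speedup(double time_1_proc, double time_p_procs) {
  assert(time_p_procs > 0.0);
  return time_1_proc / time_p_procs;
}

// Fraction of ideal (linear) speedup achieved on p processors.
double efficiency(double time_1_proc, double time_p_procs, int p) {
  assert(p > 0);
  return speedup(time_1_proc, time_p_procs) / p;
}
```

Using the times reported below, set_dimensions() on the Square Array gives speedup(0.018795, 0.002307), roughly 8.1, and the virtual area() loop gives speedup(0.155269, 0.016558), roughly 9.4, for efficiencies of about 0.81 and 0.94 on 10 processors.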

4.2.1 1 processor

Square Array:
                Square Init:  0.200548
           set_dimensions():  0.018795
       height_times_width():  0.019088    2000000
                     area():  0.019075    2000000
              Square Delete:  0.000465

Square Pointer Array:
                Square Init:  5.430825
           set_dimensions():  0.029308
       height_times_width():  0.029220    2000000
                     area():  0.029247    2000000
              Square Delete:  3.711512

Polygon Array Normal Inheritance:
                Square Init:  2.537096
              Triangle Init:  2.526114
           set_dimensions():  0.029644
       height_times_width():  0.029368    2000000
              Square area():  0.015205    1000000
            Triangle area():  0.015213    500000
    No Inline Square area():  0.028855    1000000
  No Inline Triangle area():  0.028800    500000
                Poly Delete:  3.717798

Polygon Array Virtual Inheritance:
                Square Init:  2.743437
              Triangle Init:  2.766702
           set_dimensions():  0.029907
       height_times_width():  0.029436    2000000
                     area():  0.155269    1500000
              Square area():  0.078456    1000000
            Triangle area():  0.078353    500000
                Poly Delete:  3.706754



4.2.2 10 processors

Square Array:
                Square Init:  0.032570
           set_dimensions():  0.002307
       height_times_width():  0.002422    2000000
                     area():  0.002416    2000000
              Square Delete:  0.000445

Square Pointer Array:
                Square Init:  0.715191
           set_dimensions():  0.003681
       height_times_width():  0.003700    2000000
                     area():  0.003695    2000000
              Square Delete:  0.368542

Polygon Array Normal Inheritance:
                Square Init:  0.322430
              Triangle Init:  0.323723
           set_dimensions():  0.003696
       height_times_width():  0.003704    2000000
              Square area():  0.001982    1000000
            Triangle area():  0.002015    500000
    No Inline Square area():  0.003837    1000000
  No Inline Triangle area():  0.003832    500000
                Poly Delete:  0.368709

Polygon Array Virtual Inheritance:
                Square Init:  0.364686
              Triangle Init:  0.360857
           set_dimensions():  0.003712
       height_times_width():  0.003761    2000000
                     area():  0.016558    1500000
              Square area():  0.008794    1000000
            Triangle area():  0.008800    500000
                Poly Delete:  0.369526



5. Conclusions

There is a sizable cost for using virtual functions on an XMT, but they can be used if necessary. Executing a virtual function took about five times as long as executing a non-virtual function, but virtual functions can be executed in parallel with an assert parallel pragma. I suspect that part of the performance hit is due to the compiler not being able to inline virtual functions and part is due to the extra operations that decide which version of the virtual function to execute.