Graphics device state change optimization

The purpose of this post is to describe a prototype algorithm intended to reduce the number of state changes submitted to the graphics device when rendering a group of meshes whose visual aspects (a set of graphics device states and shader input data) vary. Submitting state changes and shader input data is expensive, and it is a typical problem every engine tries to solve.

Any correction, be it grammatical or technical, is welcome. If you have any ideas to improve the algorithm, feel free to share them in the comments below.

The code and algorithms discussed in this post have not been tested yet; they have been elaborated from scratch and almost everything is conjecture with an actual technical basis. Think of it as thinking aloud.

 

Inputs

The algorithm receives 2 sets of data: mesh subsets and their corresponding visual aspect Ids. A mesh subset is just a vertex buffer that will be rendered using a single visual aspect. A visual aspect contains all the information required to render a mesh subset in the expected way, which includes data to be sent to the shaders pipeline and graphics device states to be changed. The structure of a visual aspect can be seen below:

Visual Aspect

  • Visual Aspect Component [ #COMPONENTS ]
    • Texture Layer [ #LAYERS ]
      • Texture Id
      • Sampler Id
      • Texture Blender
        • Blend Factor
        • Blend Operation
    • Texture Stack Size
    • Component Enabled
  • Material Id
  • Shaders Pipeline Id
  • Alpha Blender Id
  • Opacity
  • Wireframe Enabled
  • Face Culling Enabled
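As an illustration, the structure above could map to plain C++ aggregates like the following sketch. All type and field names here are hypothetical; the engine's real types may differ:

```cpp
#include <cstdint>

// Hypothetical constants matching #COMPONENTS and #LAYERS from the text.
constexpr int kComponents = 11;  // Ambient, Diffuse, Specular, Normals, ...
constexpr int kLayers = 8;       // texture layers per stack

struct TextureLayer {
    uint32_t textureId;
    uint32_t samplerId;
    uint32_t blendFactor;     // texture blender: sent to the shader
    uint32_t blendOperation;  // texture blender: sent to the shader
};

struct VisualAspectComponent {
    TextureLayer layers[kLayers];
    uint32_t textureStackSize;  // how many layers are actually used
    bool enabled;               // Component Enabled
};

struct VisualAspect {
    VisualAspectComponent components[kComponents];
    uint32_t materialId;
    uint32_t shadersPipelineId;
    uint32_t alphaBlenderId;
    float opacity;
    bool wireframeEnabled;
    bool faceCullingEnabled;
};
```

The attribute count stated below (11 × (8 × 4 + 2) + 6 = 380) falls out directly from this layout: 4 attributes per layer, plus 2 per component, plus 6 global ones.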

 

#COMPONENTS is the number of visual aspect components, which are: Ambient, Diffuse, Specular, Normals, Displacement, Height, Emission, Shininess, Reflection, Lightmap, Opacity; hence, the value of #COMPONENTS is 11.

#LAYERS is the number of texture layers in the stack, currently 8.

The total number of attributes of a visual aspect is 11 x (8 x 4 + 2) + 6 = 380.

The following table shows some additional information about every attribute. The cost is calculated taking into account the number of state change functions to call, the amount of data to copy and, in the case of Shaders Pipeline Id, the consequences of changing it (depending on the implementation of the engine, it may require always re-sending all the attributes whose Operation in the table is Send to shader). The Probability of change is just an estimation based on personal recollection and may change in the future; the higher the value, the more often that attribute is expected to differ among the existing visual aspects.

| Attribute | Operation | Cost | Probability of change |
|---|---|---|---|
| Texture Id | Change state | 1 | 6* |
| Sampler Id | Change state | 1 | 3* |
| Blend Factor | Send to shader | 1 | 3* |
| Blend Operation | Send to shader | 1 | 3* |
| Texture Stack Size | Send to shader | 2 | 5** |
| Component Enabled | Send to shader | 3 | 5** |
| Material Id | Send to shader | 2 | 5 |
| Shaders Pipeline Id | Shaders Pipeline | 3 | 3 |
| Alpha Blender Id | Change state | 2 | 2 |
| Opacity | Send to shader | 1 | 4 |
| Wireframe Enabled | Change state | 1 | 1 |
| Face Culling Enabled | Change state | 1 | 1 |

*This value must be multiplied by both modifiers described below, which depend on the texture layer and the visual aspect component.

** This value must be multiplied only by the visual aspect component modifier, described below.

The probability of change also depends on which visual aspect component and which texture layer we are evaluating. The most used texture layer would, of course, be the first layer; the most used visual aspect component would undoubtedly be the Diffuse component. These are the modifiers for each:

| Texture Layer | Probability of change modifier |
|---|---|
| 0 | 8 |
| 1 | 7 |
| 2 | 6 |
| 3 | 5 |
| 4 | 4 |
| 5 | 3 |
| 6 | 2 |
| 7 | 1 |

 

| Visual Aspect Component | Probability of change modifier |
|---|---|
| Diffuse | 11 |
| Specular | 10 |
| Normals | 9 |
| Displacement | 8 |
| Shininess | 7 |
| Reflection | 6 |
| Opacity | 5 |
| Height | 4 |
| Lightmap | 3 |
| Emission | 2 |
| Ambient | 1 |

 

The Algorithm

The main idea is to put similar visual aspects together so the number of states to change when switching from one visual aspect to the next is the minimum possible. The steps of the algorithm are:

1-   Grouping mesh subsets by common visual aspect and ordering by equality.

2-   Comparing all the visual aspects to each other and storing the differences.

Proposal A:

3-   Ordering comparison results by similarity, probability of change and cost.

4-   According to the ordered comparison results, extracting the list of visual aspects, putting the ones that have fewer differences contiguously.

Proposal B: Traveling Salesman Problem

3-   Choosing the visual aspect with the most expensive edges.

4-   Generating a minimum spanning tree.

5-   Traversing the tree in a pre-order walk.

This algorithm is only valid when the number of different visual aspects is greater than 2.

Mesh subsets grouped by every visual aspect should be rendered in the order the visual aspects appear in the result list.

 

1-    Grouping mesh subsets by visual aspect in common

As an input to this stage, we have a list of mesh subsets (which belong to different meshes associated with different scene entities). Every mesh subset has one and only one associated visual aspect, and every visual aspect may be associated with one or many mesh subsets.

The output will be one list of mesh subsets per visual aspect.

Mesh subsets are traversed one by one and added to the list that corresponds to their visual aspect. If that list does not exist yet (because no previous mesh subset had that visual aspect), it is created and the mesh subset is added as its first element.
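This grouping step can be sketched with a hash map from visual aspect Id to its list of subsets. All type names here are hypothetical, not the engine's actual types:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using VisualAspectId = uint32_t;

// A mesh subset is identified here only by an index inside its mesh.
struct MeshSubset {
    uint32_t subsetIndex;     // index of the subset inside its mesh
    VisualAspectId aspectId;  // the one and only visual aspect it uses
};

// Returns one list of mesh subsets per visual aspect.
std::unordered_map<VisualAspectId, std::vector<MeshSubset>>
GroupByVisualAspect(const std::vector<MeshSubset>& subsets) {
    std::unordered_map<VisualAspectId, std::vector<MeshSubset>> groups;
    for (const MeshSubset& s : subsets) {
        // operator[] creates the list on first use, as described above.
        groups[s.aspectId].push_back(s);
    }
    return groups;
}
```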

 

After all mesh subsets have been grouped, they must be ordered in such a way that all the subsets that are equal (same index in the same mesh) within the same group end up contiguous. This way, when vertices are rendered, the same subset does not need to be set up more than once.

 

2-    Comparing all the visual aspects to each other and storing the differences

At the end of this stage, we should know how different every visual aspect is with respect to the rest.

As we have seen, the number of attributes of a visual aspect, that is, the number of checks we have to perform per pair of visual aspects, is 380 at most; note that if the Component Enabled attribute is False, there is no need to compare the rest of the attributes of that visual aspect component. The same kind of omission occurs depending on the value of Texture Stack Size.

There are many ways to store the results of all the comparisons, but we will use a bit set, i.e. every result will be represented with 1 bit:

  • 0 means attributes are different
  • 1 means attributes are equal

 

Therefore, we need 380 bits per comparison. Modern CPUs support operating on 128- and 256-bit operands (moreover, at the time of writing, 512-bit instructions are planned for the 2nd generation of the Intel Xeon Phi processor family, named Knights Landing), so we could use either 2x256-bit integers (512 bits in total, wasting 512 - 380 = 132 bits) or 3x128-bit integers (384 bits in total, wasting 384 - 380 = 4 bits) to store all the comparison results. Using 128-bit types would let us pack arrays more tightly. However, the number of visual aspect attributes is expected to increase, so leaving just 4 free bits is maybe not a good idea; anyway, if we add more than 4 attributes we will need 512 bits (4x128 or 2x256), and it is faster to execute the same instruction twice than four times.
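A possible representation of the 512-bit comparison result is sketched below (names are mine, not the engine's): a 32-byte aligned block of eight 64-bit words, addressed with 1-based bit positions as in the text, so it can also be loaded directly into two 256-bit AVX registers:

```cpp
#include <cstdint>

// 512-bit comparison result, 32-byte aligned so it can be loaded directly
// into two 256-bit AVX registers. Only the lower 380 bits are used today.
struct alignas(32) ComparisonResult {
    uint64_t words[8];  // 8 x 64 = 512 bits; words[0] holds bits 1..64
};

// Set or read a single attribute bit (1-based position, as in the text).
inline void SetBit(ComparisonResult& r, int position) {
    int i = (position - 1) / 64;
    r.words[i] |= uint64_t{1} << ((position - 1) % 64);
}

inline bool GetBit(const ComparisonResult& r, int position) {
    int i = (position - 1) / 64;
    return (r.words[i] >> ((position - 1) % 64)) & 1u;
}
```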

Now we have to designate the position of every attribute in the bit set. The criterion to follow is that the most significant bit (MSB) corresponds to the attribute with the highest cost of change and the greatest probability of change. Note that this MSB is not the actual MSB of the 512-bit bit set but bit #380 (bit 1 being the least significant bit (LSB)). This table shows the order of all the attributes in the bit set, using a relative order (POC = Probability of change). Although it is not perfect, that table is a good enough estimation of the position every attribute should occupy in the bit set. The Final POC column is calculated by multiplying the Probability of change by the corresponding modifiers for the visual aspect component and the texture layer to which the attribute belongs (depending on the attribute, either modifier may not be applicable, as shown in the tables of the Inputs section). The Final order is calculated by multiplying the Final POC column by the Cost column.

A small adjustment has been performed to get a more reasonable result, since multiplying by the mentioned modifiers has a side effect: the attributes that are not multiplied end up being one or two orders of magnitude lower than those that are. For example, Component Enabled is multiplied only by the Visual Aspect Component modifier (which can be up to 11), whereas Texture Id is also multiplied by the Texture Layer modifier (which can be up to 8); therefore, the Probability of change of the first attribute will always be lower than the second's by one order of magnitude. This makes no sense, as we will see later; the Component Enabled bit must be checked before the rest of the attributes of the component. To fix this, we just increase some attributes by one or two orders of magnitude, multiplying by 10 or 100 respectively.

| Attribute | Probability of change modifier |
|---|---|
| Texture Stack Size | 10 |
| Component Enabled | 10 |
| Material Id | 100 |
| Shaders Pipeline Id | 100 |
| Alpha Blender Id | 100 |
| Opacity | 100 |
| Wireframe Enabled | 100 |
| Face Culling Enabled | 100 |

 

Comparison results (blocks of 512 bits) will be stored in a 32-byte-aligned array. At the same time, we need another array containing the pairs of visual aspect Ids that correspond to every comparison result, and one last array to store the number of equal attributes per comparison.

In order to know which visual aspects are most similar to each other we need to compare all of them to each other. We can know in advance how many comparisons we will perform by using the simple combinatorial formula for the number of combinations of a set of m elements taken n at a time, in any order and without repetition:

C(m, n) = m! / (n! × (m − n)!)

In our case, m is the number of visual aspects to compare (#VAspects below) and n equals 2, since we are going to make pairs.

Calculating factorials can be both expensive in terms of CPU usage and inaccurate due to integer overflows, which will occur for sure if the rendered scene includes just 100 different visual aspects; take into account that 13! is already greater than what can be stored in a 32-bit unsigned integer. In order to avoid these problems, an equivalent formula can be used instead:

C(m, 2) = m × (m − 1) / 2

If the simplification steps are not obvious to you, just think about a very simple case. For example, let m be 5:

C(5, 2) = 5! / (2! × 3!) = (5 × 4 × 3!) / (2 × 3!) = (5 × 4) / 2 = 10

Now the calculation is reduced to 1 subtraction, 1 multiplication and 1 division. It is done just once per frame, so it is not really a big performance improvement.
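As a minimal sketch (the function name is mine), the whole calculation fits in one line; doing the division last is safe because m × (m − 1) is always even:

```cpp
#include <cstdint>

// Number of unordered pairs among m visual aspects: C(m, 2) = m * (m - 1) / 2.
constexpr uint64_t PairCount(uint64_t m) {
    return m * (m - 1) / 2;
}
```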

Just to get an idea of the numbers we are talking about: with 100 visual aspects we have to perform exactly 4,950 visual aspect comparisons. The worst case would mean 4,950 x 380 = 1,881,000 attribute comparisons. Obviously this is not likely to happen, since most visual aspects are expected to consist of 1 to 3 textures combined using 1 to 5 components (diffuse, specular, displacement, normal map and shininess, maybe). We can also know the minimum number of comparisons by counting non-conditional attributes (including Component Enabled): 7 x 4,950 = 34,650 attribute comparisons.

An additional advantage of knowing the number of comparisons in advance is that we can reserve exactly the amount of memory we need for the three arrays instead of reallocating buffers as they grow. With 100 visual aspects, the array of comparison results would occupy 512 x 4,950 = 2,534,400 bits (316.8 KB), which is not a big number taking into account the amount of memory required by applications nowadays; the array of visual aspect Ids would require 64 x 4,950 = 316,800 bits (39.6 KB), assuming we use pairs of 32-bit unsigned integers; and lastly the array of equal-attribute counters, for which 16-bit unsigned integers could be used, would need 16 x 4,950 = 79,200 bits (9.9 KB). Total: 366.3 KB.

Optimization note: Reserved memory could be kept and recycled in the next frame whenever the block required for that frame is not bigger. Pool allocators might be useful for this purpose. Eventually the maximum size will be reached and no further reservations will be needed.

As said before, visual aspect attributes are compared one by one and, every time they are equal, a 1 is written at the corresponding position of the bit set; otherwise a 0 is left. This is performed using bit masks and the AVX2 256-bit binary OR instruction VPOR, which is wrapped by the intrinsic function _mm256_or_si256. From Intel’s documentation:

VPOR ymm1, ymm2, ymm3/m256

Bitwise OR of ymm2/m256 and ymm3

 

extern __m256i _mm256_or_si256(__m256i s1, __m256i s2);

Performs a bitwise logical OR operation of the signed integer elements of source vector s2 and the corresponding elements in source vector s1 and stores the result in the destination vector. If either of the corresponding bits of the first and second vectors are 1, each bit of the result is set to 1, otherwise it is set to 0.

 

REMINDER: Do not forget to enable the AVX2 compiler extensions with either /arch:AVX2 (VC++) or -mavx2 (GCC). You will also want to compile for x64; every processor that supports AVX and AVX2 is a 64-bit processor.

We are forced to use signed integers, as there is no specific intrinsic function for vectorized unsigned integers. In a later step we will need to compare bit sets as signed integers, which could be problematic if we set the MSB to 1, because that would mean the integer is negative; any comparison between that value and any other would produce an incorrect result, since we need unsigned comparisons. Fortunately, we do not need to worry about this as long as we do not use the MSB.

The following example shows how input data is processed through this stage:

Imagine every visual aspect consists of just 4 attributes, in a bit set of 1 byte. Imagine also that the last 4 bits are assigned to AT1, AT2, AT3 and AT4, in that order. If we have the following 3 visual aspects:

| VAspect 1 | VAspect 2 | VAspect 3 |
|---|---|---|
| AT1 = 5 | AT1 = 2 | AT1 = 22 |
| AT2 = 4 | AT2 = 4 | AT2 = 100 |
| AT3 = 0 | AT3 = 1 | AT3 = 0 |
| AT4 = 10 | AT4 = 99 | AT4 = 10 |

 

We have to do the following comparisons:

VAspect 1 -> VAspect 2

VAspect 1 -> VAspect 3

VAspect 2 -> VAspect 3

 

VAspect 1 -> VAspect 2

| Attribute | VAspect 1 | VAspect 2 | Result |
|---|---|---|---|
| AT1 | 5 | 2 | 0 |
| AT2 | 4 | 4 | 1 |
| AT3 | 0 | 1 | 0 |
| AT4 | 10 | 99 | 0 |

 

VAspect 1 -> VAspect 3

| Attribute | VAspect 1 | VAspect 3 | Result |
|---|---|---|---|
| AT1 | 5 | 22 | 0 |
| AT2 | 4 | 100 | 0 |
| AT3 | 0 | 0 | 1 |
| AT4 | 10 | 10 | 1 |

 

VAspect 2 -> VAspect 3

| Attribute | VAspect 2 | VAspect 3 | Result |
|---|---|---|---|
| AT1 | 2 | 22 | 0 |
| AT2 | 4 | 100 | 0 |
| AT3 | 1 | 0 | 0 |
| AT4 | 99 | 10 | 0 |

 

Using the proper bit masks for every bit position and executing the OR operation, we would end up with these arrays:

| Visual Aspects compared (pair of Ids) | Comparison result (bit set) | Equal attrib. counters |
|---|---|---|
| VAspect 1 -> VAspect 2 | 00000100 | 1 |
| VAspect 1 -> VAspect 3 | 00000011 | 2 |
| VAspect 2 -> VAspect 3 | 00000000 | 0 |

 

In the case of conditional attributes, such as Texture Id, which depends on both Texture Stack Size and Component Enabled, three things can happen:

A. When the condition has the same value in both visual aspects:

a) and Component Enabled is TRUE or Texture Stack Size is greater than the attribute’s texture layer:

All the dependent attributes are compared.

b) and Component Enabled is FALSE or Texture Stack Size is not greater than the attribute’s texture layer:

No dependent attribute is compared. The bit at the position of every dependent attribute is set to 1. The equal attribute counter is increased by the number of affected dependent attributes.

B. When the condition has a different value in each visual aspect:

No dependent attribute is compared. The bit at the position of every dependent attribute is left at 0.

Note: If the 512-bit integer equals zero from the beginning, it is not necessary to set any bit to zero during the process.
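The three cases can be sketched as follows. CompareConditionalBlock and its parameters are hypothetical names; case A.a is left out because it is just the normal per-attribute comparison:

```cpp
#include <cstdint>

// The 512-bit comparison result plus the equal-attribute counter,
// both zero-initialized before comparing a pair of visual aspects.
struct CompareState {
    uint64_t words[8];
    int equalCounter;
};

// conditionA / conditionB: the condition's value in each visual aspect.
// firstBit: 1-based position of the first dependent attribute's bit.
// count: how many dependent attributes the conditional block contains.
void CompareConditionalBlock(bool conditionA, bool conditionB,
                             int firstBit, int count, CompareState& s) {
    if (conditionA != conditionB) {
        // Case B: different condition values. Bits stay 0, nothing compared.
        return;
    }
    if (!conditionA) {
        // Case A.b: condition equal but disabled. Mark all dependent
        // attributes as equal without comparing them.
        for (int b = firstBit; b < firstBit + count; ++b)
            s.words[(b - 1) / 64] |= uint64_t{1} << ((b - 1) % 64);
        s.equalCounter += count;
        return;
    }
    // Case A.a: condition equal and enabled. Compare each dependent
    // attribute normally (left out of this sketch).
}
```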

As we will see later, it is necessary to add an extra group of comparisons to the array. This group is reserved for storing pairs whose first Id is the last visual aspect of the input list (“VAspect 3”, in the example) and whose second Id is each of the other visual aspects. Due to this addition, these extra rows must be considered when pre-calculating the size of the arrays (they were not taken into account in the previous calculations in this document). Obviously, the number of extra rows equals the number of different visual aspects minus 1.

Every time the algorithm visits a pair whose second Id equals the last visual aspect Id in the original array (“VAspect 3”, in the example), the entire “row” has to be copied to the end of the array, swapping both Ids in the destination pair. This will improve performance when searching, in step 4, for a pair whose first Id matches the last visual aspect Id. Remember that all the memory we need has already been reserved, so when we say “copy to the end of the array” we are referring to the logical position, i.e. the first empty “row”.

Look at the following example (not related to any other):

| Visual Aspects compared (pair of Ids) | Comparison result (bit set) | Equal attrib. counters |
|---|---|---|
| VAspect 1 -> VAspect 2 | 0101010 | 3 |
| VAspect 1 -> VAspect 3 | 0110101 | 4 |
| VAspect 1 -> VAspect 4 | 0010101 | 3 |
| VAspect 2 -> VAspect 3 | 0000001 | 1 |
| VAspect 2 -> VAspect 4 | 1110010 | 4 |
| VAspect 3 -> VAspect 4 | 0111000 | 3 |
| VAspect 4 -> VAspect 1 | 0010101 | 3 |
| VAspect 4 -> VAspect 2 | 1110010 | 4 |
| VAspect 4 -> VAspect 3 | 0111000 | 3 |

 

Additionally, in order to optimize the next steps, we have to create an array of unsigned integers as we create the comparison pairs; every time the first Id of a pair changes, we add its position index to this new array. Since we added an extra group, as explained in the previous paragraph, the length of this array equals the number of different visual aspects. Thanks to this information, the program knows where every group of comparisons starts and will be able to perform fewer comparisons later. Take a look at the example below (not related to any other) for 4 visual aspects (so the array has a length of 4):

| Group positions | Pairs of Ids |
|---|---|
| 0 | VAspect 1 -> VAspect 2 |
|   | VAspect 1 -> VAspect 3 |
|   | VAspect 1 -> VAspect 4 |
| 3 | VAspect 2 -> VAspect 3 |
|   | VAspect 2 -> VAspect 4 |
| 5 | VAspect 3 -> VAspect 4 |
| 6 | VAspect 4 -> VAspect 1 |
|   | VAspect 4 -> VAspect 2 |
|   | VAspect 4 -> VAspect 3 |

 

Note that the first group position will always be zero. We could remove it and save a slot, but that would imply adding some conditions in later steps and could affect performance; it just is not worth it. The value of the slot for the extra group always equals the previous slot’s value plus 1.

As a summary, these are the formulas to know the number of elements in each array:

  • Visual aspect Id pairs, Comparison results and Equal attribute counters:

    #VAspects × (#VAspects − 1) / 2 + (#VAspects − 1)

  • Group position index lookup:

    #VAspects

3-    Ordering comparison results by similarity

Once the algorithm reaches this stage, we have all the information we need about how different each visual aspect is with respect to all the others. What we have to do now is order the array of Ids by the number of equal attributes between visual aspects, from greatest to least, inside every group of comparisons (contiguous “rows” where the first Id is the same). Besides, when there is more than one pair with the same number of equal attributes in the same group, they must be ordered by their Comparison result bit set. As will be explained later in this step, we have to calculate more things as we traverse the array to order it.

Extending the previous example a bit, we have:

| VAspect 1 | VAspect 2 | VAspect 3 | VAspect 4 |
|---|---|---|---|
| AT1 = 1 | AT1 = 5 | AT1 = 6 | AT1 = 5 |
| AT2 = 2 | AT2 = 2 | AT2 = 2 | AT2 = 4 |
| AT3 = 3 | AT3 = 3 | AT3 = 3 | AT3 = 3 |
| AT4 = 4 | AT4 = 5 | AT4 = 4 | AT4 = 5 |

 

 

|   | Visual Aspects compared (pair of Ids) | Comparison result (bit set) | Equal attrib. counters |
|---|---|---|---|
| A | VAspect 1 -> VAspect 2 | 00000110 | 2 |
| B | VAspect 1 -> VAspect 3 | 00000111 | 3 |
| C | VAspect 1 -> VAspect 4 | 00000010 | 1 |
| D | VAspect 2 -> VAspect 3 | 00000110 | 2 |
| E | VAspect 2 -> VAspect 4 | 00001011 | 3 |
| F | VAspect 3 -> VAspect 4 | 00000010 | 1 |
| G | VAspect 4 -> VAspect 1 | 00000010 | 1 |
| H | VAspect 4 -> VAspect 2 | 00001011 | 3 |
| I | VAspect 4 -> VAspect 3 | 00000010 | 1 |

 

So the order would be:

|   | Visual Aspects compared (pair of Ids) | Comparison result (bit set) | Equal attrib. counters |
|---|---|---|---|
| B | VAspect 1 -> VAspect 3 | 00000111 | 3 |
| A | VAspect 1 -> VAspect 2 | 00000110 | 2 |
| C | VAspect 1 -> VAspect 4 | 00000010 | 1 |
| E | VAspect 2 -> VAspect 4 | 00001011 | 3 |
| D | VAspect 2 -> VAspect 3 | 00000110 | 2 |
| F | VAspect 3 -> VAspect 4 | 00000010 | 1 |
| H | VAspect 4 -> VAspect 2 | 00001011 | 3 |
| G | VAspect 4 -> VAspect 1 | 00000010 | 1 |
| I | VAspect 4 -> VAspect 3 | 00000010 | 1 |

 

In the group of VAspect 1 comparisons, comparison B comes first, while in the group of VAspect 2 comparisons it is comparison E.

There are no pairs with the same number of equal attributes and a different bit set within the same group in this example but, if there were, they would be ordered as follows:

 

|   | Pair of Ids | Comparison result (bit set) | Equal attrib. counters |
|---|---|---|---|
| A | VAspect 1 -> VAspect 2 | 00000011 | 10 |
| B | VAspect 1 -> VAspect 3 | 00001111 | 10 |
| C | VAspect 1 -> VAspect 4 | 00000110 | 10 |

 becomes:

 

|   | Pair of Ids | Comparison result (bit set) | Equal attrib. counters |
|---|---|---|---|
| B | VAspect 1 -> VAspect 3 | 00001111 | 10 |
| C | VAspect 1 -> VAspect 4 | 00000110 | 10 |
| A | VAspect 1 -> VAspect 2 | 00000011 | 10 |

 

Comparison results should be compared (hail redundancy! :D) using a 256-bit signed integer greater-than comparison. Bad news: there is no such function in either AVX or AVX2; there are only comparison instructions for vectors of smaller signed integers, like _mm256_cmpgt_epi64. We cannot use those due to the same problem mentioned in a previous section: the input 256-bit value is unpacked as four 64-bit signed integers (if we use the function with the -64 suffix), which are compared separately, i.e. each one uses its MSB as a sign bit, and that would generate incorrect results. We cannot use them unless we skip those bits. If we did not use bits #64, #128, #192 and #256 (bit 1 being the LSB), the problem would be avoided. This has no cost; we just do not assign any attribute to those bits. But forget all these explanations; this proposal is not to be used, for performance’s sake. We are going to compare the 512-bit bit sets as if they were vectors of 8x64-bit unsigned integers, using a for loop. Take a look at both alternatives:

  • With AVX2
  1. Execute _mm256_cmpgt_epi64, passing both operands as parameters.
  2. Repeat for the second half of both operands (the bit set is formed by 2x256).
  3. Execute _mm256_store_si256 to put the results into signed integers.
  4. Repeat for the second half of both operands.
  5. Cast the resultant bit set to an array of unsigned integers.
  6. Iterate from the most significant integer to the least, comparing each one to the constant 0xFFFFFFFFFFFFFFFF. If it is equal, stop comparing. Eight iterations at most.

  • Without AVX2
  1. Cast the bit set of the first operand to an array of unsigned integers.
  2. Cast the bit set of the second operand to an array of unsigned integers.
  3. Iterate from the most significant integer to the least, comparing both integers to know which is greater than the other. If one value is greater than the other, stop comparing. Eight iterations at most.

 

As you can see, the first requires 4 instruction executions, 1 pointer cast and up to 8 integer comparisons, whereas the second requires 2 pointer casts and up to 8 integer comparisons. Not a big difference, but enough to prefer the solution without AVX2 instructions.
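The chosen alternative, comparing the 512-bit bit sets as 8x64-bit unsigned integers from the most significant word down, together with the in-group ordering criterion, could be sketched like this (Row and the function names are mine, not the engine's):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One "row" of the comparison arrays (sketch; field names are hypothetical).
struct Row {
    uint64_t bits[8];     // 512-bit comparison result, bits[7] most significant
    uint16_t equalCount;  // equal-attribute counter
};

// Greater-than for two 512-bit results, scanning from the most significant
// 64-bit word to the least and stopping at the first difference.
bool BitsGreater(const uint64_t a[8], const uint64_t b[8]) {
    for (int i = 7; i >= 0; --i) {
        if (a[i] != b[i]) return a[i] > b[i];
    }
    return false;  // equal
}

// Orders the rows inside one group: by equal-attribute count first
// (greatest to least), then by the comparison-result bit set.
void SortGroup(std::vector<Row>& group) {
    std::stable_sort(group.begin(), group.end(),
                     [](const Row& x, const Row& y) {
        if (x.equalCount != y.equalCount) return x.equalCount > y.equalCount;
        return BitsGreater(x.bits, y.bits);
    });
}
```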

So now, at the top of every group of comparisons, we have the pair of visual aspects with the fewest differences and whose equal attributes are the most expensive to set and the most likely to change. If an attribute is more likely to change, it appears at a more significant position in the Comparison result bit set and, as we have just seen, the loop that compares the integer arrays will stop sooner, saving some processing time. That is why condition attributes (Component Enabled and Texture Stack Size) must occupy more significant positions.

Optimization note: Use another array to store position indices into the other three arrays, so that only an array of integers has to be reordered.

While we traverse the arrays and order them, we have to find what will be the first pair to include in the result in step 4. The first pair is the one with the greatest Comparison result and Equal attribute counter. We can store the first “row” and then compare each “row” to the stored one as we visit them; if the current “row” is greater, the stored value is replaced with the current one and we keep visiting and comparing. To optimize the process, we can limit the comparison to the moment when the group changes; at that moment, we compare the first “row” of the finished group to the stored value. In the example, when we visit F we are changing from the “VAspect 2” group to the “VAspect 3” group, so we compare the previous group’s first “row”, E, to the stored value, which would be B at that point. The result is E.

Optimization note: This process can be parallelized so that every thread orders just one group (the size of the group does not change and its memory region is not accessed by any other thread) and returns a “first pair” proposal. Then all proposals are compared sequentially and the maximum is chosen.

 

4-    According to the ordered comparison results, extract the list of visual aspects, putting the ones that have fewer differences contiguously

We have to choose the first visual aspect we are going to set in our engine, which will be followed by the rest of the visual aspects in order of similarity. We cannot know at first sight whether any of them will lead to the most optimal sequence, but we do know which of them is the most similar to another one, with the biggest saving. The “row” we are looking for is E, as found in the previous step.

The first time a pair is taken, both visual aspect Ids are added to a result array, in the same order. From then on, only the second visual aspect Id of each taken pair is added to the array.

VAspect 2

VAspect 4

 

Once we have taken the first pair, we search for the first pair in the group of the second Id. To find it quickly, we just use the lookup array created in step 2, which contains its index in the main array. When searching in the last group, we have to iterate forward until we find a pair that contains a non-taken Id. In the example, it is the G pair.

VAspect 2

VAspect 4

VAspect 1

 

Finally, the last pair is the B pair in the example.

VAspect 2

VAspect 4

VAspect 1

VAspect 3

 

When the second Id of the next pair already exists in the result array, this stage finishes. As we can see, the next pair would be F, but VAspect 4 already exists in the result array, so we ignore it and finish.

We can now estimate how many state changes we have saved. It always depends on the input list of visual aspects, of course; in some cases we will gain a lot of performance (mostly when the input list is big), and in others we may not gain any (typically when the input list is small). In our example, the worst case, which implies setting all the states of every visual aspect, is 16 state changes (4 attributes for 4 visual aspects). Thanks to this algorithm we only need 9: 4 for the first visual aspect and 5 when switching to the rest. Almost 50% of the state changes are avoided in this case. Note that, if we obtained the same percentage of similar attributes for a larger list of visual aspects, the first 4 state changes would be relatively marginal and the result would be much better than 50%.

 

|   | Comparison | Result | Equal attrib. | States to change |
|---|---|---|---|---|
| E | VAspect 2 -> VAspect 4 | 00001011 | 3 | 1 |
| G | VAspect 4 -> VAspect 1 | 00000010 | 1 | 3 |
| B | VAspect 1 -> VAspect 3 | 00000111 | 3 | 1 |

 

 

An alternative implementation for Steps 3 and 4

If we pay attention to step 2, what we are really doing is creating a weighted undirected complete graph whose vertices are visual aspects and whose edges are visual aspect comparisons. Once we have that graph, we try to find the longest path that traverses all the vertices, without cycles, no matter which are the start and end points; the distances used to calculate the path’s length are the Equal attribute counter and the Comparison result bit set.

 

This problem is commonly known as the Traveling Salesman Problem (TSP), symmetric in this case. There are many ways to solve it, from exact algorithms to heuristic or approximation algorithms. Exact algorithms give us the best result but are too slow; their running time ranges from O(n!) down to O(n^2 * 2^n), where n is the number of visual aspects. Approximation algorithms like Ant Colony Optimization or Christofides’, among others, are faster in exchange for a result that may not be the best one but most of the time is very close. There is also a very fast 2-approximation algorithm (the result is at most twice as long as the optimal one) consisting of building a minimum spanning tree (MST) with Prim’s algorithm, starting at an arbitrary vertex, and then visiting the tree’s nodes in a pre-order walk. We have to invert the comparison operation, since we need the longest path instead of the shortest. Finally, just as a mention, Floyd-Warshall is another algorithm that could help us get the shortest Hamiltonian path (i.e. one that visits all vertices) in the graph if we select 2 vertices as end points (for example, the two with the fewest differences with respect to their neighbors); however, it seems to be much slower than the previously described algorithm.
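A sketch of that 2-approximation, assuming the Equal attribute counters are available as a full symmetric weight matrix and ignoring any tie-breaking; vertices are 0-based indices, and the comparison is inverted so a maximum spanning tree is built (function and variable names are mine):

```cpp
#include <vector>

// Builds a maximum spanning tree with Prim's algorithm (inverted comparison,
// since we want the longest path) and returns the vertices in pre-order.
// 'w' is the symmetric matrix of equal-attribute counters between aspects.
std::vector<int> MaxSpanningTreeOrder(const std::vector<std::vector<int>>& w,
                                      int start) {
    int n = static_cast<int>(w.size());
    std::vector<int> parent(n, -1), best(n, -1);
    std::vector<bool> inTree(n, false);
    best[start] = 0;
    for (int k = 0; k < n; ++k) {
        int u = -1;
        for (int v = 0; v < n; ++v)  // pick the heaviest fringe vertex
            if (!inTree[v] && best[v] >= 0 && (u < 0 || best[v] > best[u]))
                u = v;
        if (u < 0) break;
        inTree[u] = true;
        for (int v = 0; v < n; ++v)  // relax with inverted comparison
            if (!inTree[v] && w[u][v] > best[v]) {
                best[v] = w[u][v];
                parent[v] = u;
            }
    }
    // Pre-order walk of the resulting tree.
    std::vector<int> order;
    std::vector<int> stack{start};
    while (!stack.empty()) {
        int u = stack.back();
        stack.pop_back();
        order.push_back(u);
        for (int v = 0; v < n; ++v)
            if (parent[v] == u) stack.push_back(v);
    }
    return order;
}
```

Fed with the counters of the worked example (indices 0..3 for VAspect 1..4) and started at VAspect 2, this sketch reproduces the order VAspect 2, VAspect 4, VAspect 1, VAspect 3.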

In our example, if we apply the mentioned 2-approximation algorithm, using the Equal attributes counter value as edge size, the result would be:

The “VA2” vertex was chosen as the first because it is the one with the most expensive edges. As an addition to the algorithm, every time two edges have the same length, the one with the greater Comparison result bit set is chosen. We can see that the same result as in the other alternative is obtained; this is just a coincidence, due to the number of visual aspects being so small. We are saving 7 state changes out of 16.

 

Conclusion

After some tests on paper with more visual aspects, it seems that the TSP alternative returns better results. It is too soon to be sure which alternative is better in terms of performance. Both alternatives will be implemented and measured; steps 1 and 2 are mandatory whichever proposal we use, although some parts may be skipped for the TSP. Besides, it is important to compare the performance with and without this optimization algorithm to be sure it does not have the inverse effect (executing the algorithm might be more expensive than just setting all the graphics device’s states); this will surely happen when the number of visual aspects is small, so the algorithm could be disabled below a certain number of visual aspects. Although several optimization techniques have been mentioned throughout the document, more effort has to be put into optimizing the code for speed, even if it means increasing memory consumption (the typical tradeoff), since this algorithm has to be executed once per frame and the engine has more processing to do before rendering.

 
