### Some f32 profiling

Hi, remember that math coprocessor I wrote about yesterday? I have been doing some testing on it to check whether it is worth using when you are not doing parallel processing.

I have written a little program that performs some simple operations and measures the time they take. The code for each operation basically looks like this:

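```cpp
counter->Reset();                           // start timing from zero

int t1 = 0;                                 // accumulated time for this test

for (int i = 0; i < NUM_OPS; ++i)
{
    c = a++ * b++;                          // operation under test; a and b change every iteration
    t1 += counter->GetTimeSinceLastCall();  // add the time spent in this iteration
}
```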

So, basically what I do is:

1 - Reset my counter (that's not the NDS counter but one of my own classes)

2 - Create a variable to store the accumulated time

3 - Perform the operation, plus a pair of increments so the compiler doesn't delete the code for having no effect

4 - My counter cannot measure very long intervals because of the DS hardware, so I add up the time that passes in each iteration
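
For anyone without a DS at hand, the same measurement loop can be sketched in portable C++. This sketch assumes `std::chrono::steady_clock` in place of the counter class used above (its code isn't shown), and `Counter` and `TimeMultiplications` are hypothetical names. Two small changes from the loop above: the operands are unsigned, because `a++ * b++` on `int` overflows after a few tens of thousands of iterations and signed overflow is undefined behavior; and the result is written to a `volatile` variable, which is a more reliable way than the increments alone to stop the compiler from discarding the multiplication.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the counter class: std::chrono::steady_clock
// replaces the DS hardware timers, which are not available on a desktop build.
class Counter {
public:
    void Reset() { last_ = std::chrono::steady_clock::now(); }

    // Returns the microseconds elapsed since the previous call (or Reset()).
    std::int64_t GetTimeSinceLastCall() {
        const auto now = std::chrono::steady_clock::now();
        const auto delta =
            std::chrono::duration_cast<std::chrono::microseconds>(now - last_);
        last_ = now;
        return delta.count();
    }

private:
    std::chrono::steady_clock::time_point last_ = std::chrono::steady_clock::now();
};

// Times num_ops multiplications following steps 1-4 and returns the total
// accumulated time in microseconds. Unsigned operands wrap around instead of
// overflowing, and the volatile sink keeps every multiplication alive.
std::int64_t TimeMultiplications(int num_ops, volatile std::uint32_t& sink) {
    assert(num_ops > 0);
    Counter counter;
    std::uint32_t a = 3, b = 7;

    counter.Reset();                               // 1 - reset the counter
    std::int64_t t1 = 0;                           // 2 - accumulated time
    for (int i = 0; i < num_ops; ++i) {
        sink = a++ * b++;                          // 3 - operation under test
        t1 += counter.GetTimeSinceLastCall();      // 4 - add this iteration's time
    }
    return t1;
}
```

Called as `TimeMultiplications(NUM_OPS, sink)` with a `volatile std::uint32_t sink`, it returns a total that, like the original, includes the loop and the counter calls, so it is only meaningful when compared with other runs of the same loop.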

Well, that is not going to give me the exact time the Nintendo DS spends on each operation (the loop and the counter call are timed too), but it gives me numbers I can compare with each other.

Each operation is run 5,000,000 times (NUM_OPS = 5000000).

The operations I have checked out are:

- MUL32-64: multiplication with int (4 bytes) versus multiplication with long long int (8 bytes, using mulf32 from ndslib)

- DIV32: 32-bit division, first with normal division and then with the math coprocessor (using divf32 from ndslib)

- Mul64: the same operation as mulf32 from ndslib, but without the shift

- Div64: 64-by-32-bit division with the math coprocessor (using divf64 from ndslib)

- Sqrt32: 32-bit square root with the math coprocessor (using sqrt32 from ndslib)
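
For reference, my understanding is that mulf32 works on 20.12 fixed point (20 integer bits, 12 fractional bits) and is a plain 64-bit multiply followed by a 12-bit right shift, done by the CPU rather than the coprocessor (the DS math hardware only does division and square root). The sketch below reproduces the three multiplication variants in portable C++ under that assumption; `Mul32`, `Mul64`, `MulFixed`, `ToFixed` and `kFracBits` are hypothetical names, not ndslib functions.

```cpp
#include <cstdint>

// 20.12 fixed point, assumed to match the format mulf32 uses.
constexpr int kFracBits = 12;

// Converts a whole number to 20.12 fixed point.
constexpr std::int32_t ToFixed(std::int32_t n) { return n * (1 << kFracBits); }

// "Mul32": plain 32-bit integer multiplication.
constexpr std::int32_t Mul32(std::int32_t a, std::int32_t b) { return a * b; }

// "Mul64": widen to 64 bits and multiply, keeping the full product (no shift).
constexpr std::int64_t Mul64(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) * b;
}

// mulf32-style multiply: the 64-bit product shifted right by 12 bits to bring
// it back to 20.12. Right-shifting a negative value is arithmetic on mainstream
// compilers (and guaranteed since C++20).
constexpr std::int32_t MulFixed(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(Mul64(a, b) >> kFracBits);
}
```

For example, `MulFixed(ToFixed(3) / 2, ToFixed(2))` is 1.5 × 2.0 and gives `ToFixed(3)`. Since `Mul64` and `MulFixed` differ only by the final shift, equal timings for the two are what you would expect if the shift costs almost nothing.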

Running it on my Nintendo DS, I got:

- For Mul32/64: 4425 ms and 4273 ms

- For Div32: 4273 ms with normal division and 20450 ms with the math coprocessor

- For Mul64: 4273 ms, as expected... which means shifts are very fast

- For Div64: 20450 ms

- For Sqrt32: 12361 ms
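
Dividing by the 5,000,000 iterations puts these totals in per-iteration terms. They include the loop and the counter call, so they are upper bounds on each operation, not exact costs:

$$
\frac{4425\ \text{ms}}{5\,000\,000} \approx 0.89\ \mu\text{s}, \qquad
\frac{20450\ \text{ms}}{5\,000\,000} \approx 4.09\ \mu\text{s}, \qquad
\frac{12361\ \text{ms}}{5\,000\,000} \approx 2.47\ \mu\text{s}, \qquad
\frac{20450}{4273} \approx 4.79
$$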

So, draw your own conclusions, but it seems that using the math coprocessor is slower than not using it. For sqrt this is not a problem, because there is no other ready-made way to do it (well, you could implement your own sqrt with Newton's method or another algorithm and check whether it is faster). But for division the time multiplies by almost 5! That doesn't mean you shouldn't use it: while the coprocessor is calculating, you can do other things instead of busy-waiting in a "while(...);" loop, but that isn't possible if you call ndslib directly.
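
Newton's method, mentioned above as an alternative, could look like this for an integer square root. `IsqrtNewton` is a hypothetical name, and this is a sketch to benchmark against sqrt32 with the same loop, not a claim that it is faster: each iteration still needs an integer division, and as far as I know the DS's ARM9 core has no hardware divide instruction.

```cpp
#include <cstdint>

// Integer square root by Newton's method: returns floor(sqrt(n)) for any 32-bit n.
constexpr std::uint32_t IsqrtNewton(std::uint32_t n) {
    if (n < 2) return n;

    // Initial guess 2^ceil(bits/2), where bits is the bit length of n. It is
    // always >= sqrt(n), so the iterates decrease toward floor(sqrt(n)).
    int bits = 0;
    for (std::uint32_t t = n; t != 0; t >>= 1) ++bits;
    std::uint32_t x = std::uint32_t{1} << ((bits + 1) / 2);

    // Newton step x' = (x + n / x) / 2, repeated while it keeps decreasing.
    std::uint32_t y = (x + n / x) / 2;
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}
```

Starting from a power of two close to the root keeps the number of divisions small (a handful of steps even for 0xFFFFFFFF), which matters if each division is the expensive part.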

The good news is that 64-bit multiplication (used for fixed point) is just as fast as the 32-bit one (4273 ms vs 4425 ms in this test). This is very useful for a fixed-point class. What I can do about division is something I need to look into a little more deeply. I have written a fixed-point division that uses normal 32-bit division, but it performs some extra checks and additions that surely make it slower.
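
One way such a division could be built, sketched here as an assumption and not as the code mentioned above: take the integer part from an ordinary 32-bit divide and remainder, then produce the 12 fractional bits by binary long division on the remainder. `DivFixed32` and `kFracBits` are hypothetical names, and the 20.12 format is assumed to match mulf32/divf32.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFracBits = 12;  // 20.12 fixed point (assumed)

// Fixed-point division using only 32-bit integer arithmetic. Truncates toward
// zero; the caller must make sure the quotient fits in 20.12.
std::int32_t DivFixed32(std::int32_t a, std::int32_t b) {
    assert(b != 0);
    const bool negative = (a < 0) != (b < 0);

    // Work on magnitudes as unsigned values so INT32_MIN is handled safely.
    const std::uint32_t ua = a < 0 ? 0u - static_cast<std::uint32_t>(a)
                                   : static_cast<std::uint32_t>(a);
    const std::uint32_t ub = b < 0 ? 0u - static_cast<std::uint32_t>(b)
                                   : static_cast<std::uint32_t>(b);

    std::uint32_t result = (ua / ub) << kFracBits;  // integer part
    std::uint32_t rem = ua % ub;  // rem < ub <= 2^31, so rem << 1 cannot overflow

    // Binary long division: one quotient bit per step for the 12 fractional bits.
    for (int bit = kFracBits - 1; bit >= 0; --bit) {
        rem <<= 1;
        if (rem >= ub) {
            rem -= ub;
            result |= std::uint32_t{1} << bit;
        }
    }
    return negative ? -static_cast<std::int32_t>(result)
                    : static_cast<std::int32_t>(result);
}
```

For example, `DivFixed32(3 << 12, 2 << 12)` returns 6144, which is 1.5 in 20.12. The 12 shift/compare/subtract steps per call are the kind of extra work that can make this slower than a single divide; only timing it with the same loop would tell by how much.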

If anybody wants to see the code some day, just ask for it. I still don't know how to upload files to this blog (maybe it can't be done XDD)

Well, that's all for today. It was fun this time ^_^
