There is an interesting but rarely mentioned technique in a C++ context: signature-based polymorphism, a more permissive variation of subtype polymorphism, usually called duck typing. Two objects that have nothing in common can share an implicit interface and be manipulated through that interface with no inheritance involved. Part I and Part II. Also, making use of SIMD units such as MMX, SSE, or AltiVec usually trades portability for speed. Recent versions of GCC include an extension that allows you to write vector code without sacrificing portability. Take a look at how to use it.
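For anyone who has not seen the idea before, here is a minimal sketch of an implicit interface in C++. It uses a plain function template, which is the simplest compile-time form and not necessarily the technique the linked articles use. The names Duck, Robot, speak and make_it_speak are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Two unrelated types: no common base class, no virtual functions.
struct Duck  { std::string speak() const { return "Quack"; } };
struct Robot { std::string speak() const { return "Beep"; } };

// Any type with a compatible speak() member satisfies the implicit
// interface; nothing is declared up front (hypothetical helper).
template <typename T>
std::string make_it_speak(const T& thing) {
    return thing.speak();
}
```

Calling make_it_speak with a type that has no speak() member fails at compile time, which is the price of having no declared interface.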
1) Quote from that article:
#if defined(__ALTIVEC__)
    return vec_all_eq((vector int)v1, (vector int)v2);
#elif defined(__SSE__)
    v4si compare = __builtin_ia32_cmpeqps((v4sf)v1, (v4sf)v2);
    return __builtin_ia32_movmskps((v4sf)compare);
#else
    …..
#endif
Indeed, very good portability has been achieved…
2) Moreover, on MIPS you MUST declare vectors 4 bytes in size (not 16 as on Intel/PPC). GCC is not smart enough to split a 16-byte vector into four vector operations; it emits 16 non-vector operations instead. So add an #ifdef MIPS to that _portable_ piece of code….
Both articles contain errors :/.
The first one states that by ‘mimicking a v-table’ one can get something that is ‘fast due to inlining’. This is completely wrong: inlining is only possible in trivial cases, e.g. when the call is made in the same block in which the object was constructed. In other cases one usually cannot avoid making an indirect call through the function pointers, which is exactly what a virtual function call would do.
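To make the point concrete, here is a rough sketch of a hand-rolled v-table. It is not the article's code; VTable, Shape, area_of and the rest are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Hypothetical hand-rolled "v-table": a struct of function pointers.
struct VTable {
    int (*area)(const void* self);
};

struct Square { int side; };
struct Rect   { int w, h; };

int square_area(const void* self) {
    const Square* s = static_cast<const Square*>(self);
    return s->side * s->side;
}

int rect_area(const void* self) {
    const Rect* r = static_cast<const Rect*>(self);
    return r->w * r->h;
}

const VTable square_vtable = { square_area };
const VTable rect_vtable   = { rect_area };

// Type-erased handle: a pointer to the object plus a pointer to its table.
struct Shape {
    const void*   object;
    const VTable* vtable;
};

// The target of this call is not known here, so in general it is an
// indirect call through a function pointer, much like a virtual call.
// The compiler can only inline it if it can see where `shape` was built.
int area_of(const Shape& shape) {
    return shape.vtable->area(shape.object);
}
```

Whether a given compiler manages to resolve the pointer in a particular call site is something to check in the generated assembly, not something the technique guarantees.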
The second article states that comparing integers is practically the same as comparing floats. This is also wrong – a NaN is not equal to anything, even to itself. If you want to compare an integer that happens to correspond to a NaN, you’re out of luck.
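A small sketch of the problem, assuming 32-bit IEEE 754 floats and no fast-math style compiler flags. The helper name equal_via_float_compare is made up; it mimics what a float comparison instruction does to integer bit patterns.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "sketch assumes 32-bit floats");

// Compare two integers by reinterpreting their bits as floats and using
// the floating point == operator (hypothetical helper).
bool equal_via_float_compare(std::int32_t a, std::int32_t b) {
    float fa, fb;
    std::memcpy(&fa, &a, sizeof fa);
    std::memcpy(&fb, &b, sizeof fb);
    return fa == fb;
}

// 0x7FC00000 is the bit pattern of a quiet NaN in IEEE 754 single precision.
const std::int32_t kNaNBits = 0x7FC00000;

// 0x80000000 is the bit pattern of -0.0f, which compares equal to +0.0f.
const std::int32_t kNegZeroBits = std::numeric_limits<std::int32_t>::min();
```

So the float comparison reports an integer as unequal to itself when its bits form a NaN, and it also reports 0 and the most negative integer as equal, because +0.0 and -0.0 compare equal as floats.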
Edit: These articles are very interesting otherwise.
Edited 2007-04-22 10:48
There is a library that works like that: macstl
http://www.pixelglow.com/macstl/
but I think it is no longer being developed :/
What’s sad about most of these “vector” ops is that they work mostly with 32-bit floats.
Testing our code at work with standard x86 floating point instructions, we get faster optimized executable code using doubles than using floats on 32-bit, and on 64-bit doubles are dramatically faster than floats.
The biggest problem is that we normally need to maintain precision between 1e-8 and 1e-10 (normalized data sets).
Perhaps the next-gen vector instructions will help more with scientific-type applications.
Right now what holds back actually deploying apps using 64-bit vectorized floats is the number of older-gen socket 940 (and 754) machines out there that only have SSE2 support.
Edited 2007-04-23 13:28
SSE3 adds very little in additional instructions (only 11 on AMD, 13 on Intel (2 extra for hyper-threading stuff)). The SSE2 instructions contain all the vectorized 64-bit floating point operations. When working with the 128-bit SIMD registers you can use the SSE2 instructions to perform vector operations on 2 doubles at once. If you were still using older-generation Athlons that would be one thing, but all the Opterons support the double precision SSE2 operations.
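As a rough sketch of what that looks like with the GCC vector extension (also accepted by Clang): a 16-byte vector of doubles holds two lanes. The names v2df, scale_and_add and store_lanes are chosen here for illustration. On x86 with SSE2 enabled the compiler may emit packed double instructions for this, but that depends on the target and flags.

```cpp
#include <cassert>
#include <cstring>

// GCC vector extension: 16 bytes wide, i.e. two doubles per vector.
typedef double v2df __attribute__((vector_size(16)));

// Element-wise a * k + b on both lanes at once (hypothetical helper).
v2df scale_and_add(v2df a, v2df b, double k) {
    v2df kk = { k, k };   // broadcast the scalar into both lanes
    return a * kk + b;    // arithmetic operators work lane by lane
}

// Copy the two lanes out into an ordinary array.
void store_lanes(v2df v, double out[2]) {
    std::memcpy(out, &v, sizeof(v2df));
}
```

The same source compiles for targets without a SIMD unit; the compiler then falls back to scalar code.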
typedef union
{
    v4si v;
    int s[4];
} vector;
Some compilers would just generate slower code if you use a union in the way he described.
Some compilers? These are gcc extensions.
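For comparison, a sketch of two ways to read the individual lanes: the union from the quoted code (renamed vec4 here, a made-up name) and a plain memcpy into an array. Reading a union member other than the one last written is something GCC documents as supported, but the C++ standard does not guarantee it. Which version produces better code is something to check in the assembly for your compiler.

```cpp
#include <cassert>
#include <cstring>

// GCC vector extension: four ints in 16 bytes.
typedef int v4si __attribute__((vector_size(16)));

static_assert(sizeof(int) * 4 == sizeof(v4si), "sketch assumes 4-byte int");

// Same shape as the quoted union; the name vec4 is hypothetical.
union vec4 {
    v4si v;
    int  s[4];
};

// Lane access through the union (relies on GCC's type-punning behaviour).
int sum_lanes_union(v4si v) {
    vec4 u;
    u.v = v;
    return u.s[0] + u.s[1] + u.s[2] + u.s[3];
}

// Lane access through memcpy, which is well defined in standard C++.
int sum_lanes_memcpy(v4si v) {
    int s[4];
    std::memcpy(s, &v, sizeof s);
    return s[0] + s[1] + s[2] + s[3];
}
```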
BTW, duck typing is not about writing hundreds of lines of hard-to-understand template code (tell me that code peppered with vTable_, struct VTable, struct vTable_ <- pay attention to the underscores! is easy to understand), which uses a custom implementation of vtables that the language already supports. Duck typing is not about tearing the legs off a duck to see if it yells like a duck. Duck typing is about one keyword, which is still not supported by this otherwise over-complicated language.
I think new languages for this kind of programming are in order, such as Brook. As the second article states in its introduction, C is very close to the metal – PDP11 metal, that is. Programming today’s hardware with languages from 30 years ago will only go so far, and C does not properly reflect the capabilities of SIMD extensions or GPU stream processors.