“And just when you thought the whole Stuxnet/Duqu trojan saga couldn’t get any crazier, a security firm that has been analyzing Duqu writes that it employs a programming language they’ve never seen before.” Pretty crazy, especially when you consider what some think the mystery language looks like: “The unknown C++ looks like the older IBM compilers found in OS400 SYS38 and the oldest sys36. The C++ code was used to write the TCP/IP stack for the operating system and all of the communications.”
It’s the unknown hacker at MIT, left over from the ’70s, who built an underground facility below the campus, leeching off the area without anyone knowing.
It is him. You know it. A mad virus made with his own programming language.
It just is the signature of Mr. Hacker Legend.
Required reading: http://www.catb.org/jargon/html/story-of-mel.html
Great reading Kroc. Peace!
Wow, I think this sub-thread might be the first time (that I’ve seen, at least) when you aren’t abusive, disrespectful, offensive, and so on (ironically, what you usually accuse all others of).
(not like the opening post was particularly amusing to me, personally; but.. hey, improvement)
I’ve always loved that story
Hmm.. If you want it even more crazy, maybe this is not a trojan / virus / worm at all (per se).
Its main purpose seems to be collecting information and being able to connect to basically everything. It is quite possible that it is “machine made” code, so it doesn’t have to make sense to a human reader.
Also, I’m not sure how obfuscators work, but doesn’t that fit the reasons why you would obfuscate your code?
I’m quite interested in this Duqu story; I have a hunch it might turn into something really interesting.
I really enjoy reading about hacking. It says something about the complexity of current browsers and the protocols we use that people can break in just by exploiting vulnerabilities in them. There is something really wrong with all this mess.
The other day I went to a small business to check what was happening on their internal network, because they were having all sorts of trouble. It turned out their server had been hacked; nothing special, I thought, it being a Windows Server 2003. The funny part is that the hacker used the server to reconfigure their router, disabling its web interface and re-routing traffic through a rogue DNS server. How clever! The irritating part is that I only checked the damn thing after they called me later because something was wrong again. I had to reset it and update the firmware “just in case”. To all people here: don’t leave your router with default or easy passwords! (They were not using default passwords, but I guess the guy who broke in was able to sniff them by monitoring their activities.)
Hey Thom, what about an interview with PinkiePie (from the Chrome hacking; perhaps he is related to the pink unicorns you used to like)?
In that case you should look at this article, it has the image PinkiePie used in the contest. It is the exploit at work:
http://arstechnica.com/business/news/2012/03/googles-chrome-browser…
I already read it, and also the comments, which have very good remarks and links. I was thinking about a somewhat longer and more informative interview.
They should have used Perl. Then no one would have ever figured it out.
Edited 2012-03-12 21:29 UTC
Only the ones that did not see THE light. May the Force be with you.
Alright Mr Perl Monk
P.S.
I like perl it has all the qualities I like.
“Duqu” ? Seriously ?
http://translate.google.com/#fr|en|du%20cul
Talk about pulling names out of their arses…
The version two names cracks me up.
Like “Son of Stuxnet seen in the wild”.
Hopefully will never see an evil progeny of Duqu.
I like it. Maybe…
“Duqu, baffler of Kaspersky, son of Stuxnet, destroyer of centrifuges”?
But surely if you write code in C for example, and just use a compiler you modified yourself along with some custom header files, the resulting binary wouldn’t look like anything from any other known compiler, and hence wouldn’t be relatable to a known language? It’ll be roughly procedure and event-driven, and any language and compiler can do that. Write the code in a text editor instead of getting an IDE to generate it, use your custom compiler and include some reworked open source TCP/IP code that’s modified sufficiently to not look like any other code, and that should do the job.
I’m no expert on these things, but am I missing something here?
Most of the time, it’s relatively easy to see which programming language was used by taking a look at the calling conventions: are parameters passed on the stack, or in registers? If the stack is used, are they pushed from left to right, or right to left? Etc.
Sure, you can modify your compiler to change your calling conventions, but that would make it impossible to call external libraries, and there is no real benefit (it doesn’t result in better code). Also, since C compilers compile to native code, it’s still possible for a reverse-engineer to figure out what the code is doing, despite the modified calling convention.
I doubt those guys wrote their own compiler. They probably used some more obscure programming language for that piece of code, whatever the reason might be…
Edited 2012-03-13 15:49 UTC
sithlord2,
“Sure you can modify your compiler to change your calling conventions, but it would make it impossible to call external libraries + there is no real benefit (= it doesn’t result in better code).”
It’s usually not worth the immense development/maintenance burden, but I’ve found that breaking with strict calling conventions can boost performance, since you’re not shifting registers around anywhere near as much to fit within a standard convention. If you look at ASM dumps frequently, you see that a lot of functions have boilerplate MOVs just to get things in and out of place. This is often trivial to eliminate when you’re working in assembly without restraints.
Some day I envision optimizing compilers which can do inter-procedural optimizations without any calling convention at all to get rid of all that “useless” cruft. After all, the only time a calling convention truly matters is when calling a function of an external component/library.
C++ style exceptions might still require a consistent stack frame, but a static calling convention like CDECL is not necessary.
Take a look at the AMD64 calling convention then… It seems that they put so much effort into making it faster through increased register use that now only optimizing compilers can understand the logic behind it…
Neolander,
I haven’t done asm for amd64, but it’d make sense that they’ve done something more optimal than passing via stack considering the extra registers.
http://en.wikipedia.org/wiki/X86_calling_conventions
“The registers RCX, RDX, R8, R9 are used for integer and pointer arguments (in that order left to right), and XMM0, XMM1, XMM2, XMM3 are used for floating point arguments. Additional arguments are pushed onto the stack (right to left). Integer return values (similar to x86) are returned in RAX if 64 bits or less. Floating point return values are returned in XMM0. Parameters less than 64 bits long are not zero extended; the high bits contain garbage.”
(more info about the stack omitted)
However the point I was trying to get at is that any fixed calling convention is always going to require more shuffling simply for the sake of getting parameters in the right place.
Here’s a pointless example:
int F1(int a, int b) {
    int r = 0;
    while (b-- > 0) r += F2(a, b);
    return r;
}
int F2(int a, int b) {
    while (a--) b += F3(b);
    return b;
}
int F3(int a) {
    return a*(a+3);
}
Obviously in this case it makes the most sense to inline the whole thing, but maybe we’re using function pointers or polymorphism which makes inlining impractical. It should be fairly easy to make F2 work without stepping on F1’s registers, and the same goes for F3 so that no runtime register shuffling is needed at all between the three functions.
The moment any calling convention is imposed, however, moving/saving/restoring registers becomes an unavoidable necessity.
Of course, today’s pipelined processors are good at register renaming and whatnot to reduce the overhead of such shuffling. However, one inefficient scenario has always stood out like a sore thumb, and it perturbs me when I program in high-level languages: the inability to return more than one unit of data from a function call. The CPU has no such limitation, and BIOS programmers routinely return as many data points as needed, even using CPU flags which the caller can use for conditional jumps. I find this model works extremely well in ASM, but alas, C programmers are forced to overload the return value (using the sign bit) and/or return extra values through memory pointers.
I don’t have anything directly on topic to contribute to this, but… I want to say that this thread is very interesting and informative; exactly the kind of thing that made me a regular reader of OSAlert.
yoursecretninja,
Yeah, I love the technical topics and understanding OS internals… Optimizing stuff is a challenge I always enjoy, but it’s an archaic concept these days. I only wish I could land a decent job where my skills were actually appreciated; it’s a struggle. On my own I’m working on a secure deduping P2P backup protocol, which is a lot more interesting than my day job.
Indeed, you raised very interesting points about the drawbacks of having a calling convention (CC).
Disclaimer: it has been more than 15 years since I last coded in asm.
About multiple data return (MDR): perhaps it would create a nightmare for compiler writers for, perhaps, not much benefit? We should also note that one of the key points of a CC is to allow code efficiency. For example, if a function returns an integer, the only thing you need to do before calling it is save the return register, for example eax.
You do:
push eax       ; save eax, as it will hold the return value
push ff0       ; 2nd arg (8 bytes, shown as a single push for brevity)
push ebx       ; 1st arg (4 bytes)
call randomf
add esp, 12    ; pop the 12 bytes of arguments
mov [edi], eax ; store the return value
pop eax        ; restore eax
Suppose you had an MDR operator, like =* for example, and you could declare a function like int : float getboth(int i, float f).
You write:
m:q =* getboth(1, 2.0);
Everything is nice, but what are the implications if you write:
m : q =* getboth(1, 2.0) * getboth(2, 1.2); ?
You would now have to extend the syntax of the whole language so that this kind of construction can be useful and, to make the code efficient, you would need to reserve two registers for the return values. Now imagine you would like to return, say, 16 values on a processor with few resources: you would run out of registers.
Also, on C compilers you can just use a reference, and the compiler may altogether try to eliminate the associated pushes and pops.
acrobar,
“About the multiple data return (MDR), perhaps, it would create a nightmare for compilers writers for, perhaps, not so much benefit?”
I guess the multiple return has pros and cons on two fronts:
1. What would be the necessary impact on compiler implementations and calling conventions under the hood?
2. How would this language feature change the way high level software is written?
I won’t speak towards #1 since that would deserve a much more in-depth analysis than either of us can commit to for this conversation.
As for #2, there’s at least one extremely common use case that crops up over and over again, and that’s a function which returns both a status and a data value. This pattern is so common I wouldn’t mind a language addressing it specifically.
long pos = ftell(file);
if (pos < 0) {
    printf("error: %s\n", strerror(errno));
}
This convention, which is so common on Linux, has some problems. For one, pos cannot distinguish between a high bit being part of a legitimate position and an ftell error; therefore, because of the overloading, the range is half of what it should be. Another is the use of the thread-local global errno to return a status. Maybe it’s a necessary evil, but it’s still not pretty, and it is still compulsory to flag the error through the returned value. Other times the return value/type cannot be overloaded for the error case at all.
All these problems can be easily/efficiently solved in ASM using output flags and registers, but as you rightly observed the question is how to create a clear syntax to deal with it.
One approach is to adapt the perl error syntax which I find pretty clear.
my $pos = ftell(FILE) or die($!); # what to do on error
Of course perl supports multi-value returns directly too.
my ($a, $b) = ftell(FILE); # this requires just one input register to be occupied, leaving the rest free
# The syntax may be rough for “one-liners”, but the $a and $b temp variables can reference the registers as-is without any copying.
C forces us to offload the value temporarily into memory:
int pos;
if (!ftell(FILE, &pos)) { /* error… */ } // hypothetical two-argument ftell; this burns one more input register than the prior version, and also wastes two memory accesses
“Also, on C compilers now you just use a reference and the compiler may altogher try to eliminate the associated pushes and pops.”
I think C only has leeway to do this for inlined functions. Inter-procedural code cannot be optimized without breaking calling conventions.
Sure, I was just arguing that the set of registers they have picked seems to only make sense in the context of specific compiler implementations. Why do they use R8 and R9, as an example? Why RAX, RCX, RDX, but not RBX? How is a regular C compiler supposed to figure out what is a system call and what isn’t in order to use R10 properly, and why does only one syscall parameter get that optimization? The set of registers they have picked has no apparent internal logic, and I cannot see how an ASM dev could remember it all except by keeping the doc at hand at all times or memorizing it in a brute-force fashion.
A possible problem which I would spontaneously see with the examples is that, in the cases you mention, unless I’ve misunderstood, inlining is not performed because the compiler is unable to efficiently detect the relationship between F1, F2, and F3 at compile time. If so, how could it make sure that the functions are not stepping on each other’s registers?
Besides, I am not sure that compilers have to follow calling conventions for anything but external library calls, for which some kind of standard is necessary since the program and the library are compiled separately. As an example, when inlining is performed, calling conventions are violated (or rather bypassed), and no one cares.
Indeed, the inability of C and C++ to return any status information other than “operation failed” without using fancy tricks has bothered me more than once too. I typically use structures to get around that, but that too can quickly become a bother.
Ideally, any language would support tuples like Python’s, where you can shove a set of inhomogeneous objects into the returned “value” of a function without caring what happens under the hood. But I suspect that this can be hard to optimize properly.
Edited 2012-03-14 04:47 UTC
Neolander,
“Sure, I was just arguing that the set of registers which they have picked seems to only make sense in the context of specific compiler implementations. Why do they use R8 and R9, as an example ? Why RAX, RCX, RDX, but not RBX ?”
Ah well now I can’t answer that (or your other questions). Back with real mode addressing the choice of registers was more significant, but now…it may be somewhat arbitrary? I’m not sure about the conventions for special AMD64 cases.
“…unless I’m misunderstood, inlining is not performed because the compiler is unable to efficiently detect the relationship between F1, F2, and F3 at compile time. If so, how could it make sure that the functions are not stepping on each other’s registers ?”
My counter argument is that if a human programmer can see the relationship, so too should an ideal compiler. Of course it can only prove relationships for internal dependencies which are available at compile time, but I think that’s a given.
As for the reason not to inline, besides the two I already listed (function pointers and polymorphism), one might be circular recursion. Another obvious one is size/cache optimization. Another reason might be “tail calling” where a function can perform a jump directly into another function instead of a call followed by a ret. Sometimes these end up being 100% free in the context of conditional logic which would require a jump anyways, so nothing is saved under the inline code path.
Note: GCC (with optimizations enabled) is already able to optimize away tail calls, so that the function below will run indefinitely without running out of stack.
int forever(int x) {
    printf("%d\n", x);
    return forever(x+1);
}
“Besides, I am not sure that compilers have to follow calling conventions for anything but external library calls, for which some kind of standard is necessary since the program and the library are compiled separately. As an example, when inlining is performed, calling conventions are violated (or rather bypassed), and no one cares.”
Yes, that’s the theory, but in practice GCC always uses the calling convention. I think C functions default to being externally callable to aid in external linking. In fact, the whole methodology of compiling to objects and then linking them together into a static binary is an obstacle for any compiler which would like to perform interprocedural optimization.
“Ideally, any language would support tuples like Python’s, where you can shove a set of inhomogeneous objects into the returned ‘value’ of a function without caring what happens under the hood. But I suspect that this can be hard to optimize properly.”
Well I don’t know about python’s implementation. However sometimes I find it helpful to stop looking at things as well defined mathematical functions and instead look at it like a sequence of code blocks, always moving forward by “jumping” to the next block with more parameters for it. Call and ret are simply different mechanisms for locating the next block, but otherwise do the same thing. So there’s no difference in what I can pass from one block to the next.
To highlight this:
function A() {
    {blockA1}
    B()
    {blockA2}
}
function B() {
    {blockB1}
}
We can ‘Unthink’ the function abstraction to get:
{blockA1} -> {blockB1} -> {blockA2}
There’s no fundamental reason the parameters from B1 to A2 cannot be just as rich as the parameters from A1 to B1.
Edit: I wanted to highlight how input and output parameters are really different perspectives on the same thing, and they would not need to have two different optimization mechanisms if it weren’t for differences in higher level function semantics.
Addendum: It would be pretty cool to have a language where a “return” would be syntactically the same as a function call with type-checking and everything.
function A() {
    B() returns (int a, char b);
    printf("%d %c\n", a, b);
}
function B() {
    return (4, 'c');
}
Edited 2012-03-14 15:10 UTC
Except if what you’re trying to do is make it look like you created a new language.
Probably not, but how many headlines is that going to create for your security company?
Maybe it’s QNX? The compiler used for QNX Neutrino is the GNU compiler (gcc). Currently, development can be done from these hosts:
QNX Neutrino
MS-Windows
Solaris
If you have the QNX Momentics Professional Edition, you can create anything using the Integrated Development Environment (IDE) from any host. Alternatively, you can use command-line tools that are based on the GNU compiler.
For MS-Windows hosts, you also have the option of the CodeWarrior tools from Metrowerks. Currently, the CodeWarrior IDE also uses gcc.
Did you actually check the code?
http://www.securelist.com/en/blog/667/The_Mystery_of_the_Duqu_Frame…