Higher Order Byte-Code Instructions
5 Message(s) by 2 Author(s) originally posted in java machine
| From: Christopher Diggins |
Date: Sunday, April 08, 2007
|
I was wondering if there has been any research into adding higher-
order instruction to the JAVA bytecode? In other words instructions
that either
push or pop instructions on the
evaluation stack .
There are only a few
core instructions that'd be neccessary to
build others :
- constantly : pop a value, push an instruction on the stack that
returns that value
- compose : pop two instructions, push a new instruction that
evaluates the first
function , then the second.
- eval : pop an instruction and evaluate
This functionality'd make it easier for me to compile functional
language s to the CIL, and make them much more efficient.
We've been discussing the topic on Lambda-the-Ultimate (
http://lambda-the-ultimate.org/node/2177 ). The first response from
many people is that they believe that this functionality has a huge
performance hit, and loses the effect of statically verifiable
type
safe ty. This is untrue.
I have developed a type-
system for stack-based languages with higher-
order functions and written a paper about it at :
http://www.cat-language.com/paper.html.
I believe the work to be novel, and I'd be interested in
discussing it further.
Cheers,
Christopher Diggins
http://www.cdiggins.com
| From: Chris Uppal |
Date: Monday, April 09, 2007
|
wrote in message:
I was wondering if there has been any research into adding higher-
order instruction to the JAVA bytecode? In other words instructions
that either push or pop instructions on the evaluation stack.
I do not know whether there has been any research on such ideas, I have not seen
anything myself, but I (think I) miss quite a
bit more of what's going on than
I catch.
I'm rather sceptical about the approach -- at least, if taken as a candidate
for extending the expressiveness of the
JVM . (It does look
interesting in and
of itself.)
The problems I see are "political" and practical, I'm willing to believe your
claim that there are no
real killer problem in theory.
(Although I think you've a fair bit more work to do to
support that claim.
Just for one example: how do you ensure that a constant-pool
index pushed
onto
the stack to form part of a
generate d invokevirtual instruction call sequence
is in range, let alone correctly "typed" ?).
The "political" problems (not really a very good term, but I can not think of a
better one off-hand) are boring and obvious. The JVM is a
standard (in the
sense of having a public
specification independent of any one implementation),
and non-standard extensions are doomed to
die in obscurity. No one will use a
program which requires a non-standard JVM as an
execution platform, and so any
JVM language will surely remain (to say the best of it) very much a niche
language unless it targets the /real/ JVM. But if only very unimportant (in
numerical terms) languages require the non-standard extensions then there's not
a lot of chance of them making it into the standard.
The practical problems are more interesting ;-)
The best way I can put it is that the idea seem to assume, or require, an
approach to JVM implementation which isn't used in practise. If the JVM were
typically implemented as an
interpreter with a
class ic "big-switch" on the
opcode, then the adding this
feature 'd obviously be trivial. If the
implementation approach were to translate bytecode into native
code by doing a
table lookup to translate each opcode into a (stereotyped) native instruction
sequence, concatenating the results, then shoving everything through a peephole
optimise r, then adding this kind of feature'd also be pretty trivial. But
neither of those approaches are relevant (as far as I know no production JVM
for desktop class machines uses either of them, or has done since about JDK
1.0.2).
In fact the stack-based
nature of the JVM bytecode language is a bit of a red
herring. For one thing JVM bytecode is not really that much of a stack-based
language[*]. More importantly the stack is only an incidental feature of the
intermediate language used to
compress source code into a
portable delivery
format. It is not necessarily a particularly well chosen format (though it is
reasonably compact, and does keep JAVAc simple), since the JVM has to translate
code back from the stack-based
expression into something suitable for input to
its optimising
compiler (SSA or whatever). I do not know the details of how the
current crop of JVMs generate native code, but it is at least a plausible
approach that the JITer never sees bytecode at all, nor ever thinks in terms of
a stack (except the stack of activation records, of course).
It may well be that there are ways of extending the analysis the JVM must do to
allow higher-order bytecodes in the
context of a
state of the art JITer, but I
do not see any reason to
expect that to be simple at all -- nor any reason to
expect it even to fit into the current
framework at all (it'd be like
adding eval(char *) to an optimising C compiler -- how can the compiler
optimise code that is unknown until runtime ?)
But then, perhaps the overtly stack-based nature of your idea is itself a
red-herring. It does not seem to me that much'd change if you wanted to be
able to assemble sequences of executable bytecode in byte[]
array s, instead of
on the stack (the latter might significantly simplify the implementation of
languages like Cat, but it does not seem to me that it's similarly important for
languages outside that niche-within-niche). But the techniques for doing that
are already well-understood, rather widely used, and don't require changes to
the semantics or implementation of the JVM. So I suspect there's more mileage
to be gained in looking for ways to optimise that process (e.g. creating
lighter-weight classes) than in a wholesale reorganisation of the JVM
semantics.
-- chris
[*] It is true that there is an explicit stack, with operations to manipulate
it, but that is only the expression stack and it is not used in anything like
the way "the" stack is used in Forth or PostScript -- the character of the JVM
bytecode language'd be essentially unchanged if it used, say, the
unbounded-number-of-named-registers-per-activation approach -- which it does,
anyway, for everything except expression evaluation.
P.S. Very small notes re: your paper. "PostScript" (which is a registered
trademark) should be spelled with a capital S. Also, I do not understand why
you say that PostScript is not higher order -- you manipulate instructions as
data every time you write conditional code, or define a procedure. Lastly, I'm
not sure that citing the PostScript reference manual as "Inc. 1999" is quite in
tune with established academic norms ;-)
| From: Christopher Diggins |
Date: Tuesday, April 10, 2007
|
[snip]
The problems I see are "political" and practical, I'm willing to believe your
claim that there are no real killer problem in theory.
(Although I think you've a fair bit more work to do to support that claim.
Just for one example: how do you ensure that a constant-pool index pushed onto
the stack to form part of a generated invokevirtual instruction call sequence
is in range, let alone correctly "typed" ?).
I believe this can only be done through dynamic checks. To be honest I
have not studied the JVML type system in depth, but this is definitely
out of the scope of what I have been studying or propose. What I am
saying is that given a stack-language that can be statically typed,
you can add higher-order functions in a type safe manner with relative
ease.
The "political" problems (not really a very good term, but I can not think of a
better one off-hand) are boring and obvious. The JVM is a standard (in the
sense of having a public specification independent of any one implementation),
and non-standard extensions are doomed to die in obscurity. No one will use a
program which requires a non-standard JVM as an execution platform, and so any
JVM language will surely remain (to say the best of it) very much a niche
language unless it targets the /real/ JVM. But if only very unimportant (in
numerical terms) languages require the non-standard extensions then there's not
a lot of chance of them making it into the standard.
Sure, but it is definitely premature to start confronting a proposal
with political obstacles. In the end, all I really care about is the
proposal and sharing my research. I'm building my own
virtual machine
language (
http://www.cat-language.com) so political matters are of
little interest to me.
The practical problems are more interesting ;-)
Agreed :-)
The best way I can put it is that the idea seem to assume, or require, an
approach to JVM implementation which isn't used in practise. If the JVM were
typically implemented as an interpreter with a classic "big-switch" on the
opcode, then the adding this feature'd obviously be trivial. If the
implementation approach were to translate bytecode into native code by doing a
table lookup to translate each opcode into a (stereotyped) native instruction
sequence, concatenating the results, then shoving everything through a peephole
optimiser, then adding this kind of feature'd also be pretty trivial. But
neither of those approaches are relevant (as far as I know no production JVM
for desktop class machines uses either of them, or has done since about JDK
1.0.2).
In fact the stack-based nature of the JVM bytecode language is a bit of a red
herring. For one thing JVM bytecode is not really that much of a stack-based
language[*].
It is sufficiently for the purposes of the proposal.
More importantly the stack is only an incidental feature of the
intermediate language used to compress source code into a portable delivery
format. It is not necessarily a particularly well chosen format (though it is
reasonably compact, and does keep JAVAc simple), since the JVM has to translate
code back from the stack-based expression into something suitable for input to
its optimising compiler (SSA or whatever). I do not know the details of how the
current crop of JVMs generate native code, but it is at least a plausible
approach that the JITer never sees bytecode at all, nor ever thinks in terms of
a stack (except the stack of activation records, of course).
This is all fine and dandy, but it does not take away from my
proposal.
It may well be that there are ways of extending the analysis the JVM must do to
allow higher-order bytecodes in the context of a state of the art JITer, but I
do not see any reason to expect that to be simple at all -- nor any reason to
expect it even to fit into the current framework at all (it'd be like
adding eval(char *) to an optimising C compiler -- how can the compiler
optimise code that is unknown until runtime ?)
This is an incorrect comparison. I am not proposing the execution of
untyped strings, or untyped sequence but rather the execution of type
functions generated through typed higher order instructions.
But then, perhaps the overtly stack-based nature of your idea is itself a
red-herring. It does not seem to me that much'd change if you wanted to be
able to assemble sequences of executable bytecode in byte[] arrays, instead of
on the stack [snip]
Yes.
But the techniques for doing that
are already well-understood, rather widely used, and don't require changes to
the semantics or implementation of the JVM. So I suspect there's more mileage
to be gained in looking for ways to optimise that process (e.g. creating
lighter-weight classes) than in a wholesale reorganisation of the JVM
semantics.
Even lighter weight classes will always be far slower compare to the
assembly code you can generate if you enable higher order opcodes.
-- chris
P.S. Very small notes re: your paper. "PostScript" (which is a registered
trademark) should be spelled with a capital S. Also, I do not understand why
you say that PostScript is not higher order -- you manipulate instructions as
data every time you write conditional code, or define a procedure. Lastly, I'm
not sure that citing the PostScript reference manual as "Inc. 1999" is quite in
tune with established academic norms ;-)
Thanks for the corrections, and thank you very much for the
discussion.
Christopher Diggins
http://www.cdiggins.com
| From: Chris Uppal |
Date: Wednesday, April 11, 2007
|
wrote in message:
> It may well be that there are ways of extending the analysis the JVM
> must do to allow higher-order bytecodes in the context of a state of
> the art JITer, but I do not see any reason to expect that to be simple
> at all -- nor any reason to expect it even to fit into the current
> framework at all (it'd be like adding eval(char *) to an optimising
> C compiler -- how can the compiler optimise code that is unknown until
> runtime ?)
This is an incorrect comparison. I am not proposing the execution of
untyped strings, or untyped sequence but rather the execution of type
functions generated through typed higher order instructions.
You keep talking about the type system, but (while I understand that it is
important to be able to do this checking, and I agree that it is significant
that you /can/ do the checking), I do not see the
relevance here. The issue is
the implementation
technology used in generating native code, not the
technology used in static
verification of bytecode.
Following the terms of my analogy, consider the following code:
sum = 0;
for (int I = 0; I < array.length; i++)
{
for (int j = i; j < array.length; j++)
{
eval("sum += array[i] * array[j]");
}
}
return sum;
the problem isn't that the input to eval() might specify unsafe operations, or
operations which would not make sense, but that the compiler has no idea which,
if any, of the variables are used in the loop; what kinds of
loop-transformation operations are valid; or even whether there are any loops
at all there. It can not compile code it has not seen.
-- chris
| From: Christopher Diggins |
Date: Wednesday, April 11, 2007
|
On Apr 11, 2:09 am, "Chris Uppal"
<chris.up...@xxxxxxxxxxx
wrote in message:
wrote in message:
> > It may well be that there are ways of extending the analysis the JVM
> > must do to allow higher-order bytecodes in the context of a state of
> > the art JITer, but I do not see any reason to expect that to be simple
> > at all -- nor any reason to expect it even to fit into the current
> > framework at all (it'd be like adding eval(char *) to an optimising
> > C compiler -- how can the compiler optimise code that is unknown until
> > runtime ?)
> This is an incorrect comparison. I am not proposing the execution of
> untyped strings, or untyped sequence but rather the execution of type
> functions generated through typed higher order instructions.
You keep talking about the type system, but (while I understand that it is
important to be able to do this checking, and I agree that it is significant
that you /can/ do the checking), I do not see the relevance here. The issue is
the implementation technology used in generating native code, not the
technology used in static verification of bytecode.
What I meant was that the type system prevents you from using "eval"
on anything but a function value. A function value is guaranteed to be
a piece of well-typed code, by the type system.
Following the terms of my analogy, consider the following code:
sum = 0;
for (int I = 0; I < array.length; i++)
{
for (int j = i; j < array.length; j++)
{
eval("sum += array[i] * array[j]");
}
}
return sum;
the problem isn't that the input to eval() might specify unsafe operations, or
operations which would not make sense, but that the compiler has no idea which,
if any, of the variables are used in the loop; what kinds of
loop-transformation operations are valid; or even whether there are any loops
at all there. It can not compile code it has not seen.
Because you can not evaluate strings in my proposal, you can only
evaluate dynamically created functions, which are neccessarily
constructed by quoting functions (e.g. pushing functions onto the
stack) that the compiler has already seen. So your example becomes
sum = 0;
for (int I = 0; I < array.length; i++)
{
for (int j = i; j < array.length; j++)
{
eval(closure{sum += array[i] * array[j]});
}
}
return sum;
There are plenty of well-known optimizations that can be done to the
above code using functional optimization techniques, including pre-
evaluation.
The next
logical step after introducing higher order opcodes is the
introduction of array/list
processing primitives based. Iterating over
an array for an example could also be done using higher order
functions like "map" or "fold", which can be easily parallelized by a
compiler.
Christopher Diggins
http://www.cdiggins.com
Next Message: modifying JAVA.lang.String.JAVA
Blogs related to Higher Order Byte-Code Instructions
comp.lang.eiffel Frequently Asked Questions (FAQ)
It includes an Eiffel to C compiler, an Eiffel to
Java bytecode compiler, a documentation tool, a pretty printer, etc. The compiler uses an innovative strategy involving whole system analysis which allows compilation to be often faster
...
Invasion Of The Dynamic Language Weenies
Deibel responded in short
order, confirming that indeed the quote had been taken out of context and wasn't intended to "apply to everything, such as implementing the same sorting or image processing algorithm in Python, C++, and
Java.
...
[Java] 이클립스 플러그인 리스트 정보
Groovy can be used as an alternative compiler to javac to generate standard
Java bytecode to be used by any
Java project or it can be used dynamically as an alternative language such as for scripting
Java objects, templating or writing
...
How to Build a POJO-based Data Grid using Open Terracotta
Terracotta uses
bytecode instrumentation to adapt the target application at class load time. In this phase it extends the application in
order to ensure that the semantics of the
Java Language Specification (JLS) are correctly
...
Beware of the Natives
Having a dynamic adaptive compiler (JIT) is more than simply translating bytecodes into equivalent machine
instructions one
bytecode at a time. Such a naive translation strategy is more like an assembler than a compiler,
...
Multi-tasking the Java platform: What's the Big Deal?
The VM will be able to unwind stack frames for
Java bytecode methods. But native methods will require that the native method check for the exception and return. Since we're not allowing any application native code, we only have to deal
...