JRuby disabling ObjectSpace: what implications?
12 Message(s) by 5 Author(s) originally posted in ruby programming
| From: Charles Oliver Nutter |
Date: Sunday, October 28, 2007
|
As some of you may have heard, we're considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists. Here's some example performance,
from a fractal benchmark in the JRuby source:
With ObjectSpace:    Ruby Elapsed 45.967000
Without ObjectSpace: Ruby Elapsed 4.280000
What's most frustrating about this is that almost *no* libraries or apps
use each_object, and it's a terrible performance hit for us.
The one really visible use of each_object is in test/unit, where the
default console-based runner does each_object(Class) to find all
subclasses of TestCase. Because this is a heavily-used library (to say
the least), I have made modifications to JRuby to always support
each_object(Class) by maintaining a bidirectional graph of parent and
child classes. So that much would not go away (but I'd prefer an
implementation that uses Class#inherited, since it'd be cleaner,
faster, and deterministic).
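A minimal sketch of the Class#inherited idea, for illustration only. TrackedCase and known_subclasses are hypothetical names standing in for TestCase and whatever a real runner would expose; this is not how test/unit or JRuby actually do it.

```ruby
# Hypothetical base class standing in for Test::Unit::TestCase.
class TrackedCase
  # Every class that inherits from TrackedCase (directly or not) lands here.
  REGISTRY = []

  class << self
    # Descendants registered so far, in the order they were defined.
    def known_subclasses
      REGISTRY.dup
    end

    # Ruby calls this hook each time a class inherits from the receiver,
    # so no heap walk is needed to discover subclasses.
    def inherited(child)
      super
      REGISTRY << child
    end
  end
end

class FirstCase < TrackedCase; end
class SecondCase < TrackedCase; end
class NestedCase < FirstCase; end

p TrackedCase.known_subclasses
```

Because the hook fires at class definition time, the result depends only on load order, not on what the garbage collector has or has not done.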
So...I'm writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).
I think more and more of you may want to give JRuby another look over
the next few months, so I think we need to involve you in such decisions.
- Charlie
| From: Bill Kelly |
Date: Sunday, October 28, 2007
|
From: "Charles Oliver Nutter" <charles.nutter@xxxxxxxxxxx>
As some of you may have heard, we're considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists.
Is this also true for ObjectSpace#_id2ref ?
Regards,
Bill
| From: Charles Oliver Nutter |
Date: Sunday, October 28, 2007
|
wrote in message:
> hmmm. ok i'm brainstorming here which you can ignore if you like as I
> know less than nothing about jvms or implementing ruby but here goes:
> what if you could invert the problem? what if objects knew about the
> global ObjectSpaceThang and could be forced to register themselves on
> demand somehow? without a reference i have no idea how, just throwing
> that out there. or, another stupid idea, what if the objects themselves
> were the tree/graph of weak references parent -> children. crawling it
> would be, um, fun - but you could prune dead objects *only* when walking
> the graph. this should be possible in ruby since you always have the
> notion of a parent object - which is Object - so all objects should be
> either reachable or leaks. now back to drinking my regularly scheduled
> beer...
Continuing this discussion here...
Please, continue to brainstorm. I do not claim to have thought out every
aspect of this problem or every possible solution. I'd *love* to
discover I have missed an obvious fix.
Your idea has come up in the past, and it'd probably eliminate the
cost of an ObjectSpace list. However that does not appear to be where we
pay the highest cost.
The two items that (we believe) cost the most for us on the JVM are:
- Constructing an extra object for every Ruby object...namely, the
WeakReference object to point to it. So we pay a
memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks,
so it can notify the WeakReference that the object it points at has gone
away. So that slows down the legendary HotSpot GC and we pay again.
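To make the cost concrete, here is a rough Ruby-level sketch of a weak-reference list that prunes dead entries while walking. It uses Ruby's standard weakref library rather than Java's WeakReference, and WeakRegistry is a hypothetical name; it only illustrates the general shape (one extra wrapper object per tracked object), not JRuby's internals.

```ruby
require 'weakref'

# Hypothetical registry; an illustration, not JRuby's implementation.
class WeakRegistry
  def initialize
    @refs = []
  end

  # Allocates one extra WeakRef per tracked object. This is the
  # per-object allocation cost described above.
  def track(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  # Yields every live tracked object of the given class and drops
  # entries whose target has already been collected.
  def each_object(klass = Object)
    live = []
    @refs.each do |ref|
      begin
        obj = ref.__getobj__ # raises WeakRef::RefError once collected
      rescue WeakRef::RefError
        next
      end
      live << ref
      yield obj if obj.is_a?(klass)
    end
    @refs = live
    self
  end

  def size
    @refs.size
  end
end

registry = WeakRegistry.new
# Strong references are kept here so the objects stay alive for the demo.
kept = [registry.track("alpha"), registry.track([1, 2]), registry.track("beta")]
registry.each_object(String) { |s| puts s }
```

Which entries survive a walk depends on when the collector last ran, which is the non-determinism discussed elsewhere in this thread.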
I believe the parent -> weakref -> children algorithm is used in some
implementations of ObjectSpace-like behavior, so it's perfectly valid.
But again, there's certain aspects of ObjectSpace that are just
problematic...
- threading or concurrency of any kind? No, you can not have
multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace does not have to be
deterministic"...but when it starts getting wired into libraries like
test/unit, it seems like people expect it to be. If we can say OS is not
deterministic, then *nobody* should be relying on its contents for core
libraries, and we could reasonably claim that each_object will never
return *anything*.
- Charlie
| From: Charles Oliver Nutter |
Date: Sunday, October 28, 2007
|
wrote in message:
From: "Charles Oliver Nutter" <charles.nutter@xxxxxxxxxxx>
As some of you may have heard, we're considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists.
Is this also true for ObjectSpace#_id2ref ?
Not directly. _id2ref is handled in a similar way, but we have an event
we can trigger off to start tracking an object; namely, Object#id.
When you request an id, we start tracking that object for purposes of
_id2ref. Not until. So that wouldn't be affected by disabling ObjectSpace.
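The "track only once an id is requested" idea can be sketched in plain Ruby as follows. LazyIds, id_for and ref_for are hypothetical names invented for this illustration; they stand in for Object#id and _id2ref and say nothing about how JRuby implements either.

```ruby
require 'weakref'

# Hypothetical stand-in for the Object#id / _id2ref pairing.
module LazyIds
  @tracked = {}

  # Requesting an id is the event that starts tracking the object.
  # Objects whose id is never requested cost nothing extra.
  def self.id_for(obj)
    id = obj.object_id
    ref = @tracked[id]
    @tracked[id] = WeakRef.new(obj) unless ref && ref.weakref_alive?
    id
  end

  # Returns the object for an id, or nil if the id was never requested
  # or the object has since been collected.
  def self.ref_for(id)
    ref = @tracked[id]
    return nil unless ref
    ref.__getobj__
  rescue WeakRef::RefError
    @tracked.delete(id)
    nil
  end
end

tracked = Object.new
untracked = Object.new
id = LazyIds.id_for(tracked)
p LazyIds.ref_for(id).equal?(tracked)
p LazyIds.ref_for(untracked.object_id)
```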
In actuality, however, _id2ref is primarily used for things like weak
references, so you can hold a virtual reference to an object without
preventing it from being collected. We could provide an implementation
of Ruby's weak references using Java's weak references that would allow
us to escape _id2ref entirely for that use case.
Are there other places _id2ref is used?
- Charlie
| From: Bill Kelly |
Date: Sunday, October 28, 2007
|
From: "Charles Oliver Nutter" <charles.nutter@xxxxxxxxxxx>
wrote in message:
Is this also true for ObjectSpace#_id2ref ?
Not directly. _id2ref is handled in a similar way, but we have an event
we can trigger off to start tracking an object; namely, Object#id.
When you request an id, we start tracking that object for purposes of
_id2ref. Not until. So that wouldn't be affected by disabling ObjectSpace.
I see, thanks. Nifty. :)
In actuality, however, _id2ref is primarily used for things like weak
references, so you can hold a virtual reference to an object without
preventing it from being collected. We could provide an implementation
of Ruby's weak references using Java's weak references that would allow
us to escape _id2ref entirely for that use case.
Are there other places _id2ref is used?
I think I have used _id2ref exactly twice. I can not recall the first
usage; I do not think it made it into production code. The most
recent use was to store some ruby object id's in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.
(I suppose DRb might do something similar?)
Regards,
Bill
| From: Robert Klemme |
Date: Sunday, October 28, 2007
|
wrote in message:
wrote in message:
> hmmm. ok i'm brainstorming here which you can ignore if you like as I
> know less than nothing about jvms or implementing ruby but here goes:
> what if you could invert the problem? what if objects knew about the
> global ObjectSpaceThang and could be forced to register themselves on
> demand somehow? without a reference i have no idea how, just throwing
> that out there. or, another stupid idea, what if the objects themselves
> were the tree/graph of weak references parent -> children. crawling it
> would be, um, fun - but you could prune dead objects *only* when walking
> the graph. this should be possible in ruby since you always have the
> notion of a parent object - which is Object - so all objects should be
> either reachable or leaks. now back to drinking my regularly scheduled
> beer...
Continuing this discussion here...
Please, continue to brainstorm. I do not claim to have thought out every
aspect of this problem or every possible solution. I'd *love* to
discover I have missed an obvious fix.
IMHO ObjectSpace shouldn't be implemented in Java land. Why? The JVM
has to keep track of instances anyway and implementing this in Java via
WeakReferences seems to duplicate functionality that is already there.
Did you consider using "Java Virtual Machine Tools Interface"?
http://JAVA.sun.com/JAVAse/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls
You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).
Alternatively there may be another method that doesn't need
instrumentation and that can give you access to every (reachable) object
in the JVM.
Your idea has come up in the past, and it'd probably eliminate the
cost of an ObjectSpace list. However that does not appear to be where we
pay the highest cost.
The two items that (we believe) cost the most for us on the JVM are:
- Constructing an extra object for every Ruby object...namely, the
WeakReference object to point to it. So we pay a
memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks,
so it can notify the WeakReference that the object it points at has gone
away. So that slows down the legendary HotSpot GC and we pay again.
I believe the parent -> weakref -> children algorithm is used in some
implementations of ObjectSpace-like behavior, so it's perfectly valid.
But again, there's certain aspects of ObjectSpace that are just
problematic...
- threading or concurrency of any kind? No, you can not have
multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace does not have to be
deterministic"...but when it starts getting wired into libraries like
test/unit, it seems like people expect it to be. If we can say OS is not
deterministic, then *nobody* should be relying on its contents for core
libraries, and we could reasonably claim that each_object will never
return *anything*.
I'd reformulate the requirement here: ObjectSpace.each_object must yield
every object that was existent before the invocation and that is
strongly reachable. I believe for the typical use case (e.g. traversing
all class instances) this is enough while leaving enough flexibility for
the implementation (i.e. create a snapshot of some form, iterate through
some internal structure that may change due to new objects being created
during #each_object etc.).
Kind regards
robert
| From: Daniel Berger |
Date: Sunday, October 28, 2007
|
On Oct 28, 12:53 am, Charles Oliver Nutter
<charles.nut...@xxxxxxxxxxx>
wrote in message:
<snip>
So...I'm writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).
ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer self, @xxxxxxxxxxx@xxxxxxxxxxx(@xxxxxxxxxxx)
ext\dl\test\test.rb:187: ObjectSpace.define_finalizer(fp) {File.unlink("tmp.txt")}
ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|obj|
ext\Win32API\lib\win32\registry.rb:569: ObjectSpace.define_finalizer self, @xxxxxxxxxxx@xxxxxxxxxxx(@xxxxxxxxxxx)
lib\cgi\session.rb:299: ObjectSpace::define_finalizer(self, Session::callback(@xxxxxxxxxxx))
lib\drb\drb.rb:337:# object's ObjectSpace id as its dRuby id. This means that the dRuby
lib\drb\drb.rb:361: # This, the default implementation, uses an object's local ObjectSpace
lib\drb\drb.rb:375: ObjectSpace._id2ref(ref)
lib\finalize.rb:59: ObjectSpace.call_finalizer(obj)
lib\finalize.rb:169: ObjectSpace.remove_finalizer(@xxxxxxxxxxx)
lib\finalize.rb:173: ObjectSpace.add_finalizer(@xxxxxxxxxxx)
lib\finalize.rb:180: # registering function to ObjectSpace#add_finalizer
lib\finalize.rb:192: ObjectSpace.add_finalizer(@xxxxxxxxxxx)
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
lib\irb\ext\save-history.rb:69: ObjectSpace.define_finalizer(obj, HistorySavingAbility.create_finalizer)
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do |io|
lib\singleton.rb:23:# ObjectSpace.each_object(OtherKlass){} # => 0.
lib\singleton.rb:190: "#{ObjectSpace.each_object(klass){}} #{klass} instance(s)"
lib\tempfile.rb:53: ObjectSpace.define_finalizer(self, @xxxxxxxxxxx)
lib\tempfile.rb:105: ObjectSpace.undefine_finalizer(self)
lib\tempfile.rb:118: ObjectSpace.undefine_finalizer(self)
lib\test\unit\autorunner.rb:17: ObjectSpace.each_object(Class) do |klass|
lib\test\unit\autorunner.rb:54: :objectspace => proc do |r|
lib\test\unit\autorunner.rb:55: require 'test/unit/collector/objectspace'
lib\test\unit\autorunner.rb:56: c = Collector::ObjectSpace.new
lib\test\unit\autorunner.rb:80: @xxxxxxxxxxx = COLLECTORS[(standalone ? :dir : :objectspace)]
lib\test\unit\collector\dir.rb:13: def initialize(dir=::Dir, file=::File, object_space=::ObjectSpace, req=nil)
lib\test\unit\collector\objectspace.rb:10: class ObjectSpace
lib\test\unit\collector\objectspace.rb:13: NAME = 'collected from the ObjectSpace'
lib\test\unit\collector\objectspace.rb:15: def initialize(source=::ObjectSpace)
lib\test\unit.rb:252: # the ObjectSpace and wrap them up into a suite for you. It then runs
lib\weakref.rb:16:# ObjectSpace.garbage_collect
lib\weakref.rb:62: ObjectSpace._id2ref(@xxxxxxxxxxx)
lib\weakref.rb:74: ObjectSpace.define_finalizer obj, @xxxxxxxxxxx@xxxxxxxxxxx
lib\weakref.rb:75: ObjectSpace.define_finalizer self, @xxxxxxxxxxx@xxxxxxxxxxx
lib\weakref.rb:98: ObjectSpace.garbage_collect
test\dbm\test_dbm.rb:45: ObjectSpace.each_object(DBM) do |obj|
test\gdbm\test_gdbm.rb:42: ObjectSpace.each_object(GDBM) do |obj|
test\ruby\test_objectspace.rb:3:class TestObjectSpace < Test::Unit::TestCase
test\ruby\test_objectspace.rb:10: o = ObjectSpace._id2ref(obj.object_id);\
test\sdbm\test_sdbm.rb:15: ObjectSpace.each_object(SDBM) do |obj|
test\testunit\collector\test_dir.rb:62: class ObjectSpace
test\testunit\collector\test_dir.rb:81: @xxxxxxxxxxx = ObjectSpace.new
test\testunit\collector\test_objectspace.rb:6:require 'test/unit/collector/objectspace'
test\testunit\collector\test_objectspace.rb:11: class TC_ObjectSpace < TestCase
test\testunit\collector\test_objectspace.rb:41: @xxxxxxxxxxx = ObjectSpace.new(@xxxxxxxxxxx)
test\testunit\collector\test_objectspace.rb:44: def full_suite(name=ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:51: TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:83: expected = TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:89: expected = TestSuite.new(ObjectSpace::NAME)
test\yaml\test_yaml.rb:1279: ObjectSpace.each_object(Class) do |klass|
So, in summary, if we exclude those libraries where only tests are
affected, this'd affect:
win32-registry
tk
cgi
drb
finalize
irb
shell
singleton
tempfile
test-unit
weakref
Some comments on each of these as they relate to JRuby:
win32-registry: You have no hope of implementing this without JNA
anyway, unless there's some Java binding I do not know about. Besides,
I could not tell you why on Earth win32-registry would need a
finalizer.
tk: No one will care. They will use SWT or Swing bindings. Besides, you
would need JNA.
cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.
drb: This could be a big deal.
finalize: Did anyone even know about this? Does anyone use it?
irb: You have got jirb.
shell: This could be a problem.
singleton: Ditto.
tempfile: Meh, I'm guessing Java has its own library for temp files. I
wrote in message
file-temp).
test-unit: Already mentioned.
weakref: You have stated that Java has its own implementation.
Regards,
Dan
| From: ara.t.howard |
Date: Sunday, October 28, 2007
|
wrote in message:
Are there other places _id2ref is used?
i use it quite often as a way to have meta-programming 'storage'
without polluting instances:
foo = instance_method :foo   # UnboundMethod, so that #bind works below
module_eval <<-code
  def foo(*a, &b)
    ObjectSpace._id2ref(#{ foo.object_id }).bind(self).call(*a, &b)
  end
code
which is fabricated - but you get the concept: string in eval maps to
live object at run time. when #define_method takes a block this
won't be used much I think though...
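For comparison, a small sketch of the define_method alternative hinted at above, assuming a Ruby version where define_method blocks can accept a block argument (1.9 and later). Greeter and its foo method are made up for the example.

```ruby
class Greeter
  def foo(name)
    "hello #{name}"
  end

  # Capture the original method as a live object in a local variable.
  original = instance_method(:foo)

  # The block closes over `original`, so no object id has to be
  # interpolated into an eval string and resolved again at call time.
  define_method(:foo) do |*args, &blk|
    original.bind(self).call(*args, &blk).upcase
  end
end

puts Greeter.new.foo("world")
```

The closure keeps a strong reference to the captured method, so nothing depends on ObjectSpace being able to map an id back to an object.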
cheers.
a @xxxxxxxxxxx
http://codeforpeople.com/
--
it isn't enough to be compassionate. you must act.
h.h. the 14th dalai lama
| From: Charles Oliver Nutter |
Date: Sunday, October 28, 2007
|
wrote in message:
I think I have used _id2ref exactly twice. I can not recall the first
usage; I do not think it made it into production code. The most
recent use was to store some ruby object id's in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.
(I suppose DRb might do something similar?)
Yeah, sounds like that's mostly a "poor man's remote hash". I'd expect
that just creating a hash specifically for that purpose and passing a
key around would be a "better" way to do it.
_id2ref is just another one of those features that gets rarely used, and
whose use cases can often be implemented in "better" ways.
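A rough sketch of that "dedicated hash" approach, with hypothetical names (HandleTable, register, fetch, release) chosen only for illustration:

```ruby
# Hypothetical handle table; an explicit replacement for passing raw
# object ids to another process and resolving them with _id2ref.
class HandleTable
  def initialize
    @objects = {}
    @next_key = 0
  end

  # Stores obj and returns an opaque integer key that is safe to hand
  # to an external process.
  def register(obj)
    key = (@next_key += 1)
    @objects[key] = obj
    key
  end

  # Looks the object up again when the key comes back; raises KeyError
  # for unknown or released keys.
  def fetch(key)
    @objects.fetch(key)
  end

  # Drops the entry so the object can be collected.
  def release(key)
    @objects.delete(key)
  end
end

table = HandleTable.new
listener = Object.new
key = table.register(listener)
p table.fetch(key).equal?(listener)
table.release(key)
```

Unlike _id2ref, the table holds a strong reference, so the object stays alive until release is called. That is a deliberate trade: lookups become deterministic, at the price of explicit cleanup.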
- Charlie
| From: Charles Oliver Nutter |
Date: Sunday, October 28, 2007
|
wrote in message:
IMHO ObjectSpace shouldn't be implemented in Java land. Why? The JVM
has to keep track of instances anyway and implementing this in Java via
WeakReferences seems to duplicate functionality that is already there.
Did you consider using "Java Virtual Machine Tools Interface"?
http://JAVA.sun.com/JAVAse/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls
You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).
You just hit on exactly why we do not use JVMTI for ObjectSpace. It'd
certainly work, but it'd add a lot of overhead we'd never expect
people to accept in a real application. Plus, it'd track far more
object instances than we actually want tracked. We'd love to include a
JVMTI-based ObjectSpace implementation, however...it just has not been a
high priority to implement since 99% of users never actually need
ObjectSpace.
Alternatively there may be another method that doesn't need
instrumentation and that can give you access to every (reachable) object
in the JVM.
If there is...we have not found it. The "linked weakref list" has been
the least overhead so far, and it's still a lot of overhead.
Your idea has come up in the past, and it'd probably eliminate the
cost of an ObjectSpace list. However that does not appear to be where
we pay the highest cost.
The two items that (we believe) cost the most for us on the JVM are:
- Constructing an extra object for every Ruby object...namely, the
WeakReference object to point to it. So we pay a
memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional
checks, so it can notify the WeakReference that the object it points
at has gone away. So that slows down the legendary HotSpot GC and we
pay again.
I believe the parent -> weakref -> children algorithm is used in some
implementations of ObjectSpace-like behavior, so it's perfectly valid.
But again, there's certain aspects of ObjectSpace that are just
problematic...
- threading or concurrency of any kind? No, you can not have
multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace does not have to be
deterministic"...but when it starts getting wired into libraries like
test/unit, it seems like people expect it to be. If we can say OS
is not deterministic, then *nobody* should be relying on its contents
for core libraries, and we could reasonably claim that each_object
will never return *anything*.
I'd reformulate the requirement here: ObjectSpace.each_object must yield
every object that was existent before the invocation and that is
strongly reachable. I believe for the typical use case (e.g. traversing
all class instances) this is enough while leaving enough flexibility for
the implementation (i.e. create a snapshot of some form, iterate through
some internal structure that may change due to new objects being created
during #each_object etc.).
The problem here is "strongly reachable". During ObjectSpace processing,
the last strong reference to an object may go away and the garbage
collector may run. Should ObjectSpace prevent GC from running if it's
traversed and now references that object? If not, how should it be
handled if immediately before you return an object from each_object, it
gets garbage collected? There's no way to catch that, so each_object may
end up returning a reference to an object that's gone away, or
reconstituting an object whose finalization has already fired. Bad
things happen.
ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory, run in parallel, and so on. It can
*never* be deterministic unless it can "stop the world", so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.
- Charlie
| From: Robert Klemme |
Date: Sunday, October 28, 2007
|
wrote in message:
wrote in message:
IMHO ObjectSpace shouldn't be implemented in Java land. Why? The
JVM has to keep track of instances anyway and implementing this in
Java via WeakReferences seems to duplicate functionality that is
already there. Did you consider using "Java Virtual Machine Tools
Interface"?
http://JAVA.sun.com/JAVAse/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls
You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).
You just hit on exactly why we do not use JVMTI for ObjectSpace. It'd
certainly work, but it'd add a lot of overhead we'd never expect
people to accept in a real application. Plus, it'd track far more
object instances than we actually want tracked.
Why is that? I mean, you could selectively decide which instances to track.
We'd love to include a
JVMTI-based ObjectSpace implementation, however...it just has not been a
high priority to implement since 99% of users never actually need
ObjectSpace.
Alternatively there may be another method that doesn't need
instrumentation and that can give you access to every (reachable)
object in the JVM.
If there is...we have not found it. The "linked weakref list" has been
the least overhead so far, and it's still a lot of overhead.
Hmm, but there are iteration methods like #each_object:
http://JAVA.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#Heap
Did you put them down because of the "stop the world" approach? I'd say
that'd be ok - at least it's better than not having ObjectSpace.
And also, there'd be no overhead. Question is only whether it's ok
to invoke arbitrary byte code (which'd happen during the iteration
callback).
Your idea has come up in the past, and it'd probably eliminate
the cost of an ObjectSpace list. However that does not appear to be
where we pay the highest cost.
The two items that (we believe) cost the most for us on the JVM are:
- Constructing an extra object for every Ruby object...namely, the
WeakReference object to point to it. So we pay a
memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional
checks, so it can notify the WeakReference that the object it points
at has gone away. So that slows down the legendary HotSpot GC and we
pay again.
I believe the parent -> weakref -> children algorithm is used in some
implementations of ObjectSpace-like behavior, so it's perfectly
valid. But again, there's certain aspects of ObjectSpace that are
just problematic...
- threading or concurrency of any kind? No, you can not have
multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace does not have to be
deterministic"...but when it starts getting wired into libraries like
test/unit, it seems like people expect it to be. If we can say OS
is not deterministic, then *nobody* should be relying on its contents
for core libraries, and we could reasonably claim that each_object
will never return *anything*.
I'd reformulate the requirement here: ObjectSpace.each_object must
yield every object that was existent before the invocation and that is
strongly reachable. I believe for the typical use case (e.g.
traversing all class instances) this is enough while leaving enough
flexibility for the implementation (i.e. create a snapshot of some
form, iterate through some internal structure that may change due to
new objects being created during #each_object etc.).
The problem here is "strongly reachable". During ObjectSpace processing,
the last strong reference to an object may go away and the garbage
collector may run. Should ObjectSpace prevent GC from running if it's
traversed and now references that object? If not, how should it be
handled if immediately before you return an object from each_object, it
gets garbage collected?
You are right: objects can "disappear" (i.e. lose their strong
reachability) during traversal. Obviously my suggested requirement was
still too strong.
There's no way to catch that, so each_object may
end up returning a reference to an object that's gone away, or
reconstituting an object whose finalization has already fired. Bad
things happen.
Recreation is a bad idea. I agree, objects that are no longer strongly
reachable at the moment they are about to be passed to the block should
*not* be passed.
ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory,
I do not think that moving is an issue. If it were, JVMs wouldn't work
the way they do (object references are not pointers to memory locations).
In other words, all programs would have the same problems #each_object
has.
run in parallel, and so on. It can
*never* be deterministic unless it can "stop the world", so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.
Right you are. #each_object shouldn't be used in regular code - it's
more for ad hoc statistics ("how many instances of a class?") and the like.
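A quick sketch of that kind of ad hoc statistic. Widget is a made-up class; the call relies on each_object returning the number of objects it visited, as the singleton.rb line quoted earlier in this thread does. On a runtime where each_object is disabled this would fail or report nothing, and the count can include instances that are unreachable but not yet collected.

```ruby
class Widget; end

# Hold strong references so the instances are certainly alive.
widgets = Array.new(3) { Widget.new }

# With a block, each_object returns how many objects it yielded.
count = ObjectSpace.each_object(Widget) { }
puts "#{count} Widget instance(s), #{widgets.size} held strongly"
```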
Kind regards
robert
| From: Charles Oliver Nutter |
Date: Sunday, October 28, 2007
|
wrote in message:
wrote in message:
You just hit on exactly why we do not use JVMTI for ObjectSpace. It
would certainly work, but it'd add a lot of overhead we'd never
expect people to accept in a real application. Plus, it'd track
far more object instances than we actually want tracked.
Why is that? I mean, you could selectively decide which instances to
track.
Actually, we do that a bit already. For example, we don't track arrays
constructed during argument processing, since they are typically
transient. The problem is that we could only choose to track all Ruby
objects, for example...which would cripple other JRuby apps running in
the same process.
In general, though, we have not explored JVMTI because we want JRuby to
be the best production environment for deploying apps, and nobody will
EVER turn on JVMTI on their production servers.
Alternatively there may be another method that doesn't need
instrumentation and that can give you access to every (reachable)
object in the JVM.
If there is...we have not found it. The "linked weakref list" has been
the least overhead so far, and it's still a lot of overhead.
Hmm, but there are iteration methods like #each_object:
http://JAVA.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#Heap
I was referring to non-JVMTI solutions, but you're right, JVMTI does
provide this capability.
Did you put them down because of the "stop the world" approach? I'd say
that'd be ok - at least it's better than not having ObjectSpace. And
also, there'd be no overhead. Question is only whether it's ok to
invoke arbitrary byte code (which'd happen during the iteration
callback).
Is it really ok? You need to remember that JRuby opens up the
possibility of running many, many applications in the same process, as
well as asynchronous algorithms with true parallel threads. We can not
expect people to cripple all that so they can walk EVERY object in the
system. "Stop the world" is awful when you start breaking the ability to
do many things in parallel, as you can in JRuby.
But it may be that for cases where each_object is needed, this is a
reasonable thing to do. I think if someone were to submit an
implementation of each_object that uses JVMTI, we'd certainly accept
it :)
ObjectSpace is just not compatible with any GC that requires the
ability to move objects around in memory,
I do not think that moving is an issue. If it were, JVMs wouldn't work
the way they do (object references are not pointers to memory locations).
In other words, all programs would have the same problems #each_object
has.
The problem isn't so much that the object references move as that you
would have to lock the memory locations for some period of time to be
able to walk the object table. And I think that's *bad* especially when
we're looking at JRuby allowing folks to run dozens of apps in the same
process and memory space out of the box. We can not lock things down like
that.
- Charlie