NOT reading an entire file into memory
8 Message(s) by 4 Author(s) originally posted in ruby programming
| From: Devi Web Development |
Date: Saturday, October 27, 2007
|
I am trying to write a parser for a text-based file format. Files in this format frequently become very large. While the specification specifically allows applications to crash on large files, I know several people who have taken to editing these files by hand in Notepad or other basic text editors. This format isn't at all friendly for this type of editing, and it is extremely tedious work, but their programs all crash due to the size of these files.
What I really want to know is:
I had been using File.readline and saving a lot of temporary files via tempfile.rb (http://www.ruby-doc.org/stdlib/libdoc/tempfile/rdoc/index.html). However, I have heard that File.readline is in fact equivalent to File.read.split("\n").each, which would really ruin my purpose of not loading the whole file. I'd really like to keep this in Ruby, as I want to package the whole thing via the wonderful rubyscript2exe, as well as, of course, a standard RubyGem.
What I'd actually really love is a way to read lines 4 through 7 without reading the whole file.
My current method has made the program not nearly as beautiful as Ruby ought to be.
-------------------------------------------
Daniel Brumbaugh Keeney
Devi Web Development
Devi.WebMaster@xxxxxxxxxxx
-------------------------------------------
| From: Konrad Meyer |
Date: Saturday, October 27, 2007
|
lines = File.open("myfile") do |f|
  # skip through 3rd line
  3.times { f.readline }
  # read lines 4 through 7 (IO#readline raises EOFError if the file is shorter)
  Array.new(4) { f.readline }
end
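One caveat with the readline version: IO#readline raises EOFError when the file has fewer lines than requested. A minimal sketch of an alternative, assuming it is acceptable to get back fewer lines from a short file: IO#gets returns nil at end of file instead of raising. The method name read_line_range is hypothetical, not part of any library.

```ruby
# Hypothetical helper: return lines first..last (1-based, inclusive) of
# the file at +path+, reading one line at a time with IO#gets.
# IO#gets returns nil at end of file, so a short file yields fewer lines
# instead of raising EOFError as IO#readline would.
def read_line_range(path, first, last)
  File.open(path) do |f|
    # Skip the lines before +first+; give up early if the file runs out.
    (first - 1).times { return [] if f.gets.nil? }

    lines = []
    (last - first + 1).times do
      line = f.gets
      break if line.nil?   # end of file reached before +last+
      lines << line
    end
    lines
  end
end

# Usage: write a 10-line sample file, then fetch lines 4-7.
File.write("data.txt", (1..10).map { |n| "line #{n}\n" }.join)
p read_line_range("data.txt", 4, 7)
# => ["line 4\n", "line 5\n", "line 6\n", "line 7\n"]
```

Only the lines up to the end of the requested range are consumed; the `return` inside the skip loop exits the method, and File.open's block form still closes the file.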
--
Konrad Meyer <konrad@xxxxxxxxxxx> http://konrad.sobertillnoon.com/
| From: 7stud -- |
Date: Saturday, October 27, 2007
|
Devi Web Development wrote:
> I have heard that File.readline is in fact equivalent to
> File.read.split("\n").each, which'd really ruin my purpose of not
> loading the whole file.
I doubt that is true, but as is often the case with Ruby, there is no easily locatable documentation that describes File I/O buffering. Just in case, here is another solution:
# create a data file containing:
# line 1
# line 2
# ...
# line 10
File.open("data.txt", "w") do |file|
  10.times do |i|
    file.puts("line #{i+1}")
  end
end

# read lines 4-7 and display them:
File.open("data.txt") do |file|
  file.each_with_index do |line, i|
    i = i + 1 # i starts at 0
    if i < 4
      next
    elsif i < 8
      puts line
    else
      break
    end
  end
end
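The manual i + 1 bookkeeping can be avoided. A sketch, assuming a modern Ruby where Enumerator#with_index accepts a starting offset; lines_in_range is a hypothetical name chosen for this example.

```ruby
# Hypothetical helper: collect lines first..last (1-based) of +path+.
# each_line.with_index(1) numbers lines from 1, so no "i + 1" is needed,
# and break stops reading as soon as +last+ has been passed.
def lines_in_range(path, first, last)
  result = []
  File.open(path) do |file|
    file.each_line.with_index(1) do |line, lineno|
      next if lineno < first    # skip lines before the range
      break if lineno > last    # stop reading after the range
      result << line
    end
  end
  result
end

File.open("data.txt", "w") do |file|
  10.times { |i| file.puts("line #{i + 1}") }
end

puts lines_in_range("data.txt", 4, 7)   # prints line 4 through line 7
```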
--
Posted via http://www.ruby-forum.com/.
| From: 7stud -- |
Date: Saturday, October 27, 2007
|
| From: Konrad Meyer |
Date: Saturday, October 27, 2007
|
IO#each_with_index and IO#readline are probably the same internally, so the real answer here is that NO, IO#readline isn't the same as File.read.split("\n"); that's IO#readlines.
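To see the difference, a small sketch (the file name sample.txt is just for illustration) comparing IO#readline, which returns one line per call, with IO#readlines, which returns every line as an Array. Note the double quotes in split("\n"): in single quotes, '\n' is a literal backslash followed by n, so split('\n') would not split on newlines at all. IO#pos reports the logical position in the file, not how much Ruby has buffered behind the scenes.

```ruby
# Build a small sample file; binary mode keeps byte offsets identical
# on every platform (no CRLF translation).
File.binwrite("sample.txt", "alpha\nbeta\ngamma\n")

File.open("sample.txt", "rb") do |f|
  p f.readline   # one line per call: "alpha\n"
  p f.pos        # logical position is just past line 1: 6
end

# IO#readlines pulls every line into memory at once.
p File.open("sample.txt", "rb") { |f| f.readlines }
# => ["alpha\n", "beta\n", "gamma\n"]

# File.read(...).split("\n") also loads everything, minus the newlines.
p File.read("sample.txt").split("\n")
# => ["alpha", "beta", "gamma"]
```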
| From: 7stud -- |
Date: Sunday, October 28, 2007
|
The real question is: does readline do any buffering? What about
each()? If a file has ten lines in it, does ruby access the file ten
times? Or, does ruby read some reasonable amount of data into a buffer?
| From: Konrad Meyer |
Date: Sunday, October 28, 2007
|
Performance is not everything. If it were, you would not be using Ruby. The idea is that this will work "well enough", should not take too much thought on the programmer's part, and does not load the entire (huge) file into RAM.
| From: Robert Klemme |
Date: Sunday, October 28, 2007
|
7stud -- wrote:
> The real question is: does readline do any buffering? What about
> each()? If a file has ten lines in it, does ruby access the file ten
> times? Or, does ruby read some reasonable amount of data into a buffer?
Ruby does buffering but won't read the whole file unless asked to do so.
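As a rough illustration of the kind of block-sized reads involved in buffering, here is a sketch that counts the lines of a file by hand while holding at most one fixed-size chunk in memory at a time. The 64 KiB chunk size and the method name count_lines are arbitrary choices for this example, not Ruby's actual internal buffer size or API.

```ruby
# Arbitrary chunk size for this sketch; not Ruby's internal buffer size.
CHUNK_SIZE = 64 * 1024

# Hypothetical helper: count the lines in +path+ while holding at most
# CHUNK_SIZE bytes of the file in memory at once.
def count_lines(path)
  count = 0
  last_byte = nil
  File.open(path, "rb") do |f|
    # IO#read(n) returns up to n bytes, or nil once the end is reached.
    while (chunk = f.read(CHUNK_SIZE))
      count += chunk.count("\n")
      last_byte = chunk[-1]
    end
  end
  # A final line with no trailing newline still counts as a line.
  count += 1 if last_byte && last_byte != "\n"
  count
end

File.write("big.txt", "x\n" * 100_000)   # about 200 KB, several chunks
puts count_lines("big.txt")               # prints 100000
```

Memory use stays bounded by the chunk size no matter how large the file is, which is the property the original poster was after.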
There are several ways to access only lines 4 through 7. For example:
# 1
require 'enumerator' # pre 1.9
File.to_enum(:foreach, "foo.dat").each_with_index do |line,idx|
  case idx
  when 0...3
    # ignore
  when 3...7
    puts line
  else
    break # or return or exit
  end
end

# 2
File.open("foo.dat") do |io|
  io.each do |line|
    case io.lineno
    when 1...4
      # ignore
    when 4..7
      puts line
    else
      break
    end
  end
end

# 3
File.foreach "foo.dat" do |line|
  case $.
  when 1...4
    # ignore
  when 4..7
    puts line
  else
    break
  end
end
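On Ruby 2.0 and later (an assumption about the reader's version; Enumerator::Lazy did not exist when this thread was written), File.foreach without a block returns an Enumerator, and the same selection fits in one expression. A sketch; line_slice is a hypothetical name.

```ruby
# Hypothetical helper: lines first..last (1-based, inclusive) of +path+.
# File.foreach(path) without a block returns an Enumerator; lazy.drop
# skips the leading lines and first(n) stops iterating once it has n
# lines, so no lines after +last+ are requested.
def line_slice(path, first, last)
  File.foreach(path).lazy.drop(first - 1).first(last - first + 1)
end

File.write("foo.dat", (1..10).map { |n| "line #{n}\n" }.join)
p line_slice("foo.dat", 4, 7)
# => ["line 4\n", "line 5\n", "line 6\n", "line 7\n"]
```

A file shorter than the requested range simply yields fewer lines, since first(n) returns whatever is available.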
Kind regards
robert