Apparently, a component of Rails implements a buffering .write
method which keeps the String buffer around and makes it unsafe
for us to clear it after calling .write.
This caused Rack::Deflater to give empty results when enabled.
Fortunately, per r61631 / a55abcc0ca,
this misguided optimization was only worth a small (0.5MB) savings
and we still benefit from the majority of the memory savings in
that change.
Thanks to zunda for the bug report.
[ruby-core:90133] [Bug #15356]
Fixes: r61631 (commit a55abcc0ca)
("zlib: reduce garbage on gzip writes (deflate)")
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66268 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
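The hazard described in r66268 above can be illustrated with a minimal sketch. `RetainingIO` is a hypothetical stand-in, not the actual Rails component (which the commit message does not name). Its `write` keeps a reference to the String it receives rather than copying it, so clearing that String after `write` returns empties the retained data. That is consistent with the empty Rack::Deflater output reported.

```ruby
# Hypothetical IO-like object whose write retains the caller's String.
class RetainingIO
  def initialize
    @chunks = []
  end

  # Stores a reference to str (no copy) and reports the bytes accepted.
  def write(str)
    @chunks << str
    str.bytesize
  end

  # Joins whatever the retained Strings contain at this moment.
  def flush_to_string
    @chunks.join
  end
end

# Behaviour after r61631: the caller clears its buffer once write returns.
io = RetainingIO.new
buf = +'compressed bytes'
io.write(buf)
buf.clear
p io.flush_to_string

# Behaviour after r66268: the buffer is left alone, so the data survives.
io = RetainingIO.new
buf = +'compressed bytes'
io.write(buf)
p io.flush_to_string
```

An IO that copies the bytes inside `write` would be unaffected by the clear, which is presumably why the problem only surfaced with a buffering writer.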
No point in having a long-lived cbuf in "struct gzfile"
since GZFILE_CBUF_CAPA is smaller than RSTRING_EMBED_LEN_MAX
(even on 32-bit). We can also have rb_econv_convert write
directly to the return value instead of an intermediate buffer.
This brings "struct gzfile" from 264 to 256 bytes on 64-bit
systems to avoid taking an additional cache line.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63993 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
For garbage-conscious users who use the `outbuf' argument of
`readpartial' to supply a destination buffer, this provides
a drastic reduction in garbage when inflating large inputs
in a streaming fashion.
This results in an anonymous RSS reduction in the reader
similar to the reduction in the writer from r61631.
Results using the test script from r61631
<https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=61631>
Before:
  writer  7.359999   0.000000   7.359999 (  7.360639)
  writer RssAnon:      4040 kB
  reader  6.346667   0.070000   6.416667 (  7.387654)
  reader RssAnon:     98272 kB
After:
  writer  7.309999   0.000000   7.309999 (  7.310651)
  writer RssAnon:      4048 kB
  reader  6.146666   0.003333   6.149999 (  7.334868)
  reader RssAnon:      4300 kB
* ext/zlib/zlib.c (struct read_raw_arg): new struct
(gzfile_read_raw_partial): use read_raw_arg
(gzfile_read_raw_rescue): ditto
(gzfile_read_raw): accept outbuf, use read_raw_arg
(gzfile_read_raw_ensure): accept outbuf
(gzfile_read_header): ditto
(gzfile_check_footer): ditto
(gzfile_read_more): ditto
(gzfile_read_raw_until_zero): adjust for changes
(gzfile_fill): ditto
(gzfile_readpartial): ditto
(gzfile_read_all): ditto
(gzfile_getc): ditto
(gzfile_reader_end_run): ditto
(gzfile_reader_get_unused): ditto
(rb_gzreader_initialize): ditto
(gzreader_skip_linebreaks): ditto
(gzreader_gets): ditto
(zlib_gunzip_run): ditto
[ruby-core:84660] [Feature #14319]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61665 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
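The usage pattern that r61665 above optimizes looks roughly like the following sketch. `inflate_reusing_buffer` is a hypothetical helper name introduced here for illustration. It passes the same String as the `outbuf' argument on every `readpartial' call, so each chunk overwrites the previous one instead of allocating a new String.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: inflate a gzip stream chunk by chunk into a single
# reusable destination buffer and return the total number of bytes read.
def inflate_reusing_buffer(io, chunk_size = 16384)
  gz = Zlib::GzipReader.new(io)
  buf = +''   # one destination String, refilled by every readpartial call
  total = 0
  begin
    loop do
      gz.readpartial(chunk_size, buf)   # replaces the contents of buf
      total += buf.bytesize
      yield buf if block_given?
    end
  rescue EOFError
    # readpartial raises EOFError once the stream is exhausted
  ensure
    gz.close
  end
  total
end

payload = 'abcdefgh' * 100_000
puts inflate_reusing_buffer(StringIO.new(Zlib.gzip(payload)))
```

Because the buffer is overwritten on each call, a caller that needs to keep a chunk must copy it (for example with `dup`) before the next read.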
Zlib::GzipWriter generated large amounts of garbage from
(struct zstream).input. Reuse the .input field when it is
hidden, and recycle it when its lifetime is over. This change
alone reduced memory usage of the writer from 90MB to 4.5MB.
For the detached buffer of compressed data used by
gzfile_write_raw, we can only clear the string (not recycle it)
since user code may hold references to it (but the data would be
clobbered, anyways). This reduced memory usage slightly by
around 0.5MB (because it's smaller compressed data).
Combined, these changes reduce the anonymous RSS memory of a
dedicated writer process from over 90MB to under 4MB.
before:
  #           user     system      total        real
  writer  7.823332   0.053333   7.876665 (  7.881464)
  writer RssAnon:     92944 kB
  reader  6.969999   0.076666   7.046665 (  7.906377)
  reader RssAnon:    109820 kB
after:
  writer  7.359999   0.000000   7.359999 (  7.360639)
  writer RssAnon:      4040 kB
  reader  6.346667   0.070000   6.416667 (  7.387654)
  reader RssAnon:     98272 kB
Script used:
-------
require 'zlib'
require 'benchmark'
nr = 16384 * 2
def stats(pfx, bm)
  str = "#{bm}#{File.readlines("/proc/#$$/status").grep(/^RssAnon:/)[0]}"
  puts str.gsub!(/^/m, pfx)
end
rd, wr = IO.pipe
pid = fork do
  buf = ((0..255).map(&:chr).join * 128).freeze
  rd.close
  gzip = Zlib::GzipWriter.new(wr)
  bm = Benchmark.measure do
    nr.times { gzip.write(buf) }
    gzip.close
    wr.close
  end
  stats('writer ', bm)
end
wr.close
buf = ''
gunzip = Zlib::GzipReader.new(rd)
n = 0
bm = Benchmark.measure do
  begin
    gunzip.readpartial(16384, buf)
    n += buf.size
  rescue EOFError
    break
  end while true
end
stats('reader ', bm)
Process.waitall
-------
* ext/zlib/zlib.c (zstream_discard_input): reuse or recycle hidden input
(zstream_reset_input): clear hidden input
(zstream_run): detach input and recycle after use
(gzfile_write_raw): clear buffer after write
[ruby-core:84638] [Feature #14315]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61631 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
No need to reveal strings freshly created with rb_str_new.
* ext/zlib/zlib.c (zstream_detach_input): remove redundant call
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61612 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/zlib/zlib.c (zlib_gunzip): gz0 is a structure variable on
the stack and is no longer valid after exit by an exception. Use
ensure to free it instead. [Bug #13982]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60131 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/zlib/zlib.c (rb_gzfile_total_out): cast to long so that the
result is not an unsigned long when it is normalized to Fixnum on
LLP64 platforms. [ruby-core:81488]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59337 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
patched by Andrew Haines <andrew@haines.org.nz> [ruby-core:81488]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59333 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/zlib/zlib.c (zstream): manage the capacity and size of `buf`
instead of its size and a separate member `buf_filled`. Reported by
Christian Jalio (jalio) at https://hackerone.com/reports/211958
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58526 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/zlib/zlib.c (zstream_buffer_ungetbyte): simplify by using
zstream_buffer_ungets().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58525 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/zlib/zlib.c (zstream_expand_buffer_non_stream): rename from
zstream_expand_buffer_without_gvl() and replace duplicate code
in zstream_expand_buffer().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58524 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
- fix a typo (`GzipReadr` -> `GzipReader`)
- `Zlib::GzipReader.new` does not take block
- fix encoding
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57037 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext: use rb_check_arity and rb_error_arity to raise
ArgumentError. [Feature #9025]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@52275 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/zlib/zlib.c (gzfile_reset): preserve ZSTREAM_FLAG_GZFILE
[Bug #10101]
* test/zlib/test_zlib.rb (test_rewind): test each_byte
We must preserve the ZSTREAM_FLAG_GZFILE flag to prevent
zstream_detach_buffer from:
a) returning Qnil and breaking out of the `each_byte' loop
b) yielding a large string to each_byte
Note: the test case in the bug report takes a long time. I found this
bug because I noticed the massive time discrepancy between
`each_byte' and `readbyte' loops before this patch. With this patch,
`each_byte' and `readbyte' both take very long.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47327 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
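The behaviour that r47327 above restores can be checked with a short sketch modelled loosely on the `test_rewind' case it mentions. `bytes_before_and_after_rewind` is a hypothetical helper name, and the assumption here is that `rewind' is what exercises gzfile_reset. After the fix, iterating with `each_byte' following a `rewind' should yield the same individual bytes as the first pass.

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical helper: collect every byte, rewind, and collect them again.
def bytes_before_and_after_rewind(path)
  first = []
  second = []
  Zlib::GzipReader.open(path) do |gz|
    gz.each_byte { |b| first << b }
    gz.rewind   # assumed to reset the underlying gzfile state
    gz.each_byte { |b| second << b }
  end
  [first, second]
end

Tempfile.create('rewind_demo') do |t|
  t.close
  Zlib::GzipWriter.open(t.path) { |gz| gz.print('foo') }
  first, second = bytes_before_and_after_rewind(t.path)
  p first == second
end
```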
they may change in the implementation without notice. Patched by
@robin850 [Fixes GH-682] https://github.com/ruby/ruby/pull/682
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@46976 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
* ext/zlib/zlib.c (zstream_shift_buffer): create new copied string
since it cannot be shared ever.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@45625 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
need a dictionary but are being decompressed by Zlib::Inflate.inflate
(which has no option to set a dictionary). Now Zlib::NeedDict is
raised instead of crashing. [ruby-trunk - Bug #8829]
* test/zlib/test_zlib.rb (TestZlibInflate): Test for the above.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@42720 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
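As a sketch of the situation r42720 above addresses: a stream deflated with a preset dictionary cannot be handled by the one-shot Zlib::Inflate.inflate, which now raises Zlib::NeedDict. The streaming API can recover by supplying the dictionary and flushing with an empty string. The helper names `deflate_with_dictionary` and `inflate_with_dictionary` are hypothetical and exist only for this illustration.

```ruby
require 'zlib'

# Hypothetical helper: compress data using a preset dictionary.
def deflate_with_dictionary(data, dictionary)
  deflater = Zlib::Deflate.new
  deflater.set_dictionary(dictionary)
  deflater.deflate(data, Zlib::FINISH)
ensure
  deflater.close if deflater
end

# Hypothetical helper: inflate, supplying the dictionary when asked for it.
def inflate_with_dictionary(compressed, dictionary)
  inflater = Zlib::Inflate.new
  begin
    inflater.inflate(compressed)
  rescue Zlib::NeedDict
    inflater.set_dictionary(dictionary)
    inflater.inflate('')   # flush the input that is already buffered
  end
ensure
  inflater.close if inflater
end

dictionary = 'a shared preset dictionary'
data = 'a shared preset dictionary makes this message shorter'
compressed = deflate_with_dictionary(data, dictionary)

begin
  Zlib::Inflate.inflate(compressed)   # no way to pass a dictionary here
rescue Zlib::NeedDict => e
  puts "one-shot inflate raised #{e.class}"
end

puts inflate_with_dictionary(compressed, dictionary) == data
```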
RBASIC_CLASS(obj) macro which returns a class of `obj'.
This change is a part of RGENGC branch [ruby-trunk - Feature #8339].
* object.c: add new function rb_obj_reveal().
This function reveals an internal (hidden) object that was hidden by
rb_obj_hide(). Note that the class must not change between hiding
and revealing. The only permitted usage is:
klass = RBASIC_CLASS(obj);
rb_obj_hide(obj);
....
rb_obj_reveal(obj, klass);
TODO: API design. rb_obj_reveal() should be replaced with others.
TODO: modifying constified variables using a cast may be harmful for
the compiler's analysis and optimization.
Any idea to prohibit inserting RBasic::klass directly?
If we rename RBasic::klass and force the use of RBASIC_CLASS(obj),
then all code such as `RBASIC(obj)->klass' will become a
compilation error. Is that acceptable? (We had a similar
experience with Ruby 1.9,
for example "RARRAY(ary)->ptr" to "RARRAY_PTR(ary)".)
* internal.h: add some macros.
* RBASIC_CLEAR_CLASS(obj) clear RBasic::klass to make it internal
object.
* RBASIC_SET_CLASS(obj, cls) set RBasic::klass.
* RBASIC_SET_CLASS_RAW(obj, cls) same as RBASIC_SET_CLASS
without write barrier (planned).
* RCLASS_SET_SUPER(a, b) set super class of a.
* array.c, class.c, compile.c, encoding.c, enum.c, error.c, eval.c,
file.c, gc.c, hash.c, io.c, iseq.c, marshal.c, object.c,
parse.y, proc.c, process.c, random.c, ruby.c, sprintf.c,
string.c, thread.c, transcode.c, vm.c, vm_eval.c, win32/file.c:
Use above macros and functions to access RBasic::klass.
* ext/coverage/coverage.c, ext/readline/readline.c,
ext/socket/ancdata.c, ext/socket/init.c,
* ext/zlib/zlib.c: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40691 b2dd03c8-39d4-4d8f-98ff-823fe69b080e