Use _mm_store_si128() instead of _mm_stream_si128(). This keeps the copied memory
in the data cache, which helps because the interpreter starts using this data
right away without having to go back to memory. _mm_stream* is intended for
stores where we want to avoid reading the data into the cache and polluting it;
in our scenario, keeping the data in the cache appears to have a positive impact.
Tests on WordPress 4.1 show a ~1% performance increase with fast_memcpy() in place
of the standard memcpy() when running php-cgi -T10000 wordpress/index.php.
I also updated the software (SW) prefetching of the target memory, but its contribution
is almost negligible: the prefetched address is used within a couple of cycles (at the
next iteration), while data from memory only becomes available after >100 cycles.
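A minimal sketch of the copy loop described above, assuming both buffers are 16-byte aligned and the size is a non-zero multiple of 64 bytes. The name fast_memcpy_sketch() is hypothetical and this is not the actual fast_memcpy() from the PHP sources; it only illustrates cached _mm_store_si128() stores plus prefetch hints. On targets without SSE2 it falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_load_si128, _mm_store_si128 */
#include <xmmintrin.h> /* SSE: _mm_prefetch, _MM_HINT_* */

/* Hypothetical sketch: assumes 16-byte-aligned pointers and a size that is
 * a non-zero multiple of 64. */
static void fast_memcpy_sketch(void *dest, const void *src, size_t size)
{
    __m128i *dq_dest = (__m128i *)dest;
    const __m128i *dq_src = (const __m128i *)src;
    const __m128i *end = (const __m128i *)((const char *)src + size);

    do {
        /* Prefetch hints for the next 64-byte block. Per the commit message,
         * the destination prefetch gains little, since the address is needed
         * long before memory could deliver it. */
        _mm_prefetch((const char *)(dq_src + 4), _MM_HINT_NTA);
        _mm_prefetch((const char *)(dq_dest + 4), _MM_HINT_T0);

        __m128i x0 = _mm_load_si128(dq_src + 0);
        __m128i x1 = _mm_load_si128(dq_src + 1);
        __m128i x2 = _mm_load_si128(dq_src + 2);
        __m128i x3 = _mm_load_si128(dq_src + 3);

        /* Regular (cached) stores rather than _mm_stream_si128(), so the
         * copied data stays in the data cache for its next consumer. */
        _mm_store_si128(dq_dest + 0, x0);
        _mm_store_si128(dq_dest + 1, x1);
        _mm_store_si128(dq_dest + 2, x2);
        _mm_store_si128(dq_dest + 3, x3);

        dq_src += 4;
        dq_dest += 4;
    } while (dq_src != end);
}
#else
/* Non-SSE2 targets: fall back to the standard memcpy(). */
static void fast_memcpy_sketch(void *dest, const void *src, size_t size)
{
    memcpy(dest, src, size);
}
#endif
```

Swapping _mm_store_si128() for _mm_stream_si128() in this loop gives the non-temporal variant the commit moved away from; prefetching past the end on the last iteration is harmless because prefetch instructions do not fault.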
C:\> php -r "trait A { } trait A { }"
Now correctly prints "Cannot redeclare trait A" instead of "Cannot redeclare class A", making the error message a little clearer. Admittedly, a better solution would probably also state the type of the existing declaration it collides with.
Internally this adds a new function:
zend_get_object_type()
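A sketch of the idea behind zend_get_object_type(): choose the word ("trait", "interface", "class") from the declaration's flags and use it in the error message. The flag values, sketch_class_entry and get_object_type_sketch() are hypothetical; the real function presumably inspects zend_class_entry flags, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only (not the real ZEND_ACC_* values). */
#define SKETCH_ACC_INTERFACE 0x1u
#define SKETCH_ACC_TRAIT     0x2u

/* Minimal stand-in for a class entry. */
typedef struct {
    const char *name;
    unsigned int flags;
} sketch_class_entry;

/* Return a human-readable kind for the declaration. */
static const char *get_object_type_sketch(const sketch_class_entry *ce)
{
    if (ce->flags & SKETCH_ACC_TRAIT) {
        return "trait";
    }
    if (ce->flags & SKETCH_ACC_INTERFACE) {
        return "interface";
    }
    return "class";
}

/* Format the redeclaration error using the declaration's kind. */
static void format_redeclare_error(char *buf, size_t len, const sketch_class_entry *ce)
{
    snprintf(buf, len, "Cannot redeclare %s %s", get_object_type_sketch(ce), ce->name);
}
```

For a trait named A, format_redeclare_error() yields "Cannot redeclare trait A", matching the message quoted above.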
Removed HashTable->arHash (reduced memory consumption). Hash slots can now be accessed using the HT_HASH() macro.
Hash slots are allocated together with the Buckets (before them) and lie in reverse order from the HashTable->arData base address (see comments in Zend/zend_types.h).
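A minimal sketch of a table whose hash slots share one allocation with the buckets and sit in reverse order just before arData. All names (sketch_hashtable, SKETCH_HT_HASH, sketch_find, ...) are hypothetical, and the exact slot placement (slot n at ((uint32_t *)arData)[-1 - n]) is an assumed reading of "reverse order"; the real layout is documented in Zend/zend_types.h and may differ. Resizing and duplicate keys are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_INVALID_IDX UINT32_MAX

typedef struct {
    uint64_t h;       /* hash of the key */
    const char *key;  /* string key */
    long val;
    uint32_t next;    /* next bucket index in the collision chain */
} sketch_bucket;

typedef struct {
    sketch_bucket *arData; /* first bucket; the hash slots sit just before it */
    uint32_t nTableSize;   /* power of two, at least 2 (keeps buckets aligned) */
    uint32_t nTableMask;   /* nTableSize - 1 */
    uint32_t nNumUsed;
} sketch_hashtable;

/* Hypothetical stand-in for HT_HASH(): slot n lives at
 * ((uint32_t *)arData)[-1 - n], i.e. in reverse order before arData. */
#define SKETCH_HT_HASH(ht, n) (((uint32_t *)(ht)->arData)[-1 - (int32_t)(n)])

static uint64_t sketch_hash(const char *s) /* FNV-1a, for illustration */
{
    uint64_t h = 14695981039346656037ULL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;
    }
    return h;
}

/* One allocation: nTableSize slots followed by nTableSize buckets. */
static int sketch_init(sketch_hashtable *ht, uint32_t size)
{
    size_t slot_bytes = (size_t)size * sizeof(uint32_t);
    char *mem = malloc(slot_bytes + (size_t)size * sizeof(sketch_bucket));
    if (mem == NULL) {
        return -1;
    }
    ht->arData = (sketch_bucket *)(mem + slot_bytes);
    ht->nTableSize = size;
    ht->nTableMask = size - 1;
    ht->nNumUsed = 0;
    for (uint32_t i = 0; i < size; i++) {
        SKETCH_HT_HASH(ht, i) = SKETCH_INVALID_IDX;
    }
    return 0;
}

static void sketch_free(sketch_hashtable *ht)
{
    free((char *)ht->arData - (size_t)ht->nTableSize * sizeof(uint32_t));
}

/* Append a bucket and prepend it to its slot's chain. */
static void sketch_add(sketch_hashtable *ht, const char *key, long val)
{
    assert(ht->nNumUsed < ht->nTableSize);
    uint32_t idx = ht->nNumUsed++;
    sketch_bucket *b = &ht->arData[idx];
    b->h = sketch_hash(key);
    b->key = key;
    b->val = val;
    uint32_t n = (uint32_t)b->h & ht->nTableMask;
    b->next = SKETCH_HT_HASH(ht, n);
    SKETCH_HT_HASH(ht, n) = idx;
}

static sketch_bucket *sketch_find(const sketch_hashtable *ht, const char *key)
{
    uint64_t h = sketch_hash(key);
    uint32_t idx = SKETCH_HT_HASH(ht, (uint32_t)h & ht->nTableMask);
    while (idx != SKETCH_INVALID_IDX) {
        sketch_bucket *b = &ht->arData[idx];
        if (b->h == h && strcmp(b->key, key) == 0) {
            return b;
        }
        idx = b->next;
    }
    return NULL;
}
```

Because the slots and buckets come from a single malloc(), only arData needs to be stored, which is the memory saving behind dropping a separate arHash pointer.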
Indices in the hash table and in collision-resolution chains (Z_NEXT) may be stored either as indices or as byte offsets, depending on the system (32- or 64-bit).
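A sketch of the two encodings, chosen at compile time from the width of size_t. The macro names OFS_IDX_TO_HASH/OFS_HASH_TO_BUCKET and the ofs_bucket type are hypothetical. The byte-offset form turns a stored value back into a bucket pointer with a single addition, while the index form needs a scaled multiply; the commit message does not say why each platform gets its form.

```c
#include <assert.h>
#include <stdint.h>

#define OFS_INVALID UINT32_MAX

typedef struct {
    uint64_t h;
    const char *key;
    long val;
    uint32_t next; /* stored in the same encoding as the hash slots */
} ofs_bucket;

#if SIZE_MAX == 0xFFFFFFFFu
/* 32-bit: store the bucket's byte offset from the start of the array. */
#define OFS_IDX_TO_HASH(idx) ((uint32_t)((idx) * sizeof(ofs_bucket)))
#define OFS_HASH_TO_BUCKET(data, v) ((ofs_bucket *)((char *)(data) + (v)))
#else
/* 64-bit: store the plain bucket index. */
#define OFS_IDX_TO_HASH(idx) ((uint32_t)(idx))
#define OFS_HASH_TO_BUCKET(data, v) ((data) + (v))
#endif

/* Count the buckets in a collision chain starting at stored value v. */
static unsigned ofs_chain_length(ofs_bucket *data, uint32_t v)
{
    unsigned n = 0;
    while (v != OFS_INVALID) {
        n++;
        v = OFS_HASH_TO_BUCKET(data, v)->next;
    }
    return n;
}
```

Code that walks chains only through these two macros works unchanged with either encoding.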
HashTable data fields are reordered to keep the fields most used by zend_hash_find() in the same CPU cache line.
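A sketch of the general technique: put the fields a lookup reads at the start of the struct and check at compile time that they end within one cache line. The struct, its field order and the 64-byte line size are assumptions for illustration, not PHP's actual HashTable definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache-line size */

/* Hypothetical field order: fields read by a string-key lookup first,
 * bookkeeping fields after. */
typedef struct {
    /* hot: read on every lookup */
    uint32_t flags;
    uint32_t nTableMask;
    void *arData;
    /* colder: used on insert/delete, iteration and destruction */
    uint32_t nNumUsed;
    uint32_t nNumOfElements;
    uint32_t nTableSize;
    uint32_t nInternalPointer;
    int64_t nNextFreeElement;
    void (*pDestructor)(void *);
} layout_hashtable;

/* Fails to compile if the hot fields spill past the first cache line
 * (assuming the struct itself starts on a cache-line boundary). */
_Static_assert(offsetof(layout_hashtable, arData) + sizeof(void *) <= SKETCH_CACHE_LINE,
               "lookup fields must fit in the first cache line");
```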
Every HashTable is now also a zend_array, so it is refcounted and may be subject to copy-on-write.
zend_array_dup() was changed to allocate and return a HashTable, instead of taking a preallocated HashTable as an argument.
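A minimal sketch of refcounting with copy-on-write, plus a dup function that allocates and returns the copy itself, in the spirit of the new zend_array_dup() contract. cow_array and the cow_array_* functions are hypothetical; the real zend_array carries far more state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted array, standing in for zend_array. */
typedef struct {
    uint32_t refcount;
    uint32_t nNumUsed;
    long data[]; /* flexible array member */
} cow_array;

static cow_array *cow_array_new(uint32_t n)
{
    cow_array *a = calloc(1, sizeof *a + (size_t)n * sizeof(long));
    if (a != NULL) {
        a->refcount = 1;
        a->nNumUsed = n;
    }
    return a;
}

/* Allocates and returns a fresh copy (refcount 1); the caller does not
 * preallocate anything. */
static cow_array *cow_array_dup(const cow_array *src)
{
    cow_array *copy = cow_array_new(src->nNumUsed);
    if (copy != NULL) {
        memcpy(copy->data, src->data, (size_t)src->nNumUsed * sizeof(long));
    }
    return copy;
}

static void cow_array_release(cow_array *a)
{
    if (--a->refcount == 0) {
        free(a);
    }
}

/* Call before writing: a shared array is copied and the caller's reference
 * moves to the private copy; an unshared one is written in place. */
static cow_array *cow_array_separate(cow_array *a)
{
    if (a->refcount == 1) {
        return a;
    }
    cow_array *copy = cow_array_dup(a);
    if (copy != NULL) {
        a->refcount--;
    }
    return copy;
}
```

Returning a new allocation from the dup function lets copy-on-write separation be a single call, since the caller has no storage to prepare in advance.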
Make nTableMask 0 for packed arrays.
Remove checks for HASH_FLAG_PACKED in zend_hash_find/zend_hash_del and family (string keys are resolved through uninitialized_bucket).
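A sketch of why the packed-array check can go away: a packed table uses nTableMask = 0 and points its hash at a shared slot that is always empty, so a string-key lookup reaches "not found" through the same code path a hashed table uses. The pk_* names and pk_uninitialized_hash are hypothetical stand-ins for the commit's uninitialized_bucket, and the hash slots are kept in a separate array here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PK_INVALID_IDX UINT32_MAX

typedef struct {
    uint64_t h;
    const char *key; /* NULL for integer keys */
    long val;
    uint32_t next;
} pk_bucket;

typedef struct {
    pk_bucket *arData;
    const uint32_t *hash; /* hash slots (separate array in this sketch) */
    uint32_t nTableMask;
    uint32_t nNumUsed;
} pk_hashtable;

/* Shared one-slot hash for packed arrays: it never points at any bucket. */
static const uint32_t pk_uninitialized_hash[1] = { PK_INVALID_IDX };

static void pk_init_packed(pk_hashtable *ht, pk_bucket *buckets, uint32_t n)
{
    ht->arData = buckets;
    ht->hash = pk_uninitialized_hash;
    ht->nTableMask = 0; /* h & 0 == 0 always selects the shared empty slot */
    ht->nNumUsed = n;
}

/* String-key lookup with no packed-flag check. */
static pk_bucket *pk_find(const pk_hashtable *ht, uint64_t h, const char *key)
{
    uint32_t idx = ht->hash[h & ht->nTableMask];
    while (idx != PK_INVALID_IDX) {
        pk_bucket *b = &ht->arData[idx];
        if (b->h == h && b->key != NULL && strcmp(b->key, key) == 0) {
            return b;
        }
        idx = b->next;
    }
    return NULL;
}
```

The same pk_find() serves both kinds of table; only the data it reads (mask and slot array) differs.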
Change HashTable layout for better locality.