Conversation
3d12cad to
6d4cdc7
Compare
6d4cdc7 to
cc46164
Compare
Adds a `.indexOf` method to `Buffer`, which borrows semantics from both `Array.prototype.indexOf` and `String.prototype.indexOf`. `Buffer.prototype.indexOf` can be invoked with a Buffer, a string or a number as the needle. If the needle a Buffer or string, it will find the first occurrence of this sequence of bytes. If the needle is a number, it will find the first occurrence of this byte. Reviewed-by: Sam Rijs <srijs@airpost.net> Fixes: nodejs#95 PR-URL: nodejs#160
cc46164 to
984a21f
Compare
|
Though this function might be useful as it is, I think |
There was a problem hiding this comment.
Allocating a buffer for a single byte seems too much. Also it's unclear that only one byte numbers are accepted. What if I want to search for 16 byte number? How to specify endianness?
There was a problem hiding this comment.
This intentionally follows the semantics of Array.prototype.indexOf.
It can thus only search for an element of the array. If you wanted to search for a sequence of elements (bytes), you can supply a Buffer as the needle.
There was a problem hiding this comment.
@vkurchatkin As I understand it, the backing slow buffer will have already been allocated, so this shouldn't be too bad of an allocation.
We could amortize the cost of repeated calls by having a statically available single byte buffer into which we swap the value, which might be a nice optimization.
There was a problem hiding this comment.
I was thinking about special functions on internal for each case
There was a problem hiding this comment.
If we go with a hybrid interface, I also think different specialised functions would be a cleaner approach.
|
That depends on how you look at it. The situation is that there is It might be that I am missing something, in that case don't hesitate to point it out :) |
|
I think that function that finds a buffer inside of another buffer is generally more useful than function that finds a byte inside a buffer. But the spec says that Typed Arrays should behave just like arrays of numbers. And IMO
obj.on('data', function (data) {
// what is data?
// in node it's a `Buffer`
// in browser or ES6-compatible environment it's a `Uint8Array`
// but they are compatible (to some extent), so we don't really care
});It is maybe too late for full compatibility, but we shouldn't make things worse for no good reason. |
|
What about using something more efficient for multi-byte needles? I'm thinking of boyer-moore or a similar algorithm. |
|
boyer-moore-horspool ++ |
|
However, Boyer-Moore-Horspool's worst case is O(n*m) whereas Boyer-Moore runs in linear time. |
|
While I would be in favour of using a more sophisticated approach than naive searching, for the case that we go with Go with the current one (hybrid), or choose only one of I think @vkurchatkin has some valid points, I'd be interested to hear what some other people think of it. |
|
This will make writing parsers so much better ;) |
|
Checking in. @vkurchatkin I think the ship has sailed re: divergence from Uint8Array. We have already greatly diverged from the TypedArray spec in terms of public methods, and buffer-browserify shows how Buffers can be implemented on top of typed arrays for in-browser use. @srijs There seems to be a preference for implementing this function in terms of Boyer-Moore, would you be interested in continuing this PR in that direction? |
|
@chrisdickinson Yes, I'd be interested in going that direction. I will start with the implementation in the next couple of days. |
|
@srijs Ping. Any progress? |
There was a problem hiding this comment.
Possibly something like byteOffset would be better than fromIndex. The latter makes it sound like the character placement, but that's incorrect for multi-byte strings.
|
@srijs Thanks for doing this work. There are several optimizations that could be implemented, but it'd take me a while to explain how to do them. Mind if I implement this myself, and I could throw in regex support as well. :) |
|
+100 |
Adds a
.indexOfmethod toBuffer, which borrows semantics fromboth
Array.prototype.indexOfandString.prototype.indexOf.Buffer.prototype.indexOfcan be invoked with a Buffer, a stringor a number as the needle. If the needle a Buffer or string, it will
find the first occurrence of this sequence of bytes. If the needle is
a number, it will find the first occurrence of this byte.
Fixes #95.
If you're unsure about or not happy with the semantics, let's discuss :)