C++ string::find functions and size_t

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

C++ string::find functions and size_t

Rik-4
2/27/15

Dan Sebald found a subtle bug in the use of the C++ string find functions
that was leading to segfaults.

-- Code --
size_t pos = file.find_first_not_of ("|");
if (pos > 0)
  file = file.substr (pos);
else
-- End Code --

The issue is that the size_t is an unsigned quantity and the find functions
do not return -1 on failure as do regular C functions.  Instead they return
std::string::npos (a very large number) when they fail to find the search term.

This was easily corrected to

-- Code --
size_t pos = file.find_first_not_of ("|");
if (pos != std::string::npos)
  file = file.substr (pos);
else
-- End Code --

This made me a bit paranoid so I grep'ed through the code and found three
other instances where Octave was using incorrect comparisons or arithmetic
on size_t values returned from find functions.  See cset
(http://hg.savannah.gnu.org/hgweb/octave/rev/ed5ee3f610db).  At this point,
the test for __osmesa_print__ fails, which is probably good and probably
related to the segfault that has been reported with this test
(https://savannah.gnu.org/bugs/?44360).

Final question on this topic.  According to the C++ reference, the find
functions actually return an object of std::string::size_type.  Normally,
maybe even definitionally, this is the same as std::size_t.  Octave code,
however, assumes that the bare size_t is the same as std::size_t.  Should
we care about this?  The gl2ps-render.cc code, that maybe we got from
somewhere else originally, does use std::size_t.

--Rik



Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Mike Miller
On Fri, Feb 27, 2015 at 09:47:17 -0800, rik wrote:
> Final question on this topic.  According to the C++ reference, the find
> functions actually return an object of std::string::size_type.  Normally,
> maybe even definitionally, this is the same as std::size_t.  Octave code,
> however, assumes that the bare size_t is the same as std::size_t.  Should
> we care about this?  The gl2ps-render.cc code, that maybe we got from
> somewhere else originally, does use std::size_t.

Yes, this is absolutely safe. Note that std::string::size_type is in
fact the same as std::size_t according to the standard. It's
std::basic_string<c,t,a>::size_type that may differ from std::size_t.

--
mike

Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Rik-4
On 02/27/2015 10:17 AM, Mike Miller wrote:

> On Fri, Feb 27, 2015 at 09:47:17 -0800, rik wrote:
>> Final question on this topic.  According to the C++ reference, the find
>> functions actually return an object of std::string::size_type.  Normally,
>> maybe even definitionally, this is the same as std::size_t.  Octave code,
>> however, assumes that the bare size_t is the same as std::size_t.  Should
>> we care about this?  The gl2ps-render.cc code, that maybe we got from
>> somewhere else originally, does use std::size_t.
> Yes, this is absolutely safe. Note that std::string::size_type is in
> fact the same as std::size_t according to the standard. It's
> std::basic_string<c,t,a>::size_type that may differ from std::size_t.
>

Just clarifying, absolutely safe to use bare size_t or absolutely safe to
use std::size_t in place of std::string::size_type?

An example line:

corefcn/dlmread.cc:355:              size_t pos1 = line.find_first_not_of
(" \t");

and we don't have a "using namespace std;" line so this is whatever size_t
is picked up.  If it is the same as in <cstddef> then we are fine.

--Rik

Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Jordi Gutiérrez Hermoso-2
In reply to this post by Rik-4
On Fri, 2015-02-27 at 09:47 -0800, rik wrote:
> According to the C++ reference, the find functions actually return
> an object of std::string::size_type. Normally, maybe even
> definitionally, this is the same as std::size_t. Octave code,
> however, assumes that the bare size_t is the same as std::size_t.

::size_t is inherited from C and it's the type of the sizeof()
operator, while std::size_t is, according to gospel, "an
implementation-defined unsigned integer type that is large enough to
contain the size in bytes of any object."

I don't see how these two things can be different, and they certainly
are not in the gcc implementation.

- Jordi G. H.



Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Andreas Weber-4
In reply to this post by Rik-4
Am 27.02.2015 um 18:47 schrieb rik:

> 2/27/15
>
> Dan Sebald found a subtle bug in the use of the C++ string find functions
> that was leading to segfaults.
>
> -- Code --
> size_t pos = file.find_first_not_of ("|");
> if (pos > 0)
>   file = file.substr (pos);
> else
> -- End Code --

I think the idea here was to check if the first char in file is "|" and
omit the check (because we are building a pipe) for an existent
directory in the else path.

> The issue is that the size_t is an unsigned quantity and the find functions
> do not return -1 on failure as do regular C functions.  Instead they return
> std::string::npos (a very large number) when they fail to find the search term.
>
> This was easily corrected to
>
> -- Code --
> size_t pos = file.find_first_not_of ("|");
> if (pos != std::string::npos)
>   file = file.substr (pos);
> else
> -- End Code --

Please correct me if I'm wrong but shouldn't this then be

-- Code --
size_t pos = file.find_first_not_of ("|");
if (pos != std::string::npos && pos > 0)
-- End Code --

-- Andy


Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

John W. Eaton
Administrator
On 02/27/2015 06:37 PM, Andreas Weber wrote:

> Am 27.02.2015 um 18:47 schrieb rik:
>> 2/27/15
>>
>> Dan Sebald found a subtle bug in the use of the C++ string find functions
>> that was leading to segfaults.
>>
>> -- Code --
>> size_t pos = file.find_first_not_of ("|");
>> if (pos > 0)
>>    file = file.substr (pos);
>> else
>> -- End Code --
>
> I think the idea here was to check if the first char in file is "|" and
> omit the check (because we are building a pipe) for an existent
> directory in the else path.
>
>> The issue is that the size_t is an unsigned quantity and the find functions
>> do not return -1 on failure as do regular C functions.  Instead they return
>> std::string::npos (a very large number) when they fail to find the search term.
>>
>> This was easily corrected to
>>
>> -- Code --
>> size_t pos = file.find_first_not_of ("|");
>> if (pos != std::string::npos)
>>    file = file.substr (pos);
>> else
>> -- End Code --
>
> Please correct me if I'm wrong but shouldn't this then be
>
> -- Code --
> size_t pos = file.find_first_not_of ("|");
> if (pos != std::string::npos && pos > 0)
> -- End Code --

Yes, I think that's correct.  I also thought maybe it would be clearer
to write  if (file.length () > 0 && file[0] == '|'), then just extract
get file.substr (1) as the command.  Shouldn't that work?  It looks
simpler to me.

jwe



Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Rik-4
In reply to this post by Andreas Weber-4
On 02/27/2015 03:37 PM, Andreas Weber wrote:
> Am 27.02.2015 um 18:47 schrieb rik:
>> 2/27/15
>>
>> Dan Sebald found a subtle bug in the use of the C++ string find functions
>> that was leading to segfaults.
>>
>> -- Code --
>> size_t pos = file.find_first_not_of ("|");
>> if (pos > 0)
>>   file = file.substr (pos);
>> else
>> -- End Code --
>
> I think the idea here was to check if the first char in file is "|" and
> omit the check (because we are building a pipe) for an existent
> directory in the else path.
>
>> The issue is that the size_t is an unsigned quantity and the find functions
>> do not return -1 on failure as do regular C functions.  Instead they return
>> std::string::npos (a very large number) when they fail to find the search term.
>>
>> This was easily corrected to
>>
>> -- Code --
>> size_t pos = file.find_first_not_of ("|");
>> if (pos != std::string::npos)
>>   file = file.substr (pos);
>> else
>> -- End Code --
>
> Please correct me if I'm wrong but shouldn't this then be
>
> -- Code --
> size_t pos = file.find_first_not_of ("|");
> if (pos != std::string::npos && pos > 0)
> -- End Code --


I didn't check the purpose of the code, but if that is the case, shouldn't it be simpler to just write

-- Code --
if (! file.empty () && file[0] == '|')
  file = file.substr (1);  // Strip leading pipe character
else
-- End Code --

This avoids playing around with the position entirely.

I made that change to both drawnow and in __osmesa_print__ and the latter now passes the built-in self tests.  See cset http://hg.savannah.gnu.org/hgweb/octave/rev/6b7aee95c54c

--Rik

Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Rik-4
In reply to this post by Jordi Gutiérrez Hermoso-2
On 02/27/2015 10:52 AM, Jordi Gutiérrez Hermoso wrote:

> On Fri, 2015-02-27 at 09:47 -0800, rik wrote:
>> According to the C++ reference, the find functions actually return
>> an object of std::string::size_type. Normally, maybe even
>> definitionally, this is the same as std::size_t. Octave code,
>> however, assumes that the bare size_t is the same as std::size_t.
> ::size_t is inherited from C and it's the type of the sizeof()
> operator, while std::size_t is, according to gospel, "an
> implementation-defined unsigned integer type that is large enough to
> contain the size in bytes of any object."
>
> I don't see how these two things can be different, and they certainly
> are not in the gcc implementation.
>
> - Jordi G. H.

I made all of our uses consistent by just using the bareword size_t in this
cset (http://hg.savannah.gnu.org/hgweb/octave/rev/d575cd1e0da7).

--Rik


Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Daniel Sebald
In reply to this post by John W. Eaton
On 02/27/2015 05:56 PM, John W. Eaton wrote:

> On 02/27/2015 06:37 PM, Andreas Weber wrote:
>> Am 27.02.2015 um 18:47 schrieb rik:
>>> 2/27/15
>>>
>>> Dan Sebald found a subtle bug in the use of the C++ string find
>>> functions
>>> that was leading to segfaults.
>>>
>>> -- Code --
>>> size_t pos = file.find_first_not_of ("|");
>>> if (pos > 0)
>>> file = file.substr (pos);
>>> else
>>> -- End Code --
>>
>> I think the idea here was to check if the first char in file is "|" and
>> omit the check (because we are building a pipe) for an existent
>> directory in the else path.
>>
>>> The issue is that the size_t is an unsigned quantity and the find
>>> functions
>>> do not return -1 on failure as do regular C functions. Instead they
>>> return
>>> std::string::npos (a very large number) when they fail to find the
>>> search term.
>>>
>>> This was easily corrected to
>>>
>>> -- Code --
>>> size_t pos = file.find_first_not_of ("|");
>>> if (pos != std::string::npos)
>>> file = file.substr (pos);
>>> else
>>> -- End Code --
>>
>> Please correct me if I'm wrong but shouldn't this then be
>>
>> -- Code --
>> size_t pos = file.find_first_not_of ("|");
>> if (pos != std::string::npos && pos > 0)
>> -- End Code --
>
> Yes, I think that's correct. I also thought maybe it would be clearer to
> write if (file.length () > 0 && file[0] == '|'), then just extract get
> file.substr (1) as the command. Shouldn't that work? It looks simpler to
> me.

After thinking about it, a change there would be good.  The logic
associated with finding "anything but" makes the conditional a little
unusual.  I'm not sure even that is foolproof though.  (If the user's
not going to enter this string, then it doesn't matter I suppose.)  What
about cases like " | foo.out", if that is a possible user entry somehow.
  Maybe there should be:

   size_t pos = file.find_first_not_of (" ");
   if (pos != std::string::npos)
     {
       if (file[pos] == '|' && file.length () > (pos + 1) &&
           file.find_first_not_of (" ", ) != std::string::npos)
         {
           file = file.substr (pos);
         }
       else if (file[pos] != '|')
         {
           pos = file.find_last_of (file_ops::dir_sep_chars ());
           [etc]
         }
       else
         error[empty pipe spec, obviating "broken pipe" error]
     }
   else
     error[empty output, obviating "broken pipe" error]


And I'm not sure that fully does it still because there is this:

   pos = file.find_last_of (file_ops::dir_sep_chars ());

which I think is meant to find a device file (??).  What if a file name
is given but there is no directory character?  Or does there have to be
a directory character by nature of the OS file system?

Dan

Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Daniel Sebald
In reply to this post by Rik-4
On 02/27/2015 06:30 PM, Rik wrote:

> On 02/27/2015 03:37 PM, Andreas Weber wrote:
>>  Am 27.02.2015 um 18:47 schrieb rik:
>> > 2/27/15
>> >
>> > Dan Sebald found a subtle bug in the use of the C++ string find functions
>> > that was leading to segfaults.
>> >
>> > -- Code --
>> > size_t pos = file.find_first_not_of ("|");
>> > if (pos > 0)
>> >   file = file.substr (pos);
>> > else
>> > -- End Code --
>>
>>  I think the idea here was to check if the first char in file is "|" and
>>  omit the check (because we are building a pipe) for an existent
>>  directory in the else path.
>>
>> > The issue is that the size_t is an unsigned quantity and the find
> functions
>> > do not return -1 on failure as do regular C functions.  Instead they
> return
>> > std::string::npos (a very large number) when they fail to find the
> search term.
>> >
>> > This was easily corrected to
>> >
>> > -- Code --
>> > size_t pos = file.find_first_not_of ("|");
>> > if (pos != std::string::npos)
>> >   file = file.substr (pos);
>> > else
>> > -- End Code --
>>
>>  Please correct me if I'm wrong but shouldn't this then be
>>
>>  -- Code --
>>  size_t pos = file.find_first_not_of ("|");
>>  if (pos != std::string::npos && pos > 0)
>>  -- End Code --
>
> I didn't check the purpose of the code, but if that is the case,
> shouldn't it be simpler to just write
>
> -- Code --
> if (! file.empty () && file[0] == '|')
>    file = file.substr (1);  // Strip leading pipe character
> else
> -- End Code --
>
> This avoids playing around with the position entirely.
>
> I made that change to both drawnow and in __osmesa_print__ and the
> latter now passes the built-in self tests.  See cset
> http://hg.savannah.gnu.org/hgweb/octave/rev/6b7aee95c54c

+          if (! file.empty () && file[0] == '|')
              {
                // create process and pipe gl2ps output to it
-              std::string cmd = file.substr (pos);
+              std::string cmd = file.substr (1);

These two cases will probably cause a problem here:

"|"
"|   "

In the first case there is only file.substr[0], no file.substr[1].  In
the latter, well I guess it's just an empty command and not a seg fault
and for the drawnow() command would end up being a 'broken pipe' because
it's empty.  Again, only important if these are user-specified strings.

Dan

Reply | Threaded
Open this post in threaded view
|

Re: C++ string::find functions and size_t

Konstantinos Poulios
In reply to this post by Rik-4
Ooops! It is actually me to blame for this bug with

http://hg.savannah.gnu.org/hgweb/octave/diff/2f0d1e12806d/src/graphics.cc

Sorry for that and thanks for fixing it.
Kostas

On Fri, Feb 27, 2015 at 6:47 PM, rik <[hidden email]> wrote:
2/27/15

Dan Sebald found a subtle bug in the use of the C++ string find functions
that was leading to segfaults.

-- Code --
size_t pos = file.find_first_not_of ("|");
if (pos > 0)
  file = file.substr (pos);
else
-- End Code --

The issue is that the size_t is an unsigned quantity and the find functions
do not return -1 on failure as do regular C functions.  Instead they return
std::string::npos (a very large number) when they fail to find the search term.

This was easily corrected to

-- Code --
size_t pos = file.find_first_not_of ("|");
if (pos != std::string::npos)
  file = file.substr (pos);
else
-- End Code --

This made me a bit paranoid so I grep'ed through the code and found three
other instances where Octave was using incorrect comparisons or arithmetic
on size_t values returned from find functions.  See cset
(http://hg.savannah.gnu.org/hgweb/octave/rev/ed5ee3f610db).  At this point,
the test for __osmesa_print__ fails, which is probably good and probably
related to the segfault that has been reported with this test
(https://savannah.gnu.org/bugs/?44360).

Final question on this topic.  According to the C++ reference, the find
functions actually return an object of std::string::size_type.  Normally,
maybe even definitionally, this is the same as std::size_t.  Octave code,
however, assumes that the bare size_t is the same as std::size_t.  Should
we care about this?  The gl2ps-render.cc code, that maybe we got from
somewhere else originally, does use std::size_t.

--Rik