Re: character matrix inputs to string functions

Rik-4
On 02/20/2020 09:00 AM, [hidden email] wrote:
Subject: Re: line continuations
From: José Abílio Matos [hidden email]
Date: 02/20/2020 08:49 AM

> On Thursday, 20 February 2020 14.35.19 WET Nicholas Jankowski wrote:
>> can you give a code example of what produces an error in matlab but not
>> octave? i may be misunderstanding your earlier comments.
>
> c = ['1 2 3 '; '4 5 6 ']
>
> strrep(c, '2', '0')
>
> The call to strrep fails in Matlab since c is a character matrix rather than
> a character vector, and it succeeds in Octave (with a counterintuitive result
> IMHO). My proposal is to make Octave throw an error as well.
>
> [m,n] = size(c);
>
> If m != 1 and n != 1 then throw an error. I hope that this now makes sense. :-)
>
> Regards,
> -- José Matos

There is a much larger issue which should be resolved and that is how character matrix inputs should be handled by all string functions.

Consider this example,

octave:1> cstr = { 'Hello World' ; 'Goodbye World '}
cstr =
{
  [1,1] = Hello World
  [2,1] = Goodbye World
}

octave:2> strrep (cstr, 'World', 'Jane')
ans =
{
  [1,1] = Hello Jane
  [2,1] = Goodbye Jane
}

This does just what you would think.  Now try the same thing with a character matrix.

octave:3> chmat = char ('Hello World', 'Goodbye World')
chmat =

Hello World 
Goodbye World

octave:4> strrep (chmat, 'World', 'Jane')
ans =

Hello World 
Goodbye World

There is no substitution because Octave stores arrays in column-major order, so the internal algorithm sees a string that begins "HGeolo...".  In any case, the average user is going to be quite surprised by the apparent failure of the strrep function.  Restricting the character input to a row vector (1xN) restores the expected behavior of the function.

octave:5> strrep (chmat(1,:), 'World', 'Jane')
ans = Hello Jane 
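
The column-major reading can be checked directly.  The following is only a minimal sketch: it reuses the chmat example from this message, and the variable name flat is introduced here purely for illustration.  It shows what a linear, column-by-column scan of the matrix looks like, which is consistent with the behavior described above.

```octave
% Build the same 2x13 character matrix as in the session above.
chmat = char ('Hello World', 'Goodbye World');

% Octave stores arrays column by column, so linearizing the matrix
% interleaves the characters of the two rows.
flat = chmat(:).';
disp (flat(1:6))                           % prints HGeolo

% The pattern never appears in the interleaved character sequence ...
disp (isempty (strfind (flat, 'World')))   % prints 1

% ... but it does appear in an individual row.
disp (strfind (chmat(1,:), 'World'))       % prints 7
```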

So, I think we (Octave community) need to make a decision about how we want to handle character matrix inputs and then propagate this change to all of the m-files in scripts/strings.

One obvious possibility is simply to follow Matlab and increase the level of input validation to reject character matrices.  The validation code is pretty simple:

if (ischar (input))
  if (! isrow (input))
    error ("fcn_name: input must be a character string or cell array of strings");
  endif
elseif (iscellstr (input))
  ...
else
  error ("fcn_name: input must be a character string or cell array of strings");
endif
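
As a quick way to see which inputs such a check would accept, here is a small sketch.  The name check_text_arg is hypothetical and not part of Octave.  One detail that may be worth deciding explicitly: the empty string '' is 0x0, so isrow ('') is false, and a bare isrow test would reject it.  The sketch assumes that empty strings should still be allowed.

```octave
% check_text_arg is a hypothetical predicate, not an Octave built-in.
% It accepts char row vectors, the empty string, and cellstr arrays.
check_text_arg = @(x) (ischar (x) && (isrow (x) || isempty (x))) || iscellstr (x);

disp (check_text_arg ('Hello World'))                          % prints 1
disp (check_text_arg (''))                                     % prints 1
disp (check_text_arg ({'Hello World'; 'Goodbye World'}))       % prints 1
disp (check_text_arg (char ('Hello World', 'Goodbye World')))  % prints 0

% The reason for the extra isempty test: '' is 0x0, not 1x0.
disp (isrow (''))                                              % prints 0
```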

But Octave does try to see itself as a superset of Matlab.  We don't have to follow them slavishly.  In this case we could change the input validation to detect the character matrix and call the function recursively.  For example, this code converts the character matrix to a cell array of strings, executes strrep on each element, and converts the output back to a character matrix.

if (ischar (input))
  if (! isrow (input))
    retval = char (strrep (cellstr (input), pattern, replacement));
    return;
  endif
endif

And it works,

octave:7> char (strrep (cellstr (chmat), 'World', 'Jane'))
ans =

Hello Jane 
Goodbye Jane
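
For anyone who wants to experiment with the per-row behavior before any change lands in strrep itself, the idea can be wrapped in a standalone function.  This is only a sketch under stated assumptions: strrep_rows is a hypothetical name, and any char input with more than one row is treated as a list of strings.  Note that cellstr drops trailing blanks and char re-pads to the longest result, so the output can be narrower or wider than the input.

```octave
1;  % marks this file as a script so the function below can be defined in it

% strrep_rows is a hypothetical wrapper name, not part of Octave.
function retval = strrep_rows (str, pattern, replacement)
  if (ischar (str) && rows (str) > 1)
    % Character matrix: treat each row as one string, then re-pad.
    retval = char (strrep (cellstr (str), pattern, replacement));
  else
    % Row vectors and cellstr arrays go straight to strrep.
    retval = strrep (str, pattern, replacement);
  endif
endfunction

chmat = char ('Hello World', 'Goodbye World');
out = strrep_rows (chmat, 'World', 'Jane');
disp (out)
disp (size (out))    % 2 rows, 12 columns: narrower than the 2x13 input
```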

Anyone want to comment on which approach they like and why?

--Rik

Re: character matrix inputs to string functions

apjanke-floss


On 2/20/20 1:22 PM, Rik wrote:

> Anyone want to comment on which approach they like and why?
>
> --Rik

In my view, a 2-D char matrix is a special case, which represents a list
or vector of "strings" as a blank-padded 2-D array, and should be
treated as an alternate physical representation of the same sort of
thing that a cellstr vector represents. Lots of Matlab and existing
Octave functions operate on them this way.
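
A small sketch of that equivalence, assuming strings without meaningful trailing blanks (the variable names c and m are just for illustration): converting a cellstr to a padded char matrix and back returns the original list.

```octave
% A cellstr column vector and its blank-padded char matrix form.
c = {'Hello World'; 'Goodbye World'};
m = char (c);
disp (size (m))                    % 2 rows, 13 columns

% cellstr removes the padding again, so the round trip is lossless here.
disp (isequal (cellstr (m), c))    % prints 1
```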

So in this case, I'd suggest that doing the strrep substitution on a
per-row basis would be a right and useful thing to do.

I don't think there's a big Matlab compatibility concern - things that
throw errors aren't really functional space that Matlab is claiming and
that is likely to be used in user code. So it's unlikely that defining
useful behavior here would break any user code.

Cheers,
Andrew

Re: character matrix inputs to string functions

nrjank
Administrator


On Thu, Feb 20, 2020 at 2:28 PM Andrew Janke <[hidden email]> wrote:

> I don't think there's a big Matlab compatibility concern - things that
> throw errors aren't really functional space that Matlab is claiming and
> that is likely to be used in user code. So it's unlikely that defining
> useful behavior here would break any user code.



I agree. Absent some need for it to error, there's no reason we can't keep the extended functionality. I can think of little reason for someone to want a replace operation on a string in a columnwise fashion. Char arrays like that are generally built 'horizontally' when they're representing strings in the first place, and a string replace would follow suit. Unless I'm missing something, in which case there could be a 'columnwise' flag that doesn't preserve size, but that sounds like a headache.

Re: character matrix inputs to string functions

José Abílio Matos
On Thursday, 20 February 2020 19.55.36 WET Nicholas Jankowski wrote:
> I agree. Absent some need for it to error, there's no reason we can't keep
> the extended functionality. I can think of little reason for someone to
> want a replace operation on a string in a columnwise fashion. Char arrays
> like that are generally built 'horizontally' when they're representing
> strings in the first place, and a string replace would follow suit. Unless
> I'm missing something, in which case there could be a 'columnwise' flag
> that doesn't preserve size, but that sounds like a headache.

FWIW I agree with Nicholas and Andrew. :-)
--
José Matos