associative array

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

associative array

Francesco Potortì
I need an associative array, and I tried with structs, with the field
names as keys.

Unfortunately it clearly slows down with size.  I have about 30000 keys,
for each I first check if it's already there, if not I create it, if
it's there I add some data.  The number of operations (insertion +
updates) is around 55000.  The slowdown appears to be quadratic (but I
have not made any measurement, in fact).

Are there more efficient ways to build and update an associative array?

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(gate 20, 1st floor, room C71)         Web:    http://fly.isti.cnr.it


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Andreas Weber-6
Am 26.03.20 um 21:00 schrieb Francesco Potortì:
> I need an associative array, and I tried with structs, with the field
> names as keys.

Perhaps https://github.com/markuman/go-redis

Online documentation:
https://markuman.gitlab.io/go-redis/usage/basics.html

I would give it a try.
Of course you can host redis on your own host
-- Andy


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Juan Pablo Carbajal-2
In reply to this post by Francesco Potortì
> Are there more efficient ways to build and update an associative array?
What is the value associated with the key? How flexible you are?
For numeric values, you can do this with 2 arrays (value and key, keys
might be a string array), or 1 array and 1 cell.
The cost of operations in keys inherit the cost of string arrays or
cells. The cost of operations in values inherit the costs of numeric
arrays.

You can create a very simple class to put the two things together, but
that's just nice API, nothing else.

I would expect that struct are close to std::map in their complexity
[1], N^2 would be pretty far away from that.

[1]: https://stackoverflow.com/questions/21740893/what-is-the-time-complexity-of-stdmap


Reply | Threaded
Open this post in threaded view
|

Re: associative array

José Abílio Matos
In reply to this post by Francesco Potortì

On Thursday, 26 March 2020 20.00.39 WET Francesco Potortì wrote:

> I need an associative array, and I tried with structs, with the field

> names as keys.

>

> Unfortunately it clearly slows down with size. I have about 30000 keys,

> for each I first check if it's already there, if not I create it, if

> it's there I add some data. The number of operations (insertion +

> updates) is around 55000. The slowdown appears to be quadratic (but I

> have not made any measurement, in fact).

>

> Are there more efficient ways to build and update an associative array?

 

How about containers.Map:

https://octave.org/doc/v5.2.0/containers_002eMap.html

 

--

José Matos



Reply | Threaded
Open this post in threaded view
|

Re: associative array

nrjank
Administrator
In reply to this post by Francesco Potortì


Are there more efficient ways to build and update an associative array?


cheating and looking at matlab documentation, they point to Map containers objects as being specifically built for key-value pair mapping, and it appears Octave has a compatible implementation.  see:


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Andrew Janke-2
In reply to this post by José Abílio Matos


On 3/26/20 4:25 PM, José Abílio Matos wrote:

> On Thursday, 26 March 2020 20.00.39 WET Francesco Potortì wrote:
>
>> I need an associative array, and I tried with structs, with the field
>
>> names as keys.
>
>>
>
>> Unfortunately it clearly slows down with size. I have about 30000 keys,
>
>> for each I first check if it's already there, if not I create it, if
>
>> it's there I add some data. The number of operations (insertion +
>
>> updates) is around 55000. The slowdown appears to be quadratic (but I
>
>> have not made any measurement, in fact).
>
>>
>
>> Are there more efficient ways to build and update an associative array?
>
>  
>
> How about containers.Map:
>
> https://octave.org/doc/v5.2.0/containers_002eMap.html

That's a reasonable idea, but I'm curious as to why struct itself is
slow. I would think that struct lookups for get or set would be O(1)
using a hash on the string key (or some other symbol-level identifier).
Are structs not hashtables over the string key values?

Cheers,
Andrew


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Francesco Potortì
In reply to this post by Juan Pablo Carbajal-2
>> Are there more efficient ways to build and update an associative array?

>What is the value associated with the key? How flexible you are?

The key is a numeric string, which I treat as a string.

>For numeric values, you can do this with 2 arrays (value and key, keys
>might be a string array), or 1 array and 1 cell.

The value is complex: it is a struct with four filds, two of which are
short arrays.

>The cost of operations in keys inherit the cost of string arrays or
>cells. The cost of operations in values inherit the costs of numeric
>arrays.

Finding an element in an array is fast only if the array is sorted, and
I do not know whether inserting into a sorted array would be faster than
adding a field to a struct.

>I would expect that struct are close to std::map in their complexity
>[1], N^2 would be pretty far away from that.

That's what made me wonder.  Maybe the cost of adding a new field is
high?

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(gate 20, 1st floor, room C71)         Web:    http://fly.isti.cnr.it


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Francesco Potortì
In reply to this post by nrjank
>> Are there more efficient ways to build and update an associative array?

> cheating and looking at matlab documentation, they point to Map containers
>objects as being specifically built for key-value pair mapping, and it
>appears Octave has a compatible implementation.  see:
>https://octave.org/doc/v5.2.0/containers_002eMap.html

Yes, I had seen that by looking for "hash" in the documentation, whose
index contain an "hash table" entry.  The manual does not explain how to
use it, so I have neglected it.

Thank you.  If the slowness turns out ot be unbearable I will consider it.

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(gate 20, 1st floor, room C71)         Web:    http://fly.isti.cnr.it


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Andrew Janke-2


On 3/26/20 6:07 PM, Francesco Potortì wrote:

>>> Are there more efficient ways to build and update an associative array?
>
>> cheating and looking at matlab documentation, they point to Map containers
>> objects as being specifically built for key-value pair mapping, and it
>> appears Octave has a compatible implementation.  see:
>> https://octave.org/doc/v5.2.0/containers_002eMap.html
>
> Yes, I had seen that by looking for "hash" in the documentation, whose
> index contain an "hash table" entry.  The manual does not explain how to
> use it, so I have neglected it.
>
> Thank you.  If the slowness turns out ot be unbearable I will consider it.
>

FWIW: Matlab's containers.Map is a bummer, because it's a
pass-by-reference handle object, so it doesn't play well with normal
Matlab/Octave objects and data structures. I've found that this
limitation makes it basically useless for M-code programming over the
last couple years.

It would be nice if there were a pass-by-value/COW version of
containers.Map. I know that Octave core is shy of "innovating" in
non-Matlab-compatible ways, but maybe an exception should be made here.

Cheers,
Andrew


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Francesco Potortì
In reply to this post by Andrew Janke-2
Francesco Potortì:
>>> I need an associative array, and I tried with structs, with the field
>>> names as keys.
>>> Unfortunately it clearly slows down with size. I have about 30000 keys,
>>> for each I first check if it's already there, if not I create it, if
>>> it's there I add some data. The number of operations (insertion +
>>> updates) is around 55000. The slowdown appears to be quadratic (but I
>>> have not made any measurement, in fact).
>>> Are there more efficient ways to build and update an associative array?
José Abílio Matos:
>> How about containers.Map:
>> https://octave.org/doc/v5.2.0/containers_002eMap.html
Andrew Janke:
>That's a reasonable idea, but I'm curious as to why struct itself is
>slow. I would think that struct lookups for get or set would be O(1)
>using a hash on the string key (or some other symbol-level identifier).
>Are structs not hashtables over the string key values?

I read somewhere that they are, that's why I used a struct in the first
place.  I have now profiled my code and it turns out that it spends 91%
of time on isfield, which is what I suspected.  After instrumenting it
to measure time, it turns out that the time grows linearly with struct
size, while it should stay constant.

Then I tried to build a small demo program showing this, but no, here
the time is almost constant with struct size, as expected from a hash
table.

Now the only thing is to take my program and simplify it until I
demonstrate the problem, but that would be a long task :(

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(gate 20, 1st floor, room C71)         Web:    http://fly.isti.cnr.it


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Francesco Potortì
I managed to recreate the problem with a simple program.  It shows that
isfield execution time grows linearly with the size of the struct if
isfield is called after creating a new field.  The printout (times and
time differences) shows that indeed the total time grows quadratically
with struct size.

Unless someone comes out with a reasonable explanation, I'll signal this
as a bug in isfield.

======================= isfield_bench.m ========================
e = 4;
a = struct;
t = zeros(1,10);

profile off; profile clear; profile on
for i = 1:10
  for j = 1:10^(e-1)
    f = sprintf("%.11f",rand);
    if !isfield(a,f)
      a.(f) = [];
    endif
  end
  t(i) = cputime;
end
profile off; profshow(4)

numfields(a)
[t(2:end)'-t(1), diff(t')]
================================================================

octave> isfield_bench
   # Function Attr     Time (s)   Time (%)        Calls
-------------------------------------------------------
   5  isfield            26.787      98.65        10000
   4  sprintf             0.178       0.66        10000
   3     rand             0.171       0.63        10000
   6 prefix !             0.018       0.07        10000
ans =  10000
ans =
    1.0021    1.0021
    2.5191    1.5170
    4.5951    2.0760
    7.2258    2.6308
   10.4104    3.1846
   14.1473    3.7369
   18.4348    4.2875
   23.2898    4.8551
   28.6780    5.3882

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(gate 20, 1st floor, room C71)         Web:    http://fly.isti.cnr.it


Reply | Threaded
Open this post in threaded view
|

Re: associative array

Francesco Potortì
In reply to this post by Andreas Weber-6
Francesco Potortì:
>> I need an associative array, and I tried with structs, with the field
>> names as keys.

Andreas Weber:
>Perhaps https://github.com/markuman/go-redis
>
>Online documentation:
>https://markuman.gitlab.io/go-redis/usage/basics.html
>
>I would give it a try.
>Of course you can host redis on your own host

Thank you.  For other curios people:

https://redis.io/topics/introduction

Redis is an open source (BSD licensed), in-memory data structure store,
used as a database, cache and message broker. It supports data
structures such as strings, hashes, lists, sets, sorted sets with range
queries, bitmaps, hyperloglogs, geospatial indexes with radius queries
and streams. Redis has built-in replication, Lua scripting, LRU
eviction, transactions and different levels of on-disk persistence, and
provides high availability via Redis Sentinel and automatic partitioning
with Redis Cluster.

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(gate 20, 1st floor, room C71)         Web:    http://fly.isti.cnr.it