# Matlab – Appending a vector to an empty MATLAB matrix

matlabmatrix

I've got MATLAB code inserting n-dimensional points (n >1) into a matrix (`myPointMatrix`) and am having thoughts about how to insert the first point.

Right now the program checks the size of `myPointMatrix` before inserting a point. If it is 1×1, `myPointMatrix` is set equal to the current point. Otherwise the current point is appended. This `if`-statement is only true once, but is evaluated each time I insert a point, which is very very often.

Removing the `if` and trying to append to `myPointMatrix` makes MATLAB understandably complain about matrix dimensions not being consistent. Removing both the `if`-statement and the inialization of `myPointMatrix = 0` causes MATLAB to find `myPointMatrix` undefined. Also understandable.

How do I initialize `myPointMatrix` so that I can remove the `if`-statement? Or is there some other smart solution?

``````myPointMatrix = 0;
for x=0:limit
for y=0:limit
for z=0:limit
tempPoint = [x y z];
if (length(myPointMatrix) == 1)
myPointMatrix = tempPoint;
else
myPointMatrix = [myPointMatrix; tempPoint];
end
end
end
end
``````

#### Best Solution

There are several ways to append a matrix or vector to any matrix, empty or not. Much depends upon the size of the matrix, and how often you will do the append. (Note that sparse matrices are an entirely different animal. They need to be deal with separately.)

The simple scheme would use concatenation. For example, I'll create a random array. While I know that one call to rand would be the proper solution here, I'm doing it only for comparison purposes.

``````n = 10000;
tic
A = [];
for i = 1:n
Ai = rand(1,3);
A = [A;Ai];
end
toc

Elapsed time is 9.537194 seconds.
``````

See that the time required was reasonably high, far more than had I just called rand directly.

``````tic,rand(n,3);toc
Elapsed time is 0.008036 seconds.
``````

Other ways to append are similar in time. For example, you can append by indexing too.

``````A = [];
A(end+1,:) = rand(1,3);
A
A =
0.91338      0.63236      0.09754
``````

This will be similar in terms of time to appending via concatenation. An interesting fact to understand is that appending new rows to an array is subtly different than appending new columns. It takes slightly more time to append a row than a column. This is because of the way elements are stored in MATLAB. Appending a new row means the elements must actually be shuffled in memory.

``````A = zeros(10000,3);
B = zeros(3,10000);

tic,for i = 1:100,A(end+1,:) = rand(1,3);end,toc
Elapsed time is 0.124814 seconds.

tic,for i = 1:100,B(:,end+1) = rand(3,1);end,toc
Elapsed time is 0.116209 seconds.
``````

The problem with any append operation at all is that MATLAB must reallocate the memory required for A, and do so EVERY time the matrix grows in size. Since the size of A grows linearly, the overall time required grows quadratically with n. So were we to double the size of n, the dynamically grown A will take four times as long to build. This quadratic behavior is why people tell you to pre-allocate your MATLAB arrays when they will be grown dynamically. In fact, if you look at the mlint flags in the editor, MATLAB warns you when it sees this happening.

A better solution, if you know the final size of A, is to pre-allocate A to its final size. Then just index in.

``````tic
A = zeros(n,3);
for i = 1:n
A(i,:) = rand(1,3);
end
toc

Elapsed time is 0.156826 seconds.
``````

While this is much better than the dynamically grown array, it is still FAR worse than a vectorized use of rand. So wherever possible, use the vectorized form of functions like this.

The problem is, sometimes you simply do not know how many elements you will end up with. There are still several tricks one can use to avoid the nasty quadratic growth.

One trick is to make a guess at the final size of A. Now, use indexing to insert new values into A, but watch carefully for when the new entries will spill over the bounds of A. When this is just about to happen, DOUBLE the size of A, appending one big block of zeros to the end. Now return to indexing new elements into A. Keep a separate count of how many elements have been "appended". At the very end of this process, delete the unused elements. This avoids much of the nasty quadratic behavior, since only a few append steps will ever be done. (Remember, you are doubling the size of A when you must do an append.)

A second trick is to use pointers. While MATLAB does not really offer much capability in the way of pointers, a cell array is a step in that direction.

``````tic
C = {};
for i = 1:n
C{end+1} = rand(1,3);
end
A = cat(1,C{:});
toc

Elapsed time is 3.042742 seconds.
``````

This took less time to accomplish than the grown array. Why? We were only building an array of pointers to the cells. A nice thing about this is if each append step has a variable number of rows, it still works nicely.

A problem with the cell array, is it is not terribly efficient when there are MILLIONS of elements to append. It is still a quadratic operation after all, because we are growing the array of pointers at each step.

A solution to that problem is to use an amalgam of the two styles shown above. Thus, define each cell of the cell array to be moderately large in size. Now use indexing to stuff new rows of A into the cell. When the current cell must be grown larger by the next append step, just add a new cell to the cell array.

Some years ago, this discussion arose on the MATLAB newsgroup, and several solutions along these lines were proposed. I posted the solutions growdata & growdata2 as files on the MATLAB Central File Exchange. Growdata2 used function handles to solve the problem:

``````tic
Ahandle = growdata2;
for i = 1:n
Ahandle(rand(1,3))
end
% unpack the object into a normal array
A = Ahandle();
toc

Elapsed time is 1.572798 seconds.
``````

At the time, it was a somewhat faster approach to use persistent variables.

``````tic
growdata
for i = 1:n
growdata(rand(1,3))
end
A = growdata;
toc

Elapsed time is 2.048584 seconds.
``````

Since then the implementation of function handles has clearly improved in MATLAB, so the function handle is now faster.

A virtue of these schemes is they will not have a quadratic performance penalty, while allowing millions of append steps.

Oh well, this is surely more information than was originally requested when the question was asked. Perhaps someone will get something out of it though.