I recently devised a method to perform Monte Carlo simulations with Open Office Calc. I have long considered this a complicating feature of Open Office Calc, and would welcome comments that improve upon what I produced below. By perform simulations, I mean my procedure produces a data table with the following qualities:
- Rows represent simulation results of my desired model
- Columns represent individual outcome variables of interest
- Multiple.Operations is used to establish the data table
- The results in each cell entry are based on random numbers applied to individual variables within my model
- The entire table recalculates for any change to a model parameter, automatic or manual at my choice
Once such data table is established, I then perform my desired statistical analysis.
In the Microsoft Excel environment, the data table results are achieved by using the RAND() function inside of the model calculations, and a data table. The issues with Calc were appropriately stated by Nathan Fleming is his description of using OpenOffice Calc for ESD.70:
The Monte Carlo simulation is performed by a combination of the RAND() function and the data table feature in Excel. Calc has the RAND() function, and is able to replicate the data table feature using a feature called ‘multiple operations’; however, when Calc tries to create a data table using a formula based on RAND(), it fills the data table with identical values of output. It is supposed to produce varied results by having varied the inputs to the formula using the RAND() function—this is what is does in Excel. Two thousand identical outputs do not make a Monte Carlo simulation.
I encourage others to please comment on these steps, as I sure it can be improved upon.
My process of discovery
- I first established my model, using the Rand() function, making note of the variables that include the function. A sample function would be
INDEX(B2:B8;MATCH(RAND();A2:A8;1))
Where A2:A8 contains cumulative probabilities that the outcomes in B2:B8 occur.
- I create a data table of random numbers, I used rows for a simulation number, and columns for variables. This table can be created from copying and paste special values for the RAND() function, or from any random number table. The number of entries will be equal to the number of random numbers required by the model, for example, 2000 simulations with 3 variables will be 6000 random numbers in a table 2000 rows and 3 columns. Each row and column should be labeled with a unique identifier. I use simple numbers in ascending order to make the match functions work efficiently. Assume for this example, that the 6000 random numbers are placed in cells AA2:AC2001, the simulations are numbered 1 to 2000 in cells Z2:Z2001, and the varibles are numbered 1, 2, and 3 in cells AA1:AC1
- For each formula with a RAND() function, I select an identifier corresponding to the column identifier in step 2. Assume in this example that cell D2 contains the variable identifier.
- I select an open cell and use it as an indicator of the simulation number run through the model calculatins. Assume in this example, cell D1 contains the simulation number.
- I rewrite all formulas replacing the RAND() function, with another index function that looks up the random number from the table created in step 2. This index function will match the row to the simulation number (selected cell in step 4), and the column to the variable identifier in step 3. Hence, the formula in step one becomes:
INDEX(B2:B8;MATCH(INDEX(AA2:AC2001;MATCH(D1;Z2:Z2001;1);MATCH(D2;AA1:AC1;1));A2:A8;1))
These steps, in effect, remove RAND() from the model, by using the index function to return a random number identified by the simulation and varible identifier. Thus, variables with different identifiers, have independent random numbers, and each simulation uses one and only one set of random numbers. And now, the Calc multiple.operations feature works as intended to produce the desired data table. To complete my data table, I make a list in a column of the desired number of simulations, in this example my first column contains a simple list of 1 to 2000. I perform multiple.operations function with a single row input, D1 for my simulation number, and select the output variables of my desire. The result is a data table with the qualities described above.
The core of this procedure, is to make transparent the random number used in the model, rather than to leave them as part of the internal calculation through the use of the RAND() function inside the formula. These steps are simplified in the model design process by knowing how many simulations your experiment will be based on and the number of variables desired for the model. Thus the formula replacement process is omitted and initial formulas are prepared to call the appropriate random number stream for each variable independently.