Install R AnalyticFlow by following Chapter 2, "Installation", of the preview edition. See the section called "Setup" for the initial setup, and then launch the program.
First, read sample data for analysis.
This tutorial uses the "iris" data set, one of R's built-in sample data sets.
It is a well-known data set of measurements of iris flowers. For details, see Fisher (1936) or type help(iris) in the R console.
In R AnalyticFlow, essentially every process is described by creating nodes. Here, create a node that reads a data set. Click on "Node" > "Create Simple Node" from the menu.
A Simple node represents a process that can be expressed as a single R expression. In "Code", enter
data(iris)
as follows:
Click "OK" to create a node in the flow area.
Creating a node does not execute its process.
To execute the process described by a node, right-click on the node and click "Run".
Now run the data node we created.
The following output then appears in the console window:
> data(iris)
>
When a single node is run, the R code described in the node is sent to the console and executed.
Here you can see that data(iris) has been executed and R is waiting for the next input.
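Conceptually, running a Simple node is like typing its expression at the console prompt. The R sketch below illustrates that idea by parsing a code string and evaluating it in the global environment. The variable name `node_code` is hypothetical, and this is only an analogy, not how R AnalyticFlow is actually implemented.

```r
# Hypothetical: the text stored in a Simple node's "Code" field
node_code <- "data(iris)"

# Parse the text into an R expression and evaluate it in the global
# environment, much like entering it at the console prompt
eval(parse(text = node_code), envir = globalenv())

# data() has copied iris into the workspace itself, not only the
# datasets package on the search path
exists("iris", envir = globalenv(), inherits = FALSE)
```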
R code can also be executed directly from the console window.
To look into the data with the head function, enter the following in the console:
head(iris)
Press the Enter key to execute the code. The first six rows are displayed, showing that this data set has four quantitative variables (the length and width of the sepals and petals) and one qualitative variable (the species of iris). For a quick check of the data like this, executing code directly from the console is convenient.
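The same check can be made a little more explicit in the console. This sketch uses only base R functions; `first_rows` and `col_classes` are illustrative names.

```r
data(iris)

# head() returns the first six rows by default
first_rows <- head(iris)
print(first_rows)

# Class of each column: four numeric measurements and one factor
col_classes <- sapply(iris, class)
print(col_classes)

# The three species stored in the factor column
print(levels(iris$Species))
```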
You can also create a Simple node from the console by pressing Ctrl+Enter (Command+Enter on Mac) after typing code. This lets you experiment by trial and error in the console and keep only what you need in the flow.
Next, return to the main window to add another node. Click on a blank space in the flow area, and you will see the node created earlier as follows:
The brackets indicate that this node is the last one that has already been executed. Click on this node to select it.
If a new node is created while another node is selected, an edge (arrow) is automatically drawn from the selected node to the new node. Click on "Node" > "Create Simple Node" from the menu, and enter the following code:
plot(iris[, 1:4], col = as.integer(iris$Species) + 1)
An edge is drawn automatically, resulting in the following flow:
When you run a node in a flow, all nodes on the path are executed in order, starting from the root node (the first node in the path).
If there is a bracketed node in the execution path, execution starts from the node that follows the bracketed node.
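The rule in the last two paragraphs can be written as a small function. This is a hypothetical sketch of the behavior described here, not code from R AnalyticFlow; `nodes_to_run`, `path`, and `bracketed` are illustrative names.

```r
# Hypothetical helper: `path` lists node names from the root to the node
# being run; `bracketed` is the name of the bracketed (already executed)
# node, or NA if there is none.
nodes_to_run <- function(path, bracketed = NA_character_) {
  i <- match(bracketed, path)
  if (is.na(i)) {
    # No bracketed node on the path: run everything from the root
    path
  } else {
    # Skip the bracketed node and every node before it
    path[-seq_len(i)]
  }
}

nodes_to_run(c("data", "plot"), bracketed = "data")  # "plot"
nodes_to_run(c("data", "plot"))                       # "data" "plot"
```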
Right-click on the plot node we created, and run it.
The graphics window displays the following figure:
A scatterplot matrix of the four quantitative variables is drawn. The point colors indicate Species, which suggests that the iris species may be discriminated well if these quantitative variables are used effectively.
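To see how the plot node chooses its colors, the color index can be inspected directly. In this sketch `species_col` is an illustrative name; the actual colors depend on the active palette, which can be checked with `palette()`.

```r
data(iris)

# Species is a factor, so as.integer() gives 1, 2, 3 for setosa,
# versicolor, virginica; adding 1 skips color 1 (black)
species_col <- as.integer(iris$Species) + 1

# Each species maps to exactly one color index
print(table(iris$Species, species_col))

# The colors these indices refer to in the current palette
print(palette()[2:4])
```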
Now look at the console window. You can see that only the plot function was executed, following the head function we executed earlier.
> data(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> plot(iris[, 1:4], col = as.integer(iris$Species) + 1)
>
This is because the data node is bracketed (already executed), so there is no need to execute the preceding part of the path again.
If you want to execute all the nodes in the path, use "Clear and Run" instead of "Run".
It clears all objects in the workspace and then executes every node on the path, from the root node to the node being run.
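The difference between "Run" and "Clear and Run" can be sketched in plain R. Here a separate environment `ws` stands in for the workspace so that the example does not wipe your own objects; `clear_and_run`, `ws`, and the node code strings are hypothetical and not part of R AnalyticFlow.

```r
# Hypothetical sketch: remove every object from `workspace`, then evaluate
# the code of each node on the path in order, starting from the root
clear_and_run <- function(path_code, workspace) {
  rm(list = ls(envir = workspace, all.names = TRUE), envir = workspace)
  for (code in path_code) {
    eval(parse(text = code), envir = workspace)
  }
  invisible(workspace)
}

ws <- new.env()
assign("old_result", 42, envir = ws)  # left over from an earlier run

clear_and_run(c("d <- iris", "n <- nrow(d)"), ws)
ls(ws)  # "d" "n": old_result has been cleared
```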
The scatterplot suggests that Petal.Width varies widely according to Species.
Let us draw a boxplot to examine this relationship more closely.
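Before creating the node, the group differences can also be checked numerically. This sketch relies on base R's `boxplot()` with `plot = FALSE`, which returns the statistics used to draw the boxes instead of drawing them; `bp` is an illustrative name.

```r
data(iris)

# Compute, but do not draw, the boxplot statistics for each species
bp <- boxplot(Petal.Width ~ Species, data = iris, plot = FALSE)

# Group names and medians (the third row of bp$stats)
print(bp$names)
print(bp$stats[3, ])
```

If the medians increase clearly from setosa to virginica, that supports what the scatterplot suggested.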
Select the plot node and create a Simple node with the following code:
boxplot(Petal.Width ~ Species, data = iris, col = 3, main = "Petal.Width")
Then the flow becomes as follows:
Since the boxplot node follows the plot node in this flow, they will be executed in this order. This is natural as a process of exploratory analysis; however, it is unnecessary when you want to see each result separately.
Therefore, we rearrange this flow so that the boxplot node follows the data node, in the same way as the plot node does.
There are several ways to do this; the easiest is simply to draw a new edge from the data node to the boxplot node.
First, click on the data node (the source of the new edge) to select it:
Next, middle-click (or Alt + click) on the boxplot node (the destination of the new edge).
Then a new edge is drawn, replacing the existing edge.
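One way to picture this step is to treat the flow as a list of edges. This R sketch is purely illustrative: `edges` is a hypothetical data frame, not R AnalyticFlow's internal format. It replaces the edge entering the boxplot node so that it comes from the data node.

```r
# Hypothetical edge list of the flow before the change
edges <- data.frame(
  from = c("data", "plot"),
  to   = c("plot", "boxplot"),
  stringsAsFactors = FALSE
)

# Drawing a new edge from data to boxplot replaces boxplot's incoming edge
edges$from[edges$to == "boxplot"] <- "data"
print(edges)
```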
To make the flow easier to read, drag the boxplot node to reposition it:
Now the edge replacement has been done.
Here the plot node (which was executed last) no longer comes before the boxplot node.
So when the boxplot node is run, execution starts from the data node.
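This behavior can also be checked with a small sketch. It assumes the rearranged flow is the hypothetical edge list below (data to plot, data to boxplot) and that the flow has no cycles. The function walks back from the node being run to the root and then applies the bracket rule; `path_to` and `nodes_to_run` are illustrative names, not R AnalyticFlow functions.

```r
# Hypothetical flow after the edge replacement
edges <- data.frame(
  from = c("data", "data"),
  to   = c("plot", "boxplot"),
  stringsAsFactors = FALSE
)

# Follow incoming edges back to the root; return the path root-first
# (assumes each node has at most one incoming edge and no cycles)
path_to <- function(node, edges) {
  path <- node
  while (any(edges$to == path[1])) {
    path <- c(edges$from[edges$to == path[1]][1], path)
  }
  path
}

# Skip up to and including the bracketed node, if it is on the path
nodes_to_run <- function(path, bracketed) {
  i <- match(bracketed, path)
  if (is.na(i)) path else path[-seq_len(i)]
}

p <- path_to("boxplot", edges)
p                                    # "data" "boxplot"
nodes_to_run(p, bracketed = "plot")  # "data" "boxplot"
```

Because the bracketed plot node is not on the path to boxplot, the whole path runs from the root.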