RMarkdown: Working With Data and R Markdown

Generating documents from markdown is all well and good, but one of the main draws of R Markdown is that it can pull in data from external sources.

While the main purpose of R Markdown is to run R code inside a document, I personally prefer working with Python.

To run arbitrary Python code you can use the R package called reticulate (installed in my R Markdown Docker image).

Setup

First we need to run some R code to load the library we want and to set up the Python virtual environment.

``` {r setup, include = FALSE}
library(reticulate)
virtualenv_create("my-proj")
py_install("matplotlib", envname="my-proj")
py_install("pandas", envname="my-proj")
use_virtualenv("my-proj")
```

OK, breaking this down we have the following:

{r setup, include = FALSE} - The braces mark the chunk as code in the named language that is executed at build time. setup is the chunk's label, and include = FALSE hides both the code and its results. Use this for setup code.
In R we load the reticulate library so we can use Python.
We create a virtual environment called my-proj. You can name it whatever you want; the name is not important because the environment is created in the Docker container and thrown away at the end of the build.
We install the matplotlib and pandas packages into that virtual environment.
Finally we tell reticulate to use the my-proj virtual environment for the rest of this document.
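
If a later chunk fails with an import error, a small check chunk can confirm which interpreter was picked up and whether the packages are visible to it. The following is a minimal sketch and not part of the original setup: the helper name `missing_packages` is hypothetical, and it relies only on the Python standard library. In the document it would go in a `{python}` chunk after the setup chunk.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names from `names` that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# sys.prefix points at the active environment; with the setup above it
# would typically be the path of the my-proj virtual environment.
print("Python prefix:", sys.prefix)
print("Missing packages:", missing_packages(["matplotlib", "pandas"]))
```

An empty list means both packages can be imported from the environment reticulate is using.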

Creating a Graph from Python

Next we can run a block of Python code that generates the graph we want to put on the page.

``` {python, echo = FALSE}
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
```

Breaking this code chunk down:

As you have probably guessed, {python, echo = FALSE} executes the block as Python code. echo = FALSE is similar to include = FALSE above, but instead of hiding the block altogether it hides only the code and still displays the result (in this case a nice graph).
The remainder of the code is just a simple example of using the matplotlib Python library to create a graph.

Creating a table from Python

If you want to output a table from data gathered by a script, you can do the following:

``` {python, include = FALSE}
import pandas as pd

mydata = [
    {
        "Id": 1,
        "Message": "fooo"
    },
    {
        "Id": 2,
        "Message": "bar"
    }
]

pandadata = pd.DataFrame(data=mydata)
```

```{r, echo = FALSE}
knitr::kable(py$pandadata, caption="Data from python")
```

Breakdown of the code above:

You will see two blocks of code: the Python block and the R block. We generate the data in Python and then use the R block to display it. reticulate exposes Python variables to R through the py object, so py$pandadata is the data frame created in the Python block.
You will also see the use of the pandas package. This lets us create a pandas data frame, which R can turn into a table using kable. I actually like this because it keeps the data and the presentation a little separate.
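
In practice the data often arrives from another script or an API as JSON text rather than as a hand-written list. As a sketch under that assumption (the JSON string and the helper name `load_records` are illustrative and not from the original example), the standard `json` module yields the same list-of-dicts shape that `pd.DataFrame` accepts:

```python
import json

# Illustrative input; in a real document this might be read from a file
# or produced by another script.
raw = '[{"Id": 1, "Message": "fooo"}, {"Id": 2, "Message": "bar"}]'


def load_records(text):
    """Parse JSON text and check it is a list of dicts (one dict per table row)."""
    records = json.loads(text)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    return records


mydata = load_records(raw)
print(mydata)
# pd.DataFrame(data=mydata) would then build the frame as in the Python chunk above.
```

Checking the shape up front means a malformed payload fails in the Python chunk with a clear message instead of producing an odd-looking table later.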

Importing a csv file into a table

This is probably the easiest of them all. You just need to add the following code block.

```{r, echo = FALSE}
knitr::kable(read.csv("./test.csv", header = TRUE))
```
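
If the CSV file does not exist yet, an earlier Python chunk can write it. This is a minimal sketch using the standard `csv` module; the `write_rows` helper is hypothetical, and the rows simply reuse the example data from the table section. It assumes the Python chunk and the R chunk run with the same working directory, so that `test.csv` is the file read by `read.csv` above.

```python
import csv


def write_rows(path, rows, fieldnames):
    """Write a list of dicts to `path` as CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()  # read.csv(..., header = TRUE) expects this first line
        writer.writerows(rows)


rows = [{"Id": 1, "Message": "fooo"}, {"Id": 2, "Message": "bar"}]
write_rows("test.csv", rows, fieldnames=["Id", "Message"])
```

The Python chunk has to appear before the R chunk in the document, since chunks are executed in order at build time.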

Final thoughts

As you can see, what we can do with this is pretty much limitless. You also have the option of having a script generate data on the file system and then pulling it in via regular markdown.

RMarkdown Series

01 - Replacing MS Office
02 - Setting up R Markdown
03 - How to do common Word tasks in R Markdown
04 - Generating presentations in R Markdown and Reveal.js
05 - Working with data and R Markdown
06 - Generating flow charts
07 - Creating books in Bookdown
08 - Misc other tools
09 - Co-operating with other people