Saturday, January 21, 2012

Stats Saturday 1 - MySQL and R

Fig. 1 - QQ-plot of pretest reaction times
I've decided to try and devote my Saturday mornings to working out the technical bits I need for my dissertation - which, at this stage of the project, means stats stats stats.

Although I'm not sure if I'm going to end up using R to do all the stats for my dissertation - but I'm currently using it poke at my dataset a little.

Today I wanted to accomplish two things. I wanted to get R connected to the MySQL database storing all of my raw data, and I wanted to work out how to generate a QQ-plot from that data.

This was all surprisingly easy ...


Step 1 - load the library

MySQL connectivity to R is provided through the CRAN package, rMySQL.
To import the library you merely need to run the command
library(RMySQL)
You shouldn't have any problems unless you haven't actually got the library installed - I had to "apt-get install r-cran-rmysql" under Ubuntu to get it.

Step 2 - get a connection object

Next, we connect to the database and grab a connection object that will be used in subsequent interactions with the DB. We use the command
conn <- dbConnect(MySQL(), user="USER", password="PASSWORD", dbname="DBNAME")
Where "conn" is the connection object, USER is your username, PASSWORD the db password, and DBNAME the name of the database you're trying connecting to (I'm assuming you're connecting to localhost here)

There are a bunch of ways you can set these connections up - check this out (it's a PDF by the way...).

Step 3 - run a query

This part was surprisingly easy - there are a couple of ways you can do this, including stepping through results in stages, but I was just interested in reading the output of a db query into an R dataframe straight.
data <- dbGetQuery(conn,"select ur.user_name,avg(resp_time_per_deg) as resptime \
from tetris4km_mr_test_data as td \
inner join tetris4km_user_roles as ur \
on td.student_number = ur.user_name \
where td.pretest_posttest = 'pretest' and td.reason_excluded is null \
and td.correct_incorrect = 'correct' and td.resp_time_per_deg < 200 \
group by ur.user_name")
The dbGetQuery() function accepts two arguments, the first being our connection we made in step 2, the second being the query itself. In my query here, I'm interested in pulling out my participants' student numbers and their average reaction time on their mental rotation pre-test.
dbGetQuery() then returns a dataframe with our data. We'll use this for our QQ-plot.

Step 4 - do stuff with your data (QQ-plot)

So I wanted to do a QQ-plot of this data to perform a visual test of normality. I simply ran the following commands
qqnorm(data$resptime);qqline(data$resptime)
and got the lovely fig. 1

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.