Stock Scraping in R: The Comprehensive Step-by-Step Tutorial

Walker Burgin
Sep 4, 2021

When I started this project, I could find barely anything online about how to scrape, import, organize, and export stock statistics for technical and fundamental analysis; hopefully I’ve spared you most of that frustration. This tutorial shows you how to read stock data for hundreds (or thousands, if you wish) of market-listed companies and export it to an Excel file with each ticker symbol as a separate column. I used finviz.com as my data source (feel free to swap in another one if you know how).

Let’s get down to it. I wrote my program in RStudio and kept the ticker symbols of every company I wanted to scrape in one column of an Excel sheet. Of course, you could also type them straight into RStudio as a character vector. Here’s the Excel sheet where I list the tickers of all the companies I’m interested in:

Notice that I named the file ‘tickers’. The program uses this name to find it.

Now, if you haven’t already installed RStudio (or another environment that can run R), here’s the download page: https://www.rstudio.com/products/rstudio/download/. RStudio runs on top of R, so R itself must also be installed; you can get it from CRAN.

Ok, now that RStudio is up and running, we’re going to use four packages: XML, tidyverse, rvest, and readxl.
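Install each package once from CRAN with install.packages(). After that, library() loads it in every new session. The snippet below is a minimal sketch that checks which of the four packages are missing before loading anything. If any are missing, it prints the install command you need instead of failing halfway through the script:

```r
# The four packages used in this tutorial
pkgs <- c("XML", "tidyverse", "rvest", "readxl")

# Find any that aren't installed yet
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  # Print the exact command needed rather than erroring mid-script
  message("Please run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  # Attach all four for this session
  for (p in pkgs) library(p, character.only = TRUE)
}
```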

Scraping and Retrieving Stock Data

Let’s begin by reading the Excel file we created earlier, which contains all of the stock tickers we’re interested in. At this step you can skip the Excel file, if you prefer, and simply create a vector of tickers instead.
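Here is a minimal sketch of both routes. It makes three assumptions: the file is called tickers.xlsx, it sits in the working directory, and it has no header row. The helpers clean_tickers() and read_tickers() are hypothetical and are not taken from the original program:

```r
# Route 1: skip Excel and type the tickers in directly
tickers <- c("AAPL", "MSFT", "AMZN")

# Hypothetical helper: trim spaces, drop blanks/NAs, upper-case, de-duplicate
clean_tickers <- function(x) {
  ids <- trimws(as.character(x))
  ids <- ids[!is.na(ids) & ids != ""]
  unique(toupper(ids))
}

# Route 2: read the first column of tickers.xlsx (needs readxl installed).
# Assumes no header row; use col_names = TRUE if cell A1 holds a title.
read_tickers <- function(path = "tickers.xlsx") {
  sheet <- readxl::read_excel(path, col_names = FALSE)
  clean_tickers(sheet[[1]])
}

# tickers <- read_tickers()   # uncomment to use the spreadsheet instead
```

Cleaning the symbols up front keeps a stray space or an empty cell from turning into a bad web request later on.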

Let’s unpack the for loop. For each ticker symbol, the program visits finviz’s page for that stock, where we are greeted by the lovely datatable shown below:

This is finviz.com’s datatable for Apple (AAPL).

The program reads each table and assigns its data to its own data frame, which is automatically named after the company’s ticker symbol.
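The original loop isn't reproduced here, so what follows is only a sketch of the same idea using rvest. Several details are assumptions about finviz's current site and may change:

- the URL pattern https://finviz.com/quote.ashx?t=TICKER
- the CSS selector table.snapshot-table2
- the need for a browser-like user agent, sent via httr (which installs alongside the tidyverse)

The functions snapshot_url(), get_snapshot(), and scrape_all() are hypothetical names.

```r
# Hypothetical helper: build the finviz quote-page URL for one ticker
snapshot_url <- function(ticker) {
  paste0("https://finviz.com/quote.ashx?t=", ticker)
}

# Hypothetical helper: download one page and return its snapshot table.
# header = TRUE mirrors the behavior the article describes later on:
# the table's first row becomes the column names.
get_snapshot <- function(ticker) {
  resp <- httr::GET(snapshot_url(ticker), httr::user_agent("Mozilla/5.0"))
  httr::stop_for_status(resp)  # error out on 403/404 instead of parsing junk
  page <- rvest::read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
  node <- rvest::html_element(page, "table.snapshot-table2")  # assumed selector
  as.data.frame(rvest::html_table(node, header = TRUE))
}

# One data frame per ticker, named after the symbol (e.g. AAPL, MSFT),
# created in the global environment just like a top-level for loop would
scrape_all <- function(tickers, pause = 1) {
  for (ticker in tickers) {
    tbl <- tryCatch(get_snapshot(ticker), error = function(e) {
      message("Skipping ", ticker, ": ", conditionMessage(e))
      NULL
    })
    if (!is.null(tbl)) assign(ticker, tbl, envir = globalenv())
    Sys.sleep(pause)  # pause between requests to go easy on the server
  }
  invisible(NULL)
}
```

Once your tickers are loaded, calling scrape_all(tickers) should leave one data frame per symbol in your workspace. The pause keeps hundreds of requests from hitting the site in a single burst.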

Now, let’s combine everything into an aggregate matrix so we can view all of the data at once. This is what we name stockdatalist.
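When each ticker's table lives in its own variable, base R's mget() can gather them all by name into a single named list. This is one possible way to build stockdatalist, not necessarily the author's. The two tiny placeholder tables stand in for real scraped data so the snippet runs on its own:

```r
tickers <- c("AAPL", "MSFT")

# Placeholder tables standing in for the scraped finviz data
AAPL <- data.frame(label = "P/E", value = "placeholder")
MSFT <- data.frame(label = "P/E", value = "placeholder")

# Look up every object whose name is in tickers and collect them in one list
stockdatalist <- mget(tickers)

# Each element is that ticker's data frame
str(stockdatalist, max.level = 1)
```

From here, stockdatalist[["AAPL"]] returns Apple's table. If you'd rather not create one variable per ticker at all, you can fill the list directly inside the loop with stockdatalist[[ticker]] <- (the scraped table).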

Filtering for Specific Statistics

By now, most of the work is done. We have the data and it’s stored; now we just have to decide which statistics we want. Do we really want every single entry in finviz’s datatable?

I chose 25 specific stats, which are abbreviated below.

I made each of these statistics into its own column in a final data frame, which I named cc:
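The article's exact list of 25 statistics isn't reproduced here, so this sketch uses a handful of finviz labels purely for illustration. It builds cc as a data frame with one row per ticker and one empty column per statistic, ready to be filled in:

```r
tickers <- c("AAPL", "MSFT", "AMZN")

# Illustrative subset of finviz statistics (swap in your own 25)
stat_names <- c("P/E", "EPS (ttm)", "Beta", "Insider Own", "Perf Week")

# Rows are tickers, columns are statistics, every cell starts out NA
cc <- as.data.frame(
  matrix(NA_character_,
         nrow = length(tickers), ncol = length(stat_names),
         dimnames = list(tickers, stat_names)),
  stringsAsFactors = FALSE
)
```

Keeping the tickers as row names makes it easy to fill a single cell later, for example cc["AAPL", "P/E"] <- value.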

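The last step of the program is to retrieve these statistics from their locations in stockdatalist. An important note: the indices we use with stockdatalist follow the pattern [[ row-1 , column ]], where row and column give a statistic’s position in finviz’s table. That’s because the table’s first row was automatically used as the column names, which shifts every remaining row up by one. Statistics that sat on that first row therefore have no row of their own and must be read from the column names instead, which is what we do for insidr and perfWk below: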
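To see why the row-1 offset happens, here is a toy three-row table laid out like finviz's label/value pairs, with placeholder strings instead of real numbers. The positions are made up for the example. The get_stat() helper is hypothetical, not part of the original program: it subtracts one from the row and falls back to the column names for first-row statistics.

```r
# Toy page table: label/value pairs, placeholder values (not real finviz data)
raw <- rbind(
  c("Insider Own", "v_insider", "Perf Week", "v_perfwk"),
  c("P/E",         "v_pe",      "Beta",      "v_beta"),
  c("ROE",         "v_roe",     "RSI (14)",  "v_rsi")
)

# Mimic reading the table with a header: row 1 turns into the column names
tbl <- as.data.frame(raw[-1, ], stringsAsFactors = FALSE)
names(tbl) <- raw[1, ]

# Hypothetical helper: fetch the value at page position (row, col)
get_stat <- function(tbl, row, col) {
  if (row == 1) {
    names(tbl)[col]       # first-row values now live in the header
  } else {
    tbl[[row - 1, col]]   # every other row shifted up by one
  }
}

get_stat(tbl, 2, 2)  # "v_pe"
get_stat(tbl, 1, 4)  # "v_perfwk"
```

In a real run, tbl would be one element of stockdatalist, and the result would go into the matching cell of cc.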

The rm() line removes all the extraneous data, and the results are exported to Excel. Opening the file at our chosen location reveals this beautiful output:
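The article doesn't show which objects rm() drops or how the file is written, so here is a minimal sketch under two assumptions. First, the per-ticker data frames are removed with rm(list = tickers). Second, cc is saved with base R's write.csv(), which Excel opens directly. If you need a native .xlsx file, writexl::write_xlsx() from the writexl package is one option. The placeholder objects exist only so the snippet runs on its own:

```r
tickers <- c("AAPL", "MSFT")

# Placeholders standing in for the scraped tables and the finished cc
AAPL <- data.frame(x = 1)
MSFT <- data.frame(x = 2)
cc <- data.frame(`P/E` = c("v1", "v2"), row.names = tickers, check.names = FALSE)

# Drop the per-ticker data frames now that cc holds everything we need
rm(list = tickers)

# Move the tickers out of the row names into a real column for the sheet
out <- cbind(Ticker = rownames(cc), cc)

# Pick your own folder; tempdir() just keeps this example self-contained
path <- file.path(tempdir(), "stock_stats.csv")
write.csv(out, path, row.names = FALSE)
```

To lay the sheet out with one column per ticker instead, transpose before exporting, for example write.csv(t(cc), path). This keeps the statistic names as the first column.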

Each ticker symbol that we specified is right here, along with every statistic that we wanted to find. All ready for technical and/or fundamental analysis!

If you loved this article, please share and follow me on Medium! Thanks for reading!

-Walker


Junior at UNC-Chapel Hill, interested in too many things for too little time.