How to Use Chrome Puppeteer to Fake Googlebot & Monitor Your Site
Posted by Tom Pool on April 17, 2019
On Friday 12th April 2019, I spoke at one of the UK’s largest SEO events – BrightonSEO, as part of a really in-depth Technical SEO Session.
If you want to see my slides then click here.
Nils De Moor of WooRank talked about EdgeSEO, giving a fantastic introduction to the topic and great examples of what is really possible with this potential new style of technical implementation.
Mike Osolinski covered Command Line Automation, giving a number of examples of how you can use it to really speed up your workflow, and automate “boring” tasks.
I spoke in between these two fantastic speakers and, as this was my second time speaking at Brighton SEO, I was really looking forward to taking the stage!
Many thanks to Kelvin and the team at Rough Agenda for allowing me & many other speakers to present our ideas, and for putting on such an awesome event, twice a year in sunny Brighton (they must have a sweet deal with the weather gods!). The pre-party, main event, fringe events and post-event parties were amazing, and helped make this the best Brighton SEO that I have been to!
Since my talk, I’ve received a number of requests to clarify the uses of Chrome Puppeteer that I covered. So I’ve put together this quick introduction, along with a breakdown of some of the code that I mentioned, which should help you get started with utilising Chrome Puppeteer in your own projects.
This guide only covers basic usage on Mac OS.
So how do you install Chrome Puppeteer?
To use Chrome Puppeteer on a Mac, you should first install Homebrew, a package manager that simplifies installing software on Mac OS.
To install Homebrew, open a Terminal window (Finder > Applications > Utilities > Terminal) and type in:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
This will take a couple of minutes, and the Terminal window will print the installer’s output as it goes. Once Homebrew is installed, there are a couple of further steps to follow before you can use Chrome Puppeteer.
Within the same Terminal window, type in:
brew install node
Rather than downloading Node.js from its website, this uses Homebrew (through the brew command) to install Node.js onto your machine. Puppeteer is a Node.js library, so Node is what runs your Puppeteer scripts.
When Node has installed, type in:
npm i puppeteer
This uses npm, Node’s package manager (installed along with Node), to install the Puppeteer library that all of the following examples need. If that sounds technical, that is as bad as it gets: this just installs Puppeteer into a node_modules folder inside the current folder, along with a copy of Chromium for it to control, and allows you to use it.
So how can I start doing Puppeteer stuff?
To start using Puppeteer to take screenshots or test web pages (among countless other things), open up a text editor. This is where you write the scripts that tell Puppeteer what to do. I personally recommend Sublime Text: it’s free to try, and offers awesome functionality!
You should also be able to open up a terminal window and navigate to a folder within your machine. I recommend having all your Puppeteer scripts within the same folder structure, to allow for ease of access later on.
Open up the text editor, and you are ready to begin coding!
Alternatively, you can access some basic code within this Google Drive folder, that you can download and save on your local machine.
The following code will enable you to take a screenshot of a page, and save it to a folder. You can edit the different areas as you want, to provide different results. Experimentation is encouraged, to see what you can get Puppeteer to do!
const puppeteer = require('puppeteer');
This line imports the Puppeteer library into the script that we will be using.
async function run (){
let browser = await puppeteer.launch({headless: true});
These two lines form the basis of most Puppeteer scripts. The first wraps our code in an async function, so that we can use await to wait for each step to finish; the second launches Puppeteer in headless mode, so it runs behind the scenes without the user seeing anything that it is doing. If you want to watch what the script is doing, change “true” to “false”.
let page = await browser.newPage();
Here we open up a new page within the browser.
await page.goto('https://www.google.com');
await page.screenshot({ path: './testimg.jpg', type: 'jpeg' });
Next, we specify which page we want to go to, and then take a screenshot of that page. The path option sets where the image is saved, along with its filename and file extension (./ means the folder you run the script from), and type sets the image format.
await page.close();
await browser.close();
}
run();
We then finish off with closing the page, and then closing the browser.
Save this script in a folder, with an appropriate name. On my Mac, I have a folder on my Desktop called ‘Puppeteer_Scripts’ where I save this script, with the filename ‘Screenshot.js’.
Then, open up a new Terminal window and navigate to this folder. If the folder is on your Desktop, type in “cd Desktop” to navigate to your Desktop, and then type in “cd foldername” to navigate to your folder.
When you are in the folder, type in ‘ls’ to list its contents and make sure you are in the right place.
To run the script, first type in ‘npm i puppeteer’ to install the latest version of Puppeteer into this folder, and then type in ‘node Screenshot.js’.
When the script has finished, open the folder in a Finder window and you will see the screenshot (testimg.jpg) inside it.
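If you want, you can also add the following lines to the script, after the page is created with browser.newPage() and before page.goto(), so that they apply before the page loads: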
await page.setUserAgent('Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
This line sets the user agent to Googlebot’s smartphone user agent string, which can be replaced with any other user agent that you want.
await page.setViewport({width: 1280, height: 1280});
This line sets the viewport to a 1280×1280 view. Mess around with these numbers to get a screenshot size that you want!
A huge number of other parameters can be set, with a full list available here.
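To see the two optional lines in action, the sketch below screenshots the same URL twice, once with Puppeteer’s default user agent and once with the Googlebot user agent above, so you can compare what each is served. The filename googlebot-compare.js and the output names default.jpg and googlebot.jpg are hypothetical; the 1280×1280 viewport is the one used above. Bear in mind that this only changes the user agent string the site sees.

```javascript
// googlebot-compare.js (hypothetical name): screenshots one URL twice,
// with Puppeteer's default user agent and with the Googlebot one.
// Usage: node googlebot-compare.js <url>
const path = require('path');

// Googlebot smartphone user agent string, as quoted above.
const GOOGLEBOT_UA =
  'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 ' +
  'Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Each run: a label (used as the filename) and an optional user agent.
const RUNS = [
  { label: 'default', userAgent: null },
  { label: 'googlebot', userAgent: GOOGLEBOT_UA },
];

async function capture(browser, url, { label, userAgent }) {
  const page = await browser.newPage();
  try {
    // Both settings must be applied before navigating.
    if (userAgent) await page.setUserAgent(userAgent);
    await page.setViewport({ width: 1280, height: 1280 });
    await page.goto(url);
    const file = path.join(__dirname, `${label}.jpg`);
    await page.screenshot({ path: file, type: 'jpeg' });
    return file;
  } finally {
    await page.close();
  }
}

async function compare(url) {
  // Required here so the constants above load without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const files = [];
    // One run at a time, so the two pages don't compete.
    for (const run of RUNS) {
      files.push(await capture(browser, url, run));
    }
    return files;
  } finally {
    await browser.close();
  }
}

const targetUrl = process.argv[2];
if (targetUrl) {
  compare(targetUrl)
    .then((files) => console.log('Saved:', files.join(', ')))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
} else {
  console.log('Usage: node googlebot-compare.js <url>');
}
```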
What else can I get Puppeteer to do?
I also mentioned within my Brighton SEO talk that we can use Puppeteer to monitor web pages. The full code for this ‘project’ can be seen here.
This code reads a file called ‘urls.txt’ from a hard-coded location, set on line 68 of the provided code:
try {data = fs.readFileSync('/Users/tompool/Desktop/PuppeteerRendering/PageMonitor/urls.txt', 'utf8');}
Update this path to point to your own copy of urls.txt. This .txt file should contain data as follows:
https://www.bluearray.co.uk ,Homepage
https://www.bluearray.co.uk/news ,News
Each line should contain a URL in the first column, followed by a comma and then, in the second column, the label you want the data for that URL to be saved under.
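For anyone writing their own version, reading urls.txt in this format can be sketched as below. parseUrlList and readUrlList are hypothetical helper names, not functions from the provided code. The sketch splits each line on its last comma, so that a URL which itself contains a comma stays intact, and trims the stray space seen before the comma in the example above.

```javascript
const fs = require('fs');

// Turns "URL,label" lines into { url, label } objects (hypothetical helper).
function parseUrlList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines
    .map((line) => {
      // Split on the last comma, so commas inside a URL survive.
      const comma = line.lastIndexOf(',');
      if (comma === -1) {
        throw new Error(`Missing comma in line: ${line}`);
      }
      return {
        url: line.slice(0, comma).trim(),
        label: line.slice(comma + 1).trim(),
      };
    });
}

// Reads and parses the file at the given path (hypothetical helper).
function readUrlList(filePath) {
  return parseUrlList(fs.readFileSync(filePath, 'utf8'));
}
```

With the two example lines above, parseUrlList returns one entry per line, with labels ‘Homepage’ and ‘News’.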
You should also update the path on line 97:
let fileName = "/Users/tompool/Desktop/PuppeteerRendering/PageMonitor/" + client + ".txt";
This is where the data for each URL will be saved, using the label from the second column of ‘urls.txt’ as the filename.
The final line that should be updated is line 135:
var dateFileName = "/Users/tompool/Desktop/PuppeteerRendering/PageMonitor/Changes/" + dateFile + " Changes.txt";
This is where any changes detected between runs of the script are saved, inside the Changes folder.
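Because the same base folder is repeated in all three paths, one option is to build them from a single constant, so there is only one line to update. This is a sketch: BASE_DIR and the helper names are hypothetical, and the YYYY-MM-DD date format used for the changes file is an assumption, since the format of dateFile in the provided code isn’t shown here.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical: the one line to change, pointing at your PageMonitor folder.
const BASE_DIR = path.join(os.homedir(), 'Desktop', 'PuppeteerRendering', 'PageMonitor');

function urlListPath(baseDir = BASE_DIR) {
  return path.join(baseDir, 'urls.txt');
}

// One data file per label from urls.txt, e.g. "Homepage.txt".
function dataFilePath(label, baseDir = BASE_DIR) {
  return path.join(baseDir, `${label}.txt`);
}

// Assumed date format: YYYY-MM-DD, in local time.
function formatDate(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
}

function changesFilePath(date, baseDir = BASE_DIR) {
  return path.join(baseDir, 'Changes', `${formatDate(date)} Changes.txt`);
}
```

On a Mac, os.homedir() returns your home folder (for example /Users/tompool), so BASE_DIR matches the hard-coded paths above without spelling out the username.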
Once these paths are updated, the script can be run. It tests each URL listed in ‘urls.txt’ and pulls out:
- Meta title
- Meta description
- Canonical element
- Meta Robots directives
- HTML lines
- Paragraphs
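If you are curious how values like these can be read from a rendered page, the sketch below collects similar fields using Puppeteer’s page.evaluate(). It is not the code from the provided script: extractSeoData and extractFromUrl are hypothetical names, ‘HTML lines’ is interpreted here (as an assumption) as the number of lines in the rendered HTML, and paragraphs are collected as their trimmed text.

```javascript
// Runs inside the browser via page.evaluate(), so it may only use
// browser globals (document), nothing from this Node file's scope.
function extractSeoData() {
  const meta = (name) => {
    const el = document.querySelector(`meta[name="${name}"]`);
    return el ? el.content : null;
  };
  const canonical = document.querySelector('link[rel="canonical"]');
  return {
    title: document.title,
    description: meta('description'),
    canonical: canonical ? canonical.href : null,
    robots: meta('robots'),
    // Assumption: "HTML lines" = line count of the rendered HTML.
    htmlLines: document.documentElement.outerHTML.split('\n').length,
    paragraphs: Array.from(document.querySelectorAll('p'), (p) => p.textContent.trim()),
  };
}

// Opens the URL in a new page of an already-launched browser.
async function extractFromUrl(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    return await page.evaluate(extractSeoData);
  } finally {
    await page.close();
  }
}
```

With a browser from puppeteer.launch(), await extractFromUrl(browser, 'https://www.bluearray.co.uk') should return an object with these six fields.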
On the first run, these elements are simply extracted and saved. On each subsequent run, the previous data is compared with the new data, and any changes are saved to the Changes file described above.
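The comparison step can be sketched as a small function that takes the previously saved data for one URL and the newly extracted data, and lists every field that differs. Saving each snapshot as JSON is an assumption made here for simplicity (the provided script saves .txt files), and diffSnapshots and checkForChanges are hypothetical names.

```javascript
const fs = require('fs');

// Compares two snapshots field by field. JSON.stringify lets arrays
// (such as the list of paragraphs) be compared by value.
function diffSnapshots(label, previous, current) {
  const changes = [];
  const fields = new Set([...Object.keys(previous), ...Object.keys(current)]);
  for (const field of fields) {
    const before = JSON.stringify(previous[field]);
    const after = JSON.stringify(current[field]);
    if (before !== after) {
      changes.push(`${label}: ${field} changed from ${before} to ${after}`);
    }
  }
  return changes;
}

// First run: nothing saved yet, so just save. Later runs: diff, then save.
function checkForChanges(label, current, dataFile) {
  let changes = [];
  if (fs.existsSync(dataFile)) {
    const previous = JSON.parse(fs.readFileSync(dataFile, 'utf8'));
    changes = diffSnapshots(label, previous, current);
  }
  fs.writeFileSync(dataFile, JSON.stringify(current, null, 2));
  return changes;
}
```

The returned lines could then be appended to the day’s changes file, for example with fs.appendFileSync.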
Summary
All of the above scripts worked at the time of writing on my machine (iMac 2017, 3.7 GHz i5, 64GB RAM, macOS Mojave), and you should be able to use them with a simple copy & paste. Many other great guides to using Chrome Puppeteer are available, in particular from ex-Google engineer Eric Bidelman.
If you want further information, or would like some guidance around usage of Chrome Puppeteer, please do not hesitate to get in contact with me!
You can also check out all 210 of my slides here!