A Comparison of Cloud Services for Screaming Frog & other Website Crawling Software
Posted by Tom Pool on March 10, 2018
Here at Blue Array we crawl a large number of websites that vary massively in size, typically anywhere from 100 to more than 1 million pages.
For smaller websites we use a couple of different crawlers: Screaming Frog and Sitebulb.
However, when we need to crawl very large websites (i.e. with millions of pages), we run Screaming Frog in the cloud, using Amazon Web Services (AWS) to power the crawler.
Screaming Frog have a great guide to crawling large websites, and also link to a couple of resources to help set up instances on both Amazon Web Services and Google Cloud Platform.
After receiving a rather large usage bill from AWS one month, we decided to investigate each major player in the “cloud services” space to see whether we were getting the best value available, and what each service could offer us. As a business, it’s important for us to keep the costs we pass on low, and therefore our pricing highly competitive.
The major players that we investigated included:
- Amazon – Amazon Web Services
- Oracle – Oracle Cloud Platform
- Google – Cloud Platform
- Microsoft – Azure
As we can see from the above image, Amazon are by far the biggest player in the cloud services industry, with reported revenue of $4.57 Bn for Q3 2017 in the cloud infrastructure market.
For each service we set up a trial account and explored the options that were available. We tried to set up the same instance on each of the services to compare them like for like.
For the more technical among you, this was a virtual machine instance in North America running Windows Server 2016, with around 30GB of RAM and 50GB of SSD storage.
We also assigned a static IP address to the instance, so that we could crawl from the same location each time.
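To keep a comparison like this genuinely like for like, it helps to write the target spec down and check each provider’s quote against it before looking at price. The Python sketch below is one hypothetical way to do that: the `Quote` class, the provider labels and every hourly rate are placeholders made up for illustration, not real prices from any of the services reviewed here.

```python
from dataclasses import dataclass
from typing import List

# Target spec from our comparison: around 30GB of RAM and 50GB of SSD storage.
TARGET_RAM_GB = 30
TARGET_SSD_GB = 50


@dataclass
class Quote:
    provider: str      # hypothetical provider label
    ram_gb: float
    ssd_gb: int
    hourly_usd: float  # placeholder rate, NOT a real price


def meets_spec(quote: Quote, ram_tolerance: float = 0.9) -> bool:
    # "Around 30GB": accept RAM up to 10% below target; SSD must be at least 50GB.
    return quote.ram_gb >= TARGET_RAM_GB * ram_tolerance and quote.ssd_gb >= TARGET_SSD_GB


def cheapest_matching(quotes: List[Quote]) -> Quote:
    # Discard quotes that miss the spec, then pick the lowest hourly rate.
    eligible = [q for q in quotes if meets_spec(q)]
    if not eligible:
        raise ValueError("no quote meets the target spec")
    return min(eligible, key=lambda q: q.hourly_usd)


quotes = [
    Quote("Provider A", ram_gb=30.5, ssd_gb=50, hourly_usd=0.60),
    Quote("Provider B", ram_gb=28.0, ssd_gb=50, hourly_usd=0.45),
    Quote("Provider C", ram_gb=16.0, ssd_gb=50, hourly_usd=0.30),  # too little RAM
]
print(cheapest_matching(quotes).provider)  # Provider B
```

Provider C has the lowest hourly rate but is excluded because it falls well short on RAM, which is exactly why fixing the spec first matters.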
In this article we are going to review the cloud services one by one, and provide thoughts on factors such as cost, general usability and available documentation, as well as highlighting the main benefits/issues with each service as we see them.
We then summarise all four companies’ services and let you know which one we picked to use for future website crawls.
Amazon Web Services
Amazon EC2 is the service we have been using for a while now, and we have never really had any issues with it, other than that it is quite expensive. Based on 2017 figures, it commands a far higher market share than any of its competitors, and when using the service it is easy to see why.
Plenty of documentation is available to refer to if needed, with a selection of internal & external guides & resources to help users. Amazon also offer a “Free” product tier, which allows you to get to grips with the products on offer before committing to paying for the service.
Benefits:
- Relatively easy to set up
- Plenty of really good guides that give explicit instructions
- Good billing breakdown that tells you exactly how many hours have been used
- Good customer service
- Can tie in really easily with other AWS services, such as being able to connect storage
- Plenty of options available for customisation – e.g. different RAM, storage, OS & processor specs
- Free tier of products is available, if minimal usage is required
Issues:
- Setting up the key pair & password is not that easy
- Costs can add up really quickly
- Setting up without a guide is quite difficult
- Boot times of instances are quite slow
- Dashboard is very technical and can be difficult to understand
Google Cloud Platform
Google Cloud Platform has quite a small market share, but it benefits from a clear, user-friendly interface; it’s this ease of use that seems to be driving its growing popularity. Google also offer a one-year trial of their cloud services, which is by far the longest trial period of any of the services. They provide a budget of $250 to get you started.
Benefits:
- Really easy to set up, with many internal & external guides available
- Cheaper than AWS
- Google provide a 1 year trial – which is more than enough to get you started
- A huge range of specs available, with the option to fine-tune if necessary
- Complete cost breakdown shown before starting an instance, as well as the maximum possible bill
- Can tie in really easily with other Cloud offerings
- Google also offer a ‘committed use discount’, which lets you save a considerable amount if you need the instance for a long period of time
- Free tier of basic products available
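Whether a committed use discount is worth it comes down to simple arithmetic. On demand, you pay only for the hours the instance runs; with a commitment you pay a discounted rate for the whole term, assuming (as is typical for commitment-based pricing) that you are billed whether or not the instance is running. Committing therefore only pays off if the instance would otherwise run for more than (1 − discount) of the term. The sketch below assumes a hypothetical $2.00/hour rate, a hypothetical 25% discount and an approximate 730-hour month; the real rates and terms are on Google’s own pricing pages.

```python
HOURS_PER_MONTH = 730  # approximation: 8,760 hours a year / 12


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    # Pay-as-you-go: billed only for the hours the instance actually runs.
    return hourly_rate * hours_used


def committed_cost(hourly_rate: float, discount: float, hours_in_term: float) -> float:
    # Commitment: discounted rate, billed for every hour of the term, used or not.
    return hourly_rate * (1 - discount) * hours_in_term


def break_even_utilisation(discount: float) -> float:
    # Fraction of the term the instance must run before committing is cheaper.
    return 1 - discount


RATE = 2.00      # hypothetical on-demand $/hour
DISCOUNT = 0.25  # hypothetical committed use discount

for utilisation in (0.5, 1.0):
    hours = HOURS_PER_MONTH * utilisation
    print(utilisation, on_demand_cost(RATE, hours), committed_cost(RATE, DISCOUNT, HOURS_PER_MONTH))
print(break_even_utilisation(DISCOUNT))  # 0.75
```

With these made-up numbers, running the instance half the month costs $730 on demand against $1,095 committed, while running it all month costs $1,460 on demand. For occasional large crawls, on-demand may well be the cheaper option; for an instance kept running most of the time, the commitment is likely to win.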
Issues:
- Not as mature a product as AWS
- Can be quite confusing if you have no experience in setting up instances
- Not as many products offered as AWS
- Interface is not as clear as it could be
Oracle
Oracle have the smallest market share of the providers we compared, and also appear to target more advanced, technical users. They offer a 1 month trial, with a budget of $300 to help you get used to the service. However, not all products are available on the trial, and the usability of the website & domain isn’t great.
Benefits:
- Cheaper than AWS
- Customer support is particularly good, with requests for help answered really quickly
- Good pricing calculator
- Trial with a budget of $300 is offered, which is more than enough to have a play with what is on offer
Issues:
- Website appears to be quite dated
- Service has been affected by maintenance on a number of days
- It’s quite difficult to set up an instance, and there are few external resources
- Internal guides and resources are not that clear, and require a good level of prior understanding
- Quite hard to log in to the console, and it’s also difficult to access the billing breakdown
- Costs can add up really quickly
Microsoft Azure
Microsoft Azure have a larger market share than Google & Oracle, as well as a reliable product. With a huge name behind it, it’s not surprising that a lot of well-known brands use their cloud services.
Microsoft offer a free trial of 30 days, with a budget of £150 – while not as much as other competitors, this is still enough to get a feel for the product.
Benefits:
- Free account available, with a budget to work within
- Free tier of products available
- Well-known brand, with a good reputation
- Covers the largest number of regions to host services in
- Good amount of documentation available, with a number of internal & external guides
- Easy-to-use interface, and really usable website
Issues:
- Not the cheapest product
- Instances not as customisable as Google’s offering
- Smaller budget to play with on trial version
- Longer instance boot times than Google Cloud
- Costs can add up really quickly
Summary
While Amazon are the biggest and longest running player in this market, and we have used them for a number of years, following this trial our winner was… 🏆
Google Cloud Platform
Overall, Google’s offering was the easiest to use, with good documentation and user support provided. It is also quicker to get comfortable with than AWS, and it ties right in with your existing Google account, so it is super easy to set up.
Although I was, admittedly, a big fan of AWS and the functionality that it offers, I was quickly swayed by Google’s Cloud Platform offering, and found it incredibly easy to get to grips with.
While the range of products GCP offers doesn’t match AWS’s massive catalogue (built out over a few more years of operation), it is more than sufficient for our specific requirements.
The time to boot up an instance is very impressive (a few seconds), and sadly leaves EC2 boot times in its dust. Scaling is also easy: you can template an instance and launch an identical one in seconds.
The one-year trial is also far better than anything its competitors offer, with a good amount of ‘free money’ to help you get to grips with the tools on offer.