Site maps are used to document the pages in a website and also to help improve SEO, essentially allowing search engine crawlers to browse the file and index pages. Analysing the site map of a site can also help us understand the information architecture and hierarchy of the site’s pages.
The ‘LATCH’ Theory
“The only thing we know is our own personal knowledge and lack of knowledge. And since it’s the only thing we really know, the key to making things understandable is to understand what it’s like not to understand.” -Richard Saul Wurman
How does this quote relate to information architecture?
Consider the ‘LATCH’ theory. Essentially, it says that items can be organised according to five things:
- Location: items can be grouped based on a location
- Alphabet: items can be grouped alphabetically
- Time: items can be grouped by time
- Category: items can be grouped by category
- Hierarchy: items can be arranged by a hierarchy, e.g. size order, height order or similar
The theory goes that arranging items based on only one of these principles can make things hard to find.
Take arranging things alphabetically, for example. Imagine how ‘shallow’ a website’s menu would be if items were arranged simply by their name. You’d end up with a row of buttons across the top of the website and no sub-menus, and potentially a lot of buttons! Take location too: without some categorisation you could end up with a lot of items in one menu. Let’s assume you had to make a menu listing every city in the USA and Canada, and you simply made one menu section called ‘USA’ and another called ‘Canada’ and placed the appropriate cities in each. The lists would be very long. To make finding the cities easier, you could introduce categorisation, for example by arranging the cities by their state or province.
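To make the city example a little more concrete, here is a minimal Python sketch. The handful of cities and the function names are made up purely for illustration; the point is only to compare a menu grouped by location alone with one that also uses a category (state or province).

```python
from collections import defaultdict

# Hypothetical sample data: (city, state or province, country).
CITIES = [
    ("Austin", "Texas", "USA"),
    ("Dallas", "Texas", "USA"),
    ("Portland", "Oregon", "USA"),
    ("Toronto", "Ontario", "Canada"),
    ("Ottawa", "Ontario", "Canada"),
    ("Vancouver", "British Columbia", "Canada"),
]


def group_by_country(cities):
    """Location only: one flat list of cities per country."""
    menu = defaultdict(list)
    for city, _region, country in cities:
        menu[country].append(city)
    return {country: sorted(names) for country, names in menu.items()}


def group_by_country_and_region(cities):
    """Location plus category: cities nested under their state or province."""
    menu = defaultdict(lambda: defaultdict(list))
    for city, region, country in cities:
        menu[country][region].append(city)
    return {
        country: {region: sorted(names) for region, names in sorted(regions.items())}
        for country, regions in menu.items()
    }


flat = group_by_country(CITIES)
nested = group_by_country_and_region(CITIES)
print(flat["USA"])    # ['Austin', 'Dallas', 'Portland']
print(nested["USA"])  # {'Oregon': ['Portland'], 'Texas': ['Austin', 'Dallas']}
```

With only six cities the difference is small, but with every city in both countries the flat lists would run to thousands of entries, while the nested version keeps each sub-menu short.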
In the example below, I was given 46 different fruits and vegetables to arrange. Notice how arranging them by only one of the ‘LATCH’ principles produces long menus.
Download full resolution PDF here.
The idea is that in order to make a menu system, and thus an information architecture, that is easy for a user to navigate, a combination of the above must be used.
So, going back to the quote: by taking a moment to not understand something, you can learn how others can better understand it and produce a better information architecture.
Visualising a site map
With that in mind, here’s how to visualise a site map and interpret the data.
Most large websites have a file called ‘sitemap.xml’ stored in their root directory, which is essentially an index of the pages on the site. The URLs in the site map can be broken down into their sub-folders, which gives a site map of sorts that can be displayed visually.
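As a rough illustration of what a ‘sitemap.xml’ file contains, the sketch below reads the page URLs out of a small, hand-written sitemap using only Python’s standard library. The sample XML and the function name `extract_urls` are my own assumptions for this example (it is not the code from the tool described later); the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical, hand-written sitemap used in place of a downloaded file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://mysite.com/subdir-1/index.html</loc></url>
  <url><loc>http://mysite.com/subdir-2/index.html</loc></url>
</urlset>
"""


def extract_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


for url in extract_urls(SAMPLE_SITEMAP):
    print(url)
```

Very large sites sometimes publish a sitemap index that points to several smaller sitemap files. In that case the `<loc>` entries returned here would be the addresses of those files rather than of individual pages.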
Consider these two URLs: http://mysite.com/subdir-1/index.html and http://mysite.com/subdir-2/index.html. A visualised sitemap could show all of the pages inside the ‘subdir-1’ folder and the ‘subdir-2’ folder. Now imagine this scaled up to potentially thousands of sub-directories and thousands of pages. A visualised sitemap makes it much easier to examine how the pages are arranged than reading the URLs in a text document or a spreadsheet, and it shows exactly where the bulk of the pages sit. From there, the LATCH theory can be applied to distribute the sitemap more evenly, so that it is not weighted in any one particular direction.
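Breaking URLs down into their sub-directories can be sketched in a few lines of Python. The URLs below are hypothetical ones in the style of the example above, and the helper names are my own. The sketch nests the path segments into a tree and counts how many URLs sit under each first path segment, which is a quick way to see whether a site is weighted in one direction.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical URLs in the style of the example above.
URLS = [
    "http://mysite.com/subdir-1/index.html",
    "http://mysite.com/subdir-1/about.html",
    "http://mysite.com/subdir-2/index.html",
    "http://mysite.com/subdir-2/news/2012/story.html",
]


def path_segments(url):
    """Split the path part of a URL into its non-empty segments."""
    return [segment for segment in urlparse(url).path.split("/") if segment]


def build_tree(urls):
    """Nest each URL's segments into a dictionary of dictionaries."""
    tree = {}
    for url in urls:
        node = tree
        for segment in path_segments(url):
            node = node.setdefault(segment, {})
    return tree


def pages_per_first_segment(urls):
    """Count how many URLs sit under each first path segment."""
    counts = Counter()
    for url in urls:
        segments = path_segments(url)
        if segments:  # skip the bare home page, which has no segments
            counts[segments[0]] += 1
    return counts


tree = build_tree(URLS)
counts = pages_per_first_segment(URLS)
print(counts.most_common())
```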
How to visualise a site map
There are various ways of doing it. The fastest I found was to run some Python scripts that get a list of URLs from the sitemap.xml file (whilst the file is online), import these into a CSV file, break down the links into their various sub-directories and finally create a visual map of the pages.
The ‘Site Map Visualisation Tool’ consists of three Python scripts and is publicly available to download from GitHub here. Theoretically this will work on Windows, macOS and Linux but in this example I used Linux (Ubuntu 17.10, specifically). I found that installing the Python libraries was easiest on Linux.
You need to install Python IDLE and the libraries below through the Terminal using the following commands (these are for Linux, and you may need to prefix them with sudo if you are not logged in as root):
Install Python 3 IDLE:
apt-get install idle3
Install PIP:
apt install python3-pip
Install BeautifulSoup4:
apt install python3-bs4
Install Requests:
pip3 install requests
Install Pandas:
apt install python3-pandas
Install Graphviz:
apt install python3-graphviz
If Pandas doesn’t work, I suggest installing the Anaconda distribution, which can be downloaded here and installed by running:
bash ~/Downloads/Anaconda3-5.3.0-Linux-x86_64.sh
in the terminal, provided you saved the *.sh file in the default location and with the default name. Anaconda includes Pandas and lots of other Python libraries.
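Once the installs have finished, a quick check like the one below can confirm that Python can actually find each library before you run the scripts. This is just a small sketch of my own: the list uses the import names of the packages installed above (‘bs4’ is the import name for BeautifulSoup4) and the function name is made up.

```python
import importlib.util

# Import names of the libraries installed above (bs4 is BeautifulSoup4).
REQUIRED_MODULES = ["bs4", "requests", "pandas", "graphviz"]


def missing_modules(names):
    """Return the names that Python cannot find on its import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required libraries were found.")
```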
After this, you just need to modify the location of the sitemap.xml file in the ‘extract_urls.py’ file and then run each Python script in the following order:
- extract_urls.py (creates a text document with the URLs in the sitemap in plain text)
- categorise_urls.py (creates a CSV with the links broken down into sub-directories)
- visualise_urls.py (creates a visual map of the data in the CSV file, saved as a PDF file)
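To give a feel for what the second and third steps do, here is a simplified sketch. It is not the code from the tool: to keep it self-contained I have swapped in Python’s built-in csv module for Pandas and written the Graphviz DOT text by hand instead of using the graphviz library. The URLs, column names and function names are all assumptions for illustration.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical URLs standing in for the output of the first script.
URLS = [
    "http://mysite.com/subdir-1/index.html",
    "http://mysite.com/subdir-2/news/story.html",
]


def urls_to_csv(urls, max_depth=3):
    """Write one row per URL, with one column per sub-directory level."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url"] + [f"level_{i}" for i in range(1, max_depth + 1)])
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s][:max_depth]
        padding = [""] * (max_depth - len(segments))
        writer.writerow([url] + segments + padding)
    return buffer.getvalue()


def csv_to_dot(csv_text, root="mysite.com"):
    """Turn the CSV rows into DOT text with one edge per parent-child pair."""
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
    edges = set()
    for _url, *levels in rows:
        parent = root
        for level in levels:
            if not level:  # padding column: this URL has no deeper levels
                break
            child = f"{parent}/{level}"
            edges.add((parent, child))
            parent = child
    lines = ["digraph sitemap {"]
    lines += [f'  "{a}" -> "{b}";' for a, b in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


csv_text = urls_to_csv(URLS)
dot_text = csv_to_dot(csv_text)
print(dot_text)
```

If you save the printed text to a file such as sitemap.dot, Graphviz can render it to a PDF with `dot -Tpdf sitemap.dot -o sitemap.pdf`. Note that this sketch does not escape double quotes in names, so unusual URLs would need extra handling.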
You can watch a video of me going through the process below.
The results
I was tasked with visualising the site maps of financial websites. These tend to be very big websites, typically tens of thousands of pages in size!
All of the maps are very large, so I recommend downloading them from here and examining the PDFs in a PDF viewer on your computer if you want to see anything meaningful in them!
Aviva
The first site I tried was Aviva. Here is the map.
Barclays
Barclays was up next.
Direct Line
As one of the UK’s largest insurance companies, Direct Line also has a large website.
Lloyds Banking
Lloyds’ sitemap also followed the pattern.
Nationwide
A building society this time, but the pattern remains.
RSA
As RSA is one of the country’s largest insurers, it’s no surprise to see that its site has nearly 13,000 pages.
If you download the PDFs from the link here and zoom into the maps, you can see that generally the sub-directories have names such as:
- ‘In your area’
- ‘2012’
- ‘Upcoming events’, ‘Events FAQ’, ‘Sustainability Network’
Sub-menus also tend to be arranged with what is believed to be the most important or relevant page at the top. Look at the screenshot below from the RSA sitemap, which shows the ‘fellowship’ directory or menu and the pages that are inside it. It suggests that the site’s designers consider ‘get involved’ to be the most relevant information under ‘fellowship’.

What does this suggest?
It suggests that ‘LATCH’ is in operation on these sites, and that the professional web designers at these large financial organisations are using it to make very specific pages easier to find by arranging them by location, time, category and a hierarchy of importance (alphabetically not so much). I was able to determine this after just a few minutes of looking at these massive sitemap diagrams, which suggests that both LATCH and sitemap visualisation are effective. LATCH makes arranging content easy and intuitive (as long as multiple parts of it are used) and sitemap visualisation makes information architecture analysis easy.
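I did this reading by eye, but a very crude version of it could be automated. The sketch below is only an illustration under my own assumptions: the keyword list and function name are hypothetical, and it can only guess at location and time, treating everything else as a category.

```python
import re

# Hypothetical keyword list; a real site would need its own.
LOCATION_HINTS = {"area", "region", "branch", "local"}


def guess_latch_principle(name):
    """Guess which LATCH principle a directory name most likely reflects."""
    cleaned = name.strip().lower()
    if re.fullmatch(r"(19|20)\d{2}", cleaned):
        return "Time"  # a bare four-digit year such as 2012
    words = set(re.findall(r"[a-z]+", cleaned))
    if words & LOCATION_HINTS:
        return "Location"
    return "Category"  # fallback when nothing more specific matches


for directory in ["In your area", "2012", "Upcoming events"]:
    print(directory, "->", guess_latch_principle(directory))
```

A heuristic like this cannot detect alphabetical or hierarchical ordering, since those depend on how pages are arranged relative to each other and not on a single directory name.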
References
Design 4. (2018). LATCH – Methods of Organization. [online] Available at: https://parsonsdesign4.wordpress.com/resources/latch-methods-of-organization/ [Accessed 15 Oct. 2018].