Url Hunting in Lancashire
Posted : Blog Post : 12.06.2020 - North West Open Data

1. Introduction
As a first step in building a multi authority ‘Expenditure’ data set I decided to conduct a pilot(full project aim here). I selected all the local authorities in Lancashire, this gave me 14 separate websites to trawl and download .csv files covering January 2019 to 2020.
West Lancashire Borough Council |
Chorley Council |
South Ribble Borough Council |
Fylde Council |
Preston City Council |
Wyre Council |
Lancaster City Council |
Ribble Valley Borough Council |
Pendle Borough Council |
Burnley Borough Council |
Rossendale Borough Council |
Hyndburn Borough Council |
Blackpool Council |
Blackburn with Darwin Council |
2. How Hard Can This Be…
I think I naively assumed there would be some standards in place to help facilitate the downloading of multiple files from different sites such as
-
A standard location for open data downloads
-
A naming standard for data set files
Ideally I was hoping to use something like this piece of pseudo code
#!/bin/bash
local_authorities='https://www.westlancs.gov.uk \
https://chorley.gov.uk https://www.southribble.gov.uk \
https://new.fylde.gov.uk https://www.preston.gov.uk \
https://www.wyre.gov.uk https://www.lancaster.gov.uk \
https://www.ribblevalley.gov.uk https://www.pendle.gov.uk \
https://www.burnley.gov.uk https://www.rossendale.gov.uk \
https://www.hyndburnbc.gov.uk https://www.blackpool.gov.uk \
https://www.blackburn.gov.uk'
months='january february march april may june july august september \
october november december'
for la in ${local_authorities}
do
directory=$(echo ${la} | awk -F'//' '{print $2}' \
| sed 's/www.//' | sed 's/.gov.uk//' | sed 's/new.//')
mkdir ${directory}
for m in ${months}
do
curl -o ${directory}/${m}_2019.csv \
${la}://open-data/expenditure-over-250/${m}_2019.csv
done
done
3. Expectation Meets Reality
I decided to arbitrarily score each of the authorities for ease of use and access to the data primarily with a view to automating download of the relevant CSV files. Here’s the results

You can find a spreadsheet I used to score the councils here
Councils were scored on the following features
-
Landing Page – Is there a Spending landing page, generally there was, some authorities had this in their ‘Finance’ section of the website rather than the Open Data area. Blackburn and Darwin Council had a “DataShare” page, more of this later
-
License – Is there notification of the terms of the release of the data ie OGL3.0. Only half the authorities complied with this requirement.
-
Metadata Description – Is there a technical description of the data enclosed in the download files. 10 councils had no metadata description, 3 had some information and Pendle Borough Council was the only council to provide a full description of their data.
-
E-mail Contact – Is there a contact for further queries regarding the data. Six councils had no contact details.
-
Archiving – Is there a full archive of old files available, is this in the same area or are there links to and archive area, is there a retention policy in place.
-
File Name – How consistent is the CSV file name, across councils this varied from terrible to good, Blackpool, Pendle Borough and Ribble Valley Borough Councils were good, Hyndburn Borough Council was terrible.
-
File Formats Available – Four formats where available across all councils, CSV, PDF, XLS, XML or JSON, most councils offering two types. The most common pairing was CSV and PDF. Three sites didn’t offer CSV files, (Fylde, Preston City and Rossendale). I considered this the most important requirement and consequently these councils appear in the bottom 3 of my scorecard.
-
Data Frequency – Either Quarterly or Weekly, see below
-
Spend Threshold – Either £500 or £250
-
Latest File – How up to date the latest file is, this varied from 6 months(West Lancashire) to 5 authorities who reported monthly doing so correctly ie one month behind inspection month
-
CSV File Directly Downloadable – Is the download url for the file(s) available in the landing page or do you have to construct a path to a download area. 2 councils(Chorley and Blackpool) had ‘aspx’ pages that were unreadable when downloaded via curl. I had to manually us DataMiner to extract the urls. Hyndburn Borough Council files had no extension to indicate whether they were CSV or PDF and an unhelpful naming standard.
4. Issues
There is a large variation in how, where and what format is presented by the 14 councils, it’s important to note that no file has yet been opened to verify the data for completeness or accuracy. In the Local Government Transparency Code 2015 key document no real mention as to the ‘how’ and ‘where’ this data should be made available however the Local Government Association does provide some guidance in the following document
While the data may be presented in a consistent way for local human customers the lack of any cross authority standards impedes the download and analysis of data comparing councils or attempting to aggregate data in any programmatic manner. I’m going to review 4 key areas where the spend (or any other transparency) data could be presented in a more standardised way.
4.1. Landing page
You may wish to create a dedicated open data page or section on the authority’s website, for example www.yourauthority.gov.uk/opendata to publish your information.
from https://www.local.gov.uk/sites/default/files/documents/publishing-data-general-g-2b0.pdf
Only Blackpool Council appeared to have adopted this approach, their files were located in the following location
Your-Council/Transparency-and-open-data/Documents/Spending/20192020/
Even here there are too many directories, if files are named correctly ‘Documents/Spending/FinancialYear’ are superfluous. Most authorities seem to use some Content Management System that gives rise to these sort of directory structures
download/downloads/id/12413/
Spend data is frequently located in the ‘Finance’ section of the website away from other open data offerings.
Spend files located at different levels in the directory hierarchy.
4.2. File Names
Files should be named in a consistent manner with the date included in the file name. Each file name should be unique. The name should avoid spaces and characters other than A to Z, 0 to 9, underline () and hyphen (-)._
from https://www.local.gov.uk/sites/default/files/documents/publishing-data-general-g-2b0.pdf
Problems in this area include
-
Inconsistent name standards
-
Ascii encoded illegal URL characters eg
%20
-
No file extension eg
-csv
not.csv
-
Spaces in filenames
-
Using link pointers in pages eg
wpdmdl=7366&ind=22yFlnI0e1D5_FTDazU4DHjmd130OHTyxWKhdnRDZ3A
-
Filenames that don’t support subsequent versions
4.3. File Formats
This means that data should be published in machine readable, non-proprietary formats such as CSV or XML files. Excel (XLS or XLSX) is a software proprietary format and cannot be uniformly read by any software, whereas CSV and XML are widely accessible.
from https://www.local.gov.uk/sites/default/files/documents/publishing-data-general-g-2b0.pdf
CSV is the absolute minimum basic format required for spending data, there is no excuse for not offering files in this format. Fylde, Preston City and Rossendale Borough Councils offer Excel files instead.
4.4. Data Frequency and Thresholds
In the Local Government Transparency Code 2015 document it sets out aspirations for spending data publication. I covered these aims in this post. It is now 5 years since that document was published and 5 councils are still only producing quarterly data and 7 are producing spend data over £500. I would have expected all councils would now be producing monthly over £250 data sets.
A further problem arises with respect to quarterly data in that it may be occur
in files with a quarter number in the file name eg
spending-over-500-q3-2018-19-csv.csv
, clearly from this file the quarter
numbers refer to a ‘financial year’ not a calendar year. This adds further
difficulties and externally a financial year span may not be known and a search
for a specific time period is further complicated. Where quarters are reported
‘calendar year’ quarters should be used and ideally all councils should move to
a monthly reporting interval.
5. Web based access
Blackburn with Darwin Council have taken a different approach to presenting data on their website.
Blackburn with Darwen Council has developed a publically accessible data share platform in order for citizens to access the data sets as set out in the code.
from http://mybins.blackburn.gov.uk/Pages/Data-Transparency-data-sets.aspx?CurrentTermId=4a212867-2267-4b7e-ba61-3728d239e549
The interface looks like this

Potentially this is an interesting approach, I think there is at least one other council in the North West that has taken this route. Pressing the ‘Download’ button will produce a CSV file and separate ‘Download’ page is available but will only allow you to download a consolidated file. Unfortunately the API will only allow XML and JSON format. What appears to be a promising approach to deal with customer needs fails to help scripted downloads of CSV files. I did spend a short period of time looking at the API documentation but seemed to run into alot of errors. It’s interesting to speculate on the reasoning for this approach when the Government and Local Government Guidance documents/requirements take an entirely different methodology.
6. Wishlist
I’ve compiled a list of available CSV files from 13 Lancashire councils to give a flavour of the variance. Here’s a short list of items I would like to see addressed to ease the rather chaotic situation I’ve described above. Obviously this would need some cross council coordination to define standards but the benefits for open data users and indeed the councils themselves would be worth the initial work.
-
If your council is reporting quarterly, change to monthly now.
-
Create a high level open data directory
DOC_ROOT/opendata
-
Relocate all your open data files here and rename them to a universal naming standard eg.
lgtc_expenditure_2020_jun-1.csv
--+- ------+---- -+-- -+- - -+-
| | | | | |
| | | | | +---- file extension to identify contents
| | | | +-------- version number
| | | +----------- abbreviated month name (%b from 'date')
| | +---------------- year (YYYY)
| +----------------------- report data type
+-------------------------------- legislation or document under which
data is released
-
Create a simple index.html page listing all files in report type listings<li> for human use, this can be automated.
-
Review the archiving processes and consider keeping aged out files available for future use.
-
Review council procedures against Local transparency guidance – publishing data