CSV looks easy, but it can be hard to make a CSV file that other people can work with. CSVLint helps you check that your CSV file is readable. You can also use it to check whether the file contains the columns and types of values that it should.
This service is maintained by an open source community and hosted by the Open Data Institute (ODI).
Follow these three steps:
You'll see a page that tells you how to improve your CSV file, if it needs improvement.
On the page that summarises the CSV, you'll also see a badge, which might look like this:
If you click on the badge, you'll get a pop-up that gives you some code for embedding the badge in your page. You can put the badge next to some CSV so that you and other people can see whether the CSV is valid or not.
Wear your valid CSV badge with pride!
We can fix some of the errors that we find in CSV files, such as bad encodings. At the bottom of the page that shows you how to improve the file, you'll see a button that says Download Standardised CSV File.
That won't fix all the problems: we won't delete empty lines or try to fix up values that are in the wrong format. We can't change the way your server provides CSV either, so you'll still be warned if it's not using the right Content-Type header.
The Recent schemas page gives a list of schemas that people have been using to validate their CSV files. See if there's a schema that you could use!
If your CSV file is sizeable and contains only one row, CSVLint detects that your file may be using a different "dialect". For example, your file may use semicolons (;) to separate fields rather than the standard comma (,). Although we encourage you to keep to the standard CSV file format, we provide an option to change the dialect to suit your particular file.
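To illustrate why the dialect matters, here is a minimal Python sketch using the standard `csv` module. It is not how CSVLint itself works; the sample data is made up for illustration. It shows that a semicolon-delimited file read with the default comma delimiter collapses each line into a single field, and that naming the delimiter recovers the columns.

```python
import csv
import io

# Hypothetical sample: a header and two data rows separated by semicolons.
sample = "OrganisationId;WeekDay;IsOpen\r\n1186;Monday;True\r\n1186;Tuesday;True\r\n"

# Read with the default comma delimiter: each line becomes one big field.
comma_rows = list(csv.reader(io.StringIO(sample, newline="")))

# Read again, telling the parser that fields are separated by semicolons.
semicolon_rows = list(csv.reader(io.StringIO(sample, newline=""), delimiter=";"))

print(len(comma_rows[0]))   # number of fields found with the comma dialect
print(semicolon_rows[0])    # header fields found with the semicolon dialect
```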
Good CSVs look like this:
OrganisationId,WeekDay,Times,IsOpen,OpeningTimeType
1186,Monday,09:30-13:00,True,General
1186,Monday,13:30-17:30,True,General
...
for a table like this:
OrganisationId | WeekDay | Times | IsOpen | OpeningTimeType |
---|---|---|---|---|
1186 | Monday | 09:30-13:00 | True | General |
1186 | Monday | 13:30-17:30 | True | General |
... |
The first row is a header row that contains the names of some columns.
The rest of the rows are data rows that contain a number of fields.
Line endings use CRLF (Windows line endings) and the column names and fields are separated by commas.
If a field contains a comma, a line ending or a double quote then the field is escaped by wrapping it in double quotes. Within a field that's escaped like that, any double quotes are doubled up. For example:
OrganisationCode,OrganisationName,Address1,Address2,Address3,City,County,Postcode
1-231076921,"Next Stage ""A Way Forward"" Ltd",Head Office,"HR House, 28 Manchester Road",Westhoughton,Bolton,Lancashire,BL5 3QJ
...
for a table like this:
OrganisationCode | OrganisationName | Address1 | Address2 | Address3 | City | County | Postcode |
---|---|---|---|---|---|---|---|
1-231076921 | Next Stage "A Way Forward" Ltd | Head Office | HR House, 28 Manchester Road | Westhoughton | Bolton | Lancashire | BL5 3QJ |
... |
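The quoting rules above can be reproduced with a short Python sketch. It uses the standard `csv` module (not CSVLint) to write the example row with CRLF line endings, quoting only the fields that need it and doubling any embedded double quotes.

```python
import csv
import io

# The data row from the example above, as plain (unescaped) values.
row = [
    "1-231076921",
    'Next Stage "A Way Forward" Ltd',
    "Head Office",
    "HR House, 28 Manchester Road",
    "Westhoughton",
    "Bolton",
    "Lancashire",
    "BL5 3QJ",
]

# newline="" stops Python from translating the CRLF that the writer emits.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerow(row)
line = buffer.getvalue()

# Reading the line back should give the original values again.
round_trip = next(csv.reader(io.StringIO(line, newline="")))
print(line, end="")
```

Only the organisation name (which contains double quotes) and the second address line (which contains a comma) end up wrapped in double quotes.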
When checking your CSV, we may return one or more of the following errors:
If we get the CSV from a URL, then we also check for these errors:
404 Not Found
response)text/csv
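As a rough illustration of the URL checks, the following Python sketch inspects an HTTP status code and response headers. The function name and the error labels are hypothetical and are not CSVLint's own identifiers; the sketch also assumes that a missing Content-Type header is not treated as an error here.

```python
def check_response(status, headers):
    """Return a list of (hypothetical) error labels for an HTTP response."""
    errors = []
    if status == 404:
        errors.append("not_found")
        return errors

    # Header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    content_type = normalised.get("content-type")
    if content_type is not None:
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type != "text/csv":
            errors.append("wrong_content_type")
    return errors


print(check_response(200, {"Content-Type": "text/csv; charset=utf-8"}))
print(check_response(200, {"Content-Type": "application/vnd.ms-excel"}))
```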
We also return the following warnings:
and if we get the CSV from a URL, we return these warnings:
Content-Type
HTTP headercharset
parameter.xls
and there is no 'Content-Type' header)CRLF
(Windows line endings). That's because RFC 4180, which is the closest thing to standard CSV that there is, says that you should use CRLF line endings.
We currently recognise schemas that use the JSON Table Schema, with a few modifications.
An example schema for the first CSV shown above could be:
{
"fields": [
{
"name": "OrganisationId",
"title": "Organisation ID",
"constraints": {
"required": true,
"type": "http://www.w3.org/2001/XMLSchema#int"
}
},
{
"name": "WeekDay",
"title": "Day of the week",
"constraints": {
"required": true,
"pattern": "(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"
}
},
{
"name": "Times",
"constraints": {
"required": true,
"pattern": "([01][0-9]|2[0-3]):[0-5][0-9]-([01][0-9]|2[0-3]):[0-5][0-9]"
}
},
{
"name": "IsOpen",
"title": "Open?",
"constraints": {
"required": true,
"pattern": "(True|False)"
}
},
{
"name": "OpeningTimeType",
"title": "General or additional opening",
"constraints": {
"required": true,
"pattern": "(General|Additional)"
}
}
]
}
The constraints that CSVLint checks for are shown in the following table:
Constraint | Value type | Description |
---|---|---|
required | boolean | if true, there must be a value in this column on every row |
unique | boolean | if true, each row should have a different value in this column |
minLength | integer | every value must contain at least this number of characters |
maxLength | integer | no value can have more than this number of characters |
pattern | regexp | values must match this regular expression (use rubular.com to test your regular expressions) |
type | URL | a URL for a data type which every value must adhere to (see below) |
minimum | number or date/time | every value must be at least this value |
maximum | number or date/time | no value should be more than this value |
datePattern | strftime | the format for date/time values in this column |
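To make a few of these constraints concrete, here is a minimal Python sketch that checks one column's values against `required`, `unique`, `minLength`, `maxLength` and `pattern`. It is an illustration only, not CSVLint's implementation. The function name is hypothetical, and two behaviours are assumptions: empty values skip the remaining checks, and a pattern must match the whole value.

```python
import re


def check_column(values, constraints):
    """Return (row_number, constraint) pairs for values that break a constraint."""
    problems = []
    seen = set()
    for row_number, value in enumerate(values, start=1):
        if value == "":
            if constraints.get("required"):
                problems.append((row_number, "required"))
            continue  # assumption: empty values skip the remaining checks
        if constraints.get("unique"):
            if value in seen:
                problems.append((row_number, "unique"))
            seen.add(value)
        if "minLength" in constraints and len(value) < constraints["minLength"]:
            problems.append((row_number, "minLength"))
        if "maxLength" in constraints and len(value) > constraints["maxLength"]:
            problems.append((row_number, "maxLength"))
        # assumption: the pattern has to match the whole value
        if "pattern" in constraints and not re.fullmatch(constraints["pattern"], value):
            problems.append((row_number, "pattern"))
    return problems


problems = check_column(
    ["True", "", "yes", "True"],
    {"required": True, "unique": True, "pattern": "(True|False)"},
)
print(problems)
```

In this made-up column, row 2 is empty, row 3 does not match the pattern and row 4 repeats the value from row 1.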
The supported data types are currently:
http://www.w3.org/2001/XMLSchema#string
http://www.w3.org/2001/XMLSchema#int
http://www.w3.org/2001/XMLSchema#float
http://www.w3.org/2001/XMLSchema#double
http://www.w3.org/2001/XMLSchema#anyURI
http://www.w3.org/2001/XMLSchema#boolean
http://www.w3.org/2001/XMLSchema#nonPositiveInteger
http://www.w3.org/2001/XMLSchema#positiveInteger
http://www.w3.org/2001/XMLSchema#nonNegativeInteger
http://www.w3.org/2001/XMLSchema#negativeInteger
http://www.w3.org/2001/XMLSchema#date
http://www.w3.org/2001/XMLSchema#dateTime
http://www.w3.org/2001/XMLSchema#gYear
http://www.w3.org/2001/XMLSchema#gYearMonth
http://www.w3.org/2001/XMLSchema#time
Use of an unknown data type will result in the column failing to validate.
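The following Python sketch shows one way that a type URL could be mapped to a check, with unknown types failing as described above. It covers only four of the listed types and only approximates their XML Schema definitions; the function and variable names are hypothetical and this is not CSVLint's own code.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER = re.compile(r"[+-]?[0-9]+")


def _integer_in_range(low=None, high=None):
    """Build a checker for integers, optionally bounded below and above."""
    def check(value):
        if INTEGER.fullmatch(value) is None:
            return False
        number = int(value)
        if low is not None and number < low:
            return False
        if high is not None and number > high:
            return False
        return True
    return check


# Approximate checkers for a handful of the supported type URLs.
CHECKERS = {
    XSD + "string": lambda value: True,
    XSD + "int": _integer_in_range(-2**31, 2**31 - 1),
    XSD + "nonNegativeInteger": _integer_in_range(low=0),
    XSD + "positiveInteger": _integer_in_range(low=1),
}


def value_matches_type(value, type_url):
    """Return True if the value passes the check for type_url; unknown types fail."""
    checker = CHECKERS.get(type_url)
    if checker is None:
        return False
    return checker(value)


print(value_matches_type("1186", XSD + "int"))
print(value_matches_type("1186", "http://example.org/unknown-type"))
```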
CSVLint will give schema validation error and warning messages if these constraints aren't met. In addition we check that each column name in the CSV file is the same as the name for the column in the same position in the schema.
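The column name check can be sketched in Python as a position-by-position comparison between the CSV header row and the `name` of each field in the schema. The function name is hypothetical, and reporting a missing or extra column as `None` is an assumption made for this illustration.

```python
from itertools import zip_longest


def header_mismatches(header, schema):
    """Return (position, actual, expected) for each column whose name differs."""
    expected_names = [field["name"] for field in schema["fields"]]
    mismatches = []
    # zip_longest pads the shorter list with None, so missing or extra
    # columns are reported as well.
    pairs = zip_longest(header, expected_names)
    for position, (actual, expected) in enumerate(pairs, start=1):
        if actual != expected:
            mismatches.append((position, actual, expected))
    return mismatches


schema = {"fields": [{"name": "OrganisationId"}, {"name": "WeekDay"}, {"name": "Times"}]}
print(header_mismatches(["OrganisationId", "Weekday", "Times"], schema))
```

Here the second column is reported because `Weekday` differs from `WeekDay` in case.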
We check the following things about the schema that you provide:
Please check the privacy policy for information on how we store CSVLint data.
CSVLint should be able to handle files up to around 100MB.
Development of this tool was partly supported by the DaPaaS project, co-funded by the European Commission under the Seventh Framework Programme (FP7 2007-2013). The DaPaaS project has also developed the DataGraft platform where you can transform and publish your data online in tabular and linked data form.