CSV looks easy, but it can be hard to make a CSV file that other people can work with. CSVLint helps you check that your CSV file is readable. You can also use it to check whether the file contains the columns and types of values that it should.
This service is maintained by an open source community and hosted by the Open Data Institute (ODI).
Follow these three steps:
You'll see a page that tells you how to improve your CSV file, if it needs improvement.
On the page that summarises the CSV, you'll also see a badge, which might look like this:
You can also see the embed code to allow you to embed the badge on your own website.
If your CSV file is sizeable and contains only one row, CSVLint detects that your file may be using a different "dialect". For example, your file may use `;` delimiters to separate fields rather than the standard `,`. Although we encourage you to keep to the standard CSV file format, we provide an option to change the dialect to suit your particular file.
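To illustrate what a dialect mismatch looks like, here is a minimal sketch using Python's standard `csv` module rather than CSVLint itself. The sample data and the `parse` helper are made up for illustration. Parsed with the default comma delimiter, each line collapses into a single field; parsed with `;` it splits as intended.

```python
import csv
import io

# Hypothetical sample: opening-times data that uses ';' as the delimiter.
SAMPLE = "OrganisationId;WeekDay;IsOpen\r\n1186;Monday;True\r\n"


def parse(text, delimiter=","):
    """Parse CSV text with the given delimiter and return a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# With the wrong dialect, every row has exactly one field.
comma_rows = parse(SAMPLE)

# With the right dialect, the fields are separated as expected.
semicolon_rows = parse(SAMPLE, delimiter=";")
```

A row count that looks right combined with a single field per row is a reasonable hint that the delimiter, rather than the data, is the problem.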
Good CSVs look like this:
OrganisationId,WeekDay,Times,IsOpen,OpeningTimeType
1186,Monday,09:30-13:00,True,General
1186,Monday,13:30-17:30,True,General
...
for a table like this:
OrganisationId | WeekDay | Times | IsOpen | OpeningTimeType |
---|---|---|---|---|
1186 | Monday | 09:30-13:00 | True | General |
1186 | Monday | 13:30-17:30 | True | General |
... |
The first row is a header row that contains the names of some columns.
The rest of the rows are data rows that contain a number of fields.
Line endings use CRLF (Windows line endings) and the column names and fields are separated by commas.
If a field contains a comma, a line ending or a double quote then the field is escaped by wrapping it in double quotes. Within a field that's escaped like that, any double quotes are doubled up. For example:
OrganisationCode,OrganisationName,Address1,Address2,Address3,City,County,Postcode
1-231076921,"Next Stage ""A Way Forward"" Ltd",Head Office,"HR House, 28 Manchester Road",Westhoughton,Bolton,Lancashire,BL5 3QJ
...
for a table like this:
OrganisationCode | OrganisationName | Address1 | Address2 | Address3 | City | County | Postcode |
---|---|---|---|---|---|---|---|
1-231076921 | Next Stage "A Way Forward" Ltd | Head Office | HR House, 28 Manchester Road | Westhoughton | Bolton | Lancashire | BL5 3QJ |
... |
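The escaping rules above can be reproduced with Python's standard `csv` module, shown here as a sketch rather than as how CSVLint works internally. The writer is configured with CRLF line endings; its default settings quote only the fields that need it and double any embedded double quotes.

```python
import csv
import io

# The data row from the example above, as plain unescaped values.
row = [
    "1-231076921",
    'Next Stage "A Way Forward" Ltd',
    "Head Office",
    "HR House, 28 Manchester Road",
    "Westhoughton",
    "Bolton",
    "Lancashire",
    "BL5 3QJ",
]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, lineterminator="\r\n")  # CRLF line endings
writer.writerow(row)
line = buffer.getvalue()

# Reading the line back recovers the original, unescaped field values.
round_trip = next(csv.reader(io.StringIO(line, newline="")))
```

Only the two fields containing a double quote or a comma end up wrapped in double quotes; the rest are written as they are.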
When checking your CSV, we may return one or more of the following errors:
If we get the CSV from a URL, then we also check for these errors:
404 Not Found response)
text/csv
We also return the following warnings:
and if we get the CSV from a URL, we return these warnings:
Content-Type HTTP header
charset parameter
.xls and there is no 'Content-Type' header)
CRLF (Windows line endings). That's because RFC 4180, which is the closest thing to standard CSV that there is, says that you should use CRLF line endings.
We currently recognise schemas that use the JSON Table Schema, with a few modifications.
An example schema for the first CSV shown above could be:
{
  "fields": [
    {
      "name": "OrganisationId",
      "title": "Organisation ID",
      "constraints": {
        "required": true,
        "type": "http://www.w3.org/2001/XMLSchema#int"
      }
    },
    {
      "name": "WeekDay",
      "title": "Day of the week",
      "constraints": {
        "required": true,
        "pattern": "(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"
      }
    },
    {
      "name": "Times",
      "constraints": {
        "required": true,
        "pattern": "([01][0-9]|2[0-3]):[0-5][0-9]-([01][0-9]|2[0-3]):[0-5][0-9]"
      }
    },
    {
      "name": "IsOpen",
      "title": "Open?",
      "constraints": {
        "required": true,
        "pattern": "(True|False)"
      }
    },
    {
      "name": "OpeningTimeType",
      "title": "General or additional opening",
      "constraints": {
        "required": true,
        "pattern": "(General|Additional)"
      }
    }
  ]
}
The constraints that CSVLint checks for are shown in the following table:
Constraint | Value type | Description |
---|---|---|
required | boolean | if true, there must be a value in this column on every row |
unique | boolean | if true, each row should have a different value in this column |
minLength | integer | every value must contain at least this number of characters |
maxLength | integer | no value can have more than this number of characters |
pattern | regexp | values must match this regular expression (use rubular.com to test your regular expressions) |
enum | array | array of valid values that a string can have |
type | URL | a URL for a data type which every value must adhere to (see below) |
minimum | number or date/time | every value must be at least this value |
maximum | number or date/time | no value should be more than this value |
datePattern | strftime | the format for date/time values in this column |
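As a rough illustration of how some of these constraints behave, the sketch below checks one column of string values against a `constraints` object. It is not CSVLint's implementation: the function name `check_column` is hypothetical, only six of the constraints are covered, and two behaviours are assumptions (empty fields are tested only against `required`, and `pattern` must match the whole value).

```python
import re


def check_column(values, constraints):
    """Return (row_number, constraint_name) pairs for every failed check.

    `values` holds one column's fields as strings, one per data row.
    """
    problems = []
    seen = set()
    for row_number, value in enumerate(values, start=1):
        if value == "":
            if constraints.get("required"):
                problems.append((row_number, "required"))
            continue  # assumption: other constraints skip empty fields
        if constraints.get("unique"):
            if value in seen:
                problems.append((row_number, "unique"))
            seen.add(value)
        if "minLength" in constraints and len(value) < constraints["minLength"]:
            problems.append((row_number, "minLength"))
        if "maxLength" in constraints and len(value) > constraints["maxLength"]:
            problems.append((row_number, "maxLength"))
        # assumption: the pattern has to match the entire value
        if "pattern" in constraints and not re.fullmatch(constraints["pattern"], value):
            problems.append((row_number, "pattern"))
        if "enum" in constraints and value not in constraints["enum"]:
            problems.append((row_number, "enum"))
    return problems


# Example: the third row is empty and the fourth is not True or False.
is_open_problems = check_column(
    ["True", "False", "", "yes"],
    {"required": True, "pattern": "(True|False)"},
)
```

Reporting the row number alongside the constraint name keeps each problem easy to trace back to the file.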
The supported data types are currently:
http://www.w3.org/2001/XMLSchema#string
http://www.w3.org/2001/XMLSchema#int
http://www.w3.org/2001/XMLSchema#float
http://www.w3.org/2001/XMLSchema#double
http://www.w3.org/2001/XMLSchema#anyURI
http://www.w3.org/2001/XMLSchema#boolean
http://www.w3.org/2001/XMLSchema#nonPositiveInteger
http://www.w3.org/2001/XMLSchema#positiveInteger
http://www.w3.org/2001/XMLSchema#nonNegativeInteger
http://www.w3.org/2001/XMLSchema#negativeInteger
http://www.w3.org/2001/XMLSchema#date
http://www.w3.org/2001/XMLSchema#dateTime
http://www.w3.org/2001/XMLSchema#gYear
http://www.w3.org/2001/XMLSchema#gYearMonth
http://www.w3.org/2001/XMLSchema#time
Use of an unknown data type will result in the column failing to validate.
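The rule about unknown data types can be sketched as a lookup from type URL to a checking function, where a missing entry means failure. This is an illustration only: `CHECKS` and `value_has_type` are hypothetical names, only four of the listed types are included, and the checks are simplified approximations of the XML Schema lexical rules (for example, time zones on dates and the 32-bit range of `int` are ignored).

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def _is_date(value):
    """Accept calendar dates that Python can parse in ISO format."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# Simplified checks for a handful of the supported types (assumptions).
CHECKS = {
    XSD + "string": lambda v: True,
    XSD + "int": lambda v: re.fullmatch(r"[+-]?[0-9]+", v) is not None,
    XSD + "boolean": lambda v: v in {"true", "false", "1", "0"},
    XSD + "date": _is_date,
}


def value_has_type(value, type_url):
    """Return True if `value` passes the check registered for `type_url`."""
    check = CHECKS.get(type_url)
    if check is None:
        return False  # unknown data type: the value fails to validate
    return check(value)
```

Treating an unrecognised URL as a failure, rather than silently skipping the check, mirrors the behaviour described above.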
CSVLint will give schema validation error and warning messages if these constraints aren't met. In addition we check that each column name in the CSV file is the same as the name for the column in the same position in the schema.
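The column-name check can be pictured as a position-by-position comparison between the CSV header row and the schema's field names. The sketch below is an assumption about the general approach, not CSVLint's code; `mismatched_columns` is a hypothetical helper, and the schema is trimmed to two fields.

```python
from itertools import zip_longest


def mismatched_columns(header, schema):
    """Return (position, expected_name, found_name) for each mismatch.

    Positions are 1-based. A missing column or field appears as None.
    """
    expected = [field["name"] for field in schema["fields"]]
    return [
        (position, wanted, found)
        for position, (wanted, found) in enumerate(
            zip_longest(expected, header), start=1
        )
        if wanted != found
    ]


schema = {"fields": [{"name": "OrganisationId"}, {"name": "WeekDay"}]}

# The second column is named differently from the schema's second field.
header_problems = mismatched_columns(["OrganisationId", "Day"], schema)
```

Using `zip_longest` means a header with too many or too few columns is reported as well, instead of being cut off at the shorter list.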
We check the following things about the schema that you provide:
Please check the privacy policy for information on how we store CSVLint data.
CSVLint should be able to handle files up to around 100MB.
Development of the old tool was partly supported by the DaPaaS project, co-funded by the European Commission under the Seventh Framework Programme (FP7 2007-2013).