v1.1.0

About this service

CSV looks easy, but it can be hard to make a CSV file that other people can work with. CSVLint helps you check that your CSV file is readable. You can also use it to check whether the file contains the columns and types of values that it should.

This service is maintained by an open source community and hosted by the Open Data Institute (ODI).

How to Use CSVLint

Follow these three steps:

  1. Enter the location of the CSV file you want to check, or upload it if it's not on the web already.
  2. Do you have a schema? If you do, you can enter its location or upload it.
  3. Hit the big Validate button.

You'll see a page that tells you how to improve your CSV file, if it needs improvement.

  • Errors are things you really need to fix, because they'll stop people from being able to use your data
  • Warnings are things you should fix if you can, because doing so will help people to use your data
  • Information messages are hints and tips about things that could make the data easier to use

Get a Badge

On the page that summarises the CSV, you'll also see a badge, which might look like this:

[Badge image: "CSV" with a yellow "warnings" label]

You'll also see the code you need to embed the badge on your own website.

Dialects

If your CSV file is sizeable and contains only one row, CSVLint detects that your file may be using a different "dialect". For example, your file may use ; delimiters to separate fields rather than the standard ,. Although we encourage you to keep to the standard CSV format, we provide an option to change the dialect to suit your particular file.
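
As a rough illustration of what detecting a dialect can involve, the sketch below tries a few candidate delimiters and keeps the one that splits every line into the same number of fields. It is not how CSVLint itself works; the function name `guess_delimiter` and the candidate list are assumptions made for this example, and it ignores quoted fields that contain line breaks.

```python
import csv

def guess_delimiter(text, candidates=(",", ";", "\t", "|")):
    """Guess the delimiter: the candidate that gives every line the same
    number of fields, preferring the one that yields the most fields.
    Hypothetical helper for illustration only."""
    lines = [line for line in text.splitlines() if line]
    best, best_width = ",", 1  # fall back to the standard comma
    for delim in candidates:
        widths = {len(row) for row in csv.reader(lines, delimiter=delim)}
        if len(widths) == 1:  # every line has the same field count
            width = widths.pop()
            if width > best_width:
                best, best_width = delim, width
    return best

print(guess_delimiter("Id;Name\r\n1;Ann\r\n2;Bob\r\n"))
```

If no candidate splits the lines into more than one field, the sketch falls back to a comma.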

How to Write Good CSV

Good CSVs look like this:

OrganisationId,WeekDay,Times,IsOpen,OpeningTimeType
1186,Monday,09:30-13:00,True,General
1186,Monday,13:30-17:30,True,General
...

for a table like this:

OrganisationId  WeekDay  Times        IsOpen  OpeningTimeType
1186            Monday   09:30-13:00  True    General
1186            Monday   13:30-17:30  True    General
...

The first row is a header row that contains the names of some columns.

The rest of the rows are data rows that contain a number of fields.

Line endings use CRLF (Windows line endings) and the column names and fields are separated by commas.

If a field contains a comma, a line ending or a double quote then the field is escaped by wrapping it in double quotes. Within a field that's escaped like that, any double quotes are doubled up. For example:

OrganisationCode,OrganisationName,Address1,Address2,Address3,City,County,Postcode
1-231076921,"Next Stage ""A Way Forward"" Ltd",Head Office,"HR House, 28 Manchester Road",Westhoughton,Bolton,Lancashire,BL5 3QJ
...

for a table like this:

OrganisationCode  OrganisationName                Address1     Address2                      Address3      City    County      Postcode
1-231076921       Next Stage "A Way Forward" Ltd  Head Office  HR House, 28 Manchester Road  Westhoughton  Bolton  Lancashire  BL5 3QJ
...
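
The quoting rules described above do not need to be applied by hand. As a minimal sketch, Python's standard `csv` module writes CRLF line endings, wraps fields in double quotes only where needed, and doubles up embedded double quotes. The helper name `to_csv` is an assumption for this example.

```python
import csv
import io

def to_csv(header, rows):
    """Serialise a header and data rows as CSV text with CRLF line endings.
    Hypothetical helper for illustration only."""
    buffer = io.StringIO(newline="")  # keep line endings exactly as written
    writer = csv.writer(buffer, lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

text = to_csv(
    ["OrganisationName", "Address2"],
    [['Next Stage "A Way Forward" Ltd', "HR House, 28 Manchester Road"]],
)
print(text)
```

Both fields in the data row come out wrapped in double quotes: the first because it contains double quotes, the second because it contains a comma.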

Common Errors

When checking your CSV, we may return one or more of the following errors:

  • Invalid encoding: if there are any odd characters in a file which could cause encoding errors
  • Line breaks: if line breaks are not the same throughout the file
  • Undeclared header: if you do not specify in a machine readable way whether or not your CSV has a header row
  • Ragged rows: if the rows in the file don't all have the same number of columns
  • Blank rows: if there are any blank rows
  • Stray/Unclosed quote: if there are any unclosed quotes in the file
  • Whitespace: if there is any whitespace between commas and double quotes around fields
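
To make two of these checks concrete, here is a minimal sketch that reports blank rows and ragged rows. It is an illustration, not CSVLint's own code; the function name and the labels `blank_rows` and `ragged_rows` are names chosen for this example, and the expected column count is taken from the first non-blank row.

```python
import csv
import io

def structural_errors(text):
    """Return (label, row_number) pairs for blank and ragged rows.
    Hypothetical helper for illustration only."""
    errors = []
    expected = None  # column count of the first non-blank row
    reader = csv.reader(io.StringIO(text, newline=""))
    for number, row in enumerate(reader, start=1):
        if not row:
            errors.append(("blank_rows", number))
        elif expected is None:
            expected = len(row)
        elif len(row) != expected:
            errors.append(("ragged_rows", number))
    return errors

print(structural_errors("a,b\r\n1,2\r\n\r\n3\r\n"))
```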

If we get the CSV from a URL, then we also check for these errors:

  • Not found: if the file doesn't exist (we get a 404 Not Found response)
  • Wrong content type: if the content type isn't set to text/csv

Common Warnings

We also return the following warnings:

  • Encoding: if you don't use UTF-8 as the encoding for the file
  • Check options: if the CSV file only contains a single comma-separated column; this usually means you're using a separator other than a comma
  • Inconsistent values: if any column contains inconsistent values, for example if most values in a column are numeric but there's a significant proportion that aren't
  • Empty column name: if any column doesn't have a name
  • Duplicate column name: if the column names aren't all unique
  • Title row: if there appears to be a title field in the first row of the CSV

and if we get the CSV from a URL, we return these warnings:

  • No content type: if you don't provide a Content-Type HTTP header
  • No encoding: if you don't specify an encoding with a charset parameter
  • Excel: if it looks like you're serving an Excel file rather than CSV (because the suffix for the file is .xls and there is no 'Content-Type' header)
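
The URL-related checks can be sketched as a function over the response headers. This is an assumption-laden illustration rather than CSVLint's implementation: the function name, the message labels, and the plain dict of headers (with the exact key `Content-Type`) are all invented for the example.

```python
from email.message import Message

def header_messages(url, headers):
    """Return labels for the content-type checks described above.
    Hypothetical helper; `headers` is a plain dict of response headers."""
    raw = headers.get("Content-Type")
    if raw is None:
        messages = ["no_content_type"]
        if url.lower().endswith(".xls"):
            messages.append("excel")  # .xls suffix and no Content-Type
        return messages
    parsed = Message()
    parsed["Content-Type"] = raw  # reuse the stdlib header parser
    messages = []
    if parsed.get_content_type() != "text/csv":
        messages.append("wrong_content_type")
    if parsed.get_param("charset") is None:
        messages.append("no_encoding")
    return messages

print(header_messages("http://example.org/data.csv",
                      {"Content-Type": "text/csv; charset=utf-8"}))
```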

Extra Information

  • Non RFC line breaks: We let you know if you use line breaks that aren't CRLF (Windows line endings). That's because RFC 4180, which is the closest thing there is to a CSV standard, says that you should use CRLF line endings.
  • Assumed header: We assume that a header is present.
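
A simple way to see which line breaks a file uses is to collect every distinct line-break sequence in its raw bytes. The sketch below does that; the function name is an assumption, and it does not distinguish line breaks inside quoted fields from those that end a row.

```python
import re

def line_break_kinds(data):
    """Return the set of line-break sequences found in raw bytes.
    Hypothetical helper for illustration only."""
    # b"\r\n" is listed first so a CRLF pair is not counted as CR plus LF.
    return set(re.findall(rb"\r\n|\r|\n", data))

kinds = line_break_kinds(b"a,b\n1,2\r\n")
print(kinds != {b"\r\n"})  # True when some line break is not CRLF
print(len(kinds) > 1)      # True when line breaks are inconsistent
```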

How to Write a Schema

We currently recognise schemas that use the JSON Table Schema, with a few modifications.

An example schema for the first CSV shown above could be:

{
    "fields": [
        {
            "name": "OrganisationId",
            "title": "Organisation ID",
            "constraints": {
                "required": true,
                "type": "http://www.w3.org/TR/xmlschema-2/#int"
            }
        },
        {
            "name": "WeekDay",
            "title": "Day of the week",
            "constraints": {
                "required": true,
                "pattern": "(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"
            }
        },
        {
            "name": "Times",
            "constraints": {
                "required": true,
                "pattern": "([01][0-9]|2[0-3]):[0-5][0-9]-([01][0-9]|2[0-3]):[0-5][0-9]"
            }
        },
        {
            "name": "IsOpen",
            "title": "Open?",
            "constraints": {
                "required": true,
                "pattern": "(True|False)"
            }
        },
        {
            "name": "OpeningTimeType",
            "title": "General or additional opening",
            "constraints": {
                "required": true,
                "pattern": "(General|Additional)"
            }
        }
    ]
}

The constraints that CSVLint checks for are shown in the following table:

Constraint   Value type           Description
required     boolean              if true, there must be a value in this column on every row
unique       boolean              if true, each row should have a different value in this column
minLength    integer              every value must contain at least this number of characters
maxLength    integer              no value can have more than this number of characters
pattern      regexp               values must match this regular expression (use rubular.com to test your regular expressions)
enum         array                the list of valid values that a field can have
type         URL                  a URL for a data type which every value must adhere to (see below)
minimum      number or date/time  every value must be at least this value
maximum      number or date/time  no value should be more than this value
datePattern  strftime             the format for date/time values in this column

The supported data types are currently:

Use of an unknown data type will result in the column failing to validate.

CSVLint gives schema validation errors and warnings if these constraints aren't met. In addition, we check that each column name in the CSV file matches the name of the column in the same position in the schema.

Common Errors

We report the following errors when your CSV doesn't conform to the schema that you provide:

  • Missing value: a column marked as `required` in the schema has no value
  • Minimum length: a column with a `minLength` constraint has a value that is too short
  • Maximum length: a column with a `maxLength` constraint has a value that is too long
  • Pattern: a column with a `pattern` constraint has a value that doesn't match the regular expression
  • Enum: a column with an `enum` constraint has a value that isn't contained in the list of valid values
  • Invalid regex: a regular expression pattern defined in the schema is not in valid regex form
  • Unique: a column with a `unique` constraint contains non-unique values
  • Below minimum: a column with a `minimum` constraint contains a value that is below the minimum
  • Above maximum: a column with a `maximum` constraint contains a value that is above the maximum
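
Several of these checks apply to one field value at a time, and can be sketched as below. This is an illustration only: the function name and error labels are invented for the example, the sketch assumes a `pattern` must match the whole value (CSVLint may treat patterns differently), and it assumes that an empty value is only checked against `required`.

```python
import re

def constraint_errors(value, constraints):
    """Return labels for the constraints that one field value breaks.
    Hypothetical helper covering required, minLength, maxLength,
    pattern and enum only."""
    if value == "":
        # Assumption: an empty value can only break `required`.
        return ["missing_value"] if constraints.get("required") else []
    errors = []
    if "minLength" in constraints and len(value) < constraints["minLength"]:
        errors.append("min_length")
    if "maxLength" in constraints and len(value) > constraints["maxLength"]:
        errors.append("max_length")
    if "pattern" in constraints and not re.fullmatch(constraints["pattern"], value):
        errors.append("pattern")
    if "enum" in constraints and value not in constraints["enum"]:
        errors.append("enum")
    return errors

print(constraint_errors("Maybe", {"required": True, "pattern": "(True|False)"}))
```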

Common Warnings

  • Header name: the header in the CSV has a column name that doesn't match the schema
  • Missing column: a row in the CSV file is missing a column that is specified in the schema
  • Extra column: a row in the CSV file has an extra column
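
These warnings amount to comparing the CSV with the list of fields in the schema. A minimal sketch, with function names and labels that are assumptions for this example, and a schema passed as an already parsed dict:

```python
def header_warnings(header, schema):
    """Compare CSV column names with schema field names by position.
    Hypothetical helper; returns (label, column_position) pairs."""
    expected = [field["name"] for field in schema["fields"]]
    return [
        ("header_name", position)
        for position, (actual, wanted) in enumerate(zip(header, expected), start=1)
        if actual != wanted
    ]

def row_warnings(row, schema):
    """Flag a row with fewer or more fields than the schema defines."""
    expected = len(schema["fields"])
    if len(row) < expected:
        return ["missing_column"]
    if len(row) > expected:
        return ["extra_column"]
    return []

schema = {"fields": [{"name": "OrganisationId"}, {"name": "WeekDay"}]}
print(header_warnings(["OrganisationId", "Weekday"], schema))
print(row_warnings(["1186"], schema))
```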

Please check the privacy policy for information on how we store CSVLint data.

CSVLint should be able to handle files up to around 100MB.

Acknowledgements

Development of the old tool was partly supported by the DaPaaS project, co-funded by the European Commission under the Seventh Framework Programme (FP7 2007-2013).