Check If File Has a CSV Format With Python

How to check if content of CSV file follows a specific format in Python?

See if this fits your requirement:

import sys
import csv


def assert_format(file_name):
    # Python 3: open in text mode with newline='' for the csv module
    with open(file_name, newline='') as csvfile:
        # Comma-delimited input is assumed here
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            # A row is valid if it contains 'NA' at most once
            flag = False
            for cell in row:
                if cell == 'NA' and not flag:
                    flag = True
                elif cell == 'NA' and flag:
                    return False
    return True


file_name = sys.argv[1]

if assert_format(file_name):
    print("format is correct")
else:
    print("choose correct file")
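
The same rule (a row may contain 'NA' at most once) can also be written more compactly with list.count. This is only a sketch: the helper name rows_have_single_na and the comma default are assumptions, and it takes any iterable of lines so it can be tried without a file on disk.

```python
import csv
import io


def rows_have_single_na(lines, delimiter=","):
    """Return True if no row contains the cell 'NA' more than once.

    `lines` is any iterable of text lines (an open file works too).
    The function name and the comma default are assumptions for this sketch.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    return all(row.count("NA") <= 1 for row in reader)


good = io.StringIO("1,NA,3\n4,5,6\n")
bad = io.StringIO("1,NA,NA\n4,5,6\n")
print(rows_have_single_na(good))  # True
print(rows_have_single_na(bad))   # False
```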

How to check if a file is csv or not?

It depends on how secure this needs to be. The easiest step is to restrict the accepted extension on the HTML upload field.

<input type="file" accept=".csv" />

Obviously, anyone can rename a file to .csv to circumvent this, but that is true of any solution that only checks extensions.
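
If the extension alone is not enough, one option is to look at the content on the server side. The following is a rough heuristic in Python, not a guarantee: the name looks_like_csv, the UTF-8 default and the candidate delimiters are all assumptions, and a single-column file would be rejected because csv.Sniffer cannot find a delimiter in it.

```python
import csv


def looks_like_csv(raw: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic content check on the first bytes of an upload.

    Hypothetical helper: rejects empty or binary data, then asks
    csv.Sniffer whether it can find a comma, semicolon or tab delimiter.
    """
    if not raw or b"\x00" in raw:
        return False  # empty input, or NUL bytes that suggest binary data
    try:
        sample = raw.decode(encoding)
        csv.Sniffer().sniff(sample, delimiters=",;\t")
    except (UnicodeDecodeError, csv.Error):
        return False  # not decodable, or no consistent delimiter found
    return True


print(looks_like_csv(b"a,b,c\n1,2,3\n4,5,6\n"))
print(looks_like_csv(b"PK\x03\x04\x00\x00"))
```

In practice the bytes would come from the first kilobyte or so of the uploaded file.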

How to write "if file type is NOT txt or csv, do X" inside a for loop?

Each boolean expression is evaluated separately.

not file.endswith('.txt') or ('.csv') always evaluates as true: ('.csv') is just a non-empty string, and a non-empty string is a truthy value.

import glob

for file in glob.glob('*'):
    # Skip files that are neither .txt nor .csv (note: "and", not "or")
    if not file.endswith('.txt') and not file.endswith('.csv'):
        continue
    elif file.endswith('.txt'):
        pass  # run script1 here
    else:
        pass  # run script2 here
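
str.endswith also accepts a tuple of suffixes, which avoids chaining the tests by hand. A small sketch, where pick_handler is a hypothetical helper and the lower-casing of the name is an assumption about how extensions should be matched:

```python
def pick_handler(filename):
    """Return 'txt', 'csv', or None for any other extension."""
    name = filename.lower()  # assumption: treat .CSV and .csv the same
    if not name.endswith((".txt", ".csv")):
        return None
    return "txt" if name.endswith(".txt") else "csv"


for name in ["notes.txt", "data.CSV", "photo.png"]:
    print(name, "->", pick_handler(name))
```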

Python - How can I check if a CSV file has a comma or a semicolon as a separator?

Say that you would like to read an arbitrary CSV, named input.csv, and you do not know whether the separator is a comma or a semicolon.

You could open your file using the csv module. The Sniffer class is then used to deduce its format, like in the following code:

import csv
with open('input.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read())

For this module, the Dialect class is a container class whose attributes describe how to handle delimiters (among other things such as double quotes and whitespace). You can check the delimiter attribute with the following code:

print(dialect.delimiter)
# This will be either a comma or a semicolon, depending on what the input is

Therefore, in order to do a smart CSV reading, you could use something like the following:

import pandas as pd

if dialect.delimiter == ',':
    df = pd.read_csv('input.csv')  # Import the csv with a comma as the separator
elif dialect.delimiter == ';':
    df = pd.read_csv('input.csv', sep=';')  # Import the csv with a semicolon as the separator
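
The Sniffer can also be restricted to the delimiters you actually expect, which keeps it from guessing some other character. A minimal sketch using only the csv module; detect_delimiter and its default are illustrative names, not part of any library:

```python
import csv


def detect_delimiter(sample, candidates=",;", default=","):
    """Guess the delimiter of `sample`, considering only `candidates`.

    Falls back to `default` when csv.Sniffer cannot decide.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return default


semicolon_text = "name;age;city\nann;30;oslo\nbob;25;rome\n"
comma_text = "name,age,city\nann,30,oslo\nbob,25,rome\n"
print(detect_delimiter(semicolon_text))
print(detect_delimiter(comma_text))
```

The returned character can then be passed directly as the sep argument of pd.read_csv, so no if/elif branch is needed.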


Checking if the .csv file is formatted correctly before importing to avoid embedding wrong data into database

Checking for a well-formed file

If you are importing a file programmatically and can load it into a Dataset object without any errors being raised, then it is a well-formed CSV file. So something like:

from tablib import Dataset

try:
    with open('data.csv', 'r') as fh:
        imported_data = Dataset().load(fh, headers=False)
except Exception:
    # you can add additional error handling / logging here if you like
    print("import fail")
    raise
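
Where tablib is not available, a rough stand-in for this check is to confirm that every row has as many fields as the first one. The sketch below uses only the standard library; ragged_rows is a hypothetical helper, and it reports row numbers as seen by csv.reader, which can differ from physical line numbers when quoted fields contain newlines.

```python
import csv
import io


def ragged_rows(lines):
    """Return 1-based row numbers whose field count differs from the first row."""
    reader = csv.reader(lines)
    first = next(reader, None)
    if first is None:
        return []  # empty input: nothing to compare
    width = len(first)
    return [i for i, row in enumerate(reader, start=2) if len(row) != width]


text = "id,name\n1,ann\n2\n3,cat,extra\n"
print(ragged_rows(io.StringIO(text)))  # [3, 4]
```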

Checking for correct headers

Before the import process, there is a hook you can use to check for valid headers. So you could do something like the following to check for missing columns:

from import_export import resources


class YourResource(resources.ModelResource):
    fields = ('author', 'email')

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # Fail early if a declared column is missing from the file
        for field_name in self.fields:
            col_name = self.fields[field_name].column_name
            if col_name not in dataset.headers:
                raise ValueError(f"'{col_name}' field not in data file")
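
The idea behind the header check does not depend on django-import-export. As a library-free illustration, the sketch below reads only the first row with the csv module and reports which required columns are absent; missing_headers and the column names are made up for the example.

```python
import csv
import io


def missing_headers(lines, required):
    """Return the required column names that are absent from the first row."""
    header = next(csv.reader(lines), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


sample = io.StringIO("author,title\nann,intro\n")
print(missing_headers(sample, ["author", "email"]))  # ['email']
```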

Data Validation

You can use the built-in widgets to supply additional validation at the field level. You can extend these as much as you like to enable additional domain-specific validation. For example, if you only want to allow '1' or '0' as your boolean values, you could implement the following:

from import_export import widgets


class StrictBooleanWidget(widgets.BooleanWidget):
    TRUE_VALUES = ["1"]
    FALSE_VALUES = ["0"]
    NULL_VALUES = [""]

    def clean(self, value, row=None, *args, **kwargs):
        if value in self.NULL_VALUES:
            return None
        if value in self.TRUE_VALUES:
            return True
        if value in self.FALSE_VALUES:
            return False
        raise ValueError("Invalid boolean: value must be 1 or 0.")

Then refer to this in your resource:

class YourResource(resources.ModelResource):
    is_active = fields.Field(
        attribute="active",
        column_name="active",
        default=False,
        widget=upload.widgets.StrictBooleanWidget(),
    )

You can also use this approach to check for missing or empty values in the data.

django-import-export really helps with the use case you describe, but it can be confusing when you are new to it. Read the documentation in depth, and I suggest downloading and running the example application (if you haven't already).

The core import logic is fairly straightforward, but it will save a ton of time if you can set breakpoints and step through during dev. That will really help you to understand what is going on.

How can I check whether the file I am uploading is in CSV or Excel format in Python?

If I understood you correctly, there is a simple option.

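# uploaded_file is assumed to be the file name as a string
# rsplit keeps names such as "my.data.csv" working
file_split = uploaded_file.rsplit('.', 1)
file_extension = '' if len(file_split) != 2 else file_split[1]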
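
Because an extension can be changed freely, it may help to also look at the first bytes of the upload. Excel files have recognisable signatures: .xlsx is a ZIP container and legacy .xls is an OLE2 compound file, while CSV is plain text with no signature at all. The sketch below is an assumption-laden heuristic; guess_kind and its return labels are invented for illustration, and the ZIP signature also matches other ZIP-based formats.

```python
XLSX_MAGIC = b"PK\x03\x04"                       # ZIP container (xlsx and other ZIP-based files)
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (legacy xls)


def guess_kind(head: bytes) -> str:
    """Classify the leading bytes of a file; labels are illustrative only."""
    if head.startswith(XLSX_MAGIC):
        return "xlsx-or-zip"
    if head.startswith(XLS_MAGIC):
        return "xls-or-ole2"
    if b"\x00" in head:
        return "binary"  # NUL bytes are unusual in UTF-8 or ASCII text
    return "text"        # possibly CSV; the content still needs parsing


print(guess_kind(b"id,name\n1,ann\n"))
print(guess_kind(b"PK\x03\x04" + b"\x14\x00"))
```

A result of "text" only says the file is not an obvious binary format; whether it parses as CSV is a separate check.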

