quoting error importing JSON formatted column of geographical object via CSV into rails + postgresql/postgis
Using CSV here forces you through pointless hoops. Plain ol' Ruby is much more direct, given the extension involved:
task :load_geo_data => :environment do
  path = File.absolute_path('uploads/regions.tsv')
  # File.foreach streams the file line by line and closes the handle when done
  File.foreach(path) do |line|
    feature = RGeo::GeoJSON.decode(line)
    Regionpolygon.create(
      rawdata: feature.geometry.as_text,
      [...]
    )
  end
end
Error in Reading a csv file in pandas [CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.]
Not an answer, but too long for a comment (to say nothing of the code formatting).
Since the file also breaks when read with the standard csv module, you can at least locate the line where the error occurs:
import csv

with open(r"C:\work\DATA\Raw_data\store.csv", newline='') as f:
    reader = csv.reader(f)
    linenumber = 1
    try:
        # count rows until the parser chokes, then report the failing line
        for row in reader:
            linenumber += 1
    except Exception as e:
        print("Error line %d: %s %s" % (linenumber, type(e), e))
Then look in store.csv what happens at that line.
Parsing csv file in Ruby
That message is technically correct. Quotes have special meaning in the CSV format: they allow you to embed separator characters in the data. Any quote that is part of a field's data therefore needs to be escaped, or the CSV parser must be told to use some other character for quoting, in which case it will treat any " it sees as literal data.
If you don't actually need to support pipes within fields, and have some other unused character you can shift the quoting role onto, Ruby's CSV can be made to consume your (slightly) malformed csv format:
CSV.parse(data, col_sep: '|', quote_char: '%')
Otherwise, the correct quoting for your problem line is
|"Some ""quoted name"""|2|12|Machine|
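Both routes can be checked side by side. A minimal sketch (the field values are made up for illustration):

```ruby
require 'csv'

# Option 1: treat quotes as literal data by pointing quote_char at an unused character
row1 = CSV.parse_line('|Some "quoted name"|2|12|Machine|',
                      col_sep: '|', quote_char: '%')

# Option 2: properly escaped quoting parses with the default quote_char
row2 = CSV.parse_line('|"Some ""quoted name"""|2|12|Machine|', col_sep: '|')

# Both yield the field with its quotes intact: Some "quoted name"
```

Note that the leading and trailing pipes produce empty fields, which Ruby's CSV returns as nil.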
Importing CSV-file as list returns empty list
This is because a variable declared inside a function is only available inside that function, not in the global scope. My suggestion would be to return the data from the function after you have finished reading the file. In other words, something like this:
import csv

def read_file():
    with open('filepath', 'r') as file:
        csv_reader = csv.reader(file, delimiter=';')
        data = []
        for rad in csv_reader:
            data.append(rad)
        return data

file_data = read_file()
print(file_data)
It is also possible to make the data variable inside the function global, but this is normally not recommended: global variables quickly become hard to keep track of, and it is much easier to see where data comes from when it is returned from a function like this.
Line breaks in generated csv file driving me crazy
This works for me:
a) Setting Response.ContentEncoding = System.Text.Encoding.UTF8 isn't enough to make Excel open UTF-8 files correctly. Instead, you have to manually write a byte-order mark (BOM) at the start of the file:
if (UseExcel2003Compatibility)
{
    // Write a UTF-16 BOM even though we export as UTF-8. Wrong, but *I think* the only thing Excel 2003 understands
    response.Write('\uFEFF');
}
else
{
    // Use the correct UTF-8 BOM. Works in Excel 2008 and should be compatible with any other editor
    // capable of reading UTF-8 files
    byte[] bom = new byte[] { 0xEF, 0xBB, 0xBF };
    response.BinaryWrite(bom);
}
b) send as octet-stream, use a filename with .csv extension and do quote the filename as is required by the HTTP spec:
response.ContentType = "application/octet-stream";
response.AppendHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
c) use double quotes for all fields
I just checked and for me Excel opens downloaded files like this correctly, including fields with line breaks.
But note that Excel still won't open such a CSV correctly on systems whose default list separator is different from ",". E.g. if a user runs Excel on a Windows system set to German regional settings, Excel expects a semicolon instead of a comma as the separator and will not open the file correctly. I don't think there is anything that can be done about that.
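Steps a) and c) translate to Ruby roughly as follows — a hedged sketch, where the file name export_bom.csv and the sample rows are made up:

```ruby
require 'csv'

File.open("export_bom.csv", "wb") do |f|
  # a) UTF-8 byte-order mark so Excel detects the encoding
  f.write("\xEF\xBB\xBF")
  # c) force_quotes wraps every field in double quotes
  f.write(CSV.generate(force_quotes: true) do |csv|
    csv << ["Name", "Comment"]
    csv << ["Alice", "line one\nline two"] # embedded line break survives inside quotes
  end)
end
```

Serving the result with the Content-Disposition header from b) is independent of how the bytes are produced.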
Ruby CSV fails on fields like =1234
Stripping the leading = from quoted fields with gsub before parsing should be enough:
#!/usr/bin/env ruby
require 'csv'

data = File.read('file.csv').gsub(/=("[^"]*")/, '\\1')

CSV.parse(data).each do |e|
  puts e.inspect
end
Output:
["Product Code", "Product Name", "Retail Price", "Tax Percentage", "Option Name", "Option Type"]
["20042", "Blossom Wall Art", "245.00", "1", "", ""]
Importing CSV with line breaks in Excel 2007
I have finally found the problem!
It turns out that we were writing the file with Unicode (UTF-16) encoding, rather than ASCII or UTF-8. Changing the encoding on the FileStream solved the problem.
Thanks everyone for all your suggestions!
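In Ruby terms, the same fix amounts to choosing the encoding explicitly when opening the output file. A sketch under those assumptions (the original fix was in C#'s FileStream; the file name export_utf8.csv and rows are made up):

```ruby
require 'csv'

rows = [
  ["Name", "Notes"],
  ["Widget", "line one\nline two"], # embedded line break; CSV quotes the field automatically
]

# Write explicitly as UTF-8 rather than UTF-16 (what .NET calls "Unicode")
CSV.open("export_utf8.csv", "w:UTF-8") do |csv|
  rows.each { |row| csv << row }
end
```

Reading the file back with the same encoding round-trips the line break intact.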