How to Parse a CSV File, Update a Field, Then Save

How to parse a CSV file, update a field, then save

I would write the updated records to a new file. Once your program has finished, it can, if you want, delete the old file and rename the new file to the original file name.

To do this, open the output file at the top of your code, outside the each_with_index loop, e.g.:

csv_out = CSV::Writer.generate(File.open('new.csv', 'wb'))

Then inside your each_with_index loop you can write the current row to the new file like this:

csv_out << row

Then at the end you can close the new file:

csv_out.close

As mentioned in the comments, CSV::Writer is no longer in the standard library. An equivalent version of the code using the newer CSV.foreach (for reading) and CSV.open (for writing) is:

CSV.open("path/to/updated_file.csv", "wb") do |csv_out|
  CSV.foreach("#{RAILS_ROOT}/doc/some.csv") do |row|
    address = row[5]
    l = Location.address_find(address)
    if l != nil
      puts "#{l.name} at #{l.address}"
      row[14] = l.store_code
      puts row[14]
    else
      puts "No matching address Found!!!"
    end
    csv_out << row
  end
end
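
For readers working in Python rather than Ruby, the same write-to-a-new-file-then-rename idea might look roughly like the sketch below. The function name rewrite_csv and the update_row callback are hypothetical names chosen for illustration. The sketch assumes the new file can be created in the same directory as the original, so that os.replace can swap the two in one step.

```python
import csv
import os
import tempfile


def rewrite_csv(path, update_row):
    """Rewrite the CSV at `path`, passing every row through `update_row`.

    `update_row` is a hypothetical callback: it receives a row as a list of
    strings and returns the row that should be written instead.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the new file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                writer.writerow(update_row(row))
        # Swap the new file into place under the original name.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: discard the partial file.
        os.unlink(tmp_path)
        raise
```

Because the original is only replaced after every row has been written, a failure part-way through leaves the old file untouched.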

Update existing CSV file with updated info in another CSV file

The first time you test a row against update in your condition, you consume the entire update file. Because update is basically a generator, it is exhausted once you have looped over it, so every later test has nothing left to compare against.
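
A minimal demonstration of that exhaustion, using a made-up two-line update file held in memory (the data values are hypothetical):

```python
import csv
import io

# Hypothetical two-line update file, held in memory for the demonstration.
update = csv.reader(io.StringIO("1001,fire\n1002,police\n"))

# Testing for a row that is absent walks through every remaining row.
missing_found = ["9999", "none"] in update

# The reader is now exhausted, so even a row that is in the data is not found.
present_found = ["1001", "fire"] in update

# Nothing is left to iterate over.
leftover = list(update)
```

Both membership tests come back False, and leftover is an empty list, which is why the update rows have to be read into memory once up front.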

Also, your condition checks if exactly the same line exists in the update file, which of course it doesn't (you would not want or need to update anything if the data was exactly the same).

You want to read the update lines into memory once, then skip any line from the master file that has the same key (compare the key, not the whole line!).

I assume the first field (capcode) is the key here, though there could be other arrangements.

Tangentially, you can combine all the with statements into one, and when you use with open, there is no need to .close() anything.

#!/usr/bin/env python3
import csv


fields = ['capcode', 'discipline', 'region', 'location', 'description', 'remark']

# newline='' lets the csv module handle line endings itself
with open('bommel_db_capcodes.txt', 'r', newline='') as readFile_bommel, \
        open('results.csv', 'w', newline='') as results, \
        open('zulu_db_capcodes.txt', 'r', newline='') as readFile_zulu:

    master = csv.DictReader(readFile_zulu, fieldnames=fields)
    update = csv.DictReader(readFile_bommel, fieldnames=fields)
    writer = csv.DictWriter(results, fieldnames=fields)

    # Save header to output file and skip
    writer.writerow(next(master))

    # Skip header from updates
    next(update)

    # Read, remember, and write updated lines
    seen = set()
    for row in update:
        writer.writerow(row)
        seen.add(row['capcode'])

    # Copy only the master lines whose key was not updated
    for row in master:
        if row['capcode'] not in seen:
            writer.writerow(row)

Demo: https://ideone.com/7Aj1PQ

How to update rows in a CSV file

With the csv module you can iterate over the rows and access each one as a dict. The preferred way to update a file is to write to a temporary file and then move it over the original.

from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'my.csv'
tempfile = NamedTemporaryFile(mode='w', delete=False, newline='')

fields = ['ID', 'Name', 'Course', 'Year']

# stud_ID, stud_name, stud_course and stud_year come from the question's code
with open(filename, 'r', newline='') as csvfile, tempfile:
    reader = csv.DictReader(csvfile, fieldnames=fields)
    writer = csv.DictWriter(tempfile, fieldnames=fields)
    for row in reader:
        if row['ID'] == str(stud_ID):
            print('updating row', row['ID'])
            row['Name'], row['Course'], row['Year'] = stud_name, stud_course, stud_year
        # Keep only the known fields before writing
        row = {'ID': row['ID'], 'Name': row['Name'], 'Course': row['Course'], 'Year': row['Year']}
        writer.writerow(row)

# Replace the original file with the updated copy
shutil.move(tempfile.name, filename)

If that's still not working you might try one of these encodings:

with open(filename, 'r', encoding='utf8') as csvfile, tempfile:
with open(filename, 'r', encoding='ascii') as csvfile, tempfile:
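
Note that a wrong encoding only fails once the file is actually read, not when it is opened. One way to check candidates up front is sketched below. The helper name pick_encoding is hypothetical, and the default candidate list is my own assumption (cp1252 is not mentioned above); adjust it to whatever encodings your data might plausibly use.

```python
def pick_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: a successful decode does not prove the encoding
    is the right one, only that it is usable without errors.
    """
    with open(path, "rb") as f:
        data = f.read()
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file, try the next
        return name
    raise ValueError("none of the candidate encodings could decode " + path)
```

The result can then be passed on, e.g. open(filename, 'r', encoding=pick_encoding(filename)).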

Edit: added str, print and encodings

Python: update a column value of a CSV file according to another CSV file

Thanks for making the question clearer. This code does not modify file A in place; instead it writes to an output file, fileC.

import csv  # imports module csv

filea = "fileA.csv"
fileb = "fileB.csv"
output = "fileC.csv"

delim = ";"  # set your own delimiter

# open csv readers
source1 = csv.reader(open(filea, "r", newline=""), delimiter=delim)
source2 = csv.reader(open(fileb, "r", newline=""), delimiter=delim)

source2_dict = {}

# prepare changes from file B
for row in source2:
    source2_dict[row[0]] = row[1]

# write new changed rows
with open(output, "w", newline="") as fout:
    csvwriter = csv.writer(fout, delimiter=delim)
    for row in source1:
        # needs to check whether there are any changes prepared
        if row[1] in source2_dict:
            # change the item
            row[3] = source2_dict[row[1]]
        csvwriter.writerow(row)

I hope I understood your intention well.

Just a short explanation of the steps:

  • First you specify the paths to the source files and the output file,
    and you also specify the delimiter.
  • Then you create the CSV readers using the csv module.
  • You read all the changes from source file B and store them in a
    dictionary.
  • Then you iterate through file A, modify each row when necessary,
    and save it to the output file.
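
To see what these steps do to concrete data, here is a small in-memory run. The file contents are made up for illustration; only the column positions (key in column 1 of file A, key and new value in columns 0 and 1 of file B, target in column 3 of file A) follow the code above.

```python
import csv
import io

delim = ";"

# Hypothetical contents standing in for fileA.csv and fileB.csv.
file_a = io.StringIO("r1;k1;x;old1\nr2;k2;y;old2\n")
file_b = io.StringIO("k1;new1\n")

# Map each key in file B (column 0) to its replacement value (column 1).
changes = {row[0]: row[1] for row in csv.reader(file_b, delimiter=delim)}

out = io.StringIO()
writer = csv.writer(out, delimiter=delim, lineterminator="\n")
for row in csv.reader(file_a, delimiter=delim):
    # Column 1 of file A holds the key; column 3 receives the new value.
    if row[1] in changes:
        row[3] = changes[row[1]]
    writer.writerow(row)

result = out.getvalue()
```

Only the row whose key appears in file B changes; the other row is copied through as it was.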

Update column value of CSV in C#

There are a lot of misconceptions in the code you provided, and some of the solutions to your problem might not be beginner friendly, especially when they are not 'global' solutions. For your case, I have tried to explain the parts of the code in comments.

using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

var csvFilePath = @"C:\test.csv";

// Split csv file into lines instead of raw text.
string[] csvText = File.ReadAllLines(csvFilePath);

var models = new List<TestDataModel>();

// Regex that matches your CSV file.
// Explained here: https://regex101.com/r/t589CW/1
var csvRegex = new Regex("\"(.*)\",\"(.*)\"");
for (int i = 0; i < csvText.Length; i++)
{
    // Skip headers of file.
    // That is: "Name","Age"
    if (i == 0)
    {
        continue;
    }

    // Skip blank lines, e.g. trailing white space at the end of the file.
    if (string.IsNullOrWhiteSpace(csvText[i]))
    {
        continue;
    }

    models.Add(new TestDataModel
    {
        // Getting a name from regex group match.
        Name = csvRegex.Match(csvText[i]).Groups[1].Value,

        // Getting an age from regex group and parsing it into an integer.
        Age = int.Parse(csvRegex.Match(csvText[i]).Groups[2].Value),
    });
}

// Creating headers for altered CSV.
string alteredCsv = "\"Name\",\"Age\"\n";

// Loop through your models to modify them as you wish and add csv text in correct format.
foreach (var testDataModel in models)
{
    testDataModel.Name = testDataModel.Name.Replace('m', 'n');
    alteredCsv += $"\"{testDataModel.Name}\",\"{testDataModel.Age}\"\n";
}

var outputFilePath = @"C:\test2.csv";
File.WriteAllText(outputFilePath, alteredCsv);

public class TestDataModel
{
    public string Name { get; set; }
    public int Age { get; set; }
}

However, this answer touches on many topics that you might want to get familiar with, such as:

  • Regex/Regex in C#
  • Data Serialization/Deserialization
  • Working with Linq
  • String templates
  • I/O Operations
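
For comparison, the same transformation (read "Name","Age" rows, replace 'm' with 'n' in the name, write everything back fully quoted) can be sketched with a CSV parser instead of a regex. The sketch below uses Python's csv module. The sample rows are made up, and the second one contains a comma inside the quoted name to show that the parser keeps it in one field.

```python
import csv
import io

# Hypothetical input in the same "Name","Age" layout as the C# example.
source = io.StringIO('"Name","Age"\n"Sam","31"\n"Tom, Jr.","7"\n')
out = io.StringIO()

reader = csv.DictReader(source)  # the first line supplies the field names
writer = csv.DictWriter(
    out,
    fieldnames=reader.fieldnames,
    quoting=csv.QUOTE_ALL,       # quote every field, as the C# output does
    lineterminator="\n",
)
writer.writeheader()
for row in reader:
    row["Name"] = row["Name"].replace("m", "n")
    writer.writerow(row)

altered = out.getvalue()
```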

Is it possible to update part of a CSV file without needing knowledge of the rest?

The way modern file systems work, you can only update a file in place if the new data is exactly the same size as the original. Otherwise you must rewrite the entire file from scratch. If you can meet this constraint, you can do it with low-level file streams. I don't know of a CSV package that supports this off the top of my head, but that is because CSV is simple enough that you can do it on your own.
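
As a rough illustration of the same-size constraint, here is a Python sketch of an in-place overwrite with a plain file stream. The function name replace_same_size is hypothetical, and the sketch assumes you already know the byte offset of the field you want to change.

```python
def replace_same_size(path, offset, old, new):
    """Overwrite `old` with `new` at byte `offset`, without rewriting the file.

    Hypothetical helper: both values are bytes and must have the same length,
    otherwise everything after the field would have to be shifted.
    """
    if len(old) != len(new):
        raise ValueError("replacement must be exactly the same size")
    with open(path, "r+b") as f:  # read/write, binary, no truncation
        f.seek(offset)
        if f.read(len(old)) != old:
            raise ValueError("unexpected content at the given offset")
        f.seek(offset)
        f.write(new)
```

For example, in a file containing "id,code", "1,AAA" and "2,BBB" on separate lines with "\n" endings, the field AAA starts at byte offset 10, so replace_same_size(path, 10, b"AAA", b"ZZZ") changes only those three bytes.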

That said, if you are updating every row anyway then re-writing the file probably isn't that big of a deal. Writing a csv record is dead simple. Observe the following C# code:

public void WriteRecord(IEnumerable items, TextWriter outputStream)
{
    string delimiter = "";
    foreach (var item in items)
    {
        outputStream.Write(delimiter);
        outputStream.Write("\"");
        // Escape embedded quotes by doubling them.
        outputStream.Write(item.ToString().Replace("\"", "\"\""));
        outputStream.Write("\"");
        delimiter = ",";
    }
    outputStream.Write(Environment.NewLine);
}

Of course, if you have complex types that you want to be more picky about, that's fine, but since you don't want to constrain yourself to specific future column arrangements this code should be just fine. Additionally, it will complement my own CSV parser listed here on Stack Overflow, which does not require advance knowledge of the columns in the file. You could do something like this:

var tempPath = @"Some-temp-file-path.csv";
var srcPath = @"input-file-path.csv";
using (var outFile = new StreamWriter(tempPath))
{
    foreach (var items in CSV.FromFile(srcPath))
    {
        items[someInt] = "new value";
        items[otherInt] = "other value";
        WriteRecord(items, outFile);
    }
}
// File.Copy throws if the destination already exists unless overwrite is true.
File.Copy(tempPath, srcPath, true);
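
The quoting rule that WriteRecord applies (wrap every field in double quotes and double any quote inside it) can also be sketched in Python, together with a check against the standard csv module. Both function names are hypothetical and chosen only for this illustration.

```python
import csv
import io


def format_record(items):
    """Quote every item and double embedded quotes, like WriteRecord above."""
    quoted = ['"' + str(item).replace('"', '""') + '"' for item in items]
    return ",".join(quoted)


def format_with_csv_module(items):
    """Produce the same line with the csv module, quoting all fields."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(items)
    return buf.getvalue().rstrip("\n")
```

For a record such as ["a", 'say "hi"', 3], both functions should return the same line, which suggests the hand-written rule matches what a CSV library does for fully quoted output.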

