Removing Rows and Columns in Python CSV Module

Removing rows and columns in python CSV module

I have tried to provide you an answer as close as possible than what you have done so far.

Prototype:

import csv

with open('aquastat.csv', 'r') as csv_file:
  csv_reader = csv.reader(csv_file)
  final_file_name = "final_water.data.csv"
  final_file = open(final_file_name,'w')
  csv_writer = csv.writer(final_file,delimiter="\t")
  for row in csv_reader:
    if len(row) >= 6:
        row = [row[0], row[4], row[5]]
        csv_writer.writerow(row)
  final_file.close()

explanations:

Before the line csv_writer.writerow(row) where you output the row in the output csv file. I have added the line row = [row[0], row[4], row[5]] where I overwrite the content of the array row by an array containing only 3 cells, those cells are respectively taken from the Area, Year, Value columns
On top of this, I have added a the if condition if len(row) >= 6: to check that you have at least enough elements in your row to extract the columns until Value.

input:

"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md"
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""

output:

Area    Year    Value
Afghanistan     1977    65286.0
Afghanistan     1982    65286.0
Afghanistan     1987    65286.0
Afghanistan     1992    65286.0
Afghanistan     1997    65286.0
Afghanistan     2002    65286.0

Delete specific rows and columns from csv using Python in one step

Here is a pandas approach:

Step 1, creating a sample dataframe

import pandas as pd

# Create sample CSV-file (100x100)
df = pd.DataFrame(np.arange(10000).reshape(100,100))
df.to_csv('test.csv', index=False)

Step 2, doing the magic

import pandas as pd
import numpy as np

# Read first row to determine size of columns
size = pd.read_csv('test.csv',nrows=0).shape[1]

#want to remove columns 25 to 29, in addition to columns 3 to 18 already specified,
# Ok so let's create an array with the length of dataframe deleting the ranges
ranges = np.r_[3:19,25:30]
ar = np.delete(np.arange(size),ranges)

# Now let's read the dataframe
# let us also skip rows 2 and 3
df = pd.read_csv('test.csv', skiprows=[2,3], usecols=ar)

# And output
dt.to_csv('output.csv', index=False)

And the proof:

enter image description here

How can I remove a row from csv file with python where a value of any column is missing

Try this:

with open(in_filename) as infile:
    with open(out_filename, 'w') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        writer.writerows(filter(all, reader))

how to delete rows from a csv file which string in 1st column is the same of string in 1st column of another csv?

I found a solution. Since the entire file 2 had to be removed from file 1, I did the following command, which informed just the first column to be compared, and it worked:

df1.loc[pd.merge(df1, df2, on=['Genus'], how='left', indicator=True)['_merge'] == 'left_only']

Thanks for you time!
AP

Remove row from CSV file python

A way I have done so in the past is to use a pandas dataframe and the drop function based on the row index or label.

For example:

import pandas as pd
df = pd.read_csv('yourFile.csv')

newDf = df.drop('rowLabel')

or by index position:

newDf =df.drop(df.index[indexNumber])

Removing Rows and Columns in Python CSV Module