How to Split a CSV File Row to Columns in Python

Python - splitting data as columns in csv file

You can create three separate lists, and then append to each using csv.reader.

import csv

c1 = []
c2 = []
c3 = []
with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        c1.append(row[0])
        c2.append(row[1])
        c3.append(row[2])
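
If the number of columns is not known in advance, the same idea can be sketched with zip, which transposes rows into columns. This is only a minimal sketch: it assumes every row has the same number of fields, and the in-memory sample below is made up for illustration and stands in for Half-life.csv.

```python
import csv
import io

# Made-up sample standing in for the real file (assumption)
sample = io.StringIO("a,1,x\nb,2,y\nc,3,z\n")

reader = csv.reader(sample, delimiter=',')
# zip(*rows) transposes the rows into one tuple per column
columns = [list(col) for col in zip(*reader)]

c1, c2, c3 = columns
print(c1)  # ['a', 'b', 'c']
```

Note that zip stops at the shortest row, so ragged files would silently lose values with this approach.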

Splitting a column into 2 in a csv file using python

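Your data has only one column, and the values inside it are tab-delimited:

import pandas as pd

(pd.read_csv('test.csv', quoting=1, header=None)
   .squeeze('columns')  # the squeeze=True argument was removed in pandas 2.0
   .str.split('\t', expand=True)
   .to_csv('result.csv', index=False, header=False))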
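
For comparison, the same split can be sketched with the standard library only. The function name split_tab_column and the sample data are assumptions made for illustration; the sketch assumes each row holds exactly one quoted field whose content is tab-separated.

```python
import csv
import io

def split_tab_column(src, dst):
    """Split the single tab-delimited field of each row into separate columns."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        if row:  # skip blank lines
            writer.writerow(row[0].split("\t"))

# Made-up sample: one quoted column holding tab-separated values (assumption)
src = io.StringIO('"a\tb"\n"c\td"\n')
dst = io.StringIO()
split_tab_column(src, dst)
print(dst.getvalue())
```

With real files, src and dst would be file objects opened with newline='' as the csv module documentation recommends.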

Split columns in Csv file

Assuming you have a file like so:

123     A=2,B=asdjhf,C=jhdkfhskdf,D=1254
54878754 A=45786,D=asgfd,C=1234

and your file is not huge, you can build the rows one at a time and then create the dataframe:

import pandas as pd

columns = ["sku", "A", "B", "C", "D"]
rows = []

with open("data_mangled.csv") as f:
    for line in f:
        d = dict.fromkeys(columns)  # potentially missing columns default to None
        col1, col2 = line.split()
        d["sku"] = col1
        cols = col2.split(",")
        for item in cols:
            k, v = item.split("=")
            d[k] = v
        rows.append(d)

df = pd.DataFrame(rows)
print(df)

This also handles the situation where some of the column names are missing from the second field or appear in a different order.

Output:

        sku      A       B           C      D
0       123      2  asdjhf  jhdkfhskdf   1254
1  54878754  45786    None        1234  asgfd

EDIT: For your specific data:

import pandas as pd

rows = []
with open("data_real.txt") as f:
    # use the first line as column names in the dataframe
    col_names = f.readline().strip().split(",")
    print(col_names)

    for line in f:
        d = dict.fromkeys(col_names)  # missing columns default to None
        # lines have more than 2 columns, but the trailing values are empty
        # so the format is col1,large_col2,,,,,,,
        col1, *col2 = line.rstrip("\n").split(",")
        d["sku"] = col1
        for item in col2:
            try:
                if item.strip():  # disregard the empty trailing columns
                    k, v = item.split("=")
                    # we split on comma, so there is an occasional stray "
                    k = k.strip('"')
                    v = v.strip('"')
                    d[k] = v
            except ValueError:
                # there is a column value with a missing key
                print("Could not assign to column:", d["sku"], item)
        rows.append(d)

df = pd.DataFrame(rows)
print(df)
df.to_csv("data_parsed.csv")  # save

One of the columns was not in the key=value format:
Could not assign to column: PRACLA16 16 months on less

Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, which is why the rows are collected in a list and converted to a dataframe in one step.
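
If pandas is not needed for anything else, the same key=value parsing can be sketched with the standard library alone. The helper name parse_line is hypothetical, and the sample lines are the two shown earlier; csv.DictWriter fills keys that are missing from a row with its restval argument.

```python
import csv
import io

FIELDS = ["sku", "A", "B", "C", "D"]

def parse_line(line):
    """Turn 'sku K1=v1,K2=v2' into a dict (hypothetical helper)."""
    sku, pairs = line.split()
    row = {"sku": sku}
    for item in pairs.split(","):
        key, value = item.split("=")
        row[key] = value
    return row

lines = [
    "123 A=2,B=asdjhf,C=jhdkfhskdf,D=1254",
    "54878754 A=45786,D=asgfd,C=1234",
]

out = io.StringIO()
# restval is written for any field that a row does not contain
writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", lineterminator="\n")
writer.writeheader()
for line in lines:
    writer.writerow(parse_line(line))
print(out.getvalue())
```

By default DictWriter raises ValueError for a key that is not listed in fieldnames, so unexpected column names surface immediately instead of being added silently.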

Split CSV file in Python with semicolon separating the records

How about this:

with open("data.csv") as f:
    array = [l.split(",") for l in f.readline().split(";") if l]

print(len(array))
print(array[1][0])

Output, where 3 is the number of lists within the array and each list has 16 values:

3
20210402

The above allows for what you asked for:

"Just looking to be able to address as array[r][c] where r is 0 to 287 and c is 0 to 16."

I've assumed that your data is one long continuous string, as shown in your question.

If you feel like it, this can be easily dumped to a pandas DataFrame and then to a proper .csv file:

import pandas as pd

with open("data.csv") as f:
    array = [l.split(",") for l in f.readline().split(";") if l]

pd.DataFrame(array).to_csv("your_array.csv", header=False, index=False)
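
A slightly more defensive version of the split can be sketched as a small function. The name split_records and the sample string are assumptions made for illustration; the sketch strips the trailing newline first, so a record separator at the very end of the line does not produce a stray extra row.

```python
def split_records(text, record_sep=";", field_sep=","):
    """Split one long string into rows of fields, skipping empty records."""
    records = text.strip().split(record_sep)
    return [rec.split(field_sep) for rec in records if rec.strip()]

# Made-up sample in the same shape as the data described above (assumption)
sample = "20210401,1,2;20210402,3,4;20210403,5,6;\n"
array = split_records(sample)

print(len(array))   # 3
print(array[1][0])  # 20210402
```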

How can I split a csv file into two files based on values in a column in Python?

Is this what you are looking for?

import pandas
data = pandas.read_csv('mycsv.csv')


# if item_number is read as a numeric column, remove the quotes and compare against numbers
csv2 = data[data['item_number'].isin(['5678', '6789'])]
csv1 = data[data['item_number'].isin(['1234', '2345'])]

csv1.to_csv('csv1.csv', index=False)
csv2.to_csv('csv2.csv', index=False)
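
The same partitioning can be sketched without pandas. The function name split_by_values and the sample rows are assumptions for illustration. Unlike the code above, this sketch sends every row that does not match the first set of values to the second group, and it compares values as strings because the csv module does not convert types.

```python
import csv
import io

def split_by_values(src, column, first_values):
    """Return (header, rows whose column is in first_values, all other rows)."""
    reader = csv.DictReader(src)
    matched, rest = [], []
    for row in reader:
        (matched if row[column] in first_values else rest).append(row)
    return reader.fieldnames, matched, rest

# Made-up sample standing in for mycsv.csv (assumption)
sample = io.StringIO("item_number,name\n1234,a\n5678,b\n2345,c\n6789,d\n")
header, csv1_rows, csv2_rows = split_by_values(sample, "item_number", {"1234", "2345"})

print([r["item_number"] for r in csv1_rows])  # ['1234', '2345']
print([r["item_number"] for r in csv2_rows])  # ['5678', '6789']
```

Each group could then be written out with csv.DictWriter, using header as the fieldnames.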

Need to split one Columns data into different columns in Pandas Data frame

I believe the idea of the code below is quite clear. First we need to correct the data in the file so that it is valid CSV (comma-separated values). After that we can create the dataframe.

'data.csv' file content

"Date InformationIdNo."," Date out ","Dr."," Cr."
"01 FEB 21 Mart Purchase MATRSC203255H","30 DEC 21","-3,535.61","0"
"250 - PQRT14225","","",""
"01 FEB 21 Cash Sales CCTR220307AXCDV","30 DEC 21","-34.33","0"
"20000 - DEFG12","","",""
"01 FEB 21 TransferFT22032FQWE3"," 01 FEB 21","0","7,426.93"
"","","",""
"","","",""
"","","",""
"","","",""
"","","",""

A possible (quick) solution is the following:

#pip install pandas

import re
import pandas as pd
from io import StringIO

with open("data.csv", "r", encoding='utf-8') as file:
    raw_data = file.read()

# convert txt to valid csv (comma separated values) format
raw_data = raw_data.replace(' - ', '-')
raw_data = raw_data.replace('Date InformationIdNo.', 'Date","Information","IdNo.')
raw_data = raw_data.replace('" Cr."', '"Cr","Information_add"')
raw_data = re.sub(r'(\d{2} [A-Z]{3} \d{2})', r'\1","', raw_data)
raw_data = re.sub('\n"([A-Z0-9-]+)","","",""\n', r',"\1"\n', raw_data)
raw_data = re.sub(r',""{2,}', '', raw_data)
raw_data = re.sub('([A-Z0-9]{3,}",")', r'","\1","', raw_data)
raw_data = re.sub(',""+', r'', raw_data)
raw_data = re.sub('\n""+', r'', raw_data)

# create dataframe and replace NaN with ""
df = pd.read_csv(StringIO(raw_data), sep=",")
df.fillna("", inplace=True)

# merge columns and drop temporary column
df['Information'] = df['Information'] + df['Information_add']
df.drop(['Information_add'], axis=1, inplace=True)

# cleanup column headers
df.columns = [name.strip() for name in df.columns]

# convert date to datetime format
df['Date'] = pd.to_datetime(df['Date'].str.title().str.strip(), format="%d %b %y", dayfirst=True)
df['Date out'] = pd.to_datetime(df['Date out'].str.title().str.strip(), format="%d %b %y", dayfirst=True)

df

Returns:

[Image: the resulting dataframe]
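
The amounts in the sample file, such as -3,535.61, contain thousands separators, so they cannot be passed to float directly. A minimal sketch of a conversion, assuming a comma is always a thousands separator and never a decimal mark; to_amount is a hypothetical helper and not part of the answer above.

```python
def to_amount(text):
    """Convert a string such as '-3,535.61' to a float (hypothetical helper)."""
    return float(text.replace(",", "").strip())

print(to_amount("-3,535.61"))  # -3535.61
print(to_amount("7,426.93"))   # 7426.93
```

Empty strings would raise ValueError here, so blank cells need to be handled before the helper is applied.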


