Python - splitting data as columns in csv file
You can create three separate lists, and then append to each using csv.reader.
import csv

c1 = []
c2 = []
c3 = []
with open('Half-life.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        # each row is a list of strings, one per column
        c1.append(row[0])
        c2.append(row[1])
        c3.append(row[2])
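If the whole file fits in memory, the same three lists can also be obtained by transposing the rows with zip. This is only a sketch, assuming every row has the same number of fields; the sample data is made up and read from an in-memory string rather than Half-life.csv:

```python
import csv
import io

# Hypothetical sample standing in for the contents of Half-life.csv
sample = "0,100,a\n1,50,b\n2,25,c\n"

reader = csv.reader(io.StringIO(sample), delimiter=',')
# zip(*rows) transposes rows into columns; each column comes back as a tuple
c1, c2, c3 = (list(col) for col in zip(*reader))

print(c1)
print(c2)
print(c3)
```

Note that zip stops at the shortest row, so ragged files would silently lose values with this variant.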
Splitting a column into 2 in a csv file using python
Your data have only one column and a tab delimiter:
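import pandas as pd

# quoting=1 is csv.QUOTE_ALL; the squeeze=True argument was removed in pandas 2.0,
# so the single column is turned into a Series with .squeeze("columns") instead
pd.read_csv('test.csv', quoting=1, header=None).squeeze("columns") \
    .str.split('\t', expand=True) \
    .to_csv('result.csv', index=False, header=False)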
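To see what the split step does on its own, here is a small sketch on an in-memory Series; the values are invented for illustration:

```python
import pandas as pd

# Hypothetical one-column data whose values contain a tab
s = pd.Series(["a\tb", "c\td"])

# expand=True returns a DataFrame with one column per piece
parts = s.str.split("\t", expand=True)
print(parts)
```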
Split columns in Csv file
Assuming you have a file like so:
123 A=2,B=asdjhf,C=jhdkfhskdf,D=1254
54878754 A=45786,D=asgfd,C=1234
and your file is not huge, you can append to the dataframe iteratively:
import pandas as pd

df = pd.DataFrame(columns=["sku", "A", "B", "C", "D"])
with open("data_mangled.csv") as f:
    for line in f:
        d = {}
        col1, col2 = line.split()
        d["sku"] = col1
        cols = col2.split(",")
        for item in cols:
            k, v = item.split("=")
            d[k] = v
        for col in df.columns:  # add potentially missing columns as None
            if col not in d:
                d[col] = None
        # DataFrame.append only exists in pandas < 2.0
        df = df.append(d, ignore_index=True)
print(df)
This also handles lines where some of the keys are missing from the second field or appear in a different order.
Output:
        sku      A       B           C      D
0       123      2  asdjhf  jhdkfhskdf   1254
1  54878754  45786    None        1234  asgfd
EDIT: For your specific data:
with open("data_real.txt") as f:
    # use the first line as column names in the dataframe
    # (strip the newline so it does not end up in the last column name)
    col_names = f.readline().rstrip("\n")
    df = pd.DataFrame(columns=col_names.split(","))
    print(col_names)
    for line in f:
        d = {}
        # lines have more than 2 columns, but the trailing values are empty
        # so the format is col1,large_col2,,,,,,,
        col1, *col2 = line.split(",")
        d["sku"] = col1
        for item in col2:
            try:
                if item.strip():  # disregard the empty trailing columns
                    k, v = item.split("=")
                    # we split on comma, so have occasional "
                    k = k.strip('"')
                    v = v.strip('"')
                    d[k] = v
            except ValueError:
                # there is a column value with missing key
                print("Could not assign to column:", d["sku"], item)
        for col in df.columns:
            if col not in d:
                d[col] = None
        # DataFrame.append only exists in pandas < 2.0
        df = df.append(d, ignore_index=True)
print(df)
df.to_csv("data_parsed.csv")  # save
One of the columns was not in the key=value format:
Could not assign to column: PRACLA16 16 months on less
Note: newer pandas versions warn that DataFrame.append is deprecated, and pandas 2.0 removed it. I chose to ignore this here; it can be solved by converting the dict to a DataFrame and concatenating the two DataFrames.
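One way to avoid append altogether is to collect one dict per line in a list and build the DataFrame once at the end. A minimal sketch using the two sample lines from the first example, read from a string instead of a file (the helper name parse_line is made up for this example):

```python
import io
import pandas as pd

# The two sample lines from the first example, held in memory
sample = (
    "123 A=2,B=asdjhf,C=jhdkfhskdf,D=1254\n"
    "54878754 A=45786,D=asgfd,C=1234\n"
)

def parse_line(line):
    """Turn 'sku K1=V1,K2=V2' into a dict (hypothetical helper)."""
    sku, pairs = line.split()
    row = {"sku": sku}
    for item in pairs.split(","):
        key, value = item.split("=")
        row[key] = value
    return row

rows = [parse_line(line) for line in io.StringIO(sample)]
# Keys missing from a row become NaN; `columns` fixes the column order
df = pd.DataFrame(rows, columns=["sku", "A", "B", "C", "D"])
print(df)
```

Building the frame once is also usually faster than growing it row by row, since each append or concat copies the existing data.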
Split CSV file in Python with semicolon separating the records
How about this:
with open("data.csv") as f:
    # strip() drops the trailing newline so it cannot form a bogus last record
    array = [l.split(",") for l in f.readline().strip().split(";") if l]
print(len(array))
print(array[1][0])
Output, where 3 is the number of lists within the array and each list has 16 values:
3
20210402
The above allows for:
"Just looking to be able to address as array[r][c] where r is 0 to 287 and c is 0 to 16."
I've assumed that your data is one long continuous string, as shown in your question.
If you feel like it, this can be easily dumped to a pandas DataFrame and then to a proper .csv file:
import pandas as pd

with open("data.csv") as f:
    array = [l.split(",") for l in f.readline().strip().split(";") if l]
pd.DataFrame(array).to_csv("your_array.csv", header=False, index=False)
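The same idea on an in-memory string, so it can be tried without a file. The sample records are invented and much shorter than the real data:

```python
# Hypothetical data: records separated by ';', fields by ','
raw = "20210401,1,2;20210402,3,4;20210403,5,6;\n"

# strip() drops the trailing newline; `if rec` skips the empty piece after the last ';'
array = [rec.split(",") for rec in raw.strip().split(";") if rec]

print(len(array))
print(array[1][0])
```

Every value stays a string here; convert with int() or float() where numbers are needed.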
How can I split a csv file into two files based on values in a column in Python?
Is this what you are looking for?
import pandas

data = pandas.read_csv('mycsv.csv')
# just make sure to remove the quotation marks if the numbers are not strings
csv2 = data[data['item_number'].isin(['5678', '6789'])]
csv1 = data[data['item_number'].isin(['1234', '2345'])]
csv1.to_csv('csv1.csv', index=False)
csv2.to_csv('csv2.csv', index=False)
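For files too large to load at once, a similar split can be done row by row with the standard csv module instead of pandas. This is a sketch of an alternative, not the approach above; the column name qty and the sample rows are invented, and in-memory buffers stand in for the real files:

```python
import csv
import io

# Hypothetical input standing in for mycsv.csv
source = io.StringIO("item_number,qty\n1234,1\n5678,2\n2345,3\n6789,4\n")
# Values are strings because the csv module does no type conversion
first_group = {"1234", "2345"}

out1, out2 = io.StringIO(), io.StringIO()
reader = csv.DictReader(source)
writer1 = csv.DictWriter(out1, fieldnames=reader.fieldnames, lineterminator="\n")
writer2 = csv.DictWriter(out2, fieldnames=reader.fieldnames, lineterminator="\n")
writer1.writeheader()
writer2.writeheader()

for row in reader:
    # Rows in first_group go to the first output, everything else to the second
    target = writer1 if row["item_number"] in first_group else writer2
    target.writerow(row)

print(out1.getvalue())
print(out2.getvalue())
```

With real files, the StringIO objects would be replaced by open(..., newline='') handles.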
Need to split one Columns data into different columns in Pandas Data frame
I believe the idea of the code below is quite clear. First we need to convert the data in the file to valid CSV (comma-separated values) format. After that we can create the DataFrame.
'data.csv' file content
"Date InformationIdNo."," Date out ","Dr."," Cr."
"01 FEB 21 Mart Purchase MATRSC203255H","30 DEC 21","-3,535.61","0"
"250 - PQRT14225","","",""
"01 FEB 21 Cash Sales CCTR220307AXCDV","30 DEC 21","-34.33","0"
"20000 - DEFG12","","",""
"01 FEB 21 TransferFT22032FQWE3"," 01 FEB 21","0","7,426.93"
"","","",""
"","","",""
"","","",""
"","","",""
"","","",""
A possible (quick) solution is the following:
#pip install pandas
import re
import pandas as pd
from io import StringIO
with open("data.csv", "r", encoding='utf-8') as file:
    raw_data = file.read()
# convert txt to valid csv (comma separated values) format
raw_data = raw_data.replace(' - ', '-')
raw_data = raw_data.replace('Date InformationIdNo.', 'Date","Information","IdNo.')
raw_data = raw_data.replace('" Cr."', '"Cr","Information_add"')
raw_data = re.sub(r'(\d{2} [A-Z]{3} \d{2})', r'\1","', raw_data)
raw_data = re.sub('\n"([A-Z0-9-]+)","","",""\n', r',"\1"\n', raw_data)
raw_data = re.sub(r',""{2,}', '', raw_data)
raw_data = re.sub('([A-Z0-9]{3,}",")', r'","\1","', raw_data)
raw_data = re.sub(',""+', r'', raw_data)
raw_data = re.sub('\n""+', r'', raw_data)
# create dataframe and replace NaN with ""
df = pd.read_csv(StringIO(raw_data), sep=",")
df.fillna("", inplace=True)
# merge columns and drop temporary column
df['Information'] = df['Information'] + df['Information_add']
df.drop(['Information_add'], axis=1, inplace=True)
# cleanup column headers
df.columns = [name.strip() for name in df.columns]
# convert date to datetime format
df['Date'] = pd.to_datetime(df['Date'].str.title().str.strip(), format="%d %b %y", dayfirst=True)
df['Date out'] = pd.to_datetime(df['Date out'].str.title().str.strip(), format="%d %b %y", dayfirst=True)
df