Reading a CSV File Organized Horizontally

Let's say your file is called 'data.csv' and it contains:

var1,1,2,3,4,5,6
var2,2.1,3.9,4.6,5.2,6.1
var3,M,F,M,F,M,M

Note that var1 and var3 have 6 values but var2 has only 5, so the rows must be padded to a common length.
The idea is to read the lines, pad and transpose them, and then pass the result to read.csv.

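read.tcsv = function(file, header=TRUE, sep=",", ...) {

  # The longest row determines how many fields every row is padded to
  n = max(count.fields(file, sep=sep), na.rm=TRUE)
  x = readLines(file)

  # Split one line into fields and pad it with NA up to length n
  .splitvar = function(x, sep, n) {
    var = unlist(strsplit(x, split=sep))
    length(var) = n
    return(var)
  }

  # Each input line becomes a column; pasting the matrix rows gives transposed CSV text
  x = do.call(cbind, lapply(x, .splitvar, sep=sep, n=n))
  x = apply(x, 1, paste, collapse=sep)
  out = read.csv(text=x, sep=sep, header=header, ...)
  return(out)

}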

Then, you can do:

read.tcsv("data.csv")

  var1 var2 var3
1    1  2.1    M
2    2  3.9    F
3    3  4.6    M
4    4  5.2    F
5    5  6.1    M
6    6   NA    M
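
For comparison, the same read, pad and transpose idea can be sketched in Python using only the standard library. This is a rough equivalent under stated assumptions, not a port of the R function: the name read_tcsv is hypothetical, values stay as strings, and None stands in for R's NA.

```python
import csv
import io
from itertools import zip_longest


def read_tcsv(source, sep=","):
    """Transpose a row-per-variable CSV into (header, records).

    Short rows are padded with None; values are left as strings,
    so any type conversion is up to the caller.
    """
    rows = [row for row in csv.reader(source, delimiter=sep) if row]
    # zip_longest transposes the rows and pads the short ones
    header, *records = zip_longest(*rows, fillvalue=None)
    return list(header), [list(r) for r in records]


sample = io.StringIO(
    "var1,1,2,3,4,5,6\n"
    "var2,2.1,3.9,4.6,5.2,6.1\n"
    "var3,M,F,M,F,M,M\n"
)
header, records = read_tcsv(sample)
print(header)       # ['var1', 'var2', 'var3']
print(records[-1])  # ['6', None, 'M']
```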

Read in CSV files horizontally using csvreader in python

I know you are trying to use the csv library, but this is easy with pandas and the transpose operation.

import pandas as pd

df = pd.read_csv('horizontal.csv', index_col=0)

>>> df
           John      Susan
Name
Date  3/14/2019  3/14/2019
Job      Doctor    Cashier

>>> df.T
Name        Date      Job
John   3/14/2019   Doctor
Susan  3/14/2019  Cashier
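
If you would rather stay with the csv library, a minimal sketch of the same transpose is shown here. The file contents are an assumption reconstructed from the output above, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Assumed contents of horizontal.csv, reconstructed from the pandas output
horizontal = io.StringIO(
    "Name,John,Susan\n"
    "Date,3/14/2019,3/14/2019\n"
    "Job,Doctor,Cashier\n"
)

rows = list(csv.reader(horizontal))
# zip(*rows) turns each input column into one output row
header, *records = zip(*rows)
people = [dict(zip(header, record)) for record in records]

for person in people:
    print(person)
```

Plain zip is enough here because every row has the same number of fields; with ragged rows, itertools.zip_longest would be the safer choice.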

Poorly organized CSV file

from io import StringIO
import pandas as pd

data = """
TIME,HDRA-1,HDRA-2,HDRA-3,HDRA-4
0.473934934,0.944026678,0.460177668,0.157028404,0.221362174
0.911384892,0.336694914,0.586014563,0.828339071,0.632790473
0.772652589,0.318146985,0.162987171,0.555896202,0.659099194
0.541382917,0.033706768,0.229596419,0.388057901,0.465507295
0.462815443,0.088206108,0.717132904,0.545779038,0.268174922
0.522861489,0.736462083,0.532785319,0.961993893,0.393424116

TIME,HDRB-1,HDRB-2,HDRB-3,HDRB-4
0.92264286,0.026312552,0.905839375,0.869477136,0.985560264
0.410573341,0.004825381,0.920616162,0.19473237,0.848603523
0.999293171,0.259955029,0.380094352,0.101050014,0.428047493
0.820216119,0.655118219,0.586754951,0.568492346,0.017038336
0.040384337,0.195101879,0.778631044,0.655215972,0.701596844

TIME,HDRB-1,HDRB-2,HDRB-3,HDRB-4
0.342418934,0.290979228,0.84201758,0.690964176,0.927385229
0.173485057,0.214049903,0.27438753,0.433904377,0.821778689
0.982816721,0.094490904,0.105895645,0.894103833,0.34362529
0.738593272,0.423470984,0.343551191,0.192169774,0.907698897
"""

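df = pd.read_csv(StringIO(data), header=None)

# Every row whose first cell is 'TIME' starts a new block
start_marker = 'TIME'
grouper = (df.iloc[:, 0] == start_marker).cumsum()
groups = df.groupby(grouper)

# Promote the first row of each block to its column headers
frames = [gr.T.set_index(gr.index[0]).T for _, gr in groups]

frames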
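
The grouping step relies on a cumulative sum over a boolean mask: each header row adds 1, so all rows up to the next header share the same block number. A small sketch with a made-up first column (the values are illustrative, not taken from the data above):

```python
import pandas as pd

# Illustrative first column: three stacked blocks, each starting with 'TIME'
first_column = pd.Series(["TIME", "0.47", "0.91", "TIME", "0.92", "TIME", "0.34"])

is_header = first_column == "TIME"
# True counts as 1, so the running total increases at every header row
block_id = is_header.cumsum()

print(block_id.tolist())  # [1, 1, 1, 2, 2, 3, 3]
```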

Parsing horizontally aligned text file to DataFrame

Is this what you're looking for?

I've done this without pandas, but the output, based on the file sample from the question, resembles the one you expect.

import csv
import itertools
from collections import defaultdict

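with open("sample.txt") as f:
    lines = f.readlines()

# Skip the '//' separators; strip first, because readlines() keeps the trailing newline
lines = [l.strip().split(' ') for l in lines if l.strip() != '//']

data = defaultdict(list)

for line in lines:
    key, values, = line
    # Drop the ';' separators inside each value
    data[key].append(''.join([v for v in values.split(";")]))

with open("test.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(data.keys())
    # zip_longest pads the shorter columns so every row has the same width
    writer.writerows(itertools.zip_longest(*data.values()))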

EDIT: with pandas

import pandas as pd

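codes = [
    'ID', 'AC', 'AS', 'SY', 'DR', 'RX', 'WW', 'CC', 'ST', 'DI', 'OX', 'HI',
    'OI', 'SX', 'AG', 'CA', 'DT',
]

with open("sample.txt") as f:
    lines = f.readlines()

# Skip the '//' separators; strip first, because readlines() keeps the trailing newline
lines = [l.strip().split(' ') for l in lines if l.strip() != '//']
print(lines)

# Pre-seed every known code so each one becomes a column, even if it stays empty
data = {c: [] for c in codes}

for line in lines:
    key, values, = line
    data[key].append(''.join([v for v in values.split(";")]))

# orient='index' tolerates lists of unequal length; transpose turns the codes into columns
df = pd.DataFrame.from_dict(data, orient='index').transpose()
df.to_csv("test_2.csv", index=False)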
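
The from_dict(..., orient='index').transpose() step is there because the lists collected per code can have different lengths. A minimal sketch with hypothetical values (the identifiers below are made up for illustration):

```python
import pandas as pd

# Hypothetical parsed records: the codes collected different numbers of values
data = {"ID": ["cell-1", "cell-2", "cell-3"], "AC": ["A1"]}

# pd.DataFrame(data) would raise ValueError here because the lists differ in length.
# Building the frame row-wise pads the short list with missing values,
# and transposing restores one column per code.
df = pd.DataFrame.from_dict(data, orient="index").transpose()
print(df)
```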

How to vertically align comma separated values in Notepad++?

You can use the TextFX plugin:
Edit > Line Up multiple lines by ...

Note: This doesn't work if the file is read-only.

http://tomaslind.net/2016/02/18/how-to-align-columns-in-notepad/

Update 2019: Download link from SourceForge


