Reading a CSV file organized horizontally
Let's say your file is called 'data.csv' and it contains:
var1,1,2,3,4,5,6
var2,2.1,3.9,4.6,5.2,6.1
var3,M,F,M,F,M,M
Note that var1 and var3 have 6 values but var2 has only 5.
So the idea is to read the data, transpose it, and then use read.csv.
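read.tcsv = function(file, header=TRUE, sep=",", ...) {
  # The longest row decides how many fields every variable is padded to
  n = max(count.fields(file, sep=sep), na.rm=TRUE)
  x = readLines(file)

  # Split one line into fields and pad it with NA up to length n
  # (fixed=TRUE so the separator is not interpreted as a regex)
  .splitvar = function(x, sep, n) {
    var = unlist(strsplit(x, split=sep, fixed=TRUE))
    length(var) = n
    return(var)
  }

  # One column per input line, then paste each row back into a CSV line
  x = do.call(cbind, lapply(x, .splitvar, sep=sep, n=n))
  x = apply(x, 1, paste, collapse=sep)
  out = read.csv(text=x, sep=sep, header=header, ...)
  return(out)
}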
Then, you can do:
read.tcsv("data.csv")
  var1 var2 var3
1    1  2.1    M
2    2  3.9    F
3    3  4.6    M
4    4  5.2    F
5    5  6.1    M
6    6   NA    M
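For comparison, here is a minimal Python sketch of the same pad, transpose, and re-parse idea. It is not part of the original answer: the function name read_tcsv and the in-memory sample string are assumptions for illustration, and it uses itertools.zip_longest for the padding step.

```python
import csv
from io import StringIO
from itertools import zip_longest

import pandas as pd


def read_tcsv(text, sep=","):
    """Parse CSV text laid out as one variable per line: name,value,value,..."""
    # Read each line as a list of fields, skipping blank lines
    rows = [r for r in csv.reader(StringIO(text), delimiter=sep) if r]
    # zip_longest transposes the rows and pads the short ones with ""
    transposed = zip_longest(*rows, fillvalue="")
    # Write the transposed rows back out as ordinary CSV text
    buf = StringIO()
    csv.writer(buf, delimiter=sep).writerows(transposed)
    buf.seek(0)
    # Let pandas infer column types; empty fields become NaN
    return pd.read_csv(buf, sep=sep)


# Same content as the data.csv example above, held in memory
sample = "var1,1,2,3,4,5,6\nvar2,2.1,3.9,4.6,5.2,6.1\nvar3,M,F,M,F,M,M\n"
df = read_tcsv(sample)
print(df)
```

As in the R version, the short variable (var2) ends up with a missing value in its last row.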
Read in CSV files horizontally using csvreader in python
I know you are trying to use the csv library, but this is easy with pandas and the transpose operation.
import pandas as pd
df = pd.read_csv('horizontal.csv', index_col=0)
>>> df
           John      Susan
Name
Date  3/14/2019  3/14/2019
Job      Doctor    Cashier
>>> df.T
Name        Date      Job
John   3/14/2019   Doctor
Susan  3/14/2019  Cashier
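If you want to try this without a file on disk, something like the following should work. The contents of horizontal.csv are an assumption here, reconstructed from the output above, and the variable name people is only for illustration.

```python
from io import StringIO

import pandas as pd

# Assumed contents of horizontal.csv, inferred from the printed output
raw = """Name,John,Susan
Date,3/14/2019,3/14/2019
Job,Doctor,Cashier
"""

# The first column holds the field names, so use it as the index
df = pd.read_csv(StringIO(raw), index_col=0)

# Transpose so each person becomes a row
people = df.T
# After transposing, "Name" labels the columns axis; move it to the index
people.index.name = "Name"
people.columns.name = None
# Turn the names into a regular column
people = people.reset_index()
print(people)
```

The reset_index step is optional; it only matters if you want Name as an ordinary column rather than as the index.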
'Poorly' organized csv file
from io import StringIO
import pandas as pd
data = """
TIME,HDRA-1,HDRA-2,HDRA-3,HDRA-4
0.473934934,0.944026678,0.460177668,0.157028404,0.221362174
0.911384892,0.336694914,0.586014563,0.828339071,0.632790473
0.772652589,0.318146985,0.162987171,0.555896202,0.659099194
0.541382917,0.033706768,0.229596419,0.388057901,0.465507295
0.462815443,0.088206108,0.717132904,0.545779038,0.268174922
0.522861489,0.736462083,0.532785319,0.961993893,0.393424116
TIME,HDRB-1,HDRB-2,HDRB-3,HDRB-4
0.92264286,0.026312552,0.905839375,0.869477136,0.985560264
0.410573341,0.004825381,0.920616162,0.19473237,0.848603523
0.999293171,0.259955029,0.380094352,0.101050014,0.428047493
0.820216119,0.655118219,0.586754951,0.568492346,0.017038336
0.040384337,0.195101879,0.778631044,0.655215972,0.701596844
TIME,HDRB-1,HDRB-2,HDRB-3,HDRB-4
0.342418934,0.290979228,0.84201758,0.690964176,0.927385229
0.173485057,0.214049903,0.27438753,0.433904377,0.821778689
0.982816721,0.094490904,0.105895645,0.894103833,0.34362529
0.738593272,0.423470984,0.343551191,0.192169774,0.907698897
"""
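# Read everything as plain rows; the repeated header lines become ordinary rows
df = pd.read_csv(StringIO(data), header=None)
start_marker = 'TIME'
# Each row that starts with the marker opens a new block
grouper = (df.iloc[:, 0] == start_marker).cumsum()
groups = df.groupby(grouper)
# For each block, promote its first row to the column names
frames = [gr.T.set_index(gr.index[0]).T for _, gr in groups]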
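To make the cumsum trick easier to follow, here is a smaller sketch of the same block-splitting idea, written as an explicit loop. The tiny data set and the names block_id, header, and body are made up for illustration, and the conversion to float at the end is an extra step that assumes every value is numeric.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature version of the file: two blocks, each with its own header
raw = """TIME,A-1,A-2
0.1,1.0,2.0
0.2,3.0,4.0
TIME,B-1,B-2
0.1,5.0,6.0
"""

df = pd.read_csv(StringIO(raw), header=None)

# True on header rows; the running sum gives every row its block number
block_id = (df.iloc[:, 0] == "TIME").cumsum()
print(block_id.tolist())

frames = []
for _, block in df.groupby(block_id):
    header = block.iloc[0].tolist()               # first row of the block
    body = block.iloc[1:].reset_index(drop=True)  # remaining rows are data
    body.columns = header
    frames.append(body.astype(float))             # values were read as strings

for frame in frames:
    print(frame)
```

Each element of frames is then an ordinary DataFrame with its own column names, so you can process the blocks separately or combine them however suits your data.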
Parsing horizontally aligned text file to DataFrame
Is this what you're looking for?
I've done this without pandas, but the output, based on the file sample from the question, resembles the one you expect.
import csv
import itertools
from collections import defaultdict

with open("sample.txt") as f:
    lines = f.readlines()

# Drop the '//' separator lines; strip first so the trailing newline
# does not defeat the comparison
lines = [l.strip().split(' ') for l in lines if l.strip() != '//']

data = defaultdict(list)
for line in lines:
    key, values, = line
    # Remove the ';' separators from the value
    data[key].append(''.join([v for v in values.split(";")]))

with open("test.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(data.keys())
    # zip_longest pads the shorter columns with None
    writer.writerows(itertools.zip_longest(*data.values()))
EDIT: with pandas
import pandas as pd

codes = [
    'ID', 'AC', 'AS', 'SY', 'DR', 'RX', 'WW', 'CC', 'ST', 'DI', 'OX', 'HI',
    'OI', 'SX', 'AG', 'CA', 'DT',
]

with open("sample.txt") as f:
    lines = f.readlines()

# Drop the '//' separator lines; strip first so the trailing newline
# does not defeat the comparison
lines = [l.strip().split(' ') for l in lines if l.strip() != '//']
print(lines)

data = {c: [] for c in codes}
for line in lines:
    key, values, = line
    data[key].append(''.join([v for v in values.split(";")]))

# Columns can differ in length; orient='index' pads the short ones
df = pd.DataFrame.from_dict(data, orient='index').transpose()
df.to_csv("test_2.csv", index=False)
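Since sample.txt is not shown here, below is a self-contained sketch of the same approach that runs on an in-memory sample. The sample layout (a two-letter code, whitespace, then the value, with a line containing only // closing each record) is an assumption, and parse_records is a hypothetical helper name, so adjust both to your actual file.

```python
from collections import defaultdict
from io import StringIO

import pandas as pd

# Hypothetical sample in the assumed layout: code, whitespace, value
sample = """ID   REC1
AC   A001;
SY   alpha; beta;
//
ID   REC2
AC   A002;
//
"""


def parse_records(handle):
    """Collect the values of each code into its own column."""
    columns = defaultdict(list)
    for line in handle:
        line = line.strip()
        # Skip blank lines and the '//' record separators
        if not line or line == "//":
            continue
        # Split once on whitespace: the code first, the rest is the value
        key, value = line.split(None, 1)
        columns[key].append(value.replace(";", ""))
    # Wrapping each list in a Series pads the shorter columns with NaN
    return pd.DataFrame({k: pd.Series(v) for k, v in columns.items()})


df = parse_records(StringIO(sample))
print(df)
```

Note that, like the code above, this stacks values column by column, so if a record is missing a code, later values in that column move up a row. If that matters for your data, you would need to start a new row at each // instead.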
How to vertically align comma separated values in Notepad++?
You can use the TextFX plugin:
Edit > Line Up multiple lines by ...
Note: This doesn't work if the file is read-only.
http://tomaslind.net/2016/02/18/how-to-align-columns-in-notepad/
Update 2019: Download link from SourceForge