Parsing a structured text file in Perl
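Reading the file line by line is a good approach. The code below builds a hash of array references: each line ending in a colon becomes a key, and the lines that follow are pushed onto that key's array. This handles a single file. To process several files, read each one the same way and store its hash of arrays under the file name, giving a hash of hashes of arrays.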
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %contents;
my $key;
while(<DATA>){
chomp;
if ( s/:\s*$// ) {
$key = $_;
} else {
s/^\s+//; # remove leading whitespace
push @{$contents{$key}}, $_;
}
}
print Dumper \%contents;
__DATA__
name:
John Smith
occupation:
Electrician
date of birth:
2/6/1961
hobbies:
Boating
Camping
Fishing
Output:
$VAR1 = {
'occupation' => [
'Electrician'
],
'hobbies' => [
'Boating',
'Camping',
'Fishing'
],
'name' => [
'John Smith'
],
'date of birth' => [
'2/6/1961'
]
};
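The multi-file idea mentioned above can be sketched as follows. This is a minimal sketch, not the original author's code: the `parse_records` helper, the file names, and the in-memory strings standing in for real files are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse one "key:" / values file from an open
# filehandle into a hash of array references.
sub parse_records {
    my ($fh) = @_;
    my (%contents, $key);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;           # skip blank lines
        if ($line =~ s/:\s*$//) {            # "name:" starts a new key
            $key = $line;
        }
        elsif (defined $key) {               # ignore values before any key
            $line =~ s/^\s+//;               # remove leading whitespace
            push @{ $contents{$key} }, $line;
        }
    }
    return \%contents;
}

# Assumed inputs: in-memory strings stand in for real files here.
my %sources = (
    'john.txt' => "name:\n  John Smith\nhobbies:\n  Boating\n  Camping\n",
    'jane.txt' => "name:\n  Jane Doe\nhobbies:\n  Chess\n",
);

# Hash of hashes of arrays: file name => key => [values].
my %by_file;
for my $name (sort keys %sources) {
    open my $fh, '<', \$sources{$name} or die "Can't open $name: $!";
    $by_file{$name} = parse_records($fh);
    close $fh;
}

print "$by_file{'john.txt'}{hobbies}[1]\n";  # second hobby from john.txt
```

With real files, replace the `\$sources{$name}` argument to `open` with the file name itself.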
Parsing of Text File using Perl
This Perl program does what you ask. It allows for any number of fields for each parameter (although there must be the same number of fields for every parameter) and takes the header labels for the fields from the data itself.
use strict;
use warnings;
my $file = 'sample.txt';
open my $fh, '<', $file or die qq{Can't open "$file" for input: $!};
my %data;
my @params;
my @fields;
while (<$fh>) {
next unless /\S/;
chomp;
my ($key, $val) = split /\s*:\s*/;
if (defined $val and $val =~ /\S/) { # $val is undef on "Parameter N:" lines
push @fields, $key if @params == 1;
push @{ $data{$params[-1]} }, $val if @params;
}
else {
die qq{Unexpected parameter format "$key"} unless $key =~ /parameter\s+(\d+)/i;
push @params, $1;
}
}
my @headers = ('Parameter', @fields);
my @widths = map length, @headers;
my $format = join(' ', map "%${_}s", @widths) . "\n";
printf $format, @headers;
for my $param (@params) {
printf $format, $param, @{ $data{$param} };
}
Output:
Parameter Field 1 Field 2 Field 3
        0     100       0       4
        1     873      23      89
How to parse a text file to csv file using Perl
The key part here is how to manipulate your data so as to extract what needs to be printed on each line. After that, you are best off using a module to produce valid CSV, and Text::CSV is very good.
A program using an array of small hashrefs, mimicking the data in the question:
use strict;
use warnings;
use feature 'say';
use Text::CSV;
my @data = (
{ name => 'A', age => 1, weight => 10 },
{ name => 'B', age => 2, weight => 20 },
);
my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });
my $outfile = 'test.csv';
open my $ofh, '>', $outfile or die "Can't open $outfile: $!";
# Header, also used below for order of values for fields
my @hdr = qw(name age weight);
$csv->say($ofh, \@hdr);
foreach my $href (@data) {
$csv->say($ofh, [ @{$href}{@hdr} ]);
}
The values from each hashref are extracted in the desired order using a hashref slice, @{$href}{@hdr}, which in general is
@{ expression returning hash reference } { list of keys }
This returns a list of values for the given list of keys, from the hashref that the expression in the block {} must return. That list is then used to build an arrayref (an anonymous array here, using []), which the module's say method needs in order to make and print a string of comma-separated values† from that list of values.
Note the block that evaluates to a hash reference, used in place of the hash name that would appear in a slice of a named hash. This follows a general rule:
Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type.
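To make the rule concrete, here is a small sketch that takes the same slice three ways. The `%row` hash and the `first_record` sub are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

my @hdr  = qw(name age weight);
my $href = { name => 'A', age => 1, weight => 10 };

# Hypothetical helper: any expression that returns a hashref will do.
sub first_record { return $href }

# Slice of a named hash: the identifier is "row".
my %row   = %$href;
my @named = @row{@hdr};

# The same slice with the identifier replaced by a block
# that returns a hash reference.
my @from_ref  = @{ $href }{@hdr};
my @from_call = @{ first_record() }{@hdr};

print join(',', @named),     "\n";
print join(',', @from_ref),  "\n";
print join(',', @from_call), "\n";
```

All three lines print `A,1,10`; only the way the hash is reached differs.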
Some further comments
Look over the constructor's supported attributes; there are many goodies.
For very simple data you can simply join the fields with a comma and print:
say $ofh join ',', @{$href}{@hdr};
But it is far safer to use a module to construct a valid CSV record. With the right choice of attributes in the constructor it can handle whatever is legal to embed in fields (some of which can take quite a bit of work to do correctly by hand), and it calls out things which aren't.
I list column names explicitly. Instead, you can fetch the keys and then sort them in a desired order, but this will again need a hard-coded list for sorting.
The program creates the file test.csv and prints to it the expected header and data lines.
† But separating those "values" with commas may involve a whole lot more than what the acronym "CSV" suggests. A variety of things may come between those commas, including commas, newlines, and whatnot. This is why one is best advised to always use a library. Looking over the constructor's options is informative.
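As an illustration of the footnote, the sketch below round-trips a record whose fields contain a comma and double quotes. The sample field values are made up for this example; combine, string, parse and fields are regular Text::CSV methods.

```perl
use strict;
use warnings;
use Text::CSV;

# Made-up fields: one holds a comma, one holds double quotes.
my @fields = ('A', 'B, C', 'say "hi"');

# Naive approach: join on commas, then split on commas to read it back.
my $naive      = join ',', @fields;
my @naive_back = split /,/, $naive;

# Module approach: quoting and escaping are handled for us.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });
$csv->combine(@fields) or die "combine failed";
my $line = $csv->string;
$csv->parse($line) or die "parse failed";
my @back = $csv->fields;

printf "naive: %d fields, Text::CSV: %d fields\n",
    scalar @naive_back, scalar @back;
```

The naive round trip yields four fields because the embedded comma is taken as a separator, while the module returns the original three.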
The following commentary referred to the initial version of the question. In the meantime the problems it addresses were corrected in the OP's code and the question was updated. I'm leaving this text in place for its general comments, which can still be useful.
As for the code in the question and its output, there is almost certainly an issue with how the data is processed to produce @data, judging by the presence of keys HASH(address) in the output.
That string HASH(0x...) is output when one prints a variable which is a hash reference (which cannot show any of the hash's content). Perl handles such a print by stringifying the reference in that way, that is, by producing a printable string out of something more complex.
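A minimal sketch of how such keys can arise, assuming a hash reference is accidentally used as a hash key (the variable names are hypothetical):

```perl
use strict;
use warnings;

my $rec = { name => 'A' };

my %seen;
$seen{$rec} = 1;            # the reference is stringified to "HASH(0x...)"

my ($key) = keys %seen;
print "$key\n";             # prints something like HASH(0x55d0c8a1b2c0)

# The key is now a plain string; the original hash cannot be reached from it.
print ref($key) ? "still a reference\n" : "plain string\n";
```

The address printed differs from run to run, but the second line is always `plain string`.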
There is no good reason to have a hash reference for a hash key. So I'd suggest that you review your data and its processing and see how that comes about. (Or briefly show this, or post another question with it if it isn't feasible to add that to this one.)
One measure you can use to bypass that is to use only a list of keys that you know are valid, as I show above; however, you may then be leaving some outright error unhandled. So I'd rather suggest finding what is wrong.
Parsing file in Perl and store the data in Hash
You are creating these variables anew for each line of the file:
$e_id, $start, $end, $priority, $node
They can't be scoped to a loop that repeats for every line of the file if you want to access the values when processing later lines.
Furthermore, you assign to the fields of the record for each line of the file, including before you have even populated $e_id. You don't want to assign to every field for each line of the file, and you need to wait until you've read an entire record before assigning to $hash{$e_id}.
My solution:
my %field_map = (
'startTime' => 'start',
'endTime' => 'end',
'Node' => 'node',
'Priority' => 'priority',
);
my %recs;
my $id;
my $rec = { };
while (1) {
$_ = <DATA>;
# If end of file or end of record.
if (!defined($_) || $_ =~ /^$/) {
$recs{$id} = $rec if defined($id);
# If end of file.
last if !defined($_);
# Start a new record.
$id = undef;
$rec = { };
next;
}
chomp;
my ($key, $val) = split(/\s*:\s*/, $_, 2);
if ( $key eq 'eventId' ) {
$id = $val;
}
elsif ( $field_map{$key} ) {
$rec->{ $field_map{$key} } = $val;
}
}
Parse text file in Perl and get a specific string
Try using a regular expression:
my $variable;
if ($line =~ /TEST_SEQUENCE=(\w+)/){
$variable = $1;
}
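A self-contained sketch of the same match, run over two made-up input lines (only the TEST_SEQUENCE key comes from the question; the other content is assumed):

```perl
use strict;
use warnings;

# Made-up input lines for illustration.
my @lines = (
    'MODE=fast TEST_SEQUENCE=smoke_01 RETRIES=3',
    'no sequence on this line',
);

my @found;
for my $line (@lines) {
    # \w+ captures letters, digits and underscores, and stops
    # at the first character outside that set.
    if ($line =~ /TEST_SEQUENCE=(\w+)/) {
        push @found, $1;
        print "found: $1\n";
    }
    else {
        print "no match\n";
    }
}
```

If the value may contain other characters such as `-` or `.`, the `\w+` class would need widening, for example to `[\w.-]+`.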
Perl script -- Multiple text file parsing and writing
The important points in this suggestion are:
- the "magic" diamond operator (a.k.a. readline), which reads from each file in *ARGV,
- the eof function, which tells if the next readline on the current filehandle will return any data,
- the $ARGV variable, which contains the name of the currently opened file.
With that intro, here we go!
#!/usr/bin/perl
use strict; # Always!
use warnings; # Always!
my $header = 1; # Flag to tell us to print the header
while (<>) { # read a line from a file
if ($header) {
# This is the first line, print the name of the file
print "========= $ARGV ========\n";
# reset the flag to a false value
$header = undef;
}
# Print out what we just read in
print;
}
continue { # This happens before the next iteration of the loop
# Check if we finished the previous file
$header = 1 if eof;
}
To use it, just do: perl concat.pl *.txt > compiled.TXT