Parsing a file in Perl

Parsing a structured text file in Perl

Reading the file line by line is a good way to go. Here I create a hash of array references; this is how you would read a single file. To handle several files, you could read each one this way and store each hash of arrays in a hash of hashes of arrays.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my %contents;
my $key;

while (<DATA>) {
    chomp;
    if ( s/:\s*$// ) {
        $key = $_;
    }
    else {
        s/^\s+//;    # remove leading whitespace
        push @{ $contents{$key} }, $_;
    }
}

print Dumper \%contents;

__DATA__
name:
John Smith
occupation:
Electrician
date of birth:
2/6/1961
hobbies:
Boating
Camping
Fishing

Output:

$VAR1 = {
          'occupation' => [
                            'Electrician'
                          ],
          'hobbies' => [
                         'Boating',
                         'Camping',
                         'Fishing'
                       ],
          'name' => [
                      'John Smith'
                    ],
          'date of birth' => [
                               '2/6/1961'
                             ]
        };
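The hash-of-hashes-of-arrays variant mentioned above might look like the following sketch. The file names and in-memory data here are assumptions for illustration; in a real program each inner array would come from reading an actual file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: parse one file's lines into a hash of arrays,
# using the same "key:" / value layout as above.
sub parse_lines {
    my @lines = @_;
    my ( %contents, $key );
    for (@lines) {
        chomp;
        if ( s/:\s*$// ) { $key = $_ }
        else             { s/^\s+//; push @{ $contents{$key} }, $_ }
    }
    return \%contents;
}

# Outer hash keyed by file name (simulated here with in-memory lines).
my %files = (
    'john.txt' => [ "name:\n", "John Smith\n" ],
    'jane.txt' => [ "name:\n", "Jane Doe\n" ],
);

my %all;
$all{$_} = parse_lines( @{ $files{$_} } ) for keys %files;

print Dumper \%all;
```

Each value of %all is the same hash-of-arrays structure as before, now reachable by file name, e.g. $all{'john.txt'}{name}[0].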

Parsing of Text File using Perl

This Perl program does what you ask. It allows for any number of fields for each parameter (although there must be the same number of fields for every parameter) and takes the header labels for the fields from the data itself.

use strict;
use warnings;

my $file = 'sample.txt';

open my $fh, '<', $file or die qq{Can't open "$file" for input: $!};

my %data;
my @params;
my @fields;

while (<$fh>) {
    next unless /\S/;
    chomp;

    my ($key, $val) = split /\s*:\s*/;

    if ( defined $val and $val =~ /\S/ ) {
        push @fields, $key if @params == 1;
        push @{ $data{$params[-1]} }, $val if @params;
    }
    else {
        die qq{Unexpected parameter format "$key"} unless $key =~ /parameter\s+(\d+)/i;
        push @params, $1;
    }
}

my @headers = ('Parameter', @fields);
my @widths  = map length, @headers;
my $format  = join(' ', map "%${_}s", @widths) . "\n";

printf $format, @headers;

for my $param (@params) {
    printf $format, $param, @{ $data{$param} };
}

Output:

Parameter Field 1 Field 2 Field 3
        0     100       0       4
        1     873      23      89
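The sample.txt file itself is not shown in the answer; a layout consistent with the parsing logic and the output above (an assumption, reconstructed from the code) would be:

```
Parameter 0:
Field 1: 100
Field 2: 0
Field 3: 4

Parameter 1:
Field 1: 873
Field 2: 23
Field 3: 89
```

Lines ending in a bare colon after "Parameter N" start a new parameter block, while "label: value" lines supply the fields; the field labels from the first block become the column headers.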

How to parse a text file to csv file using Perl

The key part here is manipulating your data to extract what needs to be printed for each line. Then you are best off using a module to produce valid CSV, and Text::CSV is very good.

A program using an array of small hashrefs, mimicking data in the question

use strict;
use warnings;
use feature 'say';

use Text::CSV;

my @data = (
    { name => 'A', age => 1, weight => 10 },
    { name => 'B', age => 2, weight => 20 },
);

my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });

my $outfile = 'test.csv';
open my $ofh, '>', $outfile or die "Can't open $outfile: $!";

# Header, also used below for the order of the fields' values
my @hdr = qw(name age weight);
$csv->say($ofh, \@hdr);

foreach my $href (@data) {
    $csv->say($ofh, [ @{$href}{@hdr} ]);
}

The values from the hashrefs are extracted in the desired order using a hashref slice, @{$href}{@hdr}, which in general is


@{ expression returning hash reference } { list of keys }

This returns the list of values for the given list of keys, from the hashref that the expression in the block {} must return. That list is then used to build an anonymous arrayref (using []), which is what the module's say method needs in order to make and print a string of comma-separated values.

Note the block that evaluates to a hash reference, used where a hash name would normally appear in a hash slice. This follows a general rule:

Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type.
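As a minimal demonstration of that rule, here is a hashref slice on its own (the hashref and keys are made up for illustration):

```perl
use strict;
use warnings;

my $href = { name => 'A', age => 1, weight => 10 };

# Slice of the referenced hash: values for the listed keys, in that order.
# The block @{ ... } stands in for a hash name.
my @vals = @{$href}{ qw(name age weight) };

print join( ',', @vals ), "\n";    # A,1,10
```

The same slice written against a plain hash %h would be @h{ qw(name age weight) }; replacing the identifier h with a block returning a hash reference gives the form used above.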

Some further comments

  • Look over the constructor's supported attributes; there are many goodies

  • For very simple data you can simply join fields with a comma and print

    say $ofh join ',', @{$href}{@hdr};    

    But it is far safer to use a module to construct a valid CSV record. With the right choice of attributes in the constructor, it can handle whatever is legal to embed in fields (some of which can take quite a bit of work to get right by hand), and it flags anything that isn't.

  • I list the column names explicitly. Instead, you could fetch the keys and sort them into the desired order, but that would again need a hard-coded list to sort by

The program creates the file test.csv and prints to it the expected header and data lines.


But separating those "values" with commas may involve a whole lot more than what the acronym "CSV" suggests. A variety of things may come between those commas, including commas, newlines, and whatnot. This is why one is best advised to always use a library. Reviewing the constructor's options is informative.


The following commentary referred to the initial question. In the meanwhile the problems this addresses were corrected in OP's code and the question updated. I'm still leaving this text for some general comments that can be useful.

As for the code in the question and its output, there is almost certainly an issue with how the data is processed to produce @data, judging by the presence of keys like HASH(address) in the output.

That string, HASH(0x...), is output when one prints a variable which is a hash reference (it cannot show any of the hash's contents). Perl handles such a print by stringifying the reference (producing a printable string out of something more complex) in that way.

There is no good reason to have a hash reference for a hash key. So I'd suggest that you review your data and its processing and see how that comes about. (Or briefly show this, or post another question with it if it isn't feasible to add that to this one.)

One measure you can use to bypass that is to only use a list of keys that you know are valid, like I show above; however, then you may be leaving some outright error unhandled. So I'd rather suggest to find what is wrong.

Parsing file in Perl and store the data in Hash

You are creating these variables anew for each line of the file:

$e_id, $start, $end, $priority, $node

They can't be scoped to a loop that repeats for every line of the file if you want to access the values when processing later lines.

Furthermore, you assign to the fields of the record for each line of the file, including before you even populate $e_id. You don't want to assign to every field for each line of the file, and you need to wait until you've read an entire record before assigning to $hash{$e_id}.

My solution:

my %field_map = (
    'startTime' => 'start',
    'endTime'   => 'end',
    'Node'      => 'node',
    'Priority'  => 'priority',
);

my %recs;
my $id;
my $rec = { };

while (1) {
    $_ = <DATA>;

    # If end of file or end of record.
    if ( !defined($_) || $_ =~ /^$/ ) {
        $recs{$id} = $rec if defined($id);

        # If end of file.
        last if !defined($_);

        # Start a new record.
        $id  = undef;
        $rec = { };
        next;
    }

    chomp;
    my ($key, $val) = split(/\s*:\s*/, $_, 2);

    if ( $key eq 'eventId' ) {
        $id = $val;
    }
    elsif ( $field_map{$key} ) {
        $rec->{ $field_map{$key} } = $val;
    }
}
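The question's input file is not reproduced here. The following self-contained sketch runs the same parsing logic against input in the format the code expects; the field values are made up for illustration, and an in-memory filehandle stands in for the DATA handle so the whole thing is runnable as-is:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %field_map = (
    'startTime' => 'start',
    'endTime'   => 'end',
    'Node'      => 'node',
    'Priority'  => 'priority',
);

# Hypothetical input: blank-line-separated records of key: value pairs.
my $input = <<'END';
eventId: 101
startTime: 09:00
endTime: 10:00
Node: node-a
Priority: 1

eventId: 102
startTime: 11:00
endTime: 12:00
Node: node-b
Priority: 2
END

open my $fh, '<', \$input or die $!;    # read from the string

my %recs;
my $id;
my $rec = { };
while (1) {
    $_ = <$fh>;
    if ( !defined($_) || $_ =~ /^$/ ) {     # end of file or end of record
        $recs{$id} = $rec if defined($id);
        last if !defined($_);
        $id  = undef;                        # start a new record
        $rec = { };
        next;
    }
    chomp;
    # Limit of 2 keeps colons inside values (e.g. "09:00") intact.
    my ($key, $val) = split(/\s*:\s*/, $_, 2);
    if    ( $key eq 'eventId' ) { $id = $val }
    elsif ( $field_map{$key} )  { $rec->{ $field_map{$key} } = $val }
}

print Dumper \%recs;
```

Each record ends up keyed by its eventId, e.g. $recs{101}{node} is 'node-a'.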

Parse text file in Perl and get a specific string

Try using a regex:

my $variable;
if ( $line =~ /TEST_SEQUENCE=(\w+)/ ) {
    $variable = $1;
}
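A self-contained version of the same idea; the input line here is made up for illustration, since the question's file isn't shown:

```perl
use strict;
use warnings;

my $line = 'RUN TEST_SEQUENCE=ABC123 END';    # hypothetical input line

my $variable;
if ( $line =~ /TEST_SEQUENCE=(\w+)/ ) {
    $variable = $1;    # the first capture group: the word after '='
}

print "$variable\n";    # ABC123
```

In a real script, $line would come from reading the file line by line, with the match attempted on each line until it succeeds.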

Perl script -- Multiple text file parsing and writing

The important points in this suggestion are:

  • the "magic" diamond operator <> (a.k.a. readline), which reads from each file listed in @ARGV,
  • the eof function, which tells whether the next readline on the current filehandle will return any data,
  • the $ARGV variable, which contains the name of the currently opened file.

With that intro, here we go!

#!/usr/bin/perl

use strict;      # Always!
use warnings;    # Always!

my $header = 1;  # Flag to tell us to print the header

while (<>) {     # Read a line from a file
    if ($header) {
        # This is the first line: print the name of the file
        print "========= $ARGV ========\n";
        # Reset the flag to a false value
        $header = undef;
    }
    # Print out what we just read in
    print;
}
continue {       # This happens before the next iteration of the loop
    # Check if we finished the previous file
    $header = 1 if eof;
}

To use it, just do: perl concat.pl *.txt > compiled.TXT


