Displaying Multiple Lines of a File, Never Repeating

This should work:

$blocks = array_chunk(file('path/to/file'), 10);
foreach ($blocks as $number => $block) {
    printf('<div id="%d">%s</div>',
           $number + 1,
           implode('<br/>', $block));
}

References:

  • array_chunk()
  • printf()

How can I keep only the non-repeated lines in a file?

uniq has a -u option:

  -u, --unique          only print unique lines

Example:

$ printf 'a\nb\nb\nc\nd\nd\n' | uniq -u
a
c

uniq only compares adjacent lines, so if your data is not sorted, sort it first:

$ printf 'd\na\nb\nb\nc\nd\n' | sort | uniq -u
a
c

Preserve the order:

$ cat foo
d
c
b
b
a
d

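$ grep -Fxf <(sort foo | uniq -u) foo
c
a

This greps the file for the lines that uniq -u reported as unique. The -F and -x options make grep treat each pattern as a fixed string that must match a whole line, so a unique line cannot match as a regex or as a substring of another line. If your file is really huge, expect this to take a long time.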

The same without the somewhat ugly process substitution:

$ sort foo | uniq -u | grep -Fxf - foo
c
a
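If the grep approach is too slow, a two-pass count is another option. The following is a minimal Python sketch, not taken from the answer above; the function name unique_lines_in_order is hypothetical, and it assumes the whole file fits in memory.

```python
from collections import Counter


def unique_lines_in_order(lines):
    """Return the lines that occur exactly once, in their original order."""
    lines = list(lines)
    counts = Counter(lines)  # first pass: count every line
    return [line for line in lines if counts[line] == 1]  # second pass: filter


# Same data as the file foo above
print(unique_lines_in_order(["d", "c", "b", "b", "a", "d"]))  # ['c', 'a']
```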

Python random N lines from large file (no duplicate lines)

There is only one way to avoid reading the file sequentially up to the last line you are sampling, and I am surprised that none of the answers so far has mentioned it:

Seek to an arbitrary location inside the file and read some bytes; if you know the typical line length, as you said, three or four times that value should be enough. Then split the chunk on newline characters ("\n") and pick the second field. The first field is most likely a partial line, so the second one is a complete line at a random position.

Also, to seek consistently within the file, open it in binary read mode; you then have to handle the end-of-line markers yourself.

This technique can't give you the line number that was read, so you keep the offset of each selected line in the file to avoid repetition:

#!/usr/bin/env python3

import os
import random

CHUNK_SIZE = 1000
PATH = "/var/log/cron"


def pick_next_random_line(file, offset):
    file.seek(offset)
    chunk = file.read(CHUNK_SIZE)
    lines = chunk.split(b"\n")
    if len(lines) < 3:
        # The chunk holds no complete line; let the caller try another offset
        return None, None
    # Offset of the first complete line, just past the first newline in the chunk
    line_offset = offset + chunk.find(b"\n") + 1
    return line_offset, lines[1].rstrip(b"\r").decode("utf-8", errors="replace")


def get_n_random_lines(path, n=5):
    length = os.stat(path).st_size
    results = []
    result_offsets = set()
    # Binary mode, so that seek offsets are plain byte positions.
    # Assumes the file is larger than CHUNK_SIZE and has enough lines
    # to supply n distinct ones.
    with open(path, "rb") as input_file:
        for _ in range(n):
            while True:
                offset, line = pick_next_random_line(
                    input_file, random.randint(0, length - CHUNK_SIZE))
                if offset is not None and offset not in result_offsets:
                    result_offsets.add(offset)
                    results.append(line)
                    break
    return results


if __name__ == "__main__":
    print(get_n_random_lines(PATH))

Display Second Result of Duplicate Lines in Text File

You may try this GNU awk solution:

s='60 60 61 64 63 78 78'
awk -v RS='[[:space:]]+' '++fq[$0] == 2' <<< "$s"

60
78

To avoid getting line breaks after each line:

awk -v RS='[[:space:]]+' '++fq[$0] == 2 {printf "%s", $0 RT}' <<< "$s"

60 78
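The counting idea behind ++fq[$0] == 2 can also be sketched in Python. This is an illustration only; the function name second_occurrences is hypothetical, and it splits on any whitespace, as the RS setting above does.

```python
def second_occurrences(text):
    """Return each whitespace-separated token at the moment it is seen a second time."""
    seen = {}
    result = []
    for token in text.split():
        seen[token] = seen.get(token, 0) + 1
        if seen[token] == 2:  # emit only on the second occurrence
            result.append(token)
    return result


print(" ".join(second_occurrences("60 60 61 64 63 78 78")))  # 60 78
```

A third or later occurrence of a value is ignored, just as in the awk version.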

How can I remove multiple lines from a file based on a pattern that spans multiple lines?

Using awk, you may do this:

awk -v dt='2020-05-03' -v ft='pear' '$1==dt{p=NR} p && NR==p+1{del=($1==ft)}
del && NR<=p+6{next} 1' file

2020-05-02
apple
string
string
string
string
string
2020-05-03
apple
string
string
string
string
string

Explanation:

  • -v dt='2020-05-03' -v ft='pear': Supply 2 values to awk from command line
  • $1==dt{p=NR}: If a line's first field matches the date, store its line number in variable p
  • p && NR==p+1{del=($1==ft)}: If p>0 and this is the line right after the date, set the flag del to 1 if the fruit name matches, otherwise to 0
  • del && NR<=p+6{next}: If the flag del is set, skip every line up to line p+6 (the fruit line and the five lines after it)
  • 1: Default action to print line
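The same goal can be sketched in Python with a different technique: look ahead one line and drop the whole record, date line included. This is not a port of the awk command. It assumes every record is exactly seven lines (a date line, a fruit line and five more lines), which is inferred from the output above, and it compares whole stripped lines rather than the first field. The name drop_records is hypothetical.

```python
def drop_records(lines, date, fruit, record_len=7):
    """Drop every record whose first line is `date` and whose second line is `fruit`."""
    kept = []
    i = 0
    while i < len(lines):
        is_target = (
            lines[i].strip() == date
            and i + 1 < len(lines)
            and lines[i + 1].strip() == fruit
        )
        if is_target:
            i += record_len  # skip the date line, the fruit line and the rest
        else:
            kept.append(lines[i])
            i += 1
    return kept


def make_record(date, fruit):
    """Build one seven-line sample record (illustrative data only)."""
    return [date, fruit] + ["string"] * 5


sample = (make_record("2020-05-02", "apple")
          + make_record("2020-05-03", "pear")
          + make_record("2020-05-03", "apple"))
print("\n".join(drop_records(sample, "2020-05-03", "pear")))
```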

Is it possible to break a long line to multiple lines in Python?

From PEP 8 - Style Guide for Python Code:

The preferred way of wrapping long lines is by using Python's implied line
continuation inside parentheses, brackets and braces. If necessary, you
can add an extra pair of parentheses around an expression, but sometimes
using a backslash looks better. Make sure to indent the continued line
appropriately.

Example of implicit line continuation:

a = some_function(
    '1' + '2' + '3' - '4')

On the topic of line breaks around a binary operator, it goes on to say:

For decades the recommended style was to break after binary operators.
But this can hurt readability in two ways: the operators tend to get scattered across different columns on the screen, and each operator is moved away from its operand and onto the previous line.

In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code Knuth's style (line breaks before the operator) is suggested.

Example of explicit line continuation:

a = '1'   \
    + '2' \
    + '3' \
    - '4'
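The two snippets above only show the layout; run as written, subtracting one string from another raises a TypeError. A small runnable sketch of the suggested style, with the break before each operator inside parentheses, could look like this (the variable names are made up for the example):

```python
first_value = 1
second_value = 2
third_value = 3
fourth_value = 4

# Implied continuation inside parentheses, breaking before each operator
total = (first_value
         + second_value
         + third_value
         - fourth_value)

print(total)  # 2
```

No backslashes are needed here, because the open parenthesis keeps the expression going until it is closed.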

Multiple line, repeated occurrence matching

This might work for you (GNU sed):

sed -n '/abc/h;/efg/!b;x;/abc/p;z;x' file

Store the latest line containing abc in the hold space (HS). On a line containing efg, swap to the HS and print its contents if they contain abc; then clear it and swap back, so the same abc line is not printed twice.
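The hold-space logic can be sketched in Python as follows. This is an illustration, not an exact equivalent: it uses plain substring tests where sed uses regular expressions, and the function name last_abc_before_efg is hypothetical.

```python
def last_abc_before_efg(lines, start="abc", end="efg"):
    """For each line containing `end`, return the latest earlier line containing `start`."""
    held = None  # plays the role of sed's hold space
    matches = []
    for line in lines:
        if start in line:
            held = line  # like /abc/h
        if end in line:
            if held is not None:
                matches.append(held)  # like x;/abc/p
            held = None  # like z;x, so the same line is not reported again
    return matches


print(last_abc_before_efg(["abc 1", "x", "abc 2", "efg", "efg", "abc 3"]))  # ['abc 2']
```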


