Get the last 10 lines of a very large text file (10 GB)
Read to the end of the file, then seek backwards until you find ten newlines, and then read forward to the end, taking the various encodings into consideration. Be sure to handle the case where the file has fewer than ten lines. Below is an implementation (in C#, as you tagged this), generalized to find the last numberOfTokens in the file located at path, encoded in encoding, where the token separator is represented by tokenSeparator; the result is returned as a string (this could be improved by returning an IEnumerable<string> that enumerates the tokens).
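using System;
using System.IO;
using System.Text;

public static class FileTail {
    // Returns everything after the numberOfTokens-th separator counted from the end of the file.
    // Stepping back one character = sizeOfChar bytes, so this assumes a fixed-width encoding
    // (ASCII, UTF-16) or UTF-8 with an ASCII separator such as "\n".
    // Note: a separator at the very end of the file counts as one token.
    public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator) {
        int sizeOfChar = encoding.GetByteCount("\n");
        byte[] buffer = encoding.GetBytes(tokenSeparator);
        // Open read-only, and let other processes keep writing (e.g. a live log file).
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) {
            Int64 tokenCount = 0;
            // Walk backwards from the end in byte offsets, so the whole file (not half of it) is scanned.
            for (Int64 position = buffer.Length; position <= fs.Length; position += sizeOfChar) {
                fs.Seek(-position, SeekOrigin.End);
                fs.Read(buffer, 0, buffer.Length);
                if (encoding.GetString(buffer) == tokenSeparator) {
                    tokenCount++;
                    if (tokenCount == numberOfTokens) {
                        // fs.Position is now just past the separator: read from here to the end.
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        fs.Read(returnBuffer, 0, returnBuffer.Length);
                        return encoding.GetString(returnBuffer);
                    }
                }
            }
            // handle case where number of tokens in file is less than numberOfTokens
            fs.Seek(0, SeekOrigin.Begin);
            buffer = new byte[fs.Length];
            fs.Read(buffer, 0, buffer.Length);
            return encoding.GetString(buffer);
        }
    }
}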
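The C# version above does one Seek and one Read per character, which is slow on a 10 GB file. The same backward search can instead be done one block at a time. The following is a minimal Python sketch of that idea, not a port of the answer's code. read_end_tokens, block_size and the byte-string separator are illustrative choices. Because it works on raw bytes, it assumes a separator whose encoded bytes cannot appear inside other characters (true for b"\n" in ASCII or UTF-8). Like the C# code, it counts a separator at the very end of the file as a token, so for a file ending in a newline, ask for n + 1 tokens to get n lines.

```python
import os

def read_end_tokens(path, number_of_tokens, separator=b"\n", block_size=64 * 1024):
    """Return the bytes after the number_of_tokens-th separator, counted from
    the end of the file, or the whole file if it has fewer separators.

    Hypothetical helper: reads backwards in blocks of block_size bytes
    instead of seeking once per character.
    """
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # start at the end; pos = bytes not yet read
        tail = b""                    # everything from pos to the end of the file
        while True:
            # Walk backwards through the separators found so far.
            idx, found = len(tail), 0
            while found < number_of_tokens:
                idx = tail.rfind(separator, 0, idx)
                if idx < 0:
                    break
                found += 1
            if found == number_of_tokens:
                return tail[idx + len(separator):]
            if pos == 0:
                return tail  # fewer separators than requested: whole file
            # Not enough separators yet: prepend the previous block and retry.
            # The tail is rescanned each round, which is fine when the answer
            # fits in a few blocks.
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos)
            tail = f.read(read_size) + tail
```

For example, read_end_tokens("big.log", 11) returns the last 10 lines of a file that ends with a newline.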
Java: Read last n lines of a HUGE file
If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.
If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth last line begins, you can seek to there and just read-and-print.
An initial best guess can be made from the properties of your data. For example, if it's a text file, you might assume the lines average no more than 132 characters, so to get the last five lines, start 660 characters (5 × 132) before the end. Then, if you were wrong, try again at 1320. You can even use what you learned from the last 660 characters to adjust that: if those 660 characters held just three lines, the next try could be 660 / 3 * 5 = 1100, plus a bit extra just in case.
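Here is a minimal Python sketch of this guess-and-retry strategy. It uses seek on a binary file in place of RandomAccessFile. The function name tail_lines and the growth rule are illustrative choices. The rule grows the window at least to double its size each round, or further if the observed average line length calls for it. The 132 default is the guess from the text, measured here in bytes rather than characters.

```python
import os

def tail_lines(path, n, avg_line_len=132):
    """Return the last n lines of a file as bytes (line endings stripped)."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        window = n * avg_line_len  # first guess, e.g. 5 * 132 = 660
        while True:
            start = max(0, size - window)
            f.seek(start)
            data = f.read(size - start)
            lines = data.splitlines()
            # Unless we started at offset 0, the first line may be cut off,
            # so n + 1 lines are needed to be sure n of them are complete.
            if start == 0 or len(lines) > n:
                return lines[-n:]
            # Guess was too small: re-estimate from the bytes per line seen so
            # far (the "660 / 3 * 5" idea), but always at least double.
            per_line = len(data) / max(len(lines), 1)
            window = max(window * 2, int(per_line * (n + 1)) + avg_line_len)
```

Doubling guarantees the loop ends after at most a logarithmic number of retries, even when the first guess is badly off.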
Read only the last x lines of a txt file
What about this:
List<string> text = File.ReadLines("file.txt").Reverse().Take(2).ToList();
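Enumerable.Reverse has to buffer the whole sequence before it yields anything. As a result, the one-liner above holds every line of the file in memory and returns the lines last-first. To hold only the last n lines, you can stream the file once into a bounded queue. Below is a Python sketch of that swapped-in technique, not a translation of the C# above. last_lines is a hypothetical name, and UTF-8 is assumed.

```python
from collections import deque

def last_lines(path, n, encoding="utf-8"):
    """Return the last n lines of a text file, in file order, newlines stripped."""
    with open(path, encoding=encoding) as f:
        # A deque with maxlen=n drops its oldest item whenever a new one is
        # appended past capacity, so at most n lines are held in memory.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

This still reads the whole file, but memory use is bounded by n lines rather than by the file size.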
(Python) Counting lines in a huge (10GB) file as fast as possible
Ignacio's answer is correct, but it might fail if you have a 32-bit process.
But maybe it could be useful to read the file block-wise and then count the \n characters in each block.
def blocks(files, size=65536):
    while True:
        b = files.read(size)
        if not b: break
        yield b

with open("file", "r") as f:
    print sum(bl.count("\n") for bl in blocks(f))
will do the job.
Note that I don't open the file in binary mode, so \r\n will be converted to \n, making the counting more reliable.
For Python 3, and to make it more robust when reading files that contain all kinds of characters:
def blocks(files, size=65536):
    while True:
        b = files.read(size)
        if not b:
            break
        yield b

with open("file", "r", encoding="utf-8", errors="ignore") as f:
    print(sum(bl.count("\n") for bl in blocks(f)))
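A variant sketch, assuming the file uses \n or \r\n line endings. Counting b"\n" in binary mode skips text decoding entirely, which is usually faster. Reusing one bytearray with readinto also avoids allocating a new object for every block. count_newlines and the 1 MiB default block size are illustrative choices. Unlike the text-mode versions, this does not count lone \r (classic Mac) line endings. Like them, it does not count a final line that lacks a trailing newline.

```python
def count_newlines(path, size=1 << 20):
    """Count newline bytes in a file, reading it in binary blocks of `size` bytes."""
    buf = bytearray(size)              # one buffer, reused for every block
    count = 0
    with open(path, "rb") as f:
        while True:
            n = f.readinto(buf)        # bytes actually read; 0 at end of file
            if not n:
                break
            # Count only the filled part; the rest may hold stale bytes.
            count += buf.count(b"\n", 0, n)
    return count
```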
How to read the last n lines of a log file
Your code will perform very poorly, since you aren't allowing any caching to happen.
In addition, it will not work at all for Unicode.
I wrote the following implementation:
using System.Collections.Generic;
using System.IO;

// Extension methods must live in a static class; this class name is arbitrary.
public static class TextReaderExtensions {
    ///<summary>Returns the end of a text reader.</summary>
    ///<param name="reader">The reader to read from.</param>
    ///<param name="lineCount">The number of lines to return.</param>
    ///<returns>The last lineCount lines from the reader.</returns>
    public static string[] Tail(this TextReader reader, int lineCount) {
        if (lineCount <= 0) return new string[0];
        var buffer = new List<string>(lineCount);
        string line;
        // Fill the buffer with the first lineCount lines (or all of them, if there are fewer).
        for (int i = 0; i < lineCount; i++) {
            line = reader.ReadLine();
            if (line == null) return buffer.ToArray();
            buffer.Add(line);
        }
        // Index of the slot holding the most recently read line. Slots after this index
        // hold older lines than the slots at or before it.
        int lastLine = lineCount - 1;
        // Ring buffer: each further line overwrites the oldest slot, wrapping around.
        while (null != (line = reader.ReadLine())) {
            lastLine++;
            if (lastLine == lineCount) lastLine = 0;
            buffer[lastLine] = line;
        }
        if (lastLine == lineCount - 1) return buffer.ToArray();
        // Unroll the ring: the older lines (after lastLine) first, then the newer ones.
        var retVal = new string[lineCount];
        buffer.CopyTo(lastLine + 1, retVal, 0, lineCount - lastLine - 1);
        buffer.CopyTo(0, retVal, lineCount - lastLine - 1, lastLine + 1);
        return retVal;
    }
}
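To make the index arithmetic easier to follow, here is a Python sketch of the same ring buffer. tail is a hypothetical name; it accepts any text stream with readline, such as an open file or io.StringIO. The final slice concatenation plays the role of the two CopyTo calls. It also covers the case where the ring is already in order, so no separate early return is needed.

```python
def tail(reader, line_count):
    """Return the last line_count lines of a text stream (newlines stripped)."""
    if line_count <= 0:
        return []
    buffer = []
    # Fill phase: keep the first line_count lines (or all of them, if fewer).
    for _ in range(line_count):
        line = reader.readline()
        if not line:               # "" means end of stream; "\n" is an empty line
            return buffer
        buffer.append(line.rstrip("\n"))
    last = line_count - 1          # slot holding the most recent line
    # Ring phase: each new line overwrites the oldest slot, wrapping around.
    for line in iter(reader.readline, ""):
        last = (last + 1) % line_count
        buffer[last] = line.rstrip("\n")
    # The oldest line sits in slot last + 1; unroll the ring into file order.
    return buffer[last + 1:] + buffer[:last + 1]
```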
How do I read the last 10 lines of a text file?
This is how I finally solved it. However, the code is too slow, so if any of you have any advice, please tell me:
public static string ReadEndTokens(string filename, Int64 numberOfTokens, Encoding encoding, string tokenSeparator)
{
    lock (typeof(SDAccess))
    {
        PersistentStorage sdPS = new PersistentStorage("SD");
        sdPS.MountFileSystem();
        string rootDirectory = VolumeInfo.GetVolumes()[0].RootDirectory;
        int sizeOfChar = 1; // The only encoding supported by NETMF 4.1 is UTF-8
        byte[] buffer = encoding.GetBytes(tokenSeparator);
        // The file is only read, so open it read-only
        using (FileStream fs = new FileStream(rootDirectory + @"\" + filename, FileMode.Open, FileAccess.Read))
        {
            Int64 tokenCount = 0;
            Int64 endPosition = fs.Length / sizeOfChar;
            for (Int64 position = sizeOfChar; position < endPosition; position += sizeOfChar)
            {
                fs.Seek(-position, SeekOrigin.End);
                fs.Read(buffer, 0, buffer.Length);
                // Decode the candidate bytes once per iteration (previously three times)
                if (new string(encoding.GetChars(buffer)) == tokenSeparator)
                {
                    tokenCount++;
                    if (tokenCount == numberOfTokens)
                    {
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        fs.Read(returnBuffer, 0, returnBuffer.Length);
                        sdPS.UnmountFileSystem(); // Unmount file system
                        sdPS.Dispose();
                        return GetString(returnBuffer);
                    }
                }
            }
            // handle case where number of tokens in file is less than numberOfTokens
            fs.Seek(0, SeekOrigin.Begin);
            buffer = new byte[fs.Length];
            fs.Read(buffer, 0, buffer.Length);
            sdPS.UnmountFileSystem(); // Unmount file system
            sdPS.Dispose();
            return GetString(buffer);
        }
    }
}
// GetString is not implemented in NETMF 4.1, so this method decodes with GetChars instead.
// Decoding the whole array once avoids re-decoding it for every character (O(n^2) work),
// and avoids indexing chars by byte offset, which fails for multi-byte UTF-8 characters.
public static string GetString(byte[] bytes)
{
    return new string(Encoding.UTF8.GetChars(bytes));
}
Count lines in large files
Try: sed -n '$=' filename
Also, cat is unnecessary: wc -l filename is enough for what you are currently doing.