C# - Check If File Is Text Based

You could check the first 1000 characters (an arbitrary number) for unprintable characters, or check whether they all fall within a certain ASCII range. If they do, assume that the file is text.

Whatever you do is going to be a guess.
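As a rough sketch of that sampling idea: the helper name LooksLikeAsciiText is hypothetical, and treating tab, line feed, and carriage return as acceptable alongside printable ASCII (0x20 to 0x7E) is an assumption. Text containing non-ASCII characters (UTF-8 or UTF-16, for example) would be rejected by this check.

```csharp
using System;
using System.IO;

public static class AsciiSampleCheck
{
    // Hypothetical helper: returns true when every sampled byte is
    // printable ASCII or common whitespace. sampleSize is arbitrary.
    public static bool LooksLikeAsciiText(string path, int sampleSize = 1000)
    {
        byte[] buffer = new byte[sampleSize];
        int read;
        using (FileStream stream = File.OpenRead(path))
        {
            // A single Read may return fewer bytes than requested;
            // that only shrinks the sample.
            read = stream.Read(buffer, 0, buffer.Length);
        }

        for (int i = 0; i < read; i++)
        {
            byte b = buffer[i];
            bool printable = b >= 0x20 && b <= 0x7E;
            bool whitespace = b == (byte)'\t' || b == (byte)'\n' || b == (byte)'\r';
            if (!printable && !whitespace)
            {
                return false; // found a byte outside the accepted range
            }
        }

        return true; // note: an empty file also counts as text here
    }
}
```

It is still a guess: a binary file whose first bytes happen to be printable will pass.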

How can I determine if a file is binary or text in C#?

I would probably look for an abundance of control characters, which are typically present in a binary file but rare in a text file. Binary files tend to contain enough zero bytes that just testing for many 0 bytes would probably be sufficient to catch most files. If you care about localization you'd need to test multi-byte patterns as well.

As stated though, you can always be unlucky and get a binary file that looks like text or vice versa.
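A minimal sketch of the control-character test, working on a byte sample already read from the file. The helper name LooksBinary and the 5% threshold are assumptions chosen for illustration, not established values. Note that UTF-16 text is full of zero bytes, so this sketch would flag it as binary; that is the multi-byte caveat mentioned above.

```csharp
using System;

public static class ControlByteHeuristic
{
    // Hypothetical helper. Any NUL byte marks the sample as binary.
    // Otherwise the sample is binary when the share of control bytes
    // (excluding tab, LF, CR) exceeds maxControlRatio (assumed value).
    public static bool LooksBinary(byte[] sample, double maxControlRatio = 0.05)
    {
        if (sample.Length == 0)
        {
            return false; // nothing to judge; treat empty input as text
        }

        int control = 0;
        foreach (byte b in sample)
        {
            if (b == 0)
            {
                return true;
            }

            bool allowed = b == 9 || b == 10 || b == 13; // tab, LF, CR
            if ((b < 0x20 || b == 0x7F) && !allowed)
            {
                control++;
            }
        }

        return (double)control / sample.Length > maxControlRatio;
    }
}
```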

How do I check if a file is text-based?

Honestly, given the Windows environment that you're working with, I'd consider a whitelist of known text formats. Windows users are typically trained to stick with extensions. However, I would personally relax the requirement that it not function on non-text files, and instead ask the user for the go-ahead if the file does not match the internal whitelist. The risk of changing a binary file would be mitigated if your search string is long, assuming you're not performing a Y2K conversion (a la sed 's/y/k/g').
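The whitelist could look something like the following sketch. The class name, the method name, and the extensions listed are all placeholders for illustration; the real list depends on your application.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TextExtensionWhitelist
{
    // Assumed example list; extend it to suit your application.
    private static readonly HashSet<string> Known =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".txt", ".cs", ".xml", ".csv", ".log"
        };

    // Returns true when the file's extension is on the whitelist.
    // Files without an extension are not whitelisted.
    public static bool IsWhitelisted(string path)
    {
        return Known.Contains(Path.GetExtension(path));
    }
}
```

When IsWhitelisted returns false, the caller would prompt the user for confirmation rather than refuse outright.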

Is there a way to check if a file is in use?

Updated NOTE on this solution: Checking with FileAccess.ReadWrite will fail for Read-Only files so the solution has been modified to check with FileAccess.Read.

ORIGINAL:
I've used this code for the past several years, and I haven't had any issues with it.

I understand your hesitation about using exceptions, but you can't avoid them all of the time:

protected virtual bool IsFileLocked(FileInfo file)
{
    try
    {
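        // FileShare.None makes the open fail if any other handle holds the file.
        using (FileStream stream = file.Open(FileMode.Open, FileAccess.Read, FileShare.None))
        {
            // Nothing to do here: disposing the stream closes it.
        }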
    }
    catch (IOException)
    {
        //the file is unavailable because it is:
        //still being written to
        //or being processed by another thread
        //or does not exist (has already been processed)
        return true;
    }

    //file is not locked
    return false;
}

Detect if file contains text

Generally, you cannot reliably detect whether a file is a text file. It starts with the general question of what "a text file" actually is. You already hinted at encodings, but those in particular cannot be reliably detected (see Notepad's struggle, for example).

That said, you might be able to employ heuristics to do your best (including, but of course not limited to, file extensions; excluding well-known non-text file types like EXE, DLL, ZIP, and image files by recognizing their signatures; maybe combined with the approach used by browsers or Notepad).
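Signature recognition can be sketched as a prefix comparison against a few well-known magic numbers. The class and method names here are hypothetical, and the table is deliberately tiny; a real list would be much longer. A text file that happens to start with the letters "MZ" would be misclassified, so treat the result as one hint among several.

```csharp
using System;

public static class SignatureCheck
{
    // A few well-known leading byte sequences (not exhaustive).
    private static readonly byte[][] Signatures =
    {
        new byte[] { 0x4D, 0x5A },                                     // "MZ": EXE/DLL
        new byte[] { 0x50, 0x4B, 0x03, 0x04 },                         // ZIP local file header
        new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, // PNG
        new byte[] { 0xFF, 0xD8, 0xFF },                               // JPEG
        new byte[] { 0x47, 0x49, 0x46, 0x38 }                          // "GIF8": GIF
    };

    // Returns true when header starts with one of the signatures above.
    public static bool IsKnownBinary(byte[] header)
    {
        foreach (byte[] magic in Signatures)
        {
            if (header.Length < magic.Length)
            {
                continue; // header too short to match this signature
            }

            bool match = true;
            for (int i = 0; i < magic.Length; i++)
            {
                if (header[i] != magic[i])
                {
                    match = false;
                    break;
                }
            }

            if (match)
            {
                return true;
            }
        }

        return false;
    }
}
```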

Depending on your application, I guess it would be quite feasible to just let the user select the files to scan (maybe with a default list of extensions to include, like *.cs, *.txt, *.resx, *.xml, ...). If a file type or extension is not in the default list and was not added by the user, it is not counted. If the user adds a file type or extension to the list that is not a "text file", the results are not useful.

But weighing the effort against the fact that an automatic result will never be 100% exact (at detecting all possible files), this should be good enough.

How to tell if a file is text-readable in C#

There is no general way of figuring out the type of information stored in a file.

Even if you know in advance that it is some sort of text, you may not be able to load it properly if you don't know what encoding was used to create the file.

Note that HTTP gives you some hints about the type of a file through the Content-Type header, but there is no such information on the file system.
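One partial hint that a file itself can carry is a byte order mark (BOM) at its start. The sketch below checks for the common ones; the class and method names are hypothetical. Many text files have no BOM at all, so a null result says nothing about whether the file is text, only that the encoding is still unknown.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or null when
    // the header does not start with one of the BOMs checked here.
    public static Encoding DetectFromBom(byte[] header)
    {
        // UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
        if (header.Length >= 4 && header[0] == 0xFF && header[1] == 0xFE
            && header[2] == 0x00 && header[3] == 0x00)
        {
            return Encoding.UTF32;
        }

        if (header.Length >= 3 && header[0] == 0xEF && header[1] == 0xBB && header[2] == 0xBF)
        {
            return Encoding.UTF8;
        }

        if (header.Length >= 2 && header[0] == 0xFF && header[1] == 0xFE)
        {
            return Encoding.Unicode; // UTF-16 little endian
        }

        if (header.Length >= 2 && header[0] == 0xFE && header[1] == 0xFF)
        {
            return Encoding.BigEndianUnicode; // UTF-16 big endian
        }

        return null; // no recognized BOM
    }
}
```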

Check if text exists in text file

"when I run this code program writes to text file twice before recognizing the text exists"

The main problem with your code is in this condition:

for (int x = 0; x < lines.Length - 1; x++)

You are looping through all the lines except the last one, which is likely the one you're searching for in this case.

To resolve this, just remove the - 1 from your exit condition.

With that being said, your code can be simplified greatly if you use the static ReadLines and AppendAllText methods of the File class:

/// <summary>
/// Searches the specified file for the url and adds it if it doesn't exist.
/// If the specified file does not exist, it will be created.
/// </summary>
/// <param name="filePath">The path to the file to query.</param>
/// <param name="url">The url to search for and add to the file.</param>
/// <returns>True if the url was added, otherwise false.</returns>
protected static bool AddUrlIfNotExist(string filePath, string url)
{
    if (!File.Exists(filePath)) File.Create(filePath).Close();

    if (!File.ReadLines(filePath).Any(line => line.Contains(url)))
    {
        // Terminate the entry so the next url starts on its own line.
        File.AppendAllText(filePath, url + Environment.NewLine);
        return true;
    }

    return false;
}

Then this method could be used in your code like:

protected override bool ProcessCmdKey(ref Message msg, Keys keyData)
{
    if (keyData == Keys.F1) { Application.Exit(); return true; }

    if (keyData == Keys.F2)
    {
        if (AddUrlIfNotExist("linkx.txt", webBrowser1.Url.AbsoluteUri))
        {
            MessageBox.Show("url copied!");
        }
        else
        {
            MessageBox.Show("there is a match");
        }
    }

    // Call the base class
    return base.ProcessCmdKey(ref msg, keyData);
}

Checking if a text file is at the end

The problem is that you are declaring your content variable in an inner scope and trying to use it from outside it.

Try declaring your content variable outside the while block:

    string content = string.Empty; 
    while(!sr.EndOfStream)
    {
      content = sr.ReadToEnd();       
    }      
    fs.Close();
    return content;

Note that you also need to give it a value, so that if your code doesn't enter the while block, it'll return something. I've given it string.Empty, but you could give it null or whatever.

Update: As a further tip, ReadToEnd will always read up to the end, so you don't need to check for EndOfStream at all (it will be at the end of the stream if this function exits cleanly, always).

As practice code, that's fine, but this whole function is already implemented in the framework as File.ReadAllText, so this code would be equivalent:

public string loadFile(string filename)
{
  return File.ReadAllText(filename);
}

Once you have that, there's no point in having your loadFile method at all (it's just a method that calls another method with the very same parameters, so it's a redundant function call). Just use File.ReadAllText() wherever you were planning to use this method.

How to check if a textbox has a line from a TXT File with C#

string usersTXT = sr.ReadLine();

This reads exactly one line, so you are only checking whether you match the first line in the file.

You want File.ReadAllLines (which also disposes the stream correctly, which your code doesn't):

if (File.ReadAllLines(usersPath).Contains(user_txt.Text))
{
}

That reads all the lines and then enumerates them, checking whether your line is in the collection. The only downside to this approach is that it always reads the entire file. If you want to read only until you find your input, you'll need to roll the read loop yourself. Do make sure to use the StreamReader in a using block if you take that route.
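A hand-rolled loop of that kind might look like the sketch below. The class and method names are hypothetical; the comparison is an exact, case-sensitive match on the whole line, which is an assumption about what you want.

```csharp
using System;
using System.IO;

public static class LineSearch
{
    // Hypothetical helper: reads line by line and stops as soon as
    // a line equals target, so the rest of the file is never read.
    public static bool FileContainsLine(string path, string target)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line == target)
                {
                    return true; // the using block still disposes the reader
                }
            }
        }

        return false;
    }
}
```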

You can also just use File.ReadLines (thanks @Selman22) to get the lazy enumeration version of this. I would go with this route personally.

The implementation that shows this is at: http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26
