Top Level Domain from Url in C#

Can I simply 'read' a file that is in use?

You can read the file only if the program that opened the file first specified read sharing rights on that file.

If the file does indeed have no read sharing rights though, you wouldn't be able to copy it in the first place.

You may not be able to access a file if you are specifying a sharing right that conflicts with the sharing right of a program that already has the file opened. For example, you can't grant write access if the program that already has the file open isn't granting write access.

If the program that opened the file in the first place supports Volume Shadow Copy (VSS), you can also use VSS to gain access to the file.

There are commercial software drivers that allow you to access such files, even when they are in use. You used to be able to get Open File Manager by St-Bernards, and you can also use File Access Manager (FAM) by VisionWorks Solutions Inc. These drivers are typically OEM'ed to backup software companies for inclusion in their products.

VSS works by telling the program that already has the file open that another program would like to read from the file. VSS then makes a copy of the file and lets you read from this copy. VSS does not work for legacy applications.

FAM transparently works for legacy and non-legacy programs alike by specifying an 'allowed list' of applications that can access exclusively opened and locked files. Only programs in this list are allowed access to these files. When a file is being opened, it goes into cache mode so that you will obtain a copy of the file as it was when the 'backup/open' of the file started. At this point the program that originally opened the file sees the file as it actually is, and the second program in the allowed list, sees the file as it was when the 'open/backup' of the file happened. This ensures consistency of the file.

What is the simplest way to read a file into a String?

Yes, you can do this in one line (though for robust IOException handling you wouldn't want to).

String content = new Scanner(new File("filename")).useDelimiter("\\Z").next();
System.out.println(content);

This uses a java.util.Scanner, telling it to delimit the input with \Z, which is the end-of-input anchor. As a result the input consists of a single token, the entire file, so it can be read with one call to next().

There is a constructor that takes a File and a String charSetName (among many other overloads). These two constructors may throw FileNotFoundException, but like all Scanner methods, no IOException can be thrown beyond these constructors.

You can ask the Scanner itself, through its ioException() method, whether an IOException occurred. You may also want to explicitly close() the Scanner after you read the content, so storing the Scanner reference in a local variable is probably best.

Guava

com.google.common.io.Files contains many useful methods. The pertinent ones here are:

String toString(File, Charset)
- Using the given character set, reads all characters from a file into a String
List<String> readLines(File, Charset)
- ... reads all of the lines from a file into a List<String>, one entry per line

Apache Commons/IO

org.apache.commons.io.IOUtils also offers similar functionality:

String toString(InputStream, String encoding)
- Using the specified character encoding, gets the contents of an InputStream as a String
List readLines(InputStream, String encoding)
- ... as a (raw) List of String, one entry per line

How to read the content of a file to a string in C?

I tend to just load the entire file as a raw chunk into memory and do the parsing on my own. That way I have the best control over what the standard lib does on multiple platforms.

This is a stub I use for this. You may also want to check the return values of fseek, ftell and fread (omitted here for clarity). Note that the buffer is not NUL-terminated: allocate length + 1 bytes and set buffer[length] = '\0' if you need a C string.

char * buffer = 0;
long length;
FILE * f = fopen (filename, "rb");

if (f)
{
  fseek (f, 0, SEEK_END);
  length = ftell (f);
  fseek (f, 0, SEEK_SET);
  buffer = malloc (length);
  if (buffer)
  {
    fread (buffer, 1, length, f);
  }
  fclose (f);
}

if (buffer)
{
  // start to process your data / extract strings here...
}
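
For reference, here is a minimal sketch of the same stub with the omitted checks filled in and a terminating NUL added. The function name read_file_to_string and its length_out parameter are assumptions made for this illustration. Taking the size with fseek/ftell works on common platforms for regular files, but the C standard does not guarantee it for every stream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed, NUL-terminated copy of the
   file contents, or NULL on any error. The caller must free() the result.
   If length_out is not NULL, it receives the number of bytes read. */
char *read_file_to_string(const char *filename, size_t *length_out)
{
    FILE *f = fopen(filename, "rb");
    char *buffer = NULL;
    long length;

    if (!f)
        return NULL;

    /* Each step is checked; any failure leaves buffer == NULL. */
    if (fseek(f, 0, SEEK_END) == 0 &&
        (length = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {

        buffer = malloc((size_t)length + 1);   /* +1 for the '\0' */
        if (buffer) {
            if (fread(buffer, 1, (size_t)length, f) == (size_t)length) {
                buffer[length] = '\0';
                if (length_out)
                    *length_out = (size_t)length;
            } else {
                /* Short read: treat it as an error. */
                free(buffer);
                buffer = NULL;
            }
        }
    }

    fclose(f);
    return buffer;
}
```

A caller would use it as char *text = read_file_to_string(filename, NULL); and treat a NULL result as "could not read the file".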

What happens to a file when we read it and when we just open it

open is kind of like accessing the file. If you like the book analogy: open is like finding the book in the library (directory) and accessing it, but without reading it yet. Only the read method actually reads the inside of the book, line by line.

During the open call, your operating system locates the file and creates a "connection" between your program and that file (I am oversimplifying, obviously). You can store this "connection" in a variable for convenience, and then access the file through it using methods like read or write.
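
The same distinction can be sketched in C, where fopen() plays the role of open and fread() the role of read. The helper name open_then_read and its parameters are assumptions made for this illustration. It records the stream position right after opening and again after reading, to show that opening alone transfers no data.

```c
#include <stdio.h>

/* Hypothetical helper: opens 'path', then reads up to 'count' bytes
   into 'buf'. The stream position is stored before and after the read.
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long open_then_read(const char *path, char *buf, size_t count,
                    long *pos_after_open, long *pos_after_read)
{
    FILE *f = fopen(path, "rb");    /* "open": locate the file, get a handle */
    size_t got;

    if (!f)
        return -1;

    *pos_after_open = ftell(f);     /* nothing has been read yet */
    got = fread(buf, 1, count, f);  /* "read": data is actually transferred */
    *pos_after_read = ftell(f);

    fclose(f);
    return (long)got;
}
```

On a freshly opened file the first position is 0, and the second position advances by the number of bytes that fread() returned.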

Read and print a text file from computer using c

It is better to supply the file name as a command-line argument to your program, because it makes it easier to test and use.

In the file, each line seems to be a separate record. So, it would be better to read each line, then parse the fields from the line.

Consider the following:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

#define  MAX_LINE_LEN  500

int main(int argc, char *argv[])
{
    char  line[MAX_LINE_LEN + 1]; /* +1 for the end-of-string '\0' */
    FILE *in;

    if (argc != 2) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "Cannot open %s: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }

    while (fgets(line, sizeof line, in) != NULL) {
        char  id[20], code[20], address[50], dummy;

        if (sscanf(line, " %19s %19s %49s %c", id, code, address, &dummy) == 3) {
            /* The line did consist of three fields, and they are
               now correctly parsed to 'id', 'code', and 'address'. */

            printf("id = '%s'\ncode = '%s'\naddress = '%s'\n\n",
                   id, code, address);

        } else {

            /* We do have a line, but it does not consist of
               exactly three fields. */

            /* Remove the newline character(s) at the end of line. */
            line[strcspn(line, "\r\n")] = '\0';

            fprintf(stderr, "Cannot parse line '%s'.\n", line);

        }
    }

    if (ferror(in)) {
        fprintf(stderr, "Error reading %s.\n", argv[1]);
        return EXIT_FAILURE;
    } else
    if (fclose(in)) {
        fprintf(stderr, "Error closing %s.\n", argv[1]);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Above, argc contains the number of command-line arguments, with the program name used as the first (zeroth, argv[0]) argument. We require two: the program name and the name of the file to be read. Otherwise, we print out a usage message.

We try to open the file for reading. If fopen() fails, it returns NULL, with the error stored in errno. strerror(errno) yields the human-readable error message.

fgets(array, sizeof array, stream) reads a line from stream into array. If the line is too long to fit, only the part that fits is read, and the rest is returned by the following calls. If it succeeds, it returns a pointer to the first element in array. If it fails (there is no more to read, for example), it returns NULL.

Remember that feof(stream) does not check if stream has more data to read. It only reports whether the end of stream has already been encountered. So, instead of reading until feof() returns true, you should simply read data until reading fails, then check why the reading failed. This is what the above example program does.

We want to treat each line as a separate record. Because fscanf() does not distinguish '\n' from spaces (neither in the conversion specification nor when implicitly skipping whitespace), using fscanf(in, " %19s %19s %49s", ...) does not restrict the parsing to a single line: the three fields may be on the same line, or on different lines, or even have empty lines in between. To restrict our parsing to a single line, we first read each line with fgets(), then try to parse that line, and that line only, using sscanf(). (sscanf() works just like fscanf(), but takes its input from a string rather than a stream.)

To avoid buffer overflow, we must tell sscanf() how long our buffers can be, remembering to reserve one char for the end-of-string mark (NUL, '\0'). Because id is 20 chars long, we can use up to 19 for the ID string, and therefore we need to use %19s to do the conversion correctly.

The return value from sscanf() is the number of successful conversions. By adding a dummy character (%c) conversion at the end that we expect to fail in normal circumstances, we can detect if the line contained more than we expected. This is why the sscanf() pattern has four conversions, but we require exactly the first three of them to succeed, and the fourth, dummy one, to fail, if the input line has the format we expected.
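
The effect of the dummy conversion is easiest to see from the return value alone. The sketch below wraps the same format string in a helper called count_fields, a name assumed for this illustration and not part of the program above, and simply returns what sscanf() reports.

```c
#include <stdio.h>

/* Hypothetical helper: returns the sscanf() result for the record format
   used in the example program. A result of exactly 3 means the line held
   three fields and nothing else. */
int count_fields(const char *line)
{
    char id[20], code[20], address[50], dummy;

    return sscanf(line, " %19s %19s %49s %c", id, code, address, &dummy);
}
```

For a line with two fields the result is 2, for three fields it is 3, and for a line with a fourth word it is 4, because the dummy %c conversion then succeeds on the first character of that word.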

Note that we could try several different sscanf() expressions, if we accept the input in different formats. I like to call this speculative parsing. You simply need to order them so that you try the most complex ones first, and accept the first one that yields the expected number of successful conversions. For a practical example of that, check out the example C code I used in another answer to allow the user to specify simulation details using name=value pairs on the command line.
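
As a minimal sketch of that ordering, assume a made-up input format where a size can be written as "10x20x30", "10x20", or just "10". The function name parse_size and the choice of 1 as the default for missing dimensions are both assumptions made for this illustration. The most complex pattern is tried first, and each attempt ends with a dummy %c so that trailing garbage is rejected.

```c
#include <stdio.h>

/* Hypothetical helper: parses "WxHxD", "WxH", or "W" from 's'.
   Returns the number of dimensions given (3, 2, or 1), or 0 if none of
   the formats matches. Missing dimensions are set to 1.
   On failure, *w, *h and *d may hold partial results. */
int parse_size(const char *s, int *w, int *h, int *d)
{
    char dummy;

    /* Most complex format first. */
    if (sscanf(s, " %d x %d x %d %c", w, h, d, &dummy) == 3)
        return 3;

    if (sscanf(s, " %d x %d %c", w, h, &dummy) == 2) {
        *d = 1;
        return 2;
    }

    if (sscanf(s, " %d %c", w, &dummy) == 1) {
        *h = 1;
        *d = 1;
        return 1;
    }

    return 0;
}
```

If the simplest pattern were tried first without the dummy conversion, "10x20" would be accepted as a single number and the rest silently ignored, which is why the order and the exact conversion count both matter.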

The line[strcspn(line, "\r\n")] = '\0'; expression is a trick, really. strcspn() is a standard C <string.h> function, which returns the number of characters in the first string parameter, until end of string or any of the characters in the second string are encountered, whichever happens first. Thus, strcspn(line, "\r\n") yields the number of characters in line until end of string, '\r', or '\n' is encountered, whichever happens first. We trim off the rest of the string by using that as the index to the line buffer, and making the string end there. (Remember, NUL or '\0' always ends the string in C.)
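
Isolated into a small function, the trick looks like this. The name trim_newline is an assumption made for this illustration.

```c
#include <string.h>

/* Hypothetical helper: cuts 'line' at the first '\r' or '\n', if any,
   and returns the resulting length. A line without a newline is left
   unchanged, because strcspn() then returns the full string length and
   the assignment merely rewrites the existing terminator. */
size_t trim_newline(char *line)
{
    size_t len = strcspn(line, "\r\n");

    line[len] = '\0';
    return len;
}
```

This is safe even for an empty string: strcspn() returns 0, and line[0] is already '\0'.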

After the while loop, we check why the fgets() returned NULL. If ferror() returns true, then there was a real read error. These are very, very rare nowadays, but not checking them is just like walking around with a weapon without the safety engaged: it is an unnecessary risk with zero reward.

In most operating systems, fclose() cannot even fail if you opened the file read-only, but there are some particular cases on some where it might. (Also, it can fail when you write to streams, because the C library may cache data -- keep it in an internal buffer, rather than write it immediately, for efficiency's sake -- and write it out only when you close the stream. Like any write, that can fail due to a real write error; say, if the storage media is already full.)
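
On the writing side, the same checks might look like the following sketch. The function name save_text is an assumption made for this illustration. The point is that the result of fclose() is part of the result of the write.

```c
#include <stdio.h>

/* Hypothetical helper: writes 'text' to 'path'.
   Returns 0 on success, -1 if opening, writing, or closing failed. */
int save_text(const char *path, const char *text)
{
    FILE *out = fopen(path, "w");

    if (!out)
        return -1;

    if (fputs(text, out) == EOF || ferror(out)) {
        fclose(out);    /* already failed; this result adds nothing */
        return -1;
    }

    /* Buffered data is written out here, so an error such as a full
       disk may be reported only at this point. */
    if (fclose(out) != 0)
        return -1;

    return 0;
}
```

A caller that ignores the return value gets exactly the silent data loss that checking fclose() is meant to prevent.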

Yet, it only costs a couple of lines of C code to check both ferror() and fclose(), and let the user know. I personally hate, with a deep-burning passion, programs that do not do that, because they really risk losing user data silently, without warning. The users may think everything is okay, but the next time they try to access their files, some of it is missing... and they usually end up blaming the operating system, not the actual culprits, the bad, evil programs that failed to warn the user about an error they could have detected.

(It is best to learn to do that as early as possible. Like security, error checking is not something you can really bolt on later: you either design it in, or it won't be reliable.)

Also note that the Linux man pages project contains a very well maintained list of C library functions (along with POSIX.1, GNU, and Linux-specific functions). Do not be fooled by its name. Each of the pages contains a Conforming to section, which tells you which standards the function or functions described on that page conforms to. If it is C89, then it works in just about all operating systems you can imagine. If it is C99 or any POSIX.1 version, it may not work in Windows or DOS (or using the ancient Borland C compiler), but it will work in most other operating systems.

Because the OP is obviously reading non-ASCII files, I would recommend trying out the localized version of the program, that uses wide characters and wide strings:

#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <stdio.h>
#include <errno.h>

#define  MAX_WLINE_LEN  500

int main(int argc, char *argv[])
{
    wchar_t  line[MAX_WLINE_LEN + 1]; /* +1 for the end-of-string L'\0' */
    FILE *in;

    if (argc != 2) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "Warning: Your C library does not support your currently set locale.\n");

    if (fwide(stdout, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");

    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "Cannot open %s: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }
    if (fwide(in, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide input from %s.\n", argv[1]);

    while (fgetws(line, sizeof line / sizeof line[0], in) != NULL) {
        wchar_t  id[20], code[20], address[50], dummy;

        if (swscanf(line, L" %19ls %19ls %49ls %lc", id, code, address, &dummy) == 3) {
            /* The line did consist of three fields, and they are
               now correctly parsed to 'id', 'code', and 'address'. */

            wprintf(L"id = '%ls', code = '%ls', address = '%ls'\n",
                   id, code, address);

        } else {

            /* We do have a line, but it does not consist of
               exactly three fields. */

            /* Remove the newline character(s) at the end of line. */
            line[wcscspn(line, L"\r\n")] = L'\0';

            fprintf(stderr, "Cannot parse line '%ls'.\n", line);

        }
    }

    if (ferror(in)) {
        fprintf(stderr, "Error reading %s.\n", argv[1]);
        return EXIT_FAILURE;
    } else
    if (fclose(in)) {
        fprintf(stderr, "Error closing %s.\n", argv[1]);
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

The above code is pure C99 code, and should work on all OSes that have a standard C library conforming to C99 or later. (Unfortunately, Microsoft is not willing to implement some C99 features, even though it "contributed" to C11, which means the above code may need to have additional Windows-specific code to work on Windows. It does work fine in Linux, BSDs, and Macs, however.)

How do I use Java to read from a file that is actively being written to?

I could not get the example to work using FileChannel.read(ByteBuffer) because it isn't a blocking read. I did, however, get the code below to work:

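import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Illustrative wrapper class; the name FileFollower is not from the original post.
public class FileFollower extends Thread {
    private volatile boolean running = true;
    private final BufferedInputStream reader;

    public FileFollower() throws IOException {
        reader = new BufferedInputStream(new FileInputStream("out.txt"));
    }

    public void run() {
        try {
            while (running) {
                if (reader.available() > 0) {
                    // New data has been appended since the last check: echo one byte.
                    System.out.print((char) reader.read());
                } else {
                    try {
                        sleep(500); // nothing new yet; poll again in 500 ms
                    } catch (InterruptedException ex) {
                        running = false; // interrupt() stops the loop
                    }
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace(); // available() and read() can both throw
        }
    }
}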

Of course the same thing would work as a timer instead of a thread, but I leave that up to the programmer. I'm still looking for a better way, but this works for me for now.

Oh, and I'll caveat this with: I'm using 1.4.2. Yes I know I'm in the stone ages still.

Reading a plain text file in Java

An ASCII file is a text file, so you would use Readers to read it. Java also supports reading from a binary file using InputStreams. If the files being read are huge, you would want to wrap a FileReader in a BufferedReader to improve read performance.

Go through this article on how to use a Reader

I'd also recommend you download and read this wonderful (yet free) book called Thinking In Java

In Java 7:

new String(Files.readAllBytes(...))

or

Files.readAllLines(...)

In Java 8:

Files.lines(..).forEach(...)

open read and close a file in 1 line of code

You don't really have to close it - Python will do it automatically either during garbage collection or at program exit. But as @delnan noted, it's better practice to explicitly close it for various reasons.

So, what you can do to keep it short, simple and explicit:

with open('pagehead.section.htm', 'r') as f:
    output = f.read()

Now it's just two lines and pretty readable, I think.