Read Lines of a Txt File and Organize in a Json File

Use the csv package if you don't want to handle the parsing yourself.

const fs = require("fs");
const csv = require("csv");

const result = {};
const keys = ["cat", "dog", "bug"];

// Read data
const readStream = fs.createReadStream("yourfile.txt");

// Parser
const parser = csv.parse({ delimiter: ":" });

parser.on("data", (chunk) => {
result[chunk[0]] = {};
for(let i = 1; i < chunk.length; i ++) {
result[chunk[0]][keys[i - 1]] = chunk[i];
}
});

parser.on("end", () => {
console.log(result);
});

readStream.pipe(parser);

How to format .json data on .txt file using PHP?

You said you don't want to "clog my index file with curly brackets and all the '0', '1'... values." Well, that's how JSON is structured by definition. I'm not sure what you're expecting to be different.

That said, you can make it somewhat easier to read. If you want a prettier layout, with line breaks and indentation, use the JSON_PRETTY_PRINT constant:

$keywords = parseTweet ( $tweet, $tweet_id );
// print_r ( $keywords );

$json = json_encode ( $keywords, JSON_FORCE_OBJECT | JSON_PRETTY_PRINT );
print_r ( $json );

$fp = fopen('index.json', 'w');
fwrite($fp, $json);
fclose($fp);

How to sort all the JSON data into plain TXT file

First of all, your example JSON file contains three syntax errors:

  • A closing curly bracket } is missing at the end of the text.
  • There is an extra double quote just after false in the second object.
  • The objects must be enclosed in square brackets [] so that they form an array.

The corrected JSON file then looks like this:

$ cat test.json
[
{"@hrinn": {"host": {"1": " \"some.host.name\",", "2": " false,"}, "password": {"1": " \"123456\","}}},
{"@hrinn": {"host": {"1": " \"another.host.name\",", "2": " false,"}, "password": {"1": " \"654321\","}}},
{"@Abnerene": {"host": {"1": " \"example.com\",", "2": " false,"}, "username": {"1": " \"username\","}, "password": {"1": " \"password123\","}}}
]

Note that the inserted newlines are just for human readability and have nothing to do with the parser.

Now you can feed the json file to jq:

$ jq -r '.[] | .[] | [.host["1"], .username["1"], .password["1"]] | @sh' test.json

which yields:

' "some.host.name",' null ' "123456",'
' "another.host.name",' null ' "654321",'
' "example.com",' ' "username",' ' "password123",'

If the values in your JSON file were proper ones (without the embedded quotes, leading spaces, and trailing commas), the output would look cleaner.

Hope this helps.

C++ how to get specific values from a JSON-like text file?

Your file is a so-called structured text file, like XML or JSON.

A grammar can define the language (structure) of the file. If we look at the Chomsky hierarchy, we have here a so-called context-free language, which can and should be analyzed with a pushdown automaton or a recursive descent parser.

But you mentioned that you "only want to check the unigram count". That is crying out for a Chomsky type-3 regular grammar, which can be implemented with regular expressions.

Please note: normally you cannot use regular expressions here, because they cannot recognize nested structures. For example, if you have nested opening and closing braces, a regular expression cannot match them correctly. To put it simply: a regular expression cannot count.

Anyway, because of the simplicity of the given file, we can survive with pattern matching by employing regular expressions.

The good thing is that C++ supports regular expressions with its regex library.

But even so, we cannot find a solution with a single regular expression. We will use a three-step approach:

  1. Match all lines that contain unigram data
  2. Iterate over the above matched lines and extract the word/count pairs
  3. Split the word count pairs into word and count

(The regexes can be tested in any online regex tester.)

For counting we will use an associative container: a std::map if ordering matters, or a std::unordered_map if speed matters. The ordering refers to the words, not the counts.

Using maps is more or less the standard approach for counting items. The "key" is the word that we want to count. It is unique. And the associated value is the count.

Both maps have a convenient index operator []. You can put the key (in our case a unigram word) in the brackets, and if the word is already in the map, the index operator returns a reference to the associated value.

If the word is not yet in the map, it is inserted and its value is initialized, in our case to 0. Again, a reference to this value is returned. So in any case the result is a reference to the counter value, which we can then increment.

Cool.
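
A minimal, self-contained sketch of that counting idiom (the words here are made up, purely to illustrate the index operator described above):

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    std::map<std::string, unsigned int> counter{};

    // Hypothetical unigram words, only to demonstrate the idiom
    const std::vector<std::string> words{ "cat", "dog", "cat", "bug", "cat" };

    for (const std::string& word : words)
        ++counter[word];    // first use inserts the word with value 0, then increments

    for (const auto& [word, count] : counter)
        std::cout << word << " --> " << count << '\n';
}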

But, because of the nature of associative containers, you cannot re-sort them by their values. So we need to copy the resulting word/count pairs into a second container.

There is a perfectly fitting function for that in the C++ <algorithm> library: std::partial_sort_copy. With it we can find the top 3 and copy them into the resulting std::vector.
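
Here is a small sketch of just that step, with invented counts, to show how std::partial_sort_copy copies the 3 biggest entries, ordered by descending count, into a vector:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Hypothetical counter contents, only for illustration
    const std::map<std::string, unsigned int> counter{
        { "apple", 7 }, { "banana", 2 }, { "cherry", 9 }, { "date", 5 } };

    // Destination for the top 3, sorted by descending count
    std::vector<std::pair<std::string, unsigned int>> top(3);

    std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
        [](const auto& p1, const auto& p2) { return p1.second > p2.second; });

    for (const auto& [word, count] : top)
        std::cout << word << " --> " << count << '\n';
}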

So, we will heavily use the existing and advanced C++ functionality to implement the solution with a few lines of code.

There are many many different possible solutions. One example is here:

#include <iostream>
#include <fstream>
#include <string>
#include <regex>
#include <unordered_map>
#include <utility>
#include <map>
#include <vector>
#include <algorithm>
#include <iterator>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;
// Standard approach for counter
//using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;
using Counter = std::map<Pair::first_type, Pair::second_type>;
// Sorted values will be stored in a vector
using Top = std::vector<Pair>;
// ------------------------------------------------------------

// Regexes
const std::regex unigramLineIdentifierRegex{R"(\"unigramCount\":\{(.*)\})"};
const std::regex unigramCountRegex{ R"((\"[a-zA-Z ,]+\"\:\d+)+)" };
const std::regex wordAndCountRegex(R"(\"([a-zA-Z ,]+)\":(\d+))");

int main() {
    // Open the file and check if it could be opened
    if (std::ifstream ifs{"text.txt"}; ifs) {

        // Read the complete textfile into a string
        std::string text(std::istreambuf_iterator<char>(ifs), {});

        // Here we will count the words
        Counter counter{};

        // Now extract all lines containing the unigram line pattern into a std::vector
        std::vector unigramLine(std::sregex_token_iterator(text.begin(), text.end(), unigramLineIdentifierRegex), {});

        // For each of the found lines containing a unigram record
        for (const std::string& wordAndCount : unigramLine) {

            // Extract all word count pairs
            std::vector unigrams(std::sregex_token_iterator(wordAndCount.begin(), wordAndCount.end(), unigramCountRegex), {});

            // Go over all word/count strings and split them up
            for (const std::string unigram : unigrams) {

                // Find the word part and the count part
                std::smatch smWordAndCount{};
                if (std::regex_match(unigram, smWordAndCount, wordAndCountRegex)) {

                    // Increment the global counter
                    counter[smWordAndCount[1]] += std::stoi(smWordAndCount[2]);
                }
            }
        }

        // Here we will store the top 3
        Top top(3);

        // Copy and sort. Get only the biggest 3
        std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
            [](const Pair& p1, const Pair& p2) { return p1.second > p2.second; });

        // Debug output. Show the result, the top 3, on the screen
        for (const auto& [word, count] : top)
            std::cout << word << " \t--> " << count << '\n';
    }
    else std::cerr << "\n*** Error: Could not open source file\n\n";
}

Please be reminded: the regex solution is not the idiomatically correct approach.

And if the file is really JSON, then have a look at a dedicated JSON library.
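
As an illustration of that advice: the original link is not preserved here, so the following sketch assumes, purely as one possible choice, the popular nlohmann/json library, and it also assumes a file layout with a single top-level "unigramCount" object mapping each word to its count:

#include <fstream>
#include <iostream>
#include <map>
#include <string>

#include <nlohmann/json.hpp>

int main() {
    // Assumed layout: one JSON document with a top-level "unigramCount" object
    std::ifstream ifs{"text.txt"};
    const nlohmann::json j = nlohmann::json::parse(ifs);

    std::map<std::string, unsigned int> counter{};
    for (const auto& [word, count] : j.at("unigramCount").items())
        counter[word] += count.get<unsigned int>();

    for (const auto& [word, count] : counter)
        std::cout << word << " --> " << count << '\n';
}

A real JSON parser makes the regex machinery above unnecessary.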


