User Recognition Without Cookies or Local Storage

Introduction

If I understand you correctly, you need to identify a user for whom you don't have a Unique Identifier, so you want to figure out who they are by matching Random Data. You can't store the user's identity reliably because:

  • Cookies can be deleted
  • The IP address can change
  • The browser can change
  • The browser cache may be deleted

A Java Applet or COM object would have been an easy solution using a hash of hardware information, but these days people are so security-aware that it would be difficult to get them to install these kinds of programs on their systems. This leaves you stuck with using Cookies and other, similar tools.

Cookies and other, similar tools

You might consider building a Data Profile, then using Probability tests to identify a Probable User. A profile useful for this can be generated by some combination of the following:

  1. IP Address

    • Real IP Address
    • Proxy IP Address (users often use the same proxy repeatedly)
  2. Cookies

    • HTTP Cookies
    • Session Cookies
    • 3rd Party Cookies
    • Flash Cookies (most people don't know how to delete these)
  3. Web Bugs (less reliable because bugs get fixed, but still useful)

    • PDF Bug
    • Flash Bug
    • Java Bug
  4. Browsers

    • Click Tracking (many users visit the same series of pages on each visit)
    • Browsers Finger Print
        - Installed Plugins (people often have varied, somewhat unique sets of plugins)
    • Cached Images (people sometimes delete their cookies but leave cached images)
    • Using Blobs
    • URL(s) (browser history or cookies may contain unique user id's in URLs, such as https://stackoverflow.com/users/1226894 or http://www.facebook.com/barackobama?fref=ts)
    • System Fonts Detection (this is a little-known but often unique key signature)
  5. HTML5 & Javascript

    • HTML5 LocalStorage
    • HTML5 Geolocation API and Reverse Geocoding
    • Architecture, OS Language, System Time, Screen Resolution, etc.
    • Network Information API
    • Battery Status API

The items I listed are, of course, just a few possible ways a user can be identified uniquely. There are many more.

With this set of Random Data elements to build a Data Profile from, what's next?

The next step is to develop some Fuzzy Logic, or, better yet, an Artificial Neural Network (which can be combined with fuzzy logic). In either case, the idea is to train your system, and then combine its training with Bayesian Inference to increase the accuracy of your results.

Artificial Neural Network

The NeuralMesh library for PHP allows you to generate Artificial Neural Networks. To implement Bayesian Inference, check out the following links:

  • Implement Bayesian inference using PHP, Part 1
  • Implement Bayesian inference using PHP, Part 2
  • Implement Bayesian inference using PHP, Part 3
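To give a feel for what Bayesian Inference does here, below is a minimal sketch of Bayes' rule applied to two candidate users. The function name `posterior` and all of the numbers are made up for illustration; it is not taken from the linked articles or from any library.

```php
<?php
// Hypothetical helper: Bayes' rule over a set of candidate users.
// $priors: user => P(user); $likelihoods: user => P(evidence | user).
function posterior(array $priors, array $likelihoods): array
{
    $joint = [];
    foreach ($priors as $user => $prior) {
        $joint[$user] = $prior * ($likelihoods[$user] ?? 0.0);
    }
    $total = array_sum($joint);
    if ($total <= 0) {
        return $joint; // the evidence is impossible under every candidate
    }
    $result = [];
    foreach ($joint as $user => $value) {
        $result[$user] = $value / $total;
    }
    return $result;
}

// Made-up numbers: both users are equally likely beforehand; the request's
// IP address is one that User1 uses often (0.8) and User2 rarely (0.1).
$priors = ['User1' => 0.5, 'User2' => 0.5];
$ipSeen = ['User1' => 0.8, 'User2' => 0.1];

$afterIp = posterior($priors, $ipSeen);
printf("User1: %.3f, User2: %.3f\n", $afterIp['User1'], $afterIp['User2']);
```

Each further piece of evidence can be folded in the same way by passing the previous result back in as the new priors, which is only valid if you are willing to treat the pieces of evidence as independent.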

At this point, you may be thinking:

Why so much Math and Logic for a seemingly simple task?

Basically, because it is not a simple task. What you are trying to achieve is, in fact, Pure Probability. For example, given the following known users:

User1 = A + B + C + D + G + K
User2 = C + D + I + J + K + F

When you receive the following data:

B + C + E + G + F + K

The question which you are essentially asking is:

What is the probability that the received data (B + C + E + G + F + K) is actually User1 or User2? And which of those two matches is most probable?

In order to effectively answer this question, you need to understand Frequency vs Probability Format and why Joint Probability might be a better approach. The details are too much to get into here (which is why I'm giving you links), but a good example would be a Medical Diagnosis Wizard Application, which uses a combination of symptoms to identify possible diseases.

Think for a moment of the series of data points which comprise your Data Profile (B + C + E + G + F + K in the example above) as Symptoms, and Unknown Users as Diseases. By identifying the disease, you can further identify an appropriate treatment (treat this user as User1).

Obviously, a Disease for which we have identified more than 1 Symptom is easier to identify. In fact, the more Symptoms we can identify, the easier and more accurate our diagnosis is almost certain to be.
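As a rough illustration of the User1/User2 question above, the sketch below ranks the two known users by how much their data overlaps with the received data. It uses the Jaccard index (shared items divided by all distinct items) as a simple stand-in for a real probability model. It is not Bayesian Inference or Joint Probability, and the function name `overlapScore` is made up for this example.

```php
<?php
// Hypothetical helper: Jaccard index between two sets of observed data points.
function overlapScore(array $known, array $received): float
{
    $shared = count(array_intersect($known, $received));
    $all = count(array_unique(array_merge($known, $received)));
    return $all === 0 ? 0.0 : $shared / $all;
}

$users = [
    'User1' => ['A', 'B', 'C', 'D', 'G', 'K'],
    'User2' => ['C', 'D', 'I', 'J', 'K', 'F'],
];
$received = ['B', 'C', 'E', 'G', 'F', 'K'];

$scores = [];
foreach ($users as $name => $known) {
    $scores[$name] = overlapScore($known, $received);
}
arsort($scores); // best match first

foreach ($scores as $name => $score) {
    printf("%s: %.3f\n", $name, $score);
}
```

With the example data, User1 shares four items (B, C, G, K) out of eight distinct ones and User2 shares three (C, K, F) out of nine, so User1 comes out as the more likely match.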

Are there any other alternatives?

Of course. As an alternative measure, you might create your own simple scoring algorithm, and base it on exact matches. This is not as accurate as a probabilistic approach, but may be simpler for you to implement.

As an example, consider this simple score chart:


+-------------------------+--------+------------+
| Property                | Weight | Importance |
+-------------------------+--------+------------+
| Real IP address         |     60 |          5 |
| Used proxy IP address   |     40 |          4 |
| HTTP Cookies            |     80 |          8 |
| Session Cookies         |     80 |          6 |
| 3rd Party Cookies       |     60 |          4 |
| Flash Cookies           |     90 |          7 |
| PDF Bug                 |     20 |          1 |
| Flash Bug               |     20 |          1 |
| Java Bug                |     20 |          1 |
| Frequent Pages          |     40 |          1 |
| Browsers Finger Print   |     35 |          2 |
| Installed Plugins       |     25 |          1 |
| Cached Images           |     40 |          3 |
| URL                     |     60 |          4 |
| System Fonts Detection  |     70 |          4 |
| Localstorage            |     90 |          8 |
| Geolocation             |     70 |          6 |
| AOLTR                   |     70 |          4 |
| Network Information API |     40 |          3 |
| Battery Status API      |     20 |          1 |
+-------------------------+--------+------------+

For each piece of information which you can gather on a given request, award the associated score, then use Importance to resolve conflicts when scores are the same.
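Here is a minimal sketch of that scoring idea, using a few rows of the chart. The helper name `scoreCandidate`, the candidate names and the matched properties are all made up, and summing Importance to break ties is just one possible reading of the rule above.

```php
<?php
// Subset of the score chart above: property => [weight, importance].
$chart = [
    'Real IP address' => [60, 5],
    'HTTP Cookies'    => [80, 8],
    'Flash Cookies'   => [90, 7],
    'Cached Images'   => [40, 3],
    'Geolocation'     => [70, 6],
];

// Hypothetical helper: sum weight and importance over the properties
// that matched a stored user exactly.
function scoreCandidate(array $matched, array $chart): array
{
    $score = 0;
    $importance = 0;
    foreach ($matched as $property) {
        if (isset($chart[$property])) {
            $score += $chart[$property][0];
            $importance += $chart[$property][1];
        }
    }
    return ['score' => $score, 'importance' => $importance];
}

// Made-up candidates: which properties of the request matched each stored user.
$candidates = [
    'alice' => ['Real IP address', 'Flash Cookies'], // 60 + 90
    'bob'   => ['HTTP Cookies', 'Geolocation'],      // 80 + 70
    'carol' => ['Cached Images'],                    // 40
];

$ranked = [];
foreach ($candidates as $name => $matched) {
    $ranked[$name] = scoreCandidate($matched, $chart);
}

// Higher score first; equal scores fall back to higher importance.
uasort($ranked, function (array $a, array $b): int {
    return [$b['score'], $b['importance']] <=> [$a['score'], $a['importance']];
});

foreach ($ranked as $name => $r) {
    echo "$name: score={$r['score']} importance={$r['importance']}\n";
}
```

In this made-up data alice and bob both score 150, so bob is ranked first because his matched properties carry more total Importance (14 against 12).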

Proof of Concept

For a simple proof of concept, please take a look at Perceptron. Perceptron is an artificial neural network model ("RNA" in the code below is presumably the Portuguese abbreviation, Rede Neural Artificial) that is generally used in pattern recognition applications. There is even an old PHP Class which implements it, but you would likely need to modify it for your purposes.

Despite being a great tool, Perceptron can still return multiple results (possible matches), so using a Score and Difference comparison is still useful to identify the best of those matches.

Assumptions

  • Store all possible information about each user (IP, cookies, etc.)
  • Where result is an exact match, increase score by 1
  • Where result is not an exact match, decrease score by 1

Expectation

  1. Generate RNA labels
  2. Generate random users emulating a database
  3. Generate a single Unknown user
  4. Generate Unknown user RNA and Values
  5. The system will merge RNA information and teach the Perceptron
  6. After training the Perceptron, the system will have a set of weightings
  7. You can now test the Unknown user's pattern and the Perceptron will produce a result set.
  8. Store all Positive matches
  9. Sort the matches first by Score, then by Difference (as described above)
  10. Output the two closest matches, or, if no matches are found, output empty results

Code for Proof of Concept

$features = array(
    'Real IP address' => .5,
    'Used proxy IP address' => .4,
    'HTTP Cookies' => .9,
    'Session Cookies' => .6,
    '3rd Party Cookies' => .6,
    'Flash Cookies' => .7,
    'PDF Bug' => .2,
    'Flash Bug' => .2,
    'Java Bug' => .2,
    'Frequent Pages' => .3,
    'Browsers Finger Print' => .3,
    'Installed Plugins' => .2,
    'URL' => .5,
    'Cached PNG' => .4,
    'System Fonts Detection' => .6,
    'Localstorage' => .8,
    'Geolocation' => .6,
    'AOLTR' => .4,
    'Network Information API' => .3,
    'Battery Status API' => .2
);

// Get RNA labels (x1 ... x20), one per feature
$labels = array();
$n = 1;
foreach ($features as $k => $v) {
    $labels[$k] = "x" . $n;
    $n ++;
}

// Create users
$users = array();
for ($i = 0, $name = "A"; $i < 5; $i ++, $name ++) {
    $users[] = new Profile($name, $features);
}

// Generate unknown user
$unknown = new Profile("Unknown", $features);

// Generate unknown RNA: row 0 is the unknown user's pattern (desired output 1),
// row 1 is its negation (desired output -1)
$unknownRNA = array(
    0 => array("o" => 1),
    1 => array("o" => - 1)
);

// Create RNA values
foreach ($unknown->data as $item => $point) {
    $unknownRNA[0][$labels[$item]] = $point;
    $unknownRNA[1][$labels[$item]] = (- 1 * $point);
}

// Start Perceptron class
$perceptron = new Perceptron();

// Train results
$trainResult = $perceptron->train($unknownRNA, 1, 1);

// Find matches
$matchs = array(); // initialize so count() below is safe when nothing matches
foreach ($users as $profile) {
    // Use shorter labels
    $data = array_combine($labels, $profile->data);
    if ($perceptron->testCase($data, $trainResult) == true) {
        $score = $diff = 0;

        // Determine the score and difference
        foreach ($unknown->data as $item => $found) {
            if ($unknown->data[$item] === $profile->data[$item]) {
                if ($profile->data[$item] > 0) {
                    $score += $features[$item];
                } else {
                    $diff += $features[$item];
                }
            }
        }
        // Set score and diff
        $profile->setScore($score, $diff);
        $matchs[] = $profile;
    }
}

// Sort based on score and output
if (count($matchs) > 0) {
    usort($matchs, function ($a, $b) {
        // If the score is the same, use the difference
        if ($a->score == $b->score) {
            // The lower the difference the better
            return $a->diff == $b->diff ? 0 : ($a->diff > $b->diff ? 1 : - 1);
        }
        // The higher the score the better
        return $a->score > $b->score ? - 1 : 1;
    });

    // Output at most the two closest matches
    echo "<br />Possible Match ", implode(",", array_slice(array_map(function ($v) {
        return sprintf(" %s (%0.4f|%0.4f) ", $v->name, $v->score, $v->diff);
    }, $matchs), 0, 2));
} else {
    echo "<br />No match Found ";
}

Output:

Possible Match D (0.7416|0.1685),C (0.5393|0.2809)

Print_r of "D":

echo "<pre>";
print_r($matchs[0]);


Profile Object(
[name] => D
[data] => Array (
[Real IP address] => -1
[Used proxy IP address] => -1
[HTTP Cookies] => 1
[Session Cookies] => 1
[3rd Party Cookies] => 1
[Flash Cookies] => 1
[PDF Bug] => 1
[Flash Bug] => 1
[Java Bug] => -1
[Frequent Pages] => 1
[Browsers Finger Print] => -1
[Installed Plugins] => 1
[URL] => -1
[Cached PNG] => 1
[System Fonts Detection] => 1
[Localstorage] => -1
[Geolocation] => -1
[AOLTR] => 1
[Network Information API] => -1
[Battery Status API] => -1
)
[score] => 0.74157303370787
[diff] => 0.1685393258427
[base] => 8.9
)

If you set $perceptron->debug = true before training, you will be able to see Input (Sensor & Desired), Initial Weights, Output (Sensor, Sum, Network), Error, Correction and Final Weights.

+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| o | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 | x16 | x17 | x18 | x19 | x20 | Bias | Yin | Y | deltaW1 | deltaW2 | deltaW3 | deltaW4 | deltaW5 | deltaW6 | deltaW7 | deltaW8 | deltaW9 | deltaW10 | deltaW11 | deltaW12 | deltaW13 | deltaW14 | deltaW15 | deltaW16 | deltaW17 | deltaW18 | deltaW19 | deltaW20 | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 | W16 | W17 | W18 | W19 | W20 | deltaBias |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+
| 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 0 | -1 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | -1 | -1 | 1 | -19 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | -1 | -1 | 1 | -19 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
+----+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+-----+----+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----+----+----+----+----+----+----+----+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----------+

x1 to x20 represent the features converted by the code.

// Get RNA Labels
$labels = array();
$n = 1;
foreach ($features as $k => $v) {
    $labels[$k] = "x" . $n;
    $n ++;
}

Here is an online demo

Class Used:

class Profile {
    public $name, $data = array(), $score, $diff, $base;

    function __construct($name, array $importance) {
        $values = array(-1, 1); // Perceptron values
        $this->name = $name;
        foreach ($importance as $item => $point) {
            // Generate a random -1/1 value standing in for a real item
            $this->data[$item] = $values[mt_rand(0, 1)];
        }
        // Sum of all feature weights, used to normalize score and diff
        $this->base = array_sum($importance);
    }

    public function setScore($score, $diff) {
        $this->score = $score / $this->base;
        $this->diff = $diff / $this->base;
    }
}

Modified Perceptron Class

class Perceptron {
private $w = array();
private $dw = array();
public $debug = false;

private function initialize($colums) {
// Initialize perceptron vars
for($i = 1; $i <= $colums; $i ++) {
// weighting vars
$this->w[$i] = 0;
$this->dw[$i] = 0;
}
}

function train($input, $alpha, $teta) {
$colums = count($input[0]) - 1;
$weightCache = array_fill(1, $colums, 0);
$checkpoints = array();
$keepTrainning = true;

// Initialize RNA vars
$this->initialize(count($input[0]) - 1);
$just_started = true;
$totalRun = 0;
$yin = 0;

// Trains RNA until it gets stable
while ($keepTrainning == true) {
// Sweeps each row of the input subject
foreach ($input as $row_counter => $row_data) {
// Finds out the number of columns the input has
$n_columns = count($row_data) - 1;

// Calculates Yin
$yin = 0;
for($i = 1; $i <= $n_columns; $i ++) {
$yin += $row_data["x" . $i] * $weightCache[$i];
}

// Calculates Real Output
$Y = ($yin <= 1) ? - 1 : 1;

// Sweeps columns ...
$checkpoints[$row_counter] = 0;
for($i = 1; $i <= $n_columns; $i ++) {
/** DELTAS **/
// Is it the first row?
if ($just_started == true) {
$this->dw[$i] = $weightCache[$i];
$just_started = false;
// Found desired output?
} elseif ($Y == $row_data["o"]) {
$this->dw[$i] = 0;
// Calculates Delta Ws
} else {
$this->dw[$i] = $row_data["x" . $i] * $row_data["o"];
}

/** WEIGHTS **/
// Calculate Weights
$this->w[$i] = $this->dw[$i] + $weightCache[$i];
$weightCache[$i] = $this->w[$i];

/** CHECK-POINT **/
$checkpoints[$row_counter] += $this->w[$i];
} // END - for

foreach ($this->w as $index => $w_item) {
$debug_w["W" . $index] = $w_item;
$debug_dw["deltaW" . $index] = $this->dw[$index];
}

// Special for script debugging
$debug_vars[] = array_merge($row_data, array(
"Bias" => 1,
"Yin" => $yin,
"Y" => $Y
), $debug_dw, $debug_w, array(
"deltaBias" => 1
));
} // END - foreach

// Special for script debugging
$empty_data_row = array();
for($i = 1; $i <= $n_columns; $i ++) {
$empty_data_row["x" . $i] = "--";
$empty_data_row["W" . $i] = "--";
$empty_data_row["deltaW" . $i] = "--";
}
$debug_vars[] = array_merge($empty_data_row, array(
"o" => "--",
"Bias" => "--",
"Yin" => "--",
"Y" => "--",
"deltaBias" => "--"
));

// Counts training times
$totalRun ++;

// Now checks if the RNA is stable already
$referer_value = end($checkpoints);
// if all rows match the desired output ...
$sum = array_sum($checkpoints);
$n_rows = count($checkpoints);
if ($totalRun > 1 && ($sum / $n_rows) == $referer_value) {
$keepTrainning = false;
}
} // END - while

// Prepares the final result
$result = array();
for($i = 1; $i <= $n_columns; $i ++) {
$result["w" . $i] = $this->w[$i];
}

$this->debug($this->print_html_table($debug_vars));

return $result;
} // END - train
function testCase($input, $results) {
// Sweeps input columns
$result = 0;
$i = 1;
foreach ($input as $column_value) {
// Calculates test Y
$result += $results["w" . $i] * $column_value;
$i ++;
}
// Checks which class the test case fits
return ($result > 0) ? true : false;
} // END - testCase

// Returns the HTML code of an HTML table based on a hash array
function print_html_table($array) {
$html = "";
$inner_html = "";
$table_header_composed = false;
$table_header = array();

// Builds table contents
foreach ($array as $array_item) {
$inner_html .= "<tr>\n";
foreach ( $array_item as $array_col_label => $array_col ) {
$inner_html .= "<td>\n";
$inner_html .= $array_col;
$inner_html .= "</td>\n";

if ($table_header_composed == false) {
$table_header[] = $array_col_label;
}
}
$table_header_composed = true;
$inner_html .= "</tr>\n";
}

// Builds full table
$html = "<table border=1>\n";
$html .= "<tr>\n";
foreach ($table_header as $table_header_item) {
$html .= "<td>\n";
$html .= "<b>" . $table_header_item . "</b>";
$html .= "</td>\n";
}
$html .= "</tr>\n";

$html .= $inner_html . "</table>";

return $html;
} // END - print_html_table

// Debug function
function debug($message) {
if ($this->debug == true) {
echo "<b>DEBUG:</b> $message";
}
} // END - debug
} // END - class

Conclusion

Identifying a user without a Unique Identifier is not a straightforward or simple task. It depends on gathering a sufficient amount of Random Data from the user by a variety of methods.

Even if you choose not to use an Artificial Neural Network, I suggest at least using a Simple Probability Matrix with priorities and likelihoods - and I hope the code and examples provided above give you enough to go on.

Identifying users without cookies etc

You could use the browser fingerprint.

The browser fingerprint is an identifier generated from the information that every browser sends on every connection (HTTP headers) and additional information available through basic JavaScript.

Information like:

  • User agent
  • Language
  • Installed plugins
  • Screen resolution
  • ... and more.

Browser fingerprint identification isn't bulletproof, because users can take countermeasures against it, but it can strengthen your recipe. Despite the controversy around it, it's widely used.
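As a minimal sketch of the server-side half of this idea, the code below hashes a few request headers into a single fingerprint string. The helper name `headerFingerprint`, the choice of headers and the sample values are assumptions made for illustration. JavaScript-only values such as installed plugins or screen resolution would have to be collected in the browser and sent to the server separately.

```php
<?php
// Hypothetical helper: hash a few request headers into a fingerprint string.
// $server is expected to look like PHP's $_SERVER superglobal.
function headerFingerprint(array $server): string
{
    $keys = ['HTTP_USER_AGENT', 'HTTP_ACCEPT', 'HTTP_ACCEPT_LANGUAGE', 'HTTP_ACCEPT_ENCODING'];
    $parts = [];
    foreach ($keys as $key) {
        // Missing headers are recorded as empty strings so absence is part of the fingerprint.
        $parts[] = $key . '=' . ($server[$key] ?? '');
    }
    return hash('sha256', implode("\n", $parts));
}

// Made-up sample request; in a real script you would pass $_SERVER.
$sample = [
    'HTTP_USER_AGENT'      => 'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0',
    'HTTP_ACCEPT_LANGUAGE' => 'en-US,en;q=0.9',
    'HTTP_ACCEPT_ENCODING' => 'gzip, deflate',
];
$fingerprint = headerFingerprint($sample);
echo $fingerprint, "\n";
```

Many visitors will send identical headers, so a hash like this is better treated as one more entry in a profile than as a unique ID.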

Mozilla has a great wiki article about the subject.

And you can check your own browser fingerprint at https://panopticlick.eff.org/

Local Storage vs Cookies: save and send user credentials via websockets

If you use localStorage, the user's credentials will be stored (presumably unencrypted, unless you implement encryption yourself) on the user's local machine. These records will not expire unless you write your application to expire them. As such, your user would be logged in forever, not just while they have another tab open, unless you also wrote logic for that. But there is no reason to do all this additional work.

Cookies are already frequently used to accomplish this functionality. Inside the cookie you should store a session token, which identifies the user's session uniquely. Cookies have the advantage of automatic expiration, and are automatically passed to the server with each HTTP request. For more information about the differences between cookies and localStorage, take a look at this thread.
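A minimal sketch of the session-token approach might look like the following. The helper names, the cookie name `session_token` and the one-hour lifetime are assumptions for illustration; the `setcookie` call is left as a comment because it only makes sense inside a web request.

```php
<?php
// Hypothetical helper: create an unguessable session token (64 hex characters).
function newSessionToken(): string
{
    return bin2hex(random_bytes(32));
}

// Hypothetical helper: cookie options for a session that expires after $ttl seconds.
function sessionCookieOptions(int $ttl): array
{
    return [
        'expires'  => time() + $ttl,
        'path'     => '/',
        'secure'   => true,  // only send over HTTPS
        'httponly' => true,  // hide from JavaScript
        'samesite' => 'Lax',
    ];
}

$token = newSessionToken();
$options = sessionCookieOptions(3600);

// In a web request (before any output is sent) you would then call:
// setcookie('session_token', $token, $options); // options array needs PHP 7.3+
// and store $token on the server next to the user's ID.
echo strlen($token), "\n";
```

Only the token travels in the cookie; the credentials themselves stay on the server, which is what lets you expire or revoke a session without touching the browser.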

Using cookies to store personalization info or preference settings: do they persist when the user logs on from a new device?

Cookies are not persistent

Cookies are temporary local storage. They will not persist if the user switches to a different device, or to a different browser on the same device. They may persist on the same computer for some time, but that's not guaranteed and you should not rely on it. It's reasonable and common for cookies to be cleared periodically, whether or not they have reached the expiry date you set; for example, some privacy-focused settings remove all cookies whenever the browser is closed.

You can assume that cookies will persist throughout a single browsing session, and that's it. If you want to persist some data, you have to do it on your systems. Various types of databases are a common solution for that.

Flask Login/Session without cookies?

HTTP is a stateless protocol. Your only option to track a login session is by somehow tying a request to a specific user. Unless browsers start sending some other unique identifying piece of information, cookies are your best option. Another alternative is to use Basic authentication, where the user is asked to enter a username and password in a standard dialog box (this can't be styled) and the browser will then send this data (unencrypted) along with every request.
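To show why the Basic authentication data counts as unencrypted, here is a small sketch of how the Authorization header value is built and read back. The two function names and the credentials are made up for this example; the point is that the value is only Base64-encoded, so anyone who can see the request can decode it.

```php
<?php
// Build the Authorization header value a browser sends for Basic authentication.
function basicAuthHeader(string $user, string $password): string
{
    return 'Basic ' . base64_encode($user . ':' . $password);
}

// Recover the credentials from such a header; returns null if it is malformed.
function parseBasicAuthHeader(string $header): ?array
{
    if (strncmp($header, 'Basic ', 6) !== 0) {
        return null;
    }
    $decoded = base64_decode(substr($header, 6), true);
    if ($decoded === false || strpos($decoded, ':') === false) {
        return null;
    }
    // Split on the first colon only: the password may itself contain colons.
    [$user, $password] = explode(':', $decoded, 2);
    return ['user' => $user, 'password' => $password];
}

$header = basicAuthHeader('alice', 's3cret:pw'); // made-up credentials
$creds = parseBasicAuthHeader($header);
echo $creds['user'], ' / ', $creds['password'], "\n";
```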

All other techniques are far more involved, see User recognition without cookies or local storage for example. Also see How can I uniquely identify an user when cookies are not an option? for more options.

Flask-Login does support Basic Authentication; the documentation covers two different techniques for supporting this option. If you do choose to use this, make sure your site is only accessible over HTTPS encryption to prevent the username / password combination from being stolen, letting someone else log in.

Are cookies fit for just client side usage?

LocalStorage is the way to go. Just have a look at this great presentation.


