Prevent my data from affecting Google Analytics

I recently launched a website and I'm using Google Analytics to track traffic and trends. Unfortunately, my business partner and I like to check on the site a lot ourselves, and our visits are skewing our Google Analytics data! Is there a simple way to keep certain user activity from being tracked by Google Analytics?

I'm thinking of maybe attaching another domain name (or subdomain) to my site that I can access my website through. If I do that, will that still get logged?

Is there some other sort of trickery I can use so that my data will be uncorrupted by my own usage?

Thanks!

Every page that is to be tracked needs to include a small script from Google so that Google can receive tracking data.

If you don't run that script, Google gets no data.

If your web pages are produced dynamically by PHP, Django, or similar, then you can decide at render time whether to print Google's script into the page, and skip it when the request comes from an IP address on your company's whitelist.
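As a rough illustration of the server-side approach, here is a minimal sketch in JavaScript (Node-style). The IP addresses, the `ANALYTICS_SNIPPET` placeholder, and the function names are all assumptions made up for this example; the real snippet is whatever Google gives you for your property.

```javascript
// Hypothetical allowlist of office IPs (documentation-range addresses, not real ones).
const INTERNAL_IPS = new Set(["203.0.113.7", "198.51.100.24"]);

// Placeholder for the tracking snippet Google gives you; the real one is not reproduced here.
const ANALYTICS_SNIPPET = "<script>/* Google Analytics snippet goes here */</script>";

// Returns the markup to print into the page: the snippet for ordinary
// visitors, an empty string for requests from an allowlisted IP.
function analyticsMarkupFor(clientIp, internalIps = INTERNAL_IPS) {
  return internalIps.has(clientIp) ? "" : ANALYTICS_SNIPPET;
}

// Renders a page body with the snippet appended only when tracking applies.
function renderPage(bodyHtml, clientIp) {
  return `<html><body>${bodyHtml}${analyticsMarkupFor(clientIp)}</body></html>`;
}
```

If the site sits behind a proxy or load balancer, the address the application sees may be the proxy's rather than the visitor's, so how you obtain `clientIp` depends on your setup.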

But if your pages are static, that is more of a problem.

For static sites, you can create a special page that links to the pages insiders use most, and then check document.referrer on those pages to decide whether to run the script.
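A minimal sketch of the referrer check might look like the following. The `INSIDER_PAGE` URL and the `loadAnalytics` function mentioned in the comment are hypothetical; the decision is kept in a plain function so it can be tried outside a browser.

```javascript
// Hypothetical URL of the insiders-only launch page; replace with your own.
const INSIDER_PAGE = "https://example.com/staff-links.html";

// True when the visit did NOT arrive from the insiders page, i.e. it should be tracked.
function shouldTrack(referrer, insiderPage = INSIDER_PAGE) {
  return referrer !== insiderPage;
}

// Browser usage (loadAnalytics is a hypothetical function wrapping Google's snippet):
// if (shouldTrack(document.referrer)) { loadAnalytics(); }
```

This assumes the browser sends the full URL as the referrer, which is the usual behaviour for same-origin navigation but can be changed by a referrer policy.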

But that won't work for rich static sites where the internal user starts clicking around. For those, I would recommend running Google's script with a setTimeout() delay, and calling clearTimeout() if the incoming user confirms they are an internal user. The confirmation could be a "secret handshake" of the form "click the company logo in the first 5 seconds and you don't get tracked", which triggers the clearTimeout() call.
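The delayed-load idea could be sketched roughly as below. `scheduleTracking`, `loadAnalytics`, and the `#company-logo` selector are names invented for this example, and the `timers` parameter exists only so the sketch can be exercised without a real clock.

```javascript
// Schedules `load` to run after `delayMs` and returns a function that cancels it.
// `timers` defaults to the global object, which provides setTimeout/clearTimeout.
function scheduleTracking(load, delayMs = 5000, timers = globalThis) {
  const id = timers.setTimeout(load, delayMs);
  return function cancelTracking() {
    timers.clearTimeout(id);
  };
}

// Browser usage (loadAnalytics and the selector are hypothetical):
// const cancel = scheduleTracking(loadAnalytics);
// document.querySelector("#company-logo").addEventListener("click", cancel);
```

Note that the handshake has to be repeated on every page load unless you also remember the choice somewhere, for example in a cookie.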

Alternatively, within the analytics.google.com data portal you can filter data on the "network domain" of visitors. If your own corporate name shows up as the network domain, instead of "verizon" or another ISP name, then you are set for writing an exclusion filter. Otherwise you could use city or state, though that would filter out more.

This is so-called referral spam. I won't go into details here, as there are lots of good sources on the net about the issue (e.g. this, this and this one).
I've handled this issue like this:

  1. Go to your app view in Google Analytics.
  2. Open the 'Admin' tab.
  3. Click on 'View Settings'.
  4. There will be a section called 'Bot Filtering'; check the 'Exclude all hits from known bots and spiders' option.

From now on Google will remove the spam hits from their analytics. This solution has two issues, though:

  1. Historical data will not be affected by this option, i.e. spam hits which were made in the past will remain in your data; Google will only filter your future hits.
  2. Google promises to remove hits from known bots, which means that the time between a new bot appearing and its inclusion in Google's filter list can be indefinitely long. I have used this solution for the last week, though, and haven't found any new bots breaking through the filter.

After lots of research and trial and error, I found something that finally works. I added this code at the beginning of onCreate() in my MainActivity (launch activity). I hope that helps others!

    mFirebaseAnalytics = FirebaseAnalytics.getInstance(this);

    // Firebase Test Lab sets this system setting to "true" on its test devices.
    String testLabSetting =
            Settings.System.getString(getContentResolver(), "firebase.test.lab");
    if ("true".equals(testLabSetting)) {
        // You are running in Test Lab

        mFirebaseAnalytics.setAnalyticsCollectionEnabled(false);  // Disable analytics collection
        Toast.makeText(getApplicationContext(), "Disabling analytics collection ", Toast.LENGTH_LONG).show();
    }

Reference code in Firebase docs

It's in the JSON file under the key "client_email" (at least in all my key files), and you should find it in the "IAM and admin" section of the API Console (or Cloud Console, as it is apparently called now).

However, you do not need to enter this anywhere. This is a typo, or a holdover from a previous version of the API. The analytics module will take the email address from the JSON file.

Sampling

Sampling tends to occur when you have a high number of sessions or events for a given time period. Options to handle sampling:

  • Shorten the date range.
  • Reduce the number of dimensions.
  • Increase the samplingLevel.

Take the guesswork out of things and verify whether your results contain sampled data by checking the response for the field containsSampledData. Also, your query requests today's data, whereas the UI by default shows data only up to yesterday. Today's data is still coming in, so depending on when you query the API you will get a different answer for the number of sessions.
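The check itself is small. Here is a sketch in JavaScript of how one might test a parsed response for sampling; the two response objects are trimmed-down, made-up examples, and treating a missing field as "not sampled" is an assumption of this sketch.

```javascript
// Returns true when a Core Reporting API response reports that it was
// computed from sampled data. A missing field is treated as "not sampled".
function isSampled(response) {
  return response.containsSampledData === true;
}

// Hypothetical, trimmed-down response objects for illustration only.
const sampledResponse = {
  containsSampledData: true,
  totalsForAllResults: { "ga:sessions": "1200" }
};
const unsampledResponse = {
  containsSampledData: false,
  totalsForAllResults: { "ga:sessions": "1234" }
};
```

When `isSampled` returns true, the session totals are estimates, which would explain small differences from what the UI shows.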

API errors:

There are some issues with your code. I would suggest looking at some of the examples in the docs and at the reference docs to understand how the API is structured. For example, you need to pass in the optional parameters as an array:

foreach ($profilesArray as $p) {
  $optParams = array(
      'dimensions' => 'ga:source,ga:keyword',
      'sort' => '-ga:sessions,ga:source',
      'filters' => 'ga:medium==organic',
      'max-results' => '25',
      'samplingLevel' => 'HIGHER_PRECISION');

  // PHP concatenates strings with '.', not '+'.
  $results = $analytics->data_ga->get(
      'ga:' . $p,
      '7daysAgo',
      'today',
      'ga:sessions',
      $optParams);

  ...
  // do something with the $results.
}

A word of warning: the API is subject to limits and quotas, so if you have more than 10 views (profiles), the API may return a rate limiting error for querying too quickly. It is good practice to implement rate limiting and exponential backoff.
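To make the exponential backoff idea concrete, here is a minimal sketch in JavaScript rather than PHP. The function names, the default delays, and the `isRetryable` predicate are assumptions for illustration and not part of any Google client library; adapt the retry condition to the error shape your client actually returns.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs,
// plus up to `jitterMs` of random jitter so parallel clients do not retry in lockstep.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 32000, jitterMs = 0) {
  const exponential = Math.min(baseMs * 2 ** attempt, maxMs);
  return exponential + Math.floor(Math.random() * jitterMs);
}

// Calls `request` until it succeeds or `maxRetries` retries are used up.
// `isRetryable` is a hypothetical predicate deciding which errors are worth retrying.
async function withBackoff(request, { maxRetries = 5, isRetryable = () => true, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Give up once the retry budget is spent or the error is not retryable.
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await wait(backoffDelayMs(attempt, 1000, 32000, 1000));
    }
  }
}
```

With these defaults the waits grow roughly as 1 s, 2 s, 4 s, 8 s, 16 s (plus jitter) before the error is finally rethrown. Querying the views one after another through such a wrapper, instead of all at once, is one way to stay under the per-second limit.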

Migrate to Analytics Reporting API v4

We all like to have the shiny new toy. Go ahead and think about migrating to Analytics Reporting API v4. You've already done the hard work of figuring out OAuth, and they have made a great migration guide available.

StackOverflow advice

StackOverflow is a great place to get help with your implementations, and you did a great job of including your code (you'd be surprised how many people don't). I would also recommend including your error responses and stack traces, as well as the resources you've seen online.


Tags: Google Analytics