    Manual Page

    NAME

    goaccess - Fast web log analyzer and interactive viewer.

    SYNOPSIS

    goaccess [filename] [ options ... ] [-c][-M][-H][-q][-d][...]
    

    DESCRIPTION

    goaccess is a free (MIT Licensed) and open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser.

    It provides fast and valuable HTTP statistics for system administrators that require a visual server report on the fly. GoAccess parses the specified web log file and outputs the data to the X terminal. Features include:

    • General Statistics: This panel gives a summary of several metrics, including: number of valid and invalid requests, time taken to analyze the data set, unique visitors, requested files, static files (CSS, ICO, JPG, etc.), HTTP referrers, 404s, size of the parsed log file and bandwidth consumption.
    • Unique visitors: This panel shows metrics such as hits, unique visitors and cumulative bandwidth per date. HTTP requests containing the same IP, the same date, and the same user agent are considered a unique visitor. By default, it includes web crawlers/spiders.
      Optionally, date specificity can be set to the hour level using --date-spec=hr which will display dates such as 05/Jun/2016:16. This is great if you want to track your daily traffic at the hour level.
    • Requested files: This panel displays the most highly requested (non-static) files on your web server. It shows hits, unique visitors, and percentage, along with the cumulative bandwidth, protocol, and the request method used.
    • Requested static files: Lists the most frequently requested static files, such as JPG, CSS, SWF, JS, GIF, and PNG file types, along with the same metrics as the last panel. Additional static files can be added to the configuration file.
    • 404 or Not Found: Displays the same metrics as the previous request panels; however, its data contains all pages that were not found on the server, commonly known as the 404 status code.
    • Hosts: This panel has detailed information on the hosts themselves. This is great for spotting aggressive crawlers and identifying who's eating your bandwidth.
      Expanding the panel can display more information such as host's reverse DNS lookup result, country of origin and city. If the -a argument is enabled, a list of user agents can be displayed by selecting the desired IP address, and then pressing ENTER.
    • Operating Systems: This panel will report which operating system the host used when it hit the server. It attempts to provide the most specific version of each operating system.
    • Browsers: This panel will report which browser the host used when it hit the server. It attempts to provide the most specific version of each browser.
    • Visit Times: This panel will display an hourly report. This option displays 24 data points, one for each hour of the day.
      Optionally, hour specificity can be set to the tenth of an hour level using --hour-spec=min, which will display hours as 16:4. This is great if you want to spot peaks of traffic on your server.
    • Virtual Hosts: This panel will display all the different virtual hosts parsed from the access log. This panel is displayed if %v is used within the log-format string.
    • Referrers URLs: If the host in question accessed the site via another resource, or was linked/diverted to you from another host, the URL they were referred from will be provided in this panel. See `--ignore-panel` in your configuration file to enable it. (disabled by default)
    • Referring Sites: This panel displays only the host part, not the whole URL, of where the request came from.
    • Keyphrases: It reports keyphrases used on Google search, Google cache, and Google translate that have led to your web server. At present, it only supports Google search queries via HTTP. See `--ignore-panel` in your configuration file to enable it. (disabled by default)
    • Geo Location: Determines where an IP address is geographically located. Statistics are broken down by continent and country. It needs to be compiled with GeoLocation support.
    • HTTP Status Codes: The values of the numeric status code to HTTP requests.
    • Remote User (HTTP authentication): This is the userid of the person requesting the document as determined by HTTP authentication. If the document is not password protected, this part will be "-". This panel is not enabled unless %e is given within the log-format variable.
    • Cache Status: If you are using caching on your server, you may be at the point where you want to know if your request is being cached and served from the cache. This panel shows the cache status of the object the server served. This panel is not enabled unless %C is given within the log-format variable. The status can be either MISS, BYPASS, EXPIRED, STALE, UPDATING, REVALIDATED or HIT.

    NOTE: Optionally and if configured, all panels can display the average time taken to serve the request.
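
    The unique-visitor rule described above (same IP, same date, same user agent) can be illustrated with standard tools. The following is only a sketch on made-up Combined Log Format lines; it is not how GoAccess itself is implemented, and the sample data and variable names are assumptions.

```shell
# Sample access log in Combined Log Format (made-up data).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [05/Jun/2016:16:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.1 - - [05/Jun/2016:16:02:00 +0000] "GET /about HTTP/1.1" 200 128 "-" "Mozilla/5.0"
10.0.0.1 - - [05/Jun/2016:17:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.47"
10.0.0.2 - - [06/Jun/2016:09:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF

# Total number of requests (hits).
hits=$(wc -l < "$log" | tr -d ' ')

# Split each line on double quotes: field 1 holds the IP and timestamp,
# field 6 holds the user agent. Count distinct IP + date + user agent keys.
unique=$(awk -F'"' '{
    split($1, f, " ")            # f[1] = IP, f[4] = "[05/Jun/2016:16:01:00"
    date = substr(f[4], 2, 11)   # keep only 05/Jun/2016
    seen[f[1] SUBSEP date SUBSEP $6] = 1
} END { n = 0; for (k in seen) n++; print n }' "$log")

echo "hits: $hits, unique visitors: $unique"
rm -f "$log"
```

    Here the first two requests collapse into one visitor, while the third counts separately because its user agent differs.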

    STORAGE

    There are three storage options that can be used with GoAccess. Choosing one will depend on your environment and needs.

    Default Hash Tables

    In-memory storage provides better performance at the cost of limiting the dataset size to the amount of available physical memory. GoAccess uses in-memory hash tables. It has very good memory usage and pretty good performance. This storage has support for on-disk persistence.

    CONFIGURATION

    Multiple options can be used to configure GoAccess. For a complete up-to-date list of configure options, run ./configure --help

    --enable-debug

    Compile with debugging symbols and turn off compiler optimizations.

    --enable-utf8

    Compile with wide character support. Ncursesw is required.

    --enable-geoip=<legacy|mmdb>

    Compile with GeoLocation support. MaxMind's GeoIP is required. legacy will utilize the original GeoIP databases. mmdb will utilize the enhanced GeoIP2 databases.

    --with-getline

    Dynamically expands line buffer in order to parse full line requests instead of using a fixed size buffer of 4096.

    --with-openssl

    Compile GoAccess with OpenSSL support for its WebSocket server.

    OPTIONS

    The following options can be supplied via the command line or long options through the configuration file.

    LOG/DATE/TIME FORMAT

    • --time-format <timeformat>

      The time-format variable followed by a space, specifies the log format time containing any combination of regular characters and special format specifiers. They all begin with a percentage (%) sign. See `man strftime`. e.g., %T or %H:%M:%S.

    • --date-format <dateformat>

      The date-format variable followed by a space, specifies the log format date containing any combination of regular characters and special format specifiers. They all begin with a percentage (%) sign. See `man strftime`.

    • --log-format <logformat>

      The log-format variable followed by a space or \t for tab-delimited, specifies the log format string.

      In addition to specifying the raw log/date/time formats, for simplicity, any of the following predefined log format names can be supplied to the log/date/time-format variables. GoAccess can also handle one predefined name in one variable and another predefined name in another variable.

      COMBINED     | Combined Log Format
      VCOMBINED    | Combined Log Format with Virtual Host
      COMMON       | Common Log Format
      VCOMMON      | Common Log Format with Virtual Host
      W3C          | W3C Extended Log File Format
      SQUID        | Native Squid Log Format
      CLOUDFRONT   | Amazon CloudFront Web Distribution
      CLOUDSTORAGE | Google Cloud Storage
      AWSELB       | Amazon Elastic Load Balancing
      AWSS3        | Amazon Simple Storage Service (S3)
      

      Note: Generally, you need quotes around values that include white spaces, commas, pipes, quotes, and/or brackets. Inner quotes must be escaped.

      Note: Piping data into GoAccess won't prompt a log/date/time configuration dialog, you will need to previously define it in your configuration file or in the command line.
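
      The quoting note above can be sketched in the shell as follows. The format string is the one commonly used for the Combined Log Format and is an assumption here, as are the variable names; the block only prints the resulting argument rather than running GoAccess.

```shell
# Single quotes keep the inner double quotes and brackets intact, so the
# whole format string reaches the program as one argument.
log_format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"'

# Equivalent double-quoted form: the inner quotes must be escaped.
log_format_escaped="%h %^[%d:%t %^] \"%r\" %s %b \"%R\" \"%u\""

# Print the resulting argument. It could then be passed as:
#   goaccess access.log --log-format="$log_format" --date-format=%d/%b/%Y --time-format=%T
printf '%s\n' "--log-format=$log_format"
```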

    USER INTERFACE OPTIONS

    See configuration file for a sample color scheme.

    • -c --config-dialog

      Prompt log/date configuration window on program start.

    • -i --hl-header

      Color highlight active panel.

    • -m --with-mouse

      Enable mouse support on main dashboard.

    • --color=<fg:bg[attrs, PANEL]>

      Specify custom colors for the terminal output.

      Color Syntax:

      DEFINITION space/tab colorFG#:colorBG# [attributes,PANEL]
      FG# = foreground color [-1...255] (-1 = default term color)
      BG# = background color [-1...255] (-1 = default term color)
      

      Optionally, it is possible to apply color attributes (multiple attributes are comma separated), such as: bold,underline,normal,reverse,blink

      If desired, it is possible to apply custom colors per panel, that is, a metric in the REQUESTS panel can be of color A, while the same metric in the BROWSERS panel can be of color B.

      • COLOR_MTRC_HITS
      • COLOR_MTRC_VISITORS
      • COLOR_MTRC_DATA
      • COLOR_MTRC_BW
      • COLOR_MTRC_AVGTS
      • COLOR_MTRC_CUMTS
      • COLOR_MTRC_MAXTS
      • COLOR_MTRC_PROT
      • COLOR_MTRC_MTHD
      • COLOR_MTRC_HITS_PERC
      • COLOR_MTRC_HITS_PERC_MAX
      • COLOR_MTRC_VISITORS_PERC
      • COLOR_MTRC_VISITORS_PERC_MAX
      • COLOR_PANEL_COLS
      • COLOR_BARS
      • COLOR_ERROR
      • COLOR_SELECTED
      • COLOR_PANEL_ACTIVE
      • COLOR_PANEL_HEADER
      • COLOR_PANEL_DESC
      • COLOR_OVERALL_LBLS
      • COLOR_OVERALL_VALS
      • COLOR_OVERALL_PATH
      • COLOR_ACTIVE_LABEL
      • COLOR_BG
      • COLOR_DEFAULT
      • COLOR_PROGRESS
    • --color-scheme <1|2|3>

      Choose among terminal color schemes. 1 for the monochrome scheme. 2 for the green scheme and 3 for the Monokai scheme (shown only if terminal supports 256 colors).

    • --crawlers-only

      Parse and display only crawlers (bots).

    • --html-custom-css=<path.css>

      Specifies a custom CSS file path to load in the HTML report.

    • --html-custom-js=<path.js>

      Specifies a custom JS file path to load in the HTML report.

    • --html-report-title=<title>

      Set HTML report page title and header.

    • --html-prefs=<JSON>

      Set HTML report default preferences. Supply a valid JSON object containing the HTML preferences. It allows the ability to customize each panel plot.

      Note: The JSON object passed needs to be a one line JSON string. For instance,

      --html-prefs='{"theme":"bright","perPage":5,"layout":"horizontal","showTables":true,"visitors":{"plot":{"chartType":"bar"}}}'

    • --json-pretty-print

      Format JSON output using tabs and newlines.

    • --max-items=<num>

      The maximum number of items to display per panel. The maximum can be a number between 1 and n.
      Note: Only a static HTML, CSV and JSON output allow a maximum number greater than the default value of 366 (or 50 in the real-time HTML output) items per panel.

    • --no-color

      Turn off colored output. This is the default output on terminals that do not support colors.

    • --no-column-names

      Don't write column names in the terminal output. By default, it displays column names for each available metric in every panel.

    • --no-csv-summary

      Disable summary metrics on the CSV output.

    • --no-progress

      Disable progress metrics [total requests/requests per second] when parsing a log.

    • --no-tab-scroll

      Disable scrolling through panels when TAB is pressed or when a panel is selected using a numeric key.

    • --no-html-last-updated

      Do not show the last updated field displayed in the HTML generated report.

    • --no-parsing-spinner

      Do not show the progress metrics and parsing spinner.

    SERVER OPTIONS

    • --addr=<address>

      Specify IP address to bind the server to. Otherwise it binds to 0.0.0.0.

      Usually there is no need to specify the address, unless you intentionally would like to bind the server to a different address within your server.

    • --daemonize

      Run GoAccess as daemon (only if --real-time-html enabled).

    • --user-name=<username>

      Run GoAccess as the specified user.

      Note: It's important to ensure the user or the user's group can access the input and output files as well as any other files needed. Other groups the user belongs to will be ignored. As such, it's advised to run GoAccess behind an SSL proxy, as it's unlikely this user can access the SSL certificates.

    • --origin=<url>

      Ensure clients send the specified origin header upon the WebSocket handshake. The specified origin should exactly match the origin header field sent by the browser. e.g., --origin=http://goaccess.io

    • --pid-file=<path/goaccess.pid>

      Write the daemon PID to a file when used along with the --daemonize option.

    • --port=<port>

      Specify the port to use. By default GoAccess listens on port 7890 for the WebSocket server. Ensure this port is opened.

    • --real-time-html

      Enable real-time HTML output.

    • --ws-url=<[scheme://]url[:port]>

      URL to which the WebSocket server responds. This is the URL supplied to the WebSocket constructor on the client side.

      Optionally, it is possible to specify the WebSocket URI scheme, such as ws:// or wss:// for unencrypted and encrypted connections. e.g., wss://goaccess.io

      If GoAccess is running behind a proxy, you could set the client side to connect to a different port by specifying the host followed by a colon and the port. e.g., goaccess.io:9999

      By default, it will attempt to connect to the generated report's hostname. If GoAccess is running on a remote server, the host of the remote server should be specified here. Also, make sure it is a valid host and NOT an http address.

    • --fifo-in=<path/file>

      Creates a named pipe (FIFO) that reads from the given path/file.

    • --fifo-out=<path/file>

      Creates a named pipe (FIFO) that writes to the given path/file.

    • --ssl-cert=<path/cert.crt>

      Path to TLS/SSL certificate. In order to enable TLS/SSL support, GoAccess requires that --ssl-cert and --ssl-key are used.
      Only if configured using --with-openssl

    • --ssl-key=<path/priv.key>

      Path to TLS/SSL private key. In order to enable TLS/SSL support, GoAccess requires that --ssl-cert and --ssl-key are used.
      Only if configured using --with-openssl

    FILE OPTIONS

    • -

      The log file to parse is read from stdin.

    • -f --log-file=<logfile>

      Specify the path to the input log file. If set in the config file, it will take priority over -f from the command line.

    • -l --log-debug=<filename>

      Send all debug messages to the specified file. Needs to be configured with --enable-debug

    • -p --config-file=<configfile>

      Specify a custom configuration file to use. If set, it will take priority over the global configuration file (if any).

    • --invalid-requests=<filename>

      Log invalid requests to the specified file.

    • --no-global-config

      Do not load the global configuration file. This directory should normally be /usr/etc/, /etc/ or /usr/local/etc/, unless specified with --sysconfdir=/dir at the time of running ./configure

    PARSE OPTIONS

    • -a --agent-list

      Enable a list of user-agents by host. For faster parsing, do not enable this flag.

    • -d --with-output-resolver

      Enable IP resolver on the HTML or JSON output.

    • -e --exclude-ip <IP|IP-range>

      Exclude an IPv4 or IPv6 from being counted. Ranges can be included as well using a dash in between the IPs (start-end).

      • exclude-ip 127.0.0.1
      • exclude-ip 192.168.0.1-192.168.0.100
      • exclude-ip ::1
      • exclude-ip 0:0:0:0:0:ffff:808:804-0:0:0:0:0:ffff:808:808
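
      As an illustration of what an inclusive IPv4 range means, the sketch below converts addresses to integers and compares them. It is not GoAccess's own code; the helper names ip_to_int and in_range are hypothetical, and IPv6 is not handled.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    printf '%s\n' "$1" |
        awk -F. '{ printf "%.0f\n", ($1 * 16777216) + ($2 * 65536) + ($3 * 256) + $4 }'
}

# Return success (0) if address $1 lies within the inclusive range $2-$3.
in_range() {
    ip=$(ip_to_int "$1"); lo=$(ip_to_int "$2"); hi=$(ip_to_int "$3")
    [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}

# Check two sample addresses against the range 192.168.0.1-192.168.0.100.
for addr in 192.168.0.50 192.168.0.101; do
    if in_range "$addr" 192.168.0.1 192.168.0.100; then
        echo "$addr excluded"
    else
        echo "$addr counted"
    fi
done
```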

       

    • -H --http-protocol=<yes|no>

      Set/unset HTTP request protocol. This will create a request key containing the request protocol + the actual request.

    • -M --http-method=<yes|no>

      Set/unset HTTP request method. This will create a request key containing the request method + the actual request.

    • -o --output=<json|csv|html>

      Write output to stdout given one of the following files and the corresponding extension for the output format:

      • /path/file.csv - Comma-separated values (CSV)
      • /path/file.json - JSON (JavaScript Object Notation)
      • /path/file.html - HTML

       

    • -q --no-query-string

      Ignore request's query string. e.g., www.google.com/page.htm?query => www.google.com/page.htm
      Note: Removing the query string can greatly decrease memory consumption, especially on timestamped requests.
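
      The effect on the request key can be sketched with shell parameter expansion. This only illustrates the transformation; the function name strip_query and the second and third sample URLs are assumptions.

```shell
# Drop everything from the first "?" onward (POSIX parameter expansion).
strip_query() {
    printf '%s\n' "${1%%\?*}"
}

strip_query 'www.google.com/page.htm?query'     # prints www.google.com/page.htm
strip_query '/index.php?ts=1462000000&id=7'     # prints /index.php
strip_query '/static/logo.png'                  # prints /static/logo.png (unchanged)
```

      With timestamped requests, every distinct query string would otherwise become its own request key, which is why stripping it reduces memory use.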

    • -r --no-term-resolver

      Disable IP resolver on terminal output.

    • --444-as-404

      Treat non-standard status code 444 as 404.

    • --4xx-to-unique-count

      Add 4xx client errors to the unique visitors count.

    • --anonymize-ip

      Anonymize the client IP address. The IP anonymization option sets the last octet of IPv4 user IP addresses and the last 80 bits of IPv6 addresses to zeros. e.g.,

      192.168.20.100 => 192.168.20.0
      2a03:2880:2110:df07:face:b00c::1 => 2a03:2880:2110:df07::
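
      For IPv4, the transformation shown above amounts to zeroing the last octet. A minimal sketch, assuming a well-formed dotted-quad input (the function name anonymize_ipv4 is hypothetical and IPv6 is not covered):

```shell
# Replace the last octet of an IPv4 address with 0.
anonymize_ipv4() {
    printf '%s.0\n' "${1%.*}"
}

anonymize_ipv4 192.168.20.100    # prints 192.168.20.0
```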
      

       

    • --all-static-files

      Include static files that contain a query string.

    • --browsers-file=<path>

      By default GoAccess parses an "essential/basic" curated list of browsers & crawlers. If you need to add additional browsers, use this option. Include an additional tab delimited list of browsers/crawlers/feeds etc. See config/browsers.list.
      Note: The SIZE of the list is proportional to the run time. Thus, the longer the list, the more time GoAccess will take to parse it.

    • --date-spec=<date|hr>

      Set the date specificity to either date (default) or hr to display hours appended to the date.
      This is used in the visitors panel. It's useful for tracking visitors at the hour level. For instance, an hour specificity would display traffic as 18/Dec/2010:19

    • --double-decode

      Decode double-encoded values. This includes, user-agent, request, and referer.

    • --enable-panel=<PANEL>

      Enable parsing/displaying the given panel. List of panels:

      • VISITORS
      • REQUESTS
      • REQUESTS_STATIC
      • NOT_FOUND
      • HOSTS
      • OS
      • BROWSERS
      • VISIT_TIMES
      • VIRTUAL_HOSTS
      • REFERRERS
      • REFERRING_SITES
      • KEYPHRASES
      • STATUS_CODES
      • REMOTE_USER
      • GEO_LOCATION
    • --hide-referer=<NEEDLE>

      Hide a referer but still count it. Wildcards are allowed in the needle. e.g., *.bing.com.

    • --hour-spec=<hour|min>

      Set the time specificity to either hour (default) or min to display the tenth of an hour appended to the hour.
      This is used in the time distribution panel. It's useful for tracking peaks of traffic on your server at specific times.
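
      To make the --date-spec and --hour-spec levels concrete, the sketch below derives the corresponding labels from one sample timestamp by cutting character ranges. It mimics the labels described in this manual and is not GoAccess's own code; the timestamp and variable names are assumptions.

```shell
ts='05/Jun/2016:16:45:12'    # date:time portion of a log timestamp (sample value)

date_bucket=$(printf '%s\n' "$ts" | cut -c1-11)       # date level       -> 05/Jun/2016
date_hr_bucket=$(printf '%s\n' "$ts" | cut -c1-14)    # --date-spec=hr   -> 05/Jun/2016:16
hour_bucket=$(printf '%s\n' "$ts" | cut -c13-14)      # hour level       -> 16
hour_min_bucket=$(printf '%s\n' "$ts" | cut -c13-16)  # --hour-spec=min  -> 16:4

printf '%s\n' "$date_bucket" "$date_hr_bucket" "$hour_bucket" "$hour_min_bucket"
```

      A request at 16:45 and another at 16:49 therefore land in the same 16:4 bucket, while one at 16:50 lands in 16:5.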

    • --ignore-crawlers

      Ignore crawlers.

    • --ignore-panel=<PANEL>

      Ignore parsing/displaying the given panel. List of panels:

      • VISITORS
      • REQUESTS
      • REQUESTS_STATIC
      • NOT_FOUND
      • HOSTS
      • OS
      • BROWSERS
      • VISIT_TIMES
      • VIRTUAL_HOSTS
      • REFERRERS
      • REFERRING_SITES
      • KEYPHRASES
      • STATUS_CODES
      • REMOTE_USER
      • GEO_LOCATION
    • --ignore-referer=<referer>

      Ignore referers from being counted. Wildcards allowed. e.g., *.domain.com ww?.domain.*
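
      The two example patterns behave much like shell globs, where * matches any run of characters and ? matches exactly one. The sketch below uses shell pattern matching as a stand-in; GoAccess's own wildcard matching may differ in details, and the function name matches_any and the sample hosts are assumptions.

```shell
# Return success if host $1 matches any of the wildcard patterns that follow.
matches_any() {
    host=$1; shift
    for pattern in "$@"; do
        # An unquoted $pattern in a case label is treated as a glob pattern.
        case $host in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

for host in www.domain.com ww2.domain.org example.org; do
    if matches_any "$host" '*.domain.com' 'ww?.domain.*'; then
        echo "$host ignored"
    else
        echo "$host counted"
    fi
done
```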

    • --ignore-referer-report

      Hide referers from output report only.

    • --ignore-statics=<req|panel>

      Ignore static file requests.

      • req = Only ignore request from valid requests
      • panel = Ignore request from panels.
      Note: It will count them towards the total number of requests.
    • --ignore-status=<STATUS>

      Ignore parsing and displaying one or multiple status code(s). For multiple status codes, use this option multiple times.

    • --keep-last=<ndays>

      Keep the last specified number of days in storage. This will recycle the storage tables. e.g., keep & show only the last 7 days.

    • --no-ip-validation

      Disable client IP validation. Useful if IP addresses have been obfuscated before being logged.

      Note: The log still needs to contain a placeholder for %h, usually it's a resolved IP. e.g. ord37s19-in-f14.1e100.net.
    • --num-tests=<number>

      Number of lines from the access log to test against the provided log/date/time format. By default, the parser is set to test 10 lines. If set to 0, the parser won't test any lines and will parse the whole access log. If a line matches the given log/date/time format before it reaches number, the parser will consider the log to be valid, otherwise GoAccess will return EXIT_FAILURE and display the relevant error messages.

    • --process-and-exit

      Parse log and exit without outputting data. Useful if we are looking to only add new data to the on-disk database without outputting to a file or a terminal.

    • --real-os

      Display real OS names. e.g., Windows XP, Snow Leopard.

    • --sort-panel=<PANEL,FIELD,ORDER>

      Sort panel on initial load. Sort options are separated by comma. Options are in the form: PANEL,METRIC,ORDER
      Available Metrics

      • BY_HITS Sort by hits
      • BY_VISITORS Sort by unique visitors
      • BY_DATA Sort by data
      • BY_BW Sort by bandwidth
      • BY_AVGTS Sort by average time served
      • BY_CUMTS Sort by cumulative time served
      • BY_MAXTS Sort by maximum time served
      • BY_PROT Sort by http protocol
      • BY_MTHD Sort by http method
      Available orders
      • ASC
      • DESC

       

    • --static-file=<extension>

      Add static file extension. e.g.: .mp3. Extensions are case sensitive.

    GEOLOCATION OPTIONS

    GeoIP Legacy

    Legacy GeoIP has been discontinued. If your Linux distribution does not ship with the legacy databases, you may still be able to find them through different sources. Make sure to download the .dat files.

    Distributed with Creative Commons Attribution-ShareAlike 4.0 International License. https://mailfud.org/geoip-legacy/

    # IPv4 Country database:
    # Download the GeoIP.dat.gz
    # gunzip GeoIP.dat.gz
    #
    # IPv4 City database:
    # Download the GeoIPCity.dat.gz
    # gunzip GeoIPCity.dat.gz
    
    • -g --std-geoip

      Standard GeoIP database for less memory usage.

    GeoIP2

    For GeoIP2 databases, you can use DB-IP Lite databases.

    DB-IP is licensed under a Creative Commons Attribution 4.0 International License. https://db-ip.com/db/lite.php

    Or you can download them from MaxMind https://dev.maxmind.com/geoip/geoip2/geolite2/

    # For GeoIP2 City database:
    # Download the GeoLite2-City.mmdb.gz
    # gunzip GeoLite2-City.mmdb.gz
    #
    # For GeoIP2 Country database:
    # Download the GeoLite2-Country.mmdb.gz
    # gunzip GeoLite2-Country.mmdb.gz
    
    • --geoip-database <geocityfile>

      Specify path to GeoIP database file. i.e., GeoLiteCity.dat. File needs to be downloaded from maxmind.com. IPv4 and IPv6 files are supported as well. Note: --geoip-city-data is an alias of --geoip-database.
      Note: If using GeoIP2, you will need to download the City/Country database from MaxMind and use the option --geoip-database to specify the database. Currently cities are only shown in the hosts panel (per host).

    OTHER OPTIONS

    • -h --help

      The help.

    • -V --version

      Display version information and exit.

    • -s --storage

      Display current storage method. i.e., B+ Tree, Hash.

    • --dcf

      Display the path of the default config file when -p is not used.

    PERSISTENCE STORAGE OPTIONS

    • --persist

      Persist parsed data into disk. If database files exist, files will be overwritten. This should be set to the first dataset. See examples below.

    • --restore

      Load previously stored data from disk. If reading persisted data only, the database files need to exist. See --persist and examples below.

    • --db-path <dir>

      Path where the on-disk database files are stored. The default value is the /tmp directory.

    CUSTOM LOG/DATE FORMAT

    GoAccess can parse virtually any web log format.

    Predefined options include Common Log Format (CLF), Combined Log Format (XLF/ELF), including virtual host, W3C format (IIS) and Amazon CloudFront (Download Distribution).

    GoAccess allows any custom format string as well.

    There are two ways to configure the log format. The easiest is to run GoAccess with -c to prompt a configuration window. However, this won't make it permanent; for that, you will need to specify the format in the configuration file.

    The configuration file resides under: %sysconfdir%/goaccess.conf or ~/.goaccessrc

    Note: %sysconfdir% is either /etc/, /usr/etc/ or /usr/local/etc/

    time-format: The time-format variable followed by a space, specifies the log-format time containing any combination of regular characters and special format specifiers. They all begin with a percentage (%) sign. See `man strftime`. e.g., %T or %H:%M:%S.

    Note: If a timestamp is given in microseconds, %f must be used as time-format

    date-format: The date-format variable followed by a space, specifies the log-format date containing any combination of regular characters and special format specifiers. They all begin with a percentage (%) sign. See `man strftime`.

    Note: If a timestamp is given in microseconds, %f must be used as date-format

    log-format: The log-format variable followed by a space or \t for tab-delimited, specifies the log format string.

    SPECIFIERS

    • %x A date and time field matching the time-format and date-format variables. This is used when a timestamp is given instead of the date and time being in two separate variables.
    • %t time field matching the time-format variable.
    • %d date field matching the date-format variable.
    • %v The server name according to the canonical name setting (Server Blocks or Virtual Host).
    • %e This is the userid of the person requesting the document as determined by HTTP authentication.
    • %h host (the client IP address, either IPv4 or IPv6)
    • %r The request line from the client. This requires specific delimiters around the request (single quotes, double quotes, etc.) to be parsable. Otherwise, use a combination of special format specifiers such as %m, %U, %q and %H to parse individual fields.
      • Note: Use either %r to get the full request OR %m, %U, %q and %H to form your request, do not use both.
    • %m The request method.
    • %U The URL path requested.
      • Note: If the query string is in %U, there is no need to use %q. However, if the URL path does not include any query string, you may use %q and the query string will be appended to the request.
    • %q The query string.
    • %H The request protocol.
    • %s The status code that the server sends back to the client.
    • %b The size of the object returned to the client.
    • %R The "Referer" HTTP request header.
    • %u The user-agent HTTP request header.
    • %D The time taken to serve the request, in microseconds.
    • %T The time taken to serve the request, in seconds with milliseconds resolution.
    • %L The time taken to serve the request, in milliseconds as a decimal number.
    • %^ Ignore this field.
    • %~ Move forward through the log string until a non-space (!isspace) char is found.
    • ~h The host (the client IP address, either IPv4 or IPv6) in a X-Forwarded-For (XFF) field.

     Note
    For XFF, GoAccess uses a special specifier which consists of a tilde before the host specifier, followed by the character(s) that delimit the XFF field, which are enclosed by curly braces (i.e., ~h{,"}).
    For example, ~h{," } is used in order to parse "11.25.11.53, 17.68.33.17" field which is delimited by a double quote, a comma, and a space.

     Note
    In order to get the average, cumulative and maximum time served in GoAccess, you will need to start logging response times in your web server. In Nginx you can add $request_time to your log format, or %D in Apache.

     Important
    If multiple time served specifiers are used at the same time, the first option specified in the format string will take priority over the other specifiers.

    GoAccess requires the following fields:

    • a valid IPv4/6 %h
    • a valid date %d
    • the request %r
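
    Putting the specifiers together, a configuration for the Combined Log Format might look like the sketch below. The three values are the ones commonly used for that format and are an assumption here, as is the sample log line; the block writes a throwaway file rather than touching ~/.goaccessrc.

```shell
# Write a throwaway configuration file (use ~/.goaccessrc to make it permanent).
conf=$(mktemp)
cat > "$conf" <<'EOF'
time-format %H:%M:%S
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
EOF

# A log line this format is meant to match (made-up data):
#   10.0.0.1 - - [05/Jun/2016:16:45:12 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
cat "$conf"

# It could then be used as: goaccess access.log -p "$conf"
```

    The format covers the three required fields (%h, %d and %r); the two %^ specifiers skip the identity and user columns and the timezone offset.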

    INTERACTIVE KEYS

    • F1 or h Main help.
    • F5 Redraw main window.
    • q Quit the program, current window or collapse active module
    • o or ENTER Expand selected module or open window
    • 0-9 and Shift + 0 Set selected module to active
    • j Scroll down within expanded module
    • k Scroll up within expanded module
    • c Set or change scheme color
    • ^f Scroll forward one screen within active module
    • ^b Scroll backward one screen within active module
    • TAB Iterate modules (forward)
    • SHIFT + TAB Iterate modules (backward)
    • s Sort options for active module
    • / Search across all modules (regex allowed)
    • n Find position of the next occurrence
    • g Move to the first item or top of screen
    • G Move to the last item or bottom of screen

    EXAMPLES

    DIFFERENT OUTPUTS

    To output to a terminal and generate an interactive report:

    # goaccess access.log
    

    To generate an HTML report:

    # goaccess access.log -a -o report.html
    

    To generate a JSON report:

    # goaccess access.log -a -d -o report.json
    

    To generate a CSV file:

    # goaccess access.log --no-csv-summary -o report.csv
    

    GoAccess also allows great flexibility for real-time filtering and parsing. For instance, to quickly diagnose issues by monitoring logs since goaccess was started:

    # tail -f access.log | goaccess -
    

    And even better, to filter while keeping a pipe open to preserve real-time analysis, we can make use of tail -f and a pattern-matching tool such as grep, awk, sed, etc.:

    # tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -
    

    or to parse from the beginning of the file while keeping the pipe open and applying a filter:

    # tail -f -n +0 access.log | grep --line-buffered 'Firefox' | goaccess -o out.html --real-time-html -
    

    MULTIPLE LOG FILES

    There are several ways to parse multiple logs with GoAccess. The simplest is to pass multiple log files to the command line:

    # goaccess access.log access.log.1
    

    It's even possible to parse files from a pipe while reading regular files:

    # cat access.log.2 | goaccess access.log access.log.1 -
    

    Note that the single dash is appended to the command line to let GoAccess know that it should read from the pipe.

    Now if we want to add more flexibility to GoAccess, we can do a series of pipes. For instance, if we would like to process all compressed log files access.log.*.gz in addition to the current log file, we can do:

    # zcat access.log.*.gz | goaccess access.log -
    

    Note: On Mac OS X, use gunzip -c instead of zcat.

    REAL TIME HTML OUTPUT

    GoAccess has the ability to output real-time data in the HTML report. You can even email the HTML file since it is composed of a single file with no external file dependencies, how neat is that!

    The process of generating a real-time HTML report is very similar to the process of creating a static report. Only --real-time-html is needed to make it real-time.

    # goaccess access.log -o /usr/share/nginx/html/site/report.html --real-time-html
    

    By default, GoAccess will use the host name of the generated report. Optionally, you can specify the URL to which the client's browser will connect. See http://goaccess.io/faq for a more detailed example.

    # goaccess access.log -o report.html --real-time-html --ws-url=goaccess.io
    

    By default, GoAccess listens on port 7890. To use a different port, specify it with --port (make sure the port is open):

    # goaccess access.log -o report.html --real-time-html --port=9870
    

    And to bind the WebSocket server to an address other than 0.0.0.0, you can specify it as:

    # goaccess access.log -o report.html --real-time-html --addr=127.0.0.1
    

    Note: To output real time data over a TLS/SSL connection, you need to use --ssl-cert=<cert.crt> and --ssl-key=<priv.key>.

    WORKING WITH DATES

    Another useful pipe is one that filters the web log by date.

    The following will get all HTTP requests starting on 05/Dec/2010 until the end of the file.

    # sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a -
    

    or using relative dates such as yesterday or one week ago:

    # sed -n '/'$(date '+%d\/%b\/%Y' -d '1 week ago')'/,$ p' access.log | goaccess -a -
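    The -d '1 week ago' form used above is GNU date syntax; BSD date (for example on macOS) uses -v instead. The following is a minimal sketch of a wrapper that tries both. days_ago_stamp is a hypothetical helper name, and other date implementations may support neither form.

```shell
# Hypothetical helper: print the date N days ago as dd/Mon/yyyy.
# Tries GNU date syntax first, then falls back to BSD date syntax.
days_ago_stamp() {
  n=$1
  LC_ALL=C date -d "$n days ago" '+%d/%b/%Y' 2>/dev/null ||
    LC_ALL=C date -v "-${n}d" '+%d/%b/%Y'
}
```

    Using a different sed delimiter avoids having to escape the slashes in the date:

    # sed -n "\#$(days_ago_stamp 7)#,\$ p" access.log | goaccess -a -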
    

    If we want to parse only a certain time-frame from DATE a to DATE b, we can do:

    # sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' access.log | goaccess -a -
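    Two details of the sed range are worth keeping in mind: a range ends at the first line that matches the closing pattern, so only the first request of the end day is included, and an unanchored pattern such as 5\/Nov also matches 15/Nov or 25/Nov. The sketch below is one possible alternative that compares dates numerically; filter_date_range is a hypothetical helper and it assumes the timestamp appears as [dd/Mon/yyyy:... as in the common and combined formats.

```shell
# Hypothetical helper: print log lines dated within [start, end], inclusive.
# Both bounds are given as dd/Mon/yyyy.
filter_date_range() {
  start=$1; end=$2; shift 2
  awk -v start="$start" -v end="$end" '
    # Convert dd/Mon/yyyy into a sortable number yyyymmdd.
    function key(d,   p) {
      split(d, p, "/")
      return p[3] * 10000 + (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2]) + 2) / 3 * 100 + p[1]
    }
    BEGIN { lo = key(start); hi = key(end) }
    # Locate the bracketed timestamp and keep the line if its date is in range.
    match($0, /\[[0-9][0-9]\/[A-Za-z][A-Za-z][A-Za-z]\/[0-9][0-9][0-9][0-9]/) {
      k = key(substr($0, RSTART + 1, 11))
      if (k >= lo && k <= hi) print
    }
  ' "$@"
}
```

    For example:

    # filter_date_range 05/Nov/2010 05/Dec/2010 access.log | goaccess -a -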
    

    If we want to preserve only a certain amount of data and recycle storage, we can keep only a certain number of days. For instance, to keep and show the last 5 days:

    # goaccess access.log --keep-last=5
    

    VIRTUAL HOSTS

    Assume your log contains the virtual host field, for instance:

    vhost.com:80 10.131.40.139 - - [02/Mar/2016:08:14:04 -0600] "GET /shop/bag-p-20 HTTP/1.1" 200 6715 "-" "Apache (internal dummy connection)"
    

    And you would like to append the virtual host to the request in order to see which virtual host the top URLs belong to:

    awk '$8=$1$8' access.log | goaccess -a -
    

    To exclude a list of virtual hosts you can do the following:

    # grep -v "`cat exclude_vhost_list_file`" vhost_access.log | goaccess -
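    To make the two virtual host commands above easier to follow, here is a minimal sketch of the same ideas as shell functions. prefix_vhost and exclude_vhosts are hypothetical helper names. The second one uses grep -F -f, which reads one fixed string per line from a file; this is a swapped-in alternative to the backtick form above, not the manual's own command, and it matches the string anywhere in the line.

```shell
# Hypothetical helper: prefix the request path (field 8) with the virtual host (field 1).
prefix_vhost() {
  awk '{ $8 = $1 $8; print }' "$@"
}

# Hypothetical helper: drop lines containing any fixed string listed in a file
# (one string per line). The first argument is the list file.
exclude_vhosts() {
  list=$1; shift
  grep -v -F -f "$list" "$@"
}
```

    Applied to the sample line above, prefix_vhost turns the request into "GET vhost.com:80/shop/bag-p-20 HTTP/1.1".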
    

    FILES, STATUS CODES & BOTS

    To parse specific pages, e.g., page views, html, htm, php, etc. within a request:

    # awk '$7~/\.html|\.htm|\.php/' access.log | goaccess -
    

    Or to parse page views without an extension, e.g., /contact or /profile/user:

    # awk '$7!~/\..*$/' access.log | goaccess -
    

    Note that $7 is the request field for the common and combined log formats (without a virtual host). If your log includes the virtual host, you probably want to use $8 instead. It's best to check which field you are after, e.g.:

    # tail -10 access.log | awk '{print $8}'
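    When it is unclear which field number holds the request or the status, printing every field of one line with its index can help. A minimal sketch follows; show_fields is a hypothetical helper name.

```shell
# Hypothetical helper: print each whitespace-separated field of the first
# input line, labeled with the field number awk assigns to it.
show_fields() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i; exit }' "$@"
}
```

    For the virtual host sample line shown earlier, this lists the request path as $8 and the status code as $10.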
    

    Or to parse a specific status code, e.g., 500 (Internal Server Error):

    # awk '$9~/500/' access.log | goaccess -
    

    Or multiple status codes:

    # tail -f -n +0 access.log | awk '$9~/3[0-9]{2}|5[0-9]{2}/' | goaccess -o out.html -
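    The {2} interval syntax in the pattern above may not be supported by some older awk implementations. The sketch below spells the digits out and anchors the match so that only whole three-digit codes qualify. filter_status_3xx_5xx and the STATUS_FIELD variable are hypothetical names; the field defaults to 9, as in the examples above.

```shell
# Hypothetical helper: keep only lines whose status field is a 3xx or 5xx code.
# STATUS_FIELD selects the field (default 9, i.e. no virtual host column).
filter_status_3xx_5xx() {
  field=${STATUS_FIELD:-9}
  awk -v f="$field" '$f ~ /^[35][0-9][0-9]$/' "$@"
}
```

    It slots into the same pipeline:

    # tail -f -n +0 access.log | filter_status_3xx_5xx | goaccess -o out.html -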
    

    And to get an estimated overview of how many bots (crawlers) are hitting your server:

    # tail -F -n +0 access.log | grep -i --line-buffered 'bot' | goaccess -
    

    SERVER

    Also, it is worth pointing out that if we want to run GoAccess at lower priority, we can run it as:

    # nice -n 19 goaccess access.log -a
    

    and if you don't want to install it on your server, you can still run it from your local machine:

    # ssh root@server 'cat /var/log/apache2/access.log' | goaccess -a -
    

    PROCESSING LOGS INCREMENTALLY

    GoAccess has the ability to process logs incrementally through its internal storage and dump its data to disk. It works in the following way:

    • A dataset must be persisted first with --persist, then the same dataset can be loaded with --restore.
    • If new data is passed (piped or through a log file), it will append it to the original dataset.

     Note

    GoAccess keeps track of inodes of all the files processed (assuming files will stay on the same partition) along with the last line parsed of each file and the timestamp of the last line parsed. e.g., inode:29627417|line:20012|ts:20171231235059

    If the inode does not match the current file, it parses all lines. If the current file matches the inode, it then reads the remaining lines and updates the count of lines parsed and the timestamp. As an extra precaution, it won't parse log lines with a timestamp ≤ the one stored.

    Piped data works based on the timestamp of the last line read. For instance, it will parse and discard all incoming entries until it finds a timestamp >= the one stored.
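    The inode and line bookkeeping described in this note can be illustrated with a short shell sketch. It is only an approximation of the idea, not GoAccess's implementation: lines_to_parse is a hypothetical helper, the saved inode and line count are passed in by hand, and the timestamp check is left out.

```shell
# Hypothetical helper: print the lines of a log file that still need parsing,
# given the inode and line count recorded on a previous run.
lines_to_parse() {
  file=$1; saved_inode=$2; saved_line=$3
  current_inode=$(ls -i "$file" | awk '{ print $1 }')
  if [ "$current_inode" = "$saved_inode" ]; then
    # Same file as before: skip the lines that were already parsed.
    tail -n "+$((saved_line + 1))" "$file"
  else
    # Different file (for example after rotation): parse everything.
    cat "$file"
  fi
}
```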

     Important
    Since piped data works based on a timestamp and there's no way to determine the inode or the last line parsed, some issues could arise. For instance, a piped log could have multiple consecutive lines with the same timestamp (even at the second level), so it's likely to end up with duplicate entries. However, a reasonable assumption is that in most cases, for incremental log processing, users will parse data directly with goaccess instead of piping it through, which is also the best practice.

     Important
    Previous Tokyo Cabinet database files are not compatible with the new database files. You will need to parse your logs from scratch.

    Examples

    // last month access log
    # goaccess access.log.1 --persist
    

    then, load it with

    // append this month access log, and preserve new data
    # goaccess access.log --restore --persist
    

    To read persisted data only (without parsing new data)

    # goaccess --restore
    

    NOTES

    Each active panel has a total of 366 items, or 50 in the real-time HTML report. The number of items is customizable using max-items. However, only the CSV and JSON outputs allow a maximum number greater than the default value of 366 items per panel.

    A hit is a request (line in the access log), e.g., 10 requests = 10 hits. HTTP requests with the same IP, date, and user agent are considered a unique visit.
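    The difference between hits and unique visitors can be reproduced with a few lines of awk. This is a rough sketch under the definition above (same IP, same date, same user agent), assuming the combined log format without a virtual host; count_hits_and_visitors is a hypothetical helper and makes no claim to match GoAccess's own counting in every edge case.

```shell
# Hypothetical helper: count hits (lines) and unique visitors
# (distinct IP + date + user agent) in combined-format logs.
count_hits_and_visitors() {
  awk -F'"' '
    {
      split($1, head, " ")           # head[1] = client IP, head[4] = "[dd/Mon/yyyy:HH:MM:SS"
      day = substr(head[4], 2, 11)   # dd/Mon/yyyy
      key = head[1] "|" day "|" $6   # $6 = user agent when splitting on double quotes
      hits++
      if (!(key in seen)) { seen[key] = 1; visitors++ }
    }
    END { printf "hits=%d visitors=%d\n", hits, visitors }
  ' "$@"
}
```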

    BUGS

    If you think you have found a bug, please send me an email to 

    AUTHOR

    Gerardo Orellana. For more details about it, or new releases, please visit http://goaccess.io

    allway2 2020-08-15 07:52:55

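    NAME

    GoAccess - A visual web log analyzer.

    SYNOPSIS

    goaccess [filename] [ options ... ] [-c][-M][-H][-q][-d][...]

    DESCRIPTION

    GoAccess is an open source (MIT licensed) real-time web log analyzer with an interactive viewer, available through your web browser or in a terminal on *nix systems.

    It provides fast and valuable HTTP statistics for system administrators who need a visual server report on the fly. GoAccess parses the specified web log file and outputs the statistics to the X terminal. Features include:

    • **General Statistics:** This panel summarizes several key metrics, such as the number of valid and invalid requests, the time taken to analyze the data set, unique visitors, requested files, static files (CSS, ICO, JPG, etc.), HTTP referrers, 404s, the size of the parsed log file, and bandwidth consumption.
    • **Unique visitors:** This panel shows hits, unique visitors, and cumulative bandwidth per date. HTTP requests with the same IP, the same date, and the same user agent are considered a unique visitor. By default, web crawlers are included.
      Optionally, date specificity can be set to the hour level using --date-spec=hr, which displays dates such as 05/Jun/2016:16. This is helpful for tracking daily traffic at the hour level.
    • **Requested files:** This panel shows the most requested files on your web server. It includes hits, unique visitors, percentage, cumulative bandwidth, protocol, and request method.
    • **Requested static files:** Lists the most frequently requested static file types, such as `JPG`, `CSS`, `SWF`, `JS`, `GIF`, and `PNG`, with the same metrics as the previous panel. Additional static file types can be added to the configuration file.
    • **404 or Not Found:** Shows the same metrics as the previous panels, but its data covers all pages that were not found, commonly known as the 404 status code.
    • **Hosts:** This panel has detailed information on the hosts themselves. It is a good way to spot aggressive crawlers and to identify who is eating your bandwidth.
      Expanding the panel displays more information, such as the host's reverse DNS lookup result, country, and city. If the relevant option is enabled, a list of user agents can be displayed by selecting the desired IP address and pressing ENTER.
    • **Operating Systems:** This panel reports the operating systems used by the hosts. GoAccess tries to provide as much detail as possible for each operating system.
    • **Browsers:** This panel reports the browsers used by the visiting hosts. GoAccess tries to provide as much detail as possible for each browser.
    • **Time Distribution:** This panel reports by hour, so it displays 24 data points, one for each hour of the day.
      Optionally, --hour-spec=min sets the report to the tenth of a minute, displaying times as 16:4. This is helpful for finding peaks of traffic on your server.
    • **Virtual Hosts:** This panel displays the different virtual hosts parsed from the access log. It is displayed only if `%v` is used in the log format.
    • **Referrer URLs:** If the host in question accessed your site through another resource, or was linked or redirected to you from another host, the referring URLs are shown in this panel. It can be enabled through `--ignore-panel` in the configuration file **(disabled by default)**.
    • **Referring Sites:** This panel displays only the host part, not the whole URL.
    • **Keyphrases:** Reports keyphrases used on Google search, Google cache, and Google translate. At present it only supports Google search queries over HTTP. It can be enabled through `--ignore-panel` in the configuration file **(disabled by default)**.
    • **Geo Location:** Determines the geographic location of an IP address. Statistics are grouped by continent and country. It requires GeoLocation support.
    • **HTTP Status Codes:** The numeric status codes of the HTTP requests.
    • **Remote User (HTTP authentication):** The user accessing the document, as determined by HTTP authentication. If the document is not password protected, this part is "-". This panel is not enabled unless `%e` is given in the log-format variable.

    Note: Optionally, and if configured, all panels can display the average time taken to serve the request.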

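    STORAGE

    GoAccess supports three types of storage. Choose one according to your needs and system environment.

    DEFAULT HASH TABLES

    In-memory hash tables provide good performance; the drawback is that the size of the data set is limited by the amount of physical memory. GoAccess uses in-memory hash tables by default. If your data set fits in memory, this mode performs very well, with good memory usage and speed.

    TOKYO CABINET ON-DISK B+ TREE

    Use this storage method for data sets too large to fit in memory. Because data is committed to disk, the B+ tree database is slower than any of the hash databases. However, using an SSD greatly increases performance. This method can also be used if you need to load saved data quickly at a later date.

    TOKYO CABINET IN-MEMORY HASH DATABASE

    An alternative to the default hash tables. It uses generic typing and is therefore average in both memory use and speed.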


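    CONFIGURATION

    GoAccess has several build configuration options. For a complete, up-to-date list of them, run: *./configure --help*
    • --enable-debug

      Compile with debugging flags and turn off compiler optimizations.

    • --enable-utf8

      Wide character support. Requires Ncursesw.

    • --enable-geoip=<legacy|mmdb>

      Geolocation support. Requires the MaxMind GeoIP library. legacy uses the original GeoIP databases. mmdb uses the enhanced GeoIP2 databases.

    • --enable-tcb=<memhash|btree>

      Tokyo Cabinet storage support. memhash uses Tokyo Cabinet's in-memory hash database. btree uses Tokyo Cabinet's on-disk B+ Tree database.

    • --disable-zlib

      Disable zlib compression on the B+ Tree database.

    • --disable-bzip

      Disable bzip2 compression on the B+ Tree database.

    • --with-getline

      Use a dynamically expanding line buffer to parse full line requests; otherwise a fixed-size buffer (4096) is used.

    • --with-openssl

      Enable OpenSSL support for communication between GoAccess and its WebSocket server.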

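    OPTIONS

    The following options can be supplied on the command line or, for long options, through the configuration file.

    LOG/DATE/TIME FORMAT

    • --time-format

      The time-format variable followed by a space specifies the log's time format. It can contain any combination of regular characters and special format specifiers, all of which begin with a percent sign (%). See `man strftime`. Examples: %T or %H:%M:%S.
    • --date-format

      The date-format variable followed by a space specifies the log's date format. It can contain any combination of regular characters and special format specifiers, all of which begin with a percent sign (%). See `man strftime`.
    • --log-format

      The log-format variable followed by a space or a tab delimiter (`\t`) specifies the log format string.

      Besides specifying the raw log/date/time formats, for simplicity any of the predefined log format names below can be supplied to the log/date/time format variables. GoAccess can also handle one predefined name in one variable and another predefined name in another variable.
      COMBINED     | Combined Log Format
      VCOMBINED    | Combined Log Format with Virtual Host
      COMMON       | Common Log Format
      VCOMMON      | Common Log Format with Virtual Host
      W3C          | W3C Extended Log File Format
      SQUID        | Native Squid Log Format
      CLOUDFRONT   | Amazon CloudFront Web Distribution
      CLOUDSTORAGE | Google Cloud Storage
      AWSELB       | Amazon Elastic Load Balancing
      AWSS3        | Amazon Simple Storage Service (S3)
      Note: Generally, quotes are needed around values that contain spaces, commas, pipes, quotes, slashes, or brackets. Inner quotes must be escaped.

      Note: Piping data into GoAccess does not prompt the log/date/time configuration dialog; the formats must be defined beforehand in the configuration file or on the command line.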

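    USER INTERFACE OPTIONS

    • -c --config-dialog

      Prompt the log/date configuration window when the program starts.
    • -i --hl-header

      Color highlight the active panel.

    • -m --with-mouse

      Enable mouse support on the main dashboard.
    • --color=<fg:bg[attrs, PANEL]>

      Specify custom colors for the terminal output.

      Color syntax:

      DEFINITION space/tab colorFG#:colorBG# [attributes,PANEL]
      FG# = foreground color [-1...255] (-1 = default color)
      BG# = background color [-1...255] (-1 = default color)

      Color attributes can optionally be applied (separate multiple attributes with commas), for example: `bold,underline,normal,reverse,blink`

      If desired, the same metric can use different colors in different panels, for example color A in the requests panel and color B in the browsers panel.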
      • COLOR_MTRC_HITS
      • COLOR_MTRC_VISITORS
      • COLOR_MTRC_DATA
      • COLOR_MTRC_BW
      • COLOR_MTRC_AVGTS
      • COLOR_MTRC_CUMTS
      • COLOR_MTRC_MAXTS
      • COLOR_MTRC_PROT
      • COLOR_MTRC_MTHD
      • COLOR_MTRC_HITS_PERC
      • COLOR_MTRC_HITS_PERC_MAX
      • COLOR_MTRC_VISITORS_PERC
      • COLOR_MTRC_VISITORS_PERC_MAX
      • COLOR_PANEL_COLS
      • COLOR_BARS
      • COLOR_ERROR
      • COLOR_SELECTED
      • COLOR_PANEL_ACTIVE
      • COLOR_PANEL_HEADER
      • COLOR_PANEL_DESC
      • COLOR_OVERALL_LBLS
      • COLOR_OVERALL_VALS
      • COLOR_OVERALL_PATH
      • COLOR_ACTIVE_LABEL
      • COLOR_BG
      • COLOR_DEFAULT
      • COLOR_PROGRESS
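    • See the configuration file for color scheme examples.

    • --color-scheme <1|2|3>

      Choose the terminal color scheme. `1` is the monochrome scheme, `2` is the green scheme, and `3` is the Monokai scheme (requires a terminal with 256-color support).
    • --crawlers-only

      Parse and display only crawlers (bots).
    • --html-custom-css=<path.css>

      Load a custom CSS file from the given path in the HTML report.
    • --html-custom-js=<path.js>

      Load a custom JS file from the given path in the HTML report.
    • --html-report-title=

      Set the HTML report page title and header.
    • --html-prefs=

      Set the default preferences for the HTML report. They are supplied as a valid JSON object containing the preferences, and can be set per panel. See the example below.
      --html-prefs='{"theme":"bright","perPage":5,"layout":"horizontal","showTables":true,"visitors":{"plot":{"chartType":"bar"}}}'


      Note:

      The JSON object must be supplied as a single-line JSON string.
    • --json-pretty-print

      Format the JSON output using tabs and newlines.
    • --max-items=

      Set the maximum number of items to display per panel. The value can range from 1 to n.
      Note: Only the CSV and JSON outputs allow a number greater than the default of 366 items per panel (or 50 in the real-time HTML report).
    • --no-color

      Turn off colored output. This is the default on terminals that do not support colors.
    • --no-column-names

      Don't write column names in the terminal output. By default, column names are displayed for each available metric in every panel.
    • --no-csv-summary

      Disable summary metrics in the CSV output.
    • --no-progress

      Disable the progress metrics [total requests/requests per second] while parsing logs.
    • --no-tab-scroll

      Disable scrolling through panels with the TAB key and selecting panels with the numeric keys.
    • --no-html-last-updated

      Do not show the "last updated" field in the generated HTML report.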

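    SERVER OPTIONS

    • --addr=

      Bind the server to the given IP address. The default is 0.0.0.0.

      Usually there is no need to specify this, unless you want to bind the server to a different address on the host.
    • --daemonize

      Run GoAccess as a daemon (only takes effect with --real-time-html).
    • --origin=

      Ensure clients send the specified origin header during the WebSocket handshake. The specified origin must be exactly the same as the origin header field sent by the browser, e.g., `--origin=http://goaccess.io`
    • --port=

      Specify the port to use. By default, GoAccess uses port 7890 for the WebSocket server. Make sure this port is available.
    • --real-time-html

      Enable the real-time HTML report.
    • --ws-url=<[scheme://]url[:port]>

      The URL to which the WebSocket server responds. It is the URL supplied to the WebSocket constructor on the client side.

      Optionally, the WebSocket URI scheme can be specified, such as `ws://` for unencrypted connections or `wss://` for encrypted connections, e.g., `wss://goaccess.io`

      If GoAccess is running behind a proxy, you can make the client connect to a different port by appending a colon and the port number to the host name, e.g., `goaccess.io:9999`

      By default, it attempts to connect to the host name of the generated report. If GoAccess is running on a remote server, the remote host name should be specified in the URL. Make sure the host is valid.
    • --fifo-in=

      Create a named pipe (FIFO) that reads from the given path/file.
    • --fifo-out=

      Create a named pipe (FIFO) that writes to the given path/file.
    • --ssl-cert=<path/cert.crt>

      Path to the TLS/SSL certificate. To enable TLS/SSL in GoAccess, both `--ssl-cert` and `--ssl-key` are required.
      Only takes effect if configured with --with-openssl.
    • --ssl-key=<path/priv.key>

      Path to the TLS/SSL private key. To enable TLS/SSL in GoAccess, both `--ssl-cert` and `--ssl-key` are required.
      Only takes effect if configured with --with-openssl.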

    FILE OPTIONS

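    • -f --log-file=

      Specify the path to the input log file. If the input file is set in the configuration file, it takes priority over `-f` on the command line.
    • -l --log-debug=

      Send all debug messages to the specified file. Requires the build option `--enable-debug`.
    • -p --config-file=

      Specify a custom configuration file to use. If set, it takes priority over the global configuration file (if any).
    • --invalid-requests=

      Log invalid requests to the specified file.
    • --no-global-config

      Do not load the global configuration file. Its directory is normally `/usr/etc/`, `/etc/` or `/usr/local/etc/`, unless `--sysconfdir=/dir` was specified when running ./configure.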

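    PARSE OPTIONS

    • -a --agent-list

      Enable the list of user agents. Enabling it slows down parsing.
    • -d --with-output-resolver

      Enable the IP resolver on the `HTML` or `JSON` output.
    • -e --exclude-ip <IP|IP-range>

      Exclude an IPv4 or IPv6 address. Use a dash to specify an IP range (start-end).
      • exclude-ip 127.0.0.1
      • exclude-ip 192.168.0.1-192.168.0.100
      • exclude-ip ::1
      • exclude-ip 0:0:0:0:0:ffff:808:804-0:0:0:0:0:ffff:808:808
    • -H --http-protocol=<yes|no>

      Set or unset the HTTP request protocol. This creates a request key containing the request protocol plus the actual request.
    • -M --http-method=<yes|no>

      Set or unset the HTTP request method. This creates a request key containing the request method plus the actual request.
    • -o --output=<json|csv>

      Write the output to the given file; the output format is determined by the file extension:
      • /path/file.csv  - Comma-separated values (CSV)
      • /path/file.json - JSON (JavaScript Object Notation)
      • /path/file.html - HTML
    • -q --no-query-string

      Ignore the request's query string, i.e., www.google.com/page.htm?query => www.google.com/page.htm
      Note: Removing the query string can greatly decrease memory consumption, especially for timestamped requests.
    • -r --no-term-resolver

      Disable the IP resolver on the terminal output.
    • --444-as-404

      Treat the non-standard status code 444 as 404.
    • --4xx-to-unique-count

      Add 4xx client errors to the unique visitors count.
    • --all-static-files

      Include static files that contain a query string.
    • --date-spec=<date|hr>

      Set the date specificity to either the standard date format (default) or the date with the hour appended.
      It only applies to the visitors panel and is useful for analyzing visitors at the hour level. For instance: `18/Dec/2010:19`
    • --double-decode

      Decode double-encoded values. This includes the user agent, request, and referrer.
    • --enable-panel=

      Enable the given panel. Panel list: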

      • VISITORS
      • REQUESTS
      • REQUESTS_STATIC
      • NOT_FOUND
      • HOSTS
      • OS
      • BROWSERS
      • VISIT_TIMES
      • VIRTUAL_HOSTS
      • REFERRERS
      • REFERRING_SITES
      • KEYPHRASES
      • STATUS_CODES
      • REMOTE_USER
      • GEO_LOCATION
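    • --hour-spec=<hour|min>

      Set the time specificity to either the standard hour format (default) or the hour with the tenth of a minute appended.
      It is used in the time distribution panel and is useful for analyzing traffic peaks within specific periods.
    • --ignore-crawlers

      Ignore crawlers.

    • --ignore-panel=

      Ignore the given panel. Panel list: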

      • VISITORS
      • REQUESTS
      • REQUESTS_STATIC
      • NOT_FOUND
      • HOSTS
      • OS
      • BROWSERS
      • VISIT_TIMES
      • VIRTUAL_HOSTS
      • REFERRERS
      • REFERRING_SITES
      • KEYPHRASES
      • STATUS_CODES
      • REMOTE_USER
      • GEO_LOCATION
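    • --ignore-referer=

      Ignore referrers from being counted. Wildcards are supported. For instance: *.domain.com ww?.domain.*

    • --ignore-status=

      Ignore parsing and displaying one or more status codes. For multiple status codes, use this option multiple times, one code per option.

    • --num-tests=

      Set the number of lines to test against the given log/date/time format. The default is 10 lines. If set to 0, the parser does not test any lines and parses the whole access log. If a line matches the given log/date/time format before `number` is reached, the parser considers the log valid; otherwise GoAccess returns `EXIT_FAILURE` and displays the relevant error messages.
    • --process-and-exit

      Parse the log and exit without outputting data. Mainly useful when you only want to add new data to the on-disk database without producing a report.
    • --real-os

      Display real OS names, e.g., Windows XP, Snow Leopard.
    • --sort-panel=<PANEL,FIELD,ORDER>

      Sort a panel on initial load. Sort options are separated by commas and take the form PANEL,METRIC,ORDER
      **Available metrics**
      • BY_HITS Sort by hits
      • BY_VISITORS Sort by unique visitors
      • BY_DATA Sort by data
      • BY_BW Sort by bandwidth
      • BY_AVGTS Sort by average time served
      • BY_CUMTS Sort by cumulative time served
      • BY_MAXTS Sort by maximum time served
      • BY_PROT Sort by HTTP protocol
      • BY_MTHD Sort by HTTP method

      Available orders

      • ASC
      • DESC
    • --static-file

      Add a static file extension, e.g., `.mp3`. Extensions are case sensitive.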

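    GEOLOCATION OPTIONS

    • -g --std-geoip

      Standard GeoIP database, for lower memory usage.

    • --geoip-database

      Specify the path to the GeoIP database file, e.g., GeoLiteCity.dat. The file needs to be downloaded from maxmind.com. Both IPv4 and IPv6 are supported. Note: `--geoip-city-data` is an alias of `--geoip-database`.
      Note: If using GeoIP2, you need to download the City/Country database from [MaxMind](http://dev.maxmind.com/geoip/geoip2/geolite2/) and set it through `--geoip-database`.

    OTHER OPTIONS

    • -h --help

      Show the help message.

    • -V --version

      Display version information and exit.
    • -s --storage

      Display the current storage method, e.g., B+ Tree, Hash.
    • --dcf

      Display the path of the default configuration file when `-p` is not used.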

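    ON-DISK STORAGE OPTIONS

    • --keep-db-files

      Persist parsed data to disk. If database files already exist, they are overwritten. This should be set for the first data set. Setting it to false deletes all database files when the program exits. See the examples below.

      Only takes effect if configured with --enable-tcb=btree.
    • --load-from-disk

      Load previously stored data from disk. If reading persisted data only, the database files must already exist. See `keep-db-files` and the examples below.

      Only takes effect if configured with --enable-tcb=btree.
    • --db-path

      Set the path where the on-disk database files are stored. The default is the `/tmp` directory.

      Only takes effect if configured with --enable-tcb=btree.
    • --xmmap

      Set the size, in bytes, of the extra mapped memory. The default value is 0.

      Only takes effect if configured with --enable-tcb=btree.
    • --cache-lcnum

      Specify the maximum number of leaf nodes to be cached. If it is not greater than 0, the default value is used. The default value is 1024. A larger value increases speed at the cost of memory; a lower value decreases memory consumption.

      Only takes effect if configured with --enable-tcb=btree.
    • --cache-ncnum

      Specify the maximum number of non-leaf nodes to be cached. If it is not greater than 0, the default value is used. The default value is 512.

      Only takes effect if configured with --enable-tcb=btree.
    • --tune-lmemb

      Specify the number of members in each leaf page. If it is not greater than 0, the default value is used. The default value is 128.

      Only takes effect if configured with --enable-tcb=btree.
    • --tune-nmemb

      Specify the number of members in each non-leaf page. If it is not greater than 0, the default value is used. The default value is 256.

      Only takes effect if configured with --enable-tcb=btree.
    • --tune-bnum

      Specify the number of elements in the bucket array. If it is not greater than 0, the default value is used. The default value is 32749. The suggested size of the bucket array is about 1 to 4 times the number of pages to be stored.

      Only takes effect if configured with --enable-tcb=btree.
    • --compression <zlib|bz2>

      Specify the compression encoding used for each page (ZLIB|BZ2).

      Only takes effect if configured with --enable-tcb=btree.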

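    CUSTOM LOG/DATE FORMAT

    GoAccess can parse virtually any web log format.

    Predefined options include the Common Log Format, the Combined Log Format, both with virtual host variants, the W3C format, and Amazon CloudFront (Download Distribution).

    GoAccess allows any custom format string.

    There are two ways to configure the log format. The easiest is to run GoAccess with `-c` to prompt a configuration window. However, this is not permanent, so for that you need to specify the format in the configuration file.

    The configuration file resides at `%sysconfdir%/goaccess.conf` or `~/.goaccessrc`

    Note: `%sysconfdir%` is either `/etc/`, `/usr/etc/` or `/usr/local/etc/`

    **time-format** The time-format variable followed by a space specifies the log's time format. It can contain any combination of regular characters and special format specifiers, all of which begin with a percent sign (%). See `man strftime`. Examples: `%T` or `%H:%M:%S`.

    Note: If the timestamp is given in microseconds, `%f` must be used as the *time-format*.

    **date-format** The date-format variable followed by a space specifies the log's date format. It can contain any combination of regular characters and special format specifiers, all of which begin with a percent sign (%). See `man strftime`.

    Note: If the timestamp is given in microseconds, `%f` must be used as the *date-format*.

    **log-format** The log-format variable followed by a space or a tab delimiter (`\t`) specifies the log format string.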

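    SPECIAL FORMAT SPECIFIERS

    • `%x` A date and time field matching the time-format and date-format variables. Used when a timestamp is given instead of the date and time being in two separate variables.
    • `%t` The time field matching the time-format variable.
    • `%d` The date field matching the date-format variable.
    • `%v` The server name according to the canonical name setting (server blocks or virtual host).
    • `%e` The user ID of the person requesting the document, as determined by HTTP authentication.
    • `%h` The host (the client IP address, either IPv4 or IPv6).
    • `%r` The request line from the client. It can be parsed when the request is enclosed in delimiters (single quotes, double quotes). Otherwise, use a combination of special format specifiers such as `%m`, `%U`, `%q` and `%H` to parse the individual fields.
      • Note: Use either `%r` to get the full request, or `%m`, `%U`, `%q` and `%H` to form your request; do not use both.
    • `%m` The request method.
    • `%U` The URL path requested.
      • Note: If the query string is in `%U`, there is no need to use `%q`. However, if the URL path does not include any query string, you may use `%q`, and the query string will be appended to the request.
    • `%q` The query string.
    • `%H` The request protocol.
    • `%s` The status code that the server sends back to the client.
    • `%b` The size of the object returned to the client.
    • `%R` The "Referer" HTTP request header.
    • `%u` The "User-Agent" HTTP request header.
    • `%D` The time taken to serve the request, in microseconds.
    • `%T` The time taken to serve the request, in seconds with millisecond resolution.
    • `%L` The time taken to serve the request, in milliseconds as a decimal number.
    • `%^` Ignore this field.
    • `%~` Move forward through the log string until a non-space (!isspace) character is found.
    • `~h` The host (the client IP address, either IPv4 or IPv6) in an X-Forwarded-For (XFF) field.

    **Note**
    For XFF, GoAccess uses a special specifier consisting of a tilde before the host specifier, followed by the characters delimiting the XFF field, enclosed in curly braces (e.g., `~h{,"}`).
    For example, `~h{,"  }` is used to parse the field `"11.25.11.53, 17.68.33.17"`, which is delimited by a double quote, a comma, and a space.

    **Note**
    To get the average, cumulative, and maximum time served, you need to start logging response times in your web server. In Nginx you can add `$request_time` to your log format, or `%D` in Apache.

    **Important**
    If multiple time-served specifiers are used at the same time, the first option specified in the format string takes priority over the others.

    GoAccess **requires** the following fields:
    • A valid IPv4/6 address %h
    • A valid date %d
    • The request %r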

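    INTERACTIVE KEYS

    • F1 or h: Main help.
    • F5: Redraw the main window.
    • q: Quit the program or the current window, or collapse the active module.
    • o or ENTER: Expand the selected module or open a window.
    • 0-9 and Shift + 0: Set the selected module to active.
    • j: Scroll down within an expanded module.
    • k: Scroll up within an expanded module.
    • c: Set or change the color scheme.
    • ^ f: Scroll forward one screen within the active module.
    • ^ b: Scroll backward one screen within the active module.
    • TAB: Iterate through modules (forward).
    • SHIFT + TAB: Iterate through modules (backward).
    • s: Sort options for the active module.
    • /: Search across all modules (regular expressions allowed).
    • n: Find the position of the next occurrence.
    • g: Move to the first item or the top of the screen.
    • G: Move to the last item or the bottom of the screen.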


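    EXAMPLES

    DIFFERENT OUTPUTS

    To output to a terminal and generate an interactive report:

    # goaccess access.log

    To generate an HTML report:

    # goaccess access.log -a -o report.html

    To generate a JSON report:

    # goaccess access.log -a -d -o report.json

    To generate a CSV file:

    # goaccess access.log --no-csv-summary -o report.csv

    GoAccess is very flexible and supports real-time parsing and filtering. For instance, to quickly diagnose issues by monitoring the live log:

    # tail -f access.log | goaccess -

    Better still, tail -f can be combined with a pattern-matching tool such as `grep`, `awk`, `sed`, etc.:

    # tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -

    Or, to parse from the beginning of the file while keeping the pipe open and applying a filter:

    # tail -f -n +0 access.log | grep --line-buffered 'Firefox' | goaccess -o out.html --real-time-html -

    MULTIPLE LOG FILES

    There are several ways to parse multiple logs with GoAccess. The simplest is to pass multiple log files on the command line:

    # goaccess access.log access.log.1

    It is even possible to parse files from a pipe while reading regular files:

    # cat access.log.2 | goaccess access.log access.log.1 -

    Note that the single dash appended to the command line tells GoAccess to read from the pipe.

    For more flexibility, a series of pipes can be used. For instance, to process all compressed log files `access.log.*.gz` in addition to the current log file:

    # zcat access.log.*.gz | goaccess access.log -

    Note: On Mac OS X, use `gunzip -c` instead of `zcat`.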

    实时 HTML 输出

                    GoAccess 有能力在 HTML 报告中展示实时数据。您甚至可以通过电子邮件发送 HTML 报告,因为它是由没有外部文件依赖的单个文件组成,是不是很酷!              
    
                    生成实时 HTML 报告的过程和生成静态报告的过程非常相似。实时报告仅仅需要使用参数 `--real-time-html` 。                 
                        # goaccess access.log -o /usr/share/nginx/html/site/report.html --real-time-html
                    
                    GoAccess 默认使用生成报告的主机名。您也可以指定 URL 用于客户端浏览器访问。参考 [FAQ](https://www.goaccess.cc/FAQ.html) 上更详细的示例。                 
                        # goaccess access.log -o report.html --real-time-html --ws-url=goaccess.io
                    
                    By default, GoAccess listens on port 7890. To use a different port (make sure the port is open):
                        # goaccess access.log -o report.html --real-time-html --port=9870
                    
                    To bind the WebSocket server to an address other than 0.0.0.0:
                        # goaccess access.log -o report.html --real-time-html --addr=127.0.0.1
                    
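                    Note: to output real-time data over an encrypted connection, use `--ssl-cert=<cert.crt>` and `--ssl-key=<priv.key>`.

    WORKING WITH DATES

                    Another useful pipe is one that filters the web log by date.
    
                    The following command gets all HTTP requests from `05/Dec/2010` until the end of the file: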
                        # sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a -
                    
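                    Or use a relative date such as yesterday, tomorrow, or (as here) one week ago. The `-d` string syntax shown is that of GNU `date`: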
                        # sed -n '/'$(date '+%d\/%b\/%Y' -d '1 week ago')'/,$ p' access.log | goaccess -a -
                    
                    To parse only a fixed time frame, from one date to another:
                        # sed -n '/5\/Nov\/2010/,/5\/Dec\/2010/ p' access.log | goaccess -a -
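To see what a fixed time frame actually selects, here is a minimal sketch on a made-up six-line log (the file contents and the `selected` variable are illustrative assumptions, not part of GoAccess). It anchors each date on the leading `[` so that `5/Nov` cannot also match `15/Nov` or `25/Nov`, and it shows that `sed` closes the range at the first line matching the end date, so later requests from that same day are left out.

```shell
# Build a tiny sample log (hypothetical data) to inspect the range selection.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [04/Nov/2010:10:00:00 +0000] "GET /a HTTP/1.1" 200 10
10.0.0.1 - - [05/Nov/2010:10:00:00 +0000] "GET /b HTTP/1.1" 200 10
10.0.0.1 - - [20/Nov/2010:10:00:00 +0000] "GET /c HTTP/1.1" 200 10
10.0.0.1 - - [05/Dec/2010:10:00:00 +0000] "GET /d HTTP/1.1" 200 10
10.0.0.1 - - [05/Dec/2010:11:00:00 +0000] "GET /e HTTP/1.1" 200 10
10.0.0.1 - - [06/Dec/2010:10:00:00 +0000] "GET /f HTTP/1.1" 200 10
EOF

# Print from the first 05/Nov line through the first 05/Dec line, inclusive.
# The escaped "[" ties the day number to the start of the timestamp.
selected=$(sed -n '/\[05\/Nov\/2010/,/\[05\/Dec\/2010/ p' "$log")
printf '%s\n' "$selected"

rm -f "$log"
```

In a real pipeline the `sed` output would be piped to `goaccess -a -` as in the commands above. To include every request of the end date, one option is to end the range on the following day's first line instead.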
                    

    VIRTUAL HOSTS

                    Assume your log contains the virtual host field. For instance:
                        vhost.com:80 10.131.40.139 - - [02/Mar/2016:08:14:04 -0600] "GET /shop/bag-p-20 HTTP/1.1" 200 6715 "-" "Apache (internal dummy connection)"
                    
                    and you want to see which virtual host the top URLs belong to. Prepend the virtual host to the request field:
                        # awk '$8=$1$8' access.log | goaccess -a -
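As a small illustration of what that `awk` expression does, the sketch below runs it on the sample line shown earlier (the `line` and `rewritten` variable names are only for this example). The assignment `$8=$1$8` concatenates field 1 (the virtual host) onto field 8 (the request path). Because the result is a non-empty string, `awk` treats the pattern as true and prints the rebuilt line.

```shell
# The sample log line from above, stored in a variable for the demonstration.
line='vhost.com:80 10.131.40.139 - - [02/Mar/2016:08:14:04 -0600] "GET /shop/bag-p-20 HTTP/1.1" 200 6715 "-" "Apache (internal dummy connection)"'

# Field 8 is the request path in this layout; prefix it with field 1.
rewritten=$(printf '%s\n' "$line" | awk '$8=$1$8')
printf '%s\n' "$rewritten"
```

The request in the printed line reads `"GET vhost.com:80/shop/bag-p-20 HTTP/1.1"`, so identical paths on different virtual hosts show up as distinct entries.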
                    
                    You can likewise exclude virtual hosts you do not want to see:
                        # grep -v "`cat exclude_vhost_list_file`" vhost_access.log | goaccess -
                    

    FILES & STATUS CODES

                    To parse specific pages only, e.g., page views for html, htm, php, and so on:
                        # awk '$7~/\.html|\.htm|\.php/' access.log | goaccess -
                    
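                    Note that `$7` is the request field in the common and combined log formats (without a virtual host). If your log includes the virtual host, use `$8` instead. To check which field you need: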
                        # tail -10 access.log | awk '{print $8}'
                    
                    Or to parse a specific status code, e.g., 500 (Internal Server Error):
                        # awk '$9~/500/' access.log | goaccess -
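The `$9~/500/` test above is a regular-expression match on the status field. A hedged variation, assuming the combined log format where `$9` is the status code: anchoring the pattern selects the whole 5xx class rather than one code. The sample log and the `server_errors` variable below are made up for illustration.

```shell
# Hypothetical four-line log in combined format ($9 is the status code).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [02/Mar/2016:08:14:04 -0600] "GET /index.html HTTP/1.1" 200 6715 "-" "curl/7.0"
10.0.0.2 - - [02/Mar/2016:08:14:05 -0600] "GET /api HTTP/1.1" 500 120 "-" "curl/7.0"
10.0.0.3 - - [02/Mar/2016:08:14:06 -0600] "GET /api HTTP/1.1" 503 0 "-" "curl/7.0"
10.0.0.4 - - [02/Mar/2016:08:14:07 -0600] "GET /missing HTTP/1.1" 404 50 "-" "curl/7.0"
EOF

# Keep only lines whose status field is exactly three digits starting with 5.
server_errors=$(awk '$9 ~ /^5[0-9][0-9]$/' "$log")
printf '%s\n' "$server_errors"

rm -f "$log"
```

As with the other filters, the `awk` output would normally be piped into `goaccess -`.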
                    

    SERVER

                    If you want GoAccess to run at a lower priority:
                        # nice -n 19 goaccess access.log -a
                    
                    If you do not want to install GoAccess on the server, you can run it on your local machine instead:
                        # ssh root@server 'cat /var/log/apache2/access.log' | goaccess -a -
                    

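    INCREMENTAL LOG PROCESSING

                    GoAccess can process logs incrementally through its on-disk B+Tree database. It works as follows:
    •                       A data set must first be persisted with `--keep-db-files`; the same data set can then be loaded with `--load-from-disk`.
    •                       When new data arrives (from a pipe or a log file), it is appended to the original data set.
    •                       To preserve the data at all times, `--keep-db-files` must be used.
    •                       If `--load-from-disk` is used without `--keep-db-files`, the database files are deleted when the program exits.

    Examples

                        // last month's access log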
                        goaccess access.log.1 --keep-db-files
                    
                    Then load it:
                        // append this month's log and preserve the new data
                        goaccess access.log --load-from-disk --keep-db-files
                    
                    To read the persisted data only (without parsing new data):
                        goaccess --load-from-disk --keep-db-files
                    

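    NOTES

                    Each active panel holds at most 366 items, or 50 in the real-time HTML report. The limit can be customized with the max-items option, but only the CSV and JSON outputs allow a value greater than the default of 366 items per panel.
    
                    If the same log is loaded twice with the on-disk B+Tree (`--keep-db-files` and `--load-from-disk`), GoAccess counts each request twice. Issue [#334](https://github.com/allinurl/goaccess/issues/334) describes this in detail.
    
                    A hit is a request (one line in the access log), e.g., 10 requests = 10 hits. HTTP requests with the same IP, date, and user agent are considered a unique visitor.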

    BUGS

                    If you think you have found a bug, please send an email to ![GoAccess' email](GoAccess操作手册.assets/genemail.php)

    AUTHOR

                    [Gerardo Orellana](https://github.com/allinurl). For more information and the latest release, visit https://goaccess.io

    Reposted from: https://www.cnblogs.com/sanduzxcvbnm/p/11322131.html

    weixin_30381793 2019-09-29 04:49:38

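    Others have already covered this well, but I want to highlight a few points that need attention.

    Requirement: get timely analysis of the production nginx access logs. I need rankings of visiting IPs, requested URLs, static resources, 404 errors, and so on.

    The installation steps are as follows:
    1. Install the dependencies

    [root@iZbp1f0xuq9rc41s6gdvfyZ ~]# mkdir access
    [root@iZbp1f0xuq9rc41s6gdvfyZ ~]# cd access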
    [root@iZbp1f0xuq9rc41s6gdvfyZ access]# yum install glib2 glib2-devel GeoIP-devel  ncurses-devel zlib zlib-devel -y
    [root@iZbp1f0xuq9rc41s6gdvfyZ access]# wget https://github.com/maxmind/geoip-api-c/releases/download/v1.6.11/GeoIP-1.6.11.tar.gz
    [root@iZbp1f0xuq9rc41s6gdvfyZ access]# tar -zvxf GeoIP-1.6.11.tar.gz
    [root@iZbp1f0xuq9rc41s6gdvfyZ access]# cd GeoIP-1.6.11
    [root@iZbp1f0xuq9rc41s6gdvfyZ GeoIP-1.6.11]# ./configure
    [root@iZbp1f0xuq9rc41s6gdvfyZ GeoIP-1.6.11]# make && make install
    

    2. Install GoAccess

    [root@iZbp1f0xuq9rc41s6gdvfyZ GeoIP-1.6.11]# cd ..
    [root@iZbp1f0xuq9rc41s6gdvfyZ access]# wget https://tar.goaccess.io/goaccess-1.3.tar.gz
    [root@iZbp1f0xuq9rc41s6gdvfyZ access]# tar -xzvf goaccess-1.3.tar.gz
    [root@iZbp1f0xuq9rc41s6gdvfyZ access]# cd goaccess-1.3
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# ./configure --enable-utf8 --enable-geoip=legacy
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# make && make install

    3. Match the nginx and GoAccess log formats
    Check the log format in the current nginx configuration:

    [root@iZbp1f0xuq9rc41s6gdvfyZ conf]# pwd
    /usr/local/nginx/conf
    [root@iZbp1f0xuq9rc41s6gdvfyZ conf]# cat /usr/local/nginx/conf/nginx.conf

    Note this block; it is needed later. Each machine's configuration may differ, so use your own.

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    		'$status $body_bytes_sent "$http_referer" ' 
    		'"$http_user_agent" "$http_x_forwarded_for" "$request_body"';
    

    4. Configure the GoAccess log format

    (1) Use the nginx2goaccess.sh script to convert the nginx log format into one that GoAccess understands. The script is reproduced here:

    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# vi nginx2goaccess.sh
    
    #!/bin/bash
    #
    # Convert from this:
    #   http://nginx.org/en/docs/http/ngx_http_log_module.html
    # To this:
    #   https://goaccess.io/man
    #
    # Conversion table:
    #   $time_local         %d:%t %^
    #   $host               %v
    #   $http_host          %v
    #   $remote_addr        %h
    #   $request_time       %T
    #   $request_method     %m
    #   $request_uri        %U
    #   $server_protocol    %H
    #   $request            %r
    #   $status             %s
    #   $body_bytes_sent    %b
    #   $bytes_sent         %b
    #   $http_referer       %R
    #   $http_user_agent    %u
    #
    # Samples:
    #
    # log_format combined '$remote_addr - $remote_user [$time_local] '
    # '"$request" $status $body_bytes_sent '
    # '"$http_referer" "$http_user_agent"';
    #   ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
    #
    # log_format compression '$remote_addr - $remote_user [$time_local] '
    # '"$request" $status $bytes_sent '
    # '"$http_referer" "$http_user_agent" "$gzip_ratio"';
    #   ./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent" "$gzip_ratio"'
    #
    # log_format main
    # '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\t'
    # 'Local:\t$status\t$body_bytes_sent\t$request_time\t'
    # 'Proxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\t'
    # 'Agent:\t$http_user_agent\t'
    # 'Fwd:\t$http_x_forwarded_for';
    #   ./nginx2goaccess.sh '$remote_addr\t$time_local\t$host\t$request\t$http_referer\t$http_x_mobile_group\tLocal:\t$status\t$body_bytes_sent\t$request_time\tProxy:\t$upstream_cache_status\t$upstream_status\t$upstream_response_length\t$upstream_response_time\tAgent:\t$http_user_agent\tFwd:\t$http_x_forwarded_for'
    #
    # log_format main
    # '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t'
    # '${http_referer}\t${http_x_mobile_group}\t'
    # 'Local:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\t'
    # 'Proxy:\t${upstream_status}\t${upstream_cache_status}\t'
    # '${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\t'
    # 'Agent:\t${http_user_agent}\t'
    # 'Fwd:\t${http_x_forwarded_for}';
    #   ./nginx2goaccess.sh '${time_local}\t${remote_addr}\t${host}\t${request_method}\t${request_uri}\t${server_protocol}\t${http_referer}\t${http_x_mobile_group}\tLocal:\t${status}\t*${connection}\t${body_bytes_sent}\t${request_time}\tProxy:\t${upstream_status}\t${upstream_cache_status}\t${upstream_response_length}\t${upstream_response_time}\t${uri}${log_args}\tAgent:\t${http_user_agent}\tFwd:\t${http_x_forwarded_for}'
    #
    # Author: Rogério Carvalho Schneider <stockrt@gmail.com>
    
    # Params
    log_format="$1"
    
    # Usage
    if [[ -z "$log_format" ]]; then
        echo "Usage: $0 '<log_format>'"
        exit 1
    fi
    
    # Variables map
    conversion_table="time_local,%d:%t_%^
    host,%v
    http_host,%v
    remote_addr,%h
    request_time,%T
    request_method,%m
    request_uri,%U
    server_protocol,%H
    request,%r
    status,%s
    body_bytes_sent,%b
    bytes_sent,%b
    http_referer,%R
    http_user_agent,%u"
    
    # Conversion
    for item in $conversion_table; do
        nginx_var=${item%%,*}
        goaccess_var=${item##*,}
        goaccess_var=${goaccess_var//_/ }
        log_format=${log_format//\$\{$nginx_var\}/$goaccess_var}
        log_format=${log_format//\$$nginx_var/$goaccess_var}
    done
    log_format=$(echo "$log_format" | sed 's/${[a-z_]*}/%^/g')
    log_format=$(echo "$log_format" | sed 's/$[a-z_]*/%^/g')
    
    # Config output
    echo "
    - Generated goaccess config:
    time-format %T
    date-format %d/%b/%Y
    log_format $log_format
    "
    
    # EOF
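One caveat with the substitution loop in this script: it replaces `$request` wherever it appears, including inside longer names that are not in the conversion table, such as `$request_body`. The sketch below is not part of nginx2goaccess.sh. `convert_format` is a hypothetical helper that shows one way to avoid this for a single variable: replace `$request` only when the name ends there, then turn every remaining variable into `%^` (ignore this field).

```shell
# Hypothetical helper illustrating a boundary-aware substitution.
convert_format() {
    fmt="$1"
    # Replace $request only when the next character cannot continue a
    # variable name (or the string ends), so $request_body is left alone.
    fmt=$(printf '%s\n' "$fmt" | sed -E 's/\$request([^a-z_]|$)/%r\1/g')
    # Any variable still present has no mapping here: ignore that field.
    fmt=$(printf '%s\n' "$fmt" | sed -E 's/\$[a-z_]+/%^/g')
    printf '%s\n' "$fmt"
}

convert_format '"$request" "$request_body"'
```

With the original loop the same input yields `"%r" "%r_body"`. A fuller fix would apply the same boundary check to every entry in the conversion table.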
    

    (2) Use nginx2goaccess.sh to obtain the log format

    Remove the single quotes from the nginx log format recorded in step 3:
    
    Before: '$remote_addr - $remote_user [$time_local] "$request" '
    		'$status $body_bytes_sent "$http_referer" ' 
    		'"$http_user_agent" "$http_x_forwarded_for" "$request_body"'
    
    After: $remote_addr - $remote_user [$time_local] "$request" 
    		$status $body_bytes_sent "$http_referer" 
    		"$http_user_agent" "$http_x_forwarded_for" "$request_body"
    
    Then run a command of this form:
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# sh nginx2goaccess.sh '<format with quotes removed>'
    The actual command, with the format joined onto one line:
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# sh nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" "$request_body"'
    

    This produces the following output:

    
    - Generated goaccess config:
    time-format %T
    date-format %d/%b/%Y
    log_format %h - %^ [%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%r_body"
    

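    (3) Create nginxlog.conf under goaccess-1.3/config from the format generated above. Two adjustments are needed: the GoAccess configuration key is log-format (with a hyphen), and the trailing "%r_body" must become "%^", because the script expanded $request inside $request_body, which has no GoAccess equivalent.

    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# cd config
    
    [root@iZbp1f0xuq9rc41s6gdvfyZ config]# vi nginxlog.conf
    
    # Generated goaccess config:
    time-format %T
    date-format %d/%b/%Y
    log-format %h - %^ [%d:%t %^] "%r" %s %b "%R" "%u" "%^" "%^"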

    5. Configuration is done; now generate the statistics page.

    Step 1: create the output directory, /data/nginx
    [root@iZbp1f0xuq9rc41s6gdvfyZ /]# mkdir /data
    [root@iZbp1f0xuq9rc41s6gdvfyZ /]# cd data/
    [root@iZbp1f0xuq9rc41s6gdvfyZ data]# mkdir nginx
    
    Step 2: locate the nginx logs. Ideally they are rotated daily; otherwise the file grows too large.
    [root@iZbp1f0xuq9rc41s6gdvfyZ logs]# pwd
    /usr/local/nginx/logs
    
    Step 3: run the analysis from the goaccess-1.3 directory
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# ./goaccess -f /usr/local/nginx/logs/access.log -p config/nginxlog.conf -o /data/nginx/report.html
    
    Step 4: generate the report in Chinese
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# LANG="zh_CN.UTF-8" bash -c "./goaccess -f /usr/local/nginx/logs/access.log -p config/nginxlog.conf -o /data/nginx/report.html"
    
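    Note: steps 3 and 4 each produce a snapshot of the log analysis at the time they run. To see newer data five minutes later, you have to run the command again.
    Next, set up a scheduled job that reruns the analysis every five minutes.
    
    Configure scheduled updates
    Step 5: create the script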
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# mkdir scripts
    [root@iZbp1f0xuq9rc41s6gdvfyZ goaccess-1.3]# cd scripts
    [root@iZbp1f0xuq9rc41s6gdvfyZ scripts]# vi goaccess.sh
    
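    #!/bin/bash
    export LANG="zh_CN.UTF-8"    # render the report in Chinese; exported so goaccess inherits it
    # The command must not be wrapped in quotes, or the shell looks for a program with that whole name.
    /root/goaccess-1.3/goaccess -f /usr/local/nginx/logs/access.log -p /root/goaccess-1.3/config/nginxlog.conf -o /data/nginx/report.html
    
    Step 6: add the cron job
    [root@iZbp1f0xuq9rc41s6gdvfyZ scripts]# chmod +x ./goaccess.sh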
    [root@iZbp1f0xuq9rc41s6gdvfyZ scripts]# pwd
    /root/goaccess-1.3/scripts
    [root@iZbp1f0xuq9rc41s6gdvfyZ scripts]# crontab -e
    # run goaccess every five minutes and save the result as HTML
    */5 * * * * /root/goaccess-1.3/scripts/goaccess.sh
    

    6. Configure nginx to serve the report

    Configure nginx to serve the report.html page just generated. On Alibaba Cloud the setup is essentially the same.

    [root@iZbp1f0xuq9rc41s6gdvfyZ conf]# pwd
    /usr/local/nginx/conf
    
    [root@iZbp1f0xuq9rc41s6gdvfyZ conf]# vi nginx.conf
    Add this block:
    
    server {
        listen 8080;
        server_name localhost;
    
        location /report.html {
            alias /data/nginx/report.html;
        }
    }
    

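    Reload nginx so that the new server block takes effect (starting a second instance with -c fails while nginx is already running):

    [root@iZbp1f0xuq9rc41s6gdvfyZ conf]# /usr/local/nginx/sbin/nginx -s reload
    

    Open http://192.168.2.253:8080/report.html in a browser to view the report.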

    yexiaomodemo 2020-06-26 17:15:28

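    After trying various tools, I settled on GoAccess: it is richly visual and fast, parsing about 80,000 log records per second and refreshing its statistics over WebSocket every 10 seconds. Standing on the shoulders of giants, you too can see further.

    The steps are as follows:
    I. Install GoAccess on Linux (version 1.1.1, typically under /opt on the machine that runs nginx)
    a. Install the dependencies first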

    yum install ncurses-devel
    wget http://geolite.maxmind.com/download/geoip/api/c/GeoIP.tar.gz
    tar -zxvf GeoIP.tar.gz
    cd GeoIP-1.4.8/
    ./configure
    make && make install
    

    b. Install GoAccess
    wget http://tar.goaccess.io/goaccess-1.1.1.tar.gz
    tar -xzvf goaccess-1.1.1.tar.gz
    cd goaccess-1.1.1/
    ./configure --enable-geoip --enable-utf8
    make
    make install

    II. Match the nginx log format (use the following custom log_format in nginx.conf)

        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for" '
                        '$connection $upstream_addr '
                        '$upstream_response_time $request_time';

    Restart nginx after the change:
    nginx -s stop 
    nginx

    III. Create the configuration file goaccess_log_conf_nginx.conf
    vi /opt/goaccess/goaccess_log_conf_nginx.conf

    time-format %T
    date-format %d/%b/%Y
    # GoAccess expects the key log-format (with a hyphen), not log_format
    log-format %h - %^ [%d:%t %^] "%r" %s %b "%R" "%u" "%^" %^ %^ %^ %T

    IV. Generate the statistics pages

    Generate the daily report manually:

    goaccess -f /opt/nginx/logs/access.log -p /opt/goaccess/goaccess_log_conf_nginx.conf -o /opt/www/day-report.html

    Generate the real-time report:

    nohup goaccess -f /opt/nginx/logs/access.log -p /opt/goaccess/goaccess_log_conf_nginx.conf -o /opt/www/real-time-yong-report.html --real-time-html --ws-url=report.xxx.com &
    Check that the process is running:  ps -ef|grep goaccess

    V. Expose the reports externally
    a. Install a separate Tomcat (assumed to be in /opt/report-tomcat) on port 7891. Change the port in conf/server.xml and add the directory to serve:

    <Host name="localhost"  appBase="webapps"
                unpackWARs="true" autoDeploy="false">
    
            <Context path="/" docBase="/opt/www" />
    </Host>
    
    Then add the role and user (conf/tomcat-users.xml):
    
    <role rolename="report"/>
    <user username="report" password="reportxxx" roles="report"/>
    
    Finally, add the following inside the web-app element of webapps/ROOT/WEB-INF/web.xml:
    
    <security-constraint>
     <web-resource-collection>
         <web-resource-name>
           Restricted Area
         </web-resource-name>
         <url-pattern>/*</url-pattern>
     </web-resource-collection>
    
     <auth-constraint>
       <role-name>report</role-name>
     </auth-constraint>
    </security-constraint>
    
    
    <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>Authenticate yourself</realm-name>
    </login-config>
    
    
    
    b. Make sure ports 7890 and 7891 are open to external access
    
    c. Check that the pages load:
       Manually generated daily report: http://report.xxx.com:7891/day-report.html
       Real-time report:   http://report.xxx.com:7891/real-time-yong-report.html

    Appendix:
    Reference links:
    Log format conversion tool: https://github.com/stockrt/nginx2goaccess
    GoAccess website: https://goaccess.io

    The above is copied directly, with thanks to
    https://blog.csdn.net/yown/article/details/56027112

    Reposted from: https://www.cnblogs.com/already/p/9324400.html

    weixin_30772105 2018-07-17 17:11:00