Open Tech archive: February 2010

Saturday, 27 February 2010

PostgreSQL Cheat Sheet

CREATE DATABASE

CREATE DATABASE dbName;

CREATE TABLE (with auto numbering integer id)

CREATE TABLE tableName (
id serial PRIMARY KEY,
name varchar(50) UNIQUE NOT NULL,
dateCreated timestamp DEFAULT current_timestamp
);

Add a primary key

ALTER TABLE tableName ADD PRIMARY KEY (id);

Create an INDEX

CREATE UNIQUE INDEX indexName ON tableName (columnNames);

Back up a database (command line)

pg_dump dbName > dbName.sql

Back up all databases (command line)

pg_dumpall > pgbackup.sql

Run a SQL script (command line)

psql -f script.sql databaseName

Search using a regular expression

SELECT column FROM table WHERE column ~ 'foo.*';
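The ~ operator is case-sensitive and matches anywhere in the string unless the pattern is anchored with ^ or $ (use ~* for a case-insensitive match). A rough way to see that behaviour, assuming Python's re module as a stand-in (its regex flavour is not identical to PostgreSQL's, but it agrees for simple patterns like these; pg_regex_match is a hypothetical helper name):

```python
import re

def pg_regex_match(pattern, text):
    """Approximate PostgreSQL's `text ~ pattern`: case-sensitive, unanchored."""
    # re.search looks for the pattern anywhere in the string,
    # which mirrors how ~ behaves without ^ or $ anchors.
    return re.search(pattern, text) is not None

if __name__ == "__main__":
    for value in ["foobar", "xfoobar", "FOO"]:
        print(value, pg_regex_match("foo.*", value), pg_regex_match("^foo.*", value))
```

Because the match is unanchored, 'foo.*' selects the same rows as plain 'foo'; add ^ when only values that start with foo are wanted.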

The first N records

SELECT columns FROM table LIMIT 10;

Pagination

SELECT cols FROM table LIMIT 10 OFFSET 30;
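LIMIT 10 OFFSET 30 corresponds to page 4 when pages hold 10 rows each. The sketch below shows one way to derive the two numbers from a 1-based page number. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (both accept LIMIT ... OFFSET ...); the items table and the helper names are made up for the illustration.

```python
import sqlite3

def page_clause(page, page_size=10):
    """Return (limit, offset) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return page_size, (page - 1) * page_size

def fetch_page(conn, page, page_size=10):
    """Fetch one page of ids from the hypothetical items table."""
    limit, offset = page_clause(page, page_size)
    # ORDER BY keeps page boundaries stable; without it row order is unspecified.
    cur = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    )
    return [row[0] for row in cur]

def make_items_db(n=45):
    """Build an in-memory table with ids 1..n for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, n + 1)])
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = make_items_db()
    print(page_clause(4))       # limit and offset for page 4
    print(fetch_page(conn, 4))  # rows 31 to 40
```

Note that the database still has to step over the skipped rows, so large OFFSET values get slower as the page number grows.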

Prepared Statements

PREPARE preparedInsert (int, varchar) AS
INSERT INTO tableName (intColumn, charColumn) VALUES ($1, $2);
EXECUTE preparedInsert (1,'a');
EXECUTE preparedInsert (2,'b');
DEALLOCATE preparedInsert;

Create a Function

CREATE OR REPLACE FUNCTION month (timestamp) RETURNS integer
AS 'SELECT date_part(''month'', $1)::integer;'
LANGUAGE 'sql';

Table Maintenance

VACUUM ANALYZE table;

Reindex a database, table or index

REINDEX DATABASE dbName;

Show query plan

EXPLAIN SELECT * FROM table;

Import from a file

COPY destTable FROM '/tmp/somefile';

Show all runtime parameters

SHOW ALL;

Grant all permissions to a user

GRANT ALL PRIVILEGES ON table TO username;

Perform a transaction

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
COMMIT;
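The point of a transaction is that its statements take effect together or not at all. A minimal sketch of that all-or-nothing behaviour, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the two-account transfer, the table contents and the helper names are made up for the illustration.

```python
import sqlite3

def make_accounts_db():
    """Build an in-memory accounts table: account 1 has 100, account 2 has 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn

def transfer(conn, from_id, to_id, amount):
    """Move amount between accounts; undo both updates if funds run short."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (from_id,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")

def balances(conn):
    """Return {id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

if __name__ == "__main__":
    conn = make_accounts_db()
    transfer(conn, 1, 2, 50)
    print(balances(conn))
    try:
        transfer(conn, 1, 2, 500)  # fails, so neither update is kept
    except ValueError as exc:
        print("rolled back:", exc)
    print(balances(conn))
```

In plain SQL the same effect comes from issuing ROLLBACK instead of COMMIT when something goes wrong.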

Basic SQL

Get all columns and rows from a table

SELECT * FROM table;

Add a new row

INSERT INTO table (column1,column2)
VALUES (1, 'one');

Update a row

UPDATE table SET foo = 'bar' WHERE id = 1;

Delete a row

DELETE FROM table WHERE id = 1;

MySQL query output to a file

SELECT * FROM table WHERE condition
INTO OUTFILE '/tmp/result';

Thursday, 18 February 2010

Using meta tags to block access to your site

To entirely prevent a page's contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index.

The noindex meta standard is described at http://www.robotstxt.org/meta.html. This method is useful if you don't have root access to your server, as it allows you to control access to your site on a page-by-page basis.

To prevent all robots from indexing a page on your site, place the following meta tag into the <head> section of your page:

<meta name="robots" content="noindex">

To allow other robots to index the page on your site, preventing only Google's robots from indexing the page:

<meta name="googlebot" content="noindex">

When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. Other search engines, however, may interpret this directive differently. As a result, a link to the page can still appear in their search results.

Note that because we have to crawl your page in order to see the noindex meta tag, there's a small chance that Googlebot won't see and respect the noindex meta tag. If your page is still appearing in results, it's probably because we haven't crawled your site since you added the tag. (Also, if you've used your robots.txt file to block this page, we won't be able to see the tag either.)

If the content is currently in our index, we will remove it after the next time we crawl it. To expedite removal, use the URL removal request tool in Google Webmaster Tools.

Taken from: http://www.google.com/support/webmasters/bin/answer.py?answer=93710
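To check that a page really carries the tag, the HTML can be parsed and its robots meta tags inspected. This is only a sketch using Python's standard html.parser module; the class and function names are made up, "otherbot" is a placeholder robot name, and it says nothing about how any search engine actually evaluates the tag.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every named <meta> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name:
            tokens = {t.strip().lower() for t in attr.get("content", "").split(",") if t.strip()}
            self.directives.setdefault(name, set()).update(tokens)

def is_noindex(html, bot="googlebot"):
    """True if the page asks all robots, or the named robot, not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    general = parser.directives.get("robots", set())
    specific = parser.directives.get(bot.lower(), set())
    return "noindex" in general or "noindex" in specific

if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noindex"></head></html>'
    print(is_noindex(page, bot="googlebot"))  # the googlebot-specific tag applies
    print(is_noindex(page, bot="otherbot"))   # other robots are not addressed
```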

Preventing DoS attacks

A denial-of-service (DoS) attack is an attempt to make a computer resource unavailable to its intended users.

On a public site, the most common threat we see is a DoS attack. Most of the time the source is a crawler, spider, bot or web ripper. What we do now to stop them is block individual IPs or IP ranges at the network level.

Some spiders and bots are smart enough to spoof or change their IPs, so after a while the threat returns from a new IP or IP range.

Another way to stop such crawlers, spiders, bots and web rippers is to block them by user agent.

This was achieved using the snippet below in httpd.conf:

----------------------------------------
#Block access to robots
BrowserMatch emailsiphon badrobot
BrowserMatch BPFTP badrobot
BrowserMatch MSIECrawler badrobot
BrowserMatch WebStripper badrobot
BrowserMatch Offline badrobot
BrowserMatch Teleport badrobot
BrowserMatch Alkaline badrobot
BrowserMatch DLExpert badrobot
BrowserMatch HTTrack badrobot
BrowserMatch Controller badrobot

# Deny is only valid inside a <Directory>, <Location>, <Files> or .htaccess context
deny from env=badrobot

For more information on Apache access control, see http://httpd.apache.org/docs/2.2/howto/access.html
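BrowserMatch treats each pattern as a case-sensitive regular expression tested against the User-Agent header; BrowserMatchNoCase is the case-insensitive variant. The sketch below imitates that matching in Python so that a list of patterns can be tried against sample user-agent strings before it goes into httpd.conf. The function name and the sample strings are made up, and Python's re module is only an approximation of Apache's regex engine.

```python
import re

# Patterns copied from the BrowserMatch lines in the snippet above.
BAD_ROBOT_PATTERNS = [
    "emailsiphon", "BPFTP", "MSIECrawler", "WebStripper", "Offline",
    "Teleport", "Alkaline", "DLExpert", "HTTrack", "Controller",
]

def is_bad_robot(user_agent, ignore_case=False):
    """Return True if any pattern is found anywhere in the User-Agent string.

    ignore_case=False imitates BrowserMatch; True imitates BrowserMatchNoCase.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return any(re.search(pattern, user_agent, flags) for pattern in BAD_ROBOT_PATTERNS)

if __name__ == "__main__":
    samples = [
        "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
        "Mozilla/5.0 (X11; Linux) Firefox/3.6",
        "EmailSiphon",
    ]
    for ua in samples:
        print(ua, is_bad_robot(ua), is_bad_robot(ua, ignore_case=True))
```

With case-sensitive matching, the lowercase pattern emailsiphon does not catch a client that identifies itself as EmailSiphon, so BrowserMatchNoCase may be the safer choice for such entries. The User-Agent header is set by the client, so this only stops robots that identify themselves honestly.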
