Wiki It

Tuesday, 26 February 2013

PHP Web Page Scraping in Codeigniter


Hey there guys, and welcome to another interesting blog post! Interesting because today I am going to show you how to scrape content from a web page.

First of all, we need to understand what web page scraping is.

Web scraping is a technique for extracting information from websites using programs written for that purpose: the program downloads a page's HTML and pulls out the parts you need.

There are three ways to access a website's data: through a web browser, through an API (if the site provides one), and through web scraping, which is what I am going to show you today!
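To make the idea concrete, here is a minimal sketch of the two steps every scraper performs: download the HTML, then pull out the part you need. It uses PHP's built-in DOM extension rather than the simplehtmldom library used later in this post, and the function name scrape_page_title() is hypothetical. Because file_get_contents() accepts local file paths as well as URLs, the usage example reads a temporary file instead of a live site.

```php
<?php
// Hypothetical helper: fetch a page and return the text of its <title>, or null.
// Uses PHP's built-in DOM extension, not simplehtmldom.
function scrape_page_title($source)
{
    // Works with local files, and with http:// URLs when allow_url_fopen is enabled
    $html = @file_get_contents($source);
    if ($html === false || $html === '') {
        return null; // the page could not be downloaded (or was empty)
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null; // the page has no <title> element
    }
    return trim($titles->item(0)->textContent);
}

// Usage: a temporary local file stands in for a real web page
$file = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($file, '<html><head><title> My Page </title></head><body><p>Hi</p></body></html>');
echo scrape_page_title($file), "\n"; // prints "My Page"
unlink($file);
```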

Before we start scraping, we need to download a handy library, PHP Simple HTML DOM Parser (simplehtmldom), which is available for free from SourceForge.

This post shows a simple scraper built with the simplehtmldom library. But before we continue, a word of caution: if a website does not provide an RSS feed or an API for extracting its content, its owners probably do not want that content reused. Grabbing information from a site and using it somewhere else may well violate someone's rights (such as copyright) and may land you in trouble! So keep this in mind.

Let's get started!

INSTALLING simplehtmldom IN CODEIGNITER:

After downloading the library from SourceForge, unzip it and copy the simple_html_dom.php file into the libraries folder inside your application folder (application/libraries).

Next, open the autoload.php file in the application/config folder and change the $autoload['libraries'] line to:

$autoload['libraries'] = array('simple_html_dom');

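Save and close the file. That's it, you have successfully loaded the library into your application! CodeIgniter now includes simple_html_dom.php automatically, so its file_get_html() function can be called from your controllers. Now let's create a controller in the next step.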

CREATING A SCRAPING CONTROLLER:

Copy and paste the following code into a new controller file, application/controllers/scraper.php, and save it.

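<?php if ( ! defined('BASEPATH')) exit('No direct script access allowed');

class Scraper extends CI_Controller
{
    public function __construct()
    {
        parent::__construct();
    }

    public function get_html_dom()
    {
        $url = "http://play.google.com/";

        // file_get_html() (defined in simple_html_dom.php) downloads and parses the page;
        // it returns false if the page could not be fetched
        $html = file_get_html($url);
        if ($html === false)
        {
            echo "Could not load " . $url;
            return;
        }

        // find() returns an array of every element with class="top-list-container"
        $data = $html->find('.top-list-container');

        if (isset($data[0]))
        {
            // children(1) is the second child element (the index starts at 0)
            echo $data[0]->children(1)->children(1);
        }

        // free the memory used by the parsed DOM
        $html->clear();
    }
}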


Let's go through the method now.

As you can see, I am scraping content from the Google Play Store! :P

Keep in mind that these id and class selectors may change whenever Google updates its page templates or selectors, and when that happens you will need to update your program too!

file_get_html() is a function defined in simple_html_dom.php. I pass it the Play Store URL, and it downloads the page and returns a parsed DOM object.

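Next, we traverse the DOM with the find() method, as shown above. find('.top-list-container') returns an array of all elements with class="top-list-container" in the page, and I echo a nested element from the first one: children(1) selects the second child element, so children(1)->children(1) is the second child of the container's second child. Point your browser at the controller (for example http://localhost/your-project/index.php/scraper/get_html_dom, replacing your-project with your project's folder name) and you will see the content the library fetched from the web page!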
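To see what children(1)->children(1) selects, here is a small sketch that mimics simplehtmldom's children() indexing with PHP's built-in DOM extension. The helper nth_child_element() and the sample HTML are hypothetical (the real Play Store markup is not reproduced here). The point is that the index is zero-based and only element tags are counted, not the text or whitespace between them.

```php
<?php
// Hypothetical helper mirroring simplehtmldom's children($n):
// return the $n-th child *element* (zero-based), skipping text nodes, or null.
function nth_child_element($node, $n)
{
    $index = 0;
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // whitespace and text nodes are not counted
        }
        if ($index === $n) {
            return $child;
        }
        $index++;
    }
    return null; // fewer than $n + 1 child elements
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="top-list-container">
  <h2>Top charts</h2>
  <ul><li>First app</li><li>Second app</li></ul>
</div>');

$container = $doc->getElementsByTagName('div')->item(0);
$list = nth_child_element($container, 1); // the <ul>: the container's second child element
$item = nth_child_element($list, 1);      // the second <li> inside it
echo $item->textContent, "\n";            // prints "Second app"
```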

This is a simple example. Since find() returns an array, you can also use a foreach loop to go through every matching container on the page, and so on!
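The foreach idea can be sketched as follows. This version uses PHP's built-in DOMXPath instead of simplehtmldom so that it runs without the library; the function texts_by_class() and the sample markup are hypothetical. With simplehtmldom you would loop over the array returned by $html->find('.top-list-container') in the same way.

```php
<?php
// Hypothetical helper: collect the text of every element whose class list contains $class.
// $class is inserted into the XPath query as-is, so pass a plain class name.
function texts_by_class($html, $class)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate messy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, so "top-list-container-x" would not match
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    $texts = array();
    foreach ($xpath->query($query) as $element) {
        $texts[] = trim($element->textContent);
    }
    return $texts;
}

$page = '<div class="top-list-container">Games</div>'
      . '<div class="other">Ignore me</div>'
      . '<div class="wide top-list-container">Movies</div>';

foreach (texts_by_class($page, 'top-list-container') as $i => $text) {
    echo $i, ': ', $text, "\n"; // prints "0: Games" then "1: Movies"
}
```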

I hope you like this post! Thank you, and see you next time.


Saturday, 23 February 2013

How to install JDK, LAMPP server and NetBeans IDE (Ubuntu 12.04)

Hey there again everyone! Today I am going to show you how to install OpenJDK and the NetBeans IDE, and how to set up the LAMPP server (XAMPP), on Ubuntu 12.04.

Installing OpenJDK:

Let's start with installing the JDK (OpenJDK). The steps are as follows:

1) Open the terminal and key in the following commands:

$ sudo apt-get install openjdk-7-jre   # wait until the JRE is downloaded and installed

$ sudo apt-get install openjdk-7-jdk   # again, wait until the JDK is downloaded and installed

That's it, this should install your JDK!

2) If you want to remove the installed JDK, enter this command in your terminal:

$ sudo apt-get remove openjdk-7-jdk openjdk-7-jre

That's it, this should uninstall OpenJDK!

 

Installing Netbeans IDE:

To install the IDE, first download the latest installer from the NetBeans website.

After downloading it, go to the folder that contains the installer and enter the following command in your terminal:

$ sudo bash filename 

Here my file was netbeans-7.2.1-ml-linux.sh, so I ran $ sudo bash netbeans-7.2.1-ml-linux.sh. Press Enter and that's it! The NetBeans installation wizard will guide you through the rest of the setup.

 

Installing XAMPP server:

To install XAMPP, first download the XAMPP for Linux tar.gz archive and place it in your Downloads folder. Then use the following commands:

$ cd ~/Downloads   # go into the Downloads folder

~/Downloads $ sudo tar xvfz xampp-linux-1.8.1.tar.gz -C /opt

This extracts XAMPP into the /opt/lampp folder. XAMPP for Linux is designed to live in /opt/lampp, and all the commands below assume that path.

Now start your XAMPP server using:

$ sudo /opt/lampp/lampp start

We will also need a text editor such as vim to edit XAMPP's configuration files, so install it with:

$ sudo apt-get install vim

This will install the vim editor.

Now point your browser to http://localhost and press Enter. You will see the XAMPP start-up page. Choose your language and click phpMyAdmin in the left panel. You will get an Access Forbidden error page, because Apache is not yet configured to allow access to the phpMyAdmin directory. To fix this, open the httpd-xampp.conf file in /opt/lampp/etc/extra/ like this:

$ vim /opt/lampp/etc/extra/httpd-xampp.conf

If you get a permissions error because the file is not writable by your user, either open it with sudo vim instead or change its permissions using the following command:

$ sudo chmod -R 777 /opt/lampp/etc/extra/httpd-xampp.conf

Now run the previous command again to open the file, and then search for the following line

<Directory "/opt/lampp/phpmyadmin">

and add this line inside that block, just before its closing </Directory> tag:

Require all granted

Save the file: press the Esc key, type :wq and press Enter.

Then restart the server with this command:

$ sudo /opt/lampp/lampp restart

Now refresh the page in your browser and the phpMyAdmin page should be displayed.

NOTE: One more important thing: on Ubuntu you may find that you cannot create your project inside the htdocs folder. This is because your user does not have write permission on that directory. To fix this, use the following command:

$ sudo chmod -R 777 /opt/lampp/htdocs

Now you can create your project in htdocs folder!
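Once htdocs is writable, a quick way to confirm that Apache is actually running PHP is a tiny test page. This is a hypothetical example: the folder name test and the function server_check_message() are made up for illustration. Save it as /opt/lampp/htdocs/test/index.php and open http://localhost/test/ in your browser; if you see the message rather than the PHP source code, PHP is working.

```php
<?php
// Hypothetical test page: save as /opt/lampp/htdocs/test/index.php
// and open http://localhost/test/ in your browser.
function server_check_message()
{
    // PHP_VERSION is the running PHP version; php_uname('s') is the OS name (e.g. "Linux")
    return 'PHP ' . PHP_VERSION . ' is working on ' . php_uname('s');
}

echo server_check_message();
```

A page containing only a call to phpinfo() is another common check; it prints the full PHP configuration, so delete it when you are done.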

That's it for today! I hope you liked this post on configuring your LAMPP server on Ubuntu 12.04!