Web scraping in PHP
I recommend simple_html_dom for this; it makes the job very easy.
Here is a working example that pulls the page title and the first image.
<?php
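require 'simple_html_dom.php';

// file_get_html() returns false when the page cannot be fetched or parsed.
$html = file_get_html('http://www.google.com/');
if (!$html) {
    exit('Could not load the page.');
}

// find($selector, 0) returns the first match, or null if there is none.
$title = $html->find('title', 0);
$image = $html->find('img', 0);
echo ($title ? $title->plaintext : '')."<br>\n";
echo $image ? $image->src : '';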
?>
Here is a second example that does the same without an external library. I should note that using regex on HTML is NOT a good idea, because the patterns break as soon as the markup changes even slightly.
<?php
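$data = file_get_contents('http://www.google.com/');
if ($data === false) {
    exit('Could not load the page.');
}

// preg_match() returns 1 only on a match, so fall back to '' otherwise.
$title = preg_match('/<title[^>]*>([^<]+)<\/title>/i', $data, $matches) ? $matches[1] : '';
$img = preg_match('/<img[^>]*src=[\'"]([^\'"]+)[\'"][^>]*>/i', $data, $matches) ? $matches[1] : '';

echo $title."<br>\n";
echo $img;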
?>
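If you want to avoid both the external library and the regex, PHP's built-in DOM extension can do the same job. This is only a minimal sketch: the inline `$data` string is a made-up page standing in for the real download, so it runs without network access.

```php
<?php
// Hypothetical inline page so the sketch runs without network access.
$data = '<html><head><title>Example Page</title></head>'
      . '<body><img src="/logo.png" alt="Logo"><img src="/other.png"></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings about sloppy markup quiet
$doc->loadHTML($data);
libxml_clear_errors();

// getElementsByTagName() returns a DOMNodeList; item(0) is null when it is empty.
$titleNode = $doc->getElementsByTagName('title')->item(0);
$imgNode   = $doc->getElementsByTagName('img')->item(0);

$title = $titleNode ? trim($titleNode->textContent) : '';
$img   = $imgNode ? $imgNode->getAttribute('src') : '';

echo $title."<br>\n";
echo $img;
```

For a live page you would fill `$data` with `file_get_contents()` as in the regex example above.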
Website Scraping Using PHP
I'm not an XPath guru, but I would first target that particular table using the text CATEGORIES as the needle, then get the rows that follow it and loop over the rows found.
Rough example:
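<?php
$grep = new DOMDocument();
// Real-world pages are rarely valid HTML, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$grep->loadHTMLFile("http://www.tradeindia.com/");
libxml_clear_errors();
$finder = new DOMXpath($grep);
$products = array();

// Find the cell containing 'CATEGORIES', climb to its enclosing row,
// then descend into the category rows of the rows that follow it.
$nodes = $finder->query("
    //td[@class='showroom1'][contains(text(), 'CATEGORIES')]
    /parent::tr/parent::table/parent::td/parent::tr
    /following-sibling::tr
    /td[1]/table/tr/td/table/tr
");

if ($nodes && $nodes->length > 0) {
    foreach ($nodes as $tr) {
        // Skip rows that contain no links.
        if ($finder->evaluate('count(./td/a)', $tr) > 0) {
            foreach ($finder->query('./td/a[@class="cate_menu"]', $tr) as $row) {
                $text = $row->nodeValue;
                // The item count is the text node right after the link, if any.
                $next = $finder->query('./following-sibling::text()', $row)->item(0);
                $number = $next ? $next->nodeValue : '';
                $products[] = "$text $number";
            }
        }
    }
}

echo '<pre>';
print_r($products);
echo '</pre>';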
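To see the same idea without depending on the live site, here is a small self-contained sketch. The markup is made up for illustration (a header row with the needle, followed by rows of links and counts); only the `showroom1` and `cate_menu` class names come from the query above.

```php
<?php
// Hypothetical markup standing in for the real page: a header row that
// contains the needle, followed by rows of category links and counts.
$html = '<table>'
      . '<tr><td class="showroom1">CATEGORIES</td></tr>'
      . '<tr><td><a class="cate_menu" href="#">Apparel</a> (120)</td></tr>'
      . '<tr><td><a class="cate_menu" href="#">Chemicals</a> (45)</td></tr>'
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$finder = new DOMXPath($doc);

// 1. Find the cell holding the needle, 2. climb to its row,
// 3. take the rows after it, 4. pick the links inside those rows.
$links = $finder->query(
    "//td[@class='showroom1'][contains(text(), 'CATEGORIES')]"
    . "/parent::tr/following-sibling::tr/td/a[@class='cate_menu']"
);

$products = array();
foreach ($links as $a) {
    // The count sits in the text node right after the link, if there is one.
    $next   = $finder->query('./following-sibling::text()', $a)->item(0);
    $number = $next ? trim($next->nodeValue) : '';
    $products[] = trim($a->nodeValue . ' ' . $number);
}

print_r($products);
```

The real page nests its tables more deeply, which is why the query above needs the extra parent:: and table/tr steps.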
HTML Scraping in PHP
I would recommend PHP Simple HTML DOM Parser once you have fetched the HTML from the page. It tolerates invalid HTML and provides a very easy way to handle HTML elements.
Simple web scraping in PHP
There are a couple of ways to scrape websites: one is to use CSS selectors and another is to use XPath. Both select elements from the DOM.
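As a rough illustration of the XPath route, here is a sketch using PHP's built-in DOMDocument and DOMXPath. The markup and the class name `status` are assumptions made up for the example; with a CSS-selector library the equivalent selector would be something like `p.status`.

```php
<?php
// Hypothetical markup; the class name "status" is an assumption for illustration.
$html = '<html><body><p class="user">Some User</p>'
      . '<p class="status">User status: Online.</p></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// XPath counterpart of the CSS selector "p.status" when the element
// carries exactly one class.
$node   = $xpath->query('//p[@class="status"]')->item(0);
$status = $node ? trim($node->textContent) : '';

echo $status;
```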
Since I can't see the full HTML of the webpage it would be hard for me to determine which method is better for you. There is another option which may be frowned upon, but in this case it might work.
You could use a regex (regular expression) to find the characters. I'm not the best at regular expressions, but here is some sample code showing how that might work:
<?php
$subject = "<html><body><p>Some User</p><p>User status: Online.</p></body></html>";
$pattern = '/User status: (.*)\<\/p\>/';
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
Sample output:
Array
(
[0] => User status: Online.</p>
[1] => Online.
)
Basically, the regex above looks for the string "User status: " and then captures all the characters (.*) up to the closing paragraph tag (escaped). Because .* is greedy, it runs to the last </p> on the same line, which is fine here since only one follows the status.
Here is a pattern that returns just "Online" without the period. I wasn't sure whether all statuses end in a period, but this is what it would look like:
'/User status: (.*)\.\<\/p\>/'
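If some statuses turn out not to end in a period, one possible variation is to make the period optional and the capture lazy. This is a sketch, not tested against the real page; the "Away" status is a made-up example.

```php
<?php
// (.*?) is lazy, so it stops at the first point where the rest can match;
// \.? then consumes a trailing period only if one is present.
$pattern = '/User status: (.*?)\.?<\/p>/';

$subjects = array(
    '<p>User status: Online.</p>',   // status ending in a period
    '<p>User status: Away</p>',      // hypothetical status without one
);

$statuses = array();
foreach ($subjects as $subject) {
    if (preg_match($pattern, $subject, $matches)) {
        $statuses[] = $matches[1];
    }
}

print_r($statuses);
```

Both subjects yield the bare status word, "Online" and "Away".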