Diary

Master Huang's Diary of Entrepreneurship (March 31, 2019)

3331 reads | 2019-3-31 22:47 | Category: Entrepreneurship Diary

1.从今天开始，我想使用一种标准的、简单的、国际通用语言来记录我所有的工作，最好的选择应该就是英语。。。这也是现在所有的孩子都在学习英语的原因吧，呵呵。。。虽然刚一开始有些困难，但是慢慢的就能轻车熟路了。那句话怎么说来着"最容易的学习英语的方式就是每天使用它"，现在就让我来试一试这种方法吧！！！加油了，黄师傅！！！（首先就来翻译这一段话，呵呵）

1. From now on, I want to use a standard, simple, internationally common language to record all of my work, and the best choice should be English... This is probably the reason why all of our children are learning it day and night, hehe... Although it's difficult at first, I'm sure I will slowly get used to it. There is a saying that goes, "The easiest way to learn English is to use it every day and every night". Now let me try this method!!! So come on, Master Huang!!! (And first, translate this passage, hehe.)

2. Ahahahaha... I feel so good. And I'm sure it feels better than having sex (oooh... even in English, I'm still very lewd and very violent). So now I can better understand what that buddy said: when you fight for the dream in your heart, and the work itself is very technical, you are the happiest person in the whole world（当你为了心中的梦想奋斗，这件事本身又很有技术含量的时候，你是这个世界上最幸福的人。。。）

3. So let's begin our great work...
    According to my current design, there are 4 data flows in my logic. Of course, these are just a few large, basic workflows; there should be more smaller ones. I must research and elaborate on them now, because their quality determines the quality of the overall process. Their contents are as follows:
    (Automation can be based on the Web-Principle-Perspective, or it can be based on the User-Perspective. In my logic there are 2 concepts: Network-Collector and Network-Publisher. So, combining the 2 perspectives with the 2 concepts, we get 2 x 2 = 4 results.)
    (1). A Web-Principle-Perspective based Network-Collector: can be implemented directly using Scrapy.
    (2). A Web-Principle-Perspective based Network-Publisher: can also be implemented using Scrapy, but some code adjustments may be required, because Scrapy's source code may be confusing when used to implement a Network-Publisher.
    (3). A User-Perspective based Network-Collector: Scrapy and Selenium must be organically integrated, and the architecture of Scrapy needs to be redesigned. It must be very difficult, technical and challenging work, but I'm also a person who likes to challenge myself, haha...
    (4). A User-Perspective based Network-Publisher: ditto...
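
The four cases above are simply every pairing of a perspective with a concept. A minimal sketch of that enumeration follows; the variable names `PERSPECTIVES`, `ROLES` and `workflows` are made up for illustration and are not part of any design described here.

```python
from itertools import product

# The two perspectives and the two concepts named in the diary entry.
PERSPECTIVES = ("Web-Principle-Perspective", "User-Perspective")
ROLES = ("Network-Collector", "Network-Publisher")

# Cartesian product: every perspective paired with every concept (2 x 2 = 4).
workflows = [f"{perspective} based {role}" for perspective, role in product(PERSPECTIVES, ROLES)]

for number, name in enumerate(workflows, start=1):
    print(f"({number}). A {name}")
```

The order produced by `product` matches the numbering (1) to (4) used above.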

4. Let's start with Scrapy's architecture. Although Scrapy can smoothly implement the functions I want, good code organization is still essential when writing a crawler. We need to keep in mind that when a program is huge, a small logic problem can make it hard to maintain.
    The Scrapy workflow is as follows (from the article on the official website):
        The data flow in Scrapy is controlled by the execution engine, and goes like this:
            <1>.The Engine gets the initial Requests to crawl from the Spider.
            <2>.The Engine schedules the Requests in the Scheduler and asks for the next Requests to crawl.
            <3>.The Scheduler returns the next Requests to the Engine.
            <4>.The Engine sends the Requests to the Downloader, passing through the DownloaderMiddlewares.
            <5>.Once the page finishes downloading, the Downloader generates a Response with that page and sends it to the Engine, passing through the DownloaderMiddlewares.
            <6>.The Engine receives the Response from the Downloader and sends it to the Spider for processing, passing through the SpiderMiddleware.
            <7>.The Spider processes the Response from the Engine and returns scraped items and new Requests to follow to the Engine, passing through the SpiderMiddleware.
            <8>.The Engine sends processed items to the ItemPipeline, then sends processed new Requests to the Scheduler and asks for possible next Requests to crawl.
            <9>.The process repeats from step 1 until there are no more Requests from the Scheduler.
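
To make the nine steps easier to follow, here is a toy model of the loop in plain Python. It is not Scrapy code and uses none of Scrapy's APIs: the function `crawl`, the in-memory `SITE`, and the `download` and `parse` helpers are all made up for illustration, and the middlewares are left out entirely.

```python
from collections import deque

# Hypothetical in-memory "website": url -> (page title, outgoing links).
SITE = {
    "/a": ("page A", ["/b", "/c"]),
    "/b": ("page B", ["/c"]),
    "/c": ("page C", []),
}

def download(url):
    """Stand-in for the Downloader: returns the 'response' for a url."""
    return SITE[url]

def parse(url, response):
    """Stand-in for the Spider: yields one item (a dict) and new requests (urls)."""
    title, links = response
    yield {"url": url, "title": title}
    yield from links

def crawl(start_urls, download, parse, pipeline):
    """Stand-in for the Engine: drives the loop described in steps 1 to 9."""
    scheduler = deque(start_urls)      # steps 1-2: initial requests are scheduled
    seen = set(start_urls)             # avoid scheduling the same url twice
    while scheduler:                   # step 9: repeat until the scheduler is empty
        url = scheduler.popleft()      # step 3: scheduler returns the next request
        response = download(url)       # steps 4-5: downloader produces a response
        for result in parse(url, response):   # steps 6-7: spider processes it
            if isinstance(result, dict):
                pipeline(result)       # step 8: items go to the pipeline
            elif result not in seen:
                seen.add(result)
                scheduler.append(result)      # step 8: new requests are scheduled

items = []
crawl(["/a"], download, parse, items.append)
print([item["url"] for item in items])
```

Starting from `/a`, the loop visits each page once, even though `/c` is linked from two pages.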

5. For the last 3 cases, I hope to find a common solution. I had a nap this afternoon, and thought of a good idea for this... I can write a new Downloader using Selenium and pass the necessary parameters through a field such as "META" in the HTTP header (I wonder if I'm wrong, hehe...); anyway, there must be a field we can use to pass the information.
    After that I went online for a while and read up on several Scrapy APIs on the official website. I found that I can handle all of this well. I also had a look at Scrapy's source code; there is not that much of it, so I decided to read and understand all of that code before I start programming... Let's begin!!!
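
The idea of routing some requests through Selenium could be sketched as below. Several things here are assumptions and not part of the plan above. First, instead of replacing the Downloader itself, the sketch uses a Scrapy downloader middleware, whose `process_request` may return a response and so skip the normal download. Second, the parameter is passed through Scrapy's `Request.meta` dict, which is an attribute of the request object and not an HTTP header; it may be the field being remembered above. Third, the class name `SeleniumDownloaderMiddleware`, the `use_selenium` key and the `response_factory` argument are hypothetical names chosen for this example.

```python
class SeleniumDownloaderMiddleware:
    """Hypothetical downloader middleware that renders flagged requests in a browser."""

    def __init__(self, driver, response_factory=None):
        # driver: a Selenium WebDriver, or any object with get(), page_source and current_url.
        self.driver = driver
        # response_factory: builds the response object; defaults to Scrapy's HtmlResponse.
        self.response_factory = response_factory

    def process_request(self, request, spider):
        # Requests without the (hypothetical) "use_selenium" flag are left alone:
        # returning None lets Scrapy continue with its normal download.
        if not request.meta.get("use_selenium"):
            return None

        # Let the browser load the page, including anything rendered by JavaScript.
        self.driver.get(request.url)

        factory = self.response_factory
        if factory is None:
            # Imported lazily so this sketch can be loaded without Scrapy installed.
            from scrapy.http import HtmlResponse
            factory = HtmlResponse

        # Returning a response here makes Scrapy skip its own download for this request.
        return factory(
            url=self.driver.current_url,
            body=self.driver.page_source.encode("utf-8"),
            encoding="utf-8",
            request=request,
        )
```

A spider would opt in per request, for example with `meta={"use_selenium": True}` when building the `Request`. To use this in a real project, the class would still need a `from_crawler` classmethod that creates the driver, and it would have to be enabled in the `DOWNLOADER_MIDDLEWARES` setting; both are omitted here.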


gaojunyangde's personal space: http://e3-1275v3.bl-phx0.141.9.8.b8.securedservers.com/?928411
