10 misconceptions about neural networks.
Neural networks are one of the most popular and powerful classes of machine learning algorithms. In quantitative finance neural networks are often used for time-series forecasting, constructing proprietary indicators, algorithmic trading, securities classification, and credit risk modelling. They have also been used to construct stochastic process models and to price derivatives. Despite their usefulness, neural networks tend to have a bad reputation because their performance is "temperamental". In my opinion this can be attributed to poor network design owing to misconceptions about how neural networks work. This article discusses some of those misconceptions.
1. Neural networks are not models of the human brain.
The human brain is one of the great mysteries of our time and scientists have not reached a consensus on exactly how it works. Two theories of the brain exist, namely the grandmother cell theory and the distributed representation theory. The first theory asserts that individual neurons have a high information capacity and are capable of representing complex concepts such as your grandmother or even Jennifer Aniston. The second theory asserts that neurons are much simpler and that representations of complex objects are distributed across many neurons. Artificial neural networks are loosely inspired by the second theory.
One reason why I believe current generation neural networks are not capable of sentience (a different concept to intelligence) is that I believe biological neurons are much more complex than artificial neurons.
Another big difference between the brain and neural networks is size and organization. Human brains contain many more neurons and synapses than neural networks, and they are self-organizing and adaptive. Neural networks, by comparison, are organized according to an architecture. Neural networks are not "self-organizing" in the same sense as the brain, which much more closely resembles a graph than an ordered network.
Some very interesting views of the brain as created by state-of-the-art brain imaging techniques.
So what does that mean? Think of it this way: a neural network is inspired by the brain in the same way that the Olympic stadium in Beijing is inspired by a bird's nest. That does not mean that the Olympic stadium is a bird's nest; it means that some elements of birds' nests are present in the design of the stadium. In other words, elements of the brain are present in the design of neural networks, but they are a lot less similar than you might think.
In fact neural networks are more closely related to statistical methods such as curve fitting and regression analysis than to the human brain. In the context of quantitative finance I think it is important to remember that, because whilst it may sound cool to say that something is "inspired by the brain", this statement may result in unrealistic expectations or fear. For more information see "No! Artificial Intelligence is not an existential threat".
An example of curve fitting, also known as function approximation. Neural networks are quite often used to approximate complex mathematical functions.
2. Neural networks are not a "weak form" of statistics.
Neural networks consist of layers of interconnected nodes. Individual nodes are called perceptrons and resemble a multiple linear regression. The difference between a multiple linear regression and a perceptron is that a perceptron feeds the signal generated by a multiple linear regression into an activation function which may or may not be non-linear. In a multi-layer perceptron (MLP) perceptrons are arranged into layers and the layers are connected to one another. In the MLP there are three types of layers, namely the input layer, the hidden layer(s), and the output layer. The input layer receives input patterns and the output layer could contain a list of classifications or output signals to which those input patterns may map. Hidden layers adjust the weightings on those inputs until the error of the neural network is minimized. One interpretation of this is that the hidden layers extract salient features in the input data which have predictive power with respect to the outputs.
Mapping inputs : outputs.
A perceptron receives a vector of inputs made up of attributes. This vector of inputs is called an input pattern. These inputs are weighted according to the weight vector belonging to that perceptron. In the context of multiple linear regression these weights can be thought of as regression coefficients or betas. The net input signal of the perceptron is usually the sum product of the input pattern and its weights. Neurons which use the sum product for the net input signal are called summation units.
The net input signal, minus a bias, is then fed into some activation function. Activation functions are usually monotonically increasing functions which are bounded, typically to a range such as (0, 1) or (-1, 1) (this is discussed further on in this article). Activation functions can be linear or non-linear.
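To make the description above concrete, here is a minimal sketch of a single summation unit in plain Python. The function names, the example pattern, the weights, and the choice of the logistic sigmoid as the activation function are all illustrative assumptions, not something prescribed by the article.

```python
import math


def sigmoid(x):
    """Logistic activation function, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def perceptron(inputs, weights, bias, activation=sigmoid):
    """Summation unit: sum product of inputs and weights, minus the bias,
    fed into an activation function."""
    net = sum(z * w for z, w in zip(inputs, weights))
    return activation(net - bias)


# Illustrative (made-up) input pattern and weights.
pattern = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]

output = perceptron(pattern, weights, bias=0.1)

# With a linear (identity) activation the unit reduces to a
# multiple linear regression on the inputs.
linear_output = perceptron(pattern, weights, bias=0.1, activation=lambda x: x)
```

Swapping the activation function is the only difference between the regression-like unit and the non-linear one, which is the point made in the text.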
The simplest neural network is one which has just one neuron which maps inputs to an output. Given a pattern, the objective of this network is to minimize the error of the output signal relative to some known target value for the given training pattern. For example, if the neuron was supposed to map a pattern to -1 but it mapped it to 1, then the error of the neuron, as measured by sum-squared distance, would be 4, since (-1 - 1)^2 = 4.
As shown in the image above, perceptrons are organized into layers. The first layer of perceptrons, called the input layer, receives the patterns in the training set. The last layer maps to the expected outputs for those patterns. As an example, the patterns may be a list of quantities for different technical indicators regarding a security, and the potential outputs may be categories such as BUY, HOLD, or SELL.
A hidden layer is one which receives as inputs the outputs from another layer, and whose outputs form the inputs into yet another layer. So what do these hidden layers do? One interpretation is that they extract salient features in the input data which have predictive power with respect to the outputs. This is called feature extraction, and in a way it performs a similar function to statistical techniques such as principal component analysis.
Deep neural networks have a large number of hidden layers and are able to extract much deeper features from the data. Recently, deep neural networks have performed particularly well on image recognition problems. The accompanying illustration shows feature extraction in the context of image recognition.
I think that one of the problems facing the use of deep neural networks for trading (in addition to the obvious risk of overfitting) is that the inputs into the neural network are almost always heavily pre-processed, meaning that there may be few features to actually extract because the inputs are already, to some extent, features.
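Learning rules.
As mentioned previously, the objective of the neural network is to minimize some measure of error. The most common measure of error is sum squared error, although this metric is sensitive to outliers and may be less appropriate than tracking error in the context of financial markets.
Sum squared error (SSE) is the sum, over all training patterns and all output neurons, of the squared difference between the target value and the output of the neural network.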
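Written out, the definition takes the following form. The symbols are my own labels and are assumptions: P is the number of training patterns, K the number of output neurons, t the target value, and o the output of the network.

```latex
E_{SSE} = \sum_{p=1}^{P} \sum_{k=1}^{K} \left( t_{p,k} - o_{p,k} \right)^{2}
```

For the single-neuron example given earlier, with one pattern, a target of -1, and an output of 1, this reduces to (-1 - 1)^2 = 4.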
Given that the objective of the network is to minimize the error, we can use an optimization algorithm to adjust the weights in the neural network. The most common learning algorithm for neural networks is the gradient descent algorithm, although other and potentially better optimization algorithms can be used. Gradient descent works by calculating the partial derivative of the error with respect to the weights for each layer in the neural network and then moving in the opposite direction to the gradient (because we want to minimize the error of the neural network). By minimizing the error we maximize the performance of the neural network in-sample.
Expressed mathematically, the update rule subtracts from each weight in the neural network the partial derivative of the error with respect to that weight, scaled by the learning rate.
The learning rate controls how quickly or slowly the neural network converges. It is worth noting that the calculation of the partial derivative with respect to the net input signal for a pattern is a problem for any discontinuous activation function; this is one reason why alternative optimization algorithms may be used. The choice of learning rate has a large impact on the performance of the neural network. Small values may result in very slow convergence, whereas high values could result in a lot of variance in the training.
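The update rule and the effect of the learning rate can be sketched for the simplest possible case: a single linear neuron with no bias, trained by batch gradient descent on the sum squared error. The function name, the toy data, and the two learning rates are assumptions chosen for illustration only.

```python
def train_linear_neuron(patterns, targets, learning_rate=0.1, epochs=200):
    """Batch gradient descent on sum squared error for one linear neuron."""
    n = len(patterns[0])
    weights = [0.0] * n
    for _ in range(epochs):
        grads = [0.0] * n
        for z, t in zip(patterns, targets):
            o = sum(zi * wi for zi, wi in zip(z, weights))
            for i in range(n):
                # Partial derivative of (t - o)^2 with respect to weight i.
                grads[i] += -2.0 * (t - o) * z[i]
        # Move in the opposite direction to the gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


# Toy data generated by the "true" weights (2, -1).
patterns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, -1.0, 1.0]

good = train_linear_neuron(patterns, targets, learning_rate=0.1)
bad = train_linear_neuron(patterns, targets, learning_rate=0.4)
```

On this toy problem a learning rate of 0.1 recovers the true weights, while a learning rate of 0.4 overshoots on every step and the weights blow up, which is an extreme case of the training variance mentioned above.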
Despite what some of the statisticians I have met in my time believe, neural networks are not just a "weak form of statistics for lazy analysts" (I have actually been told this before and it was quite funny). Neural networks represent an abstraction of solid statistical techniques which date back hundreds of years. For a fantastic explanation of the statistics behind neural networks I recommend reading this chapter. That having been said, I do agree that some practitioners like to treat neural networks as a "black box" which can be thrown at any problem without first taking the time to understand the nature of the problem and whether or not neural networks are an appropriate choice. An example of this is the use of neural networks for trading; markets are dynamic, yet neural networks assume that the distribution of input patterns remains stationary over time. This is discussed in more detail here.
3. Neural networks come in many architectures.
Up until now we have only discussed the simplest neural network architecture, namely the multi-layer perceptron. There are many different neural network architectures (far too many to mention here) and the performance of any neural network is a function of its architecture and weights. Many modern-day advances in the field of machine learning do not come from rethinking the way that perceptrons and optimization algorithms work, but rather from being creative about how these components fit together. Below I discuss some very interesting and creative neural network architectures which have been developed over time.
Recurrent neural networks - some or all connections flow backwards, meaning that feedback loops exist in the network. These networks are believed to perform better on time-series data. As such, they may be particularly relevant in the context of the financial markets. For more information here is a link to a fantastic article entitled The unreasonable performance of recurrent [deep] neural networks.
This diagram shows three popular recurrent neural network architectures, namely the Elman neural network, the Jordan neural network, and the Hopfield single-layer neural network.
A more recent and interesting recurrent neural network architecture is the Neural Turing Machine. This network combines a recurrent neural network architecture with memory. It has been shown that these neural networks are Turing complete and that they were able to learn sorting algorithms and other computing tasks.
Boltzmann neural network - one of the first fully connected neural networks was the Boltzmann neural network, a.k.a. the Boltzmann machine. These networks were the first networks capable of learning internal representations and solving very difficult combinatorial problems. One interpretation of the Boltzmann machine is that it is a Monte Carlo version of the Hopfield recurrent neural network. Despite this, the neural network can be quite difficult to train, but when constrained it can prove more efficient than traditional neural networks. The most popular constraint on Boltzmann machines is to disallow direct connections between hidden neurons. This particular architecture is referred to as a Restricted Boltzmann Machine, which is used in Deep Boltzmann Machines.
This diagram shows how different Boltzmann machines with connections between the different nodes can significantly affect the results of the neural network (graphs to the right of the networks).
Deep neural networks - these are neural networks with multiple hidden layers. Deep neural networks have become extremely popular in recent years due to their unparalleled success in image and voice recognition problems. The number of deep neural network architectures is growing quite quickly, but some of the most popular architectures include deep belief networks, convolutional neural networks, deep restricted Boltzmann machines, stacked auto-encoders, and many more. One of the biggest problems with deep neural networks, especially in the context of financial markets which are non-stationary, is overfitting. For more information see DeepLearning.
This diagram shows a deep neural network which consists of multiple hidden layers.
Adaptive neural networks - these are neural networks which simultaneously adapt and optimize their architectures whilst learning. This is done by either growing the architecture (adding more hidden neurons) or shrinking it (pruning unnecessary hidden neurons). I believe that adaptive neural networks are the most appropriate for financial markets because markets are non-stationary. I say this because the features extracted by the neural network may strengthen or weaken over time depending on market dynamics. The implication of this is that any architecture which worked optimally in the past would need to be altered to work optimally today.
This diagram shows two different types of adaptive neural network architectures. The left image is a cascade neural network and the right image is a self-organizing map.
Radial basis networks - although not a different type of architecture in the sense of perceptrons and connections, radial basis networks make use of radial basis functions as their activation functions. These are real-valued functions whose output depends on the distance from a particular point. The most commonly used radial basis function is the Gaussian distribution. Because radial basis functions can take on much more complex forms, they were originally used for performing function interpolation. As such, a radial basis function neural network can have a much higher information capacity. Radial basis functions are also used in the kernel of a Support Vector Machine.
This diagram shows how curve fitting can be done using radial basis functions.
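As a rough illustration of the idea, the sketch below defines a Gaussian radial basis function and a single-output radial basis network that takes a weighted sum of such functions. The centres, weights, and width are made-up values, and the function names are assumptions rather than part of any particular library.

```python
import math


def gaussian_rbf(x, centre, width=1.0):
    """Gaussian radial basis function: the output depends only on the
    distance between x and the centre."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_network(x, centres, weights, width=1.0):
    """Single-output radial basis network: weighted sum of RBF activations."""
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centres))


# Illustrative one-dimensional network with three hidden units.
centres = [[0.0], [1.0], [2.0]]
weights = [1.0, -0.5, 2.0]

y = rbf_network([1.0], centres, weights)
```

Curve fitting with such a network amounts to choosing the centres, the width, and the weights so that the weighted sum of bumps passes close to the data.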
In summary, many hundreds of neural network architectures exist and the performance of one neural network can be significantly superior to that of another. As such, quantitative analysts interested in using neural networks should probably test multiple neural network architectures and consider combining their outputs in an ensemble to maximize their investment performance. I recommend reading my article, All Your Models are Wrong, 7 Sources of Model Risk, before using neural networks for trading because many of the problems still apply.
4. Size matters, but bigger is not always better.
Having selected an architecture, one must then decide how large or small the neural network should be. How many inputs are there? How many hidden neurons should be used? How many hidden layers should be used (if we are using a deep neural network)? And how many output neurons are required? These questions are important because if the neural network is too large (too small) it could overfit (underfit) the data, meaning that the network would not generalize well out of sample.
How many inputs should be used?
The number of inputs depends on the problem being solved, the quantity and quality of available data, and perhaps some creativity. Inputs are simply variables which we believe have some predictive power over the dependent variable being predicted. If the inputs to a problem are unclear, you can systematically determine which variables should be included by looking at the correlations and cross-correlations between potential independent variables and the dependent variables. This approach is detailed in the article, What Drives Real GDP Growth?
There are two problems with using correlations to select input variables. Firstly, if you are using a linear correlation metric, you may inadvertently exclude useful variables. Secondly, two relatively uncorrelated variables could be combined to produce a strongly correlated variable. If you look at the variables in isolation you may miss this opportunity. To overcome the second problem you could use principal component analysis to extract useful eigenvectors (linear combinations of the variables) as inputs. That said, a problem with this is that the eigenvectors may not generalize well, and they also assume that the distributions of the input patterns are stationary.
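The first problem is easy to demonstrate. In the sketch below, which uses made-up data and a hand-written Pearson correlation, the target depends perfectly on the input through a quadratic relationship, yet the linear correlation between the two is essentially zero. Only after a non-linear transformation of the input does the relationship show up.

```python
import math


def pearson(xs, ys):
    """Pearson (linear) correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical input variable, symmetric around zero.
xs = [x / 10.0 for x in range(-20, 21)]
# Target that depends entirely, but non-linearly, on the input.
target = [x ** 2 for x in xs]

r_raw = pearson(xs, target)                         # approximately 0
r_squared = pearson([x ** 2 for x in xs], target)   # approximately 1
```

A variable screened out on the basis of `r_raw` would have been a perfect predictor, which is why a purely linear correlation filter should be used with care.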
Another problem when selecting variables is multicollinearity. Multicollinearity is when two or more of the independent variables being fed into the model are highly correlated. In the context of regression models this may cause the regression coefficients to change erratically in response to small changes in the model or the data. Given that neural networks and regression models are similar, I suspect this is also a problem for neural networks.
Last, but not least, one statistical bias which may be introduced when selecting variables is omitted-variable bias. Omitted-variable bias occurs when a model is created which leaves out one or more important causal variables. The bias arises when the model incorrectly compensates for the missing variable by over- or underestimating the effect of one of the other variables, i.e. the weights on those variables may become too large or the SSE will be large.
How many hidden neurons should be used?
The optimal number of hidden units is problem specific. That said, as a general rule of thumb, the more hidden units are used, the higher the risk of overfitting becomes. Overfitting is when the neural network does not learn the underlying statistical properties of the data, but rather "memorizes" the patterns and any noise they may contain. This results in neural networks which perform well in sample but poorly out of sample. So how can we avoid overfitting? There are two popular approaches used in industry, namely early stopping and regularization, and then there is my personal favourite approach, global search:
Early stopping involves splitting your training set into the main training set and a validation set. Then, instead of training the neural network for a fixed number of iterations, you train it until the performance of the neural network on the validation set begins to deteriorate. Essentially this prevents the neural network from using all of the available parameters and limits its ability to simply memorize every pattern it sees. The image on the right shows two potential stopping points for the neural network (a and b).
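A minimal sketch of an early stopping loop is shown below. It assumes two caller-supplied functions, `train_step` (one pass over the main training set) and `validation_error` (the current error on the validation set); both names are hypothetical. The `patience` parameter, which tolerates a few bad epochs before stopping, is my own addition and is not described in the text.

```python
def train_with_early_stopping(train_step, validation_error,
                              max_epochs=1000, patience=5):
    """Train until the validation error has not improved for `patience` epochs.

    Returns the epoch with the lowest validation error and that error.
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break  # validation performance has started to deteriorate
    return best_epoch, best_error


# Simulated validation curve: improves until epoch 10, then deteriorates.
curve = iter([(e - 10) ** 2 + 1.0 for e in range(1, 101)])

best_epoch, best_error = train_with_early_stopping(
    train_step=lambda: None,              # stand-in for a real training pass
    validation_error=lambda: next(curve),
    patience=5,
)
```

In a real implementation you would also keep a copy of the weights from the best epoch, since the network has continued to train for a few epochs past that point by the time the loop stops.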
Regularization penalizes the neural network for using complex architectures. Complexity in this approach is measured by the size of the neural network weights. Regularization is done by adding a term to the sum squared error objective function which depends on the size of the weights. This is equivalent to adding a prior which essentially makes the neural network believe that the function it is approximating is smooth.
The penalty is taken over all of the weights in the neural network, and two parameters in the objective function control the degree to which the neural network over- or underfits the data. Good values for these parameters can be derived using Bayesian analysis and optimization. This, and the above, are explained in considerably more detail in this brilliant chapter.
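One common way to write such an objective is shown below. This is a sketch under assumptions: it uses a squared-weight (weight decay) penalty, and the symbols are my own labels, with n the number of weights, w_j the j-th weight, and α and β the two parameters that trade off the penalty against the sum squared error.

```latex
E = \beta \sum_{p=1}^{P} \sum_{k=1}^{K} \left( t_{p,k} - o_{p,k} \right)^{2}
    + \alpha \sum_{j=1}^{n} w_{j}^{2}
```

Increasing α relative to β pushes the weights towards zero and the network towards underfitting; decreasing it lets the network fit the training data more closely.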
My favourite technique, which is also by far the most computationally expensive, is global search. In this approach a search algorithm is used to try different neural network architectures and arrive at a near-optimal choice. This is most often done using genetic algorithms, which are discussed further on in this article.
What are the outputs?
Neural networks can be used for either regression or classification. Under the regression model a single value is output, which may be mapped to a set of real numbers, meaning that only one output neuron is required. Under the classification model an output neuron is required for each potential class to which the pattern may belong. If the classes are unknown, unsupervised neural network techniques such as self-organizing maps should be used.
In conclusion, the best approach is to follow Ockham's razor. Ockham's razor argues that, for two models of equivalent performance, the model with fewer free parameters will generalize better. On the other hand, one should never opt for an overly simplistic model at the cost of performance. Similarly, one should not assume that just because a neural network has more hidden neurons, and maybe more hidden layers, it will outperform a much simpler network. Unfortunately it seems to me that too much emphasis is placed on large networks and too little emphasis is placed on making good design decisions. In the case of neural networks, bigger is not always better.
Entities must not be multiplied beyond necessity - William of Ockham.
Entities must not be reduced to the point of inadequacy - Karl Menger.
5. Many training algorithms exist for neural networks.
The learning algorithm of a neural network tries to optimize the neural network's weights until some stopping condition has been met. This condition is typically either when the error of the network reaches an acceptable level of accuracy on the training set, when the error of the network on the validation set begins to deteriorate, or when the specified computational budget has been exhausted. The most common learning algorithm for neural networks is the backpropagation algorithm, which uses stochastic gradient descent, discussed earlier in this article. Backpropagation consists of two steps:
Feedforward pass - the training data set is passed through the network, the output from the neural network is recorded, and the error of the network is calculated.
Backward propagation - the error signal is passed back through the network and the weights of the neural network are optimized using gradient descent.
There are some problems with this approach. Adjusting all the weights at once can result in a significant movement of the neural network in weight space, the gradient descent algorithm is quite slow, and it is susceptible to local minima. Local minima are a problem for specific types of neural networks, including all product link neural networks. The first two problems can be addressed by using variants of gradient descent, including momentum gradient descent (QuickProp), Nesterov's accelerated gradient descent (NAG), the Adaptive Gradient Algorithm (AdaGrad), Resilient Propagation (RProp), and Root Mean Squared Propagation (RMSProp). As can be seen from the image below, significant improvements can be made on the classical gradient descent algorithm.
That having been said, these algorithms cannot overcome local minima and are also less useful when trying to optimize both the architecture and the weights of the neural network concurrently. To achieve this, global optimization algorithms are needed. Two popular global optimization algorithms are Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA). Here is how they can be used to train neural networks:
Neural network vector representation - by encoding the neural network as a vector of weights, each representing the weight of a connection in the neural network, we can train neural networks using most meta-heuristic search algorithms. This technique does not work well with deep neural networks because the vectors become too large.
This diagram illustrates how a neural network can be represented in vector notation and related to the concept of a search space or fitness landscape.
Particle Swarm Optimization - to train a neural network using a PSO we construct a population, or swarm, of neural networks. Each neural network is represented as a vector of weights and is adjusted according to its position relative to the global best particle and its own personal best.
The fitness function is calculated as the sum squared error of the reconstructed neural network after completing one feedforward pass of the training data set. The main consideration with this approach is the velocity of the weight updates. This is because if the weights are adjusted too quickly, the sum squared error of the neural networks will stagnate and no learning will occur.
This diagram shows how particles are attracted to one another in a single-swarm particle swarm optimization algorithm.
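The approach can be sketched as follows for the smallest possible "network", a single linear neuron whose weight vector is the particle position. This is a plain global-best PSO with velocity clamping; the inertia and acceleration coefficients, the clamp `v_max`, the swarm size, and the toy data are all assumed values chosen for illustration.

```python
import random


def sse(weights, patterns, targets):
    """Sum squared error of a single linear neuron after one feedforward pass."""
    total = 0.0
    for z, t in zip(patterns, targets):
        o = sum(zi * wi for zi, wi in zip(z, weights))
        total += (t - o) ** 2
    return total


def pso(fitness, dim, n_particles=20, iterations=100,
        inertia=0.7, c1=1.4, c2=1.4, v_max=0.5, seed=0):
    """Global-best particle swarm optimization over weight vectors."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull to personal best
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # pull to global best
                # Clamp the velocity so the weights are not adjusted too quickly.
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] += vel[i][d]
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


patterns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, -1.0, 1.0]


def fitness(w):
    return sse(w, patterns, targets)


_, start_error = pso(fitness, dim=2, iterations=0)
best_weights, end_error = pso(fitness, dim=2, iterations=100)
```

Because the global best is only replaced by a strictly better position, the error of the best particle can never get worse from one iteration to the next. The velocity clamp is one simple way of controlling the speed of the weight updates mentioned above.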
Genetic Algorithm - to train a neural network using a genetic algorithm we first construct a population of vector-represented neural networks. Then we apply the three genetic operators to that population to evolve better and better neural networks. These three operators are:
Selection - using the sum squared error of each network calculated after one feedforward pass, we rank the population of neural networks. The top x% of the population are selected to "survive" to the next generation and to be used for crossover.
Crossover - the genes of the top x% of the population are allowed to cross over with one another. This process forms "offspring". In this context, each offspring represents a new neural network with weights from both of the "parent" neural networks.
Mutation - this operator is required to maintain genetic diversity in the population. A small percentage of the population is selected to undergo mutation. Some of the weights in these neural networks will be adjusted randomly within a particular range.
This diagram shows the selection, crossover, and mutation genetic operators being applied to a population of neural networks represented as vectors.
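The three operators can be sketched as follows, again using a single linear neuron as the "network" so that the example stays short. The survival fraction, mutation rate, mutation range, and toy data are assumed values. One deliberate design choice differs slightly from the description: mutation is applied to offspring only, so that the survivors are carried over unchanged.

```python
import random


def sse(weights, patterns, targets):
    """Sum squared error of a single linear neuron after one feedforward pass."""
    total = 0.0
    for z, t in zip(patterns, targets):
        o = sum(zi * wi for zi, wi in zip(z, weights))
        total += (t - o) ** 2
    return total


def evolve(fitness, dim, pop_size=30, generations=50, survive_frac=0.3,
           mutation_rate=0.1, mutation_range=0.5, seed=0):
    """Evolve weight vectors with selection, crossover, and mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    n_survivors = max(2, int(survive_frac * pop_size))
    for _ in range(generations):
        # Selection: rank by error and keep the top fraction.
        population.sort(key=fitness)
        survivors = population[:n_survivors]
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            # Crossover: each weight comes from one of the two parents.
            mother, father = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: occasionally nudge one weight within a fixed range.
            if rng.random() < mutation_rate:
                i = rng.randrange(dim)
                child[i] += rng.uniform(-mutation_range, mutation_range)
            offspring.append(child)
        population = survivors + offspring
    return min(population, key=fitness)


patterns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, -1.0, 1.0]


def fitness(w):
    return sse(w, patterns, targets)


start_error = fitness(evolve(fitness, dim=2, generations=0))
end_error = fitness(evolve(fitness, dim=2, generations=50))
```

Keeping the survivors untouched means the best network found so far is never lost, so the best error in the population cannot increase between generations.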
In addition to these population-based metaheuristic search algorithms, other algorithms have been used to train neural networks, including backpropagation with added momentum, differential evolution, Levenberg-Marquardt, simulated annealing, and many more. Personally I would recommend using a combination of local and global optimization algorithms to overcome the shortcomings of both.
6. Neural networks do not always require a lot of data.
Neural networks can use one of three learning strategies, namely a supervised learning strategy, an unsupervised learning strategy, or a reinforcement learning strategy. Supervised learning requires at least two data sets: a training set which consists of inputs with the expected output, and a testing set which consists of inputs without the expected output. Both of these data sets must consist of labelled data, i.e. data patterns for which the target is known upfront. Unsupervised learning strategies are typically used to discover hidden structures (such as hidden Markov chains) in unlabelled data. They behave in a similar way to clustering algorithms. Reinforcement learning is based on the simple premise of rewarding neural networks for good behaviours and punishing them for bad behaviours. Because unsupervised and reinforcement learning strategies do not require the data to be labelled, they can be applied to under-formulated problems where the correct output is not known.
Unsupervised learning.
One of the most popular unsupervised neural network architectures is the Self-Organizing Map (also known as the Kohonen map). Self-organizing maps are essentially a multi-dimensional scaling technique which constructs an approximation of the probability density function of some underlying data set whilst preserving the topological structure of that data set. This is done by mapping the input vectors in the data set to weight vectors (neurons) in the feature map. Preserving the topological structure simply means that if two input vectors are close together, then the neurons to which those input vectors map will also be close together.
For more information on self-organizing maps and how they can be used to produce lower-dimensionality data sets click here. Another interesting application of SOMs is in colouring time-series charts for stock trading. This is done to show what the market conditions are at that point in time. This website provides a detailed tutorial and code snippets for implementing the idea for improved Forex trading strategies.
Reinforcement learning.
Reinforcement learning strategies consist of three components: a policy which specifies how the neural network will make decisions, e.g. using technical and fundamental indicators; a reward function which distinguishes good from bad, e.g. making versus losing money; and a value function which specifies the long-term goal. In the context of financial markets (and game playing) reinforcement learning strategies are particularly useful because the neural network learns to optimize a particular quantity, such as an appropriate measure of risk-adjusted return.
This diagram shows how a neural network can be either negatively or positively reinforced.
7. Neural networks cannot be trained on any data.
One of the biggest reasons why neural networks may not work is that people do not properly pre-process the data being fed into the neural network. Data normalization, removal of redundant information, and outlier removal should all be performed to improve the probability of good neural network performance.
Data normalization - neural networks consist of various layers of perceptrons linked together by weighted connections. Each perceptron contains an activation function, and each activation function has an "active range" (except for radial basis functions). The inputs into the neural network need to be scaled to within this range so that the neural network is able to differentiate between different input patterns.
For example, consider a neural network trading system which receives indicators about a set of securities as inputs and outputs whether each security should be bought or sold. One of the inputs is the price of the security and we are using the sigmoid activation function. However, most of the securities cost between $5 and $15 per share, and over that range the output of the sigmoid function approaches 1.0. So the output of the sigmoid function will be 1.0 for all securities, all of the perceptrons will "fire", and the neural network will not learn.
Neural networks trained on unprocessed data produce models where "the lights are on but nobody's home".
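The saturation problem in the price example is easy to reproduce. The sketch below feeds a handful of made-up share prices between $5 and $15 through a sigmoid, first raw and then after min-max scaling. The target range of [-1, 1] is an assumption standing in for the "active range" of the sigmoid, and `min_max_scale` is a helper written for this example.

```python
import math


def sigmoid(x):
    """Logistic activation function, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly rescale values so they span [low, high]."""
    lo, hi = min(values), max(values)
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


# Made-up share prices in the $5 to $15 range.
prices = [5.0, 7.5, 10.0, 12.5, 15.0]

raw_outputs = [sigmoid(p) for p in prices]
scaled_outputs = [sigmoid(p) for p in min_max_scale(prices)]

raw_spread = max(raw_outputs) - min(raw_outputs)
scaled_spread = max(scaled_outputs) - min(scaled_outputs)
```

On the raw prices every output is above 0.99 and the outputs are almost indistinguishable, whereas after scaling they are spread over a much wider part of the sigmoid's range, so the network can tell the patterns apart.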
Outlier removal - an outlier is a value which is much smaller or larger than most of the other values in some set of data. Outliers can cause problems with statistical techniques such as regression analysis and curve fitting because when the model tries to "accommodate" the outlier, the performance of the model across all other data deteriorates.
This diagram shows the effect of removing an outlier from the data for a linear regression. The results are comparable for neural networks. Image source: statistics.laerd/statistical-guides/img/pearson-6.png.
The illustration shows that trying to accommodate an outlier in the linear regression model results in a poor fit of the data set. The effect of outliers on non-linear regression models, including neural networks, is similar. Therefore it is good practice to remove outliers from the training data set. That said, identifying outliers is a challenge in and of itself; this tutorial and paper discuss existing techniques for outlier detection and removal.
Remove redundancy - when two or more of the independent variables being fed into the neural network are highly correlated (multicollinearity), this can negatively affect the neural network's learning ability. Highly correlated inputs also mean that the amount of unique information presented by each variable is small, so the less significant input can be removed. Another benefit of removing redundant variables is faster training times. Adaptive neural networks can be used to prune redundant connections and perceptrons.
8. Neural networks may need to be retrained.
Even if you were able to train a neural network to trade successfully in and out of sample, this neural network may still stop working over time. This is not a poor reflection on neural networks but rather an accurate reflection of the financial markets. Financial markets are complex adaptive systems, meaning that they are constantly changing, so what worked yesterday may not work tomorrow. This characteristic is called non-stationarity. It gives rise to dynamic optimization problems, and neural networks are not particularly good at handling them.
Dynamic environments, such as financial markets, are extremely difficult for neural networks to model. Two approaches are either to keep retraining the neural network over time, or to use a dynamic neural network. Dynamic neural networks "track" changes to the environment over time and adjust their architecture and weights accordingly. They are adaptive over time. For dynamic problems, multi-solution meta-heuristic optimization algorithms can be used to track changes to local optima over time. One such algorithm is the multi-swarm optimization algorithm, a derivative of particle swarm optimization. Additionally, genetic algorithms with enhanced diversity or memory have also been shown to be robust in dynamic environments.
The illustration below demonstrates how a genetic algorithm evolves over time to find new optima in a dynamic environment. This illustration also happens to mimic trade crowding, which is when market participants crowd a profitable trading strategy, thereby exhausting trading opportunities and causing the trade to become less profitable.
This animated image shows a dynamic fitness landscape (search space) changing over time. Image source: en.wikipedia/wiki/Fitness_landscape.
9. Neural networks are not black boxes.
By itself a neural network is a black box. This presents problems for people wanting to use them. For example, fund managers would not know how a neural network makes trading decisions, so it is impossible to assess the risks of the trading strategies learned by the neural network. Similarly, banks using neural networks for credit risk modelling would not be able to justify why a customer has a particular credit rating, which is a regulatory requirement. That having been said, state-of-the-art rule-extraction algorithms have been developed to vitrify some neural network architectures. These algorithms extract knowledge from the neural networks as either mathematical expressions, symbolic logic, fuzzy logic, or decision trees.
This image shows a neural network as a black box and how it relates to rule extraction techniques.
Mathematical rules - algorithms have been developed which can extract multiple linear regression lines from neural networks. The problem with these techniques is that the rules are often still difficult to understand, so they do not solve the "black box" problem.
Propositional logic - propositional logic is a branch of mathematical logic which deals with operations done on discrete-valued variables. These variables, such as A or B, are often either TRUE or FALSE, but they could take values within a discrete range.
Logical operations such as OR, AND, and XOR can then be applied to those variables. The results are called predicates, which can also be quantified over sets using the exists or for-all quantifiers. This is the difference between predicate and propositional logic. If we had a simple neural network which took Price (P), Simple Moving Average (SMA), and Exponential Moving Average (EMA) as inputs, and we extracted a trend following strategy from the neural network in propositional logic, we might get rules which combine comparisons between these inputs into BUY or SELL decisions.
Fuzzy logic - fuzzy logic is where probability and propositional logic meet. The problem with propositional logic is that it deals in absolutes, e.g. BUY or SELL, TRUE or FALSE, 0 or 1. Therefore for traders there is no way to determine the confidence of these results. Fuzzy logic overcomes this limitation by introducing a membership function which specifies how much a variable belongs to a particular domain. For example, a company (GOOG) might belong 0.7 to one domain, such as BUY, and 0.3 to another, such as SELL. Combinations of neural networks and fuzzy logic are called Neuro-Fuzzy systems. This research survey discusses various fuzzy rule extraction techniques.
Decision trees - decision trees show how decisions are made when given certain information. This article describes how to evolve security analysis decision trees using genetic programming. Decision tree induction is the term given to the process of extracting decision trees from neural networks.
An example of a simple trading strategy represented using a decision tree. The triangular boxes represent decision nodes; these could be to BUY, HOLD, or SELL a company. Each box represents a tuple of <indicator, inequality, value>. An example might be <SMA, >, 25> or <EMA, <=, 30>.
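To show how such tuples can be evaluated, here is a minimal sketch. The two rules are the example tuples from the caption, but the way they are arranged into a tree, and the decision attached to each leaf, are purely hypothetical and are not the strategy shown in the original image.

```python
import operator

# Map inequality symbols to comparison functions.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def holds(rule, indicators):
    """Evaluate an <indicator, inequality, value> tuple against indicator values."""
    name, inequality, value = rule
    return OPS[inequality](indicators[name], value)


def decide(indicators):
    """Hypothetical two-level decision tree built from the example tuples."""
    if holds(("SMA", ">", 25), indicators):
        if holds(("EMA", "<=", 30), indicators):
            return "BUY"
        return "HOLD"
    return "SELL"


decision = decide({"SMA": 26.0, "EMA": 29.0})
```

The appeal of this representation is that every decision can be traced back to a short chain of readable comparisons, which is exactly what the raw network weights do not offer.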
10. Neural networks are not hard to implement.
This list is updated, from time to time, when I have time. Last updated: November 2018.
Speaking from experience, neural networks are quite challenging to code from scratch. Luckily there are now hundreds of open source and proprietary packages which make working with neural networks a lot easier. Below is a list of packages which quants may find useful for quantitative finance. The list is NOT exhaustive, and is ordered alphabetically. If you have any additional comments, or frameworks to add, please share via the comment section.
"Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley." - Caffe webpage (November 2018)
"Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. Most Encog training algoritms are multi-threaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms." - Encog webpage.
H2O is not strictly a package for machine learning, instead they expose an API for doing fast and scalable machine learning for smarter applications which use big data. Their API supports deep learning model, generalized boosting models, generalized linear models, and more. They also host a cool conference, checkout the videos :).
Google TensorFlow.
"TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code." - GitHub repository (November 2018)
Microsoft Distributed Machine Learning Toolkit.
"DMTK includes the following projects: DMTK framework(Multiverso): The parameter server framework for distributed machine learning. LightLDA: Scalable, fast and lightweight system for large-scale topic modeling. Distributed word embedding: Distributed algorithm for word embedding. Distributed skipgram mixture: Distributed algorithm for multi-sense word embedding." - GitHub repository (November 2018)
Microsoft Azure Machine Learning.
The machine learning / predictive analytics platform in Microsoft Azure is a fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions. This software basically allows you to drag and drop pre-built components (including machine learning models) and custom-built components which manipulate data sets into a process. This flow-chart is then compiled into a program and can be deployed as a web-service. It is similar to the older SAS Enterprise Miner solution except that it is more modern, more functional, supports deep learning models, and exposes clients for Python and R.
MXNet.
"MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavours of symbolic programming and imperative programming together to maximize the efficiency and your productivity. In its core, a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer is build on top, which makes symbolic execution fast and memory efficient. The library is portable and lightweight, and is ready scales to multiple GPUs, and multiple machines." - MXNet GitHub Repository (November 2018)
neon.
"neon is Nervana's Python based Deep Learning framework and achieves the fastest performance on many common deep neural networks such as AlexNet, VGG and GoogLeNet. We have designed it with the following functionality in mind: 1) Support for commonly used models and examples: convnets, MLPs, RNNs, LSTMs, autoencoders, 2) Tight integration with nervanagpu kernels for fp16 and fp32 (benchmarks) on Maxwell GPUs, 3) Basic automatic differentiation support, 4) Framework for visualization, and 5) Swappable hardware backends." - neon GitHub repository (November 2018)
Theano.
"Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation." - Theano GitHub repository (November 2018). Theano, like TensorFlow and Torch, is more broadly applicable than just Neural Networks. It is a framework for implementing existing or creating new machine learning models using off-the-shelf data-structures and algorithms.
Torch.
"Torch is a scientific computing framework with wide support for machine learning algorithms. A summary of core features include an N-dimensional array, routines for indexing, slicing, transposing, an interface to C, via LuaJIT, linear algebra routines, neural network, energy-based models, numeric optimization routines, Fast and efficient GPU support, Embeddable, with ports to iOS, Android and FPGA" - Torch Webpage (November 2018). Like Tensorflow and Theano, Torch is more broadly applicable than just Neural Networks. It is a framework for implementing existing or creating new machine learning models using off-the-shelf data-structures and algorithms.
SciKit Learn.
SciKit Learn is a very popular package for doing machine learning in Python. It is built on NumPy, SciPy, and matplotlib, is open source, and exposes implementations of various machine learning models for classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing.
As I mentioned, there are now hundreds of machine learning packages and frameworks out there. Before committing to any one solution I would recommend doing a best-fit analysis to see which open source or proprietary machine learning package or software best matches your use-cases. Generally speaking, a good rule to follow in software engineering and model development for quantitative finance is to not reinvent the wheel. That said, for any sufficiently advanced model you should expect to have to write some of your own code.
Conclusion.
Neural networks are a class of powerful machine learning algorithms. They are based on solid statistical foundations and have been applied successfully in financial models as well as in trading strategies for many years. Despite this, they have a bad reputation due to the many unsuccessful attempts to use them in practice. In most cases, unsuccessful neural network implementations can be traced back to inappropriate neural network design decisions and general misconceptions about how they work. This article aims to articulate some of these misconceptions in the hopes that they might help individuals implementing neural networks meet with success.
For readers interested in getting more information, I have found the following books to be quite instructional when it comes to neural networks and their role in financial modelling and algorithmic trading.
Some instructional textbooks on implementing neural networks and other machine learning algorithms in finance. Many of the misconceptions presented in this article are discussed in more detail in Professor Andries Engelbrecht's book, 'Computational Intelligence: An Introduction'.
Great effort behind this article, Stuart.
Kindly check the email.
Hi Michal, thank you for your email. I'm glad you enjoyed the article, please let me know if you have any suggestions for further material!
November 28, 2017.
A terrific resource.
It would be really illustrative to understand how the example applications mentioned - time-series forecasting, proprietary trading signal generation, fully automated trading (decision making), financial modelling, derivatives pricing, credit risk assessments, pattern matching, and security classification - are solved using neural networks or other machine learning methods. Is there a resource or blog that covers this?
November 28, 2017.
Hi Dinesh, thanks for commenting. I think that online literature for the topic of Neural Networks applied to finance is fragmented. Therefore, it may be worthwhile trying to get a copy of a book called "Neural Networks in Finance" by Paul D. McNelis. The book is a bit dated, and probably won't cover all the latest developments in Neural Networks but it will definitely cover most of the applications I mentioned in my blog. Otherwise, the best resources are academic journal articles written on the topic. Journal articles are obviously a bit more technical but there is no better way to learn in my humble opinion. Good luck!
Excellent blog Stuart. well-written, articulate & nuanced in its descriptions.
Thank you very much Faiyaz. I only hope that you and other readers are able to find good applications of the techniques discussed here 🙂
Nice blog Mr Stuart, and thanks for summarizing alot of things. I was working on a neural network for my company inkunzi markets in Sandton, and just finished after 3 months(built from scratch), fuzzy neurons are not as easy to control and build indeed, but rather better when done perfectly interms of pattern recognition and market forecasting. Keep up the good work fellow Quant,
BSc Mathematical Statistics, Physics and Electronics from Rhodes University.
January 12, 2018.
Hi Brian, thanks for getting in touch. Thank you for the information, I have only read up on the neuro-fuzzy systems but never applied them in practice. I will check them out in more detail this year :).
February 12, 2018.
Hi Stu, I am starting a quant invest platform development project here in Beijng based on big data intelligence from market emotion to technical trading signal using, and I am looking for international partner's join, if you have interests, maybe we can schedule a skype chat. Thank you with regards, your personal blog is awesome! Jack.
February 12, 2018.
Hi Jack, thank you for the compliments :). I will definitely be in touch, Beijing is an incredible city which I was lucky enough to visit last year for a conference.
Thanks man. I appreciate the comment, that said this article is getting a little bit old now 🙂 so I'm busy working on a more technical follow up with implementation-level detail.
Should come out in the next few months. Thanks again!
Please sign me up for updates.
My concern with neural networks is its ability to handle categorical data. I get the impression that in supervised learning situations, neural networks work best when all your independent variables are numeric (or at least mostly numeric). Is there any truth to this?
Hi Li, you can train neural networks with categorical inputs, usually each potential category forms an individual input into the neural network.
Thanks for the Article. I think this article is a must read for everyone 'new' at this field. As I call this method is a 'breadth-first' learning approach to Introduction to Neural Networks.
Sorry for my bad english.
Thank you for the kind words, your English is fine 🙂
Looking for something like this for a while, all i can find are click-bait articles.
Great research! Favorited!
Thanks John; I also really dislike all the mindless click-bait articles out there. This blog is all about content 🙂 - I really need to write more about neural networks though.
Thank you very very much. Your article is amazing especially for the beginner like me. From your article, I get an outline for what Neural Network is, how many kinds of NNs and how to use them properly. Plus, the external resources you provided are excellent too.
Thanks for the kind words Steven. I'm happy to hear that the article was helpful to you 🙂 good luck!
September 18, 2018.
Great article Stuart. Would you recommend any open source ANN tools that implement the Levenberg Marquardt learning algorithm?
September 18, 2018.
Hi Ankur. The one package I used a few years ago which offered Levenberg Marquardt (often referred to as LMA) was Encog. I'm sure some of the others offer it as well.
September 21, 2018.
Great article. Neural network and article.
Hi Stuart, Thank you for this article - it was most illuminating!
How do spiking neural networks fit into the overall picture of neural networks? (architecturally speaking and from the point of view of most suitable applications)
Hey Louis, thanks for the comment!
That's an interesting question. Let me preface my response by stating that I have neither worked with nor explicitly studied spiking neural networks.
That said, I have come across them before. Architecturally they are similar to any other neural network except that each individual neuron's complexity is higher, as with product unit neural networks - which, by the way, I quite like. This added complexity makes spiking neural networks "more similar" to biological neural networks in the sense that neuron activation is not a continuous process, it is discontinuous. Which is actually how I came across them originally :-): I was researching applications for jump diffusion stochastic processes, one of which is modelling the firing rate of neurons in spiking neural networks. But like I said, I haven't worked with them or studied them explicitly and I am not one hundred percent sure of their use cases.
All I can say is that I am supportive of complex neural network architectures because I believe they may hold the key to more efficient and human-esque intelligence in machines.
My name is Michael. I have read that Neural Network Regression can predict the market more than any other software or strategy. I have a question then; how can i use the neural network in trading, my main concern is the forex market. Is neural network regression a software? What is it? How can i use it in trading forex? How can it predict or forecast the price of the eurusd for me?
Am like totally naive on this and i need your help. If its a program software, how can i get one?
Thank you and i anticipate your reply.
Wow, thanks for the excellent write-up. It was incredibly well-researched and articulated. Keep it up!
October 21, 2018.
I know what you mean, but there's a dichotomy with your title of "10 misconceptions. " and the fact that you listed not the 10 misconceptions but actually the conceptions anti to the ten misconceptions. The reader at first thinks that your list are the misconceptions. Double negative thing going on.
December 1, 2018.
thank you for the wonderful article, a great resource in deed.
December 3, 2018.
An Amazing article with perfect definitions and clear examples. Great heads up for someone like me, trying to develop a new ANN framework from the scratch.
December 30, 2018.
Titled as a warning against common knowledge, covered stuff that classical texts didn't. Great blog! Thank you!
Thx for great overview article ✔

Neural Networks: Forecasting Profits.
Neural networks are state-of-the-art, trainable algorithms that emulate certain major aspects in the functioning of the human brain. This gives them a unique, self-training ability, the ability to formalize unclassified information and, most importantly, the ability to make forecasts based on the historical information they have at their disposal.
Neural networks have been used increasingly in a variety of business applications, including forecasting and marketing research solutions. In some areas, such as fraud detection or risk assessment, they are the indisputable leaders. The major fields in which neural networks have found application are financial operations, enterprise planning, trading, business analytics and product maintenance. Neural networks can be applied gainfully by all kinds of traders, so if you're a trader and you haven't yet been introduced to neural networks, we'll take you through this method of technical analysis and show you how to apply it to your trading style.
Use Neural Networks to Uncover Opportunities.
Just like any kind of great product or technology, neural networks have started attracting all those who are looking for a budding market. Torrents of ads about next-generation software have flooded the market - ads celebrating the most powerful of all the neural network algorithms ever created. Even in those rare cases when advertising claims resemble the truth, keep in mind that a 10% increase in efficiency is probably the most you will ever get from a neural network. In other words, it doesn't produce miraculous returns and regardless of how well it works in a particular situation, there will be some data sets and task classes for which the previously used algorithms remain superior. Remember this: it's not the algorithm that does the trick. Well-prepared input information on the targeted indicator is the most important component of your success with neural networks.
Is Faster Convergence Better?
Many of those who already use neural networks mistakenly believe that the faster their net provides results, the better it is. This, however, is a delusion. A good network is not determined by the rate at which it produces results and users must learn to find the best balance between the velocity at which the network trains and the quality of the results it produces.
Correct Application of Neural Nets.
Many traders apply neural nets incorrectly because they place too much trust in the software they use without having been given proper instructions on how to use it. To use a neural network the right way and, thus, gainfully, a trader ought to pay attention to all the stages of the network preparation cycle. It is the trader and not his or her net that is responsible for inventing an idea, formalizing this idea, testing and improving it, and, finally, choosing the right moment to dispose of it when it's no longer useful. Let us consider the stages of this crucial process in more detail:
1. Finding and Formalizing a Trading Idea.
2. Improving the Parameters of Your Model.
3. Disposing of the Model When it Becomes Obsolete.
Every neural-network based model has a life span and cannot be used indefinitely. The length of a model's life span depends on the market situation and on how long the market interdependencies reflected in it remain topical. However, sooner or later any model becomes obsolete. When this happens, you can either retrain the model using completely new data (i.e. replace all the data that has been used), add some new data to the existing data set and train the model again, or simply retire the model altogether.
Many traders make the mistake of following the simplest path - they rely heavily on and use the approach for which their software provides the most user-friendly and automated functionality. This simplest approach is forecasting a price a few bars ahead and basing your trading system on this forecast. Other traders forecast price change or percentage of the price change. This approach seldom yields better results than forecasting the price directly. Both the simplistic approaches fail to uncover and gainfully exploit most of the important longer-term interdependencies and, as a result, the model quickly becomes obsolete as the global driving forces change.
The Best Overall Approach to Using Neural Networks.

Trading strategies using neural networks

Hybrid Neural Network Stop-and-Reverse Strategies for Forex.
Michael R. Bryant.
Neural networks have been used in trading systems for many years with varying degrees of success. Their primary attraction is that their nonlinear structure is better able to capture the complexities of price movement than standard, indicator-based trading rules. One of the criticisms has been that neural network-based trading strategies tend to be over-fit and therefore don't perform well on new data. A possible solution to this problem is to combine neural networks with rule-based strategy logic to create a hybrid type of strategy. This article will show how this can be done using Adaptrade Builder.
In particular, this article will illustrate the following:
Combining neural network and rule-based logic for trade entries.
Targeting multiple platforms simultaneously (MetaTrader 4 and TradeStation)
Developing a strategy with asymmetrical stop-and-reverse logic.
Using intraday forex data.
A three-segment data approach will be used, with the third segment used to validate the final strategies. The resulting strategy code for both MetaTrader 4 and TradeStation will be shown, and it will be demonstrated that the validation results are positive for each platform.
Neural Networks as Trade Entry Filters.
Mathematically, a neural network is a nonlinear combination of one or more weighted inputs that generates one or more output values. For trading, a neural network is generally used in one of two ways: (1) as a prediction of future price movement, or (2) as an indicator or filter for trading. Here, its use as an indicator or trade filter will be considered.
As an indicator, a neural network acts as an additional condition or filter that must be satisfied before a trade can be entered. The inputs to the network are typically other technical indicators, such as momentum, stochastics, ADX, moving averages, and so on, as well as prices and combinations of the preceding. The inputs are scaled and the neural network is designed so that the output is a value between -1 and +1. One approach is to allow a long entry if the output is greater than or equal to a threshold value, such as 0.5, and a short entry if the output is less than or equal to the negative of the threshold; e.g., -0.5. This condition would be in addition to any existing entry conditions. For example, if there were a long entry condition, it would have to be true and the neural network output would have to be at least equal to the threshold value for a long entry.
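The threshold logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single tanh unit is one common way to keep an output between -1 and +1 (the article does not say how Builder does it), and the names `network_output` and `filtered_entry`, the weights, and the input values are hypothetical.

```python
import math


def network_output(scaled_inputs, weights, bias):
    """One tanh unit: a weighted sum squashed into the open interval (-1, +1)."""
    total = bias + sum(w * x for w, x in zip(weights, scaled_inputs))
    return math.tanh(total)


def filtered_entry(nn_out, long_rule, short_rule, threshold=0.5):
    """Return 'LONG', 'SHORT', or None; the network only confirms existing rules."""
    if long_rule and nn_out >= threshold:
        return "LONG"
    if short_rule and nn_out <= -threshold:
        return "SHORT"
    return None


# Made-up scaled indicator values and weights; the weighted sum is 1.0.
out = network_output([0.8, 0.4], weights=[1.0, 0.5], bias=0.0)
print(filtered_entry(out, long_rule=True, short_rule=False))  # LONG
```

Note that the network never triggers a trade on its own here: if neither rule-based condition is true, the function returns `None` regardless of the output.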
When setting up a neural network, a trader would typically be responsible for choosing the inputs and the network topology and for "training" the network, which determines the optimal weight values. As will be shown below, Adaptrade Builder performs these steps automatically as part of the evolutionary build process that the software is based on. Using the neural network as a trade filter allows it to be easily combined with other rules to create a hybrid trading strategy, one that combines the best features of traditional, rule-based approaches with the advantages of neural networks. As a simple example, Builder might combine a moving average crossover rule with a neural network so that a long position is taken when the fast moving average crosses above the slow moving average and the neural network output is at or above its threshold.
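To make the moving average example concrete, here is a rough Python sketch of such a hybrid long-entry condition. It is not Builder's generated code: the function names, the short averaging lengths, and the price series are assumptions chosen so the arithmetic is easy to follow, and the network output is simply passed in as a number.

```python
def sma(prices, length):
    """Simple moving average of the last `length` prices."""
    return sum(prices[-length:]) / length


def crossed_above(prices, fast_len, slow_len):
    """True if the fast SMA moved from at-or-below to above the slow SMA on the last bar."""
    previous = prices[:-1]
    was_below = sma(previous, fast_len) <= sma(previous, slow_len)
    is_above = sma(prices, fast_len) > sma(prices, slow_len)
    return was_below and is_above


def hybrid_long_entry(prices, nn_output, fast_len=3, slow_len=5, threshold=0.5):
    """Long entry requires BOTH the crossover rule and the network filter."""
    return crossed_above(prices, fast_len, slow_len) and nn_output >= threshold


closes = [10, 9, 8, 7, 6, 7, 10]  # made-up closes; the fast SMA crosses up on the last bar
print(hybrid_long_entry(closes, nn_output=0.7))  # True
print(hybrid_long_entry(closes, nn_output=0.2))  # False: crossover present, filter rejects it
```

The price list needs at least `slow_len + 1` bars so that both the previous and current averages are computed over full windows.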
Stop-and-Reverse Trading Strategies.
A stop-and-reverse trading strategy is one that is always in the market, either long or short. Strictly speaking, "stop-and-reverse" means that you reverse the trade when your stop order is hit. However, I use it as a short-hand for any trading strategy that reverses from long to short to long and so on, so that you're always in the market. By this definition, it's not necessary for the orders to be stop orders. You could enter and reverse using market or limit orders as well. It's also not necessary that each side use the same logic or even the same order type. For example, you could enter long (and exit short) on a stop order and enter short (and exit long) on a market order, using different rules and conditions for each entry/exit. This would be an example of an asymmetrical stop-and-reverse strategy.
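The "always in the market" behaviour amounts to a very small state machine. The sketch below is a simplified Python illustration, assuming each bar yields at most one entry signal; the function name and the signal encoding are hypothetical. In an asymmetrical strategy, the LONG and SHORT signals would simply come from different rules or order types.

```python
def stop_and_reverse(signals):
    """Track position per bar: 0 = flat (before the first entry), +1 = long, -1 = short.

    Each entry signal both closes the opposite position and opens the new one,
    so after the first entry the position is never flat again.
    """
    position = 0
    history = []
    for signal in signals:
        if signal == "LONG" and position != 1:
            position = 1
        elif signal == "SHORT" and position != -1:
            position = -1
        history.append(position)
    return history


bar_signals = [None, "LONG", None, "SHORT", "SHORT", "LONG"]
print(stop_and_reverse(bar_signals))  # [0, 1, 1, -1, -1, 1]
```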
The primary advantage of a stop-and-reverse strategy is that by always being in the market, you never miss any big moves. Another advantage is simplicity. When there are separate rules and conditions for entering and exiting trades, there is more complexity and more that can go wrong. Combining entries and exits means fewer timing decisions have to be made, which can mean fewer mistakes.
On the other hand, it can be argued that the best conditions for exiting a trade are rarely the same as those for entering in the opposite direction; that entering and exiting trades are inherently separate decisions that should therefore employ separate rules and logic. Another potential drawback of always being in the market is that the strategy will trade through every opening gap. A large opening gap against the position can mean a large loss before the strategy is able to reverse. Strategies that enter and exit more selectively or that exit by the end of the day can minimize the impact of opening gaps.
Since the goal is to build a forex strategy, MetaTrader 4 (MT4) is an obvious choice for the trading platform given that MetaTrader 4 is designed primarily for forex and is widely used for trading those markets (see, for example, MetaTrader vs. TradeStation: A Language Comparison). However, in recent years, TradeStation has targeted the forex markets much more aggressively. Depending on your trading volume and/or account level, it's possible to trade the forex markets through TradeStation without incurring any platform fees or paying any commissions. Spreads are reportedly tight with good liquidity on the major forex pairs. For these reasons, both platforms were targeted for this project.
Several issues arise when targeting multiple platforms simultaneously. First, the data may be different on different platforms, with differences in time zones, price quotes for some bars, volume, and available date ranges. To smooth over these differences, data were obtained from both platforms, and the strategies were built over both data series simultaneously. The best strategies were therefore the ones that worked well on both data series despite any differences in the data.
The data settings used in Builder are shown below in Fig. 1. As can be inferred from the Market Data table in the figure, the Euro/dollar forex market was targeted (EURUSD) with a bar size of 4 hours (240 minutes). Other bar sizes or markets would have served just as well. I was only able to obtain as much data through my MT4 platform as indicated by the date range shown in Fig. 1 (data series #2), so the same date range was used in obtaining the equivalent data series from TradeStation (data series #1). 80% of the data was used for Building (combined in-sample and "out-of-sample"), with 20% (6/20/14 to 2/10/15) set aside for validation. 80% of the original 80% was then set to "in-sample" with 20% set to "out-of-sample," as shown in Fig. 1. The bid/ask spread was set to 5 pips, and trading costs of 6 pips or $60 per full-size lot (100,000 shares) were assumed per round-turn. Both data series were included in the build, as indicated by the checkmarks in the left-hand column of the Market Data table.
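The nested 80/20 split works out to roughly 64% in-sample, 16% "out-of-sample", and 20% validation. The Python sketch below shows that arithmetic, along with the pips-to-dollars conversion for the trading cost. The function names are hypothetical, the 1,000-bar series is a round number chosen for illustration, and the pip size of 0.0001 for EURUSD is an assumption the article does not state explicitly.

```python
def three_segment_split(n_bars, build_frac=0.8, in_sample_frac=0.8):
    """Return (start, end) bar index ranges for the three data segments."""
    n_build = round(n_bars * build_frac)           # in-sample + "out-of-sample"
    n_in_sample = round(n_build * in_sample_frac)  # 80% of the build segment
    return {
        "in_sample": (0, n_in_sample),
        "out_of_sample": (n_in_sample, n_build),
        "validation": (n_build, n_bars),
    }


def round_turn_cost(pips, pip_size=0.0001, lot_units=100_000):
    """Cost in quote currency (USD for EURUSD) of one round turn on a full-size lot."""
    return pips * pip_size * lot_units


print(three_segment_split(1000))
# {'in_sample': (0, 640), 'out_of_sample': (640, 800), 'validation': (800, 1000)}
print(round(round_turn_cost(6), 2))  # 60.0
```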
Figure 1. Market data settings for building a forex strategy for MetaTrader 4 and TradeStation.
Another potential problem when targeting multiple platforms is that Builder is designed to duplicate the way each supported platform calculates its indicators, which can mean that the indicator values will be different depending on which platform is selected. To avoid this possible source of discrepancy, any indicators that evaluate differently in MetaTrader 4 than in TradeStation should be eliminated from the build, which means the following indicators should be avoided:
Slow D stochastic.
Fast D stochastic.
All other indicators that are available for both platforms are calculated the same way in both platforms. TradeStation includes all of the indicators that are available in Builder, whereas MetaTrader 4 does not. Therefore, to include only indicators that are available in both platforms, the MetaTrader 4 platform should be selected as the code type in Builder. That will automatically remove any indicators from the build set that are not available for MT4, which will leave the indicators that are available in both platforms. Additionally, since I noticed differences in the volume data obtained from each platform, I removed all volume-dependent indicators from the build set. Lastly, the time-of-day indicator was removed because of differences in the time zones between data files.
In Fig. 2, below, the list of indicators used in the build set is shown sorted by whether or not the indicator was considered by the build process ("Consider" column). The indicators removed from consideration for the reasons discussed above are shown at the top of the list. The remaining indicators, starting with "Simple Mov Ave", were all part of the build set.
Figure 2. Indicator selections in Builder, showing the indicators removed from the build set.
The evaluation options used in the build process are shown in Fig. 3. As discussed, MetaTrader 4 was selected as the code output choice. After strategies are built in Builder, any of the options on the Evaluation Options tab, including the code type, can be changed and the strategies re-evaluated, which will also rewrite the code in whichever language is selected. This feature was used to obtain the TradeStation code for the final strategy after the strategies were built for MetaTrader 4.
Figure 3. Evaluation options in Builder for the EURUSD forex strategy.
To create stop-and-reverse strategies, all exit types were removed from the build set, as shown below in Fig. 4. All three types of entry orders -- market, stop, and limit -- were left as "consider", which means the build process could use any of them.
Figure 4. Order types selected in Builder to create a stop-and-reverse strategy.
The Builder software automatically generates rule-based logical conditions for entry and/or exit. To add a neural network to the strategy, it's only necessary to select the option "Include a neural network in entry conditions" on the Strategy Options tab, as shown below in Fig. 5. The neural network settings were left at their defaults. As part of the stop-and-reverse logic, the Market Sides option was set to Long/Short, and the option to "Wait for exit before entering new trade" was unchecked. The latter is necessary to enable the entry order to exit the current position on a reversal. All other settings were left at the defaults.
Figure 5. Strategy options selected in Builder to create a hybrid strategy using both rule-based and neural network conditions.
The evolutionary nature of the build process in Builder is guided by the fitness, which is calculated from the objectives and conditions defined on the Metrics tab, as shown below in Fig. 6. The build objectives were kept simple: maximizing the net profit while minimizing the complexity, which was given a small weight relative to the net profit. More emphasis was placed on the build conditions, which included the correlation coefficient and significance for general strategy quality, as well as the average bars in trades and the number of trades.
Initially, only the average bars in trades was included as a build condition. However, in some of the early builds, the net profit was being favored over the trade length, so the number-of-trades metric was added. The specified range for the number of trades (between 209 and 418) is equivalent to average trade lengths between 15 and 30 bars based on the number of bars in the build period. As a result, adding this metric put more emphasis on the trade length goal, which resulted in more members of the population with the desired range of trade lengths.
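The link between trade count and average trade length follows from the strategy always being in the market: the trades tile the build period, so the number of trades is roughly the number of bars divided by the average trade length. A quick Python check, assuming a build period of 6,270 bars (a figure inferred from the stated ranges, since 209 × 30 = 418 × 15 = 6,270, and not given in the article):

```python
def trade_count_range(n_bars, min_avg_len, max_avg_len):
    """Trade counts implied by a range of average trade lengths (in bars),
    assuming trades cover the whole build period back to back."""
    fewest = n_bars // max_avg_len   # longest trades -> fewest trades
    most = n_bars // min_avg_len     # shortest trades -> most trades
    return fewest, most


BUILD_BARS = 6270  # inferred from the article's numbers, not stated in it
print(trade_count_range(BUILD_BARS, min_avg_len=15, max_avg_len=30))  # (209, 418)
```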
Figure 6. Build objectives and conditions set on the Metrics tab determine how the fitness is calculated.
The "Conditions for Selecting Top Strategies" duplicate the build conditions except that the top strategies conditions are evaluated over the entire range of data (not including the validation segment, which is separate), rather than just over the build period, as is the case for the build conditions. The top strategies conditions are used by the program to set aside any strategies that meet all the conditions in a separate population.
The final settings are made on the Build Options tab, as shown below in Fig. 7. The most important options here are the population size, number of generations, and the option to reset based on the "out-of-sample" performance. The population size was chosen to be large enough to get good diversity in the population while still being small enough to build in a reasonable amount of time. The number of generations was based on how long it took during a few preliminary builds for the results to start to converge.
Figure 7. Build options include the population size, number of generations, and options for resetting the population based on "out-of-sample" performance.
The option to "Reset on Out-of-Sample (OOS) Performance" starts the build process over after the specified number of generations if the specified condition is met; in this case, the population will be reset if the "out-of-sample" net profit is less than $20,000. This value was chosen based on preliminary tests to be a high enough value that it probably would not be reached. As a result, the build process was repeated every 30 generations until manually stopped. This is a way to let the program identify strategies based on the Top Strategies conditions over an extended period of time. Periodically, the Top Strategies population can be checked and the build process cancelled when suitable strategies are found.
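The reset logic described above can be pictured as a loop around the build. The following is only a sketch of the idea, not Builder's implementation: `run_generation` and `oos_net_profit` are hypothetical callbacks standing in for the program's internals, and the defaults mirror the 30 generations and $20,000 threshold mentioned in the text.

```python
def build_with_resets(run_generation, oos_net_profit, max_cycles,
                      generations_per_cycle=30, oos_threshold=20000.0):
    """Repeat the build, resetting whenever OOS net profit is below the threshold.

    run_generation(population) -> population   (hypothetical callback)
    oos_net_profit(population) -> float        (hypothetical callback)
    """
    resets = 0
    population = None  # None means "start from a fresh population"
    for _ in range(max_cycles):
        for _ in range(generations_per_cycle):
            population = run_generation(population)
        if oos_net_profit(population) < oos_threshold:
            population = None  # discard the population and rebuild
            resets += 1
        else:
            break  # threshold met, so no reset is triggered
    return population, resets
```

With a threshold chosen so high that it is unlikely to be reached, the loop keeps rebuilding until `max_cycles` is exhausted, which plays the role of stopping the build manually.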
Notice that I put "out-of-sample" in quotes. When the "out-of-sample" period is used to reset the population in this manner, the "out-of-sample" period is no longer truly out-of-sample. Since that period is now being used to guide the build process, it's effectively part of the in-sample period. That's why it's advisable to set aside a third segment for validation, as was discussed above.
After several hours of processing and a number of automatic rebuilds, a suitable strategy was found in the Top Strategies population. Its closed trade equity curve is shown below in Fig. 8. The equity curve demonstrates consistent performance across both data segments with an adequate number of trades and essentially the same results over both data series.
Figure 8. Closed-trade equity curve for the EURUSD stop-and-reverse strategy.
To check the strategy over the validation period, the date controls on the Markets tab (see Fig. 1) were changed to the end date of the data (2/11/2018), and the strategy was re-evaluated by selecting the Evaluate command from the Strategy menu in Builder. The results are shown below in Fig. 9. The validation results in the red box demonstrate that the strategy held up on data not used during the build process.
Figure 9. Closed-trade equity curve for the EURUSD stop-and-reverse strategy, including the validation period.
The final check is to see how the strategy performed on each data series separately using the code output option for that platform. This is necessary because, as explained above, there may be differences in the results depending on (1) the code type, and (2) the data series. We need to verify that the chosen settings minimized these differences, as intended. To test the strategy for MetaTrader 4, the data series from TradeStation was deselected on the Markets tab, and the strategy was re-evaluated. The results are shown below in Fig. 10, which duplicates the bottom curve in Fig. 9.
Figure 10. Closed-trade equity curve for the EURUSD stop-and-reverse strategy, including the validation period, for MetaTrader 4.
Finally, to test the strategy for TradeStation, the data series from TradeStation was selected and the series for MetaTrader 4 was deselected on the Markets tab, the code output was changed to "TradeStation," and the strategy was re-evaluated. The results are shown below in Fig. 11 and appear to be very similar to the middle curve in Fig. 9, as expected.
Figure 11. Closed-trade equity curve for the EURUSD stop-and-reverse strategy, including the validation period, for TradeStation.
The code for both platforms is provided below in Fig. 12. Click the image to open the code file for the corresponding platform. Examining the code reveals that the rule-based part of the strategy uses different volatility-related conditions for the long and short sides. The neural network inputs consist of a variety of indicators, including day-of-week, trend (ZLTrend), intraday high, oscillators (InvFisherCycle, InvFisherRSI), Bollinger bands, and standard deviation.
The hybrid nature of the strategy can be seen directly in the code statement (from the TradeStation code):
If EntCondL and NNOutput >= 0.5 then begin
Buy("EnMark-L") NShares shares next bar at market;
The variable "EntCondL" represents the rule-based entry conditions, and "NNOutput" is the output of the neural network. Both conditions have to be true to place the long entry order. The short entry condition works the same way.
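For readers who do not use EasyLanguage, the same gating logic can be written as a small Python function. This is an illustrative sketch only: the function and parameter names are hypothetical, and the 0.5 threshold is taken from the statement quoted above.

```python
def long_entry_signal(ent_cond_l, nn_output, threshold=0.5):
    """True only when the rule-based condition holds AND the network output clears the threshold."""
    return bool(ent_cond_l) and nn_output >= threshold


# The rules alone or the network alone are not enough to enter long.
print(long_entry_signal(True, 0.72))   # True
print(long_entry_signal(True, 0.31))   # False
print(long_entry_signal(False, 0.90))  # False
```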
Figure 12. Trading strategy code for the EURUSD stop-and-reverse strategy (left, MetaTrader 4; right, TradeStation). Click the figure to open the corresponding code file.
This article looked at the process of building a hybrid rule-based/neural network strategy for the EURUSD using a stop-and-reverse (always in the market) approach with Adaptrade Builder. It was shown how the strategy code can be generated for multiple platforms by selecting a common subset of the indicators that work the same way in each platform. The settings necessary to generate strategies that reverse from long to short and back were described, and it was demonstrated that the resulting strategy performed positively on a separate, validation segment of data. It was also verified that the strategy generated similar results with the data and code option for each platform.
As discussed above, the stop-and-reverse approach has several drawbacks and may not appeal to everyone. However, an always-in-the-market approach may be more attractive with forex data because the forex markets trade around the clock. As a result, there are no session-opening gaps, and the trading orders are always active and available to reverse the trade when the market changes. The use of intraday data (4-hour bars) provided more bars of data for use in the build process but was otherwise fairly arbitrary in that the always-in-the-market nature of the strategy means that trades are carried overnight.
The build process was allowed to evolve different conditions for entering long and short, resulting in an asymmetric stop-and-reverse strategy. Despite the name, the resulting strategy enters both long and short trades on market orders, although market, stop, and limit orders were all considered by the build process independently for each side. In practice, reversing from long to short would mean selling short twice the number of shares at the market as the strategy was currently long; e.g., if the current long position was 100,000 shares, you would sell short 200,000 shares at market. Likewise, if the current short position was 100,000 shares, you would buy 200,000 shares at market to reverse from short to long.
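The order size for a reversal can be checked with a few lines of Python. This is a sketch under the assumption that positions are represented as signed share counts (positive for long, negative for short); the function name is hypothetical.

```python
def reversal_order_size(current_position, target_size):
    """Signed order quantity that flips the position (positive buys, negative sells).

    current_position is signed (+long, -short); target_size is the unsigned
    size wanted on the opposite side after the reversal.
    """
    if current_position == 0:
        raise ValueError("no open position to reverse")
    direction = -1 if current_position > 0 else 1
    # Close the existing position and open the new one in a single order.
    return direction * target_size - current_position


print(reversal_order_size(100_000, 100_000))   # -200000: sell 200,000 shares
print(reversal_order_size(-100_000, 100_000))  # 200000: buy 200,000 shares
```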
A shorter price history was used than would be ideal. Nonetheless, the results were positive on the validation segment, suggesting the strategy was not over-fit. This supports the idea that a neural network can be used in a trading strategy without necessarily over-fitting the strategy to the market.
The strategy presented here is not intended for actual trading and was not tested in real-time tracking or trading. However, this article can be used as a template for developing similar strategies for the EURUSD or other markets. As always, any trading strategy you develop should be tested thoroughly in real-time tracking or on separate data to validate the results and to familiarize yourself with the trading characteristics of the strategy prior to live trading.
This article appeared in the February 2018 issue of the Adaptrade Software newsletter.
Hypothetical or simulated performance results have certain inherent limitations. Unlike an actual performance record, simulated results do not represent actual trading. Also, since the trades have not actually been executed, the results may have under- or over-compensated for the impact, if any, of certain market factors, such as lack of liquidity. Simulated trading programs in general are also subject to the fact that they are designed with the benefit of hindsight. No representation is being made that any account will or is likely to achieve profits or losses similar to those shown.
Copyright © 2004-2018 Adaptrade Software. All rights reserved.

Neural networks for algorithmic trading. Simple time series forecasting.
This is the first part of my experiments on applying deep learning to finance, in particular to algorithmic trading.
I want to implement a trading system from scratch based only on deep learning approaches, so for every problem we have here (price prediction, trading strategy, risk management) we are going to use different variations of artificial neural networks (ANNs) and check how well they can handle it.
Now I plan to work on the following sections:
1. Time series forecasting with raw data
2. Time series forecasting with custom features
3. Hyperparameter optimization
4. Implementation of a trading strategy, backtesting and risk management
5. More sophisticated trading strategies, reinforcement learning
6. Going live, broker APIs, earning (l̶o̶s̶i̶n̶g̶) money
I highly recommend checking out the code and the IPython Notebook in this repository.
In this first part, I want to show how MLPs, CNNs and RNNs can be used for financial time series prediction. In this part we are not going to use any feature engineering. Let’s just consider a historical dataset of S&P 500 index price movements. We have information from 1950 to 2018 about the open, close, high and low prices for every trading day, along with the volume of trades. First, we will try just to predict the close price at the end of the next day; second, we will try to predict the return (close price minus open price). Download the dataset from Yahoo Finance or from this repository.
Problem definition.
We will consider our problem as 1) a regression problem (trying to forecast the exact close price or return for the next day) and 2) a binary classification problem (the price will go up [1; 0] or down [0; 1]).
For training the NNs we are going to use the Keras framework.
First let’s prepare our data for training. We want to predict the value at t+1 based on information from the N previous days. For example, given the close prices from the past 30 days on the market, we want to predict what the price will be tomorrow, on the 31st day.
We use the first 90% of the time series as the training set (consider it historical data) and the last 10% as the test set for model evaluation.
Here is an example of loading the raw input data, splitting it into training samples and preprocessing it:
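A minimal sketch of the windowing and the chronological split is given below. It is not the code from the repository: the function names are hypothetical, a toy price list stands in for the S&P 500 closes, and the split is applied to the windows rather than to the raw series.

```python
def make_windows(series, window=30):
    """Return (X, y): each X row holds `window` consecutive values, y is the value that follows."""
    X, y = [], []
    for start in range(len(series) - window):
        X.append(series[start:start + window])
        y.append(series[start + window])
    return X, y


def chronological_split(X, y, train_fraction=0.9):
    """Keep time order: the first part is for training, the last part for testing."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


closes = [float(p) for p in range(100, 200)]  # toy stand-in for real close prices
X, y = make_windows(closes, window=30)
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(len(X_train), len(X_test))  # 63 7
```

Shuffling before the split is deliberately avoided, since the test set is meant to lie strictly after the training data in time.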
Regression problem. MLP.
It will be just a perceptron with two hidden layers. The number of hidden neurons is chosen empirically; we will work on hyperparameter optimization in the next sections. Between the two hidden layers we add one Dropout layer to prevent overfitting.
The important things are Dense(1), Activation(‘linear’) and ‘mse’ in the compile section. We want one output that can be in any range (we predict a real value), and our loss function is defined as mean squared error.
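To make the role of the linear output and the MSE loss concrete, here is a framework-free sketch of the forward pass of such a network. It is not the Keras model from the repository: the layer sizes, the ReLU hidden activation and the random initialization are assumptions, and Dropout is left out because it is only active during training.

```python
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """Two ReLU hidden layers followed by a single linear output unit."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    hidden1 = relu(dense(x, w1, b1))
    hidden2 = relu(dense(hidden1, w2, b2))
    return dense(hidden2, w3, b3)[0]  # linear activation: the value is not squashed


def mse(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


rng = random.Random(0)
layers = [init_layer(20, 16, rng), init_layer(16, 8, rng), init_layer(8, 1, rng)]
window = [float(i) for i in range(20)]  # toy 20-day input window
prediction = mlp_forward(window, layers)
```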
Let’s see what happens if we just pass chunks of 20 days of close prices and predict the price on the 21st day. The final MSE is 46.3635263557, but on its own that number is not very informative. Below is a plot of predictions for the first 150 points of the test dataset. The black line is the actual data, the blue one the prediction. We can clearly see that our algorithm is not even close in value, but it can learn the trend.
Let’s scale our data using sklearn’s preprocessing.scale() so that our time series has zero mean and unit variance, and train the same MLP. Now we have MSE = 0.0040424330518 (but it is on scaled data). On the plot below you can see the actual scaled time series (black) and our forecast (blue) for it:
To use this model in the real world, we have to return to the unscaled time series. We can do this by multiplying our prediction by the standard deviation of the time series we used to make the prediction (20 unscaled time steps) and adding its mean value:
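The scaling and its inverse can be sketched with the standard library alone. This is an illustration, not the repository code: the function names are hypothetical, and the population standard deviation is used on the assumption that it matches what sklearn's preprocessing.scale() computes.

```python
import statistics


def scale_window(window):
    """Scale one window to zero mean and unit variance; also return the statistics used."""
    mean = statistics.mean(window)
    std = statistics.pstdev(window)  # population standard deviation
    return [(v - mean) / std for v in window], mean, std


def restore(scaled_prediction, mean, std):
    """Map a prediction made on scaled data back to price units."""
    return scaled_prediction * std + mean


window = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy stand-in for 20 unscaled close prices
scaled, mean, std = scale_window(window)
restored = restore(scaled[-1], mean, std)  # recovers the last price, 5.0
```

A window of identical prices has zero standard deviation, so real code would need to guard against division by zero.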
The MSE in this case equals 937.963649937. Here is the plot of restored predictions (red) and real data (green):
Not bad, is it? But let’s try more sophisticated algorithms for this problem!
Regression problem. CNN.
I am not going to dive into the theory of convolutional neural networks; you can check out these amazing resources:
Let’s define 2-layer convolutional neural network (combination of convolution and max-pooling layers) with one fully-connected layer and the same output as earlier:
Let’s check out the results. The MSEs for scaled and restored data are 0.227074542433 and 935.520550172. Plots are below:
Even judging by the MSE on scaled data, this network learned much worse. Most probably, a deeper architecture needs more data for training, or it simply overfitted because of too many filters or layers. We will consider this issue later.
Regression problem. RNN.
As the recurrent architecture I want to use two stacked LSTM layers (read more about LSTMs here).
Plots of the forecasts are below; the MSEs are 0.0246238639582 and 939.948636707.
The RNN forecast looks more like a moving average model; it can’t learn and predict all the fluctuations.
So the result is a bit unexpected, but we can see that MLPs work better for this time series forecasting task. Let’s check out what happens if we switch from regression to classification. Now we will use not close prices but the daily return (close price minus open price), and we want to predict whether the close price is higher or lower than the open price based on the last 20 days of returns.
Classification problem. MLP.
The code changes just a bit: we change our last Dense layer to output [0; 1] or [1; 0] and add a softmax activation to get a probabilistic output.
To load binary outputs, change the following line in the code:
We also change the loss function to binary cross-entropy and add an accuracy metric.
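The pieces of the classification setup (one-hot targets, softmax output, cross-entropy loss, accuracy) can be sketched without any framework. The function names are hypothetical and this is not the repository code. The loss below is the per-sample form -sum(t * log p); for a two-way softmax with one-hot targets this should give the same value as averaging the binary cross-entropy over the two outputs.

```python
import math


def one_hot_direction(open_price, close_price):
    """[1, 0] when the day closes above its open (up), otherwise [0, 1] (down)."""
    return [1, 0] if close_price > open_price else [0, 1]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs, eps=1e-12):
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


def accuracy(targets, prob_rows):
    hits = sum(t.index(1) == p.index(max(p)) for t, p in zip(targets, prob_rows))
    return hits / len(targets)


targets = [one_hot_direction(100.0, 101.5), one_hot_direction(101.5, 100.2)]
probs = [softmax([2.0, 0.0]), softmax([1.0, 0.0])]  # second guess is wrong
print(accuracy(targets, probs))  # 0.5
```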
Oh, it’s no better than random guessing (50% accuracy), so let’s try something better. Check out the results below.
Classification problem. CNN.
Classification problem. RNN.
Conclusions.
We can see that treating financial time series prediction as a regression problem is the better approach: the model can learn the trend and predict prices close to the actual ones.
What surprised me is that MLPs handle sequence data better than CNNs or RNNs, which are supposed to work better with time series. I attribute this to the pretty small dataset (~16k time stamps) and the dummy choice of hyperparameters.
You can reproduce the results and improve on them using the code from the repository.
I think we can get better results in both regression and classification using different features (not only scaled time series), such as technical indicators or trading volume. We can also try more frequent data, say minute-by-minute ticks, to have more training data. I’m going to do all of this later, so stay tuned :)
Alex Honchar.
teaching machines and rapping.
Machine Learning World.
The best about Machine Learning, Computer Vision, Deep Learning, Natural language processing and other.

Getting Started with Neural Networks for Algorithmic Trading.
If you’re interested in using artificial neural networks (ANNs) for algorithmic trading, but don’t know where to start, then this article is for you. Normally if you want to learn about neural networks, you need to be reasonably well versed in matrix and vector operations, that is, the world of linear algebra. This article is different. I’ve attempted to provide a starting point that doesn’t involve any linear algebra and have deliberately left out all references to vectors and matrices. If you’re not strong on linear algebra, but are curious about neural networks, then I think you’ll enjoy this introduction. In addition, if you decide to take your study of neural networks further, when you do inevitably start using linear algebra, it will probably make a lot more sense as you’ll have something of a head start.
The best place to start learning about neural networks is the perceptron. The perceptron is the simplest possible artificial neural network, consisting of just a single neuron and capable of learning a certain class of binary classification problems. Perceptrons are the perfect introduction to ANNs and if you can understand how they work, the leap to more complex networks and their attendant issues will not be nearly as far. So we will explore their history, what they do, how they learn, and where they fail. We’ll build our own perceptron from scratch and train it to perform different classification tasks which will provide insight into where they can perform well, and where they are hopelessly outgunned. Lastly, we’ll explore one way we might apply a perceptron in a trading system.
A Brief History of the Perceptron.
The perceptron has a long history, dating back to at least the mid 1950s. Following its discovery, the New York Times ran an article that claimed that the perceptron was the basis of an artificial intelligence (AI) that would be able to walk, talk, see and even demonstrate consciousness. Soon after, this was proven to be hyperbole on a staggering scale, when the perceptron was shown to be wholly incapable of classifying certain types of problems. The disillusionment that followed essentially led to the first AI winter, and since then we have seen a repeating pattern of hyperbole followed by disappointment in relation to artificial intelligence.
Still, the perceptron remains a useful tool for some classification problems and is the perfect place to start if you’re interested in learning more about neural networks. Before we demonstrate it in a trading application, let’s find out a little more about it.
Artificial Neural Networks: Modelling Nature.
Algorithms modelled on biology are a fascinating area of computer science. Undoubtedly you’ve heard of the genetic algorithm, which is a powerful optimization tool modelled on evolutionary processes. Nature has been used as a model for other optimization algorithms, as well as the basis for various design innovations. In this same vein, ANNs attempt to learn relationships and patterns using a somewhat loose model of neurons in the brain. The perceptron is a model of a single neuron.
In an ANN, neurons receive a number of inputs, weight each of those inputs, sum the weighted inputs, and then transform that sum using a special function called an activation function, of which there are many possible types. The output of that activation function is then either used as the prediction (in a single neuron model) or is combined with the outputs of other neurons for further use in more complex models, which we’ll get to in another article.
Here’s a sketch of that process in an ANN consisting of a single neuron:
Here, x1, x2, etc. are the inputs. b is called the bias term; think of it like the intercept term in a linear model y = mx + b. w1, w2, etc. are the weights applied to each input. The neuron first sums the weighted inputs (and the bias term), represented by S in the sketch above. Then S is passed to the activation function, which simply transforms S in some way. The output of the activation function, z, is then the output of the neuron.
The idea behind ANNs is that by selecting good values for the weight parameters (and the bias), the ANN can model the relationships between the inputs and some target. In the sketch above, z is the ANN’s prediction of the target given the input variables.
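The computation performed by a single neuron fits in a few lines of Python. This is a minimal sketch; the function name and the example numbers are illustrative only, and the activation is passed in as a plain function.

```python
def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias (S), passed through the activation (z)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(s)


# Four inputs and four weights, as in the sketch of the single neuron.
inputs = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, -1.0, 0.25, 2.0]
z = neuron_output(inputs, weights, bias=0.1, activation=lambda s: s)  # identity activation
```

Swapping the `activation` argument is all it takes to turn the same neuron into a different kind of unit.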
In the sketch, we have a single neuron with four weights and a bias parameter to learn. It isn’t uncommon for modern neural networks to consist of hundreds of neurons across multiple layers, where the output of each neuron in one layer is input to all the neurons in the next layer. Such a fully connected network architecture can easily result in many thousands of weight parameters. With nonlinear activation functions and enough neurons, this enables ANNs to approximate a very wide range of functions, linear or nonlinear.
The perceptron consists of just a single neuron, like in our sketch above. This greatly simplifies the problem of learning the best weights, but it also has implications for the class of problems that a perceptron can solve.
What’s an Activation Function?
The purpose of the activation function is to take the input signal (that’s the weighted sum of the inputs and the bias) and turn it into an output signal. There are many different activation functions that convert an input signal in a slightly different way, depending on the purpose of the neuron.
Recall that the perceptron is a binary classifier. That is, it predicts either one or zero, on or off, up or down, etc. It follows then that our activation function needs to convert the input signal (which can be any real-valued number) into either a one or a zero corresponding to the predicted class.
In biological terms, think of this activation function as firing (activating) the neuron (telling it to pass the signal on to the next neuron) when it returns 1, and doing nothing when it returns 0.
What sort of function accomplishes this? It’s called a step function, and its mathematical expression looks like this:
And when plotted, it looks like this:
This function takes any weighted sum of the inputs (S) and converts it into a binary output (either 1 or 0). The trick to making this useful is finding (learning) a set of weights, w, that lead to good predictions using this activation function.
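In code, a step activation is a one-line comparison. The sketch below assumes the step sits at S = 0 and that S = 0 itself maps to 1; which side the boundary value falls on is a convention and may differ from the article's original formula.

```python
def step(s):
    """Step activation: 1 when the weighted sum reaches the threshold at zero, else 0."""
    return 1 if s >= 0 else 0


print([step(s) for s in (-2.5, -0.1, 0.0, 0.1, 3.0)])  # [0, 0, 1, 1, 1]
```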
How Does a Perceptron Learn?
We already know that the inputs to a neuron get multiplied by a weight particular to each input. The sum of these weighted inputs is then transformed into an output via an activation function. To find the best values for our weights, we start by assigning them random values and then feed observations from our training data to the perceptron, one by one. Each output of the perceptron is compared with the actual target value for that observation and, if the prediction was incorrect, the weights are adjusted so that the prediction would have been closer to the actual target. This is repeated until the weights converge.
In perceptron learning, the weight update rule is simple: when a target is misclassified, we take the sign of the error and then either add the inputs that led to the misclassification to the existing weights or subtract them.
If the target was -1 and we predicted 1, the error is -1 - 1 = -2. We would then subtract each input value from the current weights (that is, wi = wi - xi). If the target was 1 and we predicted -1, the error is 1 - (-1) = 2, so we add the inputs to the current weights (that is, wi = wi + xi).
This has the effect of moving the classifier’s decision boundary (which we will see below) in the direction that would have helped it classify the last observation correctly. In this way, weights are gradually updated until they converge. Sometimes (in fact, often) we’ll need to iterate through each of our training observations more than once in order to get the weights to converge. Each sweep through the training data is called an epoch.
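The update rule for a single observation can be sketched as follows. The function name is hypothetical, classes are coded as 1 and -1, and updating the bias by the sign of the error (treating it as a weight on a constant input of 1) is an assumption, since the text only describes the weight update.

```python
def update_weights(weights, bias, inputs, target, prediction):
    """Apply one perceptron update; return the (possibly unchanged) weights and bias."""
    error = target - prediction  # 0, 2 or -2 with classes coded as 1 and -1
    if error == 0:
        return weights, bias     # correct prediction: nothing to change
    sign = 1 if error > 0 else -1
    new_weights = [w + sign * x for w, x in zip(weights, inputs)]
    return new_weights, bias + sign


# Target -1 but predicted 1: the error is -2, so the inputs are subtracted.
weights, bias = update_weights([0.5, -0.2], 0.0, [1.0, 2.0], target=-1, prediction=1)
```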
Implementing a Perceptron from Scratch.
Next, we’ll code our own perceptron learning algorithm from scratch using R. We’ll train it to classify a subset of the iris data set.
In the full iris data set, there are three species. However, perceptrons are for binary classification (that is, for distinguishing between two possible outcomes). Therefore, for the purpose of this exercise, we remove all observations of one of the species (here, virginica), and train a perceptron to distinguish between the remaining two. We also need to convert the species classification into a binary variable: here we use 1 for the first species, and -1 for the other. Further, there are four variables in addition to the species classification: petal length, petal width, sepal length and sepal width. For the purposes of illustration, we’ll train our perceptron using only petal length and width and drop the other two measurements. These data transformations result in the following plot of the remaining two species in the two-dimensional feature space of petal length and petal width:
The plot suggests that petal length and petal width are strong predictors of species – at least in our training data set. Can a perceptron learn to tell them apart?
Training our perceptron is simply a matter of initializing the weights (here we initialize them to zero) and then implementing the perceptron learning rule, which just updates the weights based on the error of each observation with the current weights. We do that in a for() loop which iterates over each observation, making a prediction based on the values of petal length and petal width of each observation, calculating the error of that prediction and then updating the weights accordingly.
In this example we perform five sweeps through the entire data set, that is, we train the perceptron for five epochs. At the end of each epoch, we calculate the total number of misclassified training observations, which we hope will decrease as training progresses. Here’s the code:
epochs <- 5
# ...
yhat <- ifelse(z > 0, 1, -1) #prediction of current observation
error = iris$Species[i] - yhat #will be either 0, 2 or -2
# ...
ifelse(... > 0, 1, -1) #predict on whole training set
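The training procedure described above can also be sketched in Python. This is not a translation of the article's R listing: the four data points are made-up (petal length, petal width) pairs standing in for the two iris species, the function names are hypothetical, and the bias update is an assumption.

```python
def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else -1


def train_perceptron(samples, targets, epochs=5):
    """Train with zero-initialized weights; return weights, bias and errors per epoch."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    errors_per_epoch = []
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)  # 0, 2 or -2
            if error != 0:
                sign = 1 if error > 0 else -1
                weights = [w + sign * xi for w, xi in zip(weights, x)]
                bias += sign
        # Count misclassified training observations after this sweep.
        errors_per_epoch.append(
            sum(predict(weights, bias, x) != t for x, t in zip(samples, targets)))
    return weights, bias, errors_per_epoch


samples = [[1.4, 0.2], [1.5, 0.3], [4.5, 1.5], [5.0, 1.7]]  # toy data, not the iris set
targets = [1, 1, -1, -1]
weights, bias, errors_per_epoch = train_perceptron(samples, targets)
print(errors_per_epoch)  # [2, 0, 0, 0, 0]
```

On this toy set the error count also drops to zero after the second epoch, but that is a property of these four points and says nothing about the iris results.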
Here’s the plot of the error rate:
We can see that it took two epochs to train the perceptron to correctly classify the entire dataset. After the first epoch, the weights hadn’t been sufficiently updated. In fact, after epoch 1, the perceptron predicted the same class for every observation! Therefore it misclassified 50 out of the 100 observations (there are 50 observations of each species in the data set). However after two epochs, the perceptron was able to correctly classify the entire data set by learning appropriate weights.
Another, perhaps more intuitive, way to view the weights that the perceptron learns is in terms of its decision boundary. In geometric terms, for the two-dimensional feature space in this example, the decision boundary is a straight line separating the perceptron’s predictions. On one side of the line, the perceptron always predicts -1, and on the other, it always predicts 1.
We can derive the decision boundary from the perceptron’s activation function:
where z = w1*x1 + w2*x2 + b.
The decision boundary is simply the line that defines the location of the step in the activation function. That step occurs at z = 0, so our decision boundary is given by
w1*x1 + w2*x2 + b = 0
which defines a straight line in x1, x2 feature space.
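Setting z = 0 and solving for x2 gives x2 = -(w1*x1 + b) / w2, which is convenient for plotting the boundary. A small sketch, with a hypothetical function name and made-up weights:

```python
def boundary_x2(x1, w1, w2, b):
    """Value of x2 on the decision boundary w1*x1 + w2*x2 + b = 0 for a given x1."""
    if w2 == 0:
        raise ValueError("vertical boundary: x1 = -b / w1 for every x2")
    return -(w1 * x1 + b) / w2


# Illustrative weights only, not the values learned in the article.
w1, w2, b = -0.2, -0.8, 2.0
x2 = boundary_x2(1.0, w1, w2, b)
z_on_boundary = w1 * 1.0 + w2 * x2 + b  # approximately zero
```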
In our iris example, the perceptron learned the following decision boundary:
Here’s the complete code for training this perceptron and producing the plots shown above:
main = 'Iris Classifications')
legend("bottomright", c("species1", "species2"), col=c("blue", "red"), pch=c("-","+"), cex=1.1)
error = iris$Species[i] - yhat #will be either 0, 2 or -2.
Congratulations! You just built and trained your first neural network.
Let’s now ask our perceptron to learn a slightly more difficult problem. Using the same iris data set, this time we remove the setosa species and train a perceptron to classify virginica and versicolor on the basis of their petal lengths and petal widths. When we plot these species in their feature space, we get this:
This looks like a slightly more difficult problem, as this time the difference between the two classifications is not as clear cut. Let’s see how our perceptron performs on this data set.
This time, we introduce the concept of the learning rate, which is important to understand if you decide to pursue neural networks beyond the perceptron. The learning rate controls the speed with which weights are adjusted during training. We simply scale the adjustment by the learning rate: a high learning rate means that weights are subject to bigger adjustments. Sometimes this is a good thing, for example when the weights are far from their optimal values. But sometimes this can cause the weights to oscillate back and forth between two high-error states without ever finding a better solution. In that case, a smaller learning rate is desirable, which can be thought of as fine tuning of the weights.
Finding the best learning rate is largely a trial and error process, but a useful approach is to reduce the learning rate as training proceeds. In the example below, we do that by scaling the learning rate by the inverse of the epoch number.
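A decaying learning rate of this kind might look like the sketch below. The function names and the initial rate of 1.0 are assumptions, and the bias update is again treated like a weight on a constant input of 1.

```python
def decayed_rate(initial_rate, epoch):
    """Learning rate for a given epoch (counted from 1): initial_rate / epoch."""
    return initial_rate / epoch


def scaled_update(weights, bias, inputs, error, rate):
    """Perceptron update with the adjustment scaled by the learning rate."""
    if error == 0:
        return weights, bias
    sign = 1 if error > 0 else -1
    new_weights = [w + rate * sign * x for w, x in zip(weights, inputs)]
    return new_weights, bias + rate * sign


rates = [decayed_rate(1.0, epoch) for epoch in (1, 2, 4)]
print(rates)  # [1.0, 0.5, 0.25]

# The same misclassification moves the weights less in later epochs.
early, _ = scaled_update([0.0, 0.0], 0.0, [4.0, 2.0], error=2, rate=decayed_rate(1.0, 1))
late, _ = scaled_update([0.0, 0.0], 0.0, [4.0, 2.0], error=2, rate=decayed_rate(1.0, 4))
```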
Here’s a plot of our error rate after training in this manner for 400 epochs:
You can see that training proceeds much less smoothly and takes a lot longer than last time, which is a consequence of the classification problem being more difficult. Also note that the error rate is never reduced to zero, that is, the perceptron is never able to perfectly classify this data set. Here’s a plot of the decision boundary, which demonstrates where the perceptron makes the wrong predictions:
Here’s the code for this perceptron:
main = 'Iris Classifications')
legend("bottomright", c("species1", "species2"), col=c("blue", "red"), pch=c("-","+"), cex=1.1)
error = iris$Species[i] - yhat #will be either 0, 2 or -2.
Where Do Perceptrons Fail?
In the first example above, we saw that our versicolor and setosa iris species could be perfectly separated by a straight line (the decision boundary) in their feature space. Such a classification problem is said to be linearly separable and (spoiler alert) is where perceptrons excel. In the second example, we saw that versicolor and virginica were almost linearly separable, and our perceptron did a reasonable job, but could never perfectly classify the whole data set. In this next example, we’ll see how they perform on a problem that isn’t linearly separable at all.
Using the same iris data set, this time we classify our iris species as either versicolor or other (that is setosa and virginica get the same classification) on the basis of their petal lengths and petal widths. When we plot these species in their feature space, we get this:
This time, there is no straight line that can perfectly separate the two species. Let’s see how our perceptron performs now. Here’s the error rate over 400 epochs and the decision boundary:
We can see that the perceptron fails to distinguish between the two classes. This is typical of the performance of the perceptron on any problem that isn’t linearly separable. Hence my comment at the start of this unit (see footnote 2) that I’m skeptical that perceptrons can find practical application in trading. Maybe you can find a use case in trading, but even if not, they provide an excellent foundation for exploring more complex networks which can model more complex relationships.
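The same failure can be reproduced on the smallest textbook example of a problem that is not linearly separable, the XOR pattern. This sketch does not use the iris data; the function names are hypothetical and the bias update is an assumption. Because no straight line separates the two classes, at least one of the four points is misclassified after every epoch, however long training runs.

```python
def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else -1


def errors_per_epoch(samples, targets, epochs):
    """Train a perceptron and record the number of misclassified points after each epoch."""
    weights, bias, history = [0.0] * len(samples[0]), 0.0, []
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            if predict(weights, bias, x) != t:
                # For a misclassified point the sign of the error equals the target.
                weights = [w + t * xi for w, xi in zip(weights, x)]
                bias += t
        history.append(sum(predict(weights, bias, x) != t
                           for x, t in zip(samples, targets)))
    return history


xor_samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_targets = [-1, 1, 1, -1]
history = errors_per_epoch(xor_samples, xor_targets, epochs=50)
print(min(history) >= 1)  # True
```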
A Perceptron Implementation for Algorithmic Trading.
The Zorro trading automation platform includes a flexible perceptron implementation. If you haven’t heard of Zorro, it is a fast, accurate and powerful backtesting/execution platform that abstracts a lot of tedious programming tasks so that the user is empowered to concentrate on efficient research. It uses a simple C-based scripting language that takes almost no time to learn if you already know C, and a week or two if you don’t (although of course mastery can take much longer). This makes it an excellent choice for independent traders and those getting started with algorithmic trading. While the software sacrifices little for the abstraction that enables efficient research, experienced quant developers or those with an abundance of spare time might take issue with that aspect of the software, as it’s not open source, so it isn’t for everyone. But it’s a great choice for beginners and DIY traders who maintain a day job. If you want to learn to use Zorro, even if you’re not a programmer, we can help.
Zorro’s perceptron implementation allows us to define any features we think are pertinent, and to specify any target we like, which Zorro automatically converts to a binary variable (by default, positive values are given one class; negative values the other). After training, Zorro’s perceptron predicts either a positive or negative value corresponding to the positive and negative classes respectively.
Here’s the Zorro code for implementing a perceptron that tries to predict whether the 5-day price change in the EUR/USD exchange rate will be greater than 200 pips, based on recent returns and volatility, whose predictions are tested under a walk-forward framework:
if(Train) Hedge = 2; //needed for training trade results.
int TST = 50*1440/BarPeriod; //number of bars in test period.
int TRN = 500*1440/BarPeriod; //number of bars in training period.
vars Close = series(priceClose());
var Sig1 = scale(ATR(10)-ATR(50), 100);
var Sig2 = (Close[0]-Close[1])/Close[1];
var Sig3 = (Close[0]-Close[5])/Close[5];
var Sig4 = (Close[0]-Close[10])/Close[10];
if(priceClose(-5) - priceClose(0) > 200*PIP) ObjLong = 1;
else ObjLong = -1;
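if(priceClose(-5) - priceClose(0) < -200*PIP) ObjShort = 1; //short target, reconstructed to mirror the long target above.
else ObjShort = -1;
//[The rest of the listing is truncated in the source. The surviving fragments ("0 and s", "0 and l") indicate entry conditions that compare the long and short predictions, l and s, against 0.]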
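For readers who don’t use Zorro, the return signals and the 200-pip targets in the script can be sketched in plain Python as follows. This is an approximation under stated assumptions rather than a port: closes are stored oldest first (the reverse of a Zorro series), the pip size of 0.0001 for EUR/USD is assumed, the short target is assumed to mirror the long one, and the ATR-based volatility signal is left out because Zorro’s ATR and scale functions are not reproduced here. All function names are hypothetical.

```python
PIP = 0.0001  # assumed pip size for EUR/USD


def lagged_return(closes, t, lag):
    """Fractional change from the close `lag` bars before bar t to the close at bar t."""
    return (closes[t] - closes[t - lag]) / closes[t - lag]


def long_target(closes, t, horizon=5, threshold_pips=200):
    """+1 if the close `horizon` bars ahead is more than the threshold above bar t's close."""
    return 1 if closes[t + horizon] - closes[t] > threshold_pips * PIP else -1


def short_target(closes, t, horizon=5, threshold_pips=200):
    """+1 if the close `horizon` bars ahead is more than the threshold below bar t's close."""
    return 1 if closes[t + horizon] - closes[t] < -threshold_pips * PIP else -1


def build_rows(closes, lags=(1, 5, 10), horizon=5):
    """Return (features, long_label, short_label) for every bar with enough history and future."""
    rows = []
    for t in range(max(lags), len(closes) - horizon):
        features = [lagged_return(closes, t, lag) for lag in lags]
        rows.append((features, long_target(closes, t, horizon),
                     short_target(closes, t, horizon)))
    return rows


# Synthetic series rising 50 pips per bar, used only to exercise the functions.
closes = [1.10 + 0.005 * i for i in range(20)]
rows = build_rows(closes)
print(len(rows), rows[0])
```

Note that the targets look five bars into the future, so they can only be computed on historical data for training and testing, never on the most recent bars of a live series.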
Zorro first outputs trained perceptrons for predicting long and short 5-day price moves greater than 200 pips for each walk-forward period, and then tests their out-of-sample predictions.
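The walk-forward idea can be sketched in Python as a rolling pair of windows: train on 500 bars, test on the following 50, then roll everything forward by one test period. The window lengths come from the TRN and TST values in the script (with daily bars), but the function itself is a hypothetical illustration, and Zorro’s own walk-forward cycles may differ in detail.

```python
def walk_forward_windows(n_bars, train_bars=500, test_bars=50):
    """Yield (train_range, test_range) index ranges that roll forward by one test period."""
    start = 0
    while start + train_bars + test_bars <= n_bars:
        train = range(start, start + train_bars)
        test = range(start + train_bars, start + train_bars + test_bars)
        yield train, test
        start += test_bars


windows = list(walk_forward_windows(n_bars=700))
for train, test in windows:
    print("train bars", train[0], "to", train[-1],
          "| test bars", test[0], "to", test[-1])
```

Each test window begins immediately after its training window ends, so every prediction that contributes to the equity curve is made on data the model has not seen.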
Here’s the walk-forward equity curve of our example perceptron trading strategy:
I find this result particularly interesting because I expected the perceptron to perform poorly on market data, which I find hard to imagine being linearly separable. Sometimes, it seems, simplicity is not a bad thing.
Conclusions.
I hope this article has not only whetted your appetite for further exploration of neural networks, but also helped you understand the basic concepts without getting too hung up on the math.
I intended for this article to be an introduction to neural networks where the perceptron was to be nothing more than a learning aid. However, given the surprising walk-forward result from our simple trading model, I’m now going to experiment with this approach a little further. If this interests you too, some ideas you might consider include extending the backtest, experimenting with different signals and targets, testing the algorithm on other markets and of course considering data mining bias. I’d love to hear about your results in the comments.
Thanks for reading!
By Kris Longmore, from the RobotWealth blog.
About the Author Kris Longmore.
Kris is the founder of RobotWealth, a community of passionate algorithmic traders seeking to learn together and profit from the markets.
With a background in mechanical and environmental engineering, Kris has worked as a hedge fund quant and now consults to financial institutions on applying machine learning and harnessing big data through Quantify Partners Pty Ltd.
Kris also loves engaging with DIY traders who are equally passionate about algo trading – that’s why he started RobotWealth!

Forex Umm Birka

Tuesday, 13 February 2018

Trading strategies using neural networks
